Warning: Permanently added '54.242.14.38' (ED25519) to the list of known hosts.

You can reproduce this build on your computer by running:

  sudo dnf install copr-rpmbuild
  /usr/bin/copr-rpmbuild --verbose --drop-resultdir --task-url https://copr.fedorainfracloud.org/backend/get-build-task/7299609-fedora-rawhide-x86_64 --chroot fedora-rawhide-x86_64

Version: 0.72
PID: 17797
Logging PID: 17798

Task:
{'allow_user_ssh': False, 'appstream': False, 'background': False, 'build_id': 7299609, 'buildroot_pkgs': [], 'chroot': 'fedora-rawhide-x86_64', 'enable_net': True, 'fedora_review': False, 'git_hash': 'fc3ea6d12e110fb301228cf1a067d84f30eacfd5', 'git_repo': 'https://copr-dist-git.fedorainfracloud.org/git/rezso/ML/pytorch', 'isolation': 'default', 'memory_reqs': 2048, 'package_name': 'pytorch', 'package_version': '2.4.0-20240412.0.git7efaf54d.cu12_3', 'project_dirname': 'ML', 'project_name': 'ML', 'project_owner': 'rezso', 'repo_priority': None, 'repos': [{'baseurl': 'https://download.copr.fedorainfracloud.org/results/rezso/ML/fedora-rawhide-x86_64/', 'id': 'copr_base', 'name': 'Copr repository', 'priority': None}, {'baseurl': 'https://download.copr.fedorainfracloud.org/results/rezso/CUDA/fedora-rawhide-x86_64/', 'id': 'copr_rezso_CUDA', 'name': 'Additional repo copr_rezso_CUDA'}, {'baseurl': 'http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64', 'id': 'http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64', 'name': 'Additional repo http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64'}, {'baseurl': 'http://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa', 'id': 'http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa', 'name': 'Additional repo http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa'}, {'baseurl': 'http://developer.download.nvidia.com/compute/cuda/repos/rhel8/ppc64le', 'id': 'http_developer_download_nvidia_com_compute_cuda_repos_rhel8_ppc64le', 'name': 'Additional repo http_developer_download_nvidia_com_compute_cuda_repos_rhel8_ppc64le'}], 'sandbox': 'rezso/ML--rezso', 'source_json': {}, 'source_type': None, 'ssh_public_keys': None, 'submitter': 'rezso', 'tags': [], 'task_id': '7299609-fedora-rawhide-x86_64', 'timeout': 172800, 'uses_devel_repo': False, 'with_opts': [], 'without_opts': []}

Running: git clone https://copr-dist-git.fedorainfracloud.org/git/rezso/ML/pytorch /var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch --depth 500 --no-single-branch --recursive
cmd: ['git', 'clone', 'https://copr-dist-git.fedorainfracloud.org/git/rezso/ML/pytorch', '/var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch', '--depth', '500', '--no-single-branch', '--recursive']
cwd: .
rc: 0
stdout:
stderr: Cloning into '/var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch'...

Running: git checkout fc3ea6d12e110fb301228cf1a067d84f30eacfd5 --
cmd: ['git', 'checkout', 'fc3ea6d12e110fb301228cf1a067d84f30eacfd5', '--']
cwd: /var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch
rc: 0
stdout:
stderr: Note: switching to 'fc3ea6d12e110fb301228cf1a067d84f30eacfd5'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command.
Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at fc3ea6d automatic import of pytorch

Running: copr-distgit-client sources
cmd: ['copr-distgit-client', 'sources']
cwd: /var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch
rc: 0
stdout:
stderr: INFO: Reading stdout from command: git rev-parse --abbrev-ref HEAD
INFO: Reading stdout from command: git rev-parse HEAD
INFO: Reading sources specification file: sources
/usr/bin/tail: /var/lib/copr-rpmbuild/main.log: file truncated
Running (timeout=172800): unbuffer mock --spec /var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch/pytorch.spec --sources /var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch --resultdir /var/lib/copr-rpmbuild/results --uniqueext 1712885339.434030 -r /var/lib/copr-rpmbuild/results/configs/child.cfg
INFO: mock.py version 5.5 starting (python version = 3.12.1, NVR = mock-5.5-1.fc39), args: /usr/libexec/mock/mock --spec /var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch/pytorch.spec --sources /var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch --resultdir /var/lib/copr-rpmbuild/results --uniqueext 1712885339.434030 -r /var/lib/copr-rpmbuild/results/configs/child.cfg
Start(bootstrap): init plugins
INFO: tmpfs initialized
INFO: selinux enabled
INFO: chroot_scan: initialized
INFO: compress_logs: initialized
Finish(bootstrap): init plugins
Start: init plugins
INFO: tmpfs initialized
INFO: selinux enabled
INFO: chroot_scan: initialized
INFO: compress_logs: initialized
Finish: init plugins
INFO: Signal handler active
Start: run
INFO: Start(/var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch/pytorch.spec) Config(fedora-rawhide-x86_64)
Start: clean chroot
Finish: clean chroot
Mock Version: 5.5
INFO: Mock Version: 5.5
Start(bootstrap): chroot init
INFO: mounting tmpfs at /var/lib/mock/fedora-rawhide-x86_64-bootstrap-1712885339.434030/root.
INFO: calling preinit hooks
INFO: enabled root cache
INFO: enabled package manager cache
Start(bootstrap): cleaning package manager metadata
Finish(bootstrap): cleaning package manager metadata
INFO: Guessed host environment type: unknown
INFO: Using bootstrap image: registry.fedoraproject.org/fedora:rawhide
INFO: Pulling image: registry.fedoraproject.org/fedora:rawhide
INFO: Copy content of container registry.fedoraproject.org/fedora:rawhide to /var/lib/mock/fedora-rawhide-x86_64-bootstrap-1712885339.434030/root
INFO: Checking that registry.fedoraproject.org/fedora:rawhide image matches host's architecture
INFO: mounting registry.fedoraproject.org/fedora:rawhide with podman image mount
INFO: image registry.fedoraproject.org/fedora:rawhide as /var/lib/containers/storage/overlay/e09a789583cac2742d1ad9889fffe50874aee577bf209fe70faa9f164049bc79/merged
INFO: umounting image registry.fedoraproject.org/fedora:rawhide (/var/lib/containers/storage/overlay/e09a789583cac2742d1ad9889fffe50874aee577bf209fe70faa9f164049bc79/merged) with podman image umount
INFO: Using 'dnf' instead of 'dnf5' for bootstrap chroot
INFO: Package manager dnf detected and used (fallback)
INFO: Bootstrap image not marked ready
Start(bootstrap): installing dnf5 tooling
No matches found for the following disable plugin patterns: local, spacewalk, versionlock
Copr repository                                  2.8 MB/s | 116 kB     00:00
Additional repo copr_rezso_CUDA                  2.0 MB/s |  38 kB     00:00
Additional repo http_developer_download_nvidia_   54 MB/s | 713 kB     00:00
Additional repo http_developer_download_nvidia_   30 MB/s | 448 kB     00:00
Additional repo http_developer_download_nvidia_   34 MB/s | 433 kB     00:00
fedora                                             40 MB/s |  20 MB     00:00
Dependencies resolved.
================================================================================
 Package            Architecture   Version          Repository           Size
================================================================================
Installing:
 dnf5               x86_64         5.1.17-1.fc41    fedora              700 k
 dnf5-plugins       x86_64         5.1.17-1.fc41    fedora              358 k
Installing dependencies:
 fmt                x86_64         10.2.1-4.fc41    fedora              125 k
 libdnf5            x86_64         5.1.17-1.fc41    fedora              997 k
 libdnf5-cli        x86_64         5.1.17-1.fc41    fedora              231 k
 sdbus-cpp          x86_64         1.5.0-2.fc41     fedora              113 k
 systemd-libs       x86_64         255.4-1.fc41     fedora              708 k

Transaction Summary
================================================================================
Install  7 Packages

Total download size: 3.2 M
Installed size: 8.8 M
Downloading Packages:
(1/7): fmt-10.2.1-4.fc41.x86_64.rpm              7.8 MB/s | 125 kB     00:00
(2/7): dnf5-plugins-5.1.17-1.fc41.x86_64.rpm      21 MB/s | 358 kB     00:00
(3/7): dnf5-5.1.17-1.fc41.x86_64.rpm              37 MB/s | 700 kB     00:00
(4/7): libdnf5-5.1.17-1.fc41.x86_64.rpm          139 MB/s | 997 kB     00:00
(5/7): libdnf5-cli-5.1.17-1.fc41.x86_64.rpm       34 MB/s | 231 kB     00:00
(6/7): sdbus-cpp-1.5.0-2.fc41.x86_64.rpm          14 MB/s | 113 kB     00:00
(7/7): systemd-libs-255.4-1.fc41.x86_64.rpm       61 MB/s | 708 kB     00:00
--------------------------------------------------------------------------------
Total                                             35 MB/s | 3.2 MB     00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1
  Installing       : fmt-10.2.1-4.fc41.x86_64                               1/7
  Installing       : libdnf5-5.1.17-1.fc41.x86_64                           2/7
  Installing       : libdnf5-cli-5.1.17-1.fc41.x86_64                       3/7
  Installing       : systemd-libs-255.4-1.fc41.x86_64                       4/7
  Installing       : sdbus-cpp-1.5.0-2.fc41.x86_64                          5/7
  Installing       : dnf5-5.1.17-1.fc41.x86_64                              6/7
  Installing       : dnf5-plugins-5.1.17-1.fc41.x86_64                      7/7
  Running scriptlet: dnf5-plugins-5.1.17-1.fc41.x86_64                      7/7

Installed:
  dnf5-5.1.17-1.fc41.x86_64             dnf5-plugins-5.1.17-1.fc41.x86_64
  fmt-10.2.1-4.fc41.x86_64              libdnf5-5.1.17-1.fc41.x86_64
  libdnf5-cli-5.1.17-1.fc41.x86_64      sdbus-cpp-1.5.0-2.fc41.x86_64
  systemd-libs-255.4-1.fc41.x86_64

Complete!
INFO: Switching package manager from dnf to the dnf5 (direct choice)
Finish(bootstrap): installing dnf5 tooling
Start(bootstrap): creating root cache
Finish(bootstrap): creating root cache
Finish(bootstrap): chroot init
Start: chroot init
INFO: mounting tmpfs at /var/lib/mock/fedora-rawhide-x86_64-1712885339.434030/root.
INFO: calling preinit hooks
INFO: enabled root cache
INFO: enabled package manager cache
Start: cleaning package manager metadata
Finish: cleaning package manager metadata
INFO: enabled HW Info plugin
INFO: Package manager dnf5 detected and used (direct choice)
INFO: Buildroot is handled by package management downloaded with a bootstrap image:
  rpm-4.19.1.1-1.fc40.x86_64 rpm-sequoia-1.6.0-2.fc40.x86_64 python3-dnf-4.19.2-1.fc41.noarch yum-4.19.2-1.fc41.noarch dnf5-5.1.17-1.fc41.x86_64 dnf5-plugins-5.1.17-1.fc41.x86_64
Start: installing minimal buildroot with dnf5
Updating and loading repositories:
 fedora                                 100% |  30.8 MiB/s |  20.8 MiB | 00m01s
 Copr repository                        100% |   3.6 MiB/s | 117.7 KiB | 00m00s
 Additional repo copr_rezso_CUDA        100% |   1.4 MiB/s |  40.3 KiB | 00m00s
 Additional repo http_developer_downloa 100% |  47.3 MiB/s | 727.0 KiB | 00m00s
 Additional repo http_developer_downloa 100% |  48.1 MiB/s | 492.7 KiB | 00m00s
 Additional repo http_developer_downloa 100% |  43.4 MiB/s | 444.3 KiB | 00m00s
Repositories loaded.
Package Arch Version Repository Size Installing group/module packages: bash x86_64 5.2.26-3.fc40 fedora 8.1 MiB bzip2 x86_64 1.0.8-18.fc40 fedora 91.7 KiB coreutils x86_64 9.5-1.fc41 fedora 5.5 MiB cpio x86_64 2.15-1.fc40 fedora 1.1 MiB diffutils x86_64 3.10-5.fc40 fedora 1.6 MiB fedora-release-common noarch 41-0.6 fedora 19.2 KiB findutils x86_64 1:4.9.0-8.fc40 fedora 1.5 MiB gawk x86_64 5.3.0-3.fc40 fedora 1.7 MiB glibc-minimal-langpack x86_64 2.39.9000-10.fc41 fedora 0.0 B grep x86_64 3.11-7.fc40 fedora 1.0 MiB gzip x86_64 1.13-1.fc40 fedora 385.0 KiB info x86_64 7.1-2.fc40 fedora 357.8 KiB patch x86_64 2.7.6-24.fc40 fedora 262.8 KiB redhat-rpm-config noarch 287-1.fc41 fedora 185.4 KiB rpm-build x86_64 4.19.1.1-1.fc40 fedora 173.7 KiB sed x86_64 4.9-1.fc40 fedora 861.5 KiB shadow-utils x86_64 2:4.15.1-2.fc41 fedora 4.1 MiB tar x86_64 2:1.35-3.fc40 fedora 2.9 MiB unzip x86_64 6.0-63.fc40 fedora 382.8 KiB util-linux x86_64 2.40-13.fc41 fedora 3.7 MiB which x86_64 2.21-41.fc40 fedora 80.2 KiB xz x86_64 1:5.4.6-3.fc41 fedora 2.0 MiB Installing dependencies: alternatives x86_64 1.26-3.fc40 fedora 62.3 KiB ansible-srpm-macros noarch 1-14.fc40 fedora 35.7 KiB audit-libs x86_64 4.0.1-1.fc41 fedora 327.3 KiB authselect x86_64 1.5.0-5.fc41 fedora 153.6 KiB authselect-libs x86_64 1.5.0-5.fc41 fedora 818.2 KiB basesystem noarch 11-20.fc40 fedora 0.0 B binutils x86_64 2.42.50-6.fc41 fedora 27.2 MiB binutils-gold x86_64 2.42.50-6.fc41 fedora 2.0 MiB bzip2-libs x86_64 1.0.8-18.fc40 fedora 80.7 KiB ca-certificates noarch 2023.2.62_v7.0.401-6.fc40 fedora 2.3 MiB coreutils-common x86_64 9.5-1.fc41 fedora 11.2 MiB cracklib x86_64 2.9.11-5.fc40 fedora 238.9 KiB crypto-policies noarch 20240320-1.git58e3d95.fc41 fedora 119.2 KiB curl x86_64 8.7.1-1.fc41 fedora 758.1 KiB cyrus-sasl-lib x86_64 2.1.28-19.fc40 fedora 2.3 MiB debugedit x86_64 5.0-14.fc40 fedora 199.0 KiB dwz x86_64 0.15-6.fc40 fedora 290.9 KiB ed x86_64 1.20.1-1.fc41 fedora 146.5 KiB efi-srpm-macros noarch 5-11.fc40 fedora 40.1 KiB elfutils x86_64 0.191-5.fc41 fedora 2.5 MiB elfutils-debuginfod-client x86_64 0.191-5.fc41 fedora 64.9 KiB elfutils-default-yama-scope noarch 0.191-5.fc41 fedora 1.8 KiB elfutils-libelf x86_64 0.191-5.fc41 fedora 1.2 MiB elfutils-libs x86_64 0.191-5.fc41 fedora 646.2 KiB fedora-gpg-keys noarch 41-0.1 fedora 125.0 KiB fedora-release noarch 41-0.6 fedora 0.0 B fedora-release-identity-basic noarch 41-0.6 fedora 694.0 B fedora-repos noarch 41-0.1 fedora 4.9 KiB fedora-repos-rawhide noarch 41-0.1 fedora 2.2 KiB file x86_64 5.45-5.fc41 fedora 103.5 KiB file-libs x86_64 5.45-5.fc41 fedora 9.9 MiB filesystem x86_64 3.18-8.fc40 fedora 106.0 B fonts-srpm-macros noarch 1:2.0.5-14.fc40 fedora 55.3 KiB forge-srpm-macros noarch 0.3.1-1.fc41 fedora 39.0 KiB fpc-srpm-macros noarch 1.3-12.fc40 fedora 144.0 B gdb-minimal x86_64 14.2-1.fc41 fedora 12.7 MiB gdbm x86_64 1:1.23-6.fc40 fedora 460.9 KiB gdbm-libs x86_64 1:1.23-6.fc40 fedora 121.9 KiB ghc-srpm-macros noarch 1.9.1-1.fc41 fedora 747.0 B glibc x86_64 2.39.9000-10.fc41 fedora 6.7 MiB glibc-common x86_64 2.39.9000-10.fc41 fedora 1.0 MiB glibc-gconv-extra x86_64 2.39.9000-10.fc41 fedora 7.8 MiB gmp x86_64 1:6.3.0-1.fc41 fedora 803.4 KiB gnat-srpm-macros noarch 6-5.fc40 fedora 1.0 KiB go-srpm-macros noarch 3.5.0-1.fc41 fedora 60.6 KiB jansson x86_64 2.13.1-9.fc40 fedora 88.3 KiB kernel-srpm-macros noarch 1.0-23.fc41 fedora 1.9 KiB keyutils-libs x86_64 1.6.3-3.fc40 fedora 54.4 KiB krb5-libs x86_64 1.21.2-5.fc40 fedora 2.3 MiB libacl x86_64 2.3.2-1.fc40 fedora 40.0 KiB libarchive 
x86_64 3.7.2-3.fc41 fedora 914.6 KiB libattr x86_64 2.5.2-3.fc40 fedora 28.5 KiB libblkid x86_64 2.40-13.fc41 fedora 262.5 KiB libbrotli x86_64 1.1.0-3.fc40 fedora 829.5 KiB libcap x86_64 2.69-8.fc41 fedora 219.7 KiB libcap-ng x86_64 0.8.5-1.fc41 fedora 69.1 KiB libcom_err x86_64 1.47.0-5.fc40 fedora 67.2 KiB libcurl x86_64 8.7.1-1.fc41 fedora 793.5 KiB libeconf x86_64 0.6.2-1.fc41 fedora 58.0 KiB libevent x86_64 2.1.12-12.fc40 fedora 895.6 KiB libfdisk x86_64 2.40-13.fc41 fedora 362.9 KiB libffi x86_64 3.4.6-1.fc41 fedora 82.4 KiB libgcc x86_64 14.0.1-0.13.fc41 fedora 270.6 KiB libgomp x86_64 14.0.1-0.13.fc41 fedora 518.9 KiB libidn2 x86_64 2.3.7-1.fc40 fedora 329.1 KiB libmount x86_64 2.40-13.fc41 fedora 351.8 KiB libnghttp2 x86_64 1.61.0-1.fc41 fedora 166.1 KiB libnsl2 x86_64 2.0.1-1.fc40 fedora 57.9 KiB libpkgconf x86_64 2.1.0-1.fc40 fedora 74.2 KiB libpsl x86_64 0.21.5-3.fc40 fedora 80.5 KiB libpwquality x86_64 1.4.5-9.fc40 fedora 417.8 KiB libselinux x86_64 3.6-4.fc40 fedora 173.0 KiB libsemanage x86_64 3.6-3.fc40 fedora 293.5 KiB libsepol x86_64 3.6-3.fc40 fedora 802.0 KiB libsmartcols x86_64 2.40-13.fc41 fedora 180.4 KiB libssh x86_64 0.10.6-6.fc41 fedora 513.3 KiB libssh-config noarch 0.10.6-6.fc41 fedora 277.0 B libstdc++ x86_64 14.0.1-0.13.fc41 fedora 2.8 MiB libtasn1 x86_64 4.19.0-6.fc40 fedora 175.7 KiB libtirpc x86_64 1.3.4-1.rc3.fc41 fedora 202.8 KiB libtool-ltdl x86_64 2.4.7-10.fc40 fedora 66.2 KiB libunistring x86_64 1.1-7.fc41 fedora 1.7 MiB libutempter x86_64 1.2.1-13.fc40 fedora 57.7 KiB libuuid x86_64 2.40-13.fc41 fedora 37.4 KiB libverto x86_64 0.3.2-8.fc40 fedora 29.5 KiB libxcrypt x86_64 4.4.36-5.fc40 fedora 262.8 KiB libxml2 x86_64 2.12.6-1.fc41 fedora 1.7 MiB libzstd x86_64 1.5.6-1.fc41 fedora 787.9 KiB lua-libs x86_64 5.4.6-5.fc40 fedora 281.1 KiB lua-srpm-macros noarch 1-13.fc40 fedora 1.3 KiB lz4-libs x86_64 1.9.4-6.fc40 fedora 129.4 KiB mpfr x86_64 4.2.1-3.fc40 fedora 832.0 KiB ncurses-base noarch 6.4-12.20240127.fc40 fedora 326.2 KiB ncurses-libs x86_64 6.4-12.20240127.fc40 fedora 963.2 KiB ocaml-srpm-macros noarch 9-3.fc40 fedora 1.9 KiB openblas-srpm-macros noarch 2-17.fc41 fedora 112.0 B openldap x86_64 2.6.7-1.fc40 fedora 635.1 KiB openssl-libs x86_64 1:3.2.1-6.fc41 fedora 7.8 MiB p11-kit x86_64 0.25.3-4.fc40 fedora 2.2 MiB p11-kit-trust x86_64 0.25.3-4.fc40 fedora 391.4 KiB package-notes-srpm-macros noarch 0.5-11.fc40 fedora 1.6 KiB pam x86_64 1.6.1-1.fc41 fedora 1.8 MiB pam-libs x86_64 1.6.1-1.fc41 fedora 135.0 KiB pcre2 x86_64 10.43-1.fc41 fedora 653.5 KiB pcre2-syntax noarch 10.43-1.fc41 fedora 249.0 KiB perl-srpm-macros noarch 1-53.fc40 fedora 861.0 B pkgconf x86_64 2.1.0-1.fc40 fedora 82.4 KiB pkgconf-m4 noarch 2.1.0-1.fc40 fedora 13.9 KiB pkgconf-pkg-config x86_64 2.1.0-1.fc40 fedora 989.0 B popt x86_64 1.19-6.fc40 fedora 136.9 KiB publicsuffix-list-dafsa noarch 20240107-3.fc40 fedora 67.5 KiB pyproject-srpm-macros noarch 1.12.0-1.fc40 fedora 1.5 KiB python-srpm-macros noarch 3.12-9.fc41 fedora 50.5 KiB qt5-srpm-macros noarch 5.15.13-1.fc41 fedora 492.0 B qt6-srpm-macros noarch 6.7.0-1.fc41 fedora 456.0 B readline x86_64 8.2-8.fc40 fedora 489.2 KiB rpm x86_64 4.19.1.1-1.fc40 fedora 3.0 MiB rpm-build-libs x86_64 4.19.1.1-1.fc40 fedora 198.4 KiB rpm-libs x86_64 4.19.1.1-1.fc40 fedora 709.9 KiB rpm-sequoia x86_64 1.6.0-2.fc40 fedora 2.2 MiB rust-srpm-macros noarch 26.2-1.fc41 fedora 4.8 KiB setup noarch 2.14.5-2.fc40 fedora 720.4 KiB sqlite-libs x86_64 3.45.2-1.fc41 fedora 1.4 MiB systemd-libs x86_64 255.4-1.fc41 fedora 1.9 MiB util-linux-core x86_64 
2.40-13.fc41 fedora 1.5 MiB xxhash-libs x86_64 0.8.2-2.fc40 fedora 88.5 KiB xz-libs x86_64 1:5.4.6-3.fc41 fedora 209.8 KiB zig-srpm-macros noarch 1-2.fc40 fedora 1.1 KiB zip x86_64 3.0-40.fc40 fedora 703.2 KiB zlib-ng-compat x86_64 2.1.6-2.fc40 fedora 134.0 KiB zstd x86_64 1.5.6-1.fc41 fedora 1.7 MiB Installing groups: Buildsystem building group Transaction Summary: Installing: 153 packages Total size of inbound packages is 53 MiB. Need to download 53 MiB. After this operation 179 MiB will be used (install 179 MiB, remove 0 B). [ 1/153] bzip2-0:1.0.8-18.fc40.x86_64 100% | 4.3 MiB/s | 52.4 KiB | 00m00s [ 2/153] coreutils-0:9.5-1.fc41.x86_64 100% | 73.1 MiB/s | 1.1 MiB | 00m00s [ 3/153] cpio-0:2.15-1.fc40.x86_64 100% | 57.1 MiB/s | 292.2 KiB | 00m00s [ 4/153] bash-0:5.2.26-3.fc40.x86_64 100% | 86.0 MiB/s | 1.8 MiB | 00m00s [ 5/153] diffutils-0:3.10-5.fc40.x86_6 100% | 66.0 MiB/s | 405.5 KiB | 00m00s [ 6/153] fedora-release-common-0:41-0. 100% | 5.2 MiB/s | 21.2 KiB | 00m00s [ 7/153] glibc-minimal-langpack-0:2.39 100% | 34.6 MiB/s | 106.2 KiB | 00m00s [ 8/153] findutils-1:4.9.0-8.fc40.x86_ 100% | 68.6 MiB/s | 491.9 KiB | 00m00s [ 9/153] grep-0:3.11-7.fc40.x86_64 100% | 48.9 MiB/s | 300.2 KiB | 00m00s [ 10/153] gzip-0:1.13-1.fc40.x86_64 100% | 27.8 MiB/s | 170.6 KiB | 00m00s [ 11/153] patch-0:2.7.6-24.fc40.x86_64 100% | 31.9 MiB/s | 130.7 KiB | 00m00s [ 12/153] info-0:7.1-2.fc40.x86_64 100% | 22.3 MiB/s | 182.3 KiB | 00m00s [ 13/153] redhat-rpm-config-0:287-1.fc4 100% | 11.6 MiB/s | 83.2 KiB | 00m00s [ 14/153] sed-0:4.9-1.fc40.x86_64 100% | 51.8 MiB/s | 318.2 KiB | 00m00s [ 15/153] rpm-build-0:4.19.1.1-1.fc40.x 100% | 8.5 MiB/s | 78.2 KiB | 00m00s [ 16/153] tar-2:1.35-3.fc40.x86_64 100% | 119.5 MiB/s | 856.6 KiB | 00m00s [ 17/153] shadow-utils-2:4.15.1-2.fc41. 
100% | 88.2 MiB/s | 1.3 MiB | 00m00s [ 18/153] unzip-0:6.0-63.fc40.x86_64 100% | 16.4 MiB/s | 184.5 KiB | 00m00s [ 19/153] which-0:2.21-41.fc40.x86_64 100% | 4.0 MiB/s | 41.4 KiB | 00m00s [ 20/153] xz-1:5.4.6-3.fc41.x86_64 100% | 28.7 MiB/s | 557.5 KiB | 00m00s [ 21/153] util-linux-0:2.40-13.fc41.x86 100% | 63.2 MiB/s | 1.2 MiB | 00m00s [ 22/153] gawk-0:5.3.0-3.fc40.x86_64 100% | 36.9 MiB/s | 1.1 MiB | 00m00s [ 23/153] filesystem-0:3.18-8.fc40.x86_ 100% | 54.3 MiB/s | 1.1 MiB | 00m00s [ 24/153] ncurses-libs-0:6.4-12.2024012 100% | 23.2 MiB/s | 332.5 KiB | 00m00s [ 25/153] bzip2-libs-0:1.0.8-18.fc40.x8 100% | 2.7 MiB/s | 40.9 KiB | 00m00s [ 26/153] glibc-0:2.39.9000-10.fc41.x86 100% | 63.8 MiB/s | 2.2 MiB | 00m00s [ 27/153] gmp-1:6.3.0-1.fc41.x86_64 100% | 38.7 MiB/s | 316.8 KiB | 00m00s [ 28/153] libacl-0:2.3.2-1.fc40.x86_64 100% | 4.8 MiB/s | 24.4 KiB | 00m00s [ 29/153] libattr-0:2.5.2-3.fc40.x86_64 100% | 8.8 MiB/s | 18.0 KiB | 00m00s [ 30/153] libcap-0:2.69-8.fc41.x86_64 100% | 27.8 MiB/s | 85.5 KiB | 00m00s [ 31/153] coreutils-common-0:9.5-1.fc41 100% | 70.7 MiB/s | 2.1 MiB | 00m00s [ 32/153] libselinux-0:3.6-4.fc40.x86_6 100% | 10.7 MiB/s | 87.5 KiB | 00m00s [ 33/153] fedora-repos-0:41-0.1.noarch 100% | 4.6 MiB/s | 9.3 KiB | 00m00s [ 34/153] pcre2-0:10.43-1.fc41.x86_64 100% | 118.1 MiB/s | 241.9 KiB | 00m00s [ 35/153] glibc-common-0:2.39.9000-10.f 100% | 63.9 MiB/s | 392.5 KiB | 00m00s [ 36/153] openssl-libs-1:3.2.1-6.fc41.x 100% | 153.6 MiB/s | 2.3 MiB | 00m00s [ 37/153] ed-0:1.20.1-1.fc41.x86_64 100% | 16.0 MiB/s | 81.7 KiB | 00m00s [ 38/153] ansible-srpm-macros-0:1-14.fc 100% | 3.4 MiB/s | 20.8 KiB | 00m00s [ 39/153] dwz-0:0.15-6.fc40.x86_64 100% | 44.9 MiB/s | 137.8 KiB | 00m00s [ 40/153] efi-srpm-macros-0:5-11.fc40.n 100% | 5.4 MiB/s | 22.3 KiB | 00m00s [ 41/153] file-0:5.45-5.fc41.x86_64 100% | 12.0 MiB/s | 49.1 KiB | 00m00s [ 42/153] forge-srpm-macros-0:0.3.1-1.f 100% | 9.5 MiB/s | 19.4 KiB | 00m00s [ 43/153] fpc-srpm-macros-0:1.3-12.fc40 100% | 7.6 MiB/s | 7.8 KiB | 00m00s [ 44/153] fonts-srpm-macros-1:2.0.5-14. 100% | 5.2 MiB/s | 26.5 KiB | 00m00s [ 45/153] ghc-srpm-macros-0:1.9.1-1.fc4 100% | 8.8 MiB/s | 9.0 KiB | 00m00s [ 46/153] gnat-srpm-macros-0:6-5.fc40.n 100% | 2.9 MiB/s | 8.8 KiB | 00m00s [ 47/153] kernel-srpm-macros-0:1.0-23.f 100% | 4.8 MiB/s | 9.8 KiB | 00m00s [ 48/153] go-srpm-macros-0:3.5.0-1.fc41 100% | 6.7 MiB/s | 27.5 KiB | 00m00s [ 49/153] lua-srpm-macros-0:1-13.fc40.n 100% | 2.8 MiB/s | 8.7 KiB | 00m00s [ 50/153] openblas-srpm-macros-0:2-17.f 100% | 2.5 MiB/s | 7.7 KiB | 00m00s [ 51/153] package-notes-srpm-macros-0:0 100% | 2.4 MiB/s | 9.9 KiB | 00m00s [ 52/153] perl-srpm-macros-0:1-53.fc40. 100% | 2.0 MiB/s | 8.4 KiB | 00m00s [ 53/153] ocaml-srpm-macros-0:9-3.fc40. 100% | 1.1 MiB/s | 9.1 KiB | 00m00s [ 54/153] python-srpm-macros-0:3.12-9.f 100% | 4.7 MiB/s | 24.0 KiB | 00m00s [ 55/153] pyproject-srpm-macros-0:1.12. 
100% | 1.9 MiB/s | 13.6 KiB | 00m00s [ 56/153] qt5-srpm-macros-0:5.15.13-1.f 100% | 1.2 MiB/s | 8.5 KiB | 00m00s [ 57/153] qt6-srpm-macros-0:6.7.0-1.fc4 100% | 2.9 MiB/s | 9.0 KiB | 00m00s [ 58/153] rust-srpm-macros-0:26.2-1.fc4 100% | 6.1 MiB/s | 12.6 KiB | 00m00s [ 59/153] rpm-0:4.19.1.1-1.fc40.x86_64 100% | 105.5 MiB/s | 540.1 KiB | 00m00s [ 60/153] zig-srpm-macros-0:1-2.fc40.no 100% | 2.0 MiB/s | 8.0 KiB | 00m00s [ 61/153] debugedit-0:5.0-14.fc40.x86_6 100% | 38.4 MiB/s | 78.7 KiB | 00m00s [ 62/153] zip-0:3.0-40.fc40.x86_64 100% | 43.1 MiB/s | 264.8 KiB | 00m00s [ 63/153] elfutils-0:0.191-5.fc41.x86_6 100% | 103.7 MiB/s | 530.7 KiB | 00m00s [ 64/153] elfutils-libelf-0:0.191-5.fc4 100% | 40.8 MiB/s | 208.7 KiB | 00m00s [ 65/153] popt-0:1.19-6.fc40.x86_64 100% | 16.3 MiB/s | 66.7 KiB | 00m00s [ 66/153] readline-0:8.2-8.fc40.x86_64 100% | 69.4 MiB/s | 213.3 KiB | 00m00s [ 67/153] rpm-build-libs-0:4.19.1.1-1.f 100% | 30.9 MiB/s | 95.0 KiB | 00m00s [ 68/153] rpm-libs-0:4.19.1.1-1.fc40.x8 100% | 60.3 MiB/s | 308.9 KiB | 00m00s [ 69/153] zstd-0:1.5.6-1.fc41.x86_64 100% | 93.6 MiB/s | 479.3 KiB | 00m00s [ 70/153] libeconf-0:0.6.2-1.fc41.x86_6 100% | 6.2 MiB/s | 31.9 KiB | 00m00s [ 71/153] libsemanage-0:3.6-3.fc40.x86_ 100% | 28.4 MiB/s | 116.4 KiB | 00m00s [ 72/153] audit-libs-0:4.0.1-1.fc41.x86 100% | 13.6 MiB/s | 125.6 KiB | 00m00s [ 73/153] pam-libs-0:1.6.1-1.fc41.x86_6 100% | 18.5 MiB/s | 56.9 KiB | 00m00s [ 74/153] libxcrypt-0:4.4.36-5.fc40.x86 100% | 14.4 MiB/s | 118.1 KiB | 00m00s [ 75/153] setup-0:2.14.5-2.fc40.noarch 100% | 21.6 MiB/s | 154.7 KiB | 00m00s [ 76/153] xz-libs-1:5.4.6-3.fc41.x86_64 100% | 13.5 MiB/s | 110.2 KiB | 00m00s [ 77/153] libblkid-0:2.40-13.fc41.x86_6 100% | 24.3 MiB/s | 124.3 KiB | 00m00s [ 78/153] libcap-ng-0:0.8.5-1.fc41.x86_ 100% | 10.5 MiB/s | 32.3 KiB | 00m00s [ 79/153] mpfr-0:4.2.1-3.fc40.x86_64 100% | 34.1 MiB/s | 349.0 KiB | 00m00s [ 80/153] libfdisk-0:2.40-13.fc41.x86_6 100% | 38.9 MiB/s | 159.3 KiB | 00m00s [ 81/153] libmount-0:2.40-13.fc41.x86_6 100% | 30.2 MiB/s | 154.7 KiB | 00m00s [ 82/153] libsmartcols-0:2.40-13.fc41.x 100% | 27.1 MiB/s | 83.3 KiB | 00m00s [ 83/153] libuuid-0:2.40-13.fc41.x86_64 100% | 9.3 MiB/s | 28.4 KiB | 00m00s [ 84/153] systemd-libs-0:255.4-1.fc41.x 100% | 138.2 MiB/s | 707.8 KiB | 00m00s [ 85/153] libutempter-0:1.2.1-13.fc40.x 100% | 2.9 MiB/s | 26.4 KiB | 00m00s [ 86/153] zlib-ng-compat-0:2.1.6-2.fc40 100% | 12.5 MiB/s | 77.1 KiB | 00m00s [ 87/153] util-linux-core-0:2.40-13.fc4 100% | 52.3 MiB/s | 536.0 KiB | 00m00s [ 88/153] glibc-gconv-extra-0:2.39.9000 100% | 152.7 MiB/s | 1.7 MiB | 00m00s [ 89/153] libgcc-0:14.0.1-0.13.fc41.x86 100% | 20.0 MiB/s | 122.8 KiB | 00m00s [ 90/153] basesystem-0:11-20.fc40.noarc 100% | 1.0 MiB/s | 7.2 KiB | 00m00s [ 91/153] ncurses-base-0:6.4-12.2024012 100% | 17.4 MiB/s | 88.9 KiB | 00m00s [ 92/153] libsepol-0:3.6-3.fc40.x86_64 100% | 47.4 MiB/s | 340.1 KiB | 00m00s [ 93/153] ca-certificates-0:2023.2.62_v 100% | 105.2 MiB/s | 862.1 KiB | 00m00s [ 94/153] crypto-policies-0:20240320-1. 
100% | 14.8 MiB/s | 90.8 KiB | 00m00s [ 95/153] fedora-repos-rawhide-0:41-0.1 100% | 4.4 MiB/s | 8.9 KiB | 00m00s [ 96/153] fedora-gpg-keys-0:41-0.1.noar 100% | 32.2 MiB/s | 131.8 KiB | 00m00s [ 97/153] pcre2-syntax-0:10.43-1.fc41.n 100% | 48.4 MiB/s | 148.8 KiB | 00m00s [ 98/153] curl-0:8.7.1-1.fc41.x86_64 100% | 42.7 MiB/s | 305.9 KiB | 00m00s [ 99/153] libarchive-0:3.7.2-3.fc41.x86 100% | 56.7 MiB/s | 406.6 KiB | 00m00s [100/153] file-libs-0:5.45-5.fc41.x86_6 100% | 74.5 MiB/s | 763.0 KiB | 00m00s [101/153] elfutils-debuginfod-client-0: 100% | 9.4 MiB/s | 38.3 KiB | 00m00s [102/153] elfutils-libs-0:0.191-5.fc41. 100% | 36.1 MiB/s | 258.5 KiB | 00m00s [103/153] libstdc++-0:14.0.1-0.13.fc41. 100% | 95.6 MiB/s | 880.8 KiB | 00m00s [104/153] libzstd-0:1.5.6-1.fc41.x86_64 100% | 30.2 MiB/s | 308.9 KiB | 00m00s [105/153] libgomp-0:14.0.1-0.13.fc41.x8 100% | 37.3 MiB/s | 343.5 KiB | 00m00s [106/153] lua-libs-0:5.4.6-5.fc40.x86_6 100% | 18.4 MiB/s | 131.9 KiB | 00m00s [107/153] sqlite-libs-0:3.45.2-1.fc41.x 100% | 114.9 MiB/s | 705.7 KiB | 00m00s [108/153] lz4-libs-0:1.9.4-6.fc40.x86_6 100% | 13.1 MiB/s | 67.2 KiB | 00m00s [109/153] rpm-sequoia-0:1.6.0-2.fc40.x8 100% | 82.8 MiB/s | 847.5 KiB | 00m00s [110/153] elfutils-default-yama-scope-0 100% | 4.4 MiB/s | 13.4 KiB | 00m00s [111/153] authselect-libs-0:1.5.0-5.fc4 100% | 26.7 MiB/s | 218.6 KiB | 00m00s [112/153] libxml2-0:2.12.6-1.fc41.x86_6 100% | 55.8 MiB/s | 686.3 KiB | 00m00s [113/153] pam-0:1.6.1-1.fc41.x86_64 100% | 54.1 MiB/s | 553.5 KiB | 00m00s [114/153] authselect-0:1.5.0-5.fc41.x86 100% | 20.4 MiB/s | 146.2 KiB | 00m00s [115/153] gdbm-libs-1:1.23-6.fc40.x86_6 100% | 9.1 MiB/s | 56.2 KiB | 00m00s [116/153] libnsl2-0:2.0.1-1.fc40.x86_64 100% | 4.8 MiB/s | 29.6 KiB | 00m00s [117/153] libpwquality-0:1.4.5-9.fc40.x 100% | 39.0 MiB/s | 119.7 KiB | 00m00s [118/153] libtirpc-0:1.3.4-1.rc3.fc41.x 100% | 22.6 MiB/s | 92.5 KiB | 00m00s [119/153] cracklib-0:2.9.11-5.fc40.x86_ 100% | 22.6 MiB/s | 92.5 KiB | 00m00s [120/153] krb5-libs-0:1.21.2-5.fc40.x86 100% | 123.1 MiB/s | 756.1 KiB | 00m00s [121/153] libcom_err-0:1.47.0-5.fc40.x8 100% | 5.0 MiB/s | 25.4 KiB | 00m00s [122/153] keyutils-libs-0:1.6.3-3.fc40. 100% | 10.2 MiB/s | 31.5 KiB | 00m00s [123/153] libverto-0:0.3.2-8.fc40.x86_6 100% | 6.7 MiB/s | 20.5 KiB | 00m00s [124/153] alternatives-0:1.26-3.fc40.x8 100% | 5.6 MiB/s | 39.9 KiB | 00m00s [125/153] binutils-gold-0:2.42.50-6.fc4 100% | 76.5 MiB/s | 783.2 KiB | 00m00s [126/153] jansson-0:2.13.1-9.fc40.x86_6 100% | 5.4 MiB/s | 44.2 KiB | 00m00s [127/153] pkgconf-pkg-config-0:2.1.0-1. 
100% | 2.4 MiB/s | 9.7 KiB | 00m00s [128/153] pkgconf-0:2.1.0-1.fc40.x86_64 100% | 6.1 MiB/s | 43.5 KiB | 00m00s [129/153] pkgconf-m4-0:2.1.0-1.fc40.noa 100% | 1.5 MiB/s | 13.9 KiB | 00m00s [130/153] libpkgconf-0:2.1.0-1.fc40.x86 100% | 2.8 MiB/s | 37.8 KiB | 00m00s [131/153] gdbm-1:1.23-6.fc40.x86_64 100% | 14.9 MiB/s | 152.5 KiB | 00m00s [132/153] libffi-0:3.4.6-1.fc41.x86_64 100% | 3.6 MiB/s | 40.0 KiB | 00m00s [133/153] p11-kit-0:0.25.3-4.fc40.x86_6 100% | 34.2 MiB/s | 489.8 KiB | 00m00s [134/153] libtasn1-0:4.19.0-6.fc40.x86_ 100% | 7.2 MiB/s | 73.7 KiB | 00m00s [135/153] binutils-0:2.42.50-6.fc41.x86 100% | 88.2 MiB/s | 6.3 MiB | 00m00s [136/153] p11-kit-trust-0:0.25.3-4.fc40 100% | 6.4 MiB/s | 131.5 KiB | 00m00s [137/153] fedora-release-0:41-0.6.noarc 100% | 1.0 MiB/s | 10.7 KiB | 00m00s [138/153] fedora-release-identity-basic 100% | 11.2 MiB/s | 11.5 KiB | 00m00s [139/153] xxhash-libs-0:0.8.2-2.fc40.x8 100% | 12.0 MiB/s | 36.9 KiB | 00m00s [140/153] libcurl-0:8.7.1-1.fc41.x86_64 100% | 49.1 MiB/s | 352.2 KiB | 00m00s [141/153] libbrotli-0:1.1.0-3.fc40.x86_ 100% | 36.7 MiB/s | 338.4 KiB | 00m00s [142/153] libidn2-0:2.3.7-1.fc40.x86_64 100% | 16.6 MiB/s | 118.7 KiB | 00m00s [143/153] libnghttp2-0:1.61.0-1.fc41.x8 100% | 12.4 MiB/s | 76.3 KiB | 00m00s [144/153] gdb-minimal-0:14.2-1.fc41.x86 100% | 158.9 MiB/s | 4.3 MiB | 00m00s [145/153] libssh-0:0.10.6-6.fc41.x86_64 100% | 23.0 MiB/s | 211.5 KiB | 00m00s [146/153] libpsl-0:0.21.5-3.fc40.x86_64 100% | 4.2 MiB/s | 63.9 KiB | 00m00s [147/153] openldap-0:2.6.7-1.fc40.x86_6 100% | 35.5 MiB/s | 254.3 KiB | 00m00s [148/153] libunistring-0:1.1-7.fc41.x86 100% | 25.4 MiB/s | 545.4 KiB | 00m00s [149/153] libssh-config-0:0.10.6-6.fc41 100% | 338.4 KiB/s | 9.1 KiB | 00m00s [150/153] publicsuffix-list-dafsa-0:202 100% | 1.9 MiB/s | 58.1 KiB | 00m00s [151/153] libtool-ltdl-0:2.4.7-10.fc40. 100% | 7.1 MiB/s | 36.2 KiB | 00m00s [152/153] cyrus-sasl-lib-0:2.1.28-19.fc 100% | 40.6 MiB/s | 789.3 KiB | 00m00s [153/153] libevent-0:2.1.12-12.fc40.x86 100% | 31.4 MiB/s | 257.2 KiB | 00m00s -------------------------------------------------------------------------------- [153/153] Total 100% | 77.2 MiB/s | 52.8 MiB | 00m01s Running transaction Importing PGP key 0xE99D6AD1: Userid : "Fedora (41) " Fingerprint: 466CF2D8B60BC3057AA9453ED0622462E99D6AD1 From : file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-41-primary The key was successfully imported. Importing PGP key 0xE99D6AD1: Userid : "Fedora (41) " Fingerprint: 466CF2D8B60BC3057AA9453ED0622462E99D6AD1 From : file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-41-primary The key was successfully imported. Importing PGP key 0xA15B79CC: Userid : "Fedora (40) " Fingerprint: 115DF9AEF857853EE8445D0A0727707EA15B79CC From : file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-40-primary The key was successfully imported. [ 1/155] Verify package files 100% | 725.0 B/s | 153.0 B | 00m00s >>> Running pre-transaction scriptlet: filesystem-0:3.18-8.fc40.x86_64 >>> Stop pre-transaction scriptlet: filesystem-0:3.18-8.fc40.x86_64 [ 2/155] Prepare transaction 100% | 4.7 KiB/s | 153.0 B | 00m00s [ 3/155] Installing libgcc-0:14.0.1-0. 
100% | 265.9 MiB/s | 272.3 KiB | 00m00s >>> Running post-install scriptlet: libgcc-0:14.0.1-0.13.fc41.x86_64 >>> Stop post-install scriptlet: libgcc-0:14.0.1-0.13.fc41.x86_64 [ 4/155] Installing crypto-policies-0: 100% | 47.6 MiB/s | 146.2 KiB | 00m00s >>> Running post-install scriptlet: crypto-policies-0:20240320-1.git58e3d95.fc41 >>> Stop post-install scriptlet: crypto-policies-0:20240320-1.git58e3d95.fc41.no [ 5/155] Installing fedora-release-ide 100% | 0.0 B/s | 952.0 B | 00m00s [ 6/155] Installing fedora-repos-rawhi 100% | 0.0 B/s | 2.4 KiB | 00m00s [ 7/155] Installing fedora-gpg-keys-0: 100% | 55.4 MiB/s | 170.1 KiB | 00m00s [ 8/155] Installing fedora-repos-0:41- 100% | 0.0 B/s | 5.7 KiB | 00m00s [ 9/155] Installing fedora-release-com 100% | 22.7 MiB/s | 23.3 KiB | 00m00s [ 10/155] Installing fedora-release-0:4 100% | 0.0 B/s | 124.0 B | 00m00s [ 11/155] Installing setup-0:2.14.5-2.f 100% | 64.4 MiB/s | 725.8 KiB | 00m00s >>> Running post-install scriptlet: setup-0:2.14.5-2.fc40.noarch >>> Stop post-install scriptlet: setup-0:2.14.5-2.fc40.noarch [ 12/155] Installing filesystem-0:3.18- 100% | 3.8 MiB/s | 212.4 KiB | 00m00s [ 13/155] Installing basesystem-0:11-20 100% | 0.0 B/s | 124.0 B | 00m00s [ 14/155] Installing libssh-config-0:0. 100% | 0.0 B/s | 816.0 B | 00m00s [ 15/155] Installing publicsuffix-list- 100% | 0.0 B/s | 68.3 KiB | 00m00s [ 16/155] Installing pkgconf-m4-0:2.1.0 100% | 0.0 B/s | 14.3 KiB | 00m00s [ 17/155] Installing pcre2-syntax-0:10. 100% | 245.6 MiB/s | 251.5 KiB | 00m00s [ 18/155] Installing ncurses-base-0:6.4 100% | 114.5 MiB/s | 351.6 KiB | 00m00s [ 19/155] Installing glibc-minimal-lang 100% | 0.0 B/s | 124.0 B | 00m00s [ 20/155] Installing ncurses-libs-0:6.4 100% | 236.7 MiB/s | 969.7 KiB | 00m00s >>> Running pre-install scriptlet: glibc-0:2.39.9000-10.fc41.x86_64 >>> Stop pre-install scriptlet: glibc-0:2.39.9000-10.fc41.x86_64 [ 21/155] Installing glibc-0:2.39.9000- 100% | 248.2 MiB/s | 6.7 MiB | 00m00s >>> Running post-install scriptlet: glibc-0:2.39.9000-10.fc41.x86_64 >>> Stop post-install scriptlet: glibc-0:2.39.9000-10.fc41.x86_64 [ 22/155] Installing bash-0:5.2.26-3.fc 100% | 453.4 MiB/s | 8.2 MiB | 00m00s >>> Running post-install scriptlet: bash-0:5.2.26-3.fc40.x86_64 >>> Stop post-install scriptlet: bash-0:5.2.26-3.fc40.x86_64 [ 23/155] Installing glibc-common-0:2.3 100% | 204.2 MiB/s | 1.0 MiB | 00m00s [ 24/155] Installing glibc-gconv-extra- 100% | 291.4 MiB/s | 7.9 MiB | 00m00s >>> Running post-install scriptlet: glibc-gconv-extra-0:2.39.9000-10.fc41.x86_64 >>> Stop post-install scriptlet: glibc-gconv-extra-0:2.39.9000-10.fc41.x86_64 [ 25/155] Installing zlib-ng-compat-0:2 100% | 131.7 MiB/s | 134.8 KiB | 00m00s [ 26/155] Installing xz-libs-1:5.4.6-3. 100% | 206.0 MiB/s | 210.9 KiB | 00m00s [ 27/155] Installing bzip2-libs-0:1.0.8 100% | 79.9 MiB/s | 81.8 KiB | 00m00s [ 28/155] Installing popt-0:1.19-6.fc40 100% | 70.1 MiB/s | 143.5 KiB | 00m00s [ 29/155] Installing readline-0:8.2-8.f 100% | 239.9 MiB/s | 491.4 KiB | 00m00s [ 30/155] Installing libuuid-0:2.40-13. 100% | 37.6 MiB/s | 38.5 KiB | 00m00s [ 31/155] Installing libstdc++-0:14.0.1 100% | 394.6 MiB/s | 2.8 MiB | 00m00s [ 32/155] Installing libzstd-0:1.5.6-1. 100% | 385.3 MiB/s | 789.2 KiB | 00m00s [ 33/155] Installing elfutils-libelf-0: 100% | 389.8 MiB/s | 1.2 MiB | 00m00s [ 34/155] Installing libblkid-0:2.40-13 100% | 257.5 MiB/s | 263.7 KiB | 00m00s [ 35/155] Installing gmp-1:6.3.0-1.fc41 100% | 393.4 MiB/s | 805.6 KiB | 00m00s [ 36/155] Installing libattr-0:2.5.2-3. 
100% | 0.0 B/s | 29.5 KiB | 00m00s [ 37/155] Installing libacl-0:2.3.2-1.f 100% | 0.0 B/s | 40.8 KiB | 00m00s [ 38/155] Installing libxcrypt-0:4.4.36 100% | 259.3 MiB/s | 265.5 KiB | 00m00s [ 39/155] Installing libeconf-0:0.6.2-1 100% | 58.3 MiB/s | 59.6 KiB | 00m00s [ 40/155] Installing lz4-libs-0:1.9.4-6 100% | 127.4 MiB/s | 130.5 KiB | 00m00s [ 41/155] Installing gdbm-libs-1:1.23-6 100% | 120.7 MiB/s | 123.6 KiB | 00m00s [ 42/155] Installing mpfr-0:4.2.1-3.fc4 100% | 407.0 MiB/s | 833.5 KiB | 00m00s [ 43/155] Installing gawk-0:5.3.0-3.fc4 100% | 345.6 MiB/s | 1.7 MiB | 00m00s [ 44/155] Installing dwz-0:0.15-6.fc40. 100% | 285.5 MiB/s | 292.3 KiB | 00m00s [ 45/155] Installing unzip-0:6.0-63.fc4 100% | 377.3 MiB/s | 386.3 KiB | 00m00s [ 46/155] Installing file-libs-0:5.45-5 100% | 763.9 MiB/s | 9.9 MiB | 00m00s [ 47/155] Installing file-0:5.45-5.fc41 100% | 102.6 MiB/s | 105.0 KiB | 00m00s [ 48/155] Installing pcre2-0:10.43-1.fc 100% | 319.8 MiB/s | 654.9 KiB | 00m00s [ 49/155] Installing grep-0:3.11-7.fc40 100% | 250.8 MiB/s | 1.0 MiB | 00m00s [ 50/155] Installing xz-1:5.4.6-3.fc41. 100% | 333.8 MiB/s | 2.0 MiB | 00m00s [ 51/155] Installing libcap-ng-0:0.8.5- 100% | 69.3 MiB/s | 71.0 KiB | 00m00s [ 52/155] Installing audit-libs-0:4.0.1 100% | 321.7 MiB/s | 329.5 KiB | 00m00s [ 53/155] Installing pam-libs-0:1.6.1-1 100% | 134.2 MiB/s | 137.4 KiB | 00m00s [ 54/155] Installing libcap-0:2.69-8.fc 100% | 109.7 MiB/s | 224.7 KiB | 00m00s [ 55/155] Installing systemd-libs-0:255 100% | 386.1 MiB/s | 1.9 MiB | 00m00s [ 56/155] Installing libsmartcols-0:2.4 100% | 177.3 MiB/s | 181.5 KiB | 00m00s [ 57/155] Installing libsepol-0:3.6-3.f 100% | 392.1 MiB/s | 803.0 KiB | 00m00s [ 58/155] Installing libselinux-0:3.6-4 100% | 170.2 MiB/s | 174.3 KiB | 00m00s [ 59/155] Installing sed-0:4.9-1.fc40.x 100% | 283.1 MiB/s | 869.7 KiB | 00m00s [ 60/155] Installing findutils-1:4.9.0- 100% | 366.5 MiB/s | 1.5 MiB | 00m00s [ 61/155] Installing libmount-0:2.40-13 100% | 344.6 MiB/s | 352.9 KiB | 00m00s [ 62/155] Installing lua-libs-0:5.4.6-5 100% | 275.7 MiB/s | 282.3 KiB | 00m00s [ 63/155] Installing libcom_err-0:1.47. 100% | 0.0 B/s | 68.3 KiB | 00m00s [ 64/155] Installing alternatives-0:1.2 100% | 0.0 B/s | 64.0 KiB | 00m00s [ 65/155] Installing jansson-0:2.13.1-9 100% | 87.6 MiB/s | 89.7 KiB | 00m00s [ 66/155] Installing libtasn1-0:4.19.0- 100% | 173.3 MiB/s | 177.5 KiB | 00m00s [ 67/155] Installing libunistring-0:1.1 100% | 346.1 MiB/s | 1.7 MiB | 00m00s [ 68/155] Installing libidn2-0:2.3.7-1. 100% | 163.6 MiB/s | 335.0 KiB | 00m00s [ 69/155] Installing libpsl-0:0.21.5-3. 100% | 79.7 MiB/s | 81.6 KiB | 00m00s [ 70/155] Installing util-linux-core-0: 100% | 296.8 MiB/s | 1.5 MiB | 00m00s [ 71/155] Installing tar-2:1.35-3.fc40. 100% | 421.5 MiB/s | 3.0 MiB | 00m00s [ 72/155] Installing libsemanage-0:3.6- 100% | 144.2 MiB/s | 295.3 KiB | 00m00s [ 73/155] Installing shadow-utils-2:4.1 100% | 181.3 MiB/s | 4.2 MiB | 00m00s >>> Running pre-install scriptlet: libutempter-0:1.2.1-13.fc40.x86_64 >>> Stop pre-install scriptlet: libutempter-0:1.2.1-13.fc40.x86_64 [ 74/155] Installing libutempter-0:1.2. 100% | 58.3 MiB/s | 59.7 KiB | 00m00s [ 75/155] Installing zip-0:3.0-40.fc40. 
100% | 345.3 MiB/s | 707.1 KiB | 00m00s [ 76/155] Installing gdbm-1:1.23-6.fc40 100% | 227.4 MiB/s | 465.8 KiB | 00m00s [ 77/155] Installing cyrus-sasl-lib-0:2 100% | 380.5 MiB/s | 2.3 MiB | 00m00s [ 78/155] Installing zstd-0:1.5.6-1.fc4 100% | 419.0 MiB/s | 1.7 MiB | 00m00s [ 79/155] Installing libfdisk-0:2.40-13 100% | 355.4 MiB/s | 363.9 KiB | 00m00s [ 80/155] Installing bzip2-0:1.0.8-18.f 100% | 93.9 MiB/s | 96.2 KiB | 00m00s [ 81/155] Installing libxml2-0:2.12.6-1 100% | 425.0 MiB/s | 1.7 MiB | 00m00s [ 82/155] Installing sqlite-libs-0:3.45 100% | 351.3 MiB/s | 1.4 MiB | 00m00s [ 83/155] Installing ed-0:1.20.1-1.fc41 100% | 145.3 MiB/s | 148.8 KiB | 00m00s [ 84/155] Installing patch-0:2.7.6-24.f 100% | 258.1 MiB/s | 264.3 KiB | 00m00s [ 85/155] Installing elfutils-default-y 100% | 681.0 KiB/s | 2.0 KiB | 00m00s >>> Running post-install scriptlet: elfutils-default-yama-scope-0:0.191-5.fc41.n >>> Stop post-install scriptlet: elfutils-default-yama-scope-0:0.191-5.fc41.noar [ 86/155] Installing cpio-0:2.15-1.fc40 100% | 274.9 MiB/s | 1.1 MiB | 00m00s [ 87/155] Installing diffutils-0:3.10-5 100% | 317.2 MiB/s | 1.6 MiB | 00m00s [ 88/155] Installing libgomp-0:14.0.1-0 100% | 508.1 MiB/s | 520.3 KiB | 00m00s [ 89/155] Installing keyutils-libs-0:1. 100% | 0.0 B/s | 55.8 KiB | 00m00s [ 90/155] Installing libverto-0:0.3.2-8 100% | 0.0 B/s | 31.3 KiB | 00m00s [ 91/155] Installing libpkgconf-0:2.1.0 100% | 73.6 MiB/s | 75.3 KiB | 00m00s [ 92/155] Installing pkgconf-0:2.1.0-1. 100% | 82.9 MiB/s | 84.9 KiB | 00m00s [ 93/155] Installing pkgconf-pkg-config 100% | 0.0 B/s | 1.8 KiB | 00m00s [ 94/155] Installing libffi-0:3.4.6-1.f 100% | 81.8 MiB/s | 83.8 KiB | 00m00s [ 95/155] Installing p11-kit-0:0.25.3-4 100% | 313.5 MiB/s | 2.2 MiB | 00m00s [ 96/155] Installing p11-kit-trust-0:0. 100% | 64.0 MiB/s | 393.1 KiB | 00m00s >>> Running post-install scriptlet: p11-kit-trust-0:0.25.3-4.fc40.x86_64 >>> Stop post-install scriptlet: p11-kit-trust-0:0.25.3-4.fc40.x86_64 [ 97/155] Installing xxhash-libs-0:0.8. 100% | 87.8 MiB/s | 89.9 KiB | 00m00s [ 98/155] Installing libbrotli-0:1.1.0- 100% | 270.8 MiB/s | 831.8 KiB | 00m00s [ 99/155] Installing libnghttp2-0:1.61. 100% | 163.3 MiB/s | 167.2 KiB | 00m00s [100/155] Installing libtool-ltdl-0:2.4 100% | 0.0 B/s | 67.3 KiB | 00m00s [101/155] Installing rust-srpm-macros-0 100% | 0.0 B/s | 5.6 KiB | 00m00s [102/155] Installing qt6-srpm-macros-0: 100% | 0.0 B/s | 732.0 B | 00m00s [103/155] Installing qt5-srpm-macros-0: 100% | 0.0 B/s | 768.0 B | 00m00s [104/155] Installing perl-srpm-macros-0 100% | 0.0 B/s | 1.1 KiB | 00m00s [105/155] Installing package-notes-srpm 100% | 0.0 B/s | 2.0 KiB | 00m00s [106/155] Installing openblas-srpm-macr 100% | 0.0 B/s | 392.0 B | 00m00s [107/155] Installing ocaml-srpm-macros- 100% | 0.0 B/s | 2.2 KiB | 00m00s [108/155] Installing kernel-srpm-macros 100% | 0.0 B/s | 2.3 KiB | 00m00s [109/155] Installing gnat-srpm-macros-0 100% | 0.0 B/s | 1.3 KiB | 00m00s [110/155] Installing ghc-srpm-macros-0: 100% | 0.0 B/s | 1.0 KiB | 00m00s [111/155] Installing fpc-srpm-macros-0: 100% | 0.0 B/s | 420.0 B | 00m00s [112/155] Installing ansible-srpm-macro 100% | 35.4 MiB/s | 36.2 KiB | 00m00s [113/155] Installing coreutils-common-0 100% | 466.3 MiB/s | 11.2 MiB | 00m00s [114/155] Installing openssl-libs-1:3.2 100% | 487.2 MiB/s | 7.8 MiB | 00m00s [115/155] Installing coreutils-0:9.5-1. 
100% | 349.4 MiB/s | 5.6 MiB | 00m00s >>> Running pre-install scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.fc40.n >>> Stop pre-install scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.fc40.noar [116/155] Installing ca-certificates-0: 100% | 4.5 MiB/s | 2.3 MiB | 00m01s >>> Running post-install scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.fc40. >>> Stop post-install scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.fc40.noa [117/155] Installing krb5-libs-0:1.21.2 100% | 327.8 MiB/s | 2.3 MiB | 00m00s [118/155] Installing libtirpc-0:1.3.4-1 100% | 199.8 MiB/s | 204.6 KiB | 00m00s [119/155] Installing gzip-0:1.13-1.fc40 100% | 190.7 MiB/s | 390.6 KiB | 00m00s [120/155] Installing authselect-libs-0: 100% | 203.4 MiB/s | 833.2 KiB | 00m00s [121/155] Installing libarchive-0:3.7.2 100% | 298.4 MiB/s | 916.6 KiB | 00m00s [122/155] Installing authselect-0:1.5.0 100% | 154.2 MiB/s | 157.9 KiB | 00m00s [123/155] Installing cracklib-0:2.9.11- 100% | 81.5 MiB/s | 250.3 KiB | 00m00s [124/155] Installing libpwquality-0:1.4 100% | 140.0 MiB/s | 430.1 KiB | 00m00s [125/155] Installing libnsl2-0:2.0.1-1. 100% | 57.7 MiB/s | 59.0 KiB | 00m00s [126/155] Installing pam-0:1.6.1-1.fc41 100% | 181.5 MiB/s | 1.8 MiB | 00m00s [127/155] Installing libssh-0:0.10.6-6. 100% | 251.7 MiB/s | 515.4 KiB | 00m00s [128/155] Installing rpm-sequoia-0:1.6. 100% | 445.9 MiB/s | 2.2 MiB | 00m00s [129/155] Installing rpm-libs-0:4.19.1. 100% | 347.4 MiB/s | 711.4 KiB | 00m00s [130/155] Installing libevent-0:2.1.12- 100% | 292.8 MiB/s | 899.4 KiB | 00m00s [131/155] Installing openldap-0:2.6.7-1 100% | 312.0 MiB/s | 638.9 KiB | 00m00s [132/155] Installing libcurl-0:8.7.1-1. 100% | 388.0 MiB/s | 794.6 KiB | 00m00s [133/155] Installing elfutils-libs-0:0. 100% | 316.4 MiB/s | 648.0 KiB | 00m00s [134/155] Installing elfutils-debuginfo 100% | 65.3 MiB/s | 66.9 KiB | 00m00s [135/155] Installing binutils-gold-0:2. 
100% | 203.1 MiB/s | 2.0 MiB | 00m00s >>> Running post-install scriptlet: binutils-gold-0:2.42.50-6.fc41.x86_64 >>> Stop post-install scriptlet: binutils-gold-0:2.42.50-6.fc41.x86_64 [136/155] Installing binutils-0:2.42.50 100% | 439.4 MiB/s | 27.2 MiB | 00m00s >>> Running post-install scriptlet: binutils-0:2.42.50-6.fc41.x86_64 >>> Stop post-install scriptlet: binutils-0:2.42.50-6.fc41.x86_64 [137/155] Installing elfutils-0:0.191-5 100% | 425.4 MiB/s | 2.6 MiB | 00m00s [138/155] Installing gdb-minimal-0:14.2 100% | 452.2 MiB/s | 12.7 MiB | 00m00s [139/155] Installing debugedit-0:5.0-14 100% | 197.0 MiB/s | 201.7 KiB | 00m00s [140/155] Installing rpm-build-libs-0:4 100% | 194.5 MiB/s | 199.2 KiB | 00m00s [141/155] Installing curl-0:8.7.1-1.fc4 100% | 92.8 MiB/s | 760.6 KiB | 00m00s >>> Running pre-install scriptlet: rpm-0:4.19.1.1-1.fc40.x86_64 >>> Stop pre-install scriptlet: rpm-0:4.19.1.1-1.fc40.x86_64 [142/155] Installing rpm-0:4.19.1.1-1.f 100% | 199.7 MiB/s | 2.4 MiB | 00m00s [143/155] Installing efi-srpm-macros-0: 100% | 0.0 B/s | 41.2 KiB | 00m00s [144/155] Installing lua-srpm-macros-0: 100% | 0.0 B/s | 1.9 KiB | 00m00s [145/155] Installing zig-srpm-macros-0: 100% | 0.0 B/s | 1.7 KiB | 00m00s [146/155] Installing fonts-srpm-macros- 100% | 0.0 B/s | 56.5 KiB | 00m00s [147/155] Installing forge-srpm-macros- 100% | 0.0 B/s | 40.3 KiB | 00m00s [148/155] Installing go-srpm-macros-0:3 100% | 60.2 MiB/s | 61.6 KiB | 00m00s [149/155] Installing python-srpm-macros 100% | 50.5 MiB/s | 51.7 KiB | 00m00s [150/155] Installing redhat-rpm-config- 100% | 187.4 MiB/s | 191.9 KiB | 00m00s [151/155] Installing rpm-build-0:4.19.1 100% | 88.8 MiB/s | 182.0 KiB | 00m00s [152/155] Installing pyproject-srpm-mac 100% | 2.0 MiB/s | 2.1 KiB | 00m00s [153/155] Installing util-linux-0:2.40- 100% | 207.6 MiB/s | 3.7 MiB | 00m00s >>> Running post-install scriptlet: util-linux-0:2.40-13.fc41.x86_64 >>> Stop post-install scriptlet: util-linux-0:2.40-13.fc41.x86_64 [154/155] Installing which-0:2.21-41.fc 100% | 80.5 MiB/s | 82.4 KiB | 00m00s [155/155] Installing info-0:7.1-2.fc40. 
100% | 503.0 KiB/s | 358.2 KiB | 00m01s >>> Running post-transaction scriptlet: filesystem-0:3.18-8.fc40.x86_64 >>> Stop post-transaction scriptlet: filesystem-0:3.18-8.fc40.x86_64 >>> Running post-transaction scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.f >>> Stop post-transaction scriptlet: ca-certificates-0:2023.2.62_v7.0.401-6.fc40 >>> Running post-transaction scriptlet: authselect-libs-0:1.5.0-5.fc41.x86_64 >>> Stop post-transaction scriptlet: authselect-libs-0:1.5.0-5.fc41.x86_64 >>> Running post-transaction scriptlet: rpm-0:4.19.1.1-1.fc40.x86_64 >>> Stop post-transaction scriptlet: rpm-0:4.19.1.1-1.fc40.x86_64 >>> Running trigger-install scriptlet: glibc-common-0:2.39.9000-10.fc41.x86_64 >>> Stop trigger-install scriptlet: glibc-common-0:2.39.9000-10.fc41.x86_64 >>> Running trigger-install scriptlet: info-0:7.1-2.fc40.x86_64 >>> Stop trigger-install scriptlet: info-0:7.1-2.fc40.x86_64 Finish: installing minimal buildroot with dnf5 Start: creating root cache Finish: creating root cache Finish: chroot init INFO: Installed packages: INFO: alternatives-1.26-3.fc40.x86_64 ansible-srpm-macros-1-14.fc40.noarch audit-libs-4.0.1-1.fc41.x86_64 authselect-1.5.0-5.fc41.x86_64 authselect-libs-1.5.0-5.fc41.x86_64 basesystem-11-20.fc40.noarch bash-5.2.26-3.fc40.x86_64 binutils-2.42.50-6.fc41.x86_64 binutils-gold-2.42.50-6.fc41.x86_64 bzip2-1.0.8-18.fc40.x86_64 bzip2-libs-1.0.8-18.fc40.x86_64 ca-certificates-2023.2.62_v7.0.401-6.fc40.noarch coreutils-9.5-1.fc41.x86_64 coreutils-common-9.5-1.fc41.x86_64 cpio-2.15-1.fc40.x86_64 cracklib-2.9.11-5.fc40.x86_64 crypto-policies-20240320-1.git58e3d95.fc41.noarch curl-8.7.1-1.fc41.x86_64 cyrus-sasl-lib-2.1.28-19.fc40.x86_64 debugedit-5.0-14.fc40.x86_64 diffutils-3.10-5.fc40.x86_64 dwz-0.15-6.fc40.x86_64 ed-1.20.1-1.fc41.x86_64 efi-srpm-macros-5-11.fc40.noarch elfutils-0.191-5.fc41.x86_64 elfutils-debuginfod-client-0.191-5.fc41.x86_64 elfutils-default-yama-scope-0.191-5.fc41.noarch elfutils-libelf-0.191-5.fc41.x86_64 elfutils-libs-0.191-5.fc41.x86_64 fedora-gpg-keys-41-0.1.noarch fedora-release-41-0.6.noarch fedora-release-common-41-0.6.noarch fedora-release-identity-basic-41-0.6.noarch fedora-repos-41-0.1.noarch fedora-repos-rawhide-41-0.1.noarch file-5.45-5.fc41.x86_64 file-libs-5.45-5.fc41.x86_64 filesystem-3.18-8.fc40.x86_64 findutils-4.9.0-8.fc40.x86_64 fonts-srpm-macros-2.0.5-14.fc40.noarch forge-srpm-macros-0.3.1-1.fc41.noarch fpc-srpm-macros-1.3-12.fc40.noarch gawk-5.3.0-3.fc40.x86_64 gdb-minimal-14.2-1.fc41.x86_64 gdbm-1.23-6.fc40.x86_64 gdbm-libs-1.23-6.fc40.x86_64 ghc-srpm-macros-1.9.1-1.fc41.noarch glibc-2.39.9000-10.fc41.x86_64 glibc-common-2.39.9000-10.fc41.x86_64 glibc-gconv-extra-2.39.9000-10.fc41.x86_64 glibc-minimal-langpack-2.39.9000-10.fc41.x86_64 gmp-6.3.0-1.fc41.x86_64 gnat-srpm-macros-6-5.fc40.noarch go-srpm-macros-3.5.0-1.fc41.noarch gpg-pubkey-a15b79cc-63d04c2c gpg-pubkey-e99d6ad1-64d2612c grep-3.11-7.fc40.x86_64 gzip-1.13-1.fc40.x86_64 info-7.1-2.fc40.x86_64 jansson-2.13.1-9.fc40.x86_64 kernel-srpm-macros-1.0-23.fc41.noarch keyutils-libs-1.6.3-3.fc40.x86_64 krb5-libs-1.21.2-5.fc40.x86_64 libacl-2.3.2-1.fc40.x86_64 libarchive-3.7.2-3.fc41.x86_64 libattr-2.5.2-3.fc40.x86_64 libblkid-2.40-13.fc41.x86_64 libbrotli-1.1.0-3.fc40.x86_64 libcap-2.69-8.fc41.x86_64 libcap-ng-0.8.5-1.fc41.x86_64 libcom_err-1.47.0-5.fc40.x86_64 libcurl-8.7.1-1.fc41.x86_64 libeconf-0.6.2-1.fc41.x86_64 libevent-2.1.12-12.fc40.x86_64 libfdisk-2.40-13.fc41.x86_64 libffi-3.4.6-1.fc41.x86_64 libgcc-14.0.1-0.13.fc41.x86_64 libgomp-14.0.1-0.13.fc41.x86_64 
libidn2-2.3.7-1.fc40.x86_64 libmount-2.40-13.fc41.x86_64 libnghttp2-1.61.0-1.fc41.x86_64 libnsl2-2.0.1-1.fc40.x86_64 libpkgconf-2.1.0-1.fc40.x86_64 libpsl-0.21.5-3.fc40.x86_64 libpwquality-1.4.5-9.fc40.x86_64 libselinux-3.6-4.fc40.x86_64 libsemanage-3.6-3.fc40.x86_64 libsepol-3.6-3.fc40.x86_64 libsmartcols-2.40-13.fc41.x86_64 libssh-0.10.6-6.fc41.x86_64 libssh-config-0.10.6-6.fc41.noarch libstdc++-14.0.1-0.13.fc41.x86_64 libtasn1-4.19.0-6.fc40.x86_64 libtirpc-1.3.4-1.rc3.fc41.x86_64 libtool-ltdl-2.4.7-10.fc40.x86_64 libunistring-1.1-7.fc41.x86_64 libutempter-1.2.1-13.fc40.x86_64 libuuid-2.40-13.fc41.x86_64 libverto-0.3.2-8.fc40.x86_64 libxcrypt-4.4.36-5.fc40.x86_64 libxml2-2.12.6-1.fc41.x86_64 libzstd-1.5.6-1.fc41.x86_64 lua-libs-5.4.6-5.fc40.x86_64 lua-srpm-macros-1-13.fc40.noarch lz4-libs-1.9.4-6.fc40.x86_64 mpfr-4.2.1-3.fc40.x86_64 ncurses-base-6.4-12.20240127.fc40.noarch ncurses-libs-6.4-12.20240127.fc40.x86_64 ocaml-srpm-macros-9-3.fc40.noarch openblas-srpm-macros-2-17.fc41.noarch openldap-2.6.7-1.fc40.x86_64 openssl-libs-3.2.1-6.fc41.x86_64 p11-kit-0.25.3-4.fc40.x86_64 p11-kit-trust-0.25.3-4.fc40.x86_64 package-notes-srpm-macros-0.5-11.fc40.noarch pam-1.6.1-1.fc41.x86_64 pam-libs-1.6.1-1.fc41.x86_64 patch-2.7.6-24.fc40.x86_64 pcre2-10.43-1.fc41.x86_64 pcre2-syntax-10.43-1.fc41.noarch perl-srpm-macros-1-53.fc40.noarch pkgconf-2.1.0-1.fc40.x86_64 pkgconf-m4-2.1.0-1.fc40.noarch pkgconf-pkg-config-2.1.0-1.fc40.x86_64 popt-1.19-6.fc40.x86_64 publicsuffix-list-dafsa-20240107-3.fc40.noarch pyproject-srpm-macros-1.12.0-1.fc40.noarch python-srpm-macros-3.12-9.fc41.noarch qt5-srpm-macros-5.15.13-1.fc41.noarch qt6-srpm-macros-6.7.0-1.fc41.noarch readline-8.2-8.fc40.x86_64 redhat-rpm-config-287-1.fc41.noarch rpm-4.19.1.1-1.fc40.x86_64 rpm-build-4.19.1.1-1.fc40.x86_64 rpm-build-libs-4.19.1.1-1.fc40.x86_64 rpm-libs-4.19.1.1-1.fc40.x86_64 rpm-sequoia-1.6.0-2.fc40.x86_64 rust-srpm-macros-26.2-1.fc41.noarch sed-4.9-1.fc40.x86_64 setup-2.14.5-2.fc40.noarch shadow-utils-4.15.1-2.fc41.x86_64 sqlite-libs-3.45.2-1.fc41.x86_64 systemd-libs-255.4-1.fc41.x86_64 tar-1.35-3.fc40.x86_64 unzip-6.0-63.fc40.x86_64 util-linux-2.40-13.fc41.x86_64 util-linux-core-2.40-13.fc41.x86_64 which-2.21-41.fc40.x86_64 xxhash-libs-0.8.2-2.fc40.x86_64 xz-5.4.6-3.fc41.x86_64 xz-libs-5.4.6-3.fc41.x86_64 zig-srpm-macros-1-2.fc40.noarch zip-3.0-40.fc40.x86_64 zlib-ng-compat-2.1.6-2.fc40.x86_64 zstd-1.5.6-1.fc41.x86_64
Start: buildsrpm
Start: rpmbuild -bs
warning: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Building target platforms: x86_64
Building for target x86_64
setting SOURCE_DATE_EPOCH=1554595200
Wrote: /builddir/build/SRPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.src.rpm

RPM build warnings:
    %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)

Finish: rpmbuild -bs
cp: preserving permissions for ‘/var/lib/copr-rpmbuild/results/chroot_scan/var/lib/mock/fedora-rawhide-x86_64-1712885339.434030/root/var/log’: No such file or directory
INFO: chroot_scan: 1 files copied to /var/lib/copr-rpmbuild/results/chroot_scan
INFO: /var/lib/mock/fedora-rawhide-x86_64-1712885339.434030/root/var/log/dnf5.log
Finish: buildsrpm
INFO: Done(/var/lib/copr-rpmbuild/workspace/workdir-xczt64da/pytorch/pytorch.spec) Config(child) 0 minutes 26 seconds
INFO: Results and/or logs in: /var/lib/copr-rpmbuild/results
INFO: Cleaning up build root ('cleanup_on_success=True')
Start: clean chroot
INFO: unmounting tmpfs.
Finish: clean chroot
INFO: Start(/var/lib/copr-rpmbuild/results/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.src.rpm) Config(fedora-rawhide-x86_64)
Start(bootstrap): chroot init
INFO: mounting tmpfs at /var/lib/mock/fedora-rawhide-x86_64-bootstrap-1712885339.434030/root.
INFO: reusing tmpfs at /var/lib/mock/fedora-rawhide-x86_64-bootstrap-1712885339.434030/root.
INFO: calling preinit hooks
INFO: enabled root cache
INFO: enabled package manager cache
Start(bootstrap): cleaning package manager metadata
Finish(bootstrap): cleaning package manager metadata
Finish(bootstrap): chroot init
Start: chroot init
INFO: mounting tmpfs at /var/lib/mock/fedora-rawhide-x86_64-1712885339.434030/root.
INFO: calling preinit hooks
INFO: enabled root cache
Start: unpacking root cache
Finish: unpacking root cache
INFO: enabled package manager cache
Start: cleaning package manager metadata
Finish: cleaning package manager metadata
INFO: enabled HW Info plugin
INFO: Buildroot is handled by package management downloaded with a bootstrap image:
  rpm-4.19.1.1-1.fc40.x86_64 rpm-sequoia-1.6.0-2.fc40.x86_64 python3-dnf-4.19.2-1.fc41.noarch yum-4.19.2-1.fc41.noarch dnf5-5.1.17-1.fc41.x86_64 dnf5-plugins-5.1.17-1.fc41.x86_64
Finish: chroot init
Start: build phase for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.src.rpm
Start: build setup for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.src.rpm
warning: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
Building target platforms: x86_64
Building for target x86_64
setting SOURCE_DATE_EPOCH=1554595200
Wrote: /builddir/build/SRPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.src.rpm

RPM build warnings:
    %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)

Updating and loading repositories:
 Additional repo http_developer_downloa 100% | 129.0 KiB/s |   3.5 KiB | 00m00s
 Additional repo copr_rezso_CUDA        100% | 101.3 KiB/s |   1.8 KiB | 00m00s
 Additional repo http_developer_downloa 100% | 580.9 KiB/s |   3.5 KiB | 00m00s
 fedora                                 100% |  41.6 KiB/s |   8.2 KiB | 00m00s
 Additional repo http_developer_downloa 100% |   1.1 MiB/s |   3.5 KiB | 00m00s
 Copr repository                        100% | 152.5 KiB/s |   1.8 KiB | 00m00s
Repositories loaded.
Package Arch Version Repository Size Installing: asmjit-devel x86_64 1:0-20220702.1.gitc5984762.fc40 copr_base 1.5 MiB cpuinfo-devel x86_64 1:0-20240327.0.gitf42f5eaf.fc41 copr_base 79.6 KiB cuda-cudart-devel-12-3 x86_64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 6.6 MiB cuda-cupti-12-3 x86_64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 108.2 MiB cuda-driver-devel-12-3 x86_64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 125.1 KiB cuda-gcc-12-c++ x86_64 12.3.1-1.fc39 copr_base 60.3 MiB cuda-nvcc-12-3 x86_64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 194.8 MiB cuda-nvml-devel-12-3 x86_64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 664.9 KiB cuda-nvrtc-devel-12-3 x86_64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 78.1 MiB cuda-nvtx-12-3 x86_64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 404.7 KiB cuda-profiler-api-12-3 x86_64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 71.4 KiB cutlass-devel x86_64 3.4.1-20240215.0.cu12_3.fc41 copr_base 12.1 MiB doxygen x86_64 2:1.10.0-3.fc40 fedora 18.1 MiB eigen3-devel noarch 3.4.0-15.fc40 fedora 8.4 MiB fbgemm-devel x86_64 0.7.0-20240315.0.git0049a2ca.fc41 copr_base 304.4 KiB fftw-devel x86_64 3.3.10-12.fc41 fedora 284.2 KiB flatbuffers-compiler x86_64 24.3.25-1.fc41 fedora 3.0 MiB flatbuffers-devel x86_64 24.3.25-1.fc41 fedora 469.5 KiB foxi-devel x86_64 1.4.1^git20210526.c278588-2.fc41 fedora 119.6 KiB fp16-devel x86_64 1:0-20240410.0.git581ac1c7.fc41 copr_base 30.4 KiB fxdiv-devel noarch 1:0-20201208.1.git63058eff.fc40 copr_base 16.9 KiB gcc-c++ x86_64 14.0.1-0.13.fc41 fedora 38.1 MiB gemmlowp-devel noarch 0-20231104.0.git16e8662c.fc40 copr_base 2.3 MiB gflags-devel x86_64 2.2.2-14.fc40 fedora 62.3 KiB git x86_64 2.44.0-1.fc41 fedora 85.2 KiB glog-devel x86_64 0.3.5-20.fc40 fedora 112.0 KiB gloo-devel x86_64 1:0.5.0-20240302.0.git2565674c.cu12_3.fc41 copr_base 328.5 KiB gmp-devel x86_64 1:6.3.0-1.fc41 fedora 352.3 KiB hiredis-devel x86_64 1.0.2-7.fc40 fedora 118.4 KiB kineto-devel x86_64 0.4.0-20240327.0.git445909a8.cu12_3.fc41 copr_base 49.6 KiB leveldb-devel x86_64 1.23-9.fc40 fedora 137.6 KiB libcublas-devel-12-3 x86_64 12.3.4.1-2 copr_rezso_CUDA 1.1 MiB libcudnn8-devel x86_64 8.9.7.29-2.cuda12.3 copr_rezso_CUDA 199.2 KiB libcufft-devel-12-3 x86_64 11.0.12.1-2 copr_rezso_CUDA 130.2 KiB libcurand-devel-12-3 x86_64 10.3.4.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 93.9 MiB libcusolver-devel-12-3 x86_64 11.5.4.101-2 copr_rezso_CUDA 445.1 KiB libcusparse-devel-12-3 x86_64 12.2.0.103-2 copr_rezso_CUDA 255.3 MiB libnccl-devel x86_64 2.21.5-1+cuda12.4 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 45.3 KiB libnvjitlink-devel-12-3 x86_64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 60.7 MiB libuv-devel x86_64 1:1.48.0-1.fc40 fedora 206.1 KiB libzstd-devel x86_64 1.5.6-1.fc41 fedora 202.4 KiB lmdb-devel x86_64 0.9.32-1.fc40 fedora 72.5 KiB magma-devel x86_64 2.8.0-20240328.0.cu12_3.fc41 copr_base 21.8 MiB mesa-libGLU-devel x86_64 9.0.3-4.fc40 fedora 17.0 KiB miniz-devel x86_64 3.0.2-5.fc40 fedora 102.7 KiB mpfr-devel x86_64 4.2.1-3.fc40 fedora 62.8 KiB neon2sse-devel noarch 0-20230131.0.git097a5eca.fc38 copr_base 802.0 KiB nnpack-devel x86_64 0-20230201.0.git70a77f48.fc38 copr_base 42.7 KiB numactl-devel 
x86_64 2.0.16-5.fc40 fedora 25.9 KiB ocl-icd-devel x86_64 2.3.2-5.fc40 fedora 239.4 KiB onnx-devel x86_64 1.17.0-20240403.0.gitfa0b8999.fc41 copr_base 1.0 MiB onnx-optimizer-devel x86_64 0.3.19-20240303.0.gitb3a46118.fc41 copr_base 193.4 KiB openblas-devel x86_64 0.3.26-4.fc40 fedora 1.7 MiB openblas-openmp x86_64 0.3.26-4.fc40 fedora 38.9 MiB opencv-devel x86_64 4.9.0-20231227.1.cu12_3.fc40 copr_base 10.8 MiB peachpy-python3 noarch 0-20221113.1.git349e8f83.fc39 copr_base 13.2 MiB protobuf-compat-compiler x86_64 3.21.9-2.fc39 copr_base 3.1 MiB protobuf-compat-devel x86_64 3.21.9-2.fc39 copr_base 2.7 MiB psimd-devel noarch 1:0-20200517.2.git072586a7.fc40 copr_base 45.6 KiB pthreadpool-devel x86_64 1:0.1-20240121.0.git178e3e06.fc40 copr_base 100.5 KiB pybind11-devel x86_64 2.12.0-1.fc41 fedora 825.8 KiB python3-devel x86_64 3.12.2-3.fc41 fedora 1.2 MiB python3-numpy x86_64 1:1.26.4-2.fc41 fedora 43.9 MiB python3-pybind11 x86_64 2.12.0-1.fc41 fedora 873.8 KiB python3-pyyaml x86_64 6.0.1-14.fc40 fedora 786.4 KiB python3-setuptools noarch 69.2.0-1.fc41 fedora 7.2 MiB python3-six noarch 1.16.0-14.fc40 fedora 117.7 KiB python3-typing-extensions noarch 4.11.0-1.fc41 fedora 420.1 KiB qnnpack-devel x86_64 0-20190828.2.git7d2a4e99.fc38 copr_base 17.9 KiB rdma-core-devel x86_64 51.0-2.fc41 fedora 618.1 KiB rocksdb-devel x86_64 8.10.0-3.fc40 fedora 1.4 MiB sleef-devel x86_64 3.6-20240320.0.git60e76d2b.fc41 copr_base 278.2 KiB snappy-devel x86_64 1.1.10-4.fc40 fedora 45.2 KiB tbb-devel x86_64 2021.11.0-5.fc40 fedora 1.3 MiB tensorpipe-devel x86_64 0-20220513.1.gitbb1473a4.fc37 copr_base 489.8 KiB zeromq-devel x86_64 4.3.5-16.fc40 fedora 30.5 KiB Installing dependencies: MUMPS x86_64 5.6.2-4.fc41 fedora 9.5 MiB MUMPS-common noarch 5.6.2-4.fc41 fedora 948.0 KiB SuperLU x86_64 6.0.1-5.fc41 fedora 474.3 KiB abattis-cantarell-vf-fonts noarch 0.301-12.fc40 fedora 192.7 KiB adobe-mappings-cmap noarch 20230622-3.fc40 fedora 14.4 MiB adobe-mappings-cmap-deprecated noarch 20230622-3.fc40 fedora 582.1 KiB adobe-mappings-pdf noarch 20190401-7.fc40 fedora 4.4 MiB alsa-lib x86_64 1.2.11-2.fc40 fedora 1.4 MiB annobin-docs noarch 12.48-1.fc41 fedora 95.7 KiB annobin-plugin-gcc x86_64 12.48-1.fc41 fedora 970.4 KiB armadillo x86_64 12.8.1-1.fc41 fedora 90.3 KiB arpack x86_64 3.9.1-3.fc40 fedora 646.0 KiB asl x86_64 20240106-1.20240201git2f5d9de.fc41 fedora 2.2 MiB asmjit x86_64 1:0-20220702.1.gitc5984762.fc40 copr_base 433.3 KiB avahi-libs x86_64 0.8-26.fc40 fedora 166.3 KiB blosc x86_64 1.21.5-4.fc40 fedora 122.1 KiB cairo x86_64 1.18.0-3.fc40 fedora 1.7 MiB cairo-gobject x86_64 1.18.0-3.fc40 fedora 35.2 KiB cdparanoia-libs x86_64 10.2-44.fc40 fedora 113.7 KiB ceres-solver x86_64 2.2.0-4.fc40 fedora 5.2 MiB cfitsio x86_64 4.4.0-2.fc41 fedora 1.8 MiB cgnslib-libs x86_64 4.4.0-4.fc40 fedora 802.5 KiB cjson x86_64 1.7.17-1.fc41 fedora 64.0 KiB cliquer-libs x86_64 1.22-8.fc40 fedora 67.7 KiB cmake x86_64 3.28.3-1.fc41 fedora 31.5 MiB cmake-data noarch 3.28.3-1.fc41 fedora 8.0 MiB cmake-filesystem x86_64 3.28.3-1.fc41 fedora 0.0 B cmake-rpm-macros noarch 3.28.3-1.fc41 fedora 7.5 KiB codec2 x86_64 1.2.0-4.fc40 fedora 1.3 MiB coin-or-Cbc x86_64 2.10.11-2.fc41 fedora 2.4 MiB coin-or-Cgl x86_64 0.60.8-1.fc41 fedora 1.0 MiB coin-or-Clp x86_64 1.17.9-1.fc41 fedora 2.5 MiB coin-or-CoinUtils x86_64 2.11.10-1.fc41 fedora 1.2 MiB coin-or-Osi x86_64 0.108.9-2.fc41 fedora 5.7 MiB cpp x86_64 14.0.1-0.13.fc41 fedora 34.9 MiB cpuinfo x86_64 1:0-20240327.0.gitf42f5eaf.fc41 copr_base 125.7 KiB crypto-policies-scripts noarch 
20240320-1.git58e3d95.fc41 fedora 323.2 KiB cuda-cccl-12-3 x86_64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 13.8 MiB cuda-crt-12-3 x86_64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 1.0 MiB cuda-cudart-12-3 x86_64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 747.4 KiB cuda-gcc-12 x86_64 12.3.1-1.fc39 copr_base 114.6 MiB cuda-nvrtc-12-3 x86_64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 64.4 MiB cuda-nvvm-12-3 x86_64 12.3.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 63.1 MiB cuda-toolkit-12-3-config-common noarch 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 0.0 B cuda-toolkit-12-config-common noarch 12.4.127-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 42.0 B cuda-toolkit-config-common noarch 12.4.127-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_sbsa 39.0 B cups-libs x86_64 1:2.4.7-13.fc41 fedora 618.8 KiB cutlass x86_64 3.4.1-20240215.0.cu12_3.fc41 copr_base 1.0 GiB dbus x86_64 1:1.14.10-3.fc40 fedora 0.0 B dbus-broker x86_64 35-4.fc40 fedora 378.7 KiB dbus-common noarch 1:1.14.10-3.fc40 fedora 11.2 KiB dbus-libs x86_64 1:1.14.10-3.fc40 fedora 368.9 KiB default-fonts-core-sans noarch 4.0-12.fc40 fedora 11.9 KiB double-conversion x86_64 3.3.0-3.fc40 fedora 96.6 KiB duktape x86_64 2.7.0-7.fc40 fedora 616.2 KiB emacs-filesystem noarch 1:29.3-5.fc41 fedora 0.0 B expat x86_64 2.6.2-1.fc41 fedora 280.8 KiB fbgemm x86_64 0.7.0-20240315.0.git0049a2ca.fc41 copr_base 11.4 MiB fdk-aac-free x86_64 2.0.0-13.fc40 fedora 603.2 KiB fftw x86_64 3.3.10-12.fc41 fedora 182.7 KiB fftw-libs x86_64 3.3.10-12.fc41 fedora 0.0 B fftw-libs-double x86_64 3.3.10-12.fc41 fedora 3.4 MiB fftw-libs-long x86_64 3.3.10-12.fc41 fedora 1.5 MiB fftw-libs-quad x86_64 3.3.10-12.fc41 fedora 2.5 MiB fftw-libs-single x86_64 3.3.10-12.fc41 fedora 3.6 MiB flatbuffers x86_64 24.3.25-1.fc41 fedora 528.6 KiB flexiblas x86_64 3.4.2-1.fc41 fedora 46.9 KiB flexiblas-netlib x86_64 3.4.2-1.fc41 fedora 10.4 MiB flexiblas-netlib64 x86_64 3.4.2-1.fc41 fedora 10.5 MiB flexiblas-openblas-openmp x86_64 3.4.2-1.fc41 fedora 39.3 KiB flexiblas-openblas-openmp64 x86_64 3.4.2-1.fc41 fedora 39.3 KiB fontconfig x86_64 2.15.0-4.fc40 fedora 767.3 KiB fonts-filesystem noarch 1:2.0.5-14.fc40 fedora 0.0 B foxi x86_64 1.4.1^git20210526.c278588-2.fc41 fedora 16.1 KiB fp16 x86_64 1:0-20240410.0.git581ac1c7.fc41 copr_base 17.4 KiB freetype x86_64 2.13.2-5.fc40 fedora 842.6 KiB freexl x86_64 2.0.0-7.fc41 fedora 89.4 KiB fribidi x86_64 1.0.13-4.fc40 fedora 365.3 KiB game-music-emu x86_64 0.6.3-14.fc40 fedora 326.7 KiB gc x86_64 8.2.2-6.fc40 fedora 258.7 KiB gcc x86_64 14.0.1-0.13.fc41 fedora 103.8 MiB gcc-plugin-annobin x86_64 14.0.1-0.13.fc41 fedora 57.1 KiB gd x86_64 2.3.3-16.fc41 fedora 399.7 KiB gdal-libs x86_64 3.8.5-1.fc41 fedora 26.8 MiB gdk-pixbuf2 x86_64 2.42.10-8.fc40 fedora 2.5 MiB gdk-pixbuf2-modules x86_64 2.42.10-8.fc40 fedora 252.1 KiB geos x86_64 3.12.1-3.fc40 fedora 3.5 MiB gflags x86_64 2.2.2-14.fc40 fedora 293.5 KiB giflib x86_64 5.2.2-1.fc41 fedora 112.2 KiB git-core x86_64 2.44.0-1.fc41 fedora 20.8 MiB git-core-doc noarch 2.44.0-1.fc41 fedora 16.8 MiB gklib x86_64 5.1.1-20230326.0.git8bd6bad7.fc39 copr_base 284.8 KiB gl-manpages noarch 1.1-31.20190306.fc40 fedora 935.5 KiB glib2 x86_64 2.80.0-1.fc41 fedora 14.4 MiB glibc-devel x86_64 2.39.9000-10.fc41 fedora 36.8 KiB glibc-headers-x86 noarch 2.39.9000-10.fc41 
fedora 2.2 MiB glog x86_64 0.3.5-20.fc40 fedora 148.2 KiB gloo x86_64 1:0.5.0-20240302.0.git2565674c.cu12_3.fc41 copr_base 3.8 MiB glpk x86_64 5.0-11.fc40 fedora 870.7 KiB glx-utils x86_64 9.0.0-6.fc40 fedora 431.1 KiB gmp-c++ x86_64 1:6.3.0-1.fc41 fedora 31.8 KiB gnupg2 x86_64 2.4.5-1.fc41 fedora 9.5 MiB gnutls x86_64 3.8.5-1.fc41 fedora 3.2 MiB google-droid-sans-fonts noarch 20200215-19.fc40 fedora 6.3 MiB google-noto-fonts-common noarch 20240401-1.fc41 fedora 17.5 KiB google-noto-sans-vf-fonts noarch 20240401-1.fc41 fedora 1.2 MiB gpgme x86_64 1.23.2-3.fc40 fedora 575.3 KiB gpgmepp x86_64 1.23.2-3.fc40 fedora 424.2 KiB graphene x86_64 1.10.6-8.fc40 fedora 162.5 KiB graphite2 x86_64 1.3.14-15.fc40 fedora 192.0 KiB graphviz x86_64 10.0.1-1.fc41 fedora 21.1 MiB groff-base x86_64 1.23.0-6.fc40 fedora 3.8 MiB gsm x86_64 1.0.22-6.fc40 fedora 68.8 KiB gstreamer1 x86_64 1.24.0-1.fc41 fedora 6.1 MiB gstreamer1-plugins-base x86_64 1.24.0-1.fc41 fedora 7.2 MiB gts x86_64 0.7.6-48.20121130.fc40 fedora 650.3 KiB guile30 x86_64 3.0.7-12.fc40 fedora 51.5 MiB halide x86_64 17.0.1-20240220.0.fc41 copr_base 131.8 MiB harfbuzz x86_64 8.4.0-1.fc41 fedora 2.6 MiB hdf-libs x86_64 4.2.16.2-1.fc40 fedora 683.5 KiB hdf5 x86_64 1.12.1-15.fc40 fedora 8.4 MiB highway x86_64 1.1.0-1.fc41 fedora 3.2 MiB hiredis x86_64 1.0.2-7.fc40 fedora 82.5 KiB hwdata noarch 0.381-1.fc41 fedora 9.1 MiB hwloc-libs x86_64 2.10.0-3.fc40 fedora 2.8 MiB ilbc x86_64 3.0.4-10.fc40 fedora 87.5 KiB imath x86_64 3.1.11-1.fc41 fedora 368.0 KiB infiniband-diags x86_64 51.0-2.fc41 fedora 997.5 KiB isl x86_64 0.16.1-20.fc40 fedora 3.0 MiB iso-codes noarch 4.16.0-3.fc40 fedora 18.8 MiB jbig2dec-libs x86_64 0.20-4.fc40 fedora 169.0 KiB jbigkit-libs x86_64 2.1-29.fc40 fedora 117.6 KiB json-c x86_64 0.17-3.fc40 fedora 82.4 KiB jsoncpp x86_64 1.9.5-7.fc40 fedora 253.4 KiB kernel-headers x86_64 6.9.0-0.rc3.30.fc41 fedora 6.3 MiB keyutils-libs-devel x86_64 1.6.3-3.fc40 fedora 48.2 KiB kineto x86_64 0.4.0-20240327.0.git445909a8.cu12_3.fc41 copr_base 763.3 KiB kmod-libs x86_64 31-5.fc40 fedora 143.2 KiB krb5-devel x86_64 1.21.2-5.fc40 fedora 706.6 KiB lame-libs x86_64 3.100-17.fc40 fedora 1.2 MiB lasi x86_64 1.1.3-13.fc40 fedora 130.8 KiB lcms2 x86_64 2.16-3.fc40 fedora 420.9 KiB less x86_64 643-4.fc40 fedora 368.6 KiB leveldb x86_64 1.23-9.fc40 fedora 347.8 KiB libGLEW x86_64 2.2.0-7.fc40 fedora 748.3 KiB libICE x86_64 1.1.1-3.fc40 fedora 181.2 KiB libSM x86_64 1.2.4-3.fc40 fedora 97.3 KiB libX11 x86_64 1.8.9-1.fc41 fedora 1.3 MiB libX11-common noarch 1.8.9-1.fc41 fedora 1.1 MiB libX11-devel x86_64 1.8.9-1.fc41 fedora 1.0 MiB libX11-xcb x86_64 1.8.9-1.fc41 fedora 15.0 KiB libXau x86_64 1.0.11-6.fc40 fedora 66.9 KiB libXau-devel x86_64 1.0.11-6.fc40 fedora 6.4 KiB libXcursor x86_64 1.2.2-1.fc41 fedora 49.5 KiB libXext x86_64 1.3.6-1.fc40 fedora 90.1 KiB libXfixes x86_64 6.0.1-3.fc40 fedora 30.3 KiB libXft x86_64 2.3.8-6.fc40 fedora 164.5 KiB libXi x86_64 1.8.1-5.fc40 fedora 80.7 KiB libXpm x86_64 3.5.17-3.fc40 fedora 148.4 KiB libXrender x86_64 0.9.11-6.fc40 fedora 50.1 KiB libXt x86_64 1.3.0-3.fc40 fedora 425.9 KiB libXv x86_64 1.0.12-3.fc40 fedora 26.1 KiB libXxf86vm x86_64 1.1.5-6.fc40 fedora 25.3 KiB libaec x86_64 1.1.2-1.fc40 fedora 94.1 KiB libaom x86_64 3.8.2-1.fc41 fedora 5.0 MiB libarrow x86_64 15.0.2-3.fc41 fedora 21.8 MiB libarrow-doc noarch 15.0.2-3.fc41 fedora 115.4 KiB libassuan x86_64 2.5.7-1.fc41 fedora 163.8 KiB libavcodec-free x86_64 6.1.1-11.fc41 fedora 10.4 MiB libavformat-free x86_64 6.1.1-11.fc41 fedora 2.4 MiB libavif x86_64 
1.0.4-1.fc41 fedora 183.8 KiB libavutil-free x86_64 6.1.1-11.fc41 fedora 914.4 KiB libb2 x86_64 0.98.1-11.fc40 fedora 42.2 KiB libbluray x86_64 1.3.4-6.fc41 fedora 389.8 KiB libcbor x86_64 0.11.0-1.fc40 fedora 73.9 KiB libchromaprint x86_64 1.5.1-17.fc40 fedora 68.6 KiB libcom_err-devel x86_64 1.47.0-5.fc40 fedora 16.7 KiB libcublas-12-3 x86_64 12.3.4.1-2 copr_rezso_CUDA 596.8 MiB libcudnn8 x86_64 8.9.7.29-2.cuda12.3 copr_rezso_CUDA 1.0 GiB libcufft-12-3 x86_64 11.0.12.1-2 copr_rezso_CUDA 170.6 MiB libcurand-12-3 x86_64 10.3.4.107-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 91.9 MiB libcusolver-12-3 x86_64 11.5.4.101-2 copr_rezso_CUDA 189.5 MiB libcusparse-12-3 x86_64 12.2.0.103-2 copr_rezso_CUDA 254.9 MiB libdatrie x86_64 0.2.13-9.fc40 fedora 57.9 KiB libdav1d x86_64 1.4.0-1.fc41 fedora 1.7 MiB libdc1394 x86_64 2.2.7-5.fc40 fedora 347.0 KiB libdeflate x86_64 1.20-4.fc41 fedora 116.6 KiB libdicom x86_64 1.1.0-2.fc41 fedora 502.4 KiB libdrm x86_64 2.4.120-3.fc40 fedora 401.9 KiB libedit x86_64 3.1-50.20230828cvs.fc40 fedora 243.9 KiB libevdev x86_64 1.13.1-4.fc40 fedora 86.1 KiB libfido2 x86_64 1.14.0-4.fc40 fedora 237.8 KiB libgcrypt x86_64 1.10.3-4.fc41 fedora 1.3 MiB libgeotiff x86_64 1.7.1-13.fc41 fedora 311.9 KiB libgfortran x86_64 14.0.1-0.13.fc41 fedora 2.9 MiB libglvnd x86_64 1:1.7.0-4.fc40 fedora 530.3 KiB libglvnd-core-devel x86_64 1:1.7.0-4.fc40 fedora 40.3 KiB libglvnd-devel x86_64 1:1.7.0-4.fc40 fedora 2.1 MiB libglvnd-egl x86_64 1:1.7.0-4.fc40 fedora 68.8 KiB libglvnd-gles x86_64 1:1.7.0-4.fc40 fedora 106.2 KiB libglvnd-glx x86_64 1:1.7.0-4.fc40 fedora 605.4 KiB libglvnd-opengl x86_64 1:1.7.0-4.fc40 fedora 148.8 KiB libgpg-error x86_64 1.48-1.fc41 fedora 874.4 KiB libgs x86_64 10.03.0-1.fc41 fedora 23.2 MiB libgta x86_64 1.2.1-12.fc40 fedora 70.2 KiB libgudev x86_64 238-5.fc40 fedora 87.9 KiB libharu x86_64 2.4.3-5.fc40 fedora 1.7 MiB libibumad x86_64 51.0-2.fc41 fedora 44.0 KiB libibverbs x86_64 51.0-2.fc41 fedora 1.2 MiB libicu x86_64 74.2-1.fc40 fedora 34.9 MiB libijs x86_64 0.35-22.fc40 fedora 61.6 KiB libimagequant x86_64 4.0.3-3.fc40 fedora 690.3 KiB libinput x86_64 1.25.0-4.fc41 fedora 553.1 KiB libjpeg-turbo x86_64 3.0.2-1.fc40 fedora 776.9 KiB libjxl x86_64 1:0.10.2-3.fc41 fedora 3.3 MiB libkadm5 x86_64 1.21.2-5.fc40 fedora 214.1 KiB libkml x86_64 1.3.0-47.fc40 fedora 1.2 MiB libksba x86_64 1.6.6-1.fc41 fedora 392.9 KiB libldb x86_64 2.9.0-1.fc40 fedora 549.6 KiB liblerc x86_64 4.0.0-6.fc40 fedora 603.5 KiB libmodplug x86_64 1:0.8.9.0-19.fc40 fedora 355.2 KiB libmpc x86_64 1.3.1-5.fc40 fedora 164.7 KiB libnauty x86_64 2.8.8-3.fc40 fedora 4.6 MiB libnccl x86_64 2.21.5-1+cuda12.4 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 230.3 MiB libnl3 x86_64 3.9.0-3.fc40 fedora 1.0 MiB libnpp-12-3 x86_64 12.2.3.2-2 copr_rezso_CUDA 241.2 MiB libnvjitlink-12-3 x86_64 12.3.101-1 http_developer_download_nvidia_com_compute_cuda_repos_rhel8_x86_64 49.8 MiB libogg x86_64 2:1.3.5-8.fc40 fedora 49.4 KiB libopenmpt x86_64 0.7.6-1.fc41 fedora 1.6 MiB liborc2 x86_64 2.0.0-2.fc41 fedora 1.6 MiB libpaper x86_64 1:2.1.1-3.fc40 fedora 48.8 KiB libpciaccess x86_64 0.16-12.fc40 fedora 44.6 KiB libpng x86_64 2:1.6.40-3.fc40 fedora 241.8 KiB libpq x86_64 16.1-4.fc41 fedora 943.8 KiB libproxy x86_64 0.5.5-1.fc41 fedora 111.1 KiB libqhull_r x86_64 1:8.0.2-4.fc40 fedora 475.4 KiB libquadmath x86_64 14.0.1-0.13.fc41 fedora 329.9 KiB librabbitmq x86_64 0.14.0-2.fc41 fedora 93.7 KiB libraw1394 x86_64 2.1.2-20.fc40 fedora 163.3 KiB librdmacm x86_64 51.0-2.fc41 
fedora 146.2 KiB librist x86_64 0.2.7-4.fc40 fedora 153.3 KiB librsvg2 x86_64 2.57.1-4.fc40 fedora 4.2 MiB librttopo x86_64 1.1.0-14.fc40 fedora 504.8 KiB libseccomp x86_64 2.5.3-8.fc40 fedora 171.2 KiB libselinux-devel x86_64 3.6-4.fc40 fedora 126.1 KiB libsepol-devel x86_64 3.6-3.fc40 fedora 120.2 KiB libsmbclient x86_64 2:4.20.0-7.fc41 fedora 163.5 KiB libsodium x86_64 1.0.19-4.fc40 fedora 385.0 KiB libsodium-devel x86_64 1.0.19-4.fc40 fedora 3.8 MiB libspatialite x86_64 5.1.0-6.fc41 fedora 15.2 MiB libstdc++-devel x86_64 14.0.1-0.13.fc41 fedora 15.4 MiB libswresample-free x86_64 6.1.1-11.fc41 fedora 147.4 KiB libswscale-free x86_64 6.1.1-11.fc41 fedora 575.2 KiB libtalloc x86_64 2.4.2-1.fc40 fedora 52.7 KiB libtdb x86_64 1.4.10-1.fc40 fedora 97.1 KiB libtevent x86_64 0.16.1-1.fc40 fedora 94.1 KiB libthai x86_64 0.1.29-8.fc40 fedora 783.5 KiB libtheora x86_64 1:1.1.1-36.fc40 fedora 473.6 KiB libtiff x86_64 4.6.0-2.fc40 fedora 1.1 MiB libudfread x86_64 1.1.2-8.fc40 fedora 66.0 KiB libunwind x86_64 1.8.0-3.fc41 fedora 174.9 KiB libunwind-devel x86_64 1.8.0-3.fc41 fedora 139.1 KiB liburing x86_64 2.5-3.fc40 fedora 99.2 KiB libusb1 x86_64 1.0.27-1.fc41 fedora 162.3 KiB libuv x86_64 1:1.48.0-1.fc40 fedora 538.8 KiB libuv-static x86_64 1:1.48.0-1.fc40 fedora 403.4 KiB libva x86_64 2.21.0-3.fc41 fedora 313.3 KiB libvdpau x86_64 1.5-6.fc40 fedora 20.8 KiB libverto-devel x86_64 0.3.2-8.fc40 fedora 25.7 KiB libvisual x86_64 1:0.4.1-4.fc40 fedora 447.4 KiB libvmaf x86_64 2.3.0-7.fc40 fedora 779.6 KiB libvorbis x86_64 1:1.3.7-10.fc40 fedora 829.6 KiB libvpl x86_64 1:2.10.2-1.fc41 fedora 476.2 KiB libvpx x86_64 1.14.0-1.fc40 fedora 3.1 MiB libwacom x86_64 2.10.0-1.fc40 fedora 94.4 KiB libwacom-data noarch 2.10.0-1.fc40 fedora 613.0 KiB libwayland-client x86_64 1.22.0-3.fc40 fedora 58.1 KiB libwayland-cursor x86_64 1.22.0-3.fc40 fedora 37.0 KiB libwayland-egl x86_64 1.22.0-3.fc40 fedora 16.5 KiB libwayland-server x86_64 1.22.0-3.fc40 fedora 78.6 KiB libwbclient x86_64 2:4.20.0-7.fc41 fedora 68.3 KiB libwebp x86_64 1.3.2-5.fc41 fedora 793.6 KiB libxcb x86_64 1.16.1-1.fc41 fedora 1.1 MiB libxcb-devel x86_64 1.16.1-1.fc41 fedora 2.7 MiB libxcrypt-devel x86_64 4.4.36-5.fc40 fedora 30.3 KiB libxkbcommon x86_64 1.7.0-1.fc41 fedora 332.4 KiB libxkbcommon-x11 x86_64 1.7.0-1.fc41 fedora 39.6 KiB libxshmfence x86_64 1.3.2-3.fc40 fedora 15.4 KiB libyaml x86_64 0.2.5-14.fc40 fedora 130.4 KiB llvm17-libs x86_64 17.0.6-7.fc41 fedora 114.2 MiB lmdb x86_64 0.9.32-1.fc40 fedora 74.7 KiB lmdb-libs x86_64 0.9.32-1.fc40 fedora 109.3 KiB lpcnetfreedv x86_64 0.5-5.fc40 fedora 14.8 MiB magma x86_64 2.8.0-20240328.0.cu12_3.fc41 copr_base 234.7 MiB make x86_64 1:4.4.1-6.fc40 fedora 1.8 MiB mariadb-connector-c x86_64 3.3.8-3.fc40 fedora 513.4 KiB mariadb-connector-c-config noarch 3.3.8-3.fc40 fedora 497.0 B mbedtls x86_64 2.28.8-1.fc41 fedora 1.1 MiB mesa-filesystem x86_64 24.0.4-1.fc41 fedora 3.6 KiB mesa-libEGL x86_64 24.0.4-1.fc41 fedora 279.7 KiB mesa-libGL x86_64 24.0.4-1.fc41 fedora 453.8 KiB mesa-libGLU x86_64 9.0.3-4.fc40 fedora 353.8 KiB mesa-libgbm x86_64 24.0.4-1.fc41 fedora 65.3 KiB mesa-libglapi x86_64 24.0.4-1.fc41 fedora 168.4 KiB metis x86_64 5.2.1-20230403.0.gite0f1b88b.fc39 copr_base 524.3 KiB miniz x86_64 3.0.2-5.fc40 fedora 128.0 KiB minizip-ng-compat x86_64 3.0.10-8.fc41 fedora 158.5 KiB mpdecimal x86_64 2.5.1-9.fc40 fedora 200.9 KiB mpg123-libs x86_64 1.31.3-4.fc40 fedora 787.3 KiB mtdev x86_64 1.1.6-8.fc40 fedora 25.3 KiB ncurses x86_64 6.4-12.20240127.fc40 fedora 621.0 KiB netcdf x86_64 4.9.2-5.fc40 
fedora 2.4 MiB netpbm x86_64 11.02.00-6.fc40 fedora 573.1 KiB nettle x86_64 3.9.1-6.fc40 fedora 790.1 KiB nnpack x86_64 0-20230201.0.git70a77f48.fc38 copr_base 146.8 KiB npth x86_64 1.7-1.fc41 fedora 49.4 KiB nspr x86_64 4.35.0-22.fc41 fedora 312.7 KiB nss x86_64 3.99.0-1.fc41 fedora 1.9 MiB nss-softokn x86_64 3.99.0-1.fc41 fedora 1.9 MiB nss-softokn-freebl x86_64 3.99.0-1.fc41 fedora 896.6 KiB nss-sysinit x86_64 3.99.0-1.fc41 fedora 18.2 KiB nss-util x86_64 3.99.0-1.fc41 fedora 226.1 KiB numactl-libs x86_64 2.0.16-5.fc40 fedora 57.0 KiB ocl-icd x86_64 2.3.2-5.fc40 fedora 190.9 KiB ogdi x86_64 4.1.1-1.fc40 fedora 797.2 KiB onnx-libs x86_64 1.17.0-20240403.0.gitfa0b8999.fc41 copr_base 3.1 MiB onnx-optimizer x86_64 0.3.19-20240303.0.gitb3a46118.fc41 copr_base 537.4 KiB openblas x86_64 0.3.26-4.fc40 fedora 96.0 KiB openblas-openmp64 x86_64 0.3.26-4.fc40 fedora 39.1 MiB openblas-openmp64_ x86_64 0.3.26-4.fc40 fedora 39.1 MiB openblas-serial x86_64 0.3.26-4.fc40 fedora 37.5 MiB openblas-serial64 x86_64 0.3.26-4.fc40 fedora 37.7 MiB openblas-serial64_ x86_64 0.3.26-4.fc40 fedora 37.7 MiB openblas-threads x86_64 0.3.26-4.fc40 fedora 38.9 MiB openblas-threads64 x86_64 0.3.26-4.fc40 fedora 39.1 MiB openblas-threads64_ x86_64 0.3.26-4.fc40 fedora 39.1 MiB opencl-headers noarch 3.0-21.20231212git2368105.fc40 fedora 722.6 KiB opencore-amr x86_64 0.1.6-6.fc40 fedora 344.9 KiB opencv x86_64 4.9.0-20231227.1.cu12_3.fc40 copr_base 20.3 MiB opencv-contrib x86_64 4.9.0-20231227.1.cu12_3.fc40 copr_base 15.4 MiB opencv-core x86_64 4.9.0-20231227.1.cu12_3.fc40 copr_base 52.4 MiB opencv-cuda x86_64 4.9.0-20231227.1.cu12_3.fc40 copr_base 571.8 MiB opencv-static x86_64 4.9.0-20231227.1.cu12_3.fc40 copr_base 2.9 MiB openexr-libs x86_64 3.1.10-5.fc40 fedora 6.4 MiB openjpeg2 x86_64 2.5.2-1.fc41 fedora 441.7 KiB openpgm x86_64 5.2.122-34.fc40 fedora 300.3 KiB openpgm-devel x86_64 5.2.122-34.fc40 fedora 339.7 KiB openslide x86_64 4.0.0-3.fc40 fedora 299.5 KiB openssh x86_64 9.6p1-1.fc41.6 fedora 1.8 MiB openssh-clients x86_64 9.6p1-1.fc41.6 fedora 2.6 MiB opus x86_64 1.5.1-1.fc41 fedora 415.8 KiB orc x86_64 0.4.38-2.fc41 fedora 763.7 KiB pango x86_64 1.51.2-1.fc41 fedora 987.1 KiB pcre x86_64 8.45-1.fc40.6 fedora 541.8 KiB pcre2-devel x86_64 10.43-1.fc41 fedora 2.0 MiB pcre2-utf16 x86_64 10.43-1.fc41 fedora 590.1 KiB pcre2-utf32 x86_64 10.43-1.fc41 fedora 557.9 KiB perl-AutoLoader noarch 5.74-506.fc40 fedora 20.5 KiB perl-B x86_64 1.88-506.fc40 fedora 492.4 KiB perl-Carp noarch 1.54-502.fc40 fedora 46.5 KiB perl-Class-Struct noarch 0.68-506.fc40 fedora 25.4 KiB perl-Data-Dumper x86_64 2.188-503.fc40 fedora 111.7 KiB perl-Digest noarch 1.20-502.fc40 fedora 35.2 KiB perl-Digest-MD5 x86_64 2.59-3.fc40 fedora 59.7 KiB perl-DynaLoader x86_64 1.54-506.fc40 fedora 32.1 KiB perl-Encode x86_64 4:3.21-505.fc41 fedora 4.7 MiB perl-Errno x86_64 1.37-506.fc40 fedora 8.3 KiB perl-Error noarch 1:0.17029-15.fc40 fedora 77.2 KiB perl-Exporter noarch 5.78-3.fc40 fedora 54.2 KiB perl-Fcntl x86_64 1.15-506.fc40 fedora 24.6 KiB perl-File-Basename noarch 2.86-506.fc40 fedora 14.0 KiB perl-File-Find noarch 1.43-506.fc40 fedora 41.9 KiB perl-File-Path noarch 2.18-503.fc40 fedora 63.5 KiB perl-File-Temp noarch 1:0.231.100-503.fc40 fedora 162.3 KiB perl-File-stat noarch 1.13-506.fc40 fedora 12.7 KiB perl-FileHandle noarch 2.05-506.fc40 fedora 9.3 KiB perl-Getopt-Long noarch 1:2.57-3.fc40 fedora 144.1 KiB perl-Getopt-Std noarch 1.13-506.fc40 fedora 11.1 KiB perl-Git noarch 2.44.0-1.fc41 fedora 64.0 KiB perl-HTTP-Tiny noarch 0.088-5.fc40 fedora 
152.1 KiB perl-IO x86_64 1.52-506.fc40 fedora 151.0 KiB perl-IO-Socket-IP noarch 0.42-2.fc40 fedora 98.6 KiB perl-IO-Socket-SSL noarch 2.085-1.fc40 fedora 685.0 KiB perl-IPC-Open3 noarch 1.22-506.fc40 fedora 22.4 KiB perl-MIME-Base64 x86_64 3.16-503.fc40 fedora 46.1 KiB perl-Mozilla-CA noarch 20240313-1.fc41 fedora 9.5 KiB perl-Net-SSLeay x86_64 1.94-3.fc40 fedora 1.3 MiB perl-POSIX x86_64 2.13-506.fc40 fedora 229.0 KiB perl-PathTools x86_64 3.89-502.fc40 fedora 179.6 KiB perl-Pod-Escapes noarch 1:1.07-503.fc40 fedora 24.9 KiB perl-Pod-Perldoc noarch 3.28.01-503.fc40 fedora 163.1 KiB perl-Pod-Simple noarch 1:3.45-6.fc40 fedora 559.8 KiB perl-Pod-Usage noarch 4:2.03-503.fc40 fedora 84.7 KiB perl-Scalar-List-Utils x86_64 5:1.63-503.fc40 fedora 145.5 KiB perl-SelectSaver noarch 1.02-506.fc40 fedora 2.2 KiB perl-Socket x86_64 4:2.037-5.fc40 fedora 123.6 KiB perl-Storable x86_64 1:3.32-502.fc40 fedora 232.3 KiB perl-Symbol noarch 1.09-506.fc40 fedora 6.8 KiB perl-Term-ANSIColor noarch 5.01-504.fc40 fedora 97.5 KiB perl-Term-Cap noarch 1.18-503.fc40 fedora 29.3 KiB perl-TermReadKey x86_64 2.38-21.fc40 fedora 64.0 KiB perl-Text-ParseWords noarch 3.31-502.fc40 fedora 13.5 KiB perl-Text-Tabs+Wrap noarch 2024.001-1.fc41 fedora 22.5 KiB perl-Time-Local noarch 2:1.350-5.fc40 fedora 68.9 KiB perl-URI noarch 5.28-1.fc41 fedora 240.2 KiB perl-base noarch 2.27-506.fc40 fedora 12.5 KiB perl-constant noarch 1.33-503.fc40 fedora 26.2 KiB perl-if noarch 0.61.000-506.fc40 fedora 5.8 KiB perl-interpreter x86_64 4:5.38.2-506.fc40 fedora 119.8 KiB perl-lib x86_64 0.65-506.fc40 fedora 8.5 KiB perl-libnet noarch 3.15-503.fc40 fedora 289.0 KiB perl-libs x86_64 4:5.38.2-506.fc40 fedora 9.8 MiB perl-locale noarch 1.10-506.fc40 fedora 6.2 KiB perl-mro x86_64 1.28-506.fc40 fedora 41.6 KiB perl-overload noarch 1.37-506.fc40 fedora 71.5 KiB perl-overloading noarch 0.02-506.fc40 fedora 4.8 KiB perl-parent noarch 1:0.241-502.fc40 fedora 9.7 KiB perl-podlators noarch 1:5.01-502.fc40 fedora 308.1 KiB perl-vars noarch 1.05-506.fc40 fedora 3.9 KiB pixman x86_64 0.43.4-1.fc41 fedora 710.1 KiB poppler x86_64 24.02.0-2.fc40 fedora 3.5 MiB poppler-data noarch 0.4.11-7.fc40 fedora 12.3 MiB poppler-glib x86_64 24.02.0-2.fc40 fedora 575.1 KiB proj x86_64 9.4.0-1.fc41 fedora 4.4 MiB proj-data noarch 9.4.0-1.fc41 fedora 9.0 MiB protobuf x86_64 3.19.6-8.fc40 fedora 3.3 MiB protobuf-compat x86_64 3.21.9-2.fc39 copr_base 3.6 MiB pthreadpool x86_64 1:0.1-20240121.0.git178e3e06.fc40 copr_base 103.3 KiB pugixml x86_64 1.13-5.fc40 fedora 257.7 KiB pyproject-rpm-macros noarch 1.12.0-1.fc40 fedora 98.8 KiB python-pip-wheel noarch 24.0-2.fc41 fedora 1.5 MiB python-rpm-macros noarch 3.12-9.fc41 fedora 22.1 KiB python3 x86_64 3.12.2-3.fc41 fedora 31.9 KiB python3-libs x86_64 3.12.2-3.fc41 fedora 40.9 MiB python3-packaging noarch 24.0-1.fc41 fedora 424.8 KiB python3-rpm-generators noarch 14-10.fc40 fedora 81.7 KiB python3-rpm-macros noarch 3.12-9.fc41 fedora 6.4 KiB qnnpack x86_64 0-20190828.2.git7d2a4e99.fc38 copr_base 97.8 KiB qt-settings noarch 40.0-1.fc41 fedora 1.1 KiB qt5-qtbase x86_64 5.15.13-1.fc41 fedora 10.0 MiB qt5-qtbase-common noarch 5.15.13-1.fc41 fedora 78.0 B qt5-qtbase-gui x86_64 5.15.13-1.fc41 fedora 20.0 MiB rav1e-libs x86_64 0.7.1-1.fc40 fedora 3.0 MiB re2 x86_64 1:20220601-5.fc40 fedora 492.9 KiB rhash x86_64 1.4.3-4.fc40 fedora 344.7 KiB rocksdb x86_64 8.10.0-3.fc40 fedora 9.5 MiB rsvg-pixbuf-loader x86_64 2.57.1-4.fc40 fedora 15.5 KiB samba-client-libs x86_64 2:4.20.0-7.fc41 fedora 19.1 MiB samba-common noarch 2:4.20.0-7.fc41 
fedora 141.1 KiB samba-common-libs x86_64 2:4.20.0-7.fc41 fedora 256.9 KiB scotch x86_64 7.0.4-3.fc40 fedora 706.1 KiB scotch-devel x86_64 7.0.4-3.fc40 fedora 95.7 KiB shared-mime-info x86_64 2.3-4.fc41 fedora 5.2 MiB sleef x86_64 3.6-20240320.0.git60e76d2b.fc41 copr_base 2.8 MiB snappy x86_64 1.1.10-4.fc40 fedora 67.0 KiB soxr x86_64 0.1.3-15.fc40 fedora 187.7 KiB speex x86_64 1.2.0-17.fc40 fedora 116.6 KiB srt-libs x86_64 1.5.3-2.fc40 fedora 948.9 KiB suitesparse x86_64 7.7.0-1.fc41 fedora 137.2 MiB svt-av1-libs x86_64 1.4.1-5.fc40 fedora 7.2 MiB systemd x86_64 255.4-1.fc41 fedora 14.6 MiB systemd-pam x86_64 255.4-1.fc41 fedora 1.0 MiB systemd-rpm-macros noarch 255.4-1.fc41 fedora 9.5 KiB tbb x86_64 2021.11.0-5.fc40 fedora 440.9 KiB tbb-bind x86_64 2021.11.0-5.fc40 fedora 23.7 KiB tbb2020.3 x86_64 2020.3-4.fc40 fedora 263.4 KiB tensorpipe x86_64 0-20220513.1.gitbb1473a4.fc37 copr_base 3.0 MiB tpm2-tss x86_64 4.0.1-7.fc40 fedora 1.5 MiB twolame-libs x86_64 0.4.0-4.fc40 fedora 161.6 KiB tzdata noarch 2024a-5.fc41 fedora 1.6 MiB unixODBC x86_64 2.3.12-4.fc40 fedora 1.2 MiB uriparser x86_64 0.9.7-5.fc40 fedora 140.5 KiB urw-base35-bookman-fonts noarch 20200910-19.fc40 fedora 1.4 MiB urw-base35-c059-fonts noarch 20200910-19.fc40 fedora 1.4 MiB urw-base35-d050000l-fonts noarch 20200910-19.fc40 fedora 84.3 KiB urw-base35-fonts noarch 20200910-19.fc40 fedora 5.3 KiB urw-base35-fonts-common noarch 20200910-19.fc40 fedora 37.4 KiB urw-base35-gothic-fonts noarch 20200910-19.fc40 fedora 1.2 MiB urw-base35-nimbus-mono-ps-fonts noarch 20200910-19.fc40 fedora 1.0 MiB urw-base35-nimbus-roman-fonts noarch 20200910-19.fc40 fedora 1.4 MiB urw-base35-nimbus-sans-fonts noarch 20200910-19.fc40 fedora 2.4 MiB urw-base35-p052-fonts noarch 20200910-19.fc40 fedora 1.5 MiB urw-base35-standard-symbols-ps-fonts noarch 20200910-19.fc40 fedora 44.2 KiB urw-base35-z003-fonts noarch 20200910-19.fc40 fedora 390.8 KiB utf8proc x86_64 2.7.0-7.fc40 fedora 362.4 KiB vapoursynth-libs x86_64 65-2.fc40 fedora 1.8 MiB vim-filesystem noarch 2:9.1.264-1.fc41 fedora 40.0 B vo-amrwbenc x86_64 0.1.3-20.fc40 fedora 145.9 KiB vtk x86_64 9.2.6-13.fc41 fedora 99.3 MiB xapian-core-libs x86_64 1.4.23-2.fc40 fedora 2.1 MiB xcb-util x86_64 0.4.1-5.fc40 fedora 30.4 KiB xcb-util-image x86_64 0.4.1-5.fc40 fedora 22.2 KiB xcb-util-keysyms x86_64 0.4.1-5.fc40 fedora 16.8 KiB xcb-util-renderutil x86_64 0.3.10-5.fc40 fedora 28.5 KiB xcb-util-wm x86_64 0.4.2-5.fc40 fedora 85.4 KiB xerces-c x86_64 3.2.5-2.fc40 fedora 3.6 MiB xkeyboard-config noarch 2.41-1.fc40 fedora 6.6 MiB xml-common noarch 0.6.3-63.fc40 fedora 78.4 KiB xorg-x11-proto-devel noarch 2024.1-1.fc41 fedora 1.7 MiB xvidcore x86_64 1.3.7-11.fc40 fedora 886.6 KiB zeromq x86_64 4.3.5-16.fc40 fedora 885.4 KiB zimg x86_64 3.0.5-2.fc40 fedora 813.4 KiB zlib-ng-compat-devel x86_64 2.1.6-2.fc40 fedora 103.4 KiB zvbi x86_64 0.2.35-22.fc40 fedora 1.1 MiB Transaction Summary: Installing: 580 packages Total size of inbound packages is 2 GiB. Need to download 2 GiB. After this operation 8 GiB will be used (install 8 GiB, remove 0 B). [ 1/580] cutlass-devel-0:3.4.1-2024021 100% | 21.0 MiB/s | 774.1 KiB | 00m00s [ 2/580] doxygen-2:1.10.0-3.fc40.x86_6 100% | 106.5 MiB/s | 5.3 MiB | 00m00s [ 3/580] fbgemm-devel-0:0.7.0-20240315 100% | 2.7 MiB/s | 63.5 KiB | 00m00s [ 4/580] eigen3-devel-0:3.4.0-15.fc40. 
100% | 20.3 MiB/s | 1.2 MiB | 00m00s [ 5/580] fp16-devel-1:0-20240410.0.git 100% | 632.1 KiB/s | 12.6 KiB | 00m00s [ 6/580] flatbuffers-devel-0:24.3.25-1 100% | 1.7 MiB/s | 111.8 KiB | 00m00s [ 7/580] gemmlowp-devel-0:0-20231104.0 100% | 6.1 MiB/s | 157.2 KiB | 00m00s [ 8/580] git-0:2.44.0-1.fc41.x86_64 100% | 13.0 MiB/s | 53.3 KiB | 00m00s [ 9/580] kineto-devel-0:0.4.0-20240327 100% | 1.5 MiB/s | 23.0 KiB | 00m00s [ 10/580] magma-devel-0:2.8.0-20240328. 100% | 35.3 MiB/s | 903.7 KiB | 00m00s [ 11/580] nnpack-devel-0:0-20230201.0.g 100% | 788.3 KiB/s | 15.8 KiB | 00m00s [ 12/580] onnx-optimizer-devel-0:0.3.19 100% | 3.1 MiB/s | 50.5 KiB | 00m00s [ 13/580] neon2sse-devel-0:0-20230131.0 100% | 1.2 MiB/s | 84.7 KiB | 00m00s [ 14/580] peachpy-python3-0:0-20221113. 100% | 17.8 MiB/s | 674.0 KiB | 00m00s [ 15/580] protobuf-compat-compiler-0:3. 100% | 40.8 MiB/s | 919.3 KiB | 00m00s [ 16/580] protobuf-compat-devel-0:3.21. 100% | 40.6 MiB/s | 374.2 KiB | 00m00s [ 17/580] python3-pybind11-0:2.12.0-1.f 100% | 3.9 MiB/s | 202.2 KiB | 00m00s [ 18/580] flatbuffers-compiler-0:24.3.2 100% | 3.4 MiB/s | 1.1 MiB | 00m00s [ 19/580] python3-pyyaml-0:6.0.1-14.fc4 100% | 45.5 MiB/s | 232.7 KiB | 00m00s [ 20/580] python3-setuptools-0:69.2.0-1 100% | 256.2 MiB/s | 1.5 MiB | 00m00s [ 21/580] python3-numpy-1:1.26.4-2.fc41 100% | 101.8 MiB/s | 7.4 MiB | 00m00s [ 22/580] python3-six-0:1.16.0-14.fc40. 100% | 1.6 MiB/s | 40.9 KiB | 00m00s [ 23/580] python3-typing-extensions-0:4 100% | 3.4 MiB/s | 80.8 KiB | 00m00s [ 24/580] qnnpack-devel-0:0-20190828.2. 100% | 689.1 KiB/s | 12.4 KiB | 00m00s [ 25/580] tensorpipe-devel-0:0-20220513 100% | 4.6 MiB/s | 109.4 KiB | 00m00s [ 26/580] asmjit-devel-1:0-20220702.1.g 100% | 9.8 MiB/s | 229.9 KiB | 00m00s [ 27/580] cpuinfo-devel-1:0-20240327.0. 100% | 998.0 KiB/s | 24.0 KiB | 00m00s [ 28/580] cuda-cudart-devel-12-3-0:12.3 100% | 111.7 MiB/s | 2.0 MiB | 00m00s [ 29/580] cuda-driver-devel-12-3-0:12.3 100% | 2.5 MiB/s | 41.6 KiB | 00m00s [ 30/580] cuda-nvml-devel-12-3-0:12.3.1 100% | 29.4 MiB/s | 120.6 KiB | 00m00s [ 31/580] cuda-nvrtc-devel-12-3-0:12.3. 100% | 272.4 MiB/s | 22.3 MiB | 00m00s [ 32/580] cuda-cupti-12-3-0:12.3.101-1. 100% | 217.2 MiB/s | 28.7 MiB | 00m00s [ 33/580] cuda-nvtx-12-3-0:12.3.101-1.x 100% | 3.3 MiB/s | 87.7 KiB | 00m00s [ 34/580] cuda-profiler-api-12-3-0:12.3 100% | 12.6 MiB/s | 25.9 KiB | 00m00s [ 35/580] fftw-devel-0:3.3.10-12.fc41.x 100% | 2.2 MiB/s | 134.9 KiB | 00m00s [ 36/580] foxi-devel-0:1.4.1^git2021052 100% | 364.3 KiB/s | 23.3 KiB | 00m00s [ 37/580] fxdiv-devel-1:0-20201208.1.gi 100% | 752.2 KiB/s | 12.0 KiB | 00m00s [ 38/580] gflags-devel-0:2.2.2-14.fc40. 100% | 794.2 KiB/s | 24.6 KiB | 00m00s [ 39/580] cuda-nvcc-12-3-0:12.3.107-1.x 100% | 210.4 MiB/s | 63.8 MiB | 00m00s [ 40/580] glog-devel-0:0.3.5-20.fc40.x8 100% | 483.3 KiB/s | 37.7 KiB | 00m00s [ 41/580] gcc-c++-0:14.0.1-0.13.fc41.x8 100% | 107.3 MiB/s | 14.2 MiB | 00m00s [ 42/580] gmp-devel-1:6.3.0-1.fc41.x86_ 100% | 15.5 MiB/s | 174.3 KiB | 00m00s [ 43/580] gloo-devel-1:0.5.0-20240302.0 100% | 4.9 MiB/s | 74.6 KiB | 00m00s [ 44/580] libcublas-devel-12-3-0:12.3.4 100% | 5.1 MiB/s | 88.5 KiB | 00m00s [ 45/580] libcudnn8-devel-0:8.9.7.29-2. 100% | 2.1 MiB/s | 33.6 KiB | 00m00s [ 46/580] libcufft-devel-12-3-0:11.0.12 100% | 1.8 MiB/s | 33.7 KiB | 00m00s [ 47/580] leveldb-devel-0:1.23-9.fc40.x 100% | 771.8 KiB/s | 52.5 KiB | 00m00s [ 48/580] hiredis-devel-0:1.0.2-7.fc40. 
100% | 489.8 KiB/s | 37.2 KiB | 00m00s [ 49/580] libcusolver-devel-12-3-0:11.5 100% | 2.0 MiB/s | 60.5 KiB | 00m00s [ 50/580] libnccl-devel-0:2.21.5-1+cuda 100% | 7.8 MiB/s | 16.0 KiB | 00m00s [ 51/580] libnvjitlink-devel-12-3-0:12. 100% | 253.8 MiB/s | 17.8 MiB | 00m00s [ 52/580] libuv-devel-1:1.48.0-1.fc40.x 100% | 855.4 KiB/s | 41.9 KiB | 00m00s [ 53/580] libzstd-devel-0:1.5.6-1.fc41. 100% | 25.3 MiB/s | 51.8 KiB | 00m00s [ 54/580] libcurand-devel-12-3-0:10.3.4 100% | 238.6 MiB/s | 53.2 MiB | 00m00s [ 55/580] lmdb-devel-0:0.9.32-1.fc40.x8 100% | 459.7 KiB/s | 25.7 KiB | 00m00s [ 56/580] mesa-libGLU-devel-0:9.0.3-4.f 100% | 5.8 MiB/s | 12.0 KiB | 00m00s [ 57/580] mpfr-devel-0:4.2.1-3.fc40.x86 100% | 771.8 KiB/s | 21.6 KiB | 00m00s [ 58/580] miniz-devel-0:3.0.2-5.fc40.x8 100% | 902.2 KiB/s | 32.5 KiB | 00m00s [ 59/580] numactl-devel-0:2.0.16-5.fc40 100% | 468.5 KiB/s | 22.0 KiB | 00m00s [ 60/580] ocl-icd-devel-0:2.3.2-5.fc40. 100% | 1.4 MiB/s | 62.0 KiB | 00m00s [ 61/580] openblas-devel-0:0.3.26-4.fc4 100% | 16.2 MiB/s | 83.0 KiB | 00m00s [ 62/580] openblas-openmp-0:0.3.26-4.fc 100% | 174.5 MiB/s | 5.1 MiB | 00m00s [ 63/580] onnx-devel-0:1.17.0-20240403. 100% | 3.5 MiB/s | 129.6 KiB | 00m00s [ 64/580] psimd-devel-1:0-20200517.2.gi 100% | 1.6 MiB/s | 13.0 KiB | 00m00s [ 65/580] opencv-devel-0:4.9.0-20231227 100% | 63.0 MiB/s | 1.3 MiB | 00m00s [ 66/580] pthreadpool-devel-1:0.1-20240 100% | 1.2 MiB/s | 14.7 KiB | 00m00s [ 67/580] python3-devel-0:3.12.2-3.fc41 100% | 61.1 MiB/s | 312.7 KiB | 00m00s [ 68/580] pybind11-devel-0:2.12.0-1.fc4 100% | 6.8 MiB/s | 180.1 KiB | 00m00s [ 69/580] rdma-core-devel-0:51.0-2.fc41 100% | 5.7 MiB/s | 436.4 KiB | 00m00s [ 70/580] rocksdb-devel-0:8.10.0-3.fc40 100% | 4.9 MiB/s | 306.5 KiB | 00m00s [ 71/580] sleef-devel-0:3.6-20240320.0. 100% | 2.3 MiB/s | 27.8 KiB | 00m00s [ 72/580] snappy-devel-0:1.1.10-4.fc40. 100% | 4.3 MiB/s | 21.8 KiB | 00m00s [ 73/580] tbb-devel-0:2021.11.0-5.fc40. 100% | 39.1 MiB/s | 240.0 KiB | 00m00s [ 74/580] zeromq-devel-0:4.3.5-16.fc40. 100% | 771.0 KiB/s | 17.0 KiB | 00m00s [ 75/580] cmake-filesystem-0:3.28.3-1.f 100% | 2.9 MiB/s | 17.5 KiB | 00m00s [ 76/580] flatbuffers-0:24.3.25-1.fc41. 100% | 3.8 MiB/s | 206.3 KiB | 00m00s [ 77/580] perl-interpreter-4:5.38.2-506 100% | 23.5 MiB/s | 72.3 KiB | 00m00s [ 78/580] xapian-core-libs-0:1.4.23-2.f 100% | 126.9 MiB/s | 779.5 KiB | 00m00s [ 79/580] fbgemm-0:0.7.0-20240315.0.git 100% | 30.9 MiB/s | 1.2 MiB | 00m00s [ 80/580] fp16-1:0-20240410.0.git581ac1 100% | 1.0 MiB/s | 11.5 KiB | 00m00s [ 81/580] libcusparse-devel-12-3-0:12.2 100% | 161.0 MiB/s | 108.2 MiB | 00m01s [ 82/580] git-core-0:2.44.0-1.fc41.x86_ 100% | 37.2 MiB/s | 4.5 MiB | 00m00s [ 83/580] perl-File-Basename-0:2.86-506 100% | 298.5 KiB/s | 17.6 KiB | 00m00s [ 84/580] git-core-doc-0:2.44.0-1.fc41. 
100% | 40.7 MiB/s | 2.9 MiB | 00m00s [ 85/580] perl-File-Find-0:1.43-506.fc4 100% | 4.2 MiB/s | 25.7 KiB | 00m00s [ 86/580] perl-Getopt-Long-1:2.57-3.fc4 100% | 30.9 MiB/s | 63.2 KiB | 00m00s [ 87/580] perl-Git-0:2.44.0-1.fc41.noar 100% | 13.0 MiB/s | 40.0 KiB | 00m00s [ 88/580] perl-IPC-Open3-0:1.22-506.fc4 100% | 10.9 MiB/s | 22.3 KiB | 00m00s [ 89/580] perl-PathTools-0:3.89-502.fc4 100% | 21.3 MiB/s | 87.4 KiB | 00m00s [ 90/580] perl-TermReadKey-0:2.38-21.fc 100% | 8.6 MiB/s | 35.3 KiB | 00m00s [ 91/580] perl-lib-0:0.65-506.fc40.x86_ 100% | 3.8 MiB/s | 15.4 KiB | 00m00s [ 92/580] kineto-0:0.4.0-20240327.0.git 100% | 21.1 MiB/s | 302.6 KiB | 00m00s [ 93/580] nnpack-0:0-20230201.0.git70a7 100% | 3.2 MiB/s | 55.5 KiB | 00m00s [ 94/580] onnx-optimizer-0:0.3.19-20240 100% | 8.9 MiB/s | 201.2 KiB | 00m00s [ 95/580] protobuf-compat-0:3.21.9-2.fc 100% | 26.3 MiB/s | 1.1 MiB | 00m00s [ 96/580] flexiblas-netlib-0:3.4.2-1.fc 100% | 143.7 MiB/s | 3.2 MiB | 00m00s [ 97/580] libyaml-0:0.2.5-14.fc40.x86_6 100% | 9.6 MiB/s | 59.2 KiB | 00m00s [ 98/580] qnnpack-0:0-20190828.2.git7d2 100% | 1.9 MiB/s | 49.8 KiB | 00m00s [ 99/580] tensorpipe-0:0-20220513.1.git 100% | 32.6 MiB/s | 802.0 KiB | 00m00s [100/580] asmjit-1:0-20220702.1.gitc598 100% | 10.5 MiB/s | 204.7 KiB | 00m00s [101/580] cpuinfo-1:0-20240327.0.gitf42 100% | 2.4 MiB/s | 46.4 KiB | 00m00s [102/580] cuda-cudart-12-3-0:12.3.101-1 100% | 43.6 MiB/s | 223.1 KiB | 00m00s [103/580] cuda-crt-12-3-0:12.3.107-1.x8 100% | 13.6 MiB/s | 111.4 KiB | 00m00s [104/580] cuda-nvvm-12-3-0:12.3.107-1.x 100% | 153.1 MiB/s | 25.7 MiB | 00m00s [105/580] cuda-nvrtc-12-3-0:12.3.107-1. 100% | 194.9 MiB/s | 23.6 MiB | 00m00s [106/580] fftw-0:3.3.10-12.fc41.x86_64 100% | 1.2 MiB/s | 45.7 KiB | 00m00s [107/580] fftw-libs-0:3.3.10-12.fc41.x8 100% | 136.7 KiB/s | 8.2 KiB | 00m00s [108/580] foxi-0:1.4.1^git20210526.c278 100% | 450.7 KiB/s | 11.7 KiB | 00m00s [109/580] gcc-0:14.0.1-0.13.fc41.x86_64 100% | 197.9 MiB/s | 37.2 MiB | 00m00s [110/580] libmpc-0:1.3.1-5.fc40.x86_64 100% | 23.2 MiB/s | 71.1 KiB | 00m00s [111/580] gflags-0:2.2.2-14.fc40.x86_64 100% | 5.6 MiB/s | 98.2 KiB | 00m00s [112/580] magma-0:2.8.0-20240328.0.cu12 100% | 124.8 MiB/s | 119.1 MiB | 00m01s [113/580] glog-0:0.3.5-20.fc40.x86_64 100% | 677.5 KiB/s | 69.8 KiB | 00m00s [114/580] gmp-c++-1:6.3.0-1.fc41.x86_64 100% | 9.1 MiB/s | 18.6 KiB | 00m00s [115/580] gloo-1:0.5.0-20240302.0.git25 100% | 61.1 MiB/s | 813.1 KiB | 00m00s [116/580] cutlass-0:3.4.1-20240215.0.cu 100% | 127.7 MiB/s | 179.9 MiB | 00m01s [117/580] hiredis-0:1.0.2-7.fc40.x86_64 100% | 265.6 KiB/s | 42.2 KiB | 00m00s [118/580] leveldb-0:1.23-9.fc40.x86_64 100% | 1.0 MiB/s | 156.8 KiB | 00m00s [119/580] libnvjitlink-12-3-0:12.3.101- 100% | 246.7 MiB/s | 19.5 MiB | 00m00s [120/580] libuv-1:1.48.0-1.fc40.x86_64 100% | 27.4 MiB/s | 252.4 KiB | 00m00s [121/580] libuv-static-1:1.48.0-1.fc40. 100% | 2.9 MiB/s | 105.7 KiB | 00m00s [122/580] lmdb-0:0.9.32-1.fc40.x86_64 100% | 677.7 KiB/s | 32.5 KiB | 00m00s [123/580] libcurand-12-3-0:10.3.4.107-1 100% | 235.0 MiB/s | 52.9 MiB | 00m00s [124/580] lmdb-libs-0:0.9.32-1.fc40.x86 100% | 1.2 MiB/s | 61.1 KiB | 00m00s [125/580] mesa-libGLU-0:9.0.3-4.fc40.x8 100% | 39.7 MiB/s | 162.5 KiB | 00m00s [126/580] gl-manpages-0:1.1-31.20190306 100% | 108.1 MiB/s | 1.2 MiB | 00m00s [127/580] numactl-libs-0:2.0.16-5.fc40. 
100% | 5.9 MiB/s | 30.1 KiB | 00m00s [128/580] ocl-icd-0:2.3.2-5.fc40.x86_64 100% | 9.1 MiB/s | 65.2 KiB | 00m00s [129/580] opencl-headers-0:3.0-21.20231 100% | 1.8 MiB/s | 88.8 KiB | 00m00s [130/580] miniz-0:3.0.2-5.fc40.x86_64 100% | 922.8 KiB/s | 65.5 KiB | 00m00s [131/580] openblas-0:0.3.26-4.fc40.x86_ 100% | 9.4 MiB/s | 38.6 KiB | 00m00s [132/580] onnx-libs-0:1.17.0-20240403.0 100% | 19.2 MiB/s | 863.5 KiB | 00m00s [133/580] openblas-openmp64_-0:0.3.26-4 100% | 105.0 MiB/s | 4.9 MiB | 00m00s [134/580] openblas-openmp64-0:0.3.26-4. 100% | 55.5 MiB/s | 4.9 MiB | 00m00s [135/580] libnccl-0:2.21.5-1+cuda12.4.x 100% | 251.1 MiB/s | 130.1 MiB | 00m01s [136/580] openblas-serial-0:0.3.26-4.fc 100% | 31.2 MiB/s | 4.9 MiB | 00m00s [137/580] openblas-serial64-0:0.3.26-4. 100% | 30.2 MiB/s | 4.8 MiB | 00m00s [138/580] openblas-threads64-0:0.3.26-4 100% | 73.6 MiB/s | 4.9 MiB | 00m00s [139/580] openblas-threads64_-0:0.3.26- 100% | 84.9 MiB/s | 4.9 MiB | 00m00s [140/580] openblas-serial64_-0:0.3.26-4 100% | 21.9 MiB/s | 4.8 MiB | 00m00s [141/580] libgfortran-0:14.0.1-0.13.fc4 100% | 12.0 MiB/s | 937.8 KiB | 00m00s [142/580] opencv-0:4.9.0-20231227.1.cu1 100% | 121.9 MiB/s | 4.4 MiB | 00m00s [143/580] openblas-threads-0:0.3.26-4.f 100% | 18.8 MiB/s | 5.1 MiB | 00m00s [144/580] opencv-static-0:4.9.0-2023122 100% | 12.2 MiB/s | 425.1 KiB | 00m00s [145/580] opencv-contrib-0:4.9.0-202312 100% | 38.6 MiB/s | 5.6 MiB | 00m00s [146/580] pthreadpool-1:0.1-20240121.0. 100% | 681.2 KiB/s | 43.6 KiB | 00m00s [147/580] python3-libs-0:3.12.2-3.fc41. 100% | 276.7 MiB/s | 9.1 MiB | 00m00s [148/580] infiniband-diags-0:51.0-2.fc4 100% | 11.5 MiB/s | 329.7 KiB | 00m00s [149/580] libibverbs-0:51.0-2.fc41.x86_ 100% | 42.4 MiB/s | 434.4 KiB | 00m00s [150/580] libibumad-0:51.0-2.fc41.x86_6 100% | 882.7 KiB/s | 26.5 KiB | 00m00s [151/580] opencv-cuda-0:4.9.0-20231227. 100% | 143.3 MiB/s | 37.0 MiB | 00m00s [152/580] librdmacm-0:51.0-2.fc41.x86_6 100% | 957.9 KiB/s | 71.8 KiB | 00m00s [153/580] snappy-0:1.1.10-4.fc40.x86_64 100% | 3.0 MiB/s | 37.2 KiB | 00m00s [154/580] sleef-0:3.6-20240320.0.git60e 100% | 54.7 MiB/s | 896.3 KiB | 00m00s [155/580] tbb-0:2021.11.0-5.fc40.x86_64 100% | 31.9 MiB/s | 163.3 KiB | 00m00s [156/580] tbb-bind-0:2021.11.0-5.fc40.x 100% | 4.6 MiB/s | 18.9 KiB | 00m00s [157/580] zeromq-0:4.3.5-16.fc40.x86_64 100% | 32.3 MiB/s | 463.3 KiB | 00m00s [158/580] expat-0:2.6.2-1.fc41.x86_64 100% | 22.1 MiB/s | 113.2 KiB | 00m00s [159/580] perl-libs-4:5.38.2-506.fc40.x 100% | 80.6 MiB/s | 2.3 MiB | 00m00s [160/580] less-0:643-4.fc40.x86_64 100% | 13.1 MiB/s | 174.1 KiB | 00m00s [161/580] perl-Carp-0:1.54-502.fc40.noa 100% | 7.0 MiB/s | 28.7 KiB | 00m00s [162/580] openssh-clients-0:9.6p1-1.fc4 100% | 72.9 MiB/s | 746.1 KiB | 00m00s [163/580] perl-Exporter-0:5.78-3.fc40.n 100% | 1.9 MiB/s | 30.8 KiB | 00m00s [164/580] perl-Text-ParseWords-0:3.31-5 100% | 1.1 MiB/s | 16.3 KiB | 00m00s [165/580] perl-base-0:2.27-506.fc40.noa 100% | 3.3 MiB/s | 16.6 KiB | 00m00s [166/580] perl-constant-0:1.33-503.fc40 100% | 5.6 MiB/s | 22.8 KiB | 00m00s [167/580] perl-overload-0:1.37-506.fc40 100% | 11.2 MiB/s | 46.0 KiB | 00m00s [168/580] perl-Error-1:0.17029-15.fc40. 
100% | 5.6 MiB/s | 40.4 KiB | 00m00s [169/580] perl-Fcntl-0:1.15-506.fc40.x8 100% | 2.9 MiB/s | 20.6 KiB | 00m00s [170/580] perl-IO-0:1.52-506.fc40.x86_6 100% | 9.0 MiB/s | 82.7 KiB | 00m00s [171/580] perl-POSIX-0:2.13-506.fc40.x8 100% | 11.8 MiB/s | 96.9 KiB | 00m00s [172/580] perl-Symbol-0:1.09-506.fc40.n 100% | 4.8 MiB/s | 14.6 KiB | 00m00s [173/580] perl-Errno-0:1.37-506.fc40.x8 100% | 5.0 MiB/s | 15.4 KiB | 00m00s [174/580] perl-Scalar-List-Utils-5:1.63 100% | 17.8 MiB/s | 72.9 KiB | 00m00s [175/580] perl-DynaLoader-0:1.54-506.fc 100% | 3.2 MiB/s | 26.5 KiB | 00m00s [176/580] perl-vars-0:1.05-506.fc40.noa 100% | 1.5 MiB/s | 13.4 KiB | 00m00s [177/580] flexiblas-openblas-openmp-0:3 100% | 1.6 MiB/s | 17.6 KiB | 00m00s [178/580] flexiblas-0:3.4.2-1.fc41.x86_ 100% | 1.0 MiB/s | 25.1 KiB | 00m00s [179/580] libquadmath-0:14.0.1-0.13.fc4 100% | 11.9 MiB/s | 194.8 KiB | 00m00s [180/580] rocksdb-0:8.10.0-3.fc40.x86_6 100% | 14.6 MiB/s | 3.1 MiB | 00m00s [181/580] fftw-libs-double-0:3.3.10-12. 100% | 47.9 MiB/s | 1.2 MiB | 00m00s [182/580] fftw-libs-quad-0:3.3.10-12.fc 100% | 22.4 MiB/s | 735.6 KiB | 00m00s [183/580] fftw-libs-single-0:3.3.10-12. 100% | 43.5 MiB/s | 1.2 MiB | 00m00s [184/580] fftw-libs-long-0:3.3.10-12.fc 100% | 9.4 MiB/s | 500.8 KiB | 00m00s [185/580] make-1:4.4.1-6.fc40.x86_64 100% | 71.7 MiB/s | 587.6 KiB | 00m00s [186/580] libglvnd-opengl-1:1.7.0-4.fc4 100% | 5.3 MiB/s | 38.0 KiB | 00m00s [187/580] coin-or-Clp-0:1.17.9-1.fc41.x 100% | 30.8 MiB/s | 978.0 KiB | 00m00s [188/580] glib2-0:2.80.0-1.fc41.x86_64 100% | 177.6 MiB/s | 3.0 MiB | 00m00s [189/580] coin-or-CoinUtils-0:2.11.10-1 100% | 9.3 MiB/s | 494.4 KiB | 00m00s [190/580] cpp-0:14.0.1-0.13.fc41.x86_64 100% | 135.9 MiB/s | 12.0 MiB | 00m00s [191/580] gstreamer1-0:1.24.0-1.fc41.x8 100% | 79.8 MiB/s | 1.8 MiB | 00m00s [192/580] gstreamer1-plugins-base-0:1.2 100% | 45.0 MiB/s | 2.2 MiB | 00m00s [193/580] libavformat-free-0:6.1.1-11.f 100% | 19.5 MiB/s | 1.1 MiB | 00m00s [194/580] libdc1394-0:2.2.7-5.fc40.x86_ 100% | 7.1 MiB/s | 130.5 KiB | 00m00s [195/580] libjpeg-turbo-0:3.0.2-1.fc40. 100% | 73.8 MiB/s | 226.7 KiB | 00m00s [196/580] libpng-2:1.6.40-3.fc40.x86_64 100% | 58.6 MiB/s | 119.9 KiB | 00m00s [197/580] libavutil-free-0:6.1.1-11.fc4 100% | 4.9 MiB/s | 351.2 KiB | 00m00s [198/580] libtiff-0:4.6.0-2.fc40.x86_64 100% | 54.1 MiB/s | 332.4 KiB | 00m00s [199/580] libwebp-0:1.3.2-5.fc41.x86_64 100% | 69.9 MiB/s | 286.3 KiB | 00m00s [200/580] openexr-libs-0:3.1.10-5.fc40. 
100% | 44.9 MiB/s | 1.1 MiB | 00m00s [201/580] openjpeg2-0:2.5.2-1.fc41.x86_ 100% | 60.6 MiB/s | 186.2 KiB | 00m00s [202/580] libavcodec-free-0:6.1.1-11.fc 100% | 26.1 MiB/s | 4.1 MiB | 00m00s [203/580] ceres-solver-0:2.2.0-4.fc40.x 100% | 79.9 MiB/s | 1.1 MiB | 00m00s [204/580] freetype-0:2.13.2-5.fc40.x86_ 100% | 133.4 MiB/s | 409.7 KiB | 00m00s [205/580] libswscale-free-0:6.1.1-11.fc 100% | 2.1 MiB/s | 192.4 KiB | 00m00s [206/580] harfbuzz-0:8.4.0-1.fc41.x86_6 100% | 63.0 MiB/s | 1.0 MiB | 00m00s [207/580] libglvnd-glx-1:1.7.0-4.fc40.x 100% | 11.8 MiB/s | 132.4 KiB | 00m00s [208/580] hdf5-0:1.12.1-15.fc40.x86_64 100% | 67.6 MiB/s | 2.2 MiB | 00m00s [209/580] libb2-0:0.98.1-11.fc40.x86_64 100% | 4.1 MiB/s | 25.5 KiB | 00m00s [210/580] mpdecimal-0:2.5.1-9.fc40.x86_ 100% | 43.3 MiB/s | 88.6 KiB | 00m00s [211/580] tzdata-0:2024a-5.fc41.noarch 100% | 233.1 MiB/s | 716.0 KiB | 00m00s [212/580] python-pip-wheel-0:24.0-2.fc4 100% | 134.1 MiB/s | 1.5 MiB | 00m00s [213/580] perl-Getopt-Std-0:1.13-506.fc 100% | 3.1 MiB/s | 16.1 KiB | 00m00s [214/580] libnl3-0:3.9.0-3.fc40.x86_64 100% | 84.5 MiB/s | 346.1 KiB | 00m00s [215/580] liburing-0:2.5-3.fc40.x86_64 100% | 4.3 MiB/s | 39.4 KiB | 00m00s [216/580] libsodium-0:1.0.19-4.fc40.x86 100% | 28.4 MiB/s | 174.5 KiB | 00m00s [217/580] libunwind-0:1.8.0-3.fc41.x86_ 100% | 14.2 MiB/s | 72.6 KiB | 00m00s [218/580] hwloc-libs-0:2.10.0-3.fc40.x8 100% | 109.8 MiB/s | 2.1 MiB | 00m00s [219/580] libedit-0:3.1-50.20230828cvs. 100% | 25.6 MiB/s | 105.0 KiB | 00m00s [220/580] opencv-core-0:4.9.0-20231227. 100% | 91.7 MiB/s | 10.0 MiB | 00m00s [221/580] openpgm-0:5.2.122-34.fc40.x86 100% | 10.1 MiB/s | 175.7 KiB | 00m00s [222/580] libfido2-0:1.14.0-4.fc40.x86_ 100% | 9.5 MiB/s | 97.6 KiB | 00m00s [223/580] openssh-0:9.6p1-1.fc41.6.x86_ 100% | 41.5 MiB/s | 425.1 KiB | 00m00s [224/580] perl-mro-0:1.28-506.fc40.x86_ 100% | 2.6 MiB/s | 29.3 KiB | 00m00s [225/580] perl-overloading-0:0.02-506.f 100% | 1.2 MiB/s | 13.3 KiB | 00m00s [226/580] perl-SelectSaver-0:1.02-506.f 100% | 2.0 MiB/s | 12.2 KiB | 00m00s [227/580] perl-File-stat-0:1.13-506.fc4 100% | 1.9 MiB/s | 17.6 KiB | 00m00s [228/580] perl-Socket-4:2.037-5.fc40.x8 100% | 5.9 MiB/s | 54.5 KiB | 00m00s [229/580] perl-locale-0:1.10-506.fc40.n 100% | 3.4 MiB/s | 14.1 KiB | 00m00s [230/580] libglvnd-1:1.7.0-4.fc40.x86_6 100% | 16.0 MiB/s | 114.5 KiB | 00m00s [231/580] guile30-0:3.0.7-12.fc40.x86_6 100% | 156.7 MiB/s | 8.1 MiB | 00m00s [232/580] coin-or-Cbc-0:2.10.11-2.fc41. 100% | 31.9 MiB/s | 848.4 KiB | 00m00s [233/580] asl-0:20240106-1.20240201git2 100% | 6.7 MiB/s | 515.5 KiB | 00m00s [234/580] MUMPS-0:5.6.2-4.fc41.x86_64 100% | 21.6 MiB/s | 2.0 MiB | 00m00s [235/580] glpk-0:5.0-11.fc40.x86_64 100% | 63.3 MiB/s | 388.8 KiB | 00m00s [236/580] gnutls-0:3.8.5-1.fc41.x86_64 100% | 157.8 MiB/s | 1.1 MiB | 00m00s [237/580] alsa-lib-0:1.2.11-2.fc40.x86_ 100% | 45.6 MiB/s | 513.9 KiB | 00m00s [238/580] cairo-0:1.18.0-3.fc40.x86_64 100% | 62.9 MiB/s | 708.9 KiB | 00m00s [239/580] coin-or-Osi-0:0.108.9-2.fc41. 
100% | 37.5 MiB/s | 2.1 MiB | 00m00s [240/580] graphene-0:1.10.6-8.fc40.x86_ 100% | 6.0 MiB/s | 61.2 KiB | 00m00s [241/580] cdparanoia-libs-0:10.2-44.fc4 100% | 3.7 MiB/s | 53.7 KiB | 00m00s [242/580] libX11-0:1.8.9-1.fc41.x86_64 100% | 79.1 MiB/s | 647.8 KiB | 00m00s [243/580] libX11-xcb-0:1.8.9-1.fc41.x86 100% | 2.3 MiB/s | 11.8 KiB | 00m00s [244/580] libXext-0:1.3.6-1.fc40.x86_64 100% | 7.6 MiB/s | 38.9 KiB | 00m00s [245/580] libXi-0:1.8.1-5.fc40.x86_64 100% | 12.9 MiB/s | 39.7 KiB | 00m00s [246/580] libXv-0:1.0.12-3.fc40.x86_64 100% | 3.6 MiB/s | 18.5 KiB | 00m00s [247/580] iso-codes-0:4.16.0-3.fc40.noa 100% | 107.3 MiB/s | 3.5 MiB | 00m00s [248/580] libdrm-0:2.4.120-3.fc40.x86_6 100% | 25.7 MiB/s | 157.7 KiB | 00m00s [249/580] libglvnd-egl-1:1.7.0-4.fc40.x 100% | 11.5 MiB/s | 35.4 KiB | 00m00s [250/580] libgudev-0:238-5.fc40.x86_64 100% | 5.6 MiB/s | 34.7 KiB | 00m00s [251/580] libogg-2:1.3.5-8.fc40.x86_64 100% | 4.6 MiB/s | 32.8 KiB | 00m00s [252/580] libtheora-1:1.1.1-36.fc40.x86 100% | 20.3 MiB/s | 166.1 KiB | 00m00s [253/580] libvisual-1:0.4.1-4.fc40.x86_ 100% | 14.7 MiB/s | 150.5 KiB | 00m00s [254/580] libvorbis-1:1.3.7-10.fc40.x86 100% | 20.3 MiB/s | 187.5 KiB | 00m00s [255/580] libwayland-client-0:1.22.0-3. 100% | 3.9 MiB/s | 31.9 KiB | 00m00s [256/580] libwayland-cursor-0:1.22.0-3. 100% | 2.0 MiB/s | 18.8 KiB | 00m00s [257/580] libwayland-egl-0:1.22.0-3.fc4 100% | 1.5 MiB/s | 12.5 KiB | 00m00s [258/580] libxcb-0:1.16.1-1.fc41.x86_64 100% | 58.2 MiB/s | 238.3 KiB | 00m00s [259/580] mesa-libgbm-0:24.0.4-1.fc41.x 100% | 9.1 MiB/s | 46.6 KiB | 00m00s [260/580] orc-0:0.4.38-2.fc41.x86_64 100% | 36.7 MiB/s | 225.4 KiB | 00m00s [261/580] opus-0:1.5.1-1.fc41.x86_64 100% | 25.0 MiB/s | 230.0 KiB | 00m00s [262/580] suitesparse-0:7.7.0-1.fc41.x8 100% | 116.4 MiB/s | 19.0 MiB | 00m00s [263/580] pango-0:1.51.2-1.fc41.x86_64 100% | 18.6 MiB/s | 342.1 KiB | 00m00s [264/580] fdk-aac-free-0:2.0.0-13.fc40. 
100% | 54.8 MiB/s | 336.8 KiB | 00m00s [265/580] gsm-0:1.0.22-6.fc40.x86_64 100% | 7.0 MiB/s | 35.8 KiB | 00m00s [266/580] lame-libs-0:3.100-17.fc40.x86 100% | 41.0 MiB/s | 335.9 KiB | 00m00s [267/580] lcms2-0:2.16-3.fc40.x86_64 100% | 44.0 MiB/s | 180.2 KiB | 00m00s [268/580] libaom-0:3.8.2-1.fc41.x86_64 100% | 140.9 MiB/s | 1.8 MiB | 00m00s [269/580] libdav1d-0:1.4.0-1.fc41.x86_6 100% | 100.9 MiB/s | 619.9 KiB | 00m00s [270/580] codec2-0:1.2.0-4.fc40.x86_64 100% | 8.9 MiB/s | 639.8 KiB | 00m00s [271/580] libjxl-1:0.10.2-3.fc41.x86_64 100% | 58.5 MiB/s | 1.2 MiB | 00m00s [272/580] librsvg2-0:2.57.1-4.fc40.x86_ 100% | 138.6 MiB/s | 1.5 MiB | 00m00s [273/580] libva-0:2.21.0-3.fc41.x86_64 100% | 52.7 MiB/s | 107.9 KiB | 00m00s [274/580] libvpx-0:1.14.0-1.fc40.x86_64 100% | 128.1 MiB/s | 1.2 MiB | 00m00s [275/580] ilbc-0:3.0.4-10.fc40.x86_64 100% | 713.9 KiB/s | 53.5 KiB | 00m00s [276/580] opencore-amr-0:0.1.6-6.fc40.x 100% | 58.0 MiB/s | 178.2 KiB | 00m00s [277/580] speex-0:1.2.0-17.fc40.x86_64 100% | 16.4 MiB/s | 67.2 KiB | 00m00s [278/580] rav1e-libs-0:0.7.1-1.fc40.x86 100% | 114.4 MiB/s | 1.0 MiB | 00m00s [279/580] twolame-libs-0:0.4.0-4.fc40.x 100% | 11.2 MiB/s | 68.7 KiB | 00m00s [280/580] svt-av1-libs-0:1.4.1-5.fc40.x 100% | 145.4 MiB/s | 2.0 MiB | 00m00s [281/580] libswresample-free-0:6.1.1-11 100% | 1.5 MiB/s | 69.5 KiB | 00m00s [282/580] xvidcore-0:1.3.7-11.fc40.x86_ 100% | 6.1 MiB/s | 267.8 KiB | 00m00s [283/580] zvbi-0:0.2.35-22.fc40.x86_64 100% | 8.4 MiB/s | 413.4 KiB | 00m00s [284/580] vo-amrwbenc-0:0.1.3-20.fc40.x 100% | 1.1 MiB/s | 80.3 KiB | 00m00s [285/580] libchromaprint-0:1.5.1-17.fc4 100% | 1.6 MiB/s | 41.6 KiB | 00m00s [286/580] game-music-emu-0:0.6.3-14.fc4 100% | 2.7 MiB/s | 153.4 KiB | 00m00s [287/580] libgcrypt-0:1.10.3-4.fc41.x86 100% | 54.7 MiB/s | 504.1 KiB | 00m00s [288/580] libbluray-0:1.3.4-6.fc41.x86_ 100% | 3.0 MiB/s | 172.0 KiB | 00m00s [289/580] librabbitmq-0:0.14.0-2.fc41.x 100% | 1.5 MiB/s | 43.2 KiB | 00m00s [290/580] librist-0:0.2.7-4.fc40.x86_64 100% | 4.3 MiB/s | 75.7 KiB | 00m00s [291/580] libmodplug-1:0.8.9.0-19.fc40. 100% | 2.8 MiB/s | 176.2 KiB | 00m00s [292/580] srt-libs-0:1.5.3-2.fc40.x86_6 100% | 51.7 MiB/s | 370.5 KiB | 00m00s [293/580] libsmbclient-2:4.20.0-7.fc41. 100% | 3.9 MiB/s | 79.1 KiB | 00m00s [294/580] libopenmpt-0:0.7.6-1.fc41.x86 100% | 8.2 MiB/s | 696.8 KiB | 00m00s [295/580] libraw1394-0:2.1.2-20.fc40.x8 100% | 2.7 MiB/s | 64.7 KiB | 00m00s [296/580] libusb1-0:1.0.27-1.fc41.x86_6 100% | 24.6 MiB/s | 75.5 KiB | 00m00s [297/580] jbigkit-libs-0:2.1-29.fc40.x8 100% | 51.9 MiB/s | 53.1 KiB | 00m00s [298/580] liblerc-0:4.0.0-6.fc40.x86_64 100% | 102.6 MiB/s | 210.1 KiB | 00m00s [299/580] imath-0:3.1.11-1.fc41.x86_64 100% | 32.0 MiB/s | 98.4 KiB | 00m00s [300/580] libvdpau-0:1.5-6.fc40.x86_64 100% | 295.0 KiB/s | 16.5 KiB | 00m00s [301/580] protobuf-0:3.19.6-8.fc40.x86_ 100% | 77.2 MiB/s | 1.0 MiB | 00m00s [302/580] tbb2020.3-0:2020.3-4.fc40.x86 100% | 21.4 MiB/s | 109.4 KiB | 00m00s [303/580] vapoursynth-libs-0:65-2.fc40. 100% | 4.5 MiB/s | 592.4 KiB | 00m00s [304/580] graphite2-0:1.3.14-15.fc40.x8 100% | 46.3 MiB/s | 94.8 KiB | 00m00s [305/580] libaec-0:1.1.2-1.fc40.x86_64 100% | 7.3 MiB/s | 37.2 KiB | 00m00s [306/580] mesa-libGL-0:24.0.4-1.fc41.x8 100% | 21.4 MiB/s | 174.9 KiB | 00m00s [307/580] libcbor-0:0.11.0-1.fc40.x86_6 100% | 3.0 MiB/s | 33.3 KiB | 00m00s [308/580] perl-Class-Struct-0:0.68-506. 
100% | 2.4 MiB/s | 22.5 KiB | 00m00s [309/580] gc-0:8.2.2-6.fc40.x86_64 100% | 9.8 MiB/s | 110.2 KiB | 00m00s [310/580] halide-0:17.0.1-20240220.0.fc 100% | 157.8 MiB/s | 20.0 MiB | 00m00s [311/580] vtk-0:9.2.6-13.fc41.x86_64 100% | 162.7 MiB/s | 24.1 MiB | 00m00s [312/580] MUMPS-common-0:5.6.2-4.fc41.n 100% | 22.1 MiB/s | 882.7 KiB | 00m00s [313/580] scotch-0:7.0.4-3.fc40.x86_64 100% | 10.8 MiB/s | 276.5 KiB | 00m00s [314/580] coin-or-Cgl-0:0.60.8-1.fc41.x 100% | 17.8 MiB/s | 437.9 KiB | 00m00s [315/580] libnauty-0:2.8.8-3.fc40.x86_6 100% | 40.4 MiB/s | 868.1 KiB | 00m00s [316/580] fontconfig-0:2.15.0-4.fc40.x8 100% | 52.6 MiB/s | 269.5 KiB | 00m00s [317/580] scotch-devel-0:7.0.4-3.fc40.x 100% | 722.3 KiB/s | 24.6 KiB | 00m00s [318/580] nettle-0:3.9.1-6.fc40.x86_64 100% | 46.1 MiB/s | 424.9 KiB | 00m00s [319/580] libXrender-0:0.9.11-6.fc40.x8 100% | 3.0 MiB/s | 27.4 KiB | 00m00s [320/580] pixman-0:0.43.4-1.fc41.x86_64 100% | 35.8 MiB/s | 293.3 KiB | 00m00s [321/580] xml-common-0:0.6.3-63.fc40.no 100% | 3.0 MiB/s | 31.0 KiB | 00m00s [322/580] libX11-common-0:1.8.9-1.fc41. 100% | 13.2 MiB/s | 176.1 KiB | 00m00s [323/580] mesa-libEGL-0:24.0.4-1.fc41.x 100% | 11.7 MiB/s | 131.5 KiB | 00m00s [324/580] libpciaccess-0:0.16-12.fc40.x 100% | 1.8 MiB/s | 26.4 KiB | 00m00s [325/580] libXau-0:1.0.11-6.fc40.x86_64 100% | 5.2 MiB/s | 31.7 KiB | 00m00s [326/580] libwayland-server-0:1.22.0-3. 100% | 3.9 MiB/s | 39.9 KiB | 00m00s [327/580] fribidi-0:1.0.13-4.fc40.x86_6 100% | 6.9 MiB/s | 91.2 KiB | 00m00s [328/580] libXft-0:2.3.8-6.fc40.x86_64 100% | 6.4 MiB/s | 72.1 KiB | 00m00s [329/580] libthai-0:0.1.29-8.fc40.x86_6 100% | 20.9 MiB/s | 213.8 KiB | 00m00s [330/580] libvmaf-0:2.3.0-7.fc40.x86_64 100% | 25.1 MiB/s | 180.1 KiB | 00m00s [331/580] giflib-0:5.2.2-1.fc41.x86_64 100% | 6.3 MiB/s | 51.9 KiB | 00m00s [332/580] highway-0:1.1.0-1.fc41.x86_64 100% | 43.7 MiB/s | 492.3 KiB | 00m00s [333/580] shared-mime-info-0:2.3-4.fc41 100% | 38.2 MiB/s | 390.7 KiB | 00m00s [334/580] cairo-gobject-0:1.18.0-3.fc40 100% | 4.3 MiB/s | 17.5 KiB | 00m00s [335/580] gdk-pixbuf2-0:2.42.10-8.fc40. 100% | 78.9 MiB/s | 484.8 KiB | 00m00s [336/580] libXfixes-0:6.0.1-3.fc40.x86_ 100% | 2.3 MiB/s | 19.0 KiB | 00m00s [337/580] mesa-filesystem-0:24.0.4-1.fc 100% | 6.4 MiB/s | 19.7 KiB | 00m00s [338/580] soxr-0:0.1.3-15.fc40.x86_64 100% | 2.6 MiB/s | 84.8 KiB | 00m00s [339/580] libgpg-error-0:1.48-1.fc41.x8 100% | 37.8 MiB/s | 232.2 KiB | 00m00s [340/580] mpg123-libs-0:1.31.3-4.fc40.x 100% | 22.2 MiB/s | 340.6 KiB | 00m00s [341/580] cjson-0:1.7.17-1.fc41.x86_64 100% | 10.4 MiB/s | 31.8 KiB | 00m00s [342/580] libudfread-0:1.1.2-8.fc40.x86 100% | 614.1 KiB/s | 34.4 KiB | 00m00s [343/580] libtalloc-0:2.4.2-1.fc40.x86_ 100% | 5.0 MiB/s | 31.0 KiB | 00m00s [344/580] mbedtls-0:2.28.8-1.fc41.x86_6 100% | 8.9 MiB/s | 398.9 KiB | 00m00s [345/580] lpcnetfreedv-0:0.5-5.fc40.x86 100% | 48.9 MiB/s | 7.3 MiB | 00m00s [346/580] libtevent-0:0.16.1-1.fc40.x86 100% | 1.1 MiB/s | 47.8 KiB | 00m00s [347/580] samba-common-2:4.20.0-7.fc41. 100% | 16.7 MiB/s | 154.3 KiB | 00m00s [348/580] cgnslib-libs-0:4.4.0-4.fc40.x 100% | 48.0 MiB/s | 294.9 KiB | 00m00s [349/580] samba-client-libs-2:4.20.0-7. 
100% | 120.5 MiB/s | 5.4 MiB | 00m00s [350/580] double-conversion-0:3.3.0-3.f 100% | 7.0 MiB/s | 50.4 KiB | 00m00s [351/580] jsoncpp-0:1.9.5-7.fc40.x86_64 100% | 48.5 MiB/s | 99.3 KiB | 00m00s [352/580] libGLEW-0:2.2.0-7.fc40.x86_64 100% | 42.5 MiB/s | 174.0 KiB | 00m00s [353/580] libXcursor-0:1.2.2-1.fc41.x86 100% | 7.2 MiB/s | 29.5 KiB | 00m00s [354/580] libharu-0:2.4.3-5.fc40.x86_64 100% | 62.3 MiB/s | 574.3 KiB | 00m00s [355/580] mariadb-connector-c-0:3.3.8-3 100% | 39.6 MiB/s | 202.9 KiB | 00m00s [356/580] gdal-libs-0:3.8.5-1.fc41.x86_ 100% | 209.2 MiB/s | 9.0 MiB | 00m00s [357/580] netcdf-0:4.9.2-5.fc40.x86_64 100% | 50.8 MiB/s | 832.9 KiB | 00m00s [358/580] openslide-0:4.0.0-3.fc40.x86_ 100% | 32.9 MiB/s | 134.6 KiB | 00m00s [359/580] zimg-0:3.0.5-2.fc40.x86_64 100% | 3.8 MiB/s | 284.5 KiB | 00m00s [360/580] pugixml-0:1.13-5.fc40.x86_64 100% | 16.6 MiB/s | 101.9 KiB | 00m00s [361/580] proj-0:9.4.0-1.fc41.x86_64 100% | 102.1 MiB/s | 1.5 MiB | 00m00s [362/580] libXxf86vm-0:1.1.5-6.fc40.x86 100% | 2.5 MiB/s | 17.7 KiB | 00m00s [363/580] libxshmfence-0:1.3.2-3.fc40.x 100% | 2.3 MiB/s | 12.0 KiB | 00m00s [364/580] mesa-libglapi-0:24.0.4-1.fc41 100% | 16.2 MiB/s | 49.7 KiB | 00m00s [365/580] default-fonts-core-sans-0:4.0 100% | 30.9 MiB/s | 31.6 KiB | 00m00s [366/580] fonts-filesystem-1:2.0.5-14.f 100% | 8.0 MiB/s | 8.2 KiB | 00m00s [367/580] hwdata-0:0.381-1.fc41.noarch 100% | 114.0 MiB/s | 1.6 MiB | 00m00s [368/580] libdatrie-0:0.2.13-9.fc40.x86 100% | 10.4 MiB/s | 32.0 KiB | 00m00s [369/580] avahi-libs-0:0.8-26.fc40.x86_ 100% | 21.7 MiB/s | 66.5 KiB | 00m00s [370/580] cliquer-libs-0:1.22-8.fc40.x8 100% | 781.0 KiB/s | 38.3 KiB | 00m00s [371/580] libldb-0:2.9.0-1.fc40.x86_64 100% | 19.7 MiB/s | 182.0 KiB | 00m00s [372/580] libtdb-0:1.4.10-1.fc40.x86_64 100% | 7.1 MiB/s | 50.8 KiB | 00m00s [373/580] libwbclient-2:4.20.0-7.fc41.x 100% | 2.6 MiB/s | 47.6 KiB | 00m00s [374/580] armadillo-0:12.8.1-1.fc41.x86 100% | 3.9 MiB/s | 31.9 KiB | 00m00s [375/580] cfitsio-0:4.4.0-2.fc41.x86_64 100% | 46.0 MiB/s | 611.8 KiB | 00m00s [376/580] freexl-0:2.0.0-7.fc41.x86_64 100% | 6.3 MiB/s | 45.3 KiB | 00m00s [377/580] libicu-0:74.2-1.fc40.x86_64 100% | 104.4 MiB/s | 10.4 MiB | 00m00s [378/580] geos-0:3.12.1-3.fc40.x86_64 100% | 69.0 MiB/s | 1.1 MiB | 00m00s [379/580] json-c-0:0.17-3.fc40.x86_64 100% | 8.6 MiB/s | 44.0 KiB | 00m00s [380/580] libdeflate-0:1.20-4.fc41.x86_ 100% | 8.2 MiB/s | 66.8 KiB | 00m00s [381/580] libgeotiff-0:1.7.1-13.fc41.x8 100% | 4.3 MiB/s | 101.0 KiB | 00m00s [382/580] libgta-0:1.2.1-12.fc40.x86_64 100% | 5.7 MiB/s | 35.3 KiB | 00m00s [383/580] libkml-0:1.3.0-47.fc40.x86_64 100% | 35.2 MiB/s | 360.6 KiB | 00m00s [384/580] libarrow-0:15.0.2-3.fc41.x86_ 100% | 94.1 MiB/s | 5.3 MiB | 00m00s [385/580] libpq-0:16.1-4.fc41.x86_64 100% | 30.6 MiB/s | 250.7 KiB | 00m00s [386/580] libqhull_r-1:8.0.2-4.fc40.x86 100% | 19.6 MiB/s | 200.7 KiB | 00m00s [387/580] llvm17-libs-0:17.0.6-7.fc41.x 100% | 109.3 MiB/s | 26.8 MiB | 00m00s [388/580] libspatialite-0:5.1.0-6.fc41. 
100% | 69.2 MiB/s | 3.0 MiB | 00m00s [389/580] ogdi-0:4.1.1-1.fc40.x86_64 100% | 6.2 MiB/s | 235.1 KiB | 00m00s [390/580] unixODBC-0:2.3.12-4.fc40.x86_ 100% | 94.7 MiB/s | 484.8 KiB | 00m00s [391/580] mariadb-connector-c-config-0: 100% | 1.4 MiB/s | 8.7 KiB | 00m00s [392/580] poppler-0:24.02.0-2.fc40.x86_ 100% | 66.2 MiB/s | 1.2 MiB | 00m00s [393/580] xerces-c-0:3.2.5-2.fc40.x86_6 100% | 67.4 MiB/s | 966.4 KiB | 00m00s [394/580] gdk-pixbuf2-modules-0:2.42.10 100% | 20.9 MiB/s | 85.7 KiB | 00m00s [395/580] libdicom-0:1.1.0-2.fc41.x86_6 100% | 18.3 MiB/s | 112.4 KiB | 00m00s [396/580] abattis-cantarell-vf-fonts-0: 100% | 23.5 MiB/s | 120.3 KiB | 00m00s [397/580] google-noto-sans-vf-fonts-0:2 100% | 144.9 MiB/s | 593.5 KiB | 00m00s [398/580] dbus-libs-1:1.14.10-3.fc40.x8 100% | 25.4 MiB/s | 156.3 KiB | 00m00s [399/580] blosc-0:1.21.5-4.fc40.x86_64 100% | 2.2 MiB/s | 58.7 KiB | 00m00s [400/580] arpack-0:3.9.1-3.fc40.x86_64 100% | 41.1 MiB/s | 210.3 KiB | 00m00s [401/580] SuperLU-0:6.0.1-5.fc41.x86_64 100% | 22.7 MiB/s | 186.0 KiB | 00m00s [402/580] libarrow-doc-0:15.0.2-3.fc41. 100% | 4.6 MiB/s | 28.5 KiB | 00m00s [403/580] liborc2-0:2.0.0-2.fc41.x86_64 100% | 69.8 MiB/s | 500.2 KiB | 00m00s [404/580] proj-data-0:9.4.0-1.fc41.noar 100% | 23.6 MiB/s | 1.3 MiB | 00m00s [405/580] utf8proc-0:2.7.0-7.fc40.x86_6 100% | 11.2 MiB/s | 80.2 KiB | 00m00s [406/580] minizip-ng-compat-0:3.0.10-8. 100% | 1.7 MiB/s | 64.7 KiB | 00m00s [407/580] re2-1:20220601-5.fc40.x86_64 100% | 8.8 MiB/s | 206.1 KiB | 00m00s [408/580] uriparser-0:0.9.7-5.fc40.x86_ 100% | 13.9 MiB/s | 57.0 KiB | 00m00s [409/580] nspr-0:4.35.0-22.fc41.x86_64 100% | 67.1 MiB/s | 137.4 KiB | 00m00s [410/580] librttopo-0:1.1.0-14.fc40.x86 100% | 33.7 MiB/s | 207.0 KiB | 00m00s [411/580] nss-0:3.99.0-1.fc41.x86_64 100% | 171.8 MiB/s | 703.8 KiB | 00m00s [412/580] google-noto-fonts-common-0:20 100% | 8.5 MiB/s | 17.5 KiB | 00m00s [413/580] gpgmepp-0:1.23.2-3.fc40.x86_6 100% | 10.4 MiB/s | 138.6 KiB | 00m00s [414/580] gpgme-0:1.23.2-3.fc40.x86_64 100% | 20.6 MiB/s | 210.9 KiB | 00m00s [415/580] poppler-data-0:0.4.11-7.fc40. 100% | 84.1 MiB/s | 2.0 MiB | 00m00s [416/580] libassuan-0:2.5.7-1.fc41.x86_ 100% | 13.1 MiB/s | 66.8 KiB | 00m00s [417/580] nss-softokn-0:3.99.0-1.fc41.x 100% | 100.0 MiB/s | 409.5 KiB | 00m00s [418/580] flexiblas-netlib64-0:3.4.2-1. 100% | 116.7 MiB/s | 3.0 MiB | 00m00s [419/580] nss-sysinit-0:3.99.0-1.fc41.x 100% | 3.6 MiB/s | 18.7 KiB | 00m00s [420/580] nss-util-0:3.99.0-1.fc41.x86_ 100% | 21.5 MiB/s | 88.2 KiB | 00m00s [421/580] flexiblas-openblas-openmp64-0 100% | 5.8 MiB/s | 17.7 KiB | 00m00s [422/580] nss-softokn-freebl-0:3.99.0-1 100% | 62.8 MiB/s | 385.9 KiB | 00m00s [423/580] crypto-policies-scripts-0:202 100% | 5.1 MiB/s | 120.9 KiB | 00m00s [424/580] libksba-0:1.6.6-1.fc41.x86_64 100% | 25.8 MiB/s | 158.7 KiB | 00m00s [425/580] npth-0:1.7-1.fc41.x86_64 100% | 6.1 MiB/s | 24.9 KiB | 00m00s [426/580] tpm2-tss-0:4.0.1-7.fc40.x86_6 100% | 42.9 MiB/s | 395.4 KiB | 00m00s [427/580] gnupg2-0:2.4.5-1.fc41.x86_64 100% | 92.3 MiB/s | 2.7 MiB | 00m00s [428/580] annobin-plugin-gcc-0:12.48-1. 
100% | 134.0 MiB/s | 960.9 KiB | 00m00s [429/580] gcc-plugin-annobin-0:14.0.1-0 100% | 14.9 MiB/s | 45.9 KiB | 00m00s [430/580] annobin-docs-0:12.48-1.fc41.n 100% | 21.9 MiB/s | 89.6 KiB | 00m00s [431/580] pyproject-rpm-macros-0:1.12.0 100% | 13.5 MiB/s | 41.4 KiB | 00m00s [432/580] python-rpm-macros-0:3.12-9.fc 100% | 8.8 MiB/s | 18.0 KiB | 00m00s [433/580] python3-rpm-generators-0:14-1 100% | 7.2 MiB/s | 29.6 KiB | 00m00s [434/580] python3-rpm-macros-0:3.12-9.f 100% | 6.2 MiB/s | 12.8 KiB | 00m00s [435/580] python3-packaging-0:24.0-1.fc 100% | 41.1 MiB/s | 126.2 KiB | 00m00s [436/580] zlib-ng-compat-devel-0:2.1.6- 100% | 17.6 MiB/s | 36.1 KiB | 00m00s [437/580] python3-0:3.12.2-3.fc41.x86_6 100% | 5.3 MiB/s | 27.2 KiB | 00m00s [438/580] cuda-gcc-12-c++-0:12.3.1-1.fc 100% | 119.6 MiB/s | 14.8 MiB | 00m00s [439/580] cuda-gcc-12-0:12.3.1-1.fc39.x 100% | 151.9 MiB/s | 33.9 MiB | 00m00s [440/580] libcufft-12-3-0:11.0.12.1-2.x 100% | 85.2 MiB/s | 60.4 MiB | 00m01s [441/580] libnpp-12-3-0:12.2.3.2-2.x86_ 100% | 98.6 MiB/s | 96.3 MiB | 00m01s [442/580] qt5-qtbase-0:5.15.13-1.fc41.x 100% | 131.9 MiB/s | 3.6 MiB | 00m00s [443/580] libproxy-0:0.5.5-1.fc41.x86_6 100% | 11.8 MiB/s | 48.3 KiB | 00m00s [444/580] pcre2-utf16-0:10.43-1.fc41.x8 100% | 72.2 MiB/s | 221.9 KiB | 00m00s [445/580] qt-settings-0:40.0-1.fc41.noa 100% | 1.4 MiB/s | 10.1 KiB | 00m00s [446/580] qt5-qtbase-common-0:5.15.13-1 100% | 1.7 MiB/s | 11.9 KiB | 00m00s [447/580] duktape-0:2.7.0-7.fc40.x86_64 100% | 27.6 MiB/s | 169.7 KiB | 00m00s [448/580] qt5-qtbase-gui-0:5.15.13-1.fc 100% | 206.6 MiB/s | 6.4 MiB | 00m00s [449/580] cups-libs-1:2.4.7-13.fc41.x86 100% | 84.2 MiB/s | 258.5 KiB | 00m00s [450/580] libICE-0:1.1.1-3.fc40.x86_64 100% | 24.2 MiB/s | 74.5 KiB | 00m00s [451/580] libSM-0:1.2.4-3.fc40.x86_64 100% | 14.0 MiB/s | 43.0 KiB | 00m00s [452/580] libinput-0:1.25.0-4.fc41.x86_ 100% | 70.0 MiB/s | 215.0 KiB | 00m00s [453/580] libxkbcommon-0:1.7.0-1.fc41.x 100% | 46.3 MiB/s | 142.2 KiB | 00m00s [454/580] libxkbcommon-x11-0:1.7.0-1.fc 100% | 10.6 MiB/s | 21.8 KiB | 00m00s [455/580] xcb-util-image-0:0.4.1-5.fc40 100% | 3.6 MiB/s | 18.7 KiB | 00m00s [456/580] xcb-util-keysyms-0:0.4.1-5.fc 100% | 4.6 MiB/s | 14.1 KiB | 00m00s [457/580] xcb-util-renderutil-0:0.3.10- 100% | 8.4 MiB/s | 17.1 KiB | 00m00s [458/580] xcb-util-wm-0:0.4.2-5.fc40.x8 100% | 10.1 MiB/s | 31.1 KiB | 00m00s [459/580] libevdev-0:1.13.1-4.fc40.x86_ 100% | 18.5 MiB/s | 37.8 KiB | 00m00s [460/580] libwacom-0:2.10.0-1.fc40.x86_ 100% | 13.9 MiB/s | 42.8 KiB | 00m00s [461/580] mtdev-0:1.1.6-8.fc40.x86_64 100% | 6.7 MiB/s | 20.6 KiB | 00m00s [462/580] xkeyboard-config-0:2.41-1.fc4 100% | 105.9 MiB/s | 976.0 KiB | 00m00s [463/580] xcb-util-0:0.4.1-5.fc40.x86_6 100% | 4.4 MiB/s | 18.0 KiB | 00m00s [464/580] libwacom-data-0:2.10.0-1.fc40 100% | 32.0 MiB/s | 196.4 KiB | 00m00s [465/580] libcublas-12-3-0:12.3.4.1-2.x 100% | 100.8 MiB/s | 245.0 MiB | 00m02s [466/580] isl-0:0.16.1-20.fc40.x86_64 100% | 36.1 MiB/s | 851.2 KiB | 00m00s [467/580] krb5-devel-0:1.21.2-5.fc40.x8 100% | 28.1 MiB/s | 144.0 KiB | 00m00s [468/580] libkadm5-0:1.21.2-5.fc40.x86_ 100% | 18.7 MiB/s | 76.8 KiB | 00m00s [469/580] libsodium-devel-0:1.0.19-4.fc 100% | 101.6 MiB/s | 1.1 MiB | 00m00s [470/580] libunwind-devel-0:1.8.0-3.fc4 100% | 2.1 MiB/s | 103.2 KiB | 00m00s [471/580] openpgm-devel-0:5.2.122-34.fc 100% | 9.3 MiB/s | 66.9 KiB | 00m00s [472/580] systemd-0:255.4-1.fc41.x86_64 100% | 176.2 MiB/s | 4.9 MiB | 00m00s [473/580] systemd-rpm-macros-0:255.4-1. 
100% | 15.0 MiB/s | 30.7 KiB | 00m00s [474/580] dbus-1:1.14.10-3.fc40.x86_64 100% | 2.6 MiB/s | 8.0 KiB | 00m00s [475/580] kmod-libs-0:31-5.fc40.x86_64 100% | 33.3 MiB/s | 68.2 KiB | 00m00s [476/580] libseccomp-0:2.5.3-8.fc40.x86 100% | 22.9 MiB/s | 70.3 KiB | 00m00s [477/580] systemd-pam-0:255.4-1.fc41.x8 100% | 126.0 MiB/s | 387.0 KiB | 00m00s [478/580] dbus-broker-0:35-4.fc40.x86_6 100% | 83.4 MiB/s | 170.8 KiB | 00m00s [479/580] dbus-common-1:1.14.10-3.fc40. 100% | 7.2 MiB/s | 14.8 KiB | 00m00s [480/580] samba-common-libs-2:4.20.0-7. 100% | 17.5 MiB/s | 107.7 KiB | 00m00s [481/580] glx-utils-0:9.0.0-6.fc40.x86_ 100% | 35.1 MiB/s | 71.8 KiB | 00m00s [482/580] libcusparse-12-3-0:12.2.0.103 100% | 78.7 MiB/s | 108.2 MiB | 00m01s [483/580] cmake-0:3.28.3-1.fc41.x86_64 100% | 67.5 MiB/s | 9.7 MiB | 00m00s [484/580] cmake-rpm-macros-0:3.28.3-1.f 100% | 340.6 KiB/s | 17.0 KiB | 00m00s [485/580] rhash-0:1.4.3-4.fc40.x86_64 100% | 47.3 MiB/s | 193.7 KiB | 00m00s [486/580] emacs-filesystem-1:29.3-5.fc4 100% | 1.6 MiB/s | 8.3 KiB | 00m00s [487/580] vim-filesystem-2:9.1.264-1.fc 100% | 4.3 MiB/s | 17.5 KiB | 00m00s [488/580] cmake-data-0:3.28.3-1.fc41.no 100% | 133.5 MiB/s | 2.3 MiB | 00m00s [489/580] perl-MIME-Base64-0:3.16-503.f 100% | 5.8 MiB/s | 29.7 KiB | 00m00s [490/580] perl-Encode-4:3.21-505.fc41.x 100% | 96.1 MiB/s | 1.1 MiB | 00m00s [491/580] perl-Storable-1:3.32-502.fc40 100% | 19.2 MiB/s | 98.2 KiB | 00m00s [492/580] libglvnd-devel-1:1.7.0-4.fc40 100% | 39.7 MiB/s | 162.6 KiB | 00m00s [493/580] perl-parent-1:0.241-502.fc40. 100% | 2.1 MiB/s | 14.7 KiB | 00m00s [494/580] libglvnd-core-devel-1:1.7.0-4 100% | 2.8 MiB/s | 17.4 KiB | 00m00s [495/580] libX11-devel-0:1.8.9-1.fc41.x 100% | 113.1 MiB/s | 1.0 MiB | 00m00s [496/580] libglvnd-gles-1:1.7.0-4.fc40. 100% | 4.1 MiB/s | 29.2 KiB | 00m00s [497/580] xorg-x11-proto-devel-0:2024.1 100% | 41.9 MiB/s | 300.5 KiB | 00m00s [498/580] rsvg-pixbuf-loader-0:2.57.1-4 100% | 3.9 MiB/s | 16.1 KiB | 00m00s [499/580] libxcb-devel-0:1.16.1-1.fc41. 100% | 159.3 MiB/s | 1.4 MiB | 00m00s [500/580] keyutils-libs-devel-0:1.6.3-3 100% | 19.6 MiB/s | 60.3 KiB | 00m00s [501/580] libcom_err-devel-0:1.47.0-5.f 100% | 4.9 MiB/s | 15.0 KiB | 00m00s [502/580] libselinux-devel-0:3.6-4.fc40 100% | 49.1 MiB/s | 150.9 KiB | 00m00s [503/580] libsepol-devel-0:3.6-3.fc40.x 100% | 23.8 MiB/s | 48.8 KiB | 00m00s [504/580] libverto-devel-0:0.3.2-8.fc40 100% | 3.5 MiB/s | 14.2 KiB | 00m00s [505/580] libvpl-1:2.10.2-1.fc41.x86_64 100% | 4.2 MiB/s | 175.4 KiB | 00m00s [506/580] glibc-devel-0:2.39.9000-10.fc 100% | 13.2 MiB/s | 121.4 KiB | 00m00s [507/580] libstdc++-devel-0:14.0.1-0.13 100% | 72.1 MiB/s | 2.7 MiB | 00m00s [508/580] glibc-headers-x86-0:2.39.9000 100% | 39.9 MiB/s | 612.8 KiB | 00m00s [509/580] libxcrypt-devel-0:4.4.36-5.fc 100% | 7.0 MiB/s | 28.6 KiB | 00m00s [510/580] gd-0:2.3.3-16.fc41.x86_64 100% | 33.1 MiB/s | 135.8 KiB | 00m00s [511/580] gts-0:0.7.6-48.20121130.fc40. 
100% | 39.3 MiB/s | 241.8 KiB | 00m00s [512/580] lasi-0:1.1.3-13.fc40.x86_64 100% | 7.7 MiB/s | 55.4 KiB | 00m00s [513/580] graphviz-0:10.0.1-1.fc41.x86_ 100% | 105.5 MiB/s | 5.0 MiB | 00m00s [514/580] libgs-0:10.03.0-1.fc41.x86_64 100% | 110.0 MiB/s | 3.4 MiB | 00m00s [515/580] poppler-glib-0:24.02.0-2.fc40 100% | 30.9 MiB/s | 190.0 KiB | 00m00s [516/580] urw-base35-fonts-0:20200910-1 100% | 2.4 MiB/s | 10.0 KiB | 00m00s [517/580] libXpm-0:3.5.17-3.fc40.x86_64 100% | 16.0 MiB/s | 65.7 KiB | 00m00s [518/580] libavif-0:1.0.4-1.fc41.x86_64 100% | 22.2 MiB/s | 90.8 KiB | 00m00s [519/580] libimagequant-0:4.0.3-3.fc40. 100% | 49.7 MiB/s | 305.6 KiB | 00m00s [520/580] netpbm-0:11.02.00-6.fc40.x86_ 100% | 45.2 MiB/s | 184.9 KiB | 00m00s [521/580] adobe-mappings-cmap-deprecate 100% | 27.8 MiB/s | 114.0 KiB | 00m00s [522/580] adobe-mappings-cmap-0:2023062 100% | 163.8 MiB/s | 2.1 MiB | 00m00s [523/580] adobe-mappings-pdf-0:20190401 100% | 84.9 MiB/s | 695.9 KiB | 00m00s [524/580] jbig2dec-libs-0:0.20-4.fc40.x 100% | 14.4 MiB/s | 73.8 KiB | 00m00s [525/580] libXt-0:1.3.0-3.fc40.x86_64 100% | 43.3 MiB/s | 177.5 KiB | 00m00s [526/580] libijs-0:0.35-22.fc40.x86_64 100% | 7.2 MiB/s | 29.3 KiB | 00m00s [527/580] google-droid-sans-fonts-0:202 100% | 142.5 MiB/s | 2.7 MiB | 00m00s [528/580] libpaper-1:2.1.1-3.fc40.x86_6 100% | 8.7 MiB/s | 26.8 KiB | 00m00s [529/580] urw-base35-c059-fonts-0:20200 100% | 170.7 MiB/s | 874.0 KiB | 00m00s [530/580] urw-base35-d050000l-fonts-0:2 100% | 18.5 MiB/s | 75.7 KiB | 00m00s [531/580] urw-base35-fonts-common-0:202 100% | 10.1 MiB/s | 20.8 KiB | 00m00s [532/580] urw-base35-gothic-fonts-0:202 100% | 125.5 MiB/s | 642.5 KiB | 00m00s [533/580] urw-base35-bookman-fonts-0:20 100% | 36.0 MiB/s | 846.9 KiB | 00m00s [534/580] urw-base35-nimbus-mono-ps-fon 100% | 77.6 MiB/s | 794.6 KiB | 00m00s [535/580] urw-base35-nimbus-roman-fonts 100% | 139.3 MiB/s | 855.9 KiB | 00m00s [536/580] urw-base35-p052-fonts-0:20200 100% | 118.8 MiB/s | 973.2 KiB | 00m00s [537/580] urw-base35-nimbus-sans-fonts- 100% | 118.7 MiB/s | 1.3 MiB | 00m00s [538/580] urw-base35-standard-symbols-p 100% | 20.3 MiB/s | 41.5 KiB | 00m00s [539/580] urw-base35-z003-fonts-0:20200 100% | 134.5 MiB/s | 275.5 KiB | 00m00s [540/580] gklib-0:5.1.1-20230326.0.git8 100% | 3.7 MiB/s | 103.1 KiB | 00m00s [541/580] pcre-0:8.45-1.fc40.6.x86_64 100% | 39.8 MiB/s | 203.9 KiB | 00m00s [542/580] metis-0:5.2.1-20230403.0.gite 100% | 3.9 MiB/s | 176.3 KiB | 00m00s [543/580] cuda-toolkit-12-3-config-comm 100% | 123.8 KiB/s | 7.7 KiB | 00m00s [544/580] cuda-toolkit-12-config-common 100% | 64.2 KiB/s | 7.8 KiB | 00m00s [545/580] cuda-toolkit-config-common-0: 100% | 111.6 KiB/s | 7.8 KiB | 00m00s [546/580] cuda-cccl-12-3-0:12.3.101-1.x 100% | 237.1 MiB/s | 1.9 MiB | 00m00s [547/580] libXau-devel-0:1.0.11-6.fc40. 100% | 6.7 MiB/s | 13.7 KiB | 00m00s [548/580] pcre2-devel-0:10.43-1.fc41.x8 100% | 169.2 MiB/s | 519.8 KiB | 00m00s [549/580] pcre2-utf32-0:10.43-1.fc41.x8 100% | 102.2 MiB/s | 209.3 KiB | 00m00s [550/580] kernel-headers-0:6.9.0-0.rc3. 
100% | 145.0 MiB/s | 1.6 MiB | 00m00s [551/580] perl-Pod-Usage-4:2.03-503.fc4 100% | 9.7 MiB/s | 39.7 KiB | 00m00s [552/580] perl-Pod-Perldoc-0:3.28.01-50 100% | 16.7 MiB/s | 85.6 KiB | 00m00s [553/580] perl-podlators-1:5.01-502.fc4 100% | 61.3 MiB/s | 125.5 KiB | 00m00s [554/580] groff-base-0:1.23.0-6.fc40.x8 100% | 183.0 MiB/s | 1.1 MiB | 00m00s [555/580] perl-File-Temp-1:0.231.100-50 100% | 28.8 MiB/s | 59.0 KiB | 00m00s [556/580] perl-HTTP-Tiny-0:0.088-5.fc40 100% | 27.1 MiB/s | 55.6 KiB | 00m00s [557/580] perl-Pod-Simple-1:3.45-6.fc40 100% | 106.7 MiB/s | 218.5 KiB | 00m00s [558/580] perl-Term-ANSIColor-0:5.01-50 100% | 23.2 MiB/s | 47.6 KiB | 00m00s [559/580] perl-Term-Cap-0:1.18-503.fc40 100% | 10.7 MiB/s | 21.9 KiB | 00m00s [560/580] perl-File-Path-0:2.18-503.fc4 100% | 8.6 MiB/s | 35.0 KiB | 00m00s [561/580] perl-IO-Socket-SSL-0:2.085-1. 100% | 74.4 MiB/s | 228.6 KiB | 00m00s [562/580] perl-Mozilla-CA-0:20240313-1. 100% | 6.9 MiB/s | 14.0 KiB | 00m00s [563/580] perl-Net-SSLeay-0:1.94-3.fc40 100% | 125.3 MiB/s | 385.0 KiB | 00m00s [564/580] perl-Time-Local-2:1.350-5.fc4 100% | 16.8 MiB/s | 34.3 KiB | 00m00s [565/580] perl-Pod-Escapes-1:1.07-503.f 100% | 19.2 MiB/s | 19.6 KiB | 00m00s [566/580] perl-Text-Tabs+Wrap-0:2024.00 100% | 10.6 MiB/s | 21.6 KiB | 00m00s [567/580] perl-if-0:0.61.000-506.fc40.n 100% | 7.0 MiB/s | 14.4 KiB | 00m00s [568/580] ncurses-0:6.4-12.20240127.fc4 100% | 25.7 MiB/s | 421.2 KiB | 00m00s [569/580] perl-IO-Socket-IP-0:0.42-2.fc 100% | 13.6 MiB/s | 41.7 KiB | 00m00s [570/580] perl-URI-0:5.28-1.fc41.noarch 100% | 43.2 MiB/s | 132.7 KiB | 00m00s [571/580] perl-AutoLoader-0:5.74-506.fc 100% | 10.6 MiB/s | 21.7 KiB | 00m00s [572/580] perl-Data-Dumper-0:2.188-503. 100% | 27.4 MiB/s | 56.0 KiB | 00m00s [573/580] perl-libnet-0:3.15-503.fc40.n 100% | 62.7 MiB/s | 128.5 KiB | 00m00s [574/580] perl-B-0:1.88-506.fc40.x86_64 100% | 43.1 MiB/s | 176.3 KiB | 00m00s [575/580] perl-Digest-MD5-0:2.59-3.fc40 100% | 11.6 MiB/s | 35.8 KiB | 00m00s [576/580] perl-FileHandle-0:2.05-506.fc 100% | 15.6 MiB/s | 15.9 KiB | 00m00s [577/580] perl-Digest-0:1.20-502.fc40.n 100% | 12.0 MiB/s | 24.6 KiB | 00m00s [578/580] hdf-libs-0:4.2.16.2-1.fc40.x8 100% | 56.3 MiB/s | 288.1 KiB | 00m00s [579/580] libcusolver-12-3-0:11.5.4.101 100% | 137.5 MiB/s | 76.6 MiB | 00m01s [580/580] libcudnn8-0:8.9.7.29-2.cuda12 100% | 126.9 MiB/s | 446.6 MiB | 00m04s -------------------------------------------------------------------------------- [580/580] Total 100% | 243.1 MiB/s | 2.4 GiB | 00m10s Running transaction [ 1/582] Verify package files 100% | 70.0 B/s | 580.0 B | 00m08s >>> Running pre-transaction scriptlet: crypto-policies-scripts-0:20240320-1.git5 >>> Stop pre-transaction scriptlet: crypto-policies-scripts-0:20240320-1.git58e3 [ 2/582] Prepare transaction 100% | 2.9 KiB/s | 580.0 B | 00m00s [ 3/582] Installing cmake-filesystem-0 100% | 7.0 MiB/s | 7.1 KiB | 00m00s [ 4/582] Installing libpng-2:1.6.40-3. 100% | 237.4 MiB/s | 243.1 KiB | 00m00s [ 5/582] Installing libgfortran-0:14.0 100% | 420.4 MiB/s | 2.9 MiB | 00m00s [ 6/582] Installing libjpeg-turbo-0:3. 
100% | 380.2 MiB/s | 778.6 KiB | 00m00s [ 7/582] Installing expat-0:2.6.2-1.fc 100% | 276.3 MiB/s | 282.9 KiB | 00m00s [ 8/582] Installing openblas-0:0.3.26- 100% | 0.0 B/s | 97.8 KiB | 00m00s [ 9/582] Installing cuda-toolkit-confi 100% | 0.0 B/s | 308.0 B | 00m00s [ 10/582] Installing cuda-toolkit-12-co 100% | 0.0 B/s | 316.0 B | 00m00s [ 11/582] Installing cuda-toolkit-12-3- 100% | 0.0 B/s | 124.0 B | 00m00s [ 12/582] Installing libwebp-0:1.3.2-5. 100% | 389.5 MiB/s | 797.7 KiB | 00m00s [ 13/582] Installing snappy-0:1.1.10-4. 100% | 0.0 B/s | 68.7 KiB | 00m00s [ 14/582] Installing nspr-0:4.35.0-22.f 100% | 307.1 MiB/s | 314.5 KiB | 00m00s [ 15/582] Installing libX11-xcb-0:1.8.9 100% | 0.0 B/s | 15.9 KiB | 00m00s [ 16/582] Installing openjpeg2-0:2.5.2- 100% | 216.6 MiB/s | 443.6 KiB | 00m00s [ 17/582] Installing fonts-filesystem-1 100% | 0.0 B/s | 788.0 B | 00m00s [ 18/582] Installing urw-base35-fonts-c 100% | 0.0 B/s | 38.4 KiB | 00m00s [ 19/582] Installing libtalloc-0:2.4.2- 100% | 0.0 B/s | 53.5 KiB | 00m00s [ 20/582] Installing libgpg-error-0:1.4 100% | 286.6 MiB/s | 880.3 KiB | 00m00s [ 21/582] Installing libwayland-client- 100% | 0.0 B/s | 59.3 KiB | 00m00s [ 22/582] Installing libogg-2:1.3.5-8.f 100% | 0.0 B/s | 51.0 KiB | 00m00s [ 23/582] Installing libglvnd-1:1.7.0-4 100% | 519.2 MiB/s | 531.7 KiB | 00m00s [ 24/582] Installing libglvnd-opengl-1: 100% | 0.0 B/s | 149.7 KiB | 00m00s [ 25/582] Installing nss-util-0:3.99.0- 100% | 221.8 MiB/s | 227.1 KiB | 00m00s [ 26/582] Installing cuda-cudart-12-3-0 100% | 81.3 MiB/s | 748.9 KiB | 00m00s >>> Running post-install scriptlet: cuda-cudart-12-3-0:12.3.101-1.x86_64 >>> Stop post-install scriptlet: cuda-cudart-12-3-0:12.3.101-1.x86_64 [ 27/582] Installing libcublas-12-3-0:1 100% | 501.1 MiB/s | 596.8 MiB | 00m01s >>> Running post-install scriptlet: libcublas-12-3-0:12.3.4.1-2.x86_64 >>> Stop post-install scriptlet: libcublas-12-3-0:12.3.4.1-2.x86_64 [ 28/582] Installing libuv-1:1.48.0-1.f 100% | 264.5 MiB/s | 541.6 KiB | 00m00s [ 29/582] Installing gflags-0:2.2.2-14. 100% | 289.0 MiB/s | 296.0 KiB | 00m00s [ 30/582] Installing libmpc-0:1.3.1-5.f 100% | 162.3 MiB/s | 166.2 KiB | 00m00s [ 31/582] Installing cpuinfo-1:0-202403 100% | 125.8 MiB/s | 128.8 KiB | 00m00s [ 32/582] Installing protobuf-compat-0: 100% | 450.6 MiB/s | 3.6 MiB | 00m00s [ 33/582] Installing libcublas-devel-12 100% | 565.5 MiB/s | 1.1 MiB | 00m00s [ 34/582] Installing libtheora-1:1.1.1- 100% | 464.9 MiB/s | 476.0 KiB | 00m00s [ 35/582] Installing libvorbis-1:1.3.7- 100% | 406.3 MiB/s | 832.2 KiB | 00m00s [ 36/582] Installing libgcrypt-0:1.10.3 100% | 324.0 MiB/s | 1.3 MiB | 00m00s [ 37/582] Installing libassuan-0:2.5.7- 100% | 161.7 MiB/s | 165.6 KiB | 00m00s [ 38/582] Installing libtevent-0:0.16.1 100% | 92.7 MiB/s | 94.9 KiB | 00m00s [ 39/582] Installing openblas-openmp-0: 100% | 682.5 MiB/s | 38.9 MiB | 00m00s [ 40/582] Installing libICE-0:1.1.1-3.f 100% | 178.3 MiB/s | 182.6 KiB | 00m00s [ 41/582] Installing libcudnn8-0:8.9.7. 100% | 318.0 MiB/s | 1.0 GiB | 00m03s [ 42/582] Installing python-rpm-macros- 100% | 0.0 B/s | 22.8 KiB | 00m00s [ 43/582] Installing geos-0:3.12.1-3.fc 100% | 76.7 MiB/s | 3.5 MiB | 00m00s [ 44/582] Installing libtdb-0:1.4.10-1. 
100% | 95.6 MiB/s | 97.9 KiB | 00m00s [ 45/582] Installing libaec-0:1.1.2-1.f 100% | 94.2 MiB/s | 96.5 KiB | 00m00s [ 46/582] Installing hdf5-0:1.12.1-15.f 100% | 418.6 MiB/s | 8.4 MiB | 00m00s [ 47/582] Installing lcms2-0:2.16-3.fc4 100% | 206.3 MiB/s | 422.5 KiB | 00m00s [ 48/582] Installing libunwind-0:1.8.0- 100% | 173.4 MiB/s | 177.6 KiB | 00m00s [ 49/582] Installing libquadmath-0:14.0 100% | 323.5 MiB/s | 331.2 KiB | 00m00s [ 50/582] Installing pthreadpool-1:0.1- 100% | 102.2 MiB/s | 104.7 KiB | 00m00s [ 51/582] Installing lmdb-libs-0:0.9.32 100% | 108.2 MiB/s | 110.8 KiB | 00m00s [ 52/582] Installing cuda-nvrtc-12-3-0: 100% | 179.9 MiB/s | 64.4 MiB | 00m00s >>> Running post-install scriptlet: cuda-nvrtc-12-3-0:12.3.107-1.x86_64 >>> Stop post-install scriptlet: cuda-nvrtc-12-3-0:12.3.107-1.x86_64 [ 53/582] Installing fftw-libs-quad-0:3 100% | 361.3 MiB/s | 2.5 MiB | 00m00s [ 54/582] Installing python3-rpm-macros 100% | 0.0 B/s | 6.7 KiB | 00m00s [ 55/582] Installing libSM-0:1.2.4-3.fc 100% | 96.3 MiB/s | 98.6 KiB | 00m00s [ 56/582] Installing onnx-libs-0:1.17.0 100% | 448.6 MiB/s | 3.1 MiB | 00m00s [ 57/582] Installing libcurand-12-3-0:1 100% | 339.0 MiB/s | 91.9 MiB | 00m00s >>> Running post-install scriptlet: libcurand-12-3-0:10.3.4.107-1.x86_64 >>> Stop post-install scriptlet: libcurand-12-3-0:10.3.4.107-1.x86_64 [ 58/582] Installing libcufft-12-3-0:11 100% | 450.1 MiB/s | 170.6 MiB | 00m00s >>> Running post-install scriptlet: libcufft-12-3-0:11.0.12.1-2.x86_64 >>> Stop post-install scriptlet: libcufft-12-3-0:11.0.12.1-2.x86_64 [ 59/582] Installing libcusparse-12-3-0 100% | 264.7 MiB/s | 254.9 MiB | 00m01s >>> Running post-install scriptlet: libcusparse-12-3-0:12.2.0.103-2.x86_64 >>> Stop post-install scriptlet: libcusparse-12-3-0:12.2.0.103-2.x86_64 [ 60/582] Installing openblas-openmp64- 100% | 674.0 MiB/s | 39.1 MiB | 00m00s [ 61/582] Installing flexiblas-netlib-0 100% | 472.8 MiB/s | 10.4 MiB | 00m00s [ 62/582] Installing flexiblas-netlib64 100% | 477.7 MiB/s | 10.5 MiB | 00m00s [ 63/582] Installing flexiblas-openblas 100% | 0.0 B/s | 40.2 KiB | 00m00s [ 64/582] Installing flexiblas-0:3.4.2- 100% | 0.0 B/s | 48.1 KiB | 00m00s [ 65/582] Installing flexiblas-openblas 100% | 39.2 MiB/s | 40.2 KiB | 00m00s [ 66/582] Installing suitesparse-0:7.7. 100% | 635.5 MiB/s | 137.3 MiB | 00m00s [ 67/582] Installing hdf-libs-0:4.2.16. 100% | 222.9 MiB/s | 684.6 KiB | 00m00s [ 68/582] Installing adobe-mappings-cma 100% | 184.8 MiB/s | 14.4 MiB | 00m00s [ 69/582] Installing xorg-x11-proto-dev 100% | 297.1 MiB/s | 1.8 MiB | 00m00s [ 70/582] Installing libevdev-0:1.13.1- 100% | 85.2 MiB/s | 87.2 KiB | 00m00s [ 71/582] Installing pcre2-utf16-0:10.4 100% | 288.5 MiB/s | 590.9 KiB | 00m00s [ 72/582] Installing minizip-ng-compat- 100% | 155.6 MiB/s | 159.4 KiB | 00m00s [ 73/582] Installing freexl-0:2.0.0-7.f 100% | 88.4 MiB/s | 90.6 KiB | 00m00s [ 74/582] Installing dbus-libs-1:1.14.1 100% | 361.4 MiB/s | 370.1 KiB | 00m00s [ 75/582] Installing avahi-libs-0:0.8-2 100% | 164.9 MiB/s | 168.9 KiB | 00m00s [ 76/582] Installing json-c-0:0.17-3.fc 100% | 81.7 MiB/s | 83.6 KiB | 00m00s [ 77/582] Installing libicu-0:74.2-1.fc 100% | 199.7 MiB/s | 34.9 MiB | 00m00s [ 78/582] Installing mesa-libglapi-0:24 100% | 165.3 MiB/s | 169.3 KiB | 00m00s [ 79/582] Installing libxshmfence-0:1.3 100% | 0.0 B/s | 16.2 KiB | 00m00s [ 80/582] Installing jsoncpp-0:1.9.5-7. 
100% | 249.0 MiB/s | 254.9 KiB | 00m00s [ 81/582] Installing double-conversion- 100% | 96.1 MiB/s | 98.4 KiB | 00m00s [ 82/582] Installing giflib-0:5.2.2-1.f 100% | 111.2 MiB/s | 113.9 KiB | 00m00s [ 83/582] Installing libwayland-server- 100% | 77.8 MiB/s | 79.7 KiB | 00m00s [ 84/582] Installing libXau-0:1.0.11-6. 100% | 66.8 MiB/s | 68.4 KiB | 00m00s [ 85/582] Installing libxcb-0:1.16.1-1. 100% | 139.5 MiB/s | 1.1 MiB | 00m00s >>> Running pre-install scriptlet: xml-common-0:0.6.3-63.fc40.noarch >>> Stop pre-install scriptlet: xml-common-0:0.6.3-63.fc40.noarch [ 86/582] Installing xml-common-0:0.6.3 100% | 79.2 MiB/s | 81.1 KiB | 00m00s [ 87/582] Installing nettle-0:3.9.1-6.f 100% | 258.2 MiB/s | 793.3 KiB | 00m00s [ 88/582] Installing gnutls-0:3.8.5-1.f 100% | 355.5 MiB/s | 3.2 MiB | 00m00s [ 89/582] Installing glib2-0:2.80.0-1.f 100% | 121.6 MiB/s | 14.5 MiB | 00m00s [ 90/582] Installing libgudev-0:238-5.f 100% | 87.3 MiB/s | 89.4 KiB | 00m00s [ 91/582] Installing shared-mime-info-0 100% | 232.4 MiB/s | 2.6 MiB | 00m00s >>> Running post-install scriptlet: shared-mime-info-0:2.3-4.fc41.x86_64 >>> Stop post-install scriptlet: shared-mime-info-0:2.3-4.fc41.x86_64 [ 92/582] Installing gdk-pixbuf2-0:2.42 100% | 276.7 MiB/s | 2.5 MiB | 00m00s [ 93/582] Installing cups-libs-1:2.4.7- 100% | 302.9 MiB/s | 620.3 KiB | 00m00s [ 94/582] Installing scotch-0:7.0.4-3.f 100% | 345.9 MiB/s | 708.4 KiB | 00m00s [ 95/582] Installing protobuf-0:3.19.6- 100% | 406.7 MiB/s | 3.3 MiB | 00m00s [ 96/582] Installing imath-0:3.1.11-1.f 100% | 180.7 MiB/s | 370.1 KiB | 00m00s [ 97/582] Installing openexr-libs-0:3.1 100% | 495.1 MiB/s | 6.4 MiB | 00m00s [ 98/582] Installing liblerc-0:4.0.0-6. 100% | 295.4 MiB/s | 605.0 KiB | 00m00s [ 99/582] Installing svt-av1-libs-0:1.4 100% | 224.1 MiB/s | 7.2 MiB | 00m00s [100/582] Installing rav1e-libs-0:0.7.1 100% | 432.8 MiB/s | 3.0 MiB | 00m00s [101/582] Installing libdav1d-0:1.4.0-1 100% | 415.5 MiB/s | 1.7 MiB | 00m00s [102/582] Installing opus-0:1.5.1-1.fc4 100% | 407.1 MiB/s | 416.9 KiB | 00m00s [103/582] Installing asl-0:20240106-1.2 100% | 439.7 MiB/s | 2.2 MiB | 00m00s [104/582] Installing libedit-0:3.1-50.2 100% | 239.8 MiB/s | 245.5 KiB | 00m00s [105/582] Installing openpgm-0:5.2.122- 100% | 294.5 MiB/s | 301.5 KiB | 00m00s [106/582] Installing libsodium-0:1.0.19 100% | 188.5 MiB/s | 386.1 KiB | 00m00s [107/582] Installing zeromq-0:4.3.5-16. 100% | 12.5 MiB/s | 898.0 KiB | 00m00s [108/582] Installing libnl3-0:3.9.0-3.f 100% | 334.7 MiB/s | 1.0 MiB | 00m00s [109/582] Installing libibverbs-0:51.0- 100% | 302.4 MiB/s | 1.2 MiB | 00m00s [110/582] Installing fftw-libs-single-0 100% | 75.3 MiB/s | 3.6 MiB | 00m00s [111/582] Installing fftw-libs-long-0:3 100% | 387.2 MiB/s | 1.5 MiB | 00m00s [112/582] Installing fftw-libs-double-0 100% | 428.5 MiB/s | 3.4 MiB | 00m00s [113/582] Installing tbb-0:2021.11.0-5. 100% | 216.7 MiB/s | 443.7 KiB | 00m00s [114/582] Installing libibumad-0:51.0-2 100% | 0.0 B/s | 44.8 KiB | 00m00s [115/582] Installing ocl-icd-0:2.3.2-5. 100% | 188.2 MiB/s | 192.7 KiB | 00m00s [116/582] Installing libnccl-0:2.21.5-1 100% | 33.0 MiB/s | 230.3 MiB | 00m07s >>> Running post-install scriptlet: libnccl-0:2.21.5-1+cuda12.4.x86_64 >>> Stop post-install scriptlet: libnccl-0:2.21.5-1+cuda12.4.x86_64 [117/582] Installing hiredis-0:1.0.2-7. 
100% | 81.5 MiB/s | 83.4 KiB | 00m00s [118/582] Installing asmjit-1:0-2022070 100% | 212.2 MiB/s | 434.6 KiB | 00m00s [119/582] Installing flatbuffers-0:24.3 100% | 258.6 MiB/s | 529.6 KiB | 00m00s [120/582] Installing fbgemm-0:0.7.0-202 100% | 632.0 MiB/s | 11.4 MiB | 00m00s [121/582] Installing gloo-1:0.5.0-20240 100% | 542.3 MiB/s | 3.8 MiB | 00m00s [122/582] Installing fftw-0:3.3.10-12.f 100% | 181.2 MiB/s | 185.5 KiB | 00m00s [123/582] Installing fftw-libs-0:3.3.10 100% | 0.0 B/s | 124.0 B | 00m00s [124/582] Installing librdmacm-0:51.0-2 100% | 144.7 MiB/s | 148.2 KiB | 00m00s [125/582] Installing libsodium-devel-0: 100% | 383.1 MiB/s | 3.8 MiB | 00m00s [126/582] Installing openpgm-devel-0:5. 100% | 169.8 MiB/s | 347.7 KiB | 00m00s [127/582] Installing llvm17-libs-0:17.0 100% | 344.9 MiB/s | 114.2 MiB | 00m00s >>> Running post-install scriptlet: llvm17-libs-0:17.0.6-7.fc41.x86_64 >>> Stop post-install scriptlet: llvm17-libs-0:17.0.6-7.fc41.x86_64 [128/582] Installing halide-0:17.0.1-20 100% | 260.2 MiB/s | 132.7 MiB | 00m01s [129/582] Installing liborc2-0:2.0.0-2. 100% | 404.8 MiB/s | 1.6 MiB | 00m00s [130/582] Installing scotch-devel-0:7.0 100% | 97.5 MiB/s | 99.9 KiB | 00m00s [131/582] Installing graphene-0:1.10.6- 100% | 160.4 MiB/s | 164.3 KiB | 00m00s [132/582] Installing srt-libs-0:1.5.3-2 100% | 231.9 MiB/s | 950.0 KiB | 00m00s [133/582] Installing iso-codes-0:4.16.0 100% | 186.5 MiB/s | 19.0 MiB | 00m00s [134/582] Installing xcb-util-keysyms-0 100% | 0.0 B/s | 17.9 KiB | 00m00s [135/582] Installing xcb-util-renderuti 100% | 0.0 B/s | 29.9 KiB | 00m00s [136/582] Installing xcb-util-wm-0:0.4. 100% | 85.3 MiB/s | 87.4 KiB | 00m00s [137/582] Installing xcb-util-0:0.4.1-5 100% | 0.0 B/s | 31.8 KiB | 00m00s [138/582] Installing xcb-util-image-0:0 100% | 23.1 MiB/s | 23.6 KiB | 00m00s [139/582] Installing libXau-devel-0:1.0 100% | 1.6 MiB/s | 8.2 KiB | 00m00s [140/582] Installing libxcb-devel-0:1.1 100% | 23.3 MiB/s | 3.1 MiB | 00m00s >>> Running pre-install scriptlet: tpm2-tss-0:4.0.1-7.fc40.x86_64 >>> Stop pre-install scriptlet: tpm2-tss-0:4.0.1-7.fc40.x86_64 [141/582] Installing tpm2-tss-0:4.0.1-7 100% | 296.0 MiB/s | 1.5 MiB | 00m00s [142/582] Installing adobe-mappings-cma 100% | 285.7 MiB/s | 585.2 KiB | 00m00s [143/582] Installing glpk-0:5.0-11.fc40 100% | 283.9 MiB/s | 872.1 KiB | 00m00s [144/582] Installing coin-or-CoinUtils- 100% | 392.3 MiB/s | 1.2 MiB | 00m00s [145/582] Installing coin-or-Osi-0:0.10 100% | 382.4 MiB/s | 5.7 MiB | 00m00s [146/582] Installing arpack-0:3.9.1-3.f 100% | 316.5 MiB/s | 648.1 KiB | 00m00s [147/582] Installing libcusparse-devel- 100% | 908.6 MiB/s | 255.3 MiB | 00m00s [148/582] Installing magma-0:2.8.0-2024 100% | 351.4 MiB/s | 234.7 MiB | 00m01s [149/582] Installing libcufft-devel-12- 100% | 130.0 MiB/s | 133.1 KiB | 00m00s [150/582] Installing pyproject-rpm-macr 100% | 98.4 MiB/s | 100.8 KiB | 00m00s [151/582] Installing lmdb-0:0.9.32-1.fc 100% | 75.4 MiB/s | 77.2 KiB | 00m00s [152/582] Installing libldb-0:2.9.0-1.f 100% | 181.3 MiB/s | 557.0 KiB | 00m00s [153/582] Installing nnpack-0:0-2023020 100% | 144.7 MiB/s | 148.2 KiB | 00m00s [154/582] Installing qnnpack-0:0-201908 100% | 1.6 MiB/s | 99.2 KiB | 00m00s [155/582] Installing libunwind-devel-0: 100% | 144.1 MiB/s | 147.6 KiB | 00m00s [156/582] Installing cgnslib-libs-0:4.4 100% | 392.2 MiB/s | 803.2 KiB | 00m00s [157/582] Installing librttopo-0:1.1.0- 100% | 11.0 MiB/s | 506.6 KiB | 00m00s [158/582] Installing protobuf-compat-co 100% | 69.7 MiB/s | 3.1 MiB | 00m00s [159/582] Installing 
cpp-0:14.0.1-0.13. 100% | 244.3 MiB/s | 34.9 MiB | 00m00s [160/582] Installing cuda-gcc-12-0:12.3 100% | 207.3 MiB/s | 114.7 MiB | 00m01s [161/582] Installing gflags-devel-0:2.2 100% | 0.0 B/s | 64.6 KiB | 00m00s [162/582] Installing glog-0:0.3.5-20.fc 100% | 146.1 MiB/s | 149.6 KiB | 00m00s [163/582] Installing ceres-solver-0:2.2 100% | 575.4 MiB/s | 5.2 MiB | 00m00s [164/582] Installing libuv-static-1:1.4 100% | 394.3 MiB/s | 403.8 KiB | 00m00s [165/582] Installing libuv-devel-1:1.48 100% | 204.1 MiB/s | 209.0 KiB | 00m00s [166/582] Installing tensorpipe-0:0-202 100% | 492.7 MiB/s | 3.0 MiB | 00m00s [167/582] Installing nss-softokn-freebl 100% | 292.5 MiB/s | 898.7 KiB | 00m00s [168/582] Installing nss-softokn-0:3.99 100% | 465.1 MiB/s | 1.9 MiB | 00m00s [169/582] Installing mesa-libGLU-0:9.0. 100% | 346.3 MiB/s | 354.6 KiB | 00m00s [170/582] Installing libwayland-cursor- 100% | 0.0 B/s | 38.1 KiB | 00m00s [171/582] Installing libksba-0:1.6.6-1. 100% | 386.1 MiB/s | 395.4 KiB | 00m00s [172/582] Installing urw-base35-bookman 100% | 28.4 MiB/s | 1.4 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-bookman-fonts-0:20200910-19.fc40. >>> Stop post-install scriptlet: urw-base35-bookman-fonts-0:20200910-19.fc40.noa [173/582] Installing urw-base35-c059-fo 100% | 199.3 MiB/s | 1.4 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-c059-fonts-0:20200910-19.fc40.noa >>> Stop post-install scriptlet: urw-base35-c059-fonts-0:20200910-19.fc40.noarch [174/582] Installing urw-base35-d050000 100% | 20.8 MiB/s | 85.4 KiB | 00m00s >>> Running post-install scriptlet: urw-base35-d050000l-fonts-0:20200910-19.fc40 >>> Stop post-install scriptlet: urw-base35-d050000l-fonts-0:20200910-19.fc40.no [175/582] Installing urw-base35-gothic- 100% | 193.8 MiB/s | 1.2 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-gothic-fonts-0:20200910-19.fc40.n >>> Stop post-install scriptlet: urw-base35-gothic-fonts-0:20200910-19.fc40.noar [176/582] Installing urw-base35-nimbus- 100% | 175.3 MiB/s | 1.1 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-nimbus-mono-ps-fonts-0:20200910-1 >>> Stop post-install scriptlet: urw-base35-nimbus-mono-ps-fonts-0:20200910-19.f [177/582] Installing urw-base35-nimbus- 100% | 195.1 MiB/s | 1.4 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-nimbus-roman-fonts-0:20200910-19. 
>>> Stop post-install scriptlet: urw-base35-nimbus-roman-fonts-0:20200910-19.fc4 [178/582] Installing urw-base35-nimbus- 100% | 266.0 MiB/s | 2.4 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-nimbus-sans-fonts-0:20200910-19.f >>> Stop post-install scriptlet: urw-base35-nimbus-sans-fonts-0:20200910-19.fc40 [179/582] Installing urw-base35-p052-fo 100% | 212.5 MiB/s | 1.5 MiB | 00m00s >>> Running post-install scriptlet: urw-base35-p052-fonts-0:20200910-19.fc40.noa >>> Stop post-install scriptlet: urw-base35-p052-fonts-0:20200910-19.fc40.noarch [180/582] Installing urw-base35-standar 100% | 11.0 MiB/s | 45.1 KiB | 00m00s >>> Running post-install scriptlet: urw-base35-standard-symbols-ps-fonts-0:20200 >>> Stop post-install scriptlet: urw-base35-standard-symbols-ps-fonts-0:20200910 [181/582] Installing urw-base35-z003-fo 100% | 76.5 MiB/s | 391.8 KiB | 00m00s >>> Running post-install scriptlet: urw-base35-z003-fonts-0:20200910-19.fc40.noa >>> Stop post-install scriptlet: urw-base35-z003-fonts-0:20200910-19.fc40.noarch [182/582] Installing urw-base35-fonts-0 100% | 0.0 B/s | 5.6 KiB | 00m00s [183/582] Installing abattis-cantarell- 100% | 189.9 MiB/s | 194.4 KiB | 00m00s [184/582] Installing leveldb-0:1.23-9.f 100% | 341.3 MiB/s | 349.4 KiB | 00m00s [185/582] Installing blosc-0:1.21.5-4.f 100% | 122.0 MiB/s | 124.9 KiB | 00m00s [186/582] Installing netcdf-0:4.9.2-5.f 100% | 342.8 MiB/s | 2.4 MiB | 00m00s [187/582] Installing libnvjitlink-12-3- 100% | 259.5 MiB/s | 49.8 MiB | 00m00s >>> Running post-install scriptlet: libnvjitlink-12-3-0:12.3.101-1.x86_64 >>> Stop post-install scriptlet: libnvjitlink-12-3-0:12.3.101-1.x86_64 [188/582] Installing libnpp-12-3-0:12.2 100% | 252.6 MiB/s | 241.2 MiB | 00m01s >>> Running post-install scriptlet: libnpp-12-3-0:12.2.3.2-2.x86_64 >>> Stop post-install scriptlet: libnpp-12-3-0:12.2.3.2-2.x86_64 [189/582] Installing libcusolver-12-3-0 100% | 119.4 MiB/s | 189.5 MiB | 00m02s >>> Running post-install scriptlet: libcusolver-12-3-0:11.5.4.101-2.x86_64 >>> Stop post-install scriptlet: libcusolver-12-3-0:11.5.4.101-2.x86_64 [190/582] Installing openblas-openmp64_ 100% | 129.4 MiB/s | 39.1 MiB | 00m00s [191/582] Installing openblas-serial-0: 100% | 118.6 MiB/s | 37.5 MiB | 00m00s [192/582] Installing openblas-serial64- 100% | 118.9 MiB/s | 37.7 MiB | 00m00s [193/582] Installing openblas-serial64_ 100% | 139.6 MiB/s | 37.7 MiB | 00m00s [194/582] Installing openblas-threads-0 100% | 116.8 MiB/s | 38.9 MiB | 00m00s [195/582] Installing openblas-threads64 100% | 121.7 MiB/s | 39.1 MiB | 00m00s [196/582] Installing openblas-threads64 100% | 97.2 MiB/s | 39.1 MiB | 00m00s [197/582] Installing ogdi-0:4.1.1-1.fc4 100% | 261.4 MiB/s | 803.0 KiB | 00m00s [198/582] Installing zvbi-0:0.2.35-22.f 100% | 123.2 MiB/s | 1.1 MiB | 00m00s >>> Running post-install scriptlet: zvbi-0:0.2.35-22.fc40.x86_64 >>> Stop post-install scriptlet: zvbi-0:0.2.35-22.fc40.x86_64 [199/582] Installing libharu-0:2.4.3-5. 100% | 280.4 MiB/s | 1.7 MiB | 00m00s [200/582] Installing ncurses-0:6.4-12.2 100% | 76.6 MiB/s | 627.6 KiB | 00m00s >>> Running pre-install scriptlet: groff-base-0:1.23.0-6.fc40.x86_64 >>> Stop pre-install scriptlet: groff-base-0:1.23.0-6.fc40.x86_64 [201/582] Installing groff-base-0:1.23. 
100% | 214.3 MiB/s | 3.9 MiB | 00m00s >>> Running post-install scriptlet: groff-base-0:1.23.0-6.fc40.x86_64 >>> Stop post-install scriptlet: groff-base-0:1.23.0-6.fc40.x86_64 [202/582] Installing perl-Digest-0:1.20 100% | 36.1 MiB/s | 37.0 KiB | 00m00s [203/582] Installing perl-B-0:1.88-506. 100% | 242.1 MiB/s | 495.7 KiB | 00m00s [204/582] Installing perl-FileHandle-0: 100% | 0.0 B/s | 9.8 KiB | 00m00s [205/582] Installing perl-Digest-MD5-0: 100% | 60.2 MiB/s | 61.6 KiB | 00m00s [206/582] Installing perl-Data-Dumper-0 100% | 110.9 MiB/s | 113.6 KiB | 00m00s [207/582] Installing perl-libnet-0:3.15 100% | 287.4 MiB/s | 294.3 KiB | 00m00s [208/582] Installing perl-AutoLoader-0: 100% | 0.0 B/s | 20.9 KiB | 00m00s [209/582] Installing perl-URI-0:5.28-1. 100% | 122.9 MiB/s | 251.8 KiB | 00m00s [210/582] Installing perl-locale-0:1.10 100% | 174.9 KiB/s | 6.6 KiB | 00m00s [211/582] Installing perl-File-Path-0:2 100% | 0.0 B/s | 64.5 KiB | 00m00s [212/582] Installing perl-Mozilla-CA-0: 100% | 0.0 B/s | 10.5 KiB | 00m00s [213/582] Installing perl-Time-Local-2: 100% | 68.9 MiB/s | 70.5 KiB | 00m00s [214/582] Installing perl-Pod-Escapes-1 100% | 0.0 B/s | 25.9 KiB | 00m00s [215/582] Installing perl-Text-Tabs+Wra 100% | 0.0 B/s | 23.8 KiB | 00m00s [216/582] Installing perl-if-0:0.61.000 100% | 0.0 B/s | 6.2 KiB | 00m00s [217/582] Installing perl-IO-Socket-IP- 100% | 98.1 MiB/s | 100.4 KiB | 00m00s [218/582] Installing perl-Net-SSLeay-0: 100% | 272.5 MiB/s | 1.4 MiB | 00m00s [219/582] Installing perl-IO-Socket-SSL 100% | 336.4 MiB/s | 689.0 KiB | 00m00s [220/582] Installing perl-POSIX-0:2.13- 100% | 3.0 MiB/s | 230.3 KiB | 00m00s [221/582] Installing perl-IPC-Open3-0:1 100% | 0.0 B/s | 23.3 KiB | 00m00s [222/582] Installing perl-Class-Struct- 100% | 0.0 B/s | 25.9 KiB | 00m00s [223/582] Installing perl-Term-ANSIColo 100% | 96.8 MiB/s | 99.1 KiB | 00m00s [224/582] Installing perl-Term-Cap-0:1. 100% | 0.0 B/s | 30.5 KiB | 00m00s [225/582] Installing perl-File-Temp-1:0 100% | 160.2 MiB/s | 164.0 KiB | 00m00s [226/582] Installing perl-HTTP-Tiny-0:0 100% | 150.6 MiB/s | 154.2 KiB | 00m00s [227/582] Installing perl-Pod-Simple-1: 100% | 278.0 MiB/s | 569.4 KiB | 00m00s [228/582] Installing perl-Symbol-0:1.09 100% | 0.0 B/s | 7.2 KiB | 00m00s [229/582] Installing perl-SelectSaver-0 100% | 0.0 B/s | 2.6 KiB | 00m00s [230/582] Installing perl-Socket-4:2.03 100% | 122.7 MiB/s | 125.6 KiB | 00m00s [231/582] Installing perl-File-stat-0:1 100% | 0.0 B/s | 13.2 KiB | 00m00s [232/582] Installing perl-Pod-Perldoc-0 100% | 164.7 MiB/s | 168.6 KiB | 00m00s [233/582] Installing perl-podlators-1:5 100% | 304.7 MiB/s | 312.1 KiB | 00m00s [234/582] Installing perl-Text-ParseWor 100% | 0.0 B/s | 14.5 KiB | 00m00s [235/582] Installing perl-base-0:2.27-5 100% | 0.0 B/s | 12.9 KiB | 00m00s [236/582] Installing perl-Fcntl-0:1.15- 100% | 0.0 B/s | 25.8 KiB | 00m00s [237/582] Installing perl-mro-0:1.28-50 100% | 0.0 B/s | 42.7 KiB | 00m00s [238/582] Installing perl-overloading-0 100% | 0.0 B/s | 5.5 KiB | 00m00s [239/582] Installing perl-IO-0:1.52-506 100% | 151.7 MiB/s | 155.3 KiB | 00m00s [240/582] Installing perl-Pod-Usage-4:2 100% | 42.1 MiB/s | 86.3 KiB | 00m00s [241/582] Installing perl-File-Basename 100% | 0.0 B/s | 14.6 KiB | 00m00s [242/582] Installing perl-constant-0:1. 
100% | 0.0 B/s | 27.4 KiB | 00m00s [243/582] Installing perl-Errno-0:1.37- 100% | 0.0 B/s | 8.8 KiB | 00m00s [244/582] Installing perl-Scalar-List-U 100% | 145.2 MiB/s | 148.7 KiB | 00m00s [245/582] Installing perl-vars-0:1.05-5 100% | 0.0 B/s | 4.3 KiB | 00m00s [246/582] Installing perl-Getopt-Std-0: 100% | 0.0 B/s | 11.6 KiB | 00m00s [247/582] Installing perl-overload-0:1. 100% | 0.0 B/s | 71.9 KiB | 00m00s [248/582] Installing perl-MIME-Base64-0 100% | 47.2 MiB/s | 48.3 KiB | 00m00s [249/582] Installing perl-Storable-1:3. 100% | 228.5 MiB/s | 233.9 KiB | 00m00s [250/582] Installing perl-parent-1:0.24 100% | 473.9 KiB/s | 10.4 KiB | 00m00s [251/582] Installing perl-Getopt-Long-1 100% | 3.0 MiB/s | 146.7 KiB | 00m00s [252/582] Installing perl-Carp-0:1.54-5 100% | 0.0 B/s | 47.7 KiB | 00m00s [253/582] Installing perl-Exporter-0:5. 100% | 610.3 KiB/s | 55.5 KiB | 00m00s [254/582] Installing perl-PathTools-0:3 100% | 59.9 MiB/s | 184.2 KiB | 00m00s [255/582] Installing perl-DynaLoader-0: 100% | 0.0 B/s | 32.5 KiB | 00m00s [256/582] Installing perl-Encode-4:3.21 100% | 393.1 MiB/s | 4.7 MiB | 00m00s [257/582] Installing perl-libs-4:5.38.2 100% | 268.7 MiB/s | 9.9 MiB | 00m00s [258/582] Installing perl-interpreter-4 100% | 1.6 MiB/s | 121.4 KiB | 00m00s [259/582] Installing infiniband-diags-0 100% | 22.5 MiB/s | 1.0 MiB | 00m00s [260/582] Installing perl-File-Find-0:1 100% | 396.6 KiB/s | 42.4 KiB | 00m00s [261/582] Installing perl-TermReadKey-0 100% | 32.4 MiB/s | 66.3 KiB | 00m00s [262/582] Installing perl-lib-0:0.65-50 100% | 635.9 KiB/s | 8.9 KiB | 00m00s [263/582] Installing perl-Error-1:0.170 100% | 39.3 MiB/s | 80.4 KiB | 00m00s [264/582] Installing kernel-headers-0:6 100% | 266.6 MiB/s | 6.4 MiB | 00m00s [265/582] Installing pcre2-utf32-0:10.4 100% | 272.8 MiB/s | 558.8 KiB | 00m00s [266/582] Installing pcre2-devel-0:10.4 100% | 221.0 MiB/s | 2.0 MiB | 00m00s [267/582] Installing cuda-cccl-12-3-0:1 100% | 257.3 MiB/s | 14.2 MiB | 00m00s [268/582] Installing pcre-0:8.45-1.fc40 100% | 265.7 MiB/s | 544.1 KiB | 00m00s [269/582] Installing gklib-0:5.1.1-2023 100% | 279.8 MiB/s | 286.5 KiB | 00m00s [270/582] Installing metis-0:5.2.1-2023 100% | 257.9 MiB/s | 528.1 KiB | 00m00s [271/582] Installing SuperLU-0:6.0.1-5. 100% | 464.3 MiB/s | 475.4 KiB | 00m00s [272/582] Installing armadillo-0:12.8.1 100% | 89.5 MiB/s | 91.6 KiB | 00m00s [273/582] Installing libpaper-1:2.1.1-3 100% | 49.3 MiB/s | 50.5 KiB | 00m00s [274/582] Installing libijs-0:0.35-22.f 100% | 0.0 B/s | 62.6 KiB | 00m00s [275/582] Installing jbig2dec-libs-0:0. 100% | 166.6 MiB/s | 170.6 KiB | 00m00s [276/582] Installing adobe-mappings-pdf 100% | 488.5 MiB/s | 4.4 MiB | 00m00s [277/582] Installing netpbm-0:11.02.00- 100% | 280.8 MiB/s | 575.0 KiB | 00m00s [278/582] Installing gts-0:0.7.6-48.201 100% | 214.0 MiB/s | 657.4 KiB | 00m00s [279/582] Installing libimagequant-0:4. 
100% | 225.2 MiB/s | 691.9 KiB | 00m00s [280/582] Installing glibc-headers-x86- 100% | 228.2 MiB/s | 2.3 MiB | 00m00s [281/582] Installing libxcrypt-devel-0: 100% | 31.8 MiB/s | 32.6 KiB | 00m00s [282/582] Installing glibc-devel-0:2.39 100% | 19.8 MiB/s | 40.5 KiB | 00m00s [283/582] Installing libstdc++-devel-0: 100% | 443.9 MiB/s | 15.5 MiB | 00m00s [284/582] Installing libverto-devel-0:0 100% | 0.0 B/s | 26.4 KiB | 00m00s [285/582] Installing libsepol-devel-0:3 100% | 62.4 MiB/s | 127.7 KiB | 00m00s [286/582] Installing libselinux-devel-0 100% | 52.3 MiB/s | 160.6 KiB | 00m00s [287/582] Installing libcom_err-devel-0 100% | 0.0 B/s | 18.3 KiB | 00m00s [288/582] Installing keyutils-libs-deve 100% | 53.9 MiB/s | 55.2 KiB | 00m00s [289/582] Installing libglvnd-core-deve 100% | 0.0 B/s | 41.1 KiB | 00m00s [290/582] Installing vim-filesystem-2:9 100% | 4.6 MiB/s | 4.7 KiB | 00m00s [291/582] Installing emacs-filesystem-1 100% | 0.0 B/s | 544.0 B | 00m00s [292/582] Installing rhash-0:1.4.3-4.fc 100% | 170.8 MiB/s | 349.9 KiB | 00m00s [293/582] Installing dbus-common-1:1.14 100% | 752.8 KiB/s | 13.6 KiB | 00m00s >>> Running post-install scriptlet: dbus-common-1:1.14.10-3.fc40.noarch >>> Stop post-install scriptlet: dbus-common-1:1.14.10-3.fc40.noarch >>> Running pre-install scriptlet: dbus-broker-0:35-4.fc40.x86_64 >>> Stop pre-install scriptlet: dbus-broker-0:35-4.fc40.x86_64 [294/582] Installing dbus-broker-0:35-4 100% | 74.5 MiB/s | 381.2 KiB | 00m00s >>> Running post-install scriptlet: dbus-broker-0:35-4.fc40.x86_64 >>> Stop post-install scriptlet: dbus-broker-0:35-4.fc40.x86_64 [295/582] Installing dbus-1:1.14.10-3.f 100% | 0.0 B/s | 124.0 B | 00m00s [296/582] Installing libseccomp-0:2.5.3 100% | 169.0 MiB/s | 173.1 KiB | 00m00s [297/582] Installing kmod-libs-0:31-5.f 100% | 140.9 MiB/s | 144.3 KiB | 00m00s [298/582] Installing libkadm5-0:1.21.2- 100% | 211.1 MiB/s | 216.1 KiB | 00m00s [299/582] Installing krb5-devel-0:1.21. 100% | 233.0 MiB/s | 715.9 KiB | 00m00s [300/582] Installing isl-0:0.16.1-20.fc 100% | 337.8 MiB/s | 3.0 MiB | 00m00s [301/582] Installing libwacom-data-0:2. 100% | 95.8 MiB/s | 686.8 KiB | 00m00s [302/582] Installing xkeyboard-config-0 100% | 349.0 MiB/s | 6.6 MiB | 00m00s [303/582] Installing libxkbcommon-0:1.7 100% | 326.2 MiB/s | 334.1 KiB | 00m00s [304/582] Installing systemd-pam-0:255. 100% | 206.4 MiB/s | 1.0 MiB | 00m00s [305/582] Installing systemd-0:255.4-1. 
100% | 80.7 MiB/s | 14.7 MiB | 00m00s >>> Running post-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Stop post-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Running pre-install scriptlet: samba-common-2:4.20.0-7.fc41.noarch >>> Stop pre-install scriptlet: samba-common-2:4.20.0-7.fc41.noarch [306/582] Installing samba-common-2:4.2 100% | 14.0 MiB/s | 143.6 KiB | 00m00s >>> Running post-install scriptlet: samba-common-2:4.20.0-7.fc41.noarch >>> Stop post-install scriptlet: samba-common-2:4.20.0-7.fc41.noarch >>> Running pre-install scriptlet: libwbclient-2:4.20.0-7.fc41.x86_64 >>> Stop pre-install scriptlet: libwbclient-2:4.20.0-7.fc41.x86_64 [307/582] Installing libwbclient-2:4.20 100% | 67.4 MiB/s | 69.0 KiB | 00m00s [308/582] Installing samba-common-libs- 100% | 126.4 MiB/s | 258.8 KiB | 00m00s [309/582] Installing samba-client-libs- 100% | 217.1 MiB/s | 19.1 MiB | 00m00s [310/582] Installing libsmbclient-2:4.2 100% | 160.4 MiB/s | 164.2 KiB | 00m00s [311/582] Installing libxkbcommon-x11-0 100% | 0.0 B/s | 40.4 KiB | 00m00s [312/582] Installing mtdev-0:1.1.6-8.fc 100% | 0.0 B/s | 26.5 KiB | 00m00s [313/582] Installing duktape-0:2.7.0-7. 100% | 301.9 MiB/s | 618.2 KiB | 00m00s [314/582] Installing libproxy-0:0.5.5-1 100% | 110.9 MiB/s | 113.5 KiB | 00m00s [315/582] Installing qt-settings-0:40.0 100% | 0.0 B/s | 1.7 KiB | 00m00s [316/582] Installing qt5-qtbase-common- 100% | 86.9 KiB/s | 356.0 B | 00m00s >>> Running pre-install scriptlet: qt5-qtbase-0:5.15.13-1.fc41.x86_64 >>> Stop pre-install scriptlet: qt5-qtbase-0:5.15.13-1.fc41.x86_64 [317/582] Installing qt5-qtbase-0:5.15. 100% | 117.8 MiB/s | 10.0 MiB | 00m00s >>> Running post-install scriptlet: qt5-qtbase-0:5.15.13-1.fc41.x86_64 >>> Stop post-install scriptlet: qt5-qtbase-0:5.15.13-1.fc41.x86_64 [318/582] Installing zlib-ng-compat-dev 100% | 102.0 MiB/s | 104.5 KiB | 00m00s [319/582] Installing annobin-docs-0:12. 100% | 0.0 B/s | 96.7 KiB | 00m00s [320/582] Installing npth-0:1.7-1.fc41. 100% | 49.4 MiB/s | 50.6 KiB | 00m00s [321/582] Installing gnupg2-0:2.4.5-1.f 100% | 116.1 MiB/s | 9.5 MiB | 00m00s [322/582] Installing gpgme-0:1.23.2-3.f 100% | 23.5 MiB/s | 577.7 KiB | 00m00s [323/582] Installing gpgmepp-0:1.23.2-3 100% | 415.4 MiB/s | 425.3 KiB | 00m00s [324/582] Installing google-noto-fonts- 100% | 0.0 B/s | 18.3 KiB | 00m00s [325/582] Installing google-noto-sans-v 100% | 312.2 MiB/s | 1.2 MiB | 00m00s [326/582] Installing default-fonts-core 100% | 17.8 MiB/s | 18.2 KiB | 00m00s [327/582] Installing google-droid-sans- 100% | 164.7 MiB/s | 6.3 MiB | 00m00s [328/582] Installing poppler-data-0:0.4 100% | 73.8 MiB/s | 12.4 MiB | 00m00s [329/582] Installing uriparser-0:0.9.7- 100% | 139.3 MiB/s | 142.6 KiB | 00m00s [330/582] Installing libkml-0:1.3.0-47. 100% | 299.5 MiB/s | 1.2 MiB | 00m00s [331/582] Installing utf8proc-0:2.7.0-7 100% | 355.3 MiB/s | 363.8 KiB | 00m00s [332/582] Installing re2-1:20220601-5.f 100% | 241.5 MiB/s | 494.6 KiB | 00m00s [333/582] Installing libarrow-doc-0:15. 
100% | 0.0 B/s | 116.2 KiB | 00m00s [334/582] Installing libarrow-0:15.0.2- 100% | 162.9 MiB/s | 21.8 MiB | 00m00s [335/582] Installing proj-data-0:9.4.0- 100% | 90.7 MiB/s | 9.0 MiB | 00m00s [336/582] Installing libdicom-0:1.1.0-2 100% | 491.7 MiB/s | 503.5 KiB | 00m00s [337/582] Installing mariadb-connector- 100% | 0.0 B/s | 1.0 KiB | 00m00s [338/582] Installing mariadb-connector- 100% | 253.2 MiB/s | 518.6 KiB | 00m00s [339/582] Installing xerces-c-0:3.2.5-2 100% | 296.3 MiB/s | 3.6 MiB | 00m00s [340/582] Installing unixODBC-0:2.3.12- 100% | 123.7 MiB/s | 1.2 MiB | 00m00s [341/582] Installing libqhull_r-1:8.0.2 100% | 9.5 MiB/s | 476.2 KiB | 00m00s [342/582] Installing libpq-0:16.1-4.fc4 100% | 308.6 MiB/s | 948.0 KiB | 00m00s [343/582] Installing libgta-0:1.2.1-12. 100% | 0.0 B/s | 71.6 KiB | 00m00s [344/582] Installing libdeflate-0:1.20- 100% | 115.2 MiB/s | 118.0 KiB | 00m00s [345/582] Installing cfitsio-0:4.4.0-2. 100% | 443.1 MiB/s | 1.8 MiB | 00m00s [346/582] Installing libdatrie-0:0.2.13 100% | 0.0 B/s | 59.0 KiB | 00m00s [347/582] Installing libthai-0:0.1.29-8 100% | 383.4 MiB/s | 785.3 KiB | 00m00s [348/582] Installing hwdata-0:0.381-1.f 100% | 350.1 MiB/s | 9.1 MiB | 00m00s [349/582] Installing libpciaccess-0:0.1 100% | 9.0 MiB/s | 46.0 KiB | 00m00s [350/582] Installing libdrm-0:2.4.120-3 100% | 99.0 MiB/s | 405.7 KiB | 00m00s [351/582] Installing mesa-libgbm-0:24.0 100% | 64.6 MiB/s | 66.1 KiB | 00m00s [352/582] Installing libglvnd-egl-1:1.7 100% | 1.3 MiB/s | 70.4 KiB | 00m00s [353/582] Installing mesa-libEGL-0:24.0 100% | 274.1 MiB/s | 280.7 KiB | 00m00s [354/582] Installing libvpl-1:2.10.2-1. 100% | 234.0 MiB/s | 479.2 KiB | 00m00s [355/582] Installing libglvnd-gles-1:1. 100% | 105.0 MiB/s | 107.6 KiB | 00m00s [356/582] Installing cliquer-libs-0:1.2 100% | 67.6 MiB/s | 69.2 KiB | 00m00s [357/582] Installing libnauty-0:2.8.8-3 100% | 511.1 MiB/s | 4.6 MiB | 00m00s [358/582] Installing pugixml-0:1.13-5.f 100% | 253.0 MiB/s | 259.1 KiB | 00m00s [359/582] Installing zimg-0:3.0.5-2.fc4 100% | 397.9 MiB/s | 814.9 KiB | 00m00s [360/582] Installing mbedtls-0:2.28.8-1 100% | 107.1 MiB/s | 1.1 MiB | 00m00s [361/582] Installing cjson-0:1.7.17-1.f 100% | 7.1 MiB/s | 65.4 KiB | 00m00s >>> Running post-install scriptlet: cjson-0:1.7.17-1.fc41.x86_64 >>> Stop post-install scriptlet: cjson-0:1.7.17-1.fc41.x86_64 [362/582] Installing librist-0:0.2.7-4. 100% | 151.2 MiB/s | 154.8 KiB | 00m00s [363/582] Installing mpg123-libs-0:1.31 100% | 257.1 MiB/s | 789.8 KiB | 00m00s [364/582] Installing libopenmpt-0:0.7.6 100% | 399.9 MiB/s | 1.6 MiB | 00m00s [365/582] Installing libudfread-0:1.1.2 100% | 65.9 MiB/s | 67.4 KiB | 00m00s [366/582] Installing mesa-filesystem-0: 100% | 0.0 B/s | 4.3 KiB | 00m00s [367/582] Installing soxr-0:0.1.3-15.fc 100% | 185.4 MiB/s | 189.8 KiB | 00m00s [368/582] Installing highway-0:1.1.0-1. 100% | 287.7 MiB/s | 3.2 MiB | 00m00s [369/582] Installing libjxl-1:0.10.2-3. 100% | 29.5 MiB/s | 3.3 MiB | 00m00s [370/582] Installing libvmaf-0:2.3.0-7. 100% | 381.3 MiB/s | 780.9 KiB | 00m00s [371/582] Installing libaom-0:3.8.2-1.f 100% | 419.8 MiB/s | 5.0 MiB | 00m00s [372/582] Installing libavif-0:1.0.4-1. 100% | 180.7 MiB/s | 185.1 KiB | 00m00s [373/582] Installing lpcnetfreedv-0:0.5 100% | 180.6 MiB/s | 14.8 MiB | 00m00s [374/582] Installing codec2-0:1.2.0-4.f 100% | 335.5 MiB/s | 1.3 MiB | 00m00s [375/582] Installing fribidi-0:1.0.13-4 100% | 179.6 MiB/s | 367.8 KiB | 00m00s [376/582] Installing libX11-common-0:1. 
100% | 131.7 MiB/s | 1.2 MiB | 00m00s [377/582] Installing libX11-0:1.8.9-1.f 100% | 182.0 MiB/s | 1.3 MiB | 00m00s [378/582] Installing libXext-0:1.3.6-1. 100% | 89.2 MiB/s | 91.3 KiB | 00m00s [379/582] Installing libXrender-0:0.9.1 100% | 0.0 B/s | 51.4 KiB | 00m00s [380/582] Installing libXfixes-0:6.0.1- 100% | 0.0 B/s | 31.6 KiB | 00m00s [381/582] Installing libXcursor-0:1.2.2 100% | 50.0 MiB/s | 51.2 KiB | 00m00s [382/582] Installing libXi-0:1.8.1-5.fc 100% | 79.9 MiB/s | 81.8 KiB | 00m00s [383/582] Installing libXv-0:1.0.12-3.f 100% | 0.0 B/s | 27.3 KiB | 00m00s [384/582] Installing libvdpau-0:1.5-6.f 100% | 0.0 B/s | 22.5 KiB | 00m00s [385/582] Installing libXxf86vm-0:1.1.5 100% | 0.0 B/s | 26.6 KiB | 00m00s [386/582] Installing libglvnd-glx-1:1.7 100% | 296.3 MiB/s | 606.8 KiB | 00m00s [387/582] Installing mesa-libGL-0:24.0. 100% | 222.1 MiB/s | 454.8 KiB | 00m00s [388/582] Installing libva-0:2.21.0-3.f 100% | 154.9 MiB/s | 317.3 KiB | 00m00s [389/582] Installing libavutil-free-0:6 100% | 298.1 MiB/s | 915.7 KiB | 00m00s [390/582] Installing libswscale-free-0: 100% | 140.7 MiB/s | 576.5 KiB | 00m00s [391/582] Installing libswresample-free 100% | 145.2 MiB/s | 148.7 KiB | 00m00s [392/582] Installing libGLEW-0:2.2.0-7. 100% | 8.5 MiB/s | 749.5 KiB | 00m00s [393/582] Installing glx-utils-0:9.0.0- 100% | 140.7 MiB/s | 432.3 KiB | 00m00s [394/582] Installing libX11-devel-0:1.8 100% | 93.3 MiB/s | 1.1 MiB | 00m00s [395/582] Installing libglvnd-devel-1:1 100% | 530.1 MiB/s | 2.1 MiB | 00m00s [396/582] Installing libXpm-0:3.5.17-3. 100% | 146.3 MiB/s | 149.8 KiB | 00m00s [397/582] Installing libXt-0:1.3.0-3.fc 100% | 208.6 MiB/s | 427.1 KiB | 00m00s [398/582] Installing pixman-0:0.43.4-1. 100% | 347.3 MiB/s | 711.2 KiB | 00m00s [399/582] Installing MUMPS-common-0:5.6 100% | 463.4 MiB/s | 949.0 KiB | 00m00s [400/582] Installing MUMPS-0:5.6.2-4.fc 100% | 97.1 MiB/s | 9.5 MiB | 00m00s [401/582] Installing coin-or-Clp-0:1.17 100% | 363.2 MiB/s | 2.5 MiB | 00m00s [402/582] Installing coin-or-Cgl-0:0.60 100% | 343.8 MiB/s | 1.0 MiB | 00m00s [403/582] Installing coin-or-Cbc-0:2.10 100% | 400.3 MiB/s | 2.4 MiB | 00m00s [404/582] Installing gc-0:8.2.2-6.fc40. 100% | 85.0 MiB/s | 261.2 KiB | 00m00s [405/582] Installing guile30-0:3.0.7-12 100% | 110.5 MiB/s | 51.6 MiB | 00m00s [406/582] Installing make-1:4.4.1-6.fc4 100% | 9.6 MiB/s | 1.8 MiB | 00m00s [407/582] Installing gcc-0:14.0.1-0.13. 100% | 34.6 MiB/s | 103.9 MiB | 00m03s >>> Running trigger-install scriptlet: redhat-rpm-config-0:287-1.fc41.noarch >>> Stop trigger-install scriptlet: redhat-rpm-config-0:287-1.fc41.noarch [408/582] Installing gcc-c++-0:14.0.1-0 100% | 165.6 MiB/s | 38.1 MiB | 00m00s [409/582] Installing libcbor-0:0.11.0-1 100% | 73.5 MiB/s | 75.3 KiB | 00m00s [410/582] Installing libfido2-0:1.14.0- 100% | 233.7 MiB/s | 239.3 KiB | 00m00s [411/582] Installing graphite2-0:1.3.14 100% | 189.6 MiB/s | 194.2 KiB | 00m00s [412/582] Installing cairo-0:1.18.0-3.f 100% | 346.4 MiB/s | 1.7 MiB | 00m00s [413/582] Installing harfbuzz-0:8.4.0-1 100% | 376.3 MiB/s | 2.6 MiB | 00m00s [414/582] Installing freetype-0:2.13.2- 100% | 274.8 MiB/s | 844.3 KiB | 00m00s [415/582] Installing fontconfig-0:2.15. 100% | 703.1 KiB/s | 786.0 KiB | 00m01s >>> Running post-install scriptlet: fontconfig-0:2.15.0-4.fc40.x86_64 >>> Stop post-install scriptlet: fontconfig-0:2.15.0-4.fc40.x86_64 [416/582] Installing cairo-gobject-0:1. 
100% | 35.2 MiB/s | 36.1 KiB | 00m00s [417/582] Installing libbluray-0:1.3.4- 100% | 191.1 MiB/s | 391.3 KiB | 00m00s [418/582] Installing libXft-0:2.3.8-6.f 100% | 162.1 MiB/s | 166.0 KiB | 00m00s [419/582] Installing pango-0:1.51.2-1.f 100% | 323.2 MiB/s | 992.8 KiB | 00m00s [420/582] Installing librsvg2-0:2.57.1- 100% | 379.6 MiB/s | 4.2 MiB | 00m00s [421/582] Installing rsvg-pixbuf-loader 100% | 0.0 B/s | 16.5 KiB | 00m00s [422/582] Installing lasi-0:1.1.3-13.fc 100% | 129.2 MiB/s | 132.3 KiB | 00m00s [423/582] Installing tbb2020.3-0:2020.3 100% | 258.6 MiB/s | 264.8 KiB | 00m00s [424/582] Installing jbigkit-libs-0:2.1 100% | 116.8 MiB/s | 119.6 KiB | 00m00s [425/582] Installing libtiff-0:4.6.0-2. 100% | 373.2 MiB/s | 1.1 MiB | 00m00s [426/582] Installing proj-0:9.4.0-1.fc4 100% | 436.8 MiB/s | 4.4 MiB | 00m00s [427/582] Installing libgeotiff-0:1.7.1 100% | 308.0 MiB/s | 315.4 KiB | 00m00s [428/582] Installing libspatialite-0:5. 100% | 543.9 MiB/s | 15.2 MiB | 00m00s [429/582] Installing gdk-pixbuf2-module 100% | 251.6 MiB/s | 257.6 KiB | 00m00s [430/582] Installing openslide-0:4.0.0- 100% | 294.0 MiB/s | 301.0 KiB | 00m00s [431/582] Installing gd-0:2.3.3-16.fc41 100% | 195.7 MiB/s | 400.8 KiB | 00m00s [432/582] Installing libgs-0:10.03.0-1. 100% | 612.0 MiB/s | 23.3 MiB | 00m00s [433/582] Installing libusb1-0:1.0.27-1 100% | 160.1 MiB/s | 163.9 KiB | 00m00s [434/582] Installing libraw1394-0:2.1.2 100% | 162.8 MiB/s | 166.7 KiB | 00m00s [435/582] Installing libdc1394-0:2.2.7- 100% | 340.6 MiB/s | 348.8 KiB | 00m00s [436/582] Installing librabbitmq-0:0.14 100% | 92.6 MiB/s | 94.8 KiB | 00m00s [437/582] Installing libmodplug-1:0.8.9 100% | 348.7 MiB/s | 357.1 KiB | 00m00s [438/582] Installing game-music-emu-0:0 100% | 320.5 MiB/s | 328.2 KiB | 00m00s [439/582] Installing xvidcore-0:1.3.7-1 100% | 433.7 MiB/s | 888.2 KiB | 00m00s [440/582] Installing vo-amrwbenc-0:0.1. 100% | 144.1 MiB/s | 147.5 KiB | 00m00s [441/582] Installing twolame-libs-0:0.4 100% | 158.9 MiB/s | 162.7 KiB | 00m00s [442/582] Installing speex-0:1.2.0-17.f 100% | 115.7 MiB/s | 118.5 KiB | 00m00s [443/582] Installing opencore-amr-0:0.1 100% | 339.3 MiB/s | 347.4 KiB | 00m00s [444/582] Installing libvpx-0:1.14.0-1. 100% | 450.0 MiB/s | 3.1 MiB | 00m00s [445/582] Installing lame-libs-0:3.100- 100% | 396.8 MiB/s | 1.2 MiB | 00m00s [446/582] Installing ilbc-0:3.0.4-10.fc 100% | 86.9 MiB/s | 89.0 KiB | 00m00s [447/582] Installing gsm-0:1.0.22-6.fc4 100% | 68.8 MiB/s | 70.4 KiB | 00m00s [448/582] Installing fdk-aac-free-0:2.0 100% | 295.3 MiB/s | 604.7 KiB | 00m00s [449/582] Installing libavcodec-free-0: 100% | 400.1 MiB/s | 10.4 MiB | 00m00s [450/582] Installing libchromaprint-0:1 100% | 68.5 MiB/s | 70.2 KiB | 00m00s [451/582] Installing orc-0:0.4.38-2.fc4 100% | 374.2 MiB/s | 766.3 KiB | 00m00s [452/582] Installing libwayland-egl-0:1 100% | 0.0 B/s | 17.6 KiB | 00m00s [453/582] Installing libvisual-1:0.4.1- 100% | 439.3 MiB/s | 449.8 KiB | 00m00s [454/582] Installing cdparanoia-libs-0: 100% | 112.4 MiB/s | 115.1 KiB | 00m00s [455/582] Installing alsa-lib-0:1.2.11- 100% | 277.5 MiB/s | 1.4 MiB | 00m00s [456/582] Installing openssh-0:9.6p1-1. 100% | 454.1 MiB/s | 1.8 MiB | 00m00s [457/582] Installing openssh-clients-0: 100% | 137.3 MiB/s | 2.6 MiB | 00m00s >>> Running post-install scriptlet: openssh-clients-0:9.6p1-1.fc41.6.x86_64 >>> Stop post-install scriptlet: openssh-clients-0:9.6p1-1.fc41.6.x86_64 [458/582] Installing hwloc-libs-0:2.10. 
100% | 475.0 MiB/s | 2.8 MiB | 00m00s [459/582] Installing tbb-bind-0:2021.11 100% | 0.0 B/s | 24.6 KiB | 00m00s [460/582] Installing liburing-0:2.5-3.f 100% | 98.5 MiB/s | 100.8 KiB | 00m00s [461/582] Installing rocksdb-0:8.10.0-3 100% | 381.3 MiB/s | 9.5 MiB | 00m00s [462/582] Installing tzdata-0:2024a-5.f 100% | 76.0 MiB/s | 1.9 MiB | 00m00s [463/582] Installing python-pip-wheel-0 100% | 509.3 MiB/s | 1.5 MiB | 00m00s [464/582] Installing mpdecimal-0:2.5.1- 100% | 197.3 MiB/s | 202.0 KiB | 00m00s [465/582] Installing libb2-0:0.98.1-11. 100% | 0.0 B/s | 43.3 KiB | 00m00s [466/582] Installing python3-0:3.12.2-3 100% | 6.6 MiB/s | 33.6 KiB | 00m00s [467/582] Installing python3-libs-0:3.1 100% | 114.4 MiB/s | 41.3 MiB | 00m00s [468/582] Installing gstreamer1-0:1.24. 100% | 136.5 MiB/s | 6.1 MiB | 00m00s [469/582] Installing gstreamer1-plugins 100% | 128.8 MiB/s | 7.2 MiB | 00m00s [470/582] Installing cmake-rpm-macros-0 100% | 0.0 B/s | 8.1 KiB | 00m00s [471/582] Installing cmake-0:3.28.3-1.f 100% | 201.9 MiB/s | 31.5 MiB | 00m00s [472/582] Installing cmake-data-0:3.28. 100% | 33.8 MiB/s | 8.5 MiB | 00m00s [473/582] Installing pybind11-devel-0:2 100% | 62.6 MiB/s | 833.3 KiB | 00m00s [474/582] Installing vapoursynth-libs-0 100% | 440.4 MiB/s | 1.8 MiB | 00m00s [475/582] Installing libavformat-free-0 100% | 270.1 MiB/s | 2.4 MiB | 00m00s [476/582] Installing python3-six-0:1.16 100% | 117.2 MiB/s | 120.1 KiB | 00m00s [477/582] Installing onnx-optimizer-0:0 100% | 131.8 MiB/s | 539.7 KiB | 00m00s [478/582] Installing crypto-policies-sc 100% | 165.1 MiB/s | 338.1 KiB | 00m00s [479/582] Installing nss-sysinit-0:3.99 100% | 18.9 MiB/s | 19.3 KiB | 00m00s [480/582] Installing nss-0:3.99.0-1.fc4 100% | 211.0 MiB/s | 1.9 MiB | 00m00s >>> Running post-install scriptlet: nss-0:3.99.0-1.fc41.x86_64 >>> Stop post-install scriptlet: nss-0:3.99.0-1.fc41.x86_64 [481/582] Installing poppler-0:24.02.0- 100% | 346.1 MiB/s | 3.5 MiB | 00m00s [482/582] Installing gdal-libs-0:3.8.5- 100% | 339.9 MiB/s | 26.9 MiB | 00m00s [483/582] Installing vtk-0:9.2.6-13.fc4 100% | 165.9 MiB/s | 99.4 MiB | 00m01s [484/582] Installing poppler-glib-0:24. 100% | 281.3 MiB/s | 576.1 KiB | 00m00s [485/582] Installing graphviz-0:10.0.1- 100% | 121.2 MiB/s | 21.1 MiB | 00m00s [486/582] Installing python3-packaging- 100% | 212.7 MiB/s | 435.6 KiB | 00m00s [487/582] Installing python3-rpm-genera 100% | 0.0 B/s | 82.9 KiB | 00m00s [488/582] Installing libwacom-0:2.10.0- 100% | 94.3 MiB/s | 96.5 KiB | 00m00s [489/582] Installing libinput-0:1.25.0- 100% | 69.0 MiB/s | 564.9 KiB | 00m00s >>> Running post-install scriptlet: libinput-0:1.25.0-4.fc41.x86_64 >>> Stop post-install scriptlet: libinput-0:1.25.0-4.fc41.x86_64 [490/582] Installing qt5-qtbase-gui-0:5 100% | 208.5 MiB/s | 20.0 MiB | 00m00s [491/582] Installing opencv-cuda-0:4.9. 100% | 121.8 MiB/s | 571.8 MiB | 00m05s [492/582] Installing opencv-core-0:4.9. 100% | 111.1 MiB/s | 52.4 MiB | 00m00s [493/582] Installing opencv-0:4.9.0-202 100% | 146.0 MiB/s | 20.3 MiB | 00m00s [494/582] Installing opencv-contrib-0:4 100% | 93.3 MiB/s | 15.4 MiB | 00m00s [495/582] Installing opencv-static-0:4. 100% | 12.6 MiB/s | 2.9 MiB | 00m00s [496/582] Installing opencv-devel-0:4.9 100% | 24.4 MiB/s | 10.9 MiB | 00m00s [497/582] Installing less-0:643-4.fc40. 
100% | 90.8 MiB/s | 372.0 KiB | 00m00s [498/582] Installing git-core-0:2.44.0- 100% | 101.0 MiB/s | 20.8 MiB | 00m00s [499/582] Installing git-core-doc-0:2.4 100% | 164.9 MiB/s | 17.0 MiB | 00m00s [500/582] Installing perl-Git-0:2.44.0- 100% | 31.7 MiB/s | 65.0 KiB | 00m00s [501/582] Installing git-0:2.44.0-1.fc4 100% | 85.4 MiB/s | 87.4 KiB | 00m00s [502/582] Installing sleef-0:3.6-202403 100% | 398.6 MiB/s | 2.8 MiB | 00m00s [503/582] Installing opencl-headers-0:3 100% | 708.3 MiB/s | 725.3 KiB | 00m00s [504/582] Installing numactl-libs-0:2.0 100% | 56.5 MiB/s | 57.8 KiB | 00m00s [505/582] Installing miniz-0:3.0.2-5.fc 100% | 42.1 MiB/s | 129.4 KiB | 00m00s [506/582] Installing gl-manpages-0:1.1- 100% | 62.1 MiB/s | 1.1 MiB | 00m00s [507/582] Installing gmp-c++-1:6.3.0-1. 100% | 31.8 MiB/s | 32.6 KiB | 00m00s [508/582] Installing gmp-devel-1:6.3.0- 100% | 86.4 MiB/s | 354.1 KiB | 00m00s [509/582] Installing foxi-0:1.4.1^git20 100% | 1.3 MiB/s | 17.4 KiB | 00m00s [510/582] Installing cuda-nvvm-12-3-0:1 100% | 28.9 MiB/s | 63.1 MiB | 00m02s [511/582] Installing cuda-crt-12-3-0:12 100% | 197.1 MiB/s | 1.0 MiB | 00m00s [512/582] Installing cuda-nvcc-12-3-0:1 100% | 81.7 MiB/s | 194.8 MiB | 00m02s [513/582] Installing libyaml-0:0.2.5-14 100% | 32.2 MiB/s | 131.8 KiB | 00m00s [514/582] Installing fp16-1:0-20240410. 100% | 18.2 MiB/s | 18.7 KiB | 00m00s [515/582] Installing xapian-core-libs-0 100% | 79.3 MiB/s | 2.1 MiB | 00m00s [516/582] Installing cuda-nvtx-12-3-0:1 100% | 133.8 MiB/s | 411.0 KiB | 00m00s [517/582] Installing cuda-driver-devel- 100% | 4.8 MiB/s | 126.8 KiB | 00m00s [518/582] Installing cutlass-0:3.4.1-20 100% | 96.4 MiB/s | 1.0 GiB | 00m11s [519/582] Installing cuda-cupti-12-3-0: 100% | 50.9 MiB/s | 108.3 MiB | 00m02s [520/582] Installing kineto-0:0.4.0-202 100% | 106.6 MiB/s | 764.4 KiB | 00m00s [521/582] Installing kineto-devel-0:0.4 100% | 12.9 MiB/s | 52.8 KiB | 00m00s [522/582] Installing cutlass-devel-0:3. 100% | 313.8 MiB/s | 12.2 MiB | 00m00s [523/582] Installing doxygen-2:1.10.0-3 100% | 145.0 MiB/s | 18.1 MiB | 00m00s [524/582] Installing fp16-devel-1:0-202 100% | 15.2 MiB/s | 31.2 KiB | 00m00s [525/582] Installing python3-pyyaml-0:6 100% | 97.7 MiB/s | 800.2 KiB | 00m00s [526/582] Installing foxi-devel-0:1.4.1 100% | 58.8 MiB/s | 120.4 KiB | 00m00s [527/582] Installing mpfr-devel-0:4.2.1 100% | 31.0 MiB/s | 63.5 KiB | 00m00s [528/582] Installing mesa-libGLU-devel- 100% | 17.1 MiB/s | 17.5 KiB | 00m00s [529/582] Installing miniz-devel-0:3.0. 100% | 50.8 MiB/s | 104.1 KiB | 00m00s [530/582] Installing numactl-devel-0:2. 100% | 13.1 MiB/s | 26.8 KiB | 00m00s [531/582] Installing ocl-icd-devel-0:2. 100% | 47.1 MiB/s | 241.1 KiB | 00m00s [532/582] Installing sleef-devel-0:3.6- 100% | 54.7 MiB/s | 280.2 KiB | 00m00s [533/582] Installing python3-devel-0:3. 100% | 106.3 MiB/s | 1.3 MiB | 00m00s [534/582] Installing onnx-optimizer-dev 100% | 50.0 MiB/s | 205.0 KiB | 00m00s [535/582] Installing peachpy-python3-0: 100% | 428.5 MiB/s | 13.3 MiB | 00m00s [536/582] Installing python3-pybind11-0 100% | 45.6 MiB/s | 887.8 KiB | 00m00s [537/582] Installing python3-numpy-1:1. 100% | 66.5 MiB/s | 44.2 MiB | 00m01s [538/582] Installing python3-setuptools 100% | 170.1 MiB/s | 7.3 MiB | 00m00s [539/582] Installing python3-typing-ext 100% | 34.4 MiB/s | 422.4 KiB | 00m00s [540/582] Installing rocksdb-devel-0:8. 
100% | 83.4 MiB/s | 1.4 MiB | 00m00s [541/582] Installing tbb-devel-0:2021.1 100% | 134.5 MiB/s | 1.3 MiB | 00m00s [542/582] Installing annobin-plugin-gcc 100% | 19.0 MiB/s | 972.0 KiB | 00m00s >>> Running trigger-install scriptlet: redhat-rpm-config-0:287-1.fc41.noarch >>> Stop trigger-install scriptlet: redhat-rpm-config-0:287-1.fc41.noarch [543/582] Installing gcc-plugin-annobin 100% | 3.8 MiB/s | 58.7 KiB | 00m00s >>> Running trigger-install scriptlet: redhat-rpm-config-0:287-1.fc41.noarch >>> Stop trigger-install scriptlet: redhat-rpm-config-0:287-1.fc41.noarch [544/582] Installing protobuf-compat-de 100% | 212.2 MiB/s | 2.8 MiB | 00m00s [545/582] Installing cuda-gcc-12-c++-0: 100% | 302.3 MiB/s | 60.5 MiB | 00m00s [546/582] Installing zeromq-devel-0:4.3 100% | 7.6 MiB/s | 31.1 KiB | 00m00s [547/582] Installing cuda-cudart-devel- 100% | 19.5 MiB/s | 6.6 MiB | 00m00s [548/582] Installing rdma-core-devel-0: 100% | 9.4 MiB/s | 686.4 KiB | 00m00s [549/582] Installing openblas-devel-0:0 100% | 53.4 MiB/s | 1.7 MiB | 00m00s [550/582] Installing libcusolver-devel- 100% | 109.5 MiB/s | 448.3 KiB | 00m00s [551/582] Installing libnvjitlink-devel 100% | 49.7 MiB/s | 60.7 MiB | 00m01s [552/582] Installing leveldb-devel-0:1. 100% | 46.3 MiB/s | 142.4 KiB | 00m00s [553/582] Installing tensorpipe-devel-0 100% | 72.0 MiB/s | 516.1 KiB | 00m00s [554/582] Installing glog-devel-0:0.3.5 100% | 111.0 MiB/s | 113.6 KiB | 00m00s [555/582] Installing qnnpack-devel-0:0- 100% | 0.0 B/s | 18.8 KiB | 00m00s [556/582] Installing nnpack-devel-0:0-2 100% | 21.3 MiB/s | 43.7 KiB | 00m00s [557/582] Installing lmdb-devel-0:0.9.3 100% | 17.8 MiB/s | 73.0 KiB | 00m00s [558/582] Installing magma-devel-0:2.8. 100% | 466.5 MiB/s | 21.9 MiB | 00m00s [559/582] Installing fftw-devel-0:3.3.1 100% | 56.7 MiB/s | 290.3 KiB | 00m00s [560/582] Installing gloo-devel-1:0.5.0 100% | 112.0 MiB/s | 344.1 KiB | 00m00s [561/582] Installing fbgemm-devel-0:0.7 100% | 101.8 MiB/s | 312.7 KiB | 00m00s [562/582] Installing flatbuffers-compil 100% | 155.7 MiB/s | 3.0 MiB | 00m00s [563/582] Installing flatbuffers-devel- 100% | 116.3 MiB/s | 476.2 KiB | 00m00s [564/582] Installing asmjit-devel-1:0-2 100% | 219.0 MiB/s | 1.5 MiB | 00m00s [565/582] Installing hiredis-devel-0:1. 100% | 59.3 MiB/s | 121.4 KiB | 00m00s [566/582] Installing libnccl-devel-0:2. 100% | 1.0 MiB/s | 46.1 KiB | 00m00s >>> Running post-install scriptlet: libnccl-devel-0:2.21.5-1+cuda12.4.x86_64 >>> Stop post-install scriptlet: libnccl-devel-0:2.21.5-1+cuda12.4.x86_64 [567/582] Installing libcurand-devel-12 100% | 51.4 MiB/s | 93.9 MiB | 00m02s [568/582] Installing onnx-devel-0:1.17. 
100% | 95.6 MiB/s | 1.1 MiB | 00m00s [569/582] Installing cuda-nvrtc-devel-1 100% | 61.5 MiB/s | 78.1 MiB | 00m01s [570/582] Installing pthreadpool-devel- 100% | 49.5 MiB/s | 101.5 KiB | 00m00s [571/582] Installing libcudnn8-devel-0: 100% | 17.8 MiB/s | 200.7 KiB | 00m00s >>> Running post-install scriptlet: libcudnn8-devel-0:8.9.7.29-2.cuda12.3.x86_64 >>> Stop post-install scriptlet: libcudnn8-devel-0:8.9.7.29-2.cuda12.3.x86_64 [572/582] Installing cpuinfo-devel-1:0- 100% | 26.7 MiB/s | 82.1 KiB | 00m00s [573/582] Installing snappy-devel-0:1.1 100% | 5.8 MiB/s | 47.4 KiB | 00m00s [574/582] Installing eigen3-devel-0:3.4 100% | 134.4 MiB/s | 8.5 MiB | 00m00s [575/582] Installing neon2sse-devel-0:0 100% | 31.4 MiB/s | 803.5 KiB | 00m00s [576/582] Installing systemd-rpm-macros 100% | 4.9 MiB/s | 10.0 KiB | 00m00s [577/582] Installing psimd-devel-1:0-20 100% | 22.7 MiB/s | 46.4 KiB | 00m00s [578/582] Installing libzstd-devel-0:1. 100% | 15.3 MiB/s | 203.2 KiB | 00m00s [579/582] Installing fxdiv-devel-1:0-20 100% | 8.6 MiB/s | 17.7 KiB | 00m00s [580/582] Installing cuda-profiler-api- 100% | 4.4 MiB/s | 72.9 KiB | 00m00s [581/582] Installing cuda-nvml-devel-12 100% | 10.2 MiB/s | 667.9 KiB | 00m00s [582/582] Installing gemmlowp-devel-0:0 100% | 1.1 MiB/s | 2.3 MiB | 00m02s >>> Running post-transaction scriptlet: cuda-toolkit-12-3-config-common-0:12.3.1 >>> Stop post-transaction scriptlet: cuda-toolkit-12-3-config-common-0:12.3.101- >>> Running post-transaction scriptlet: urw-base35-bookman-fonts-0:20200910-19.f >>> Stop post-transaction scriptlet: urw-base35-bookman-fonts-0:20200910-19.fc40 >>> Running post-transaction scriptlet: urw-base35-c059-fonts-0:20200910-19.fc40 >>> Stop post-transaction scriptlet: urw-base35-c059-fonts-0:20200910-19.fc40.no >>> Running post-transaction scriptlet: urw-base35-d050000l-fonts-0:20200910-19. >>> Stop post-transaction scriptlet: urw-base35-d050000l-fonts-0:20200910-19.fc4 >>> Running post-transaction scriptlet: urw-base35-gothic-fonts-0:20200910-19.fc >>> Stop post-transaction scriptlet: urw-base35-gothic-fonts-0:20200910-19.fc40. >>> Running post-transaction scriptlet: urw-base35-nimbus-mono-ps-fonts-0:202009 >>> Stop post-transaction scriptlet: urw-base35-nimbus-mono-ps-fonts-0:20200910- >>> Running post-transaction scriptlet: urw-base35-nimbus-roman-fonts-0:20200910 >>> Stop post-transaction scriptlet: urw-base35-nimbus-roman-fonts-0:20200910-19 >>> Running post-transaction scriptlet: urw-base35-nimbus-sans-fonts-0:20200910- >>> Stop post-transaction scriptlet: urw-base35-nimbus-sans-fonts-0:20200910-19. 
>>> Running post-transaction scriptlet: urw-base35-p052-fonts-0:20200910-19.fc40 >>> Stop post-transaction scriptlet: urw-base35-p052-fonts-0:20200910-19.fc40.no >>> Running post-transaction scriptlet: urw-base35-standard-symbols-ps-fonts-0:2 >>> Stop post-transaction scriptlet: urw-base35-standard-symbols-ps-fonts-0:2020 >>> Running post-transaction scriptlet: urw-base35-z003-fonts-0:20200910-19.fc40 >>> Stop post-transaction scriptlet: urw-base35-z003-fonts-0:20200910-19.fc40.no >>> Running post-transaction scriptlet: fontconfig-0:2.15.0-4.fc40.x86_64 >>> Stop post-transaction scriptlet: fontconfig-0:2.15.0-4.fc40.x86_64 >>> Running post-transaction scriptlet: crypto-policies-scripts-0:20240320-1.git >>> Stop post-transaction scriptlet: crypto-policies-scripts-0:20240320-1.git58e >>> Running post-transaction scriptlet: nss-0:3.99.0-1.fc41.x86_64 >>> Stop post-transaction scriptlet: nss-0:3.99.0-1.fc41.x86_64 >>> Running trigger-install scriptlet: glibc-common-0:2.39.9000-10.fc41.x86_64 >>> Stop trigger-install scriptlet: glibc-common-0:2.39.9000-10.fc41.x86_64 >>> Running trigger-install scriptlet: info-0:7.1-2.fc40.x86_64 >>> Stop trigger-install scriptlet: info-0:7.1-2.fc40.x86_64 >>> Running trigger-install scriptlet: glib2-0:2.80.0-1.fc41.x86_64 >>> Stop trigger-install scriptlet: glib2-0:2.80.0-1.fc41.x86_64 >>> Running trigger-install scriptlet: shared-mime-info-0:2.3-4.fc41.x86_64 >>> Stop trigger-install scriptlet: shared-mime-info-0:2.3-4.fc41.x86_64 >>> Running trigger-install scriptlet: gdk-pixbuf2-0:2.42.10-8.fc40.x86_64 >>> Stop trigger-install scriptlet: gdk-pixbuf2-0:2.42.10-8.fc40.x86_64 >>> Running trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Running trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Running trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Running trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Running trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Running trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Stop trigger-install scriptlet: systemd-0:255.4-1.fc41.x86_64 >>> Running trigger-install scriptlet: fontconfig-0:2.15.0-4.fc40.x86_64 >>> Stop trigger-install scriptlet: fontconfig-0:2.15.0-4.fc40.x86_64 >>> Running trigger-install scriptlet: graphviz-0:10.0.1-1.fc41.x86_64 >>> Stop trigger-install scriptlet: graphviz-0:10.0.1-1.fc41.x86_64 Warning: skipped PGP checks for 82 package(s). Finish: build setup for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.src.rpm Start: rpmbuild pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.src.rpm warning: %patchN is deprecated (2 usages found), use %patch N (or %patch -P N) Building target platforms: x86_64 Building for target x86_64 setting SOURCE_DATE_EPOCH=1554595200 Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.aSoIWj + umask 022 + cd /builddir/build/BUILD + cd /builddir/build/BUILD + rm -rf pytorch + /usr/bin/mkdir -p pytorch + cd pytorch + rm -rf /builddir/build/BUILD/pytorch-SPECPARTS + /usr/bin/mkdir -p /builddir/build/BUILD/pytorch-SPECPARTS + /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w . + git clone --depth 1 -n -b main https://github.com/pytorch/pytorch.git . Cloning into '.'... 
+ git fetch --depth 1 origin 7efaf54dc46034189cb36b345764a5a9a5b693d4
From https://github.com/pytorch/pytorch
 * branch            7efaf54dc46034189cb36b345764a5a9a5b693d4 -> FETCH_HEAD
+ git reset --hard 7efaf54dc46034189cb36b345764a5a9a5b693d4
HEAD is now at 7efaf54 Fakeifying views shouldnt create symbols when dynamic=False (#123348)
+ git submodule update --init --depth 1 third_party/fmt
Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt'
Cloning into '/builddir/build/BUILD/pytorch/third_party/fmt'...
From https://github.com/fmtlib/fmt
 * branch            e69e5f977d458f2650bb346dadf2ad30c5320281 -> FETCH_HEAD
Submodule path 'third_party/fmt': checked out 'e69e5f977d458f2650bb346dadf2ad30c5320281'
+ git submodule update --init --depth 1 third_party/XNNPACK
Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
Cloning into '/builddir/build/BUILD/pytorch/third_party/XNNPACK'...
From https://github.com/google/XNNPACK
 * branch            fcbf55af6cf28a4627bcd1f703ab7ad843f0f3a2 -> FETCH_HEAD
Submodule path 'third_party/XNNPACK': checked out 'fcbf55af6cf28a4627bcd1f703ab7ad843f0f3a2'
+ git submodule update --init --depth 1 third_party/ittapi
Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi'
Cloning into '/builddir/build/BUILD/pytorch/third_party/ittapi'...
From https://github.com/intel/ittapi
 * branch            5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42 -> FETCH_HEAD
Submodule path 'third_party/ittapi': checked out '5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42'
+ git submodule update --init --depth 1 third_party/pocketfft
Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft'
Cloning into '/builddir/build/BUILD/pytorch/third_party/pocketfft'...
From https://github.com/mreineck/pocketfft
 * branch            9d3ab05a7fffbc71a492bc6a17be034e83e8f0fe -> FETCH_HEAD
Submodule path 'third_party/pocketfft': checked out '9d3ab05a7fffbc71a492bc6a17be034e83e8f0fe'
+ git submodule update --init --depth 1 third_party/cudnn_frontend
Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend'
Cloning into '/builddir/build/BUILD/pytorch/third_party/cudnn_frontend'...
From https://github.com/NVIDIA/cudnn-frontend
 * branch            150798fe976556078f443fdb059a1ff0361f58a2 -> FETCH_HEAD
Submodule path 'third_party/cudnn_frontend': checked out '150798fe976556078f443fdb059a1ff0361f58a2'
+ git --no-pager log --format=fuller
commit 7efaf54dc46034189cb36b345764a5a9a5b693d4
Author:     Brian Hirsh
AuthorDate: Thu Apr 11 08:19:28 2024 -0700
Commit:     PyTorch MergeBot
CommitDate: Fri Apr 12 01:12:23 2024 +0000

    Fakeifying views shouldnt create symbols when dynamic=False (#123348)

    Fixes https://github.com/pytorch/pytorch/issues/123298

    I was also seeing some crashes in torchtrain due to dynamic shapes, even when I set `compile(dynamic=False)` (cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @wanchaol). This doesn't fix the underlying dynamic shape issues with compile + DTensor, but it does prevent dynamic shapes from leaking in.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/123348
    Approved by: https://github.com/ezyang
    ghstack dependencies: #122502, #122751
Patch #1 (pytorch-C.patch):
+ echo 'Patch #1 (pytorch-C.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p0 -b --suffix .python~ --fuzz=100
patching file torch/CMakeLists.txt
Hunk #1 succeeded at 277 (offset -2 lines).
Patch #5 (pytorch-cuda12.patch):
+ echo 'Patch #5 (pytorch-cuda12.patch):'
+ /usr/bin/patch --no-backup-if-mismatch -f -p1 -b --suffix .cu12~ --fuzz=100
patching file aten/src/ATen/native/nested/cuda/NestedTensorMatmul.cu
patching file aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cu
patching file aten/src/ATen/native/transformers/cuda/attention.cu
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/attention_backward.cu
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernel_backward.h
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernel_forward.h
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/flash_attn/flash_bwd_launch_template.h
Hunk #1 succeeded at 1 with fuzz 3.
patching file aten/src/ATen/native/transformers/cuda/flash_attn/flash_fwd_launch_template.h
Hunk #1 succeeded at 1 with fuzz 3.
+ sed -i -e 's|VERSION_LESS 3.7)|VERSION_LESS 3.6)|g' cmake/Dependencies.cmake
+ sed -i -e 's|PY_MAJOR_VERSION == 3|PY_MAJOR_VERSION == 3 \&\& PY_MINOR_VERSION > 6|' torch/csrc/dynamo/eval_frame.c
+ sed -i 's|CMAKE_CXX_STANDARD 14|CMAKE_CXX_STANDARD 17|' CMakeLists.txt
+ sed -i -e 's|torch_cpu PUBLIC c10|torch_cpu PUBLIC c10 qnnpack gloo gloo_cuda|' caffe2/CMakeLists.txt
+ sed -i -e 's|USE_SYSTEM_BIND11|USE_SYSTEM_PYBIND11|g' cmake/Dependencies.cmake
+ rm -rf 'third_party/pthreadpool/*'
+ touch third_party/pthreadpool/CMakeLists.txt
+ sed -i -e 's|NAMES openblas|NAMES openblaso openblas|' cmake/Modules/FindOpenBLAS.cmake
+ sed -i -e 's|USE_ZSTD|NOT_USE_ZSTD|g' cmake/Dependencies.cmake
+ sed -i -e 's|add_subdirectory(zstd)|list(APPEND Caffe2_PUBLIC_DEPENDENCY_LIBS zstd)|g' caffe2/share/contrib/CMakeLists.txt
+ sed -i -e 's|Caffe2_DEPENDENCY_LIBS onnx_proto onnx|Caffe2_DEPENDENCY_LIBS onnx_proto onnx onnx_optimizer|' cmake/Dependencies.cmake
+ mkdir -p third_party/tensorpipe
+ echo ''
+ sed -i '/add_dependencies(tensorpipe_agent tensorpipe)/d' caffe2/CMakeLists.txt
+ echo ''
+ echo 'set(NNPACK_FOUND TRUE)'
+ sed -i '/TARGET cpuinfo PROPERTY/d' cmake/Dependencies.cmake
+ sed -i '/APPEND Caffe2_DEPENDENCY_LIBS fp16/d' cmake/Dependencies.cmake
+ mkdir -p third_party/QNNPACK
+ echo ''
+ sed -i '/TARGET qnnpack PROPERTY/d' cmake/Dependencies.cmake
+ sed -i -e '/target_compile_options(qnnpack/d' cmake/Dependencies.cmake
+ mkdir -p third_party/psimd
+ echo ''
+ sed -i '/pytorch_qnnpack PRIVATE psimd/d' aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt
+ sed -i '/NOT TARGET fxdiv/,/endif/d' caffe2/CMakeLists.txt
+ sed -i '/torch_cpu PRIVATE fxdiv/d' caffe2/CMakeLists.txt
+ sed -i '/pytorch_qnnpack PRIVATE fxdiv/d' aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt
+ mkdir -p third_party/fbgemm
+ echo ''
+ sed -i '/(TARGET fbgemm/d' cmake/Dependencies.cmake
+ sed -i 's|caffe2_fakelowp_ops fbgemm cpuinfo|caffe2_fakelowp_ops|' caffe2/contrib/fakelowp/CMakeLists.txt
+ sed -i 's|caffe2_dnnlowp_avx2_ops fbgemm|caffe2_dnnlowp_avx2_ops|' caffe2/quantization/server/CMakeLists.txt
+ mkdir -p third_party/foxi
+ echo ''
+ sed -i
'/if(NOT TARGET kineto)/,/endif()/d' cmake/Dependencies.cmake + sed -i 's|libkineto/include|libkineto/include\n/usr/include/kineto|' torch/CMakeLists.txt + sed -i 's|libkineto/include|libkineto/include\n/usr/include/kineto|' caffe2/CMakeLists.txt + mkdir -p third_party/onnx-tensorrt + echo '' + sed -i /nvonnxparser_static/d cmake/Dependencies.cmake + sed -i 's|onnx_trt_library|nvonnxparser_static|g' cmake/Dependencies.cmake + rm -rf torch/csrc/jit/serialization/mobile_bytecode_generated.h + flatc --cpp --gen-mutable --scoped-enums -o torch/csrc/jit/serialization -c torch/csrc/jit/serialization/mobile_bytecode.fbs + echo '// @generated' + sed -i '/find_package(RocksDB CONFIG)/d' modules/rocksdb/CMakeLists.txt + sed -i 's|RocksDB::rocksdb|RocksDB::rocksdb-shared|' modules/rocksdb/CMakeLists.txt + mv -f cmake/Modules_CUDA_fix/FindCUDNN.cmake cmake/Modules + rm -rf cmake/Modules_CUDA_fix + find . -type d -name FindCUDA -exec rm -rf '{}' ';' + sed -i -e '/install/{:a;/COMPONENT/bb;N;ba;:b;/Modules_CUDA_fix/d;}' CMakeLists.txt + sed -i -e 's|CMAKE_CUDA_FLAGS "-D|CMAKE_CUDA_FLAGS " -D|' CMakeLists.txt + sed -i '/install(EXPORT Caffe2Targets/,/dev)/d' CMakeLists.txt + sed -i 's|SYSTEM ||g' c10/CMakeLists.txt + sed -i 's|SYSTEM ||g' torch/CMakeLists.txt + sed -i 's|SYSTEM ||g' caffe2/CMakeLists.txt + sed -i 's|BEFORE SYSTEM ||g' cmake/ProtoBuf.cmake + sed -i 's|AFTER SYSTEM ||g' cmake/Dependencies.cmake + sed -i 's|BEFORE SYSTEM ||g' cmake/Dependencies.cmake + sed -i 's|SYSTEM ||g' cmake/Dependencies.cmake + sed -i '1i #include ' c10/util/Registry.h + sed -i '1i #include ' c10/core/DispatchKey.h + sed -i '1i #include ' torch/csrc/jit/runtime/logging.cpp + sed -i '1i #include ' torch/csrc/lazy/core/multi_wait.cpp + sed -i '1i #include "stdint.h"' torch/csrc/jit/passes/quantization/quantization_type.h + RPM_EC=0 ++ jobs -p + exit 0 Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.PJ6aI0 + umask 022 + cd /builddir/build/BUILD + CFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 ' + export CFLAGS + CXXFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 ' + export CXXFLAGS + FFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules ' + export FFLAGS + FCFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 
-Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules ' + export FCFLAGS + VALAFLAGS=-g + export VALAFLAGS + RUSTFLAGS='-Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Cstrip=none -Cforce-frame-pointers=yes -Clink-arg=-specs=/usr/lib/rpm/redhat/redhat-package-notes --cap-lints=warn' + export RUSTFLAGS + LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -Wl,-lstdc++' + export LDFLAGS + LT_SYS_LIBRARY_PATH=/usr/lib64: + export LT_SYS_LIBRARY_PATH + CC=gcc + export CC + CXX=g++ + export CXX + cd pytorch + mkdir build + pushd build ~/build/BUILD/pytorch/build ~/build/BUILD/pytorch + export ONNX_ML=0 + ONNX_ML=0 + export BUILD_SPLIT_CUDA=ON + BUILD_SPLIT_CUDA=ON + export REL_WITH_DEB_INFO=1 + REL_WITH_DEB_INFO=1 + export TORCH_NVCC_FLAGS=-DCUDA_HAS_FP16 + TORCH_NVCC_FLAGS=-DCUDA_HAS_FP16 + export PYTHON_EXECUTABLE=/usr/bin/python3 + PYTHON_EXECUTABLE=/usr/bin/python3 + export LDFLAGS=-Wl,-lstdc++ + LDFLAGS=-Wl,-lstdc++ + export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64/ + LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64/ + CFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 ' + export CFLAGS + CXXFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 ' + export CXXFLAGS + FFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules ' + export FFLAGS + FCFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules ' + export FCFLAGS + VALAFLAGS=-g + export VALAFLAGS + RUSTFLAGS='-Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Cstrip=none -Cforce-frame-pointers=yes -Clink-arg=-specs=/usr/lib/rpm/redhat/redhat-package-notes 
--cap-lints=warn' + export RUSTFLAGS + LDFLAGS=-Wl,-lstdc++ + export LDFLAGS + LT_SYS_LIBRARY_PATH=/usr/lib64: + export LT_SYS_LIBRARY_PATH + CC=gcc + export CC + CXX=g++ + export CXX + /usr/bin/cmake -DCMAKE_C_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_Fortran_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_INSTALL_DO_STRIP:BOOL=OFF -DCMAKE_INSTALL_PREFIX:PATH=/usr -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS:BOOL=ON .. -Wno-dev -DCMAKE_SKIP_RPATH=ON -DCMAKE_VERBOSE_MAKEFILE=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_NO_SYSTEM_FROM_IMPORTED=ON -DCMAKE_SKIP_RULE_DEPENDENCY=ON -DCMAKE_SUPPRESS_REGENERATION=ON -DUSE_CCACHE=OFF -DHAVE_SOVERSION=ON -DUSE_NATIVE_ARCH=OFF -DUSE_DISTRIBUTED=ON -DBUILD_DOCS=OFF -DBUILD_PYTHON=ON -DBUILD_FUNCTORCH=ON -DBUILD_CAFFE2=OFF -DBUILD_BINARY=OFF -DBUILD_BENCHMARK=OFF -DBUILD_CUSTOM_PROTOBUF=OFF -DBUILDING_WITH_TORCH_LIBS=ON -DPYTHON_EXECUTABLE=/usr/bin/python3 -DPYBIND11_PYTHON_VERSION=3.12 -DCAFFE2_LINK_LOCAL_PROTOBUF=OFF -DONNX_ML=OFF -DUSE_GLOG=ON -DUSE_GFLAGS=ON -DUSE_OPENMP=ON -DUSE_KINETO=ON -DUSE_BREAKPAD=OFF -DUSE_SYSTEM_ONNX=ON -DUSE_SYSTEM_GLOO=ON -DUSE_SYSTEM_PYBIND11=ON -DUSE_SYSTEM_EIGEN_INSTALL=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_NVRTC=ON -DUSE_CUPTI_SO=ON -DUSE_FAST_NVCC=ON -DUSE_SYSTEM_NCCL=ON -DCMAKE_CUDA_FLAGS=-fPIC -DCUDA_PROPAGATE_HOST_FLAGS=OFF '-DTORCH_CUDA_ARCH_LIST=5.2+PTX 6.1 7.5 8.6 8.9 9.0' -DCUDA_HOST_COMPILER=/usr/bin/cuda-g++ -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/cuda-g++ -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.3 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.3/bin/nvcc '-DCUDA_NVCC_FLAGS=--compiler-options;-fPIC;-Wno-deprecated-gpu-targets;-allow-unsupported-compiler;--fatbin-options;-compress-all' '-DCMAKE_CUDA_FLAGS=--compiler-options -fPIC -Wno-deprecated-gpu-targets -allow-unsupported-compiler --fatbin-options -compress-all' -DNCCL_INCLUDE_DIR=/usr/include/nccl -DUSE_MAGMA=ON -DBUILD_SPLIT_CUDA=ON -DUSE_TENSORRT=OFF -DBLAS=OpenBLAS -DUSE_MPI=OFF -DUSE_OBSERVERS=OFF -DUSE_ASAN=OFF -DUSE_ROCM=OFF -DUSE_MKLDNN=OFF -DUSE_FBGEMM=ON -DUSE_NNPACK=ON -DUSE_QNNPACK=ON -DUSE_PYTORCH_QNNPACK=ON -DUSE_SYSTEM_FP16=ON -DUSE_SYSTEM_PSIMD=ON -DUSE_SYSTEM_SLEEF=ON -DUSE_SYSTEM_FXDIV=ON -DUSE_SYSTEM_XNNPACK=OFF -DUSE_SYSTEM_CPUINFO=ON -DUSE_SYSTEM_PTHREADPOOL=ON -DUSE_TENSORPIPE=ON -DUSE_FAKELOWP=OFF -DUSE_OPENCL=OFF -DUSE_GLOO=ON -DUSE_ZMQ=ON -DUSE_ZSTD=ON -DUSE_LMDB=ON -DUSE_REDIS=ON -DUSE_LEVELDB=ON -DUSE_ROCKSDB=ON -DUSE_FFMPEG=OFF -DUSE_OPENCV=ON -DUSE_METAL=OFF -DUSE_TBB=OFF -DUSE_LLVM=OFF -DATEN_NO_TEST=ON -- The CXX compiler identification is GNU 14.0.1 -- The C compiler identification is GNU 14.0.1 -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/g++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/gcc - skipped -- Detecting C compile features -- Detecting C compile features - done -- /usr/bin/g++ /builddir/build/BUILD/pytorch/torch/abi-check.cpp -o /builddir/build/BUILD/pytorch/build/abi-check -- Determined _GLIBCXX_USE_CXX11_ABI=1 -- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING -- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING - Failed -- Turning off deprecation warning due to glog. 
-- Performing Test C_HAS_AVX_1 -- Performing Test C_HAS_AVX_1 - Failed -- Performing Test C_HAS_AVX_2 -- Performing Test C_HAS_AVX_2 - Success -- Performing Test C_HAS_AVX2_1 -- Performing Test C_HAS_AVX2_1 - Failed -- Performing Test C_HAS_AVX2_2 -- Performing Test C_HAS_AVX2_2 - Success -- Performing Test C_HAS_AVX512_1 -- Performing Test C_HAS_AVX512_1 - Failed -- Performing Test C_HAS_AVX512_2 -- Performing Test C_HAS_AVX512_2 - Success -- Performing Test CXX_HAS_AVX_1 -- Performing Test CXX_HAS_AVX_1 - Failed -- Performing Test CXX_HAS_AVX_2 -- Performing Test CXX_HAS_AVX_2 - Success -- Performing Test CXX_HAS_AVX2_1 -- Performing Test CXX_HAS_AVX2_1 - Failed -- Performing Test CXX_HAS_AVX2_2 -- Performing Test CXX_HAS_AVX2_2 - Success -- Performing Test CXX_HAS_AVX512_1 -- Performing Test CXX_HAS_AVX512_1 - Failed -- Performing Test CXX_HAS_AVX512_2 -- Performing Test CXX_HAS_AVX512_2 - Success -- Current compiler supports avx2 extension. Will build perfkernels. -- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS -- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Success -- Current compiler supports avx512f extension. Will build fbgemm. -- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY -- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Success -- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY -- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Success -- Performing Test COMPILER_SUPPORTS_RDYNAMIC -- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Success -- Found CUDA: /usr/local/cuda-12.3 (found version "12.3") -- The CUDA compiler identification is NVIDIA 12.3.107 -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /usr/local/cuda-12.3/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- Found CUDAToolkit: /usr/local/cuda-12.3/include (found version "12.3.107") -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success -- Found Threads: TRUE -- Caffe2: CUDA detected: 12.3 -- Caffe2: CUDA nvcc is: /usr/local/cuda-12.3/bin/nvcc -- Caffe2: CUDA toolkit directory: /usr/local/cuda-12.3 -- Caffe2: Header version is: 12.3 -- /usr/local/cuda-12.3/lib64/libnvrtc.so shorthash is e150bf88 -- Found CUDNN: /usr/lib64/libcudnn.so -- Could NOT find CUSPARSELT (missing: CUSPARSELT_LIBRARY_PATH CUSPARSELT_INCLUDE_PATH) CMake Warning at cmake/public/cuda.cmake:275 (message): Cannot find cuSPARSELt library. Turning the option off Call Stack (most recent call first): cmake/Dependencies.cmake:44 (include) CMakeLists.txt:760 (include) -- Added CUDA NVCC flags for: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_52,code=compute_52 -- Caffe2: Found protobuf with new-style protobuf targets. -- Caffe2 protobuf include directory: /usr/include -- Trying to find preferred BLAS backend of choice: OpenBLAS -- Found OpenBLAS libraries: /usr/lib64/libopenblaso.so -- Found OpenBLAS include: /usr/include/openblas -- Using pocketfft in directory: /builddir/build/BUILD/pytorch/third_party/pocketfft/ -- Found pthreadpool: /usr/lib64/libpthreadpool.so Found cpuinfo: /usr/lib64/libcpuinfo.so -- The ASM compiler identification is GNU -- Found assembler: /usr/bin/gcc -- Caffe2: Found gflags with new-style gflags target. 
-- Caffe2: Cannot find glog automatically. Using legacy find.
-- Found glog: /usr/include
-- Caffe2: Found glog (include: /usr/include, library: /usr/lib64/libglog.so)
-- Found LMDB: /usr/include
-- Found lmdb (include: /usr/include, library: /usr/lib64/liblmdb.so)
-- Found LevelDB: /usr/include
-- Found LevelDB (include: /usr/include, library: /usr/lib64/libleveldb.so)
-- Found Snappy: /usr/include
-- Found Snappy (include: /usr/include, library: /usr/lib64/libsnappy.so)
-- Found Numa: /usr/include
-- Found Numa (include: /usr/include, library: /usr/lib64/libnuma.so)
-- Found ZMQ: /usr/include
-- Found ZMQ (include: /usr/include, library: /usr/lib64/libzmq.so)
-- Found Hiredis: /usr/include
-- Found Hiredis (include: /usr/include, library: /usr/lib64/libhiredis.so)
-- OpenCV found (/usr/lib64/cmake/opencv4)
-- Found system Eigen at /usr/include/eigen3
-- Setting Python's include dir to /usr/include/python3.12 from sysconfig
-- Setting Python's library to /usr/lib64/python3.12
-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.12.2", minimum required is "3.0")
-- Found PythonLibs: /usr/lib64/python3.12 (found suitable version "3.12.2", minimum required is "3.0")
-- Found NumPy: /usr/lib64/python3.12/site-packages/numpy/core/include (found version "1.26.4")
-- NumPy ver. 1.26.4 found (include: /usr/lib64/python3.12/site-packages/numpy/core/include)
-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.12.2", minimum required is "3.12")
-- Found PythonLibs: /usr/lib64/python3.12
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found pybind11: /usr/include (found version "2.12.0")
-- pybind11 include dirs: /usr/include;/usr/include/python3.12
-- Check OMP with lib /usr/lib/gcc/x86_64-redhat-linux/14/libgomp.so and flags -fopenmp -v
-- Check OMP with lib /usr/lib/gcc/x86_64-redhat-linux/14/libgomp.so and flags -fopenmp -v
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Adding OpenMP CXX_FLAGS: -fopenmp
-- Will link against OpenMP libraries: /usr/lib/gcc/x86_64-redhat-linux/14/libgomp.so
-- Found NCCL: /usr/include
-- Determining NCCL version from /usr/include/nccl.h...
-- Looking for NCCL_VERSION_CODE
-- Looking for NCCL_VERSION_CODE - not found
-- NCCL version < 2.3.5-5
-- Found NCCL (include: /usr/include, library: /usr/lib64/libnccl.so)
-- Found CUB: /usr/local/cuda-12.3/include
-- Converting CMAKE_CUDA_FLAGS to CUDA_NVCC_FLAGS:
   CUDA_NVCC_FLAGS = --compiler-options;-fPIC;-Wno-deprecated-gpu-targets;-allow-unsupported-compiler;--fatbin-options;-compress-all;-DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS;-D_GLIBCXX_USE_CXX11_ABI=1;-Xfatbin;-compress-all;--compiler-options;-fPIC;-Wno-deprecated-gpu-targets;-allow-unsupported-compiler;--fatbin-options;-compress-all;-DONNX_NAMESPACE=onnx;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_52,code=compute_52;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl;--expt-relaxed-constexpr;--expt-extended-lambda
   CUDA_NVCC_FLAGS_DEBUG = -g
   CUDA_NVCC_FLAGS_RELEASE = -O3;-DNDEBUG
   CUDA_NVCC_FLAGS_RELWITHDEBINFO = -O2;-g;-DNDEBUG
   CUDA_NVCC_FLAGS_MINSIZEREL = -O1;-DNDEBUG
Found gloo: /usr/lib64/libgloo.so
-- Found onnx: /usr/lib64/libonnx.so /usr/lib64/libonnx_proto.so
-- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor
-- Adding -DNDEBUG to compile flags
-- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2
-- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 - False
-- Compiling with MAGMA support
-- MAGMA INCLUDE DIRECTORIES: /usr/include
-- MAGMA LIBRARIES: /usr/lib64/libmagma.so
-- MAGMA V2 check: 0
-- Could not find hardware support for NEON on this machine.
-- No OMAP3 processor on this machine.
-- No OMAP4 processor on this machine.
-- Looking for cheev_
-- Looking for cheev_ - found
-- Looking for cgesdd_
-- Looking for cgesdd_ - found
-- Found a library with LAPACK API (open).
disabling ROCM because NOT USE_ROCM is set
-- MIOpen not found.
Compiling without MIOpen support disabling MKLDNN because USE_MKLDNN is not set -- Looking for clock_gettime in rt -- Looking for clock_gettime in rt - found -- Looking for mmap -- Looking for mmap - found -- Looking for shm_open -- Looking for shm_open - found -- Looking for shm_unlink -- Looking for shm_unlink - found -- Looking for malloc_usable_size -- Looking for malloc_usable_size - found -- -- check z16 -- Performing Test COMPILE_OUT_z16 -- Performing Test COMPILE_OUT_z16 - Failed -- Performing Test COMPILE_OUT_z15 -- check z15 -- Performing Test COMPILE_OUT_z15 - Failed -- Performing Test COMPILE_OUT_z14 -- check z14 -- Performing Test COMPILE_OUT_z14 - Failed -- -- Version: 10.2.1 -- Build type: Release -- Using Kineto with CUPTI support -- Configuring Kineto dependency: -- KINETO_SOURCE_DIR = /builddir/build/BUILD/pytorch/third_party/kineto/libkineto -- KINETO_BUILD_TESTS = OFF -- KINETO_LIBRARY_TYPE = static -- CUDA_SOURCE_DIR = /usr/local/cuda-12.3 -- CUDA_INCLUDE_DIRS = /usr/local/cuda-12.3/include -- CUPTI_INCLUDE_DIR = /usr/local/cuda-12.3/include -- CUDA_cupti_LIBRARY = /usr/local/cuda-12.3/lib64/libcupti.so -- Found CUPTI -- Configured Kineto -- GCC 14.0.1: Adding gcc and gcc_s libs to link line -- Performing Test HAS_WERROR_RETURN_TYPE -- Performing Test HAS_WERROR_RETURN_TYPE - Success -- Performing Test HAS_WERROR_NON_VIRTUAL_DTOR -- Performing Test HAS_WERROR_NON_VIRTUAL_DTOR - Success -- Performing Test HAS_WERROR_BRACED_SCALAR_INIT -- Performing Test HAS_WERROR_BRACED_SCALAR_INIT - Failed -- Performing Test HAS_WERROR_RANGE_LOOP_CONSTRUCT -- Performing Test HAS_WERROR_RANGE_LOOP_CONSTRUCT - Success -- Performing Test HAS_WERROR_BOOL_OPERATION -- Performing Test HAS_WERROR_BOOL_OPERATION - Success -- Performing Test HAS_WNARROWING -- Performing Test HAS_WNARROWING - Success -- Performing Test HAS_WNO_MISSING_FIELD_INITIALIZERS -- Performing Test HAS_WNO_MISSING_FIELD_INITIALIZERS - Success -- Performing Test HAS_WNO_TYPE_LIMITS -- Performing Test HAS_WNO_TYPE_LIMITS - Success -- Performing Test HAS_WNO_ARRAY_BOUNDS -- Performing Test HAS_WNO_ARRAY_BOUNDS - Success -- Performing Test HAS_WNO_UNKNOWN_PRAGMAS -- Performing Test HAS_WNO_UNKNOWN_PRAGMAS - Success -- Performing Test HAS_WNO_UNUSED_PARAMETER -- Performing Test HAS_WNO_UNUSED_PARAMETER - Success -- Performing Test HAS_WNO_UNUSED_FUNCTION -- Performing Test HAS_WNO_UNUSED_FUNCTION - Success -- Performing Test HAS_WNO_UNUSED_RESULT -- Performing Test HAS_WNO_UNUSED_RESULT - Success -- Performing Test HAS_WNO_STRICT_OVERFLOW -- Performing Test HAS_WNO_STRICT_OVERFLOW - Success -- Performing Test HAS_WNO_STRICT_ALIASING -- Performing Test HAS_WNO_STRICT_ALIASING - Success -- Performing Test HAS_WNO_STRINGOP_OVERFLOW -- Performing Test HAS_WNO_STRINGOP_OVERFLOW - Success -- Performing Test HAS_WVLA_EXTENSION -- Performing Test HAS_WVLA_EXTENSION - Failed -- Performing Test HAS_WSUGGEST_OVERRIDE -- Performing Test HAS_WSUGGEST_OVERRIDE - Success -- Performing Test HAS_WNEWLINE_EOF -- Performing Test HAS_WNEWLINE_EOF - Failed -- Performing Test HAS_WINCONSISTENT_MISSING_OVERRIDE -- Performing Test HAS_WINCONSISTENT_MISSING_OVERRIDE - Failed -- Performing Test HAS_WINCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE -- Performing Test HAS_WINCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE - Failed -- Performing Test HAS_WNO_ERROR_PEDANTIC -- Performing Test HAS_WNO_ERROR_PEDANTIC - Success -- Performing Test HAS_WNO_ERROR_OLD_STYLE_CAST -- Performing Test HAS_WNO_ERROR_OLD_STYLE_CAST - Success -- Performing Test 
HAS_WNO_ERROR_INCONSISTENT_MISSING_OVERRIDE -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_OVERRIDE - Failed -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE -- Performing Test HAS_WNO_ERROR_INCONSISTENT_MISSING_DESTRUCTOR_OVERRIDE - Failed -- Performing Test HAS_WCONSTANT_CONVERSION -- Performing Test HAS_WCONSTANT_CONVERSION - Failed -- Performing Test HAS_WNO_INVALID_PARTIAL_SPECIALIZATION -- Performing Test HAS_WNO_INVALID_PARTIAL_SPECIALIZATION - Failed -- Performing Test HAS_WNO_ALIGNED_ALLOCATION_UNAVAILABLE -- Performing Test HAS_WNO_ALIGNED_ALLOCATION_UNAVAILABLE - Failed -- Performing Test HAS_WNO_MISSING_BRACES -- Performing Test HAS_WNO_MISSING_BRACES - Success -- Performing Test HAS_QUNUSED_ARGUMENTS -- Performing Test HAS_QUNUSED_ARGUMENTS - Failed -- Performing Test HAS_FDIAGNOSTICS_COLOR_ALWAYS -- Performing Test HAS_FDIAGNOSTICS_COLOR_ALWAYS - Success -- Performing Test HAS_FALIGNED_NEW -- Performing Test HAS_FALIGNED_NEW - Success -- Performing Test HAS_WNO_UNUSED_BUT_SET_VARIABLE -- Performing Test HAS_WNO_UNUSED_BUT_SET_VARIABLE - Success -- Performing Test HAS_WNO_MAYBE_UNINITIALIZED -- Performing Test HAS_WNO_MAYBE_UNINITIALIZED - Success -- Performing Test HAS_FSTANDALONE_DEBUG -- Performing Test HAS_FSTANDALONE_DEBUG - Failed -- Performing Test HAS_FNO_MATH_ERRNO -- Performing Test HAS_FNO_MATH_ERRNO - Success -- Performing Test HAS_FNO_TRAPPING_MATH -- Performing Test HAS_FNO_TRAPPING_MATH - Success -- Performing Test HAS_WERROR_FORMAT -- Performing Test HAS_WERROR_FORMAT - Success -- Performing Test HAS_WDEPRECATED -- Performing Test HAS_WDEPRECATED - Success -- NUMA paths: -- /usr/include -- /usr/lib64/libnuma.so -- Looking for backtrace -- Looking for backtrace - found -- backtrace facility detected in default set of libraries -- Found Backtrace: /usr/include -- headers outputs: -- sources outputs: -- declarations_yaml outputs: -- Performing Test COMPILER_SUPPORTS_NO_AVX256_SPLIT -- Performing Test COMPILER_SUPPORTS_NO_AVX256_SPLIT - Success -- Using ATen parallel backend: OMP Found sleef: /usr/lib64/libsleef.so AT_INSTALL_INCLUDE_DIR include/ATen/core core header install: /builddir/build/BUILD/pytorch/build/aten/src/ATen/core/TensorBody.h core header install: /builddir/build/BUILD/pytorch/build/aten/src/ATen/core/aten_interned_strings.h core header install: /builddir/build/BUILD/pytorch/build/aten/src/ATen/core/enum_tag.h disable test because ATEN_NO_TEST is set -- Performing Test HAS_WNO_DEPRECATED_COPY -- Performing Test HAS_WNO_DEPRECATED_COPY - Success -- _GLIBCXX_USE_CXX11_ABI=1 is already defined as a cmake variable -- Using lib/python3.12/site-packages as python relative installation path -- -- ******** Summary ******** -- General: -- CMake version : 3.28.3 -- CMake command : /usr/bin/cmake -- System : Linux -- C++ compiler : /usr/bin/g++ -- C++ compiler id : GNU -- C++ compiler version : 14.0.1 -- Using ccache if found : OFF -- CXX flags : -O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DTMP_LIBKINETO_NANOSECOND 
-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -- Build type : Release -- Compile definitions : ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS;FLASHATTENTION_DISABLE_ALIBI -- CMAKE_PREFIX_PATH : /usr/local/cuda-12.3;/usr/local/cuda-12.3;/usr/local/cuda-12.3 -- CMAKE_INSTALL_PREFIX : /usr -- USE_GOLD_LINKER : OFF -- -- TORCH_VERSION : 2.4.0 -- BUILD_CAFFE2 : OFF -- BUILD_CAFFE2_OPS : OFF -- BUILD_STATIC_RUNTIME_BENCHMARK: OFF -- BUILD_BINARY : OFF -- BUILD_CUSTOM_PROTOBUF : OFF -- Protobuf compiler : /usr/bin/protoc -- Protobuf includes : /usr/include -- Protobuf libraries : /usr/lib64/libprotobuf.so -- BUILD_DOCS : OFF -- BUILD_PYTHON : ON -- Python version : 3.12.2 -- Python executable : /usr/bin/python3 -- Pythonlibs version : 3.12.2 -- Python library : /usr/lib64/python3.12 -- Python includes : /usr/include/python3.12 -- Python site-packages: lib/python3.12/site-packages -- BUILD_SHARED_LIBS : ON -- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF -- BUILD_TEST : OFF -- BUILD_JNI : OFF -- BUILD_MOBILE_AUTOGRAD : OFF -- BUILD_LITE_INTERPRETER: OFF -- INTERN_BUILD_MOBILE : -- TRACING_BASED : OFF -- USE_BLAS : 1 -- BLAS : open -- BLAS_HAS_SBGEMM : -- USE_LAPACK : 1 -- LAPACK : open -- USE_ASAN : OFF -- USE_TSAN : OFF -- USE_CPP_CODE_COVERAGE : OFF -- USE_CUDA : ON -- Split CUDA : ON -- CUDA static link : OFF -- USE_CUDNN : ON -- USE_EXPERIMENTAL_CUDNN_V8_API: -- USE_CUSPARSELT : OFF -- CUDA version : 12.3 -- USE_FLASH_ATTENTION : ON -- USE_MEM_EFF_ATTENTION : ON -- cuDNN version : 8.9.7 -- CUDA root directory : /usr/local/cuda-12.3 -- CUDA library : /usr/local/cuda-12.3/lib64/stubs/libcuda.so -- cudart library : /usr/local/cuda-12.3/lib64/libcudart.so -- cublas library : /usr/local/cuda-12.3/lib64/libcublas.so -- cufft library : /usr/local/cuda-12.3/lib64/libcufft.so -- curand library : /usr/local/cuda-12.3/lib64/libcurand.so -- cusparse library : /usr/local/cuda-12.3/lib64/libcusparse.so -- cuDNN library : /usr/lib64/libcudnn.so -- nvrtc : /usr/local/cuda-12.3/lib64/libnvrtc.so -- CUDA include path : /usr/local/cuda-12.3/include -- NVCC executable : /usr/local/cuda-12.3/bin/nvcc -- CUDA compiler : /usr/local/cuda-12.3/bin/nvcc -- CUDA flags : --compiler-options -fPIC -Wno-deprecated-gpu-targets -allow-unsupported-compiler --fatbin-options -compress-all -DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all --compiler-options -fPIC -Wno-deprecated-gpu-targets -allow-unsupported-compiler --fatbin-options -compress-all -DONNX_NAMESPACE=onnx -gencode arch=compute_52,code=sm_52 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_89,code=sm_89 -gencode 
arch=compute_90,code=sm_90 -gencode arch=compute_52,code=compute_52 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -DCUDA_HAS_FP16 -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -- CUDA host compiler : /usr/bin/cuda-g++ -- CUDA --device-c : OFF -- USE_TENSORRT : OFF -- USE_XPU : OFF -- USE_ROCM : OFF -- BUILD_NVFUSER : -- USE_EIGEN_FOR_BLAS : -- USE_FBGEMM : ON -- USE_FAKELOWP : OFF -- USE_KINETO : ON -- USE_FFMPEG : OFF -- USE_GFLAGS : ON -- USE_GLOG : ON -- USE_LEVELDB : ON -- LevelDB version : 1.23 -- Snappy version : 1.1.10 -- USE_LITE_PROTO : OFF -- USE_LMDB : ON -- LMDB version : 0.9.32 -- USE_METAL : OFF -- USE_PYTORCH_METAL : OFF -- USE_PYTORCH_METAL_EXPORT : OFF -- USE_MPS : OFF -- USE_MKL : -- USE_MKLDNN : OFF -- USE_UCC : OFF -- USE_ITT : ON -- USE_NCCL : ON -- USE_SYSTEM_NCCL : ON -- USE_NNPACK : ON -- USE_NUMPY : ON -- USE_OBSERVERS : ON -- USE_OPENCL : OFF -- USE_OPENCV : ON -- OpenCV version : 4.9.0 -- USE_OPENMP : ON -- USE_TBB : OFF -- USE_MIMALLOC : OFF -- USE_VULKAN : OFF -- USE_PROF : OFF -- USE_QNNPACK : ON -- USE_PYTORCH_QNNPACK : ON -- USE_XNNPACK : ON -- USE_REDIS : ON -- USE_ROCKSDB : ON -- USE_ZMQ : ON -- USE_DISTRIBUTED : ON -- USE_MPI : OFF -- USE_GLOO : ON -- USE_GLOO_WITH_OPENSSL : OFF -- USE_TENSORPIPE : ON -- Public Dependencies : -- Private Dependencies : Threads::Threads;/usr/lib64/libopenblaso.so;pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;XNNPACK;fbgemm;/usr/lib64/liblmdb.so;/usr/lib64/libleveldb.so;/usr/lib64/libsnappy.so;/usr/lib64/libzmq.so;/usr/lib64/libhiredis.so;opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs;opencv_optflow;opencv_videoio;opencv_video;ittnotify;caffe2::openmp;tensorpipe;gloo;onnx_proto;onnx;onnx_optimizer;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl -- Public CUDA Deps. : caffe2::cuda;caffe2::nvrtc -- Private CUDA Deps. : caffe2::curand;caffe2::cufft;caffe2::cublas;torch::cudnn;__caffe2_nccl;tensorpipe_cuda;gloo_cuda;/usr/local/cuda-12.3/lib64/libcudart.so;CUDA::cusparse;CUDA::cufft;ATEN_CUDA_FILES_GEN_LIB -- USE_COREML_DELEGATE : OFF -- BUILD_LAZY_TS_BACKEND : ON -- USE_ROCM_KERNEL_ASSERT : OFF -- Performing Test HAS_WMISSING_PROTOTYPES -- Performing Test HAS_WMISSING_PROTOTYPES - Success -- Performing Test HAS_WERROR_MISSING_PROTOTYPES -- Performing Test HAS_WERROR_MISSING_PROTOTYPES - Success -- Configuring done (19.1s) CMake Warning at torch/CMakeLists.txt:282 (target_link_libraries): Target "_C" requests linking to directory "/usr/lib64/python3.12". Targets may link only to libraries. CMake is dropping the item. 
-- Generating done (0.7s) CMake Warning: Manually-specified variables were not used by the project: CMAKE_Fortran_FLAGS_RELEASE CMAKE_INSTALL_DO_STRIP INCLUDE_INSTALL_DIR LIB_INSTALL_DIR LIB_SUFFIX SHARE_INSTALL_PREFIX SYSCONF_INSTALL_DIR USE_BREAKPAD USE_FAST_NVCC -- Build files have been written to: /builddir/build/BUILD/pytorch/build + make -j4 [ 0%] Linking C static library ../../lib/libfp16.a [ 0%] Building C object confu-deps/clog/CMakeFiles/clog.dir/src/clog.c.o [ 0%] Linking C static library ../../lib/libfxdiv.a [ 0%] Linking C static library ../../lib/libpsimd.a [ 0%] Built target fxdiv [ 0%] Built target psimd [ 0%] Built target fp16 [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/datatype-strings.c.o [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/normalization.dir/src/normalization.c.o [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/allocator.dir/src/allocator.c.o [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/microkernel-type.c.o [ 0%] Built target allocator [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/node-type.c.o [ 0%] Linking C static library ../../lib/libclog.a [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernel-utils.dir/src/microkernel-utils.c.o [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/enums/operator-type.c.o [ 0%] Built target clog [ 0%] Building C object confu-deps/XNNPACK/CMakeFiles/logging.dir/src/log.c.o [ 0%] Building CXX object confu-deps/XNNPACK/CMakeFiles/convolution-test-helpers.dir/test/convolution-test-helpers.cc.o [ 0%] Built target microkernel-utils [ 0%] Building C object third_party/ittapi/CMakeFiles/ittnotify.dir/src/ittnotify/ittnotify_static.c.o [ 0%] Built target logging [ 0%] Built target normalization [ 0%] Building CXX object third_party/fmt/CMakeFiles/fmt.dir/src/format.cc.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/Allocator.cpp.o [ 0%] Built target convolution-test-helpers [ 0%] Running C++/Python protocol buffer compiler on /builddir/build/BUILD/pytorch/caffe2/proto/torch.proto [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/AutogradState.cpp.o [ 0%] Running C++/Python protocol buffer compiler on /builddir/build/BUILD/pytorch/caffe2/proto/caffe2.proto [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/CPUAllocator.cpp.o [ 0%] Building CXX object caffe2/proto/CMakeFiles/Caffe2_PROTO.dir/torch.pb.cc.o [ 0%] Building C object third_party/ittapi/CMakeFiles/ittnotify.dir/src/ittnotify/jitprofiling.c.o [ 0%] Linking C static library ../../lib/libittnotify.a [ 0%] Built target ittnotify [ 0%] Building CXX object caffe2/CMakeFiles/caffe2_nvrtc.dir/__/aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.cpp.o [ 0%] Linking CXX shared library ../lib/libcaffe2_nvrtc.so Warning: Unused direct dependencies: libcuda.so.1 /lib64/libm.so.6 /lib64/libgcc_s.so.1 [ 0%] Built target caffe2_nvrtc [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/ConstantSymNodeImpl.cpp.o [ 0%] Generating ATen headers [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/CopyBytes.cpp.o [ 0%] Building CXX object caffe2/proto/CMakeFiles/Caffe2_PROTO.dir/caffe2.pb.cc.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DefaultDtype.cpp.o [ 0%] Building CXX object third_party/fmt/CMakeFiles/fmt.dir/src/os.cc.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/Device.cpp.o [ 0%] Linking CXX static library ../../lib/libfmt.a [ 0%] Built target fmt [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DeviceType.cpp.o [ 0%] 
Generating ATen headers [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DispatchKey.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/DispatchKeySet.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/GeneratorImpl.cpp.o [ 0%] Built target Caffe2_PROTO [ 0%] Generating ATen sources [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/GradMode.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/InferenceMode.cpp.o [ 0%] Generating ATen sources [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/RefcountedDeleter.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/SafePyObject.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/Scalar.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/ScalarType.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/Storage.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/StorageImpl.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/Stream.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/SymBool.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/SymFloat.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/SymInt.cpp.o [ 0%] Building CXX object c10/CMakeFiles/c10.dir/core/SymIntArrayRef.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymNodeImpl.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/SymbolicShapeMeta.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/TensorImpl.cpp.o [ 1%] Generating ATen declarations_yaml [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/TensorOptions.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/UndefinedTensorImpl.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/WrapDimMinimal.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/COW.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/COWDeleter.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/DeviceGuardImplInterface.cpp.o [ 1%] Building C object caffe2/CMakeFiles/torch_global_deps.dir/__/torch/csrc/empty.c.o [ 1%] Linking C shared library ../lib/libtorch_global_deps.so Warning: Unused direct dependencies: /lib64/libstdc++.so.6 /usr/local/cuda-12.3/lib64/libnvrtc.so.12 libcuda.so.1 /usr/local/cuda-12.3/lib64/libcudart.so.12 /usr/local/cuda-12.3/lib64/libnvToolsExt.so.1 [ 1%] Built target torch_global_deps [ 1%] Built target python_copy_files [ 1%] Generating /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/Functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ViewFuncs.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_3.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_3.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/TraceType_4.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp, 
/builddir/build/BUILD/pytorch/torch/csrc/inductor/aoti_torch/generated/c_shim_cpu.cpp, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyNativeFunctions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/RegisterAutogradLazy.cpp, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/RegisterLazy.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/Functions.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/variable_factories.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/ViewFuncs.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/VariableType.h, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyIr.h, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyNonNativeIr.h, /builddir/build/BUILD/pytorch/torch/csrc/lazy/generated/LazyNativeFunctions.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_3.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions_4.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_torch_functions_0.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_torch_functions_1.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_torch_functions_2.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_nn_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_fft_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_linalg_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_nested_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_sparse_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_special_functions.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_return_types.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_enum_tag.cpp, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_functions.h, /builddir/build/BUILD/pytorch/torch/csrc/autograd/generated/python_return_types.h, /builddir/build/BUILD/pytorch/torch/testing/_internal/generated/annotated_fn_args.py, /builddir/build/BUILD/pytorch/torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/GPUTrace.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/HermeticPyObjectTLS.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/LocalDispatchKeySet.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/PyInterpreter.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/PyObjectSlot.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/PythonDispatcherTLS.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/SizesAndStrides.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/TorchDispatchModeTLS.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/impl/alloc_cpu.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/core/thread_pool.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/mobile/CPUCachingAllocator.cpp.o [ 1%] Built 
target generate-torch-sources [ 1%] Generating /builddir/build/BUILD/pytorch/torch/_C/__init__.pyi, /builddir/build/BUILD/pytorch/torch/_C/_VariableFunctions.pyi, /builddir/build/BUILD/pytorch/torch/nn/functional.pyi [ 1%] Building CXX object c10/CMakeFiles/c10.dir/mobile/CPUProfilingAllocator.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/ApproximateClock.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Backtrace.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Bfloat16.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/C++17.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/DeadlockDetection.cpp.o [ 1%] Generating /builddir/build/BUILD/pytorch/torch/utils/data/datapipes/datapipe.pyi [ 1%] Built target torch_python_stubs [ 1%] Generating /builddir/build/BUILD/pytorch/torch/version.py [ 1%] Built target gen_torch_version [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/init.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/add.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Exception.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/average-pooling.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/channel-shuffle.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/clamp.c.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/conv-prepack.cc.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/convolution.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/deconvolution.c.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-prepack.cc.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e4m3fn.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fully-connected.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fully-connected-sparse.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/global-average-pooling.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/hardsigmoid.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/hardswish.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/leaky-relu.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e4m3fnuz.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/max-pooling.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/sigmoid.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/softargmax.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/tanh.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e5m2.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/operator-delete.c.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/conv-run.cc.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/deconv-run.cc.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Float8_e5m2fnuz.cpp.o [ 1%] Building CXX object 
confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-run.cc.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Half.cpp.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-unpack.cc.o [ 1%] Building CXX object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/fc-dynamic-run.cc.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/indirection.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/LeftRight.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/operator-run.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Logging.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8lut32norm/scalar.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8lut/scalar.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/sgemm/6x8-psimd.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8avgpool/mp8x9p8q-sse2.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8avgpool/up8x9-sse2.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8avgpool/up8xm-sse2.c.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8conv/4x4c2-sse2.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/MathConstants.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/mp8x25-sse2.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Metaprogramming.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/Optional.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/mp8x25-sse2-per-channel.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/ParallelGuard.cpp.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/SmallVector.cpp.o [ 1%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/mp8x27-sse2.c.o [ 1%] Building CXX object c10/CMakeFiles/c10.dir/util/StringUtil.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/up8x9-sse2.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/ThreadLocalDebugInfo.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8dwconv/up8x9-sse2-per-channel.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gavgpool/mp8x7p7q-sse2.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gavgpool/up8x7-sse2.c.o [ 2%] Built target ATEN_CPU_FILES_GEN_TARGET [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/TypeCast.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gavgpool/up8xm-sse2.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/2x4c8-sse2.c.o [ 2%] Built target ATEN_CUDA_FILES_GEN_TARGET [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/scalar.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-samples1-scalar.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/cs16-bfly4-samples4-scalar.c.o [ 2%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/gen/cs16-bfly4-scalar-x1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/gen/cs16-bfly4-scalar-x2.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/TypeList.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/4x4c2-dq-sse2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-bfly4/gen/cs16-bfly4-scalar-x4.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/TypeTraits.cpp.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/gen/cs16-fftr-scalar-x1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/gen/cs16-fftr-scalar-x2.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Type_demangle.cpp.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-fftr/gen/cs16-fftr-scalar-x4.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/4x4c2-sse2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x3.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/cs16-vsquareabs/gen/cs16-vsquareabs-scalar-x4.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u3.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-scalar-u4.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Type_no_demangle.cpp.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u1.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/Unicode.cpp.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/UniqueVoidPtr.cpp.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u2.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm_sparse/8x4c1x4-dq-packedA-sse2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u3.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-fmagic-u4.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/complex_math.cpp.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u3.c.o [ 2%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-qs8-vcvt/gen/f16-qs8-vcvt-scalar-imagic-u4.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u2-acc2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u3-acc3.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/flags_use_gflags.cpp.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm_sparse/8x4-packA-sse2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u4-acc2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmax-scalar-u4-acc4.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u2-acc2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u3-acc3.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u4-acc2.c.o [ 2%] Building CXX object c10/CMakeFiles/c10.dir/util/flags_use_no_gflags.cpp.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rmin-scalar-u4-acc4.c.o [ 2%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8vadd/sse2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u1.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u2-acc2.c.o [ 2%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u3-acc3.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u4-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/gen/f16-rminmax-scalar-u4-acc4.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-4x-scalar-c1.c.o [ 3%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8clamp/sse2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9p8x-scalar-c1.c.o [ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/int128.cpp.o [ 3%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8maxpool/16x9p8q-sse2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9x-scalar-c1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9p8x-minmax-scalar-c1.c.o [ 3%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8maxpool/sub16-sse2.c.o [ 3%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/u8rmax/sse2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9x-minmax-scalar-c1.c.o [ 3%] Building C object 
confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/x2-sse2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc2chw/f32-conv-hwc2chw-3x3s2p1c3x4-scalar-1x1.c.o [ 3%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/x3-sse2.c.o [ 3%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/x4-sse2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/f32-conv-hwc-3x3s2p0p1c3x4-scalar-1x1.c.o [ 3%] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/xm-sse2.c.o [ 3%] Linking CXX static library ../../lib/libpytorch_qnnpack.a [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc/f32-conv-hwc-3x3s2p1c3x4-scalar-1x1.c.o [ 3%] Built target pytorch_qnnpack [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/hardware-config.dir/src/configs/hardware-config.c.o [ 3%] Built target hardware-config [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/indirection.dir/src/indirection.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1-acc3.c.o [ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/intrusive_ptr.cpp.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1-acc4.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-1x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-2x1-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-2x1.c.o [ 3%] Built target indirection [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-3x1.c.o [ 3%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/jit/aarch32-assembler.cc.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-4x1.c.o [ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/numa.cpp.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-5x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-scalar-6x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1-acc2.c.o [ 3%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/jit/aarch64-assembler.cc.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1-acc3.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1-acc4.c.o [ 3%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-1x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-2x1-acc2.c.o [ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/signal_handler.cpp.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-2x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-3x1.c.o [ 3%] Building CXX object confu-deps/XNNPACK/CMakeFiles/jit.dir/src/jit/assembler.cc.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-scalar-4x1.c.o [ 3%] Built target jit [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microparams-init.dir/src/microparams-init.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc3.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc4.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1-acc5.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-1x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-2x1-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-2x1-acc3.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-2x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-3x1-acc2.c.o [ 3%] Built target microparams-init [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/packing.dir/src/packing.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-scalar-3x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc3.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc4.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1-acc5.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-1x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-2x1-acc2.c.o [ 3%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/sse.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-2x1-acc3.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-2x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-3x1-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-scalar-3x1.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-minmax-scalar-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-minmax-scalar.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-scalar-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l1c1s1r-scalar.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-minmax-scalar-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-minmax-scalar.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-scalar-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-2f2m2l4c1s1r-scalar.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3f3m3l1c1s1r-scalar-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3f3m3l1c1s1r-scalar.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-minmax-scalar-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-minmax-scalar.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-scalar-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p1c-scalar.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-minmax-scalar-acc2.c.o [ 3%] Built target packing [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-minmax-scalar.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/memory.dir/src/memory.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-scalar-acc2.c.o [ 3%] Built target memory [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/mutex.dir/src/mutex.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p2c-scalar.c.o [ 3%] Built target mutex [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/post-operation.dir/src/operators/post-operation.c.o [ 3%] Building CXX object c10/CMakeFiles/c10.dir/util/tempfile.cpp.o [ 3%] Built target post-operation [ 3%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-minmax-scalar-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/operator-utils.dir/src/operator-utils.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-minmax-scalar.c.o [ 3%] Built target operator-utils [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/operator-run.dir/src/operator-run.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-scalar-acc2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/sse2.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p1c-scalar.c.o [ 3%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-minmax-scalar-acc2.c.o [ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-minmax-scalar.c.o [ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-scalar-acc2.c.o [ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p2c-scalar.c.o [ 4%] Built target operator-run [ 4%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-minmax-scalar-acc2.c.o [ 4%] Linking CXX static library ../lib/libcaffe2_protos.a [ 4%] Built target caffe2_protos [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/cache.dir/src/cache.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-minmax-scalar.c.o [ 5%] Built target cache [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operator-delete.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/argmax-pooling-nhwc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l1c1s1r-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/average-pooling-nhwc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-minmax-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/batch-matrix-multiply-nc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l1c1s1r-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/binary-elementwise-nd.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-minmax-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-minmax-scalar.c.o [ 5%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/channel-shuffle-nc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l1c1s1r-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/constant-pad-nd.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-minmax-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/convolution-nchw.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p1c-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-minmax-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/convolution-nhwc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p2c-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-minmax-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/deconvolution-nhwc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p1c-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/dynamic-fully-connected-nc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-minmax-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/fully-connected-nc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-scalar-acc2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/global-average-pooling-ncw.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p2c-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/global-average-pooling-nwc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u1.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u2.c.o [ 5%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u3.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/lut-elementwise-nc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-bitcast-u4.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u1.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/max-pooling-nhwc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u3.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-scalar-fabsf-u4.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/prelu-nc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/ssse3.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool-cw/f32-gavgpool-cw-scalar-u1.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/reduce-nd.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7p7x-minmax-scalar-c1.c.o [ 5%] Building CXX object c10/CMakeFiles/c10.dir/util/thread_name.cpp.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7x-minmax-scalar-c1.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/resize-bilinear-nchw.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x4-relu-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/resize-bilinear-nhwc.c.o [ 5%] Building CXX object c10/CMakeFiles/c10.dir/util/typeid.cpp.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x4-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/rope-nthc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/scaled-dot-product-attention-nhtc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x4-relu-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/sse41.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-2x4-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/slice-nd.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/softmax-nc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-relu-scalar.c.o [ 5%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/transpose-nd.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x4-relu-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/unary-elementwise-nc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x4-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-2x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-scalar-p1.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-scalar-p2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-scalar-p4.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/operators.dir/src/operators/unpooling-nhwc.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-scalar-c1.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-scalar-c2.c.o [ 5%] Built target operators [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-scalar-c4.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/memory-planner.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x4-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/runtime.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x4-relu-scalar.c.o [ 5%] Linking CXX shared library ../lib/libc10.so [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x4-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x4-minmax-scalar.c.o [ 5%] Built target c10 [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/abs.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x4-relu-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/add2.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-2x4-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-minmax-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/argmax-pooling-2d.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/average-pooling-2d.c.o [ 5%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-relu-scalar.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/bankers-rounding.c.o [ 5%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/batch-matrix-multiply.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/ceiling.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/clamp.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/concatenate.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/convert.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-relu-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/convolution-2d.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/copy.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x4-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/deconvolution-2d.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-maxpool/f32-maxpool-9p8x-minmax-scalar-c1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/depth-to-space-2d.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9p8x-minmax-scalar-c1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/depthwise-convolution-2d.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/divide.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9x-minmax-scalar-c1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/elu.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-2x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/avx.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/f16c.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/even-split.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-3x3-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x2-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/floor.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-scalar-2x1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/fully-connected-sparse.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-scalar-2x4.c.o [ 6%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/fully-connected.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-2x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/global-average-pooling.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x2-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/global-sum-pooling.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/hardswish.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/leaky-relu.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x4-relu-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/max-pooling-2d.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x4-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/maximum2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/minimum2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x4-relu-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/multiply2.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/negate.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x4-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/xop.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/prelu.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/reshape-helpers.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/scaled-dot-product-attention.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-relu-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/sigmoid.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/softmax.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/space-to-depth-2d.c.o [ 6%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x4-relu-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/square-root.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/square.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x4-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/squared-difference.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-1x1-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-constant-pad.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-2x1-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-4x1-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-mean.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-8x1-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-reshape.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-8x2-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-resize-bilinear-2d.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-slice.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-spmm/gen/f32-qc8w-spmm-8x4-minmax-scalar.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/static-transpose.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/subtract.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u1.c.o [ 6%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/tanh.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u3.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-fmagic-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/unpooling-2d.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/subgraph/validation.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u3.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/subgraph.dir/src/tensor.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-imagic-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u1.c.o 
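Aside (illustrative, not from the log): the c10_cuda sources that begin compiling just below — CUDAAllocatorConfig, CUDACachingAllocator, CUDAFunctions and friends — implement the caching allocator behind torch's CUDA memory APIs. A small sketch of how that layer is exercised from Python, assuming the finished package runs on a machine with a usable GPU:

```python
# Sketch only; assumes the built torch package and a working CUDA device.
import torch

if torch.cuda.is_available():
    x = torch.empty(1024, 1024, device="cuda")   # allocation served by the caching allocator
    print("allocated bytes:", torch.cuda.memory_allocated())
    print("reserved bytes: ", torch.cuda.memory_reserved())
    del x
    torch.cuda.empty_cache()                      # release cached, unused blocks back to the driver
    print("reserved after empty_cache:", torch.cuda.memory_reserved())
```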
[ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/fma3.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u2.c.o [ 7%] Built target subgraph [ 7%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAAllocatorConfig.cpp.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u3.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-scalar-lrintf-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u3.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-fmagic-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u2.c.o [ 7%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDACachingAllocator.cpp.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u3.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-imagic-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u3.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-scalar-lrintf-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u2-acc2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u4-acc2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u4-acc4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-lut64-p2-u4.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u1.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u2-acc2.c.o [ 7%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u4-acc2.c.o [ 7%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u4-acc4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-scalar-rr2-p5-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u2-acc2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/avx2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u3-acc3.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u4-acc2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-scalar-u4-acc4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u2-acc2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u3-acc3.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u4-acc2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-scalar-u4-acc4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u2-acc2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u3-acc3.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u4-acc2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-scalar-u4-acc4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u2-acc2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u3-acc3.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u4-acc2.c.o [ 8%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDADeviceAssertionHost.cpp.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-scalar-u4-acc4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-1x1-minmax-scalar-pipelined.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-1x1-minmax-scalar.c.o [ 8%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-2x1-minmax-scalar-pipelined.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-2x1-minmax-scalar.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-scalar-pipelined.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-scalar.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-scalar-pipelined.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-scalar.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x2-minmax-scalar.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x4-minmax-scalar.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u4.c.o [ 8%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAException.cpp.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-relu-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-scalar-u8.c.o [ 8%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAFunctions.cpp.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u4.c.o [ 8%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-relu-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/avx512f.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-relu-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u2.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-scalar-u8.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u1.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u2.c.o [ 8%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAMallocAsyncAllocator.cpp.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u4.c.o [ 8%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-relu-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u2.c.o [ 9%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/avx512skx.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-relu-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u2.c.o [ 9%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-relu-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-relu-scalar-u8.c.o [ 9%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAMiscFunctions.cpp.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u4.c.o [ 9%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/CUDAStream.cpp.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u1.c.o [ 9%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u4.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-relu-scalar-u8.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u1.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u2.c.o [ 9%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-scalar-u8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/avx512vbmi.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-scalar-u8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/avx512vnni.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-scalar-u8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-scalar-u8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-relu-scalar-u8.c.o [ 10%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-scalar-u8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-scalar-u8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u2.c.o [ 10%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/impl/CUDAGuardImpl.cpp.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-relu-scalar-u8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-scalar-u8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/avx512vnnigfni.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-scalar-u8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u3.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u4.c.o [ 10%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/amalgam/gen/avx512amx.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u5.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-lut16-p3-u6.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2-k-over-64.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2-k-over-2048.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u3.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-16.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-32.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-64.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/exp2minus-k-over-2048.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/tables/vlog.c.o [ 10%] Built target microkernels-prod [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/argmaxpool-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u5.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/avgpool-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-scalar-rr2-p6-u6.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/binary-elementwise-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-scalar-u1.c.o [ 10%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/impl/CUDATest.cpp.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/cmul-config.c.o [ 10%] Building CXX object c10/cuda/CMakeFiles/c10_cuda.dir/driver_api.cpp.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/conv-hwc2chw-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/dwconv-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-scalar-u2.c.o [ 10%] Building C object 
confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/dwconv2d-chw-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c1-minmax-scalar-2x.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/experiments-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/gavgpool-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c2-minmax-scalar-2x.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/gavgpool-cw-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c4-minmax-scalar-2x.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/gemm-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-scalar-u8.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-scalar-libm-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-scalar-libm-u2.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/ibilinear-chw-config.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-scalar-libm-u4.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-scalar-libm-u1.c.o [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/ibilinear-config.c.o [ 10%] Linking CXX shared library ../../lib/libc10_cuda.so [ 10%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-scalar-libm-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-scalar-libm-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/lut32norm-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-scalar-libm-u1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/maxpool-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-scalar-libm-u2.c.o Warning: Unused direct dependencies: libc10.so.2.4 /lib64/libgflags.so.2.2 /lib64/libglog.so.0 /lib64/libm.so.6 [ 11%] Built target c10_cuda [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-scalar-libm-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-scalar-libm-u1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/pavgpool-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-scalar-libm-u2.c.o [ 11%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-scalar-libm-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/prelu-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-scalar-rsqrt-u1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-scalar-rsqrt-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/raddstoreexpminusmax-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-scalar-rsqrt-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut64-p2-div-u1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut64-p2-div-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/reduce-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut64-p2-div-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut2048-p1-div-u1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut2048-p1-div-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/rmax-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-lut2048-p1-div-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-p5-div-u1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/spmm-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-p5-div-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/transpose-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-scalar-rr2-p5-div-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-scalar-sqrt-u1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-scalar-sqrt-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-scalar-sqrt-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/unary-elementwise-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/unpool-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-lut8-p4h3ts-div-u1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-lut8-p4h3ts-div-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/vmulcaddc-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-lut8-p4h3ts-div-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/xx-fill-config.c.o [ 11%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-p6h5ts-div-u1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/xx-pad-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-p6h5ts-div-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/x8-lut-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-scalar-expm1minus-rr1-p6h5ts-div-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/configs/zip-config.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-scalar-u1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/init.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/params.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-scalar-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-scalar-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-scalar-u1.c.o [ 11%] Linking CXX static library ../../lib/libXNNPACK.a [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-scalar-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-scalar-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-scalar-u1.c.o [ 11%] Built target XNNPACK [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-scalar-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-scalar-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u1.c.o [ 11%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/AccumulateType.cpp.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u3.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/i16-vlshift/gen/i16-vlshift-scalar-u4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut4-p4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut8-p3.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut8-p4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut16-p3.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-lut16-p4.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-p5.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-scalar-rr2-p6.c.o [ 11%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-scalar-rr2-lut64-p2.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-scalar-rr2-lut2048-p1.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-scalar-rr2-p5.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-scalar-bitcast.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-scalar-fabsf.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-scalar-addsub.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-scalar-cvt.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-scalar-floor.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-scalar-addsub.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-scalar-nearbyint.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-scalar-rint.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-scalar-addsub.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-scalar-ceil.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-scalar-cvt.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-scalar-addsub.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-scalar-cvt.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-scalar-trunc.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-scalar-rr2-lut64-p2-div.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-scalar-rr2-lut2048-p1-div.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-scalar-rr2-p5-div.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h2ts-div.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h2ts-rcp.c.o [ 11%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut4-p4h3ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h2ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h2ts-rcp.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ps-div.c.o [ 12%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ps-rcp.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut8-p4h3ts-rcp.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h2ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h2ts-rcp.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut16-p4h3ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut32-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-lut64-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h4ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ps-rcp.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr1-p6h5ts-rcp.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut4-p4h2ts-div.c.o [ 12%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/CPUGeneratorImpl.cpp.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut4-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut4-p4h3ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h2ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h2ts-rcp.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ps-rcp.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ts-div.c.o [ 12%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut8-p4h3ts-rcp.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p4h2ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut16-p4h3ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut32-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-lut64-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-p6h5ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-p6h4ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1minus-rr2-p6h5ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut4-p4h2ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut4-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut4-p4h3ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p4h2ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut8-p4h3ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p4h2ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut16-p4h3ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut32-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-lut64-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-p6h4ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-p6h5ps-div.c.o [ 12%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr1-p6h5ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut4-p4h2ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut4-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut4-p4h3ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p4h2ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut8-p4h3ts-div.c.o [ 12%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/CachedTensorUtils.cpp.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p4h2ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p4h3ps-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut16-p4h3ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut32-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-lut64-p3h1ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-p6h4ts-div.c.o [ 12%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-p6h5ps-div.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-scalar-expm1plus-rr2-p6h5ts-div.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-bitmanip.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-clz-binsearch.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-clz-newton.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvti32-sqrt-lrint.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvti64-sqrt-lrint.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvti64-sqrtf-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvtu32-sqrt-lrint.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-hashemian.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-cvtu32-sqrtf-lrintf.c.o [ 13%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u32-sqrt-scalar-tflm.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u64-sqrt-scalar-cvtu32-sqrt-cvtsatu32f64.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u64-sqrt-scalar-cvtu32-sqrt-llrint.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/u64-sqrt-scalar-cvtu64-sqrt-llrint.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x1-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x2-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x2-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x2-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x2-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x2-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4-minmax-scalar.c.o [ 13%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ConjugateFallback.cpp.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x2-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4-minmax-scalar.c.o [ 13%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4-minmax-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 13%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-fp32-scalar-fmagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p1c-minmax-rndnu-scalar.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-fp32-scalar-fmagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-fp32-scalar-imagic.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-fp32-scalar-lrintf.c.o [ 13%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p2c-minmax-rndnu-scalar.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p4c-minmax-rndnu-scalar.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p1c-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p1c-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p1c-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p2c-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p2c-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p2c-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p4c-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p4c-minmax-fp32-scalar-imagic.c.o [ 14%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p4c-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u1.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u2.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u3.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-scalar-u4.c.o [ 14%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Context.cpp.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c1.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c2.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c4.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c1.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c2.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c4.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c1.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c2.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c4.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-fmagic-c1.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-fmagic-c2.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-fmagic-c4.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-imagic-c1.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-imagic-c2.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-imagic-c4.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-lrintf-c1.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-lrintf-c2.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-scalar-lrintf-c4.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p1c-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p2c-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p2c-minmax-fp32-scalar-lrintf.c.o [ 14%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DLConvertor.cpp.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-4p2c-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 14%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DeviceAccelerator.cpp.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 14%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Dispatch.cpp.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p1c-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p1c-minmax-fp32-scalar-imagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p1c-minmax-fp32-scalar-lrintf.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p2c-minmax-fp32-scalar-fmagic.c.o [ 14%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p2c-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p2c-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p4c-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p4c-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p4c-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p1c-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p1c-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p1c-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p2c-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p2c-minmax-fp32-scalar-imagic.c.o [ 15%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/DynamicLibrary.cpp.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p2c-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p4c-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p4c-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p4c-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x2-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x2-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x2-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x2-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x2-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x2-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x2-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x2-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x2-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x2-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x2-minmax-fp32-scalar-imagic.c.o [ 15%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/EmptyTensor.cpp.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x2-minmax-fp32-scalar-lrintf.c.o [ 15%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ExpandUtils.cpp.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x2-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x2-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x2-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x2-minmax-fp32-scalar-fmagic.c.o [ 15%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FuncTorchTLS.cpp.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x2-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x2-minmax-fp32-scalar-lrintf.c.o [ 15%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalInverses.cpp.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x2-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x2-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x2-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4-minmax-fp32-scalar-fmagic.c.o [ 15%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalStorageImpl.cpp.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x2-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x2-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x2-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4-minmax-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4-minmax-fp32-scalar-imagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4-minmax-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-fp32-scalar-fmagic.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-fp32-scalar-lrintf.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-gemmlowp-scalar.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-scalar-signed64.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-scalar-unsigned32.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-scalar-unsigned64.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndnu-scalar.c.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-scalar-u1.c.o [ 15%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalTensorWrapper.cpp.o [ 15%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-scalar-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-scalar-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-scalar-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-scalar-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-scalar-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-scalar-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-scalar-u2.c.o [ 16%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-scalar-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-scalar-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-scalar-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-scalar-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-andxor-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-andxor-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-andxor-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-select-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-select-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-scalar-select-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-scalar-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-scalar-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-scalar-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-scalar-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-scalar-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-scalar-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-scalar-u1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-scalar-u2.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-scalar-u4.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9p8x-minmax-fp32-scalar-imagic-c1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9x-minmax-fp32-scalar-imagic-c1.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] 
Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/FunctionalizeFallbackKernel.cpp.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyBatchedFallback.cpp.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l1c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l2c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyBatchedTensorImpl.cpp.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l4c1s1r-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p1c-minmax-rndnu-scalar.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p2c-minmax-rndnu-scalar.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p4c-minmax-rndnu-scalar.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p1c-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p1c-minmax-fp32-scalar-imagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p1c-minmax-fp32-scalar-lrintf.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p2c-minmax-fp32-scalar-fmagic.c.o [ 16%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p2c-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p2c-minmax-fp32-scalar-lrintf.c.o [ 17%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyBatchingRegistrations.cpp.o [ 17%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyVmapMode.cpp.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p4c-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p4c-minmax-fp32-scalar-imagic.c.o [ 17%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/LegacyVmapTransforms.cpp.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p4c-minmax-fp32-scalar-lrintf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u1.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u2.c.o [ 17%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u3.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-scalar-u4.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c1.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c2.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-fmagic-c4.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c1.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c2.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-imagic-c4.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c1.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c2.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-scalar-lrintf-c4.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-fmagic-c1.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-fmagic-c2.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-fmagic-c4.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-imagic-c1.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-imagic-c2.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-imagic-c4.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-lrintf-c1.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-lrintf-c2.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-scalar-lrintf-c4.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-fp32-scalar-lrintf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x2-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-fp32-scalar-fmagic.c.o [ 17%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-fp32-scalar-lrintf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-fp32-scalar-lrintf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x2-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-fp32-scalar-lrintf.c.o [ 17%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/MapAllocator.cpp.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-fp32-scalar-lrintf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x2-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-fp32-scalar-lrintf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-fp32-scalar-lrintf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x2-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-fp32-scalar-lrintf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-fp32-scalar-lrintf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x2-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-fp32-scalar-lrintf.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-fp32-scalar-fmagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-fp32-scalar-imagic.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-fp32-scalar-lrintf.c.o [ 17%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/MemoryOverlap.cpp.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x2-minmax-rndnu-scalar.c.o [ 17%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-fp32-scalar-fmagic.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-fp32-scalar-imagic.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-fp32-scalar-lrintf.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4-minmax-rndnu-scalar.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-fp32-scalar-fmagic.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-fp32-scalar-imagic.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-fp32-scalar-lrintf.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x2-minmax-rndnu-scalar.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-fp32-scalar-fmagic.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-fp32-scalar-imagic.c.o [ 18%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/NamedTensorUtils.cpp.o [ 18%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-fp32-scalar-lrintf.c.o [ 18%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/NestedTensorImpl.cpp.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4-minmax-rndnu-scalar.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-fp32-scalar-fmagic.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-fp32-scalar-imagic.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-fp32-scalar-lrintf.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x2-minmax-rndnu-scalar.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-fp32-scalar-fmagic.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-fp32-scalar-imagic.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-fp32-scalar-lrintf.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4-minmax-rndnu-scalar.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-fp32-scalar-fmagic.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-fp32-scalar-lrintf.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-gemmlowp-scalar.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-scalar-signed64.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-scalar-unsigned32.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-scalar-unsigned64.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-scalar-u1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-scalar-u2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-scalar-u4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-scalar-u1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-scalar-u2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-scalar-u4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-scalar-u1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-scalar-u2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-scalar-u4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-scalar-u1.c.o [ 18%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-scalar-u2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-scalar-u4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-andxor-u1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-andxor-u2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-andxor-u4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-select-u1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-select-u2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-scalar-select-u4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-scalar-u1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-scalar-u2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-scalar-u4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-scalar-u1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-scalar-u2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-scalar-u4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-scalar-c1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-scalar-c2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-scalar-c4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-9p8x-minmax-scalar-c1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-vclamp/s8-vclamp-scalar-u4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x3.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-rmaxabs/gen/s16-rmaxabs-scalar-x4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u3.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s16-window/gen/s16-window-scalar-u4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-scalar-c1.c.o [ 18%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-scalar-c2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-scalar-c4.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-lut32norm/u8-lut32norm-scalar.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-maxpool/u8-maxpool-9p8x-minmax-scalar-c1.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-rmax/u8-rmax-scalar-u2.c.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-vclamp/u8-vclamp-scalar-u4.c.o [ 18%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelCommon.cpp.o [ 18%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-filterbank-accumulate/gen/u32-filterbank-accumulate-scalar-x1.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-filterbank-subtract/u32-filterbank-subtract-scalar-x2.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x1.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x2.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x3.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u32-vlog/gen/u32-vlog-scalar-x4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u64-u32-vsqrtshift/u64-u32-vsqrtshift-scalar-cvtu32-sqrt-cvtu32f64-u1.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u1.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u2.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u8.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-scalar-u16.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x2-gemm-goi-scalar-int-u2.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x2-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x4-gemm-goi-scalar-int-u2.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x4-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x8-gemm-goi-scalar-int-u2.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x8-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x16-gemm-goi-scalar-int-u2.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x16-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x32-gemm-goi-scalar-int-u2.c.o [ 19%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelNative.cpp.o [ 19%] 
Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelNativeTBB.cpp.o [ 19%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelOpenMP.cpp.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-packw/gen/x8-packw-x32-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-1x2-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-1x4-scalar-int.c.o [ 19%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelThreadPoolNative.cpp.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-2x1-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-2x2-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-2x4-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-4x1-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-4x2-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-4x4-scalar-int.c.o [ 19%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/PythonTorchFunctionTLS.cpp.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x2-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x3-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x4-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-xm-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-1x2-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-1x4-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-2x1-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-2x2-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-2x4-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x1-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x2-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-4x4-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-1x2-scalar.c.o [ 19%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-1x4-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-2x1-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-2x2-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-2x4-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-4x1-scalar.c.o [ 19%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SavedTensorHooks.cpp.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-4x2-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/gen/x24-transposec-4x4-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c1s1r-gemm-scalar-float.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c1s1r-gemm-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c2s1r-gemm-scalar-float.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-2c2s1r-gemm-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c1s1r-gemm-scalar-float.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c1s1r-gemm-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c4s1r-gemm-scalar-float.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packb/gen/x32-packb-4c4s1r-gemm-scalar-int.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2-gemm-goi-scalar-float-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x3-gemm-goi-scalar-float-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x3-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x4-gemm-goi-scalar-float-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x4-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-scalar-float-u4.c.o [ 19%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ScalarOps.cpp.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-scalar-float-u4.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-scalar-int-u4.c.o [ 19%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/x32-packx-2x-scalar.c.o [ 19%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/x32-packx-3x-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/x32-packx-4x-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x2-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x2-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x4-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-1x4-scalar-int.c.o [ 20%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SequenceNumber.cpp.o [ 20%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SparseCsrTensorImpl.cpp.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x1-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x1-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x2-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x4-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-2x4-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x1-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x1-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x2-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x2-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-unpool/x32-unpool-scalar.c.o [ 20%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/SparseTensorImpl.cpp.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c1s1r-gemm-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c1s1r-gemm-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c2s1r-gemm-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-2c2s1r-gemm-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c1s1r-gemm-scalar-float.c.o [ 20%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c1s1r-gemm-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c4s1r-gemm-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zerob/gen/x32-zerob-4c4s1r-gemm-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x2-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x3-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x4-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-xm-scalar.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-1x2-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-1x2-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x1-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x1-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x1-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x1-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x2-scalar-float.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x2-scalar-int.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-copy/xx-copy-scalar-memcpy.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-fill/xx-fill-scalar-u16.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-pad/xx-pad-p4-scalar-u16.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-transposev/xx-transposev-1x1-scalar-memcpy.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-lut8-p4h3ts-div-u1.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-lut8-p4h3ts-div-u2.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-lut8-p4h3ts-div-u4.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-p6h5ts-div-u1.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-p6h5ts-div-u2.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma-expm1minus-rr1-p6h5ts-div-u4.c.o [ 20%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h2ts-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h2ts-rcp.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ps-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ps-rcp.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ts-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut4-p4h3ts-rcp.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p3h1ts-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h2ts-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h2ts-rcp.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ps-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ps-rcp.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ts-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut8-p4h3ts-rcp.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p3h1ts-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h2ts-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h2ts-rcp.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h3ps-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut16-p4h3ts-div.c.o [ 20%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/StorageUtils.cpp.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut32-p3h1ts-div.c.o [ 20%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-lut64-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h4ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ps-rcp.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr1-p6h5ts-rcp.c.o [ 21%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut4-p4h2ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut4-p4h3ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut4-p4h3ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h2ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h2ts-rcp.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h3ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut8-p4h3ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p4h2ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p4h3ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut16-p4h3ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut32-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-lut64-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-p6h4ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-p6h5ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1minus-rr2-p6h5ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut4-p4h2ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut4-p4h3ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut4-p4h3ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p4h2ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p4h3ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut8-p4h3ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p4h2ts-div.c.o [ 21%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p4h3ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut16-p4h3ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut32-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-lut64-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-p6h4ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-p6h5ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr1-p6h5ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut4-p4h2ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut4-p4h3ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut4-p4h3ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p4h2ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p4h3ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut8-p4h3ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p4h2ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p4h3ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut16-p4h3ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut32-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-lut64-p3h1ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-p6h4ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-p6h5ps-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma-expm1plus-rr2-p6h5ts-div.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9p8x-minmax-sse-c4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-avgpool/f32-avgpool-9x-minmax-sse-c4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc2chw/f32-conv-hwc2chw-3x3s2p1c3x4-sse-1x1.c.o [ 21%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-conv-hwc2chw/f32-conv-hwc2chw-3x3s2p1c3x4-sse-2x2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-sse-1x4-acc2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-sse-1x4-acc3.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-sse-1x4-acc4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-sse-1x4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-sse-2x4-acc2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-sse-2x4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-sse-3x4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-sse-4x4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-sse-5x4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-sse-6x4.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-sse-1x4-acc2.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-sse-1x4-acc3.c.o [ 21%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-sse-1x4-acc4.c.o [ 21%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorGeometry.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-sse-1x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-sse-2x4-acc2.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorIndexing.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-sse-2x4.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorIterator.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-sse-3x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3s2p1-minmax-sse-4x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-1x4-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-1x4-acc3.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-1x4-acc4.c.o [ 22%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-1x4-acc5.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorMeta.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-1x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-2x4-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-2x4-acc3.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-2x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-3x4-acc2.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorNames.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-3x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-4x4-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-4x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5p2-minmax-sse-5x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-sse-1x4-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-sse-1x4-acc3.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-sse-1x4-acc4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-sse-1x4-acc5.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-sse-1x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-sse-2x4-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-sse-2x4-acc3.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-sse-2x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-sse-3x4-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-5x5s2p2-minmax-sse-3x4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p4c-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p4c-minmax-sse.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/TensorUtils.cpp.o [ 22%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p4c-minmax-sse-acc2.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ThreadLocalPythonObjects.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p4c-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l4c4s4r-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l4c4s4r-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c4s4r-minmax-sse-acc2.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ThreadLocalState.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c4s4r-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l16c4s4r-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l16c4s4r-minmax-sse.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Utils.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l4c4s4r-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l4c4s4r-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c4s4r-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c4s4r-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l16c4s4r-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l16c4s4r-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l4c4s4r-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l4c4s4r-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c4s4r-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c4s4r-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l16c4s4r-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l16c4s4r-minmax-sse.c.o [ 22%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p4c-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p4c-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p4c-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p4c-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-sse-acc2.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-sse.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Version.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool-cw/f32-gavgpool-cw-sse-u4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7p7x-minmax-sse-c4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gavgpool/f32-gavgpool-7x-minmax-sse-c4.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-sse-dup.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-sse-load1.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8s4-minmax-sse.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x8-minmax-sse-dup.c.o [ 22%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/VmapModeRegistrations.cpp.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x8-minmax-sse-load1.c.o [ 22%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x2c4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-sse-dup.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ZeroTensorFallback.cpp.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/autocast_mode.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-sse-load1.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-sse-dup.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-sse-load1.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x2c4-minmax-sse.c.o [ 23%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-sse-dup.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-sse-load1.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-sse-dup.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-sse-load1.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-3x8-minmax-sse-dup.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-3x8-minmax-sse-load1.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-3x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-sse-dup.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-sse-load1.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/cpu/FlushDenormal.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-sse-dup.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/cpu/Utils.cpp.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/CPUGuardImpl.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-sse-load1.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/CUDAHooksInterface.cpp.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/HIPHooksInterface.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-sse-dup.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-sse-load1.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/IPUHooksInterface.cpp.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MPSHooksInterface.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-sse-p4.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear-chw/gen/f32-ibilinear-chw-sse-p8.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MTIAHooksInterface.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-sse-c4.c.o [ 23%] Building CXX 
object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/MetaGuardImpl.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ibilinear/gen/f32-ibilinear-sse-c8.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/ORTHooksInterface.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-sse-dup.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-sse-load1.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/PrivateUse1HooksInterface.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x8-minmax-sse-dup.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x8-minmax-sse-load1.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x8s4-minmax-sse.c.o [ 23%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/detail/XPUHooksInterface.cpp.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x2c4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-sse-dup.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-sse-load1.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8s4-minmax-sse.c.o [ 23%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-sse-dup.c.o [ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/ADInterpreters.cpp.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-sse-load1.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x8s4-minmax-sse.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x2c4-minmax-sse.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-sse-dup.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-sse-load1.c.o [ 24%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesActivation.cpp.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8s4-minmax-sse.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-maxpool/f32-maxpool-9p8x-minmax-sse-c4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9p8x-minmax-sse-c4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-pavgpool/f32-pavgpool-9x-minmax-sse-c4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-ppmm/gen/f32-ppmm-4x8-minmax-sse.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-sse-2x4.c.o [ 24%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-sse-2x8.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-sse-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-sse-u8-acc2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-sse-u12-acc3.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-sse-u16-acc2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-sse-u16-acc4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-sse-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-sse-u8-acc2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-sse-u12-acc3.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-sse-u16-acc2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-sse-u16-acc4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-sse-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-sse-u8-acc2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-sse-u12-acc3.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-sse-u16-acc2.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-sse-u16-acc4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-sse-u4.c.o [ 24%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-sse-u8-acc2.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-sse-u12-acc3.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-sse-u16-acc2.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-sse-u16-acc4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-4x1-minmax-sse.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-8x1-minmax-sse.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-16x1-minmax-sse.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-spmm/gen/f32-spmm-32x1-minmax-sse.c.o [ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesBinaryOps.cpp.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-sse-u4.c.o [ 25%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-sse-u4.c.o [ 25%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesConvolution.cpp.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-sse-u4.c.o [ 25%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-sse-u12.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vcmul/gen/f32-vcmul-sse-u16.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c4-minmax-sse-2x.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vmulcaddc/gen/f32-vmulcaddc-c8-minmax-sse-2x.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-sse-rsqrt-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-sse-rsqrt-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-sse-rsqrt-u16.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-sse-sqrt-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-sse-sqrt-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-sse-sqrt-u16.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-sse-u4.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-sse-u8.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-sse-addsub.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-sse-addsub.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-sse-addsub.c.o [ 25%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-sse-addsub.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-sse-hh1mac.c.o [ 25%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-sse-nr1mac.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-sse-nr2mac.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packx/x32-packx-4x-sse.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/x32-transposec-4x4-sse.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse2-int16-u8.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse2-int16-u16.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse2-int16-u24.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse2-int16-u32.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse2-int32-u8.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse2-int32-u16.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse2-int32-u24.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse2-int32-u32.c.o [ 26%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesDecompositions.cpp.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vabs-sse2-u8.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vabs-sse2-u16.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vneg-sse2-u8.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vneg-sse2-u16.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-4x-sse2-c4.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9p8x-sse2-c4.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-argmaxpool/f32-argmaxpool-9x-sse2-c4.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-sse2-u8.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-sse2-u16.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-sse2-u24.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-sse2-u32.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-sse2-dup.c.o [ 26%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-3x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-sse2-2x4.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-sse2-2x8.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-3x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-5x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-sse2-load1.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8s4-minmax-sse2.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x8-minmax-sse2-load1.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x8s4-minmax-sse2.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2c4-minmax-sse2.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-sse2-dup.c.o [ 26%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-sse2-load1.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8s4-minmax-sse2.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8-minmax-sse2-load1.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8s4-minmax-sse2.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-sse2-dup.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-sse2-load1.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8s4-minmax-sse2.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-sse2-u8.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-sse2-u16.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-sse2-u24.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-sse2-u32.c.o [ 26%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesDynamic.cpp.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-sse2-u8.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-sse2-u16.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-sse2-u24.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-sse2-u32.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u4.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u8-acc2.c.o [ 26%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u12-acc2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u12-acc3.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u12.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u16-acc2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u16-acc4.c.o [ 27%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u20-acc2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u20-acc5.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-sse2-rr2-p5-u20.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-lut16-p3-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-lut16-p3-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-lut16-p3-u12.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-lut16-p3-u16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-lut16-p3-u20.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-lut16-p3-u24.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-p6-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-p6-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-p6-u12.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-p6-u16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-p6-u20.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse2-rr2-p6-u24.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-sse2-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-sse2-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-sse2-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-sse2-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-sse2-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-sse2-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-sse2-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-sse2-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-sse2-u4.c.o [ 27%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesFactory.cpp.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-sse2-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-lut64-p2-div-u4.c.o [ 27%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-lut64-p2-div-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-lut64-p2-div-u12.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-lut64-p2-div-u16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-lut64-p2-div-u20.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-lut64-p2-div-u24.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-p5-div-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-p5-div-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-p5-div-u12.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-p5-div-u16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-p5-div-u20.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse2-rr2-p5-div-u24.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-lut8-p4h3ts-div-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-lut8-p4h3ts-div-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-lut8-p4h3ts-div-u12.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-lut8-p4h3ts-div-u16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-div-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-div-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-div-u12.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-div-u16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-nr1-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-nr1-u8.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-nr1-u12.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-nr1-u16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-nr2-u4.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-nr2-u8.c.o [ 27%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-nr2-u12.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse2-expm1minus-rr1-p6h5ts-nr2-u16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-sse2-int16.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-sse2-int32.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-sse2-rr2-lut64-p2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-sse2-rr2-p5.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-sse2-rr2-lut16-p3.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-sse2-rr2-p6.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-sse2-rr2-p5.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-sse2.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-sse2-cvt.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-sse2-cvt.c.o [ 27%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-sse2-cvt.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-sse2-cvt.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-sse2-rr2-lut64-p2-div.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-sse2-rr2-lut64-p2-nr1.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-sse2-rr2-lut64-p2-nr2.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-sse2-rr2-p5-div.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-sse2-rr2-p5-nr1.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-sse2-rr2-p5-nr2.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-sse2-expm1minus-rr1-lut8-p4h3ps-div.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-sse2-expm1minus-rr1-p6h5ts-div.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-sse2-expm1minus-rr1-p6h5ts-nr1.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-sse2-expm1minus-rr1-p6h5ts-nr2.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-sse2-expm1minus-rr2-lut8-p4h2ts-nr1.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-sse2-expm1minus-rr2-lut8-p4h2ts-nr2.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-sse2-expm1minus-rr2-lut8-p4h3ps-nr1.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-sse2-expm1minus-rr2-lut8-p4h3ps-nr2.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-sse2-expm1minus-rr2-lut8-p4h3ts-nr1.c.o [ 
28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-sse2-expm1minus-rr2-lut8-p4h3ts-nr2.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4c8-minmax-sse2-ld128.c.o [ 28%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesHelper.cpp.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4c8-minmax-sse2-ld64.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4c8-minmax-sse2-ld128.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-fp32-sse2-mul16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-fp32-sse2-mul16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-fp32-sse2-mul16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-fp32-sse2-mul16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-fp32-sse2-mul16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-fp32-sse2-mul16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-sse2-mul16-add16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-sse2-mul16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-sse2-mul16-add16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-sse2-mul16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-sse2-mul16-add16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-sse2-mul16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-sse2-mul16-add16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-sse2-mul16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-sse2-u8.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-sse2-u16.c.o [ 28%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-sse2-u24.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-sse2-u32.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-sse2-c8.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-sse2-c16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-sse2-c24.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-sse2-c8.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-sse2-c16.c.o [ 28%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-sse2-c24.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p8c-minmax-fp32-sse2-mul16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-sse2-mul16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-sse2-mul16.c.o [ 29%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesLinearAlgebra.cpp.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-sse2-mul16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-sse2-mul16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-sse2-mul16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-sse2-mul16-add16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-sse2-mul16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-sse2-mul16-add16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-sse2-mul16.c.o [ 29%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-sse2-mul16-add16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-sse2-mul16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-sse2-mul16-add16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-sse2-mul16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-sse2-mul16-add16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-sse2-mul16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2s4-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2s4-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c8-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c8-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2s4-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2s4-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c8-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c8-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2s4-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2s4-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c8-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c8-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2s4-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2s4-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2s4-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2s4-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c8-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c8-minmax-fp32-sse2-ld128.c.o [ 29%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesLoss.cpp.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2s4-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2s4-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c8-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c8-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2s4-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2s4-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c8-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c8-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2s4-minmax-fp32-sse2-ld64.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2s4-minmax-fp32-sse2-ld128.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-fp32-sse2.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-gemmlowp-sse2.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-sse2.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse2-mul16-ld64-u8.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse2-mul16-ld64-u16.c.o [ 29%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse2-mul16-ld64-u24.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse2-mul16-ld64-u32.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse2-mul16-ld64-u8.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse2-mul16-ld64-u16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse2-mul16-ld64-u24.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse2-mul16-ld64-u32.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-sse2-u16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-sse2-u32.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-sse2-u16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-sse2-u32.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-sse2-u16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-sse2-u32.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-sse2-mul16-ld64-u8.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-sse2-mul16-ld64-u16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-sse2-mul16-ld64-u8.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-sse2-mul16-ld64-u16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-sse2-u4.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-sse2-u8.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-sse2-u16.c.o [ 30%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9p8x-minmax-fp32-sse2-c8.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-avgpool/qu8-avgpool-9x-minmax-fp32-sse2-c8.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-fp32-sse2-mul16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-fp32-sse2-mul16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-fp32-sse2-mul16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-fp32-sse2-mul16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-fp32-sse2-mul16.c.o [ 30%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesModules.cpp.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-fp32-sse2-mul16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-sse2-mul16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-sse2-mul16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-sse2-mul16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-sse2-mul16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-sse2-u8.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-sse2-u16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-sse2-u24.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-sse2-u32.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-sse2-c8.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-sse2-c16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-sse2-c24.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-sse2-c8.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-sse2-c16.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-sse2-c24.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2s4-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2s4-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c8-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c8-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2s4-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2s4-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c8-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c8-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2s4-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2s4-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c8-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c8-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2s4-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2s4-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2s4-minmax-fp32-sse2-ld64.c.o [ 30%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesNorm.cpp.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2s4-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c8-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c8-minmax-fp32-sse2-ld128.c.o [ 30%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2-minmax-fp32-sse2-ld64.c.o [ 30%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2-minmax-fp32-sse2-ld128.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2s4-minmax-fp32-sse2-ld64.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2s4-minmax-fp32-sse2-ld128.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c8-minmax-fp32-sse2-ld64.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c8-minmax-fp32-sse2-ld128.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2-minmax-fp32-sse2-ld64.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2-minmax-fp32-sse2-ld128.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2s4-minmax-fp32-sse2-ld64.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2s4-minmax-fp32-sse2-ld128.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c8-minmax-fp32-sse2-ld64.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c8-minmax-fp32-sse2-ld128.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2-minmax-fp32-sse2-ld64.c.o [ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesPooling.cpp.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2-minmax-fp32-sse2-ld128.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2s4-minmax-fp32-sse2-ld64.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2s4-minmax-fp32-sse2-ld128.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-fp32-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-gemmlowp-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-sse2-mul16-ld64-u8.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-sse2-mul16-ld64-u16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-sse2-mul16-ld64-u8.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-sse2-mul16-ld64-u16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-sse2-u16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-sse2-u32.c.o [ 31%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-sse2-u16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-sse2-u32.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-sse2-u16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-sse2-u32.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-sse2-mul16-ld64-u8.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-sse2-mul16-ld64-u16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-sse2-mul16-ld64-u8.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-sse2-mul16-ld64-u16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-sse2-c8.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-sse2-c16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-9p8x-minmax-sse2-c16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-vclamp/s8-vclamp-sse2-u64.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-sse2-c8.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-sse2-c16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-maxpool/u8-maxpool-9p8x-minmax-sse2-c16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-rmax/u8-rmax-sse2-u16.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-vclamp/u8-vclamp-sse2-u64.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-16x16-reuse-mov-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-16x16-reuse-switch-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x2-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x3-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-x4-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-zip/x8-zip-xm-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-multi-mov-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-multi-switch-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-mov-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-multi-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-8x8-reuse-switch-sse2.c.o [ 31%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesRandomness.cpp.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/x16-transposec-4x8-sse2.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2c4-gemm-goi-sse2-u4-prfm.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x2c4-gemm-goi-sse2-u4.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-sse2-u4-prfm.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-sse2-u4.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-sse2-u8-prfm.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-sse2-u8.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-sse2-u4-prfm.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-sse2-u4.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-sse2-u8-prfm.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-sse2-u8.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-sse2-u4-prfm.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-sse2-u4.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-sse2-u8-prfm.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-sse2-u8.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16s4-gemm-goi-sse2-u4-prfm.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16s4-gemm-goi-sse2-u4.c.o [ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesReduceOps.cpp.o [ 31%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesScatterOps.cpp.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16s4-gemm-goi-sse2-u8-prfm.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16s4-gemm-goi-sse2-u8.c.o [ 31%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-mov-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-multi-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-multi-switch-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-mov-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-multi-sse2.c.o [ 32%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-4x4-reuse-switch-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-unpool/x32-unpool-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x2-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x3-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-x4-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-zip/x32-zip-xm-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-mov-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-multi-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-multi-switch-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-mov-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-multi-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-2x2-reuse-switch-sse2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-fill/xx-fill-sse2-u64.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/xx-pad/xx-pad-p16-sse2-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-ssse3-1x4-acc2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-ssse3-1x4-acc3.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-ssse3-1x4-acc4.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-ssse3-1x4.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-ssse3-2x4-acc2.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-ssse3-2x4.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-ssse3-3x4.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-ssse3-4x4.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-ssse3-5x4.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv2d-chw/gen/f32-dwconv2d-chw-3x3p1-minmax-ssse3-6x4.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-gemmlowp-ssse3.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-ssse3.c.o [ 32%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-ssse3-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-ssse3-u32.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-ssse3-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-ssse3-u32.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-ssse3-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-ssse3-u32.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-ssse3-u4.c.o [ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesUnaryOps.cpp.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-ssse3-u8.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-ssse3-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-gemmlowp-ssse3.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-ssse3.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-ssse3-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-ssse3-u32.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-ssse3-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-ssse3-u32.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-ssse3-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-ssse3-u32.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-ssse3-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-ssse3-u32.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x24-transposec/x24-transposec-4x4-ssse3.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse41-int16-u8.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse41-int16-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse41-int16-u24.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse41-int16-u32.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse41-int32-u8.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse41-int32-u16.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse41-int32-u24.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-sse41-int32-u32.c.o [ 
32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-sse41-u8.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-sse41-u16.c.o [ 32%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchRulesViews.cpp.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-sse41-u24.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-sse41-u32.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-sse41-2x4.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-sse41-2x8.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x8-minmax-sse41-dup.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-3x8-minmax-sse41-dup.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x8-minmax-sse41-dup.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-5x8-minmax-sse41-dup.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x8-minmax-sse41-dup.c.o [ 32%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-sse41-dup.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-sse41-load1.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8s4-minmax-sse41.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x8-minmax-sse41-dup.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x8-minmax-sse41-load1.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x8s4-minmax-sse41.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x2c4-minmax-sse41.c.o [ 33%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchedFallback.cpp.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-sse41-dup.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-sse41-load1.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8s4-minmax-sse41.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8-minmax-sse41-dup.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8-minmax-sse41-load1.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8s4-minmax-sse41.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x2c4-minmax-sse41.c.o [ 33%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-sse41-dup.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-sse41-load1.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8s4-minmax-sse41.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-sse41-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-sse41-u16.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-sse41-u24.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-sse41-u32.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-lut16-p3-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-lut16-p3-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-lut16-p3-u12.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-lut16-p3-u16.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-lut16-p3-u20.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-lut16-p3-u24.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-p6-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-p6-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-p6-u12.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-p6-u16.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-p6-u20.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-sse41-rr2-p6-u24.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-sse41-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-sse41-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-sse41-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-sse41-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-sse41-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-sse41-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-sse41-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-sse41-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-sse41-u4.c.o [ 33%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-sse41-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-lut64-p2-div-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-lut64-p2-div-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-lut64-p2-div-u12.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-lut64-p2-div-u16.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-lut64-p2-div-u20.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-lut64-p2-div-u24.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-p5-div-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-p5-div-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-p5-div-u12.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-p5-div-u16.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-p5-div-u20.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-sse41-rr2-p5-div-u24.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-lut8-p4h3ts-div-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-lut8-p4h3ts-div-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-lut8-p4h3ts-div-u12.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-lut8-p4h3ts-div-u16.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-lut8-p4h3ts-div-u20.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-lut8-p4h3ts-div-u24.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-div-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-div-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-div-u12.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-div-u16.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-div-u20.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-div-u24.c.o [ 33%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr1-u4.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr1-u8.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr1-u12.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr1-u16.c.o [ 33%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr1-u20.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr1-u24.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr2-u4.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr2-u8.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr2-u12.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr2-u16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr2-u20.c.o [ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/BatchedTensorImpl.cpp.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-sse41-expm1minus-rr1-p6h5ts-nr2-u24.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-sse41-int16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-sse41-int32.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-sse41.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundd-sse41.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundne-sse41.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundu-sse41.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-roundz-sse41.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4c8-minmax-sse41-ld64.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4c8-minmax-sse41-ld64.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x4c8-minmax-sse41-ld64.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4c8-minmax-sse41-ld64.c.o [ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/DynamicLayer.cpp.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4c8-minmax-sse41-ld64.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4c8-minmax-sse41-ld64.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x4c8-minmax-sse41-ld64.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4c8-minmax-sse41-ld64.c.o [ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/FunctionalizeInterpreter.cpp.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4c8-minmax-sse41-ld64.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4c8-minmax-sse41-ld64.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x4c8-minmax-sse41-ld64.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x4c8-minmax-sse41-ld128.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4c8-minmax-sse41-ld64.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4c8-minmax-sse41-ld128.c.o [ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/Interpreter.cpp.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c4s4r-minmax-fp32-sse41-mul32.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-fp32-sse41-mul16.c.o [ 34%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c4s4r-minmax-fp32-sse41-mul32.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-fp32-sse41-mul16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c4s4r-minmax-fp32-sse41-mul32.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-fp32-sse41-mul16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c4s4r-minmax-fp32-sse41-mul32.c.o [ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/LegacyVmapTransforms.cpp.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-fp32-sse41-mul16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c4s4r-minmax-fp32-sse41-mul32.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-fp32-sse41-mul16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c4s4r-minmax-fp32-sse41-mul32.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-fp32-sse41-mul16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-sse41-mul16-add16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-sse41-mul16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-sse41-mul32.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-sse41-mul16-add16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-sse41-mul16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-sse41-mul32.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-sse41-mul16-add16.c.o [ 34%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-sse41-mul16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-sse41-mul32.c.o [ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/PlumbingHelper.cpp.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-sse41-mul16-add16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-sse41-mul16.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-sse41-mul32.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-sse41-u8.c.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-sse41-u16.c.o [ 34%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/PyTorchOperatorHacks.cpp.o [ 34%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-sse41-u24.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-sse41-u32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-sse41-c8.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-sse41-c16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7p7x-minmax-fp32-sse41-c24.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-sse41-c8.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/TensorWrapper.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-sse41-c16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-gavgpool/gen/qs8-gavgpool-7x-minmax-fp32-sse41-c24.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p8c-minmax-fp32-sse41-mul16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c4s4r-minmax-fp32-sse41-mul32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-sse41-mul16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c4s4r-minmax-fp32-sse41-mul32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/VmapInterpreter.cpp.o [ 35%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-sse41-mul16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c4s4r-minmax-fp32-sse41-mul32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-sse41-mul16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c4s4r-minmax-fp32-sse41-mul32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-sse41-mul16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c4s4r-minmax-fp32-sse41-mul32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/functorch/VmapModeRegistrations.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-sse41-mul16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c4s4r-minmax-fp32-sse41-mul32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-sse41-mul16-add16.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/record_function.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-sse41-mul16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-sse41-mul16-add16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-sse41-mul16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-sse41-mul32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-sse41-mul16-add16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-sse41-mul16.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/ATenGeneral.cpp.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/BackendSelectFallbackKernel.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-sse41-mul32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-sse41-mul16-add16.c.o [ 35%] Building 
C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-sse41-mul16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-sse41-mul32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-sse41-mul16-add16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-sse41-mul16.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-sse41-mul32.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2s4-minmax-fp32-sse41-ld64.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/DeprecatedTypeProperties.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2s4-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c8-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c8-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2s4-minmax-fp32-sse41-ld64.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/DeprecatedTypePropertiesRegistry.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2s4-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c8-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c8-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2-minmax-fp32-sse41-ld64.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Dict.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2s4-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2s4-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c8-minmax-fp32-sse41-ld64.c.o [ 
35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Dimname.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c8-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2-minmax-fp32-sse41-ld64.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Formatting.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2s4-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2s4-minmax-fp32-sse41-ld128.c.o [ 35%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Generator.cpp.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2s4-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2s4-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c8-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c8-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2-minmax-fp32-sse41-ld128.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2s4-minmax-fp32-sse41-ld64.c.o [ 35%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2s4-minmax-fp32-sse41-ld128.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c8-minmax-fp32-sse41-ld64.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c8-minmax-fp32-sse41-ld128.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2-minmax-fp32-sse41-ld64.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2-minmax-fp32-sse41-ld128.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2s4-minmax-fp32-sse41-ld64.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/GeneratorForPrivateuseone.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2s4-minmax-fp32-sse41-ld128.c.o [ 36%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c8-minmax-fp32-sse41-ld64.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c8-minmax-fp32-sse41-ld128.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2-minmax-fp32-sse41-ld64.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/List.cpp.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/MetaFallbackKernel.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2-minmax-fp32-sse41-ld128.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2s4-minmax-fp32-sse41-ld64.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2s4-minmax-fp32-sse41-ld128.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-fp32-sse41.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-gemmlowp-sse41.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndna-sse41.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndnu-sse41-sra.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-requantization/qs8-requantization-rndnu-sse41-srl.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/NamedRegistrations.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse41-mul16-ld64-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse41-mul16-ld64-u16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse41-mul16-ld64-u24.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse41-mul16-ld64-u32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse41-mul32-ld32-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse41-mul32-ld32-u16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse41-mul32-ld32-u24.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-sse41-mul32-ld32-u32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse41-mul16-ld64-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse41-mul16-ld64-u16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse41-mul16-ld64-u24.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse41-mul16-ld64-u32.c.o [ 36%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/NamedTensor.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse41-mul32-ld32-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse41-mul32-ld32-u16.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/NestedIntSymNodeImpl.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse41-mul32-ld32-u24.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-sse41-mul32-ld32-u32.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/PythonFallbackKernel.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-sse41-u8.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/PythonOpRegistrationTrampoline.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-sse41-u16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-sse41-u32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-sse41-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-sse41-u16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-sse41-u32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-sse41-u8.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Range.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-sse41-u16.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Tensor.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-sse41-u32.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/TorchDispatchUtils.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-sse41-mul16-ld64-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-sse41-mul16-ld64-u16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-sse41-mul16-ld64-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-sse41-mul16-ld64-u16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-sse41-u4.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/VariableFallbackKernel.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-sse41-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-sse41-u16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c4s4r-minmax-fp32-sse41-mul32.c.o [ 36%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-fp32-sse41-mul16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c4s4r-minmax-fp32-sse41-mul32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-fp32-sse41-mul16.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/VariableHooksInterface.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c4s4r-minmax-fp32-sse41-mul32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-fp32-sse41-mul16.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/Vitals.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c4s4r-minmax-fp32-sse41-mul32.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/adaption.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-fp32-sse41-mul16.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/blob.cpp.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/boxing/KernelFunction.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c4s4r-minmax-fp32-sse41-mul32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-fp32-sse41-mul16.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/class_type.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c4s4r-minmax-fp32-sse41-mul32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-fp32-sse41-mul16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-sse41-mul16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-sse41-mul32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-sse41-mul16.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/custom_class.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-sse41-mul32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-sse41-mul16.c.o [ 36%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/DispatchKeyExtractor.cpp.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-sse41-mul32.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-sse41-mul16.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-sse41-mul32.c.o [ 36%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-sse41-u8.c.o [ 36%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-sse41-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-sse41-u24.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/Dispatcher.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-sse41-u32.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-sse41-c8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-sse41-c16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7p7x-minmax-fp32-sse41-c24.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/ObservedOperators.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-sse41-c8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-sse41-c16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gavgpool/gen/qu8-gavgpool-7x-minmax-fp32-sse41-c24.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2-minmax-fp32-sse41-ld128.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dispatch/OperatorEntry.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2s4-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2s4-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c8-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c8-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2s4-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2s4-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c8-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c8-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2-minmax-fp32-sse41-ld64.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/dynamic_type.cpp.o [ 37%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2s4-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2s4-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c8-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c8-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2s4-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2s4-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2s4-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2s4-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c8-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c8-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2s4-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2s4-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c8-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c8-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2s4-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2s4-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c8-minmax-fp32-sse41-ld64.c.o [ 
37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c8-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2s4-minmax-fp32-sse41-ld64.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2s4-minmax-fp32-sse41-ld128.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-gemmlowp-sse41.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-requantization/qu8-requantization-rndna-sse41.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-sse41-mul16-ld64-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-sse41-mul16-ld64-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-sse41-mul32-ld32-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-sse41-mul32-ld32-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-sse41-mul16-ld64-u8.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/function_schema.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-sse41-mul16-ld64-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-sse41-mul32-ld32-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-sse41-mul32-ld32-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-sse41-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-sse41-u16.c.o [ 37%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/interned_strings.cpp.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-sse41-u32.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-sse41-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-sse41-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-sse41-u32.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-sse41-u8.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-sse41-u16.c.o [ 37%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-sse41-u32.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-sse41-mul16-ld64-u8.c.o [ 38%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-sse41-mul16-ld64-u16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-sse41-mul16-ld64-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-sse41-mul16-ld64-u16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-sse41-c8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-ibilinear/gen/s8-ibilinear-sse41-c16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-maxpool/s8-maxpool-9p8x-minmax-sse41-c16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/s8-vclamp/s8-vclamp-sse41-u64.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-sse41-c8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/u8-ibilinear/gen/u8-ibilinear-sse41-c16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-avx-int16-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-avx-int16-u16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-avx-int16-u24.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-avx-int16-u32.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-avx-int32-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-avx-int32-u16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-avx-int32-u24.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-avx-int32-u32.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-avx-acc2.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/ivalue.cpp.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c8s4r-minmax-avx-acc2.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/library.cpp.o [ 38%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c8s4r-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l16c8s4r-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l16c8s4r-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c8s4r-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l8c8s4r-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l16c8s4r-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-6f6m7l16c8s4r-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c8s4r-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l8c8s4r-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l16c8s4r-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-8f8m9l16c8s4r-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-avx-acc2.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-avx.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-avx-u8.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-avx-u16.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-avx-u24.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-avx-u32.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-avx-broadcast.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x16-minmax-avx-broadcast.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/op_registration/infer_schema.cpp.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x16-minmax-avx-broadcast.c.o [ 38%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-avx-broadcast.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x16-minmax-avx-broadcast.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-avx-broadcast.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/op_registration/op_registration.cpp.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x16-minmax-avx-broadcast.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-avx-broadcast.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x16-minmax-avx-broadcast.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-7x8-minmax-avx-broadcast.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/operator_name.cpp.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-avx-broadcast.c.o [ 38%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/register_symbols.cpp.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x16-minmax-avx-broadcast.c.o [ 38%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-3x16-minmax-avx-broadcast.c.o [ 39%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/tensor_type.cpp.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-avx-broadcast.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x16-minmax-avx-broadcast.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-avx-broadcast.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x16-minmax-avx-broadcast.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-avx-broadcast.c.o [ 39%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/type.cpp.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x16-minmax-avx-broadcast.c.o [ 39%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-7x8-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-avx-broadcast.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/type_factory.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-avx-broadcast.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/union_type.cpp.o [ 40%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-avx-broadcast.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/error_report.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-7x8-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-avx-2x8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-avx-2x16.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/function_schema_parser.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-2x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-3x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x16-minmax-avx-broadcast.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/lexer.cpp.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/schema_type_parser.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-5x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-7x16-minmax-avx-broadcast.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/strtod.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-8x16-minmax-avx-broadcast.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/source_range.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x16-minmax-avx-broadcast.c.o [ 40%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Activation.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-7x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-8x16-minmax-avx-broadcast.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx-u16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx-u24.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx-u32.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveAveragePooling.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx-u16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx-u24.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx-u32.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-avx-u16-acc2.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveAveragePooling3d.cpp.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveMaxPooling2d.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-avx-u24-acc3.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-avx-u32-acc2.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-avx-u32-acc4.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AdaptiveMaxPooling3d.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-avx-u16-acc2.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-avx-u24-acc3.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AffineGridGenerator.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-avx-u32-acc2.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-avx-u32-acc4.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-avx-u16-acc2.c.o [ 40%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-avx-u24-acc3.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AmpKernels.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-avx-u32-acc2.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AutogradComposite.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-avx-u32-acc4.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-avx-u16-acc2.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AveragePool2d.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-avx-u24-acc3.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-avx-u32-acc2.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-avx-u32-acc4.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-avx-u8.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/AveragePool3d.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-avx-u16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-avx-u16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-avx-u16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-avx-u16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-avx-u16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-avx-u16.c.o [ 40%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BatchLinearAlgebra.cpp.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-avx-u8.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-avx-u16.c.o [ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-avx-u8.c.o [ 41%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-avx-u16.c.o [ 41%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut4-p4-perm-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut4-p4-perm-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut4-p4-perm-u24.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut4-p4-perm-u32.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut4-p4-perm-u40.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut4-p4-perm-u48.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut16-p3-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut16-p3-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut16-p3-u24.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut16-p3-u32.c.o [ 41%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut16-p3-u40.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-lut16-p3-u48.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-p6-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-p6-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-p6-u24.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-p6-u32.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-p6-u40.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx-rr2-p6-u48.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-avx-u16.c.o [ 41%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BinaryOps.cpp.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-avx-u8.c.o [ 41%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Blas.cpp.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-avx-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-avx-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-avx-rsqrt-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-avx-rsqrt-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-avx-rsqrt-u32.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-div-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-div-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-div-u24.c.o [ 41%] Building 
C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-div-u32.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-div-u40.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-div-u48.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-div-u56.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-div-u64.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-div-u72.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-div-u80.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-nr2-u8.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-nr2-u16.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-nr2-u24.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-nr2-u32.c.o [ 41%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/BlasKernel.cpp.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-nr2-u40.c.o [ 41%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-nr2-u48.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-nr2-u56.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-nr2-u64.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-nr2-u72.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx-rr2-p5-nr2-u80.c.o [ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Bucketization.cpp.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-avx-sqrt-u8.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-avx-sqrt-u16.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-avx-sqrt-u32.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div-u8.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div-u16.c.o [ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/CPUBlas.cpp.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div-u24.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div-u32.c.o [ 42%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div-u40.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div-u48.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div-u56.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div-u64.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div-u72.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div-u80.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut8-p4h3ts-div-u8.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut8-p4h3ts-div-u16.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut8-p4h3ts-div-u24.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-lut8-p4h3ts-div-u32.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-div-u8.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-div-u16.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-div-u24.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-div-u32.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-div-u40.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-div-u48.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-div-u56.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-div-u64.c.o [ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/CPUFallback.cpp.o [ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ChanelShuffle.cpp.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-div-u72.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-div-u80.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr1-u8.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr1-u16.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr1-u24.c.o [ 42%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr1-u32.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr1-u40.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr1-u48.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr1-u56.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr1-u64.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr1-u72.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr1-u80.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr2-u8.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr2-u16.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr2-u24.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr2-u32.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr2-u40.c.o [ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Col2Im.cpp.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr2-u48.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr2-u56.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr2-u64.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr2-u72.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx-expm1minus-rr1-p6h5ts-nr2-u80.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-avx-u8.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-avx-u16.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-avx-u8.c.o [ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ComparisonUtils.cpp.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-avx-u16.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-avx-u8.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-avx-u16.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-avx-rr2-p5.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-avx-rr2-lut4-p4-perm.c.o [ 42%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-avx-rr2-lut16-p3.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-avx-rr2-p6.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx-rr2-lut64-p2-div.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx-rr2-p5-div.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx-rr2-p5-nr1.c.o [ 42%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Constraints.cpp.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx-rr2-p5-nr2.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr1-lut4-p4h2ts-perm-div.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr1-lut8-p4h3ps-div.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr1-p6h5ts-div.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr1-p6h5ts-nr1.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr1-p6h5ts-nr2.c.o [ 42%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr2-lut8-p4h2ts-nr1.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr2-lut8-p4h2ts-nr2.c.o [ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Convolution.cpp.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr2-lut8-p4h3ps-nr1.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr2-lut8-p4h3ps-nr2.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr2-lut8-p4h3ts-nr1.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx-expm1minus-rr2-lut8-p4h3ts-nr2.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4c8-minmax-avx-ld128.c.o [ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ConvolutionMM2d.cpp.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4c8-minmax-avx-ld64.c.o [ 43%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4c8-minmax-avx-ld64.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4c8-minmax-avx-ld128.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ConvolutionMM3d.cpp.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ConvolutionTBC.cpp.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-avx-mul16-add16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-avx-mul16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-avx-mul16-add16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-avx-mul16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-avx-mul16-add16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-avx-mul16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-avx-mul16-add16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-avx-mul16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx-u8.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx-u16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx-u24.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx-u32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-avx-mul16-add16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Copy.cpp.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c4s4r-minmax-fp32-avx-mul32.c.o [ 43%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-avx-mul16-add16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-avx-mul16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-avx-mul16-add16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-avx-mul16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-avx-mul16-add16.c.o [ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Correlation.cpp.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-avx-mul16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-avx-mul16-add16.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-avx-mul16.c.o [ 43%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Cross.cpp.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-avx-mul32.c.o [ 43%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2s4-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2s4-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c8-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c8-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2s4-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2s4-minmax-fp32-avx-ld128.c.o [ 44%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c8-minmax-fp32-avx-ld64.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/DilatedMaxPool2d.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c8-minmax-fp32-avx-ld128.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/DilatedMaxPool3d.cpp.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/DispatchStub.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2s4-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2s4-minmax-fp32-avx-ld128.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Distance.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c8-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c8-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2s4-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2s4-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2s4-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2s4-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c8-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c8-minmax-fp32-avx-ld128.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Distributions.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2-minmax-fp32-avx-ld128.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Dropout.cpp.o [ 44%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2s4-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2s4-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c8-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c8-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2s4-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2s4-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c8-minmax-fp32-avx-ld64.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Embedding.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c8-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2s4-minmax-fp32-avx-ld64.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2s4-minmax-fp32-avx-ld128.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx-mul16-ld64-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx-mul16-ld64-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx-mul16-ld64-u24.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx-mul16-ld64-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx-mul32-ld32-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx-mul32-ld32-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx-mul32-ld32-u24.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/EmbeddingBag.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx-mul32-ld32-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx-mul16-ld64-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx-mul16-ld64-u16.c.o [ 44%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx-mul16-ld64-u24.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx-mul16-ld64-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx-mul32-ld32-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx-mul32-ld32-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx-mul32-ld32-u24.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx-mul32-ld32-u32.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Fill.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-avx-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-avx-u16.c.o [ 44%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ForeachOpsKernels.cpp.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-avx-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-avx-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-avx-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vhswish/gen/qs8-vhswish-avx-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-avx-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-avx-u16.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-avx-u32.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-avx-mul16-ld64-u8.c.o [ 44%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmul/gen/qs8-vmul-minmax-fp32-avx-mul16-ld64-u16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-avx-mul16-ld64-u8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vmulc/gen/qs8-vmulc-minmax-fp32-avx-mul16-ld64-u16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-avx-u4.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-avx-u8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs16-qs8-vcvt/gen/qs16-qs8-vcvt-avx-u16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c4s4r-minmax-fp32-avx-mul32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c4s4r-minmax-fp32-avx-mul32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c4s4r-minmax-fp32-avx-mul32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c4s4r-minmax-fp32-avx-mul32.c.o [ 
45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c4s4r-minmax-fp32-avx-mul32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c4s4r-minmax-fp32-avx-mul32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-avx-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-avx-mul32.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/FractionalMaxPool2d.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-avx-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-avx-mul32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-avx-mul16.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/FractionalMaxPool3d.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-avx-mul32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-avx-mul16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-avx-mul32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx-u8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx-u16.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx-u24.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx-u32.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2s4-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2s4-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c8-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c8-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2-minmax-fp32-avx-ld64.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/FunctionOfAMatrixUtils.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2s4-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2s4-minmax-fp32-avx-ld128.c.o [ 
45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c8-minmax-fp32-avx-ld64.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GatedLinearUnit.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c8-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2s4-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2s4-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c8-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c8-minmax-fp32-avx-ld128.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GridSampler.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2s4-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2s4-minmax-fp32-avx-ld128.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Histogram.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2s4-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2s4-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c8-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c8-minmax-fp32-avx-ld128.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Im2Col.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2s4-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2s4-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c8-minmax-fp32-avx-ld64.c.o [ 45%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c8-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2s4-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2s4-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c8-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c8-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2s4-minmax-fp32-avx-ld64.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2s4-minmax-fp32-avx-ld128.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-avx-mul16-ld64-u8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-avx-mul16-ld64-u16.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/IndexingUtils.cpp.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-avx-mul32-ld32-u8.c.o [ 45%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-avx-mul32-ld32-u16.c.o [ 45%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Integration.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-avx-mul16-ld64-u8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-avx-mul16-ld64-u16.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-avx-mul32-ld32-u8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-avx-mul32-ld32-u16.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Itertools.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-avx-u8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-avx-u16.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-avx-u32.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-avx-u8.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LegacyBatching.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-avx-u16.c.o [ 46%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vhswish/gen/qu8-vhswish-avx-u32.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-avx-u8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-avx-u16.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LegacyBridge.cpp.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Lerp.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-avx-u32.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Linear.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-avx-mul16-ld64-u8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmul/gen/qu8-vmul-minmax-fp32-avx-mul16-ld64-u16.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-avx-mul16-ld64-u8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vmulc/gen/qu8-vmulc-minmax-fp32-avx-mul16-ld64-u16.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx-u16.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx-u32.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LinearAlgebra.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx-u48.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Loss.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx-u64.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-avx-u4-prfm.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8-gemm-goi-avx-u4.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-avx-u4-prfm.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x8s4-gemm-goi-avx-u4.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-avx-u4-prfm.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-avx-u4.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16s4-gemm-goi-avx-u4-prfm.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16s4-gemm-goi-avx-u4.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossCTC.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-8x8-multi-mov-avx.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-8x8-multi-switch-avx.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-8x8-reuse-mov-avx.c.o [ 46%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossMultiLabelMargin.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-8x8-reuse-multi-avx.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-transposec/gen/x32-transposec-8x8-reuse-switch-avx.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x4-multi-mov-avx.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x4-multi-multi-avx.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x4-multi-switch-avx.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x4-reuse-mov-avx.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x4-reuse-multi-avx.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x64-transposec/gen/x64-transposec-4x4-reuse-switch-avx.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-avgpool/f16-avgpool-9p8x-minmax-f16c-c8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-avgpool/f16-avgpool-9x-minmax-f16c-c8.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossMultiMargin.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-f16c-u8.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossNLL.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-f16c-u16.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-f16c-u8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-f16c-u16-acc2.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-f16c-u24-acc3.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/LossNLL2d.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-f16c-u32-acc2.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-rsum/gen/f16-f32acc-rsum-f16c-u32-acc4.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-f16c-c8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-f16c-c16.c.o [ 46%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/MaxPooling.cpp.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-f16c-c24.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7p7x-minmax-f16c-c32.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-f16c-c8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-f16c-c16.c.o 
[ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-f16c-c24.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gavgpool/gen/f16-gavgpool-7x-minmax-f16c-c32.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-maxpool/f16-maxpool-9p8x-minmax-f16c-c8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-prelu/gen/f16-prelu-f16c-2x8.c.o [ 46%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-prelu/gen/f16-prelu-f16c-2x16.c.o [ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/MaxUnpooling.cpp.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-rminmax/f16-rmax-f16c-u32.c.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-f16c-u8.c.o [ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Memory.cpp.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vadd-minmax-f16c-u16.c.o [ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/MetaTensor.cpp.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-f16c-u8.c.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vaddc-minmax-f16c-u16.c.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-f16c-u8.c.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdiv-minmax-f16c-u16.c.o [ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NNPACK.cpp.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-f16c-u8.c.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vdivc-minmax-f16c-u16.c.o [ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NaiveConvolutionTranspose2d.cpp.o [ 47%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-f16c-u8.c.o [ 47%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmax-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmaxc-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmin-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-f16c-u8.c.o [ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NaiveDilatedConvolution.cpp.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vminc-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-f16c-u8.c.o [ 
48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmul-minmax-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vmulc-minmax-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrdivc-minmax-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vrsubc-minmax-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiff-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsqrdiffc-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsub-minmax-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-f16c-u8.c.o [ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NamedTensor.cpp.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vbinary/gen/f16-vsubc-minmax-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vclamp/gen/f16-vclamp-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vclamp/gen/f16-vclamp-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vhswish/gen/f16-vhswish-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vhswish/gen/f16-vhswish-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vlrelu/gen/f16-vlrelu-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vlrelu/gen/f16-vlrelu-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndd-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndd-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndne-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndne-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndu-f16c-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndu-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndz-f16c-u8.c.o [ 48%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vrnd/gen/f16-vrndz-f16c-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-f16c-rsqrt-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-f16c-rsqrt-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-f16c-rsqrt-u32.c.o [ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NegateFallback.cpp.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-f16c-sqrt-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-f16c-sqrt-u8.c.o [ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Normalization.cpp.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsqrt/gen/f16-vsqrt-f16c-sqrt-u32.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-div-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-div-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-div-u24.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-div-u32.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-div-u40.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-div-u48.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-div-u56.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-div-u64.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-div-u72.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-div-u80.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-rcp-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-rcp-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-rcp-u24.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-rcp-u32.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-rcp-u40.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-rcp-u48.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-rcp-u56.c.o [ 48%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-rcp-u64.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-rcp-u72.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-expm1minus-rr1-p3h2ts-rcp-u80.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-polynomial-p19h9t2-u8.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-polynomial-p19h9t2-u16.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-polynomial-p19h9t2-u24.c.o [ 48%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Onehot.cpp.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-polynomial-p19h9t2-u32.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-polynomial-p19h9t2-u40.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-polynomial-p19h9t2-u48.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-polynomial-p19h9t2-u56.c.o [ 48%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-polynomial-p19h9t2-u64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-polynomial-p19h9t2-u72.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-f16c-polynomial-p19h9t2-u80.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vsqr-f16c-u8.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vunary/gen/f16-vsqr-f16c-u16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-f16c-u8.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-f16c-u16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-f32-cvt-f16c.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-f16-cvt-f16c.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-f16c-expm1minus-rr1-p3h2ts-div.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-f16c-expm1minus-rr1-p3h2ts-rcp.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-f16c-polynomial-p17h8t2.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-f16c-polynomial-p19h9t2.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4c8-minmax-xop-ld64.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PackedSequence.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x4c8-minmax-xop-ld128.c.o [ 49%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4c8-minmax-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x4c8-minmax-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x4c8-minmax-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x4c8-minmax-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4c8-minmax-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x4c8-minmax-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4c8-minmax-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x4c8-minmax-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4c8-minmax-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x4c8-minmax-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x4c8-minmax-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x4c8-minmax-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4c8-minmax-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x4c8-minmax-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4c8-minmax-xop-ld64.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PadNd.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x4c8-minmax-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4c8-minmax-xop-ld64.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PixelShuffle.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x4c8-minmax-xop-ld128.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/PointwiseOps.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x4c8-minmax-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x4c8-minmax-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4c8-minmax-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x4c8-minmax-xop-ld128.c.o [ 49%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Pooling.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Pow.cpp.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/QuantizedLinear.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RNN.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-xop-mul16-add16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-xop-mul16-add16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-xop-mul32.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RangeFactories.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-xop-mul16-add16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-xop-mul16-add16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-xop-mul32.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReduceAllOps.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-xop-mul16-add16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReduceOps.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c4s4r-minmax-fp32-xop-mul32.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReflectionPad.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-xop-mul16-add16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-xop-mul16-add16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-xop-mul16-add16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-xop-mul32.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-xop-mul16-add16.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-xop-mul32.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Repeat.cpp.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2-minmax-fp32-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2-minmax-fp32-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2s4-minmax-fp32-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c2s4-minmax-fp32-xop-ld128.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c8-minmax-fp32-xop-ld64.c.o [ 49%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x4c8-minmax-fp32-xop-ld128.c.o [ 49%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ReplicationPadding.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2s4-minmax-fp32-xop-ld64.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Resize.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c2s4-minmax-fp32-xop-ld128.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/RowwisePrune.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c8-minmax-fp32-xop-ld64.c.o [ 50%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x4c8-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2s4-minmax-fp32-xop-ld64.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Scalar.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c2s4-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c8-minmax-fp32-xop-ld64.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SegmentReduce.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x4c8-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2-minmax-fp32-xop-ld64.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SobolEngineOps.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2s4-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x4c2s4-minmax-fp32-xop-ld128.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SobolEngineOpsUtils.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2s4-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c2s4-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c8-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x4c8-minmax-fp32-xop-ld128.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SoftMax.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2s4-minmax-fp32-xop-ld64.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Sorting.cpp.o [ 50%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c2s4-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c8-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x4c8-minmax-fp32-xop-ld128.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SparseTensorUtils.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2s4-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c2s4-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c8-minmax-fp32-xop-ld64.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SpectralOps.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x4c8-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2s4-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x4c2s4-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-xop-mul32-ld32-u8.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-xop-mul32-ld32-u16.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-xop-mul32-ld32-u24.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-xop-mul32-ld32-u32.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/SummaryOps.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-xop-mul32-ld32-u8.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-xop-mul32-ld32-u16.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-xop-mul32-ld32-u24.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-xop-mul32-ld32-u32.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c4s4r-minmax-fp32-xop-mul32.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c4s4r-minmax-fp32-xop-mul32.c.o [ 50%] Building 
CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorAdvancedIndexing.cpp.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorCompare.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c4s4r-minmax-fp32-xop-mul32.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c4s4r-minmax-fp32-xop-mul32.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c4s4r-minmax-fp32-xop-mul32.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorConversions.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c4s4r-minmax-fp32-xop-mul32.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-xop-mul32.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-xop-mul32.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-xop-mul32.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-xop-mul32.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2-minmax-fp32-xop-ld128.c.o [ 50%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorFactories.cpp.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2s4-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c2s4-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c8-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x4c8-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2s4-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c2s4-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c8-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x4c8-minmax-fp32-xop-ld128.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2-minmax-fp32-xop-ld64.c.o [ 50%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2-minmax-fp32-xop-ld128.c.o [ 51%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2s4-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c2s4-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c8-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x4c8-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2-minmax-fp32-xop-ld128.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorIteratorReduce.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2s4-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x4c2s4-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2s4-minmax-fp32-xop-ld64.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorProperties.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c2s4-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c8-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x4c8-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2-minmax-fp32-xop-ld64.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorShape.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2s4-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c2s4-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c8-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x4c8-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2-minmax-fp32-xop-ld128.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TensorTransformations.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2s4-minmax-fp32-xop-ld64.c.o [ 51%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c2s4-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c8-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x4c8-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2s4-minmax-fp32-xop-ld64.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x4c2s4-minmax-fp32-xop-ld128.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-xop-mul32-ld32-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-xop-mul32-ld32-u16.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-xop-mul32-ld32-u8.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-xop-mul32-ld32-u16.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p8c-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p8c-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p16c-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p16c-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p32c-minmax-fma3-acc2.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TestOps.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-3p32c-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p8c-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p8c-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p16c-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p16c-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p32c-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-4p32c-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l8c8s4r-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l8c8s4r-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l16c8s4r-minmax-fma3-acc2.c.o [ 51%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l16c8s4r-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l32c8s4r-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-5f5m5l32c8s4r-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l8c8s4r-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l8c8s4r-minmax-fma3.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TriangularOps.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l16c8s4r-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l16c8s4r-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l32c8s4r-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-6f6m7l32c8s4r-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l8c8s4r-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l8c8s4r-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l16c8s4r-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l16c8s4r-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l32c8s4r-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-8f8m9l32c8s4r-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p8c-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p8c-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p16c-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p16c-minmax-fma3.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p32c-minmax-fma3-acc2.c.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-9p32c-minmax-fma3.c.o [ 51%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/TypeProperties.cpp.o [ 51%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p8c-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p8c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p16c-minmax-fma3-acc2.c.o [ 52%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p16c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p32c-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-dwconv/gen/f16-dwconv-25p32c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear/gen/f16-ibilinear-fma3-c8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-ibilinear/gen/f16-ibilinear-fma3-c16.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UnaryOps.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vmulcaddc/gen/f16-vmulcaddc-c8-minmax-fma3-2x.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vmulcaddc/gen/f16-vmulcaddc-c16-minmax-fma3-2x.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-div-u8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-div-u16.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-div-u24.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-div-u32.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Unfold2d.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-div-u40.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-div-u48.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-div-u56.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-div-u64.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Unfold3d.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-div-u72.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-div-u80.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-rcp-u8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-rcp-u16.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-rcp-u24.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-rcp-u32.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-rcp-u40.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-rcp-u48.c.o [ 52%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-rcp-u56.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-rcp-u64.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-rcp-u72.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-expm1minus-rr1-p3h2ts-rcp-u80.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-polynomial-p19h9t2-u8.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-polynomial-p19h9t2-u16.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-polynomial-p19h9t2-u24.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-polynomial-p19h9t2-u32.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UnfoldBackward.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-polynomial-p19h9t2-u40.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-polynomial-p19h9t2-u48.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/Unique.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-polynomial-p19h9t2-u56.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-polynomial-p19h9t2-u64.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-polynomial-p19h9t2-u72.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-fma3-polynomial-p19h9t2-u80.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p8c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p8c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c8s4r-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l8c8s4r-minmax-fma3.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSample.cpp.o [ 52%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l16c8s4r-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l16c8s4r-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l32c8s4r-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l32c8s4r-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-7f6m6l8c8s4r-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-7f6m6l8c8s4r-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-7f6m6l16c8s4r-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-7f6m6l16c8s4r-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-7f6m6l32c8s4r-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-7f6m6l32c8s4r-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-fma3-acc2.c.o [ 52%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleBicubic2d.cpp.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p8c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p8c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-fma3-acc2.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-fma3.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x8-minmax-fma3-broadcast.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x16-minmax-fma3-broadcast.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x16s4-minmax-fma3-broadcast.c.o [ 52%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-3x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x16-minmax-fma3-broadcast.c.o [ 53%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x16s4-minmax-fma3-broadcast.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleBilinear2d.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-7x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-8x8-minmax-fma3-broadcast.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleLinear1d.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-3x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-3x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-7x8-minmax-fma3-broadcast.c.o [ 53%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleNearest1d.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-8x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x16-minmax-fma3-broadcast.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleNearest2d.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-3x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x16-minmax-fma3-broadcast-prfm.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x16-minmax-fma3-broadcast-prfm.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x16s4-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-7x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-8x8-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x16-minmax-fma3-broadcast.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleNearest3d.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-2x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-3x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x16-minmax-fma3-broadcast.c.o [ 53%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-5x16-minmax-fma3-broadcast.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/UpSampleTrilinear3d.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-7x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-8x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-7x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-8x16-minmax-fma3-broadcast.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-fma3-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-fma3-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-fma3-rsqrt-u8.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-fma3-rsqrt-u16.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/VariableMethodStubs.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-fma3-rsqrt-u32.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/WeightNorm.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-fma3-nr1fma1adj-u8.c.o [ 53%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/group_norm.cpp.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-fma3-nr1fma1adj-u16.c.o [ 53%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-fma3-nr1fma1adj-u32.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div-u8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div-u16.c.o [ 54%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div-u24.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div-u32.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/layer_norm.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div-u40.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div-u48.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div-u56.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div-u64.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div-u72.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div-u80.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u8.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/prim_native_functions.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u16.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u24.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u32.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u40.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u48.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u56.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u64.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u72.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u80.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/verbose_wrapper.cpp.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/library.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut8-p4h3ts-div-u8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut8-p4h3ts-div-u16.c.o [ 54%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/fbgemm_utils.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut8-p4h3ts-div-u24.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut8-p4h3ts-div-u32.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut8-p4h3ts-nr1adj-u8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut8-p4h3ts-nr1adj-u16.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut8-p4h3ts-nr1adj-u24.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-lut8-p4h3ts-nr1adj-u32.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-div-u8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-div-u16.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-div-u24.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-div-u32.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-div-u40.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-div-u48.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-div-u56.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-div-u64.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-div-u72.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-div-u80.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1-u8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1-u16.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1-u24.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1-u32.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1-u40.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1-u48.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1-u56.c.o [ 54%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1-u64.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1-u72.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1-u80.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1adj-u8.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1adj-u16.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1adj-u24.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1adj-u32.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1adj-u40.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1adj-u48.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1adj-u56.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1adj-u64.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1adj-u72.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-fma3-expm1minus-rr1-p6h5ts-nr1adj-u80.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-fma3-nr1fma1adj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-fma3-nr1fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-fma3-nr2fma.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-fma3-expm1minus-rr1-p3h2ts-div.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-fma3-expm1minus-rr1-p3h2ts-rcp.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-fma3-polynomial-p17h8t2.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-fma3-polynomial-p19h9t2.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-div.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma3-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj.c.o [ 54%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_deserialize.cpp.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma3-expm1minus-rr1-lut8-p4h3ps-div.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma3-expm1minus-rr1-lut8-p4h3ps-nr1.c.o [ 54%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma3-expm1minus-rr1-lut8-p4h3ps-nr1adj.c.o [ 54%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma3-expm1minus-rr1-p6h5ts-div.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma3-expm1minus-rr1-p6h5ts-nr1.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-fma3-expm1minus-rr1-p6h5ts-nr1adj.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-gemm/gen/f16-f32acc-gemm-1x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-gemm/gen/f16-f32acc-gemm-1x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-gemm/gen/f16-f32acc-gemm-3x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-gemm/gen/f16-f32acc-gemm-4x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-gemm/gen/f16-f32acc-gemm-4x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-gemm/gen/f16-f32acc-gemm-5x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-gemm/gen/f16-f32acc-gemm-5x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-gemm/gen/f16-f32acc-gemm-6x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-gemm/gen/f16-f32acc-gemm-7x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-igemm/gen/f16-f32acc-igemm-1x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-igemm/gen/f16-f32acc-igemm-1x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-igemm/gen/f16-f32acc-igemm-3x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-igemm/gen/f16-f32acc-igemm-4x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-igemm/gen/f16-f32acc-igemm-4x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-igemm/gen/f16-f32acc-igemm-5x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-igemm/gen/f16-f32acc-igemm-5x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-igemm/gen/f16-f32acc-igemm-6x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32acc-igemm/gen/f16-f32acc-igemm-7x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x16-minmax-avx2-broadcast.c.o [ 55%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-3x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-4x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-5x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-5x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-6x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-7x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-1x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-1x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-3x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-4x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-4x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-5x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-5x16-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-6x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-igemm/gen/f16-igemm-7x8-minmax-avx2-broadcast.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-pavgpool/f16-pavgpool-9p8x-minmax-avx2-c8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-pavgpool/f16-pavgpool-9x-minmax-avx2-c8.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u32-acc2.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u32-acc4.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u32.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u40-acc2.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u40-acc5.c.o [ 55%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u40.c.o [ 56%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_dynamic.cpp.o [ 56%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u48-acc2.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u48-acc3.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u48.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u64-acc2.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u64-acc4.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u64.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u72-acc3.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u72.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u80-acc2.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u80-acc5.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u80.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u96-acc2.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u96-acc3.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u96-acc6.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-raddstoreexpminusmax/gen/f16-raddstoreexpminusmax-avx2-rr1-p2-u96.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-velu/gen/f16-velu-avx2-rr1-p3-u8.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-velu/gen/f16-velu-avx2-rr1-p3-u16.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-div-u8.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-div-u16.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-div-u24.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-div-u32.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-div-u40.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-div-u48.c.o [ 56%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-div-u56.c.o [ 56%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-div-u64.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-rcp-u8.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-rcp-u16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-rcp-u24.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-rcp-u32.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-rcp-u40.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-rcp-u48.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-rcp-u56.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vsigmoid/gen/f16-vsigmoid-avx2-rr1-p2-rcp-u64.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-div-u8.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-div-u16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-div-u24.c.o [ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-div-u32.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-div-u40.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-div-u48.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-div-u56.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-div-u64.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-div-u72.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-div-u80.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-rcp-u8.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-rcp-u16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-rcp-u24.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-rcp-u32.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-rcp-u40.c.o [ 57%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-rcp-u48.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-rcp-u56.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-rcp-u64.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-rcp-u72.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-vtanh/gen/f16-vtanh-avx2-expm1minus-rr1-p3h2ts-rcp-u80.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-1x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-2x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-3x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-4x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-5x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-6x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-7x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc4w-gemm/gen/f32-qc4w-gemm-8x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x8-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x16s4-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x16s4-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x16s4-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x8-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x16s4-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x8-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x16-minmax-avx2-broadcast.c.o 
[ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x16s4-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x8-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x16s4-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-7x8-minmax-avx2-broadcast.c.o [ 57%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_serialize.cpp.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-7x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-8x8-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-8x16-minmax-avx2-broadcast.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx2-u16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx2-u32.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx2-u48.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx2-u64.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx2-u16.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx2-u32.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx2-u48.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx2-u64.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u64-acc2.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u64-acc4.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u64.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u72.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u72-acc3.c.o [ 57%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u80-acc2.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u80-acc5.c.o [ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u80.c.o [ 58%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u96-acc2.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u96-acc3.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u96-acc6.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx2-p5-u96.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u64-acc2.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u64-acc4.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u64.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u72-acc3.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u72.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u80-acc2.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u80-acc5.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u80.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u96-acc2.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u96-acc3.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u96-acc6.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx2-p5-u96.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u64-acc2.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u64-acc4.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u64.c.o [ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/FlattenIndicesKernel.cpp.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u72-acc3.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u72.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u80-acc2.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u80-acc5.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u80.c.o [ 58%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u96-acc2.c.o [ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/ParamUtils.cpp.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u96-acc3.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u96-acc6.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx2-rr1-p5-u96.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut4-p4-perm-u8.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut4-p4-perm-u16.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut4-p4-perm-u24.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut4-p4-perm-u32.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut4-p4-perm-u40.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut4-p4-perm-u48.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut4-p4-perm-u56.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut4-p4-perm-u64.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut4-p4-perm-u72.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut4-p4-perm-u80.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut8-p4-perm-u8.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut8-p4-perm-u16.c.o [ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SoftMax.cpp.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut8-p4-perm-u24.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut8-p4-perm-u32.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut8-p4-perm-u40.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut8-p4-perm-u48.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut8-p4-perm-u56.c.o [ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseBinaryOpIntersectionKernel.cpp.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut8-p4-perm-u64.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut8-p4-perm-u72.c.o [ 58%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut8-p4-perm-u80.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut16-p3-gather-u8.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut16-p3-gather-u16.c.o [ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseBlas.cpp.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut16-p3-gather-u24.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut16-p3-gather-u32.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut16-p3-gather-u40.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut16-p3-gather-u48.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut16-p3-gather-u56.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut16-p3-gather-u64.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut16-p3-gather-u72.c.o [ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseBlasImpl.cpp.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-lut16-p3-gather-u80.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-p6-u8.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-p6-u16.c.o [ 58%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseCsrTensor.cpp.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-p6-u24.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-p6-u32.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-p6-u40.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-p6-u48.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-p6-u56.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-p6-u64.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-p6-u72.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx2-rr1-p6-u80.c.o [ 58%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u8.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u16.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseCsrTensorMath.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u24.c.o [ 
59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u32.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u40.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u48.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseFactories.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u56.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseMatMul.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u64.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u72.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u80.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u88.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx2-p5-u96.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u8.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u16.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u24.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u32.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseTensor.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u40.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u48.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u56.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u64.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u72.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseTensorMath.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u80.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u88.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx2-p5-u96.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-div-u8.c.o [ 59%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-div-u16.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-div-u24.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-div-u32.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-div-u40.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-div-u48.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-div-u56.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/SparseUnaryOps.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-div-u64.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-div-u72.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-div-u80.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr1fma-u8.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr1fma-u16.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr1fma-u24.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr1fma-u32.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr1fma-u40.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr1fma-u48.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr1fma-u56.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr1fma-u64.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr1fma-u72.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr1fma-u80.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorAliases.cpp.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/sparse/ValidateCompressedIndicesKernel.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr2fma-u8.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr2fma-u16.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr2fma-u24.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr2fma-u32.c.o [ 59%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr2fma-u40.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr2fma-u48.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr2fma-u56.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorBackward.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr2fma-u64.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr2fma-u72.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx2-rr1-p5-nr2fma-u80.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div-u8.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div-u16.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div-u24.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorBinaryOps.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div-u32.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div-u40.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div-u48.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorFactories.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div-u56.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div-u64.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div-u72.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div-u80.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u8.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u16.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u24.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u32.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u40.c.o [ 59%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u48.c.o [ 59%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorMath.cpp.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u56.c.o [ 59%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u64.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u72.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u80.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-div-u8.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-div-u16.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorMatmul.cpp.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-div-u24.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-div-u32.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.cpp.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-div-u40.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-div-u48.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-div-u56.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-div-u64.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-div-u72.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-div-u80.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u8.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u16.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u24.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u32.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u40.c.o [ 60%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u48.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u56.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u64.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u72.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorUnaryOps.cpp.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/nested/NestedTensorUtils.cpp.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u80.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-div-u8.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-div-u16.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-div-u24.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/AffineQuantizer.cpp.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-div-u32.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-div-u40.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-div-u48.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-div-u56.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-div-u64.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/Copy.cpp.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-div-u72.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-div-u80.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/FakeQuantPerChannelAffine.cpp.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u8.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u16.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u24.c.o [ 60%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u32.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u40.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u48.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/FakeQuantPerTensorAffine.cpp.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u56.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u64.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u72.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u80.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-div-u8.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-div-u16.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-div-u24.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/QTensor.cpp.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-div-u32.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-div-u40.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-div-u48.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-div-u56.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-div-u64.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-div-u72.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/TensorAdvancedIndexing.cpp.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-div-u80.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1-u8.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1-u16.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1-u24.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1-u32.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/TensorCompare.cpp.o [ 60%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1-u40.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1-u48.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1-u56.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1-u64.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1-u72.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1-u80.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1adj-u8.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/TensorFactories.cpp.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1adj-u16.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1adj-u24.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1adj-u32.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1adj-u40.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1adj-u48.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1adj-u56.c.o [ 60%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1adj-u64.c.o [ 60%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/AdaptiveAveragePooling.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1adj-u72.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/AveragePool2d.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx2-expm1minus-rr1-p6h5ts-nr1adj-u80.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expm1minus-avx2-rr1-p2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expm1minus-avx2-rr1-p3.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expminus-avx2-rr1-p2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-expminus-avx2-rr1-p3.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-avx2-rr1-p2-div.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-avx2-rr1-p2-rcp.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-avx2-rr1-p3-div.c.o [ 61%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/AveragePool3d.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f16-sigmoid-avx2-rr1-p3-rcp.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-avx2-rr2-lut8-p3-perm.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-avx2-rr2-lut8-p4-perm.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/BinaryOps.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-avx2-rr2-p5.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/ChannelShuffle.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-avx2-rr1-lut4-p4-perm.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-avx2-rr1-lut8-p4-perm.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-avx2-rr1-lut16-p3-gather.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-avx2-rr1-p6.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-avx2-rr1-p5.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expminus-avx2-rr2-p5.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-extexp-avx2-p5.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/IntReprQuant.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr1-lut64-p2-gather-div.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr1-lut64-p2-gather-nr1fma.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr1-lut64-p2-gather-nr2fma1adj.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/LinearUnpackImpl.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr1-lut64-p2-gather-nr2fma.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr1-p5-div.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr1-p5-nr1fma.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr1-p5-nr2fma.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr2-lut64-p2-gather-div.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr2-lut64-p2-gather-nr1fma.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr2-lut64-p2-gather-nr2fma1adj.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/MakePerTensorQuantizedTensor.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr2-lut64-p2-gather-nr2fma.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr2-p5-div.c.o [ 61%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr2-p5-nr1fma.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx2-rr2-p5-nr2fma.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/Normalization.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-avx2-expm1minus-rr1-p3h2ts-div.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f16-tanh-avx2-expm1minus-rr1-p3h2ts-rcp.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-div.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-lut8-p4h3ps-gather-div.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/Pooling.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-lut8-p4h3ps-gather-nr1.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-lut8-p4h3ps-gather-nr1adj.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/ReduceOps.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-lut8-p4h3ps-perm-div.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-lut8-p4h3ps-perm-nr1.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-lut8-p4h3ps-perm-nr1adj.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-p6h5ts-div.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-p6h5ts-nr1.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx2-expm1minus-rr1-p6h5ts-nr1adj.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x8c8-minmax-avx2.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/RuyUtils.cpp.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/Sorting.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-1x8c8-minmax-avx2.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/TensorOperators.cpp.o [ 61%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-2x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-3x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x8c8-minmax-avx2.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/TensorShape.cpp.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/UpSampleBilinear2d.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-3x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8c8-minmax-avx2.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/UpSampleNearest2d.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c8-minmax-avx2.c.o [ 61%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/UpSampleNearest3d.cpp.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x8c8-minmax-avx2.c.o [ 61%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x8c8-minmax-avx2.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l8c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/XnnpackUtils.cpp.o [ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp.o [ 62%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/fused_obs_fake_quant.cpp.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l8c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/init_qnnpack.cpp.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qclamp.cpp.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l8c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p8c-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv.cpp.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p8c-minmax-fp32-avx2-mul32.c.o [ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-avx2-u16.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-avx2-u24.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-avx2-u32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f16-vcvt/gen/qs8-f16-vcvt-avx2-u64.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx2-u8.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx2-u16.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx2-u24.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx2-u32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p16c-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l8c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l32c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l32c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l32c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l32c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l8c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 62%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l32c8s8r-minmax-fp32-avx2-mul32.c.o [ 62%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l32c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l32c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l32c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l8c8s8r-minmax-fp32-avx2-mul32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c8s8r-minmax-fp32-avx2-mul32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l32c8s8r-minmax-fp32-avx2-mul32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l32c16s16r-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l32c16s16r-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l32c16s16r-minmax-fp32-avx2-mul16-vpunpck.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p8c-minmax-fp32-avx2-mul32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-avx2-mul16-vpunpck.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-avx2-mul32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p32c-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 63%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p32c-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p32c-minmax-fp32-avx2-mul16-vpunpck.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p32c-minmax-fp32-avx2-mul32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p8c-minmax-fp32-avx2-mul32.c.o [ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qdropout.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-avx2-mul16-vpunpck.c.o [ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qelu.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-avx2-mul32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p32c-minmax-fp32-avx2-mul16-add16-vpunpck.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p32c-minmax-fp32-avx2-mul16-vpmovsx.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p32c-minmax-fp32-avx2-mul16-vpunpck.c.o [ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p32c-minmax-fp32-avx2-mul32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-avx2.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-avx2.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8c8-minmax-fp32-avx2.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8c8-minmax-fp32-avx2.c.o [ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-avx2.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-avx2.c.o [ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qembeddingbag_unpack.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8c8-minmax-fp32-avx2.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8c8-minmax-fp32-avx2.c.o [ 
63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx2-mul32-ld64-u8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx2-mul32-ld64-u16.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx2-mul32-ld64-u24.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx2-mul32-ld64-u32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx2-mul32-ld64-u8.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx2-mul32-ld64-u16.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx2-mul32-ld64-u24.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx2-mul32-ld64-u32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-avx2-u16.c.o [ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qgelu.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-avx2-u32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vcvt/gen/qs8-vcvt-avx2-u64.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-avx2-u16.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-avx2-u32.c.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vlrelu/gen/qs8-vlrelu-avx2-u64.c.o [ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qhardsigmoid.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l8c8s8r-minmax-fp32-avx2-mul32.c.o [ 63%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qhardswish.cpp.o [ 63%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c8s8r-minmax-fp32-avx2-mul32.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qlinear.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l32c8s8r-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l8c8s8r-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c8s8r-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l32c8s8r-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l8c8s8r-minmax-fp32-avx2-mul32.c.o [ 64%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c8s8r-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l32c8s8r-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p8c-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p32c-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p8c-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p32c-minmax-fp32-avx2-mul32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx2-u8.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx2-u16.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx2-u24.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx2-u32.c.o [ 64%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x8c8-minmax-fp32-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x8c8-minmax-fp32-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x8c8-minmax-fp32-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8c8-minmax-fp32-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x8c8-minmax-fp32-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x8c8-minmax-fp32-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x8c8-minmax-fp32-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8c8-minmax-fp32-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-avx2-mul32-ld64-u8.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-avx2-mul32-ld64-u16.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qmatmul.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-avx2-mul32-ld64-u8.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-avx2-mul32-ld64-u16.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-avx2-u16.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-avx2-u32.c.o [ 65%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vcvt/gen/qu8-vcvt-avx2-u64.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qmul.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-avx2-u16.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-avx2-u32.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vlrelu/gen/qu8-vlrelu-avx2-u64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx2-u32.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx2-u64.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx2-u96.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx2-u128.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-32x32-reuse-mov-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-transposec/gen/x8-transposec-32x32-reuse-switch-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-avx2-u16-prfm.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qnormalization.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x8-gemm-goi-avx2-u16.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-avx2-u16-prfm.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qrelu.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-packw/gen/x16-packw-x16-gemm-goi-avx2-u16.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-16x16-reuse-mov-avx2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x16-transposec/gen/x16-transposec-16x16-reuse-switch-avx2.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qsigmoid.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-avx512f-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p16c-minmax-avx512f.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p32c-minmax-avx512f-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-3p32c-minmax-avx512f.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-avx512f-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p16c-minmax-avx512f.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qsoftmax.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p32c-minmax-avx512f-acc2.c.o [ 65%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-4p32c-minmax-avx512f.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l16c16s1r-minmax-avx512f-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l16c16s1r-minmax-avx512f.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l32c16s1r-minmax-avx512f-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-5f5m5l32c16s1r-minmax-avx512f.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qtanh.cpp.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qthreshold.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-avx512f-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p16c-minmax-avx512f.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p32c-minmax-avx512f-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-9p32c-minmax-avx512f.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-avx512f-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p16c-minmax-avx512f.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p32c-minmax-avx512f-acc2.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-dwconv/gen/f32-dwconv-25p32c-minmax-avx512f.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/library.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-1x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-4x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-5x16-minmax-avx512f-broadcast.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/qconv_unpack.cpp.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/qlinear_unpack.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-6x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-7x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemm/gen/f32-gemm-8x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-1x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-4x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-5x16-minmax-avx512f-broadcast.c.o [ 65%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-6x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-7x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-gemminc/gen/f32-gemminc-8x16-minmax-avx512f-broadcast.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/LinearAlgebra.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-1x16-minmax-avx512f-broadcast.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/SparseBlasImpl.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-4x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-5x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-6x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-7x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-igemm/gen/f32-igemm-8x16-minmax-avx512f-broadcast.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-avx512f-2x16.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-prelu/gen/f32-prelu-avx512f-2x32.c.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u128-acc2.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.cpp.o [ 65%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u128-acc4.c.o [ 65%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkl/SpectralOps.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u128.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u144-acc3.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u144.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/BinaryOps.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u160-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u160-acc5.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u160.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u192-acc2.c.o [ 66%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u192-acc3.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Conv.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u192-acc6.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddexpminusmax/gen/f32-raddexpminusmax-avx512f-p5-scalef-u192.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u128-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u128-acc4.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/ConvPrepack.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u128.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u144-acc3.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u144.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u160-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u160-acc5.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u160.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u192-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u192-acc3.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u192-acc6.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddextexp/gen/f32-raddextexp-avx512f-p5-scalef-u192.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u128-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u128-acc4.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u128.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u144-acc3.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u144.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u160-acc2.c.o [ 66%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u160-acc5.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u160.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u192-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u192-acc3.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u192-acc6.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-raddstoreexpminusmax/gen/f32-raddstoreexpminusmax-avx512f-rr1-p5-scalef-u192.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-avx512f-u16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-avx512f-u32-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-avx512f-u48-acc3.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Copy.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-avx512f-u64-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmax-avx512f-u64-acc4.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-avx512f-u16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-avx512f-u32-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-avx512f-u48-acc3.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-avx512f-u64-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rmin-avx512f-u64-acc4.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-avx512f-u16.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Gelu.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-avx512f-u32-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-avx512f-u48-acc3.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-avx512f-u64-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rminmax/gen/f32-rminmax-avx512f-u64-acc4.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-avx512f-u16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-avx512f-u32-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-avx512f-u48-acc3.c.o [ 66%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-avx512f-u64-acc2.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-rsum/gen/f32-rsum-avx512f-u64-acc4.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-avx512f-u16.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/IDeepRegistration.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vadd-minmax-avx512f-u32.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-avx512f-u16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vaddc-minmax-avx512f-u32.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Linear.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-avx512f-u16.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/MKLDNNCommon.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdiv-minmax-avx512f-u32.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-avx512f-u16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vdivc-minmax-avx512f-u32.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-avx512f-u16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmax-avx512f-u32.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-avx512f-u16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmaxc-avx512f-u32.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-avx512f-u16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmin-avx512f-u32.c.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Matmul.cpp.o [ 66%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/MkldnnTensorMath.cpp.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-avx512f-u16.c.o [ 66%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vminc-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmul-minmax-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vmulc-minmax-avx512f-u32.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Normalization.cpp.o [ 67%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-avx512f-u16.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/OpContext.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrdivc-minmax-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-avx512f-u16.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Pooling.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vrsubc-minmax-avx512f-u32.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Prelu.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiff-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsqrdiffc-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsub-minmax-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-avx512f-u16.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/RNN.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vbinary/gen/f32-vsubc-minmax-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vclamp/gen/f32-vclamp-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-lut16-p3-perm-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-lut16-p3-perm-u32.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Relu.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-lut16-p3-perm-u48.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-lut16-p3-perm-u64.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-lut16-p3-perm-u80.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/SoftMax.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-lut16-p3-perm-u96.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-lut16-p3-perm-u112.c.o [ 67%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/TensorFactories.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-lut16-p3-perm-u128.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-p6-u16.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/TensorShape.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-p6-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-p6-u48.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/UnaryOps.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-p6-u64.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-p6-u80.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/mkldnn/Utils.cpp.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/transformers/attention.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-p6-u96.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-p6-u112.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/transformers/sdp_utils_cpp.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-velu/gen/f32-velu-avx512f-rr1-p6-u128.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vhswish/gen/f32-vhswish-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vlrelu/gen/f32-vlrelu-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrelu/gen/f32-vrelu-avx512f-u32.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/transformers/transformer.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndd-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-avx512f-u16.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/utils/Factory.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndne-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-avx512f-u16.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Activation.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndu-avx512f-u32.c.o [ 67%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-avx512f-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrnd/gen/f32-vrndz-avx512f-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-avx512f-rsqrt-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-avx512f-rsqrt-u32.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/AveragePooling.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vrsqrt/gen/f32-vrsqrt-avx512f-rsqrt-u64.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/ChannelShuffle.cpp.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Convolution.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u32.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u48.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u64.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u80.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Init.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u96.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Linear.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u112.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/MaxPooling.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u128.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u144.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u160.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u176.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleexpminusmax/gen/f32-vscaleexpminusmax-avx512f-p5-scalef-u192.c.o [ 67%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/OpContext.cpp.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u16.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u32.c.o [ 67%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u48.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u64.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u80.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u96.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u112.c.o [ 67%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u128.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u144.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u160.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u176.c.o [ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/RegisterOpContextClass.cpp.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vscaleextexp/gen/f32-vscaleextexp-avx512f-p5-scalef-u192.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-div-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-div-u32.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-div-u48.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-div-u64.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-div-u80.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-div-u96.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-div-u112.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-div-u128.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-nr1fma-u16.c.o [ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/xnnpack/Shim.cpp.o [ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/CompositeViewCopyKernels.cpp.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-nr1fma-u32.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-nr1fma-u48.c.o [ 68%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-nr1fma-u64.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-nr1fma-u80.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-nr1fma-u96.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-nr1fma-u112.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-lut16-p3-perm-scalef-nr1fma-u128.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-div-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-div-u32.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-div-u48.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-div-u64.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-div-u80.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-div-u96.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-div-u112.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-div-u128.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-nr1fma-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-nr1fma-u32.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-nr1fma-u48.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-nr1fma-u64.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-nr1fma-u80.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-nr1fma-u96.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-nr1fma-u112.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr1-p5-scalef-nr1fma-u128.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-div-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-div-u32.c.o [ 68%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-div-u48.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-div-u64.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-div-u80.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-div-u96.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-div-u112.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-div-u128.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-nr1fma-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-nr1fma-u32.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-nr1fma-u48.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-nr1fma-u64.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-nr1fma-u80.c.o [ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Functions.cpp.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-nr1fma-u96.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-nr1fma-u112.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsigmoid/gen/f32-vsigmoid-avx512f-rr2-lut32-p2-perm2-scalef-nr1fma-u128.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-avx512f-nr1fma1adj-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-avx512f-nr1fma1adj-u32.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vsqrt/gen/f32-vsqrt-avx512f-nr1fma1adj-u64.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-avx512f-u16.c.o [ 68%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_0.cpp.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vabs-avx512f-u32.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-avx512f-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vneg-avx512f-u32.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-avx512f-u16.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vunary/gen/f32-vsqr-avx512f-u32.c.o [ 68%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-avx512f-rr2-lut16-p3-perm-scalef.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-avx512f-rr2-lut16-p3-perm.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-avx512f-rr2-lut32-p2-perm2-scalef.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-avx512f-rr2-lut32-p2-perm2.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-avx512f-rr2-p5-scalef.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-exp-avx512f-rr2-p5.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-avx512f-rr1-lut16-p3-perm.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-expm1minus-avx512f-rr1-p6.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-extexp-avx512f-p5.c.o [ 68%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-lut16-p3-perm-scalef-div.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-lut16-p3-perm-scalef-nr1fma1adj.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-lut16-p3-perm-scalef-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-lut32-p2-perm2-scalef-div.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-lut32-p2-perm2-scalef-nr1fma1adj.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-lut32-p2-perm2-scalef-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-lut64-p2-gather-scalef-div.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-lut64-p2-gather-scalef-nr1fma1adj.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-lut64-p2-gather-scalef-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-p5-scalef-div.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-p5-scalef-nr1fma1adj.c.o [ 69%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_1.cpp.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr1-p5-scalef-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-lut16-p3-perm-scalef-div.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-lut16-p3-perm-scalef-nr1fma1adj.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-lut16-p3-perm-scalef-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-lut32-p2-perm2-scalef-div.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-lut32-p2-perm2-scalef-nr1fma1adj.c.o [ 
69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-lut32-p2-perm2-scalef-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-lut64-p2-gather-scalef-div.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-lut64-p2-gather-scalef-nr1fma1adj.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-lut64-p2-gather-scalef-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-p5-scalef-div.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-p5-scalef-nr1fma1adj.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sigmoid-avx512f-rr2-p5-scalef-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-avx512f-nr1fma1adj.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-avx512f-nr1fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/f32-sqrt-avx512f-nr2fma.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-avx512f-u4-prfm.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x32-packw/gen/x32-packw-x16-gemm-goi-avx512f-u4.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-avx512skx-u16.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f16-f32-vcvt/gen/f16-f32-vcvt-avx512skx-u32.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-avx512skx-u16.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-f16-vcvt/gen/f32-f16-vcvt-avx512skx-u32.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc4w-gemm-1x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc4w-gemm-2x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc4w-gemm-3x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc4w-gemm-4x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc4w-gemm-5x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc4w-gemm-6x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc4w-gemm-7x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc4w-gemm-8x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x16-minmax-avx512skx-broadcast.c.o [ 69%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-1x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x16-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-2x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x16-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-3x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x16-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-4x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x16-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-5x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x16-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-6x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-7x16-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-7x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-8x16-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qc8w-gemm/gen/f32-qc8w-gemm-8x32-minmax-avx512skx-broadcast.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx512skx-u32.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx512skx-u64.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx512skx-u96.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qs8-vcvt/gen/f32-qs8-vcvt-avx512skx-u128.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx512skx-u32.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx512skx-u64.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx512skx-u96.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-qu8-vcvt/gen/f32-qu8-vcvt-avx512skx-u128.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div-u16.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div-u32.c.o [ 69%] Building C 
object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div-u48.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div-u64.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div-u80.c.o [ 69%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div-u96.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div-u112.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div-u128.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div-u144.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div-u160.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u48.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u64.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u80.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u96.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u112.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u128.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u144.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj-u160.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-div-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-div-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-div-u48.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-div-u64.c.o [ 70%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-div-u80.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-div-u96.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-div-u112.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-div-u128.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-div-u144.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-div-u160.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u48.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u64.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u80.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u96.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u112.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u128.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u144.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-gather-nr1adj-u160.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-div-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-div-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-div-u48.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-div-u64.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-div-u80.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-div-u96.c.o [ 70%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-div-u112.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-div-u128.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-div-u144.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-div-u160.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u48.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u64.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u80.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u96.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u112.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u128.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u144.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-lut8-p4h3ts-perm-nr1adj-u160.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-div-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-div-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-div-u48.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-div-u64.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-div-u80.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-div-u96.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-div-u112.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-div-u128.c.o [ 70%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-div-u144.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-div-u160.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-nr1-u16.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-nr1-u32.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-nr1-u48.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-nr1-u64.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-nr1-u80.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-nr1-u96.c.o [ 70%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-nr1-u112.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-nr1-u128.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-nr1-u144.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/f32-vtanh/gen/f32-vtanh-avx512skx-expm1minus-rr1-p6h5ts-nr1-u160.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-div.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-lut4-p4h3ts-perm-nr1adj.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-lut8-p4h3ps-gather-div.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-lut8-p4h3ps-gather-nr1.c.o [ 71%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-lut8-p4h3ps-gather-nr1adj.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-lut8-p4h3ps-perm-div.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-lut8-p4h3ps-perm-nr1.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-lut8-p4h3ps-perm-nr1adj.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-p6h5ts-div.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-p6h5ts-nr1.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/math/gen/f32-tanh-avx512skx-expm1minus-rr1-p6h5ts-nr1adj.c.o [ 71%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-5x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-7x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-8x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-1x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-2x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-3x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-5x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-6x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-7x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-8x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-3x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-5x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-6x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-7x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-8x8c8-minmax-avx512skx.c.o [ 71%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c8-minmax-avx512skx-prfm.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c8-minmax-avx512skx-prfm.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c8-minmax-avx512skx-prfm.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c8-minmax-avx512skx-prfm.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x16c8-minmax-avx512skx-prfm.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x16c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c8-minmax-avx512skx-prfm.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x16c8-minmax-avx512skx-prfm.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x16c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x16c8-minmax-avx512skx-prfm.c.o [ 71%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x16c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c8-minmax-avx512skx-prfm.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16c8-minmax-avx512skx-prfm.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x8c8-minmax-avx512skx.c.o [ 71%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-7x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-7x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-7x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-8x8c8-minmax-avx512skx.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_3.cpp.o [ 72%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-8x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-8x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x8c8-minmax-avx512skx.c.o [ 72%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_4.cpp.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-5x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-5x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-5x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-7x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-7x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-7x16c8-minmax-avx512skx.c.o [ 72%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x8c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x16c8-minmax-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x16c8-minmax-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l16c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-5f5m5l32c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l16c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-6f6m7l32c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l16c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-8f8m9l32c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p16c-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-9p32c-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p16c-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-dwconv/gen/qs8-dwconv-25p32c-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx512skx-u16.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx512skx-u32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx512skx-u48.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-f32-vcvt/gen/qs8-f32-vcvt-avx512skx-u64.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-3p32c-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l16c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-5f5m5l32c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l16c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-6f6m7l32c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l16c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-8f8m9l32c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p16c-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-9p32c-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p16c-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-dwconv/gen/qs8-qc8w-dwconv-25p32c-minmax-fp32-avx512skx-mul32.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c8-minmax-fp32-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c8-minmax-fp32-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-avx512skx.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16c8-minmax-fp32-avx512skx-prfm.c.o [ 72%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-5x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-5x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-7x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-7x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] 
Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-5x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-5x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-7x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-7x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx512skx-mul32-ld128-u16.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vadd/gen/qs8-vadd-minmax-avx512skx-mul32-ld128-u32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx512skx-mul32-ld128-u16.c.o [ 73%] Building CXX 
object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterBackendSelect.cpp.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-vaddc/gen/qs8-vaddc-minmax-avx512skx-mul32-ld128-u32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l16c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-5f5m5l32c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l16c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-6f6m7l32c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l16c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-8f8m9l32c16s1r-minmax-fp32-avx512skx-mul32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p16c-minmax-fp32-avx512skx-mul32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-9p32c-minmax-fp32-avx512skx-mul32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p16c-minmax-fp32-avx512skx-mul32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-dwconv/gen/qu8-dwconv-25p32c-minmax-fp32-avx512skx-mul32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx512skx-u16.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx512skx-u32.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx512skx-u48.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-f32-vcvt/gen/qu8-f32-vcvt-avx512skx-u64.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x8c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-1x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x8c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-2x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x8c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-3x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o 
[ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x8c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-4x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-5x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-5x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-6x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-6x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-7x16c8-minmax-fp32-avx512skx-prfm.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-7x16c8-minmax-fp32-avx512skx.c.o [ 73%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-8x16c8-minmax-fp32-avx512skx-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-gemm/gen/qu8-gemm-8x16c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x8c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16c8-minmax-fp32-avx512skx-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-1x16c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x8c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x16c8-minmax-fp32-avx512skx-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-2x16c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x8c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x16c8-minmax-fp32-avx512skx-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-3x16c8-minmax-fp32-avx512skx.c.o [ 74%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x8c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c8-minmax-fp32-avx512skx-prfm.c.o [ 74%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeExplicitAutogradNonFunctional.cpp.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-4x16c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-5x16c8-minmax-fp32-avx512skx-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-5x16c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-6x16c8-minmax-fp32-avx512skx-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-6x16c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-7x16c8-minmax-fp32-avx512skx-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-7x16c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-8x16c8-minmax-fp32-avx512skx-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-igemm/gen/qu8-igemm-8x16c8-minmax-fp32-avx512skx.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-avx512skx-mul32-ld128-u16.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vadd/gen/qu8-vadd-minmax-avx512skx-mul32-ld128-u32.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-avx512skx-mul32-ld128-u16.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qu8-vaddc/gen/qu8-vaddc-minmax-avx512skx-mul32-ld128-u32.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx512skx-vpshufb-u64.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx512skx-vpshufb-u128.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx512skx-vpshufb-u192.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx512skx-vpshufb-u256.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx512vbmi-vpermx2b-u64.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx512vbmi-vpermx2b-u128.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx512vbmi-vpermx2b-u192.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/x8-lut/gen/x8-lut-avx512vbmi-vpermx2b-u256.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-5x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-5x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-7x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-7x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-8x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-8x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-1x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-1x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-2x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-2x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-3x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-3x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-4x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-5x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-5x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-6x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-6x8c8-minmax-avx512vnni.c.o [ 74%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-7x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-7x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-8x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-gemm/gen/qd8-f16-qc8w-gemm-8x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x8c8-minmax-avx512vnni-prfm.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-2x8c8-minmax-avx512vnni.c.o [ 74%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-3x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-3x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-4x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-5x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-5x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-6x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-6x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-7x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-7x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-8x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-8x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c4-minmax-avx512vnni-prfm.c.o [ 75%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c4-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c4-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c4-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c4-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c4-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c4-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c4-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x8c8-minmax-avx512vnni.c.o [ 75%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x16c4-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x16c4-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x16c8-minmax-avx512vnni-prfm.c.o [ 75%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeImplicitAutograd.cpp.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x16c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c4-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c4-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x16c4-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x16c4-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x16c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x16c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x16c4-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x16c4-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x16c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x16c8-minmax-avx512vnni.c.o [ 75%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x8c8-minmax-avx512vnni.c.o [ 75%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCompositeImplicitAutogradNestedTensor.cpp.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c4-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c4-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-1x16c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x8c8-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16c4-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16c4-minmax-avx512vnni.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16c8-minmax-avx512vnni-prfm.c.o [ 75%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-2x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_0.cpp.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-3x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c4-minmax-avx512vnni.c.o [ 76%] Building 
C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-4x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-5x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-6x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-7x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-7x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-7x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-7x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-7x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-7x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-8x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-8x8c8-minmax-avx512vnni.c.o [ 76%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_1.cpp.o [ 76%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-8x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-8x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-8x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-gemm/gen/qd8-f32-qc8w-gemm-8x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-1x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-2x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-3x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-4x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-5x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-5x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-5x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-5x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-5x16c8-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-5x16c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x8c8-minmax-avx512vnni-prfm.c.o [ 76%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_2.cpp.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x8c8-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16c4-minmax-avx512vnni-prfm.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16c4-minmax-avx512vnni.c.o [ 76%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16c8-minmax-avx512vnni-prfm.c.o [ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-6x16c8-minmax-avx512vnni.c.o [ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-7x8c8-minmax-avx512vnni-prfm.c.o [ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-7x8c8-minmax-avx512vnni.c.o [ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-7x16c4-minmax-avx512vnni-prfm.c.o [ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-7x16c4-minmax-avx512vnni.c.o [ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-7x16c8-minmax-avx512vnni-prfm.c.o [ 77%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-7x16c8-minmax-avx512vnni.c.o [ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x8c8-minmax-avx512vnni-prfm.c.o [ 77%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x8c8-minmax-avx512vnni.c.o [ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterFunctionalization_3.cpp.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x16c4-minmax-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x16c4-minmax-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x16c8-minmax-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc8w-igemm/gen/qd8-f32-qc8w-igemm-8x16c8-minmax-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x8c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c4-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-1x16c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x8c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16c4-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16c4-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-2x16c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x8c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16c4-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16c4-minmax-fp32-avx512vnni.c.o [ 78%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-3x16c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x8c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c4-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-4x16c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-5x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-5x8c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-5x16c4-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-5x16c4-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-5x16c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-5x16c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x8c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16c4-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterMeta.cpp.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16c4-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-6x16c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-7x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-7x8c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-7x16c4-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-7x16c4-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-7x16c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-7x16c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x8c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x16c4-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x16c4-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x16c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-gemm/gen/qs8-qc8w-gemm-8x16c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x8c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c4-minmax-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c4-minmax-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c8-minmax-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-1x16c8-minmax-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x8c8-minmax-fp32-avx512vnni.c.o [ 78%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16c4-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16c4-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16c8-minmax-avx512vnni-prfm.c.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterMkldnnCPU.cpp.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-2x16c8-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x8c8-minmax-fp32-avx512vnni.c.o [ 79%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16c4-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16c4-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16c8-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-3x16c8-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x8c8-minmax-fp32-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c4-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c8-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-4x16c8-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-5x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-5x8c8-minmax-fp32-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-5x16c4-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-5x16c4-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-5x16c8-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-5x16c8-minmax-avx512vnni.c.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterNestedTensorCPU.cpp.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterNestedTensorMeta.cpp.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x8c8-minmax-fp32-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16c4-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16c4-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16c8-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-6x16c8-minmax-avx512vnni.c.o [ 79%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-7x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-7x8c8-minmax-fp32-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-7x16c4-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-7x16c4-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-7x16c8-minmax-avx512vnni-prfm.c.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterQuantizedCPU.cpp.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-7x16c8-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x8c8-minmax-fp32-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x8c8-minmax-fp32-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x16c4-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x16c4-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x16c8-minmax-avx512vnni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qs8-qc8w-igemm/gen/qs8-qc8w-igemm-8x16c8-minmax-avx512vnni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-1x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-2x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterQuantizedMeta.cpp.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-3x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-4x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-5x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-5x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-6x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-7x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-7x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-8x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f16-qc4w-gemm/gen/qd8-f16-qc4w-gemm-8x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o [ 79%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseCPU.cpp.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c4-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c4-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-1x16c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c4-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c4-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-2x16c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x8c8-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x8c8-minmax-avx512vnnigfni.c.o [ 79%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c4-minmax-avx512vnnigfni-prfm.c.o [ 79%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c4-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-3x16c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x8c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x8c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c4-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c4-minmax-avx512vnnigfni.c.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseCsrCPU.cpp.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-4x16c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x8c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x8c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x16c4-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x16c4-minmax-avx512vnnigfni.c.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseCsrMeta.cpp.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x16c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-5x16c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x8c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x8c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c4-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c4-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-6x16c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object 
confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x8c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x8c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x16c4-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x16c4-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x16c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSparseMeta.cpp.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-7x16c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x8c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x8c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x16c4-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterZeroTensor.cpp.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x16c4-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x16c8-minmax-avx512vnnigfni-prfm.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/qd8-f32-qc4w-gemm/gen/qd8-f32-qc4w-gemm-8x16c8-minmax-avx512vnnigfni.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2-k-over-64.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2-k-over-2048.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-4.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-8.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-16.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-32.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-64.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-2048.c.o [ 80%] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/vlog.c.o [ 80%] Built target microkernels-all [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/UfuncCPU_add.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/ATenOpList.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/TensorMethods.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/quantized/QTensorImpl.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/quantized/Quantizer.cpp.o [ 
80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_bind.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_model_loader.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_register.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/nnapi/nnapi_wrapper.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/UfuncCPUKernel_add.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/spherical_bessel_j0.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/scaled_modified_bessel_k1.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/scaled_modified_bessel_k0.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/layer_norm_kernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/int8mm_kernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/int4mm_kernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/group_norm_kernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/batch_norm_kernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/airy_ai.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/WeightNormKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UpSampleMoreKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UpSampleKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UnfoldBackwardKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/Unfold2d.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/TensorCompareKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SumKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/StackKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SpmmReduceKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SparseFactories.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SortingKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SoftMaxKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SampledAddmmKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/RenormKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ReduceOpsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ReduceAllOpsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PowKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PixelShuffleKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PaddingKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/NativeMultiheadAttnKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MultinomialKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxUnpoolKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPooling.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPoolKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/LinearAlgebraKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/LerpKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/IndexKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/HistogramKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FunctionOfAMatrixUtilsKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FlashAttentionKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FillKernel.cpp.DEFAULT.cpp.o [ 80%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DistributionKernels.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DistanceOpsKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DepthwiseConvKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CrossKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ComplexKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ChannelShuffleKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CatKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BlasKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AvgPoolKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AmpGradScalerKernels.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AdaptiveMaxPoolKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AdaptiveAvgPoolKernel.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/Activation.cpp.DEFAULT.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/vulkan/Context.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/metal/Context.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/core/common.cc.o [ 81%] Building C object caffe2/CMakeFiles/torch_cpu.dir/__/third_party/miniz-2.1.0/miniz.c.o
/builddir/build/BUILD/pytorch/third_party/miniz-2.1.0/miniz.c:3157:9: note: ‘#pragma message: Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.’
 3157 | #pragma message("Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.")
      | ^~~~~~~
[ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/inline_container.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/istream_adapter.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/file_adapter.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/crc.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/serialize/read_adapter_interface.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/string_utils.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/ThreadPool.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/pthreadpool-cpp.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/thread_pool_guard.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/utils/proto_wrap.cc.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/Functions.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/ViewFuncs.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_0.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_1.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_2.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_3.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/VariableType_4.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_0.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_1.cpp.o [ 81%] Building CXX
object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_2.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_3.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/TraceType_4.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/ADInplaceOrViewType_0.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_torch/generated/c_shim_cpu.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/generated/LazyNativeFunctions.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/generated/RegisterAutogradLazy.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/generated/RegisterLazy.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/anomaly_mode.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/autograd.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/autograd_meta.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/autograd_not_implemented_fallback.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/cpp_hook.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/custom_function.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/engine.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/forward_grad.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/function.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/accumulate_grad.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/basic_ops.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/tensor.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/functions/utils.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/input_buffer.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/input_metadata.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/jit_decomp_interface.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/profiler_kineto.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/profiler_legacy.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/record_function_ops.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/saved_variable.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/utils/warnings.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/variable.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/variable_info.cpp.o [ 81%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_runner/model_container_runner.cpp.o [ 81%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_runner/model_container_runner_cpu.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_torch/shim_common.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/aoti_torch/tensor_converter.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/inductor/inductor_ops.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/function_impl.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/module.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/object.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_debug_handler.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_debug_info.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_detail.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_interface.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/backends/backend_resolver.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/codegen.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/compiler.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/executor.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/fallback.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/interface.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/kernel_cache.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/builtin_functions.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/canonicalize_modified_loop.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/convert_to_ssa.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/edit_distance.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/exit_transforms.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/inline_loop_condition.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/ir_emitter.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/name_mangler.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/parser.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/schema_matching.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/script_type_parser.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/sugared_value.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/tracer.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/versioned_symbols.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/alias_analysis.cpp.o [ 82%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/attributes.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/constants.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/graph_utils.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/ir.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/irparser.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/node_hashing.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/scope.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/subgraph_matcher.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/type_hashing.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/jit_log.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/jit_opt_limit.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/model_compatibility.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/runtime_compatibility.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/flatbuffer_loader.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/function.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/import.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/interpreter.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/module.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/aot_compiler.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/backend.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/context.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/nnc/registry.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/observer.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/parse_bytecode.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/parse_operators.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/prim_ops_registery.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/promoted_prim_ops.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/quantization.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/register_ops_common_utils.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/type_parser.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/upgrader_mobile.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/upgraders.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/upgraders_entry.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/utils.cpp.o [ 82%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/operator_upgraders/version_map.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/add_if_then_else.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/annotate_warns.cpp.o [ 82%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/bailout_graph.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/batch_mm.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/canonicalize.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/canonicalize_graph_fuser_ops.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/check_strict_fusion.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/clear_profiling.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/clear_undefinedness.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/common_subexpression_elimination.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/concat_opt.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/constant_pooling.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/constant_propagation.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/create_autodiff_subgraphs.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/create_functional_graphs.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/dbr_quantization/remove_redundant_aliases.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/dead_code_elimination.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/decompose_ops.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/device_type_analysis.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/dtype_analysis.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/eliminate_no_ops.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/erase_number_types.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fixup_trace_scope_blocks.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fold_conv_bn.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fold_linear_bn.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/freeze_module.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_concat_linear.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_conv_add_relu_fusion.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_conv_folding.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_graph_optimizations.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_linear_folding.cpp.o [ 83%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_linear_transpose.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/frozen_ops_to_mkldnn.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fuse_linear.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/fuse_relu.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/graph_fuser.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/graph_rewrite_helper.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/guard_elimination.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/hoist_conv_packed_params.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inline_autodiff_subgraphs.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inline_fork_wait.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inline_forked_closures.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inliner.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/inplace_check.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/insert_guards.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/integer_value_refinement.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lift_closures.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/liveness.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/loop_unrolling.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lower_grad_of.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lower_tuples.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/metal_rewrite.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/mkldnn_rewrite.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/normalize_ops.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/pass_manager.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_alias_sensitive.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_dict_idioms.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_list_idioms.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/peephole_non_tensor.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/prepack_folding.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/dedup_module_uses.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/finalize.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/fusion_passes.cpp.o [ 83%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/helper.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/insert_observers.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/insert_quant_dequant.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/quantization_type.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/quantization/register_packed_params.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/refine_tuple_types.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_dropout.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_exceptions.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_expands.cpp.o [ 83%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_mutation.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_redundant_profiles.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/replacement_of_old_operators.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/requires_grad_analysis.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/restore_mutation.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/shape_analysis.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/specialize_autogradzero.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/subgraph_rewrite.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/symbolic_shape_analysis.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/symbolic_shape_cache.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/symbolic_shape_runtime_fusion.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/tensorexpr_fuser.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/update_differentiable_graph_requires_grad.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/memory_dag.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/op_registry.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/optimization_utils.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/subgraph_utils.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/value_refinement_utils.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/variadic_ops.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/vulkan_rewrite.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/xnnpack_rewrite.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/python/update_graph_executor_opt.cpp.o [ 84%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/python/utf8_decoding_ignore.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/argument_spec.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/autodiff.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/decomposition_registry.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/decomposition_registry_util.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/graph_executor.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/instruction.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/interpreter.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/interpreter/frame.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/interpreter/preprocess_graph.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/jit_exception.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/jit_trace.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/logging.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/operator.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/print_handler.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/profiling_graph_executor_impl.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/profiling_record.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_ops_utils.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/script_profile.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/serialized_shape_function_registry.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/simple_graph_executor_impl.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/slice_indices_adjust.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/fusion.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/generated_ops.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/impl.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/memory_planner.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/native_ops.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/ops.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/passes.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/te_wrapper.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/symbolic_script.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/symbolic_shape_registry.cpp.o [ 84%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/symbolic_shape_registry_util.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/vararg_functions.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/callstack_debug_info_serialization.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import_export_helpers.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import_read.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/import_source.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/pickle.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/pickler.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/python_print.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/source_range_serialization.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/type_name_uniquer.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/unpickler.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/block_codegen.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/bounds_inference.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/bounds_overlap.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/codegen.cpp.o [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/cpp_codegen.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/eval.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/expr.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions_codegen.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions_core.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/external_functions_registry.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/graph_opt.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/hash_provider.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/intrinsic_symbols.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_cloner.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_mutator.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_printer.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_simplifier.cpp.o [ 85%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_verifier.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/ir_visitor.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/kernel.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/llvm_codegen.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/llvm_jit.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/loopnest.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/loopnest_randomization.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/lowerings.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/mem_dependency_checker.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/conv2d.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/matmul.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/misc.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/norm.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/pointwise.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/quantization.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/reduction.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/operators/softmax.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/reduction.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/registerizer.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/tensor.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/types.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/unique_name_manager.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/testing/file_check.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/testing/hooks_for_testing.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/backend/backend_device.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/backend/backend_interface.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/backend/lowering_context.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/config.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/debug_util.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/hash.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/helpers.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir_dump_util.cpp.o [ 85%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir_metadata.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ir_util.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/lazy_graph_executor.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/metrics.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/multi_wait.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ops/arithmetic_ir_ops.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/ops/utils.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/permutation_util.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/shape.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/shape_inference.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/tensor.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/tensor_impl.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/tensor_util.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/thread_pool.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/core/trie.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/monitor/counters.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/monitor/events.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/collection.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/combined_traceback.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/data_flow.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/kineto_client_interface.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/kineto_shim.cpp.o [ 85%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/orchestration/observer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/orchestration/python_tracer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/orchestration/vulkan.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/perf.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/standalone/execution_trace_observer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/standalone/itt_observer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/standalone/nvtx_observer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/stubs/base.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/unwind/unwind.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/unwind/unwind_fb.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/util.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/cpp_stacktraces.cpp.o [ 86%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/schema_info.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/tensor_flatten.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/variadic.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/cuda/interface.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/autocast.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/lower_graph.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/remove_inplace_ops.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/passes/utils/check_alias_annotation.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_c10_ops.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_prim_ops.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_prim_ops_fulljit.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/register_special_ops.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/debug_info.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/dynamic_ir.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/config.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ops/device_data.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ops/generic.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/tensor_aten_ops.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_autograd_functions.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_backend_impl.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_eager_fallback.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_lowering_context.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_native_functions.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_node.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/lazy/ts_backend/ts_node_lowering.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/import_data.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/export_data.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/optim/sgd.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/random.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/train/sequential.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/flatbuffer_serializer.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/FunctionsManual.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/out_types.cpp.o [ 86%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/TraceTypeManual.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/VariableTypeManual.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/itt_wrapper.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/profiler/stubs/itt.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/jit.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/backport.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/mobile/compatibility/backport_manager.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/onnx.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/export.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/export_bytecode.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/export_module.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/codegen/fuser/cpu/fused_kernel.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/api/module_save.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/utils/byte_order.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Backend.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/FileStore.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Functional.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/GlooDeviceFactory.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/GroupRegistry.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Ops.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ParamCommsUtils.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/PrefixStore.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroup.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupGloo.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupMPI.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupWrapper.cpp.o [ 86%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Store.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/TCPStore.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/TCPStoreBackend.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/TCPStoreLibUvBackend.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Utils.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/comm.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/debug.cpp.o [ 87%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/default_comm_hooks.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/logger.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/logging.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/quantization/quantization.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/reducer.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/sequence_num.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/socket.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/Work.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/autograd.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/utils.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/context/container.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/context/context.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/engine/dist_engine.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/functions/recvrpc_backward.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/functions/sendrpc_backward.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/autograd_metadata.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/propagate_gradients_req.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/propagate_gradients_resp.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/cleanup_autograd_context_req.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/cleanup_autograd_context_resp.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rpc_with_autograd.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rpc_with_profiling_req.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rpc_with_profiling_resp.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rref_backward_req.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/autograd/rpc_messages/rref_backward_resp.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/HashStore.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/ProcessGroupRoundRobin.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/agent_utils.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/message.cpp.o [ 87%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/profiler/remote_profiler_manager.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/profiler/server_process_global_profiler.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/python_call.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/python_remote_call.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/python_resp.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/request_callback.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/request_callback_no_python.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rpc_agent.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rref_context.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rref_impl.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/rref_proto.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/script_call.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/script_remote_call.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/script_resp.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/tensorpipe_agent.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/tensorpipe_utils.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/testing/faulty_tensorpipe_agent.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/torchscript_functions.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/types.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/rpc/utils.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/cuda.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/datasets/mnist.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/distributed.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/random.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/sequential.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/data/samplers/stream.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/enum.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/imethod.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/serialize.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/mps.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/init.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/module.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/_functions.cpp.o [ 87%] Building 
CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/activation.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/adaptive.cpp.o [ 87%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/batchnorm.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/normalization.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/instancenorm.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/conv.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/dropout.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/distance.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/embedding.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/fold.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/linear.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/loss.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/padding.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/pixelshuffle.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/pooling.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/rnn.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/upsampling.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/transformer.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/modules/container/functional.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/activation.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/adaptive.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/batchnorm.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/embedding.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/instancenorm.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/normalization.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/conv.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/dropout.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/linear.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/padding.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/pooling.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/rnn.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/vision.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/nn/options/transformer.cpp.o [ 88%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/adagrad.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/adam.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/adamw.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/lbfgs.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/optimizer.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/rmsprop.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/serialize.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/sgd.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/schedulers/lr_scheduler.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/schedulers/step_lr.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/optim/schedulers/reduce_on_plateau_scheduler.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/serialize/input-archive.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/serialize/output-archive.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/api/src/xpu.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/UfuncCPUKernel_add.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/spherical_bessel_j0.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/scaled_modified_bessel_k1.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/scaled_modified_bessel_k0.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/layer_norm_kernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/int8mm_kernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/int4mm_kernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/group_norm_kernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/batch_norm_kernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/airy_ai.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/WeightNormKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UpSampleMoreKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UpSampleKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UnfoldBackwardKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/Unfold2d.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/TensorCompareKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SumKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/StackKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SpmmReduceKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SparseFactories.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SortingKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SoftMaxKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp.AVX2.cpp.o [ 88%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SampledAddmmKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/RenormKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ReduceOpsKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ReduceAllOpsKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PowKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PixelShuffleKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PaddingKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/NativeMultiheadAttnKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MultinomialKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxUnpoolKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPooling.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPoolKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/LinearAlgebraKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/LerpKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/IndexKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/HistogramKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FunctionOfAMatrixUtilsKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FlashAttentionKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FillKernel.cpp.AVX2.cpp.o [ 89%] Building CXX 
object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DistributionKernels.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DistanceOpsKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DepthwiseConvKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CrossKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CopyKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ComplexKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ChannelShuffleKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CatKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BlasKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AvgPoolKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AmpGradScalerKernels.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AdaptiveMaxPoolKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AdaptiveAvgPoolKernel.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/Activation.cpp.AVX2.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/UfuncCPUKernel_add.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/spherical_bessel_j0.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/scaled_modified_bessel_k1.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/scaled_modified_bessel_k0.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/layer_norm_kernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/int8mm_kernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/int4mm_kernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/group_norm_kernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/batch_norm_kernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/airy_ai.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/WeightNormKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UpSampleMoreKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UpSampleKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UnfoldBackwardKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/Unfold2d.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/TensorCompareKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SumKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/StackKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SpmmReduceKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SparseFactories.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SortingKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SoftMaxKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/SampledAddmmKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/RenormKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ReduceOpsKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ReduceAllOpsKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PowKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PixelShuffleKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PaddingKernel.cpp.AVX512.cpp.o [ 89%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/NativeMultiheadAttnKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MultinomialKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxUnpoolKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPooling.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPoolKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/LinearAlgebraKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/LerpKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/IndexKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/HistogramKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FunctionOfAMatrixUtilsKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FlashAttentionKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/FillKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DistributionKernels.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DistanceOpsKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/DepthwiseConvKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CrossKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CopyKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ComplexKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/ChannelShuffleKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/CatKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BlasKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AvgPoolKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AmpGradScalerKernels.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AdaptiveMaxPoolKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/AdaptiveAvgPoolKernel.cpp.AVX512.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/Activation.cpp.AVX512.cpp.o [ 90%] Linking CXX shared library ../lib/libtorch_cpu.so Warning: Unused direct dependencies: libc10.so.2.4 /lib64/libqnnpack.so.1 /lib64/libgloo_cuda.so.1 /lib64/liblmdb.so.0.0.0 /lib64/libleveldb.so.1 /lib64/libsnappy.so.1 /lib64/libzmq.so.5 /lib64/libhiredis.so.1.0.0 /lib64/libopencv_highgui.so.409 /lib64/libopencv_optflow.so.409 /lib64/libopencv_videoio.so.409 /lib64/libonnx_optimizer.so /lib64/libfoxi_loader.so.1.4.1 /lib64/libopencv_ximgproc.so.409 /lib64/libopencv_imgcodecs.so.409 /lib64/libopencv_video.so.409 /lib64/libopencv_dnn.so.409 /lib64/libopencv_calib3d.so.409 /lib64/libopencv_features2d.so.409 /lib64/libopencv_imgproc.so.409 /lib64/libopencv_flann.so.409 /lib64/libopencv_core.so.409 /lib64/libopencv_cudev.so.409 /usr/local/cuda-12.3/lib64/libcudart.so.12 [ 90%] Built target torch_cpu [ 90%] Building CXX object caffe2/torch/lib/libshm/CMakeFiles/shm.dir/core.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAContext.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAGeneratorImpl.cpp.o [ 90%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDAGraph.cpp.o [ 90%] Linking CXX shared library ../../../../lib/libshm.so Warning: Unused direct 
dependencies: libtorch_cpu.so.2.4 /lib64/libprotobuf.so.32 libc10.so.2.4 /lib64/libgflags.so.2.2 /lib64/libglog.so.0 /lib64/libqnnpack.so.1 /lib64/libgloo.so.1 /lib64/libgloo_cuda.so.1 /lib64/libm.so.6 [ 90%] Built target shm [ 90%] Building CXX object caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/manager.cpp.o [ 91%] Linking CXX executable ../../../../bin/torch_shm_manager [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDASparseDescriptors.cpp.o Warning: Unused direct dependencies: libshm.so.2.4 libc10.so.2.4 /lib64/libgflags.so.2.2 /lib64/libglog.so.0 /lib64/libm.so.6 [ 91%] Built target torch_shm_manager [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CachingHostAllocator.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CuSparseHandlePool.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/EmptyTensor.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/Exceptions.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/PeerToPeerAccess.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/PinnedMemoryAllocator.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/detail/CUDAHooks.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/detail/LazyNVRTC.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/llvm_basic.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/llvm_complex.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Resize.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SpectralOps.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorCompare.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/AffineGridGenerator.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/BatchNorm.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/ConvPlaceholders.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/ConvShared.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/Conv_v7.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/Conv_v8.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/GridSampler.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/LossCTC.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/MHA.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/RNN.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/miopen/BatchNorm_miopen.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/miopen/Conv_miopen.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/miopen/RNN_miopen.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorTransformerUtils.cpp.o [ 91%] 
Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/Activation.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Conv.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/ConvPrepack.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/ConvUnpackImpl.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Linear.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/LinearPrepack.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/LinearUnpackImpl.cpp.o [ 91%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cudnn/Pooling.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/cuSPARSELtOps.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/AutocastRNN.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/Descriptors.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/Handle.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cudnn/Types.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/cuda/nccl.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/reducer_cuda.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/NCCLUtils.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/ProcessGroupUCC.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/UCCTracing.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/UCCUtils.cpp.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/intra_node_comm.cpp.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/intra_node_comm.cu.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/rpc/tensorpipe_cuda.cpp.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/quantization/quantization_gpu.cu.o [ 92%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorFactories.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/Sleep.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/cub-RadixSortKeys.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/cub-RadixSortPairs.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/cub.cu.o [ 92%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/detail/IndexUtils.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/jiterator.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AbsKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationEluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationGeluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationGluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardshrinkKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardsigmoidKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardswishKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationHardtanhKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationLeakyReluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationLogSigmoidKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationMishKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationPreluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationSiluKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationSoftplusKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationSoftshrinkKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ActivationThresholdKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveAveragePooling.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveAveragePooling3d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveMaxPooling2d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AdaptiveMaxPooling3d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AmpKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AveragePool2d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/AveragePool3d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryBitwiseOpsKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryDivFloorKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryDivTrueKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryDivTruncKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryGeometricKernels.cu.o [ 92%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryLogicalOpsKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryMiscBackwardOpsKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryMiscOpsKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryMulKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryRemainderKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/BinaryShiftOpsKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Bucketization.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CUDAScalar.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Col2Im.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CompareEQKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CompareKernels.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ComplexKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ConvolutionMM2d.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Copy.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CopysignKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CrossKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CumminmaxKernel.cu.o [ 92%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CumprodKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/CumsumKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DepthwiseConv2d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DepthwiseConv3d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DilatedMaxPool2d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DilatedMaxPool3d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistanceKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionBernoulli.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionCauchyKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionExponentialKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionGeometricKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionLogNormalKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionNormal.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionRandomKernel.cu.o [ 93%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/DistributionUniform.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Distributions.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Dropout.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Embedding.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/EmbeddingBackwardKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/EmbeddingBag.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FillKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FlattenIndicesKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpScalar.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpScalarList.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachBinaryOpScalarTensor.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachReduceOp.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachTernaryOp.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachUnaryOp.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FractionalMaxPool3d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FunctionOfAMatrixUtilsKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FusedAdamKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FusedAdamWKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/FusedSgdKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GcdLcmKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GridSampler.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/IGammaKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Im2Col.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/IndexKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Indexing.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Lerp.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LinearAlgebra.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LogAddExpKernel.cu.o [ 93%] Building CUDA 
object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LogcumsumexpKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Loss.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LossCTC.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MaxMinElementwiseKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MaxUnpooling.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MixedDtypesLinear.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MultiLabelMarginCriterion.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MultiMarginLoss.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/MultinomialKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NLLLoss2d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NaiveConvolutionTranspose2d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/NaiveDilatedConvolution.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Nonzero.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Normalization.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/PointwiseOpsKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/PowKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RNN.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Randperm.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RangeFactories.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RecordStream.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Reduce.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceAMinMaxKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceArgMaxKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceArgMinKernel.cu.o [ 93%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceLogicKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceMaxValuesKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceMinValuesKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceMomentKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceNormKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceSumProdKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReflectionPad.cu.o [ 94%] 
Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RenormKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Repeat.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReplicationPadding.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RreluWithNoise.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ScatterGatherKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SegmentReduce.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Shape.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SoftMax.cu.o /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { 
int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = 
output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", 
static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved?
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
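For readability: the macro-expanded dispatch that nvcc echoes with each of these warnings reduces to the same per-dtype decision. Rows with dim_size <= 1024 and at most 4096 bytes go through dispatch_softmax_forward in chunks of (1 << 30) / dim_size rows, while longer rows use a block-wide kernel that only gets a shared-memory staging buffer when the whole row fits in sharedMemPerBlock next to the per-warp reduction area, both pointers are ALIGN_BYTES-aligned, and dim_size is a multiple of the vector width. A condensed host-side sketch of that selection logic, with placeholder launch helpers (launch_warp_softmax, launch_block_softmax) and with the block size and shared-memory limit passed in as plain parameters instead of SoftMaxForward_getBlockSize and getCurrentDeviceProperties:

  #include <algorithm>
  #include <cstddef>
  #include <cstdint>

  constexpr int ALIGN_BYTES = 16;  // alignment the vectorized kernels assume

  template <typename scalar_t, typename accscalar_t>
  void softmax_forward_dispatch(scalar_t* out, const scalar_t* in,
                                std::int64_t dim_size, std::int64_t outer_size,
                                std::size_t shared_mem_per_block,
                                unsigned block_threads) {
    if (dim_size <= 1024 && dim_size * std::int64_t(sizeof(scalar_t)) <= 4096) {
      // Warp-level path: chunk the launches so a single launch never spans
      // more than 2^30 elements along the softmax dimension.
      std::int64_t remaining = outer_size;
      const std::int64_t chunk = (std::int64_t{1} << 30) / dim_size;
      while (remaining > 0) {
        const std::int64_t rows = std::min(remaining, chunk);
        // launch_warp_softmax(out, in, dim_size, rows);   // placeholder
        (void)rows;
        in += chunk * dim_size;
        out += chunk * dim_size;
        remaining -= chunk;
      }
    } else {
      // Block-wide path: shared memory is used only when the whole row fits
      // beside the per-warp reduction buffer and everything is aligned.
      constexpr int ILP = 16 / sizeof(scalar_t);  // sizeof(float4)/sizeof(scalar_t)
      const std::size_t smem_reduction = block_threads / 32 * sizeof(accscalar_t);
      const std::size_t max_elems =
          (shared_mem_per_block - smem_reduction) / sizeof(scalar_t);
      const bool can_use_smem =
          std::size_t(dim_size) < max_elems &&
          reinterpret_cast<std::uintptr_t>(in) % ALIGN_BYTES == 0 &&
          reinterpret_cast<std::uintptr_t>(out) % ALIGN_BYTES == 0 &&
          dim_size % ILP == 0;
      // if (can_use_smem) launch_block_softmax_smem(out, in, dim_size);  // placeholder
      // else              launch_block_softmax(out, in, dim_size);       // placeholder
      (void)can_use_smem;
    }
  }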
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size,
std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t 
chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = 
input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = 
dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto 
max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); 
dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * 
dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
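Note: nvcc repeats this same #191-D diagnostic ("type qualifier is meaningless on cast type") for each instantiation of the host_softmax dispatch at SoftMax.cu(844); the Remark above indicates it can be silenced with nvcc's -diag-suppress option, and it does not affect the generated code. The template arguments do not survive in the captured log, so the expanded text is hard to read. As a readable, hedged sketch only (not PyTorch source), the launch-path heuristic that the expanded code spells out can be written as a standalone C++ snippet; the function name, parameter list, and the 16-byte alignment and 48 KiB / 128-thread example values below are illustrative stand-ins.

// Standalone sketch of the softmax-forward launch-path choice seen in the
// expanded macro text above. Types and constants are assumptions, not the
// exact PyTorch definitions.
#include <cstddef>
#include <cstdint>
#include <cstdio>

constexpr std::uintptr_t kAlignBytes = 16;  // stand-in for ALIGN_BYTES (float4 alignment, assumed 16)

enum class Path { WarpChunked, SharedMem, GlobalMem };

// scalar_bytes / acc_bytes stand in for sizeof(scalar_t) / sizeof(accscalar_t).
Path pick_softmax_forward_path(std::int64_t dim_size, std::size_t scalar_bytes,
                               std::size_t acc_bytes, std::size_t smem_per_block,
                               unsigned block_x, std::uintptr_t in_addr,
                               std::uintptr_t out_addr) {
  // Small rows: chunked warp-level kernel (dispatch_softmax_forward in the log),
  // processed in chunks of (1 << 30) / dim_size rows per launch.
  if (dim_size <= 1024 && dim_size * scalar_bytes <= 4096) {
    return Path::WarpChunked;
  }
  // Larger rows: prefer the shared-memory kernel when the whole row fits.
  std::size_t ilp = (4 * sizeof(float)) / scalar_bytes;        // sizeof(float4) / sizeof(scalar_t)
  std::size_t smem_reduction = block_x / 32 * acc_bytes;       // one accumulator slot per warp
  std::size_t max_elems_in_smem = (smem_per_block - smem_reduction) / scalar_bytes;
  bool can_use_smem = static_cast<std::size_t>(dim_size) < max_elems_in_smem;
  can_use_smem &= (in_addr % kAlignBytes) == 0;                // input aligned to float4
  can_use_smem &= (out_addr % kAlignBytes) == 0;               // output aligned to float4
  can_use_smem &= (dim_size % ilp) == 0;                       // row length divisible by ILP
  return can_use_smem ? Path::SharedMem : Path::GlobalMem;
}

int main() {
  // Example: fp32 rows of 8192 elements, 48 KiB shared memory, 128-thread block.
  Path p = pick_softmax_forward_path(/*dim_size=*/8192, /*scalar_bytes=*/4,
                                     /*acc_bytes=*/4, /*smem_per_block=*/48 * 1024,
                                     /*block_x=*/128, /*in_addr=*/0, /*out_addr=*/0);
  std::printf("chosen path: %d\n", static_cast<int>(p));
  return 0;
}

The intent mirrored by the sketch: short rows go to the warp-level kernel in chunks so a single launch never covers more than 2^30 elements, and the shared-memory kernel is chosen only when the full row fits in shared memory alongside the per-warp reduction scratch and both pointers are float4-aligned; otherwise the plain global-memory kernel runs.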
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved?
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
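For context on the diagnostic itself (this note and the snippet are an editorial illustration, not part of the PyTorch sources): warning #191-D comes from nvcc's front end whenever a cast names a const- or volatile-qualified type, because the qualifier cannot change the value the cast produces. A minimal, hypothetical reproduction in a plain host translation unit would be:

    // Hypothetical example (not from SoftMax.cu): the cv-qualifier on the
    // cast's target type has no effect, which is what diagnostic 191 flags.
    #include <cstdint>

    int main() {
        std::int64_t err = 0;
        auto a = static_cast<const int>(err);   // "type qualifier is meaningless on cast type"
        auto b = (const long)err;               // same diagnostic from a C-style cast
        return static_cast<int>(a + b);
    }

The warning is benign here; the qualifier is simply ignored and the build continues.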
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type

(same expanded dispatch body echoed)
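The small-row branch that recurs in every echoed expansion caps each call to dispatch_softmax_forward at (1 << 30) / dim_size rows. A self-contained, CPU-only sketch of that chunking pattern follows; softmax_rows is a hypothetical stand-in for the fused kernel, and the numbers in main are examples only.

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Hypothetical stand-in for the fused forward kernel: naive softmax over
    // `rows` rows of length `dim_size`.
    static void softmax_rows(float* out, const float* in, std::int64_t dim_size, std::int64_t rows) {
        for (std::int64_t r = 0; r < rows; ++r) {
            const float* x = in + r * dim_size;
            float* y = out + r * dim_size;
            float m = *std::max_element(x, x + dim_size);
            float sum = 0.0f;
            for (std::int64_t i = 0; i < dim_size; ++i) sum += (y[i] = std::exp(x[i] - m));
            for (std::int64_t i = 0; i < dim_size; ++i) y[i] /= sum;
        }
    }

    // The chunking rule from the expanded macro: never hand one call more than
    // (1 << 30) / dim_size rows, so a single launch stays near 2^30 elements.
    void chunked_softmax(float* out, const float* in, std::int64_t dim_size, std::int64_t outer_size) {
        std::int64_t remaining  = outer_size;
        std::int64_t chunk_size = (std::int64_t{1} << 30) / dim_size;
        while (remaining > 0) {
            std::int64_t rows = std::min(remaining, chunk_size);
            softmax_rows(out, in, dim_size, rows);
            // The expanded code in the log advances the pointers by chunk_size
            // unconditionally; advancing by `rows` keeps this sketch in-bounds
            // on the final, possibly partial chunk.
            in        += rows * dim_size;
            out       += rows * dim_size;
            remaining -= rows;
        }
    }

    int main() {
        const std::int64_t dim = 8, outer = 4;
        std::vector<float> in(dim * outer, 1.0f), out(dim * outer, 0.0f);
        chunked_softmax(out.data(), in.data(), dim, outer);
        return out[0] > 0.0f ? 0 : 1;   // each entry should be 1/8
    }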
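The large-row branch gates the shared-memory kernel on the can_use_smem checks visible in the expansions above. A rough host-side restatement of those checks is sketched below; the 16-byte alignment and the float accumulator are assumptions, since the template arguments were stripped from this log.

    #include <cstddef>
    #include <cstdint>

    // Assumed values: float4-sized alignment and a float accumulator; the real
    // kernel derives these from ALIGN_BYTES and accscalar_t.
    constexpr std::size_t kAlignBytes = 16;

    bool can_use_smem(std::size_t dim_size, std::size_t elem_size,
                      std::size_t shared_mem_per_block, unsigned block_x,
                      const void* in, const void* out, std::size_t ilp) {
        // Part of the block's shared memory is reserved for the per-warp reduction.
        std::size_t smem_reduction = (block_x / 32) * sizeof(float);
        std::size_t max_elems = (shared_mem_per_block - smem_reduction) / elem_size;
        bool ok = dim_size < max_elems;                                  // whole row fits
        ok &= reinterpret_cast<std::uintptr_t>(in)  % kAlignBytes == 0;  // vectorized loads
        ok &= reinterpret_cast<std::uintptr_t>(out) % kAlignBytes == 0;  // vectorized stores
        ok &= dim_size % ilp == 0;                                       // row divisible by the vector width
        return ok;
    }

    int main() {
        alignas(16) static float buf[128];
        // Example numbers only: 48 KiB of shared memory, 128-thread block, ILP = 4.
        return can_use_smem(128, sizeof(float), 48 * 1024, 128, buf, buf, 4) ? 0 : 1;
    }

When any of these checks fails, the expanded code falls back to the plain cunn_SoftMaxForward kernel, which works out of global memory instead of staging the row in shared memory.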
Remark: The warnings can be suppressed with "-diag-suppress <error-number>"

/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type

(the same expanded "host_softmax" dispatch body is echoed again; the echo continues:)
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }()
^
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
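For context on the diagnostic repeated above: nvcc's EDG-based front end reports warning #191-D when a cast names a const- or volatile-qualified non-class type, because the qualifier has no effect on the prvalue the cast produces. A minimal standalone reproducer, purely an illustration and not the ATen source (whose template arguments were stripped when the log was captured), looks like this:

#include <cstdint>

// Illustration only: the const in the cast type is discarded, so EDG-based
// front ends such as nvcc warn "type qualifier is meaningless on cast type".
static std::uint32_t to_u32(int line) {
  return static_cast<const std::uint32_t>(line);  // triggers warning #191-D
  // return static_cast<std::uint32_t>(line);     // equivalent and warning-free
}

The warning is harmless; the generated code is identical with or without the qualifier.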
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
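When the row is too large for the chunked path, the expansion instead decides whether a row can be staged in dynamic shared memory: the row must fit alongside the per-warp reduction scratch, both pointers must be ALIGN_BYTES-aligned, and the row length must be divisible by the vector width ILP (the number of elements per 16-byte float4 access). A condensed sketch of that eligibility test, with the device-property query replaced by an explicit parameter (assumed signature, not the ATen API), is:

#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>  // float4

// Sketch only: mirrors the can_use_smem test from the echoed expansion.
template <typename scalar_t, typename accscalar_t>
bool can_use_smem_softmax(const scalar_t* in, const scalar_t* out,
                          int64_t dim_size, unsigned block_x,
                          size_t shared_mem_per_block, size_t align_bytes) {
  constexpr int ILP = sizeof(float4) / sizeof(scalar_t);            // elements per 16-byte access
  const size_t smem_reduction_sz = block_x / 32 * sizeof(accscalar_t);
  const size_t max_elements = (shared_mem_per_block - smem_reduction_sz) / sizeof(scalar_t);
  bool ok = static_cast<size_t>(dim_size) < max_elements;           // row plus reduction scratch fits
  ok &= reinterpret_cast<std::uintptr_t>(in) % align_bytes == 0;    // vector loads need alignment
  ok &= reinterpret_cast<std::uintptr_t>(out) % align_bytes == 0;
  ok &= dim_size % ILP == 0;                                        // row splits into whole vectors
  return ok;
}

If any check fails, the expansion falls back to the plain cunn_SoftMaxForward kernel instead of the shared-memory variant, and in both cases the launch is followed by a cudaGetLastError() check wrapped in do { ... } while (0).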
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sort.cu.o /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( 
output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) 
{ int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = 
output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. 
" "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved?
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, 
std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t 
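The expanded code quoted in these repeated diagnostics is the host_softmax dispatch from SoftMax.cu; flattened onto single log lines it is hard to follow. The sketch below is a host-only C++ paraphrase of the kernel-selection logic it contains (the alignment, block-size and shared-memory constants are illustrative assumptions, not values taken from this build):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>

int main() {
    using scalar_t = float;                        // one of the dispatched dtypes
    using accscalar_t = float;                     // accumulation type for the reduction

    const std::int64_t dim_size   = 4096;          // softmax dimension (example value)
    const std::int64_t outer_size = 1 << 20;       // number of rows (example value)

    if (dim_size <= 1024 && dim_size * static_cast<std::int64_t>(sizeof(scalar_t)) <= 4096) {
        // Small rows: the persistent warp-softmax path is launched in chunks so
        // that one launch never covers more than 2^30 / dim_size rows.
        std::int64_t remaining = outer_size;
        const std::int64_t chunk_size = (std::int64_t{1} << 30) / dim_size;
        while (remaining > 0) {
            std::cout << "persistent kernel for " << std::min(remaining, chunk_size) << " rows\n";
            remaining -= chunk_size;
        }
    } else {
        // Large rows: choose between the shared-memory kernel and the fallback kernel.
        constexpr int ILP = 16 / sizeof(scalar_t);            // sizeof(float4) / sizeof(scalar_t)
        const unsigned block_x = 1024;                        // stand-in for SoftMaxForward_getBlockSize
        const std::size_t smem_reduction_sz = block_x / 32 * sizeof(accscalar_t);
        const std::size_t shared_mem_per_block = 48 * 1024;   // stand-in for the device property
        const std::size_t max_elements_per_smem =
            (shared_mem_per_block - smem_reduction_sz) / sizeof(scalar_t);

        const void* input_ptr = nullptr;                      // would be the tensor data pointers
        const void* output_ptr = nullptr;
        constexpr std::size_t ALIGN_BYTES = 16;

        bool can_use_smem = static_cast<std::size_t>(dim_size) < max_elements_per_smem;
        can_use_smem &= reinterpret_cast<std::uintptr_t>(input_ptr) % ALIGN_BYTES == 0;
        can_use_smem &= reinterpret_cast<std::uintptr_t>(output_ptr) % ALIGN_BYTES == 0;
        can_use_smem &= dim_size % ILP == 0;

        std::cout << (can_use_smem ? "shared-memory kernel\n" : "fallback kernel\n");
    }
    return 0;
}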
chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = 
input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err),
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^
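The diagnostic above is nvcc's EDG warning #191-D, "type qualifier is meaningless on cast type": somewhere in the expanded dispatch code a cast names a const- or volatile-qualified non-reference target type, and that qualifier has no effect because the result of such a cast is a prvalue. A minimal, hypothetical reproducer (not taken from SoftMax.cu; the exact offending cast is not identifiable from this log):

// warning191.cpp - illustrative only; an EDG-based front end such as nvcc's
// typically reports "#191-D: type qualifier is meaningless on cast type" here.
int main() {
  double d = 3.5;
  int a = static_cast<const int>(d); // 'const' on the target type has no effect
  int b = (const int)d;              // same pattern with a C-style cast
  return a + b;
}

The warning is harmless; dropping the qualifier from the cast's target type silences it.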
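The code nvcc echoes with each of these warnings is the host_softmax dispatch body from SoftMax.cu: rows of at most 1024 elements that also fit in 4096 bytes take the chunked dispatch_softmax_forward path (at most (1 << 30) / dim_size rows per launch), while longer rows choose between cunn_SoftMaxForwardSmem and cunn_SoftMaxForward depending on whether the row fits in the shared memory left after the per-warp reduction scratch and whether both pointers and the row length meet the float4/ILP alignment requirements. A minimal, self-contained sketch of that branch structure, with hypothetical names (choose_softmax_path, Path) and illustrative numbers (48 KiB of shared memory per block, 128 bytes of reduction scratch, ILP = 4, 16-byte alignment) standing in for the values the real kernel computes:

#include <cstddef>
#include <cstdint>
#include <cstdio>

enum class Path { ChunkedWarpSoftmax, SmemKernel, PlainKernel };

// Mirrors the branch structure visible in the expanded dispatch code above;
// the CUDA kernel launches are replaced by an enum result so the sketch
// compiles as plain C++.
Path choose_softmax_path(std::int64_t dim_size, std::size_t elem_size,
                         const void* in, const void* out,
                         std::size_t smem_per_block, std::size_t smem_reduction_sz,
                         int ilp, std::size_t align_bytes) {
  // Small rows: persistent warp-level kernel, launched in chunks over the outer size.
  if (dim_size <= 1024 && static_cast<std::size_t>(dim_size) * elem_size <= 4096)
    return Path::ChunkedWarpSoftmax;
  // Larger rows: use the shared-memory kernel only when the whole row fits in
  // the smem left after the reduction scratch and the pointers plus the row
  // length satisfy the vectorized-access (ILP) requirements.
  std::size_t max_elems_in_smem = (smem_per_block - smem_reduction_sz) / elem_size;
  bool can_use_smem = static_cast<std::size_t>(dim_size) < max_elems_in_smem;
  can_use_smem &= reinterpret_cast<std::uintptr_t>(in) % align_bytes == 0;
  can_use_smem &= reinterpret_cast<std::uintptr_t>(out) % align_bytes == 0;
  can_use_smem &= dim_size % ilp == 0;
  return can_use_smem ? Path::SmemKernel : Path::PlainKernel;
}

int main() {
  alignas(16) static float in[8192], out[8192];
  Path p = choose_softmax_path(8192, sizeof(float), in, out, 48 * 1024, 128, 4, 16);
  std::printf("%s\n", p == Path::SmemKernel ? "smem kernel" : "other path");
  return 0;
}

With those illustrative numbers an 8192-element float row (32 KiB) picks the shared-memory kernel, since 8192 < (48 KiB - 128 B) / 4 B = 12256, while a 16384-element row would fall back to the plain kernel. In the small-row branch the chunking arithmetic shown above means, for example, that a 1,000,000 x 512 float input is handled in a single launch, because (1 << 30) / 512 = 2,097,152 rows per chunk exceeds the outer size.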
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
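Editorial note (not part of the nvcc output): warning #191-D comes from the EDG front end used by nvcc and is raised when the target type of a cast carries a cv-qualifier such as const; the qualifier is simply dropped, so it has no effect on the result of the cast. A minimal, hypothetical stand-alone reproducer of the same class of diagnostic (not taken from the PyTorch sources):

    // the 'const' on the cast target type has no effect, because the cast
    // yields a prvalue anyway; EDG/nvcc reports "type qualifier is
    // meaningless on cast type" for this line
    int meaningless_qualifier_example(double d) {
        return (const int)d;   // equivalent to (int)d
    }

In this log the flagged cast sits somewhere inside the macro-expanded host_softmax dispatch at SoftMax.cu line 844, and nvcc echoes the whole expansion after each occurrence, which is why the identical warning text recurs several times below; the remark above names the -diag-suppress switch that can silence it.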
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
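Warning #191-D comes from nvcc's EDG front end and means that a cast inside the AT_DISPATCH expansion at SoftMax.cu:844 names a const- or volatile-qualified value type; the qualifier has no effect on the resulting temporary, so the diagnostic is cosmetic. A minimal stand-alone illustration of the construct it flags (an assumed example, not taken from SoftMax.cu):

    int example(int x) {
        // The "const" on the cast's target type applies to a temporary value,
        // so it is meaningless; this line draws warning #191-D under nvcc/EDG.
        double d = (const double)x;
        return (int)d;  // no qualifier, no warning
    }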
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
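For reference, the shared-memory feasibility test that recurs in the expanded dispatch above boils down to the host-side check below. This is a hedged paraphrase of the logic visible in the log; the parameter names (block_threads, shared_mem_per_block, align_bytes) are stand-ins, not the identifiers used in PyTorch.

    #include <cstddef>
    #include <cstdint>

    bool can_use_shared_memory(std::size_t dim_size, std::size_t elem_size,
                               std::size_t acc_size, unsigned block_threads,
                               std::size_t shared_mem_per_block,
                               std::size_t align_bytes, int ilp,
                               const void* in, const void* out) {
        // One accumulator slot per warp is reserved for the block-level reduction.
        std::size_t smem_reduction = (block_threads / 32) * acc_size;
        // Whatever remains must hold one full softmax row of the element type.
        std::size_t max_elems = (shared_mem_per_block - smem_reduction) / elem_size;
        bool ok = dim_size < max_elems;
        // Vectorized (float4-wide) accesses need both pointers aligned...
        ok = ok && reinterpret_cast<std::uintptr_t>(in)  % align_bytes == 0;
        ok = ok && reinterpret_cast<std::uintptr_t>(out) % align_bytes == 0;
        // ...and the row length must be a multiple of the vector width (ILP).
        ok = ok && dim_size % static_cast<std::size_t>(ilp) == 0;
        return ok;
    }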
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
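The small-row branch of the same expansion walks the outer dimension in fixed-size chunks so that a single launch never covers more than roughly 2^30 elements. A self-contained sketch of that loop follows; launch_rows is a placeholder for the dispatch_softmax_forward call, whose template and stream arguments the log has stripped.

    #include <algorithm>
    #include <cstdint>

    template <typename T, typename LaunchFn>
    void softmax_in_chunks(T* out, const T* in, std::int64_t dim_size,
                           std::int64_t outer_size, LaunchFn launch_rows) {
        std::int64_t remaining = outer_size;
        std::int64_t chunk_size = (std::int64_t{1} << 30) / dim_size;  // rows per launch
        while (remaining > 0) {
            // Launch at most chunk_size rows; the final chunk may be smaller.
            launch_rows(out, in, dim_size, std::min(remaining, chunk_size));
            in  += chunk_size * dim_size;   // advance past the rows just handled
            out += chunk_size * dim_size;
            remaining -= chunk_size;
        }
    }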
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved?
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { 
dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && 
dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } 
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ Remark: The warnings can be suppressed with "-diag-suppress " /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
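Note: warning #191-D flags a const or volatile qualifier on the target type of a cast, where the qualifier has no effect. The exact cast on SoftMax.cu line 844 is not recoverable from this log because the text inside angle brackets was stripped when the output was captured. A minimal standalone reproduction of this class of warning (illustrative only, not taken from the PyTorch sources) is:

    // warn191.cu - hypothetical file name; compiling it with nvcc reproduces warning #191-D
    #include <cstdint>

    bool is_aligned_16(const float* p) {
        // The 'const' on the cast target type is what the diagnostic points at:
        // it has no effect on the resulting value of the cast.
        return reinterpret_cast<const std::uintptr_t>(p) % 16 == 0;
    }

    int main() {
        float x = 0.0f;
        return is_aligned_16(&x) ? 0 : 1;
    }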
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type
[&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved?
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += 
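Stripped of the dispatch machinery, the code being instantiated above follows one launch heuristic: rows short enough for the warp-level kernel are processed in chunks so that a single launch never covers more than 2^30 elements, while longer rows choose between a shared-memory kernel and a plain kernel based on shared-memory capacity, pointer alignment, and divisibility by the vector width. A standalone sketch of that decision (all constants and names here are illustrative, not PyTorch code):

// launch_heuristic_sketch.cpp -- illustrative only; constants are assumptions.
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
  const std::int64_t outer_size = 1000000;   // number of softmax rows
  const std::int64_t dim_size   = 2048;      // elements per row
  const std::int64_t elem_bytes = sizeof(float);

  if (dim_size <= 1024 && dim_size * elem_bytes <= 4096) {
    // Fast path: chunk the rows so each launch touches at most 2^30 elements.
    const std::int64_t chunk_rows = (std::int64_t{1} << 30) / dim_size;
    for (std::int64_t remaining = outer_size; remaining > 0; remaining -= chunk_rows) {
      std::printf("warp-level launch over %lld rows\n",
                  static_cast<long long>(std::min(remaining, chunk_rows)));
    }
  } else {
    // General path: prefer the shared-memory kernel only when one whole row plus
    // a per-warp reduction buffer fits in shared memory and the row length is a
    // multiple of the vector width (the real check also requires 16-byte-aligned
    // input and output pointers).
    const std::int64_t smem_per_block  = 48 * 1024;             // assumed budget
    const std::int64_t block_threads   = 128;                   // assumed block size
    const std::int64_t reduction_bytes = block_threads / 32 * sizeof(float);
    const std::int64_t ilp             = 16 / elem_bytes;       // sizeof(float4) / element size
    const bool fits_in_smem = dim_size < (smem_per_block - reduction_bytes) / elem_bytes;
    const bool vectorizable = dim_size % ilp == 0;
    std::printf("shared-memory kernel: %s\n", (fits_in_smem && vectorizable) ? "yes" : "no");
  }
  return 0;
}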
while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu(844): warning #191-D: type qualifier is meaningless on cast type [&] { const auto& the_type = input.scalar_type(); constexpr const char* at_dispatch_name = "host_softmax"; at::ScalarType _st = ::detail::scalar_type(the_type); ; switch (_st) { case at::ScalarType::Double: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Double)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? 
If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Double), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Float: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Float)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), 
(::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Float), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::Half: { do { if constexpr (!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::Half)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { 
::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::Half), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } case at::ScalarType::BFloat16: { do { if constexpr 
(!at::should_include_kernel_dtype( at_dispatch_name, at::ScalarType::BFloat16)) { do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str("dtype '", toString(at::ScalarType::BFloat16), "' not selected for kernel tag ", at_dispatch_name)))); }; } while (false); } } while (0); using scalar_t __attribute__((__unused__)) = c10::impl::ScalarTypeToCPPTypeT; return [&] { using accscalar_t = acc_type; if (!half_to_float) { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1L << 30L) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(880), true); } while (0); } } else { auto output_ptr = output.mutable_data_ptr(); auto input_ptr = input.const_data_ptr(); if (dim_size <= 1024 && dim_size*sizeof(scalar_t) <= 4096) { int64_t remaining = outer_size; int64_t chunk_size = (1<<30) / dim_size; while(remaining > 0) { dispatch_softmax_forward( output_ptr, input_ptr, dim_size, dim_size, std::min(remaining, chunk_size), nullptr ); input_ptr += chunk_size * dim_size; output_ptr += chunk_size * dim_size; remaining -= chunk_size; } } else { constexpr int ILP = sizeof(float4) / sizeof(scalar_t); dim3 block = SoftMaxForward_getBlockSize(dim_size); size_t smem_reduction_sz = block.x / 32 * sizeof(accscalar_t); auto max_elements_per_smem = (at::cuda::getCurrentDeviceProperties()->sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t); bool can_use_smem = dim_size < max_elements_per_smem; can_use_smem &= !(reinterpret_cast(input_ptr) % ALIGN_BYTES); can_use_smem &= (!(reinterpret_cast(output_ptr) % ALIGN_BYTES)); can_use_smem &= !(dim_size % ILP); if (can_use_smem) { size_t smem_sz = dim_size * sizeof(scalar_t) + smem_reduction_sz; cunn_SoftMaxForwardSmem <<>>(output_ptr, input_ptr, dim_size); } else { cunn_SoftMaxForward <<>>(output_ptr, input_ptr, dim_size); } do { const cudaError_t __err = cudaGetLastError(); c10::cuda::c10_cuda_check_implementation( static_cast(__err), 
"/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", __func__, static_cast(916), true); } while (0); } } }(); } default: do { ::c10::detail::deprecated_AT_ERROR(); if (!(false)) { ::c10::detail::torchCheckFail( __func__, "/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu", static_cast(844), (::c10::detail::torchCheckMsgImpl( "Expected " "false" " to be true, but got false. " "(Could this error message be improved? If so, " "please report an enhancement request to PyTorch.)", ::c10::str('"', at_dispatch_name, "\" not implemented for '", toString(_st), "'")))); }; } while (false); } }() ^ [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SortImpl.cu.o /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu: In instantiation of ‘at::Tensor at::native::_GLOBAL__N__08542f1a_10_SoftMax_cu_9f978f63::host_softmax(const at::Tensor&, int64_t, bool, const at::Tensor&) [with Epilogue = LogSoftMaxForwardEpilogue; bool is_log_softmax = true; int64_t = long int]’: /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:1072:56: required from here /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844:2132: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] 844 | AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(), "host_softmax", [&] { | ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] 844 | AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(), "host_softmax", [&] { | /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu: In instantiation of ‘at::Tensor at::native::_GLOBAL__N__08542f1a_10_SoftMax_cu_9f978f63::host_softmax(const at::Tensor&, int64_t, bool, const at::Tensor&) [with Epilogue = SoftMaxForwardEpilogue; bool is_log_softmax = false; int64_t = long int]’: /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:1096:54: required from here 
/builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844:2132: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] 844 | AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(), "host_softmax", [&] { | ^ /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] 844 | AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(), "host_softmax", [&] { | /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] /builddir/build/BUILD/pytorch/aten/src/ATen/native/cuda/SoftMax.cu:844: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compare] [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SortStable.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sorting.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SparseBinaryOpIntersectionKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SparseMM.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SpectralOps.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/StepKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/SummaryOps.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorCompare.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorModeKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorShape.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorTopK.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorTransformations.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TriangularOps.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryComplexKernels.cu.o [ 94%] Building CUDA object 
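Note on the diagnostics above: both point into the AT_DISPATCH_FLOATING_TYPES_AND2 expansion at SoftMax.cu:844. The repeated -Wsign-compare warnings come from the check visible in the expanded code, "bool can_use_smem = dim_size < max_elements_per_smem;": dim_size is a signed int64_t, while max_elements_per_smem is derived from the unsigned sharedMemPerBlock device property and is therefore 'long unsigned int'. The sketch below is only an illustrative reduction of that check, not the PyTorch source; the function and parameter names are invented for the example, and the explicit cast shows one conventional way such a warning is silenced.

#include <cstddef>
#include <cstdint>

bool can_use_shared_memory(int64_t dim_size,
                           size_t shared_mem_per_block,
                           size_t smem_reduction_sz,
                           size_t elem_size,
                           uintptr_t input_addr,
                           uintptr_t output_addr,
                           int64_t ilp,
                           uintptr_t align_bytes) {
  // Unsigned budget, mirroring (sharedMemPerBlock - smem_reduction_sz) / sizeof(scalar_t).
  size_t max_elements_per_smem =
      (shared_mem_per_block - smem_reduction_sz) / elem_size;

  // Comparing int64_t against size_t directly is what -Wsign-compare flags;
  // the explicit cast keeps the comparison well-defined and silences it.
  bool fits = dim_size > 0 &&
              static_cast<size_t>(dim_size) < max_elements_per_smem;

  // Alignment and vectorization checks corresponding to the ALIGN_BYTES / ILP
  // tests in the expanded macro.
  fits = fits && (input_addr % align_bytes) == 0;
  fits = fits && (output_addr % align_bytes) == 0;
  fits = fits && (dim_size % ilp) == 0;
  return fits;
}

// nvcc's "#191-D: type qualifier is meaningless on cast type" is a separate,
// purely cosmetic diagnostic: it typically fires when a cast names a
// cv-qualified target, e.g. static_cast<const int>(x) instead of
// static_cast<int>(x), and does not change the generated code.

These are warnings only; as the remainder of the log shows, SoftMax.cu compiles and the build proceeds to the rest of the torch_cuda objects.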
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryFractionKernels.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGammaKernels.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAcosKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAcoshKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAsinKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAsinhKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAtanKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricAtanhKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricCosKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricCoshKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricSinKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricSinhKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricTanKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryGeometricTanhKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryLogKernels.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnaryOpsKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnarySignKernels.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UniqueCub.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleBilinear2d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleLinear1d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleNearest1d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleNearest2d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleNearest3d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/UpSampleTrilinear3d.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ValidateCompressedIndicesKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/WeightNorm.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ZetaKernel.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/airy_ai.cu.o [ 94%] 
Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_j0.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_j1.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_y0.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/bessel_y1.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_t.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_u.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_v.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/chebyshev_polynomial_w.cu.o [ 94%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adam_impl.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adamw_amsgrad_impl.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/fused_adamw_impl.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/group_norm_kernel.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/hermite_polynomial_h.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/hermite_polynomial_he.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/int4mm.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/laguerre_polynomial_l.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/layer_norm_kernel.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/legendre_polynomial_p.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_i0.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_i1.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_k0.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/modified_bessel_k1.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/scaled_modified_bessel_k0.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/scaled_modified_bessel_k1.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_t.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_u.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_v.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/shifted_chebyshev_polynomial_w.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/spherical_bessel_j0.cu.o [ 95%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorBinaryOps.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorMatmul.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SoftMax.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCUDATensor.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCsrTensorMath.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseMatMul.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseSemiStructuredLinear.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseSemiStructuredOps.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/Activation.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/AffineQuantizer.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/EmbeddingBag.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/FakeQuantizeCore.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/FusedObsFakeQuant.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/IntReprQuant.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/quantized/cuda/MakePerTensorQuantizedTensor.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/attention.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/attention_backward.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim128_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim128_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim160_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim160_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim192_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim192_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim224_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim224_fp16_sm80.cu.o [ 95%] Building CUDA 
object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim256_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim256_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim32_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim32_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim64_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim64_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim128_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim128_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim160_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim160_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim192_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim192_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim224_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim224_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim256_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim256_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim32_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim32_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim64_bf16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim64_fp16_sm80.cu.o [ 95%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim96_bf16_sm80.cu.o [ 96%] Building CUDA 
object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_hdim96_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim128_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim128_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim160_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim160_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim192_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim192_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim224_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim224_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim256_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim256_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim32_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim32_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim64_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim64_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim96_bf16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_fwd_split_hdim96_fp16_sm80.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k128.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k128_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k32.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k32_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k64.cu.o [ 96%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k64_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k65536.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k65536_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_bf16_aligned_k96.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k128.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k128_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k32.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k32_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k64.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k64_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k65536.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k65536_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k96.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k128.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k128_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k32.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k32_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k64.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k64_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536_dropout.cu.o [ 96%] Building CUDA object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k128.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k128_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k32.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k32_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k64.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k64_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k65536.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_aligned_k65536_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k128.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k128_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k32.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k32_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k64.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k64_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k65536.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f32_notaligned_k65536_dropout.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_bf16_aligned.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f16_aligned.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f16_notaligned.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f32_aligned.cu.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF_f32_notaligned.cu.o [ 96%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterCUDA.cpp.o [ 96%] Building CXX object 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterNestedTensorCUDA.cpp.o [ 96%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterQuantizedCUDA.cpp.o [ 96%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterSparseCUDA.cpp.o [ 96%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/RegisterSparseCsrCUDA.cpp.o [ 96%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/UfuncCUDA_add.cu.o [ 96%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDABlas.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CUDASparseBlas.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/CublasHandlePool.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/tunable/StreamTimer.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/cuda/tunable/Tunable.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Activation.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LinearAlgebraStubs.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Blas.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Distributions.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Equal.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GridSampler.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/IndexKernel.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceOps.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ScanKernels.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sort.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Sorting.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorModeKernel.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorShapeCUDA.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/TensorTopK.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/jit_utils.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseBlas.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseBlasImpl.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseBlasLegacy.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/CudaIPCTypes.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/cuda/comm.cpp.o [ 97%] Building CXX object 
caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/cuda/memory_snapshot.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/inductor/aoti_runner/model_container_runner_cuda.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/inductor/aoti_torch/shim_cuda.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/codegen/fuser/cuda/fused_kernel.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/profiler/stubs/cuda.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/autograd/functions/comm.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/passes/frozen_conv_add_relu_fusion_cuda.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/tensorexpr/cuda_codegen.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/jit/runtime/register_cuda_ops.cpp.o [ 97%] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Unique.cu.o [ 97%] Linking CXX shared library ../lib/libtorch_cuda.so Warning: Unused direct dependencies: libc10_cuda.so /lib64/libgloo_cuda.so.1 /usr/local/cuda-12.3/lib64/libcurand.so.10 libc10.so.2.4 /lib64/libgflags.so.2.2 libtorch_cpu.so.2.4 [ 97%] Built target torch_cuda [ 97%] Building CXX object caffe2/CMakeFiles/torch.dir/__/empty.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLibBlas.cpp.o [ 97%] Linking CXX shared library ../lib/libtorch.so Warning: Unused direct dependencies: /lib64/libstdc++.so.6 libtorch_cpu.so.2.4 libtorch_cuda.so [ 97%] Built target torch [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/CUDASolver.cpp.o [ 97%] Building CXX object caffe2/CMakeFiles/torch_cuda_linalg.dir/__/aten/src/ATen/native/cuda/linalg/CusolverDnHandlePool.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_0.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_1.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_2.cpp.o [ 97%] Linking CXX shared library ../lib/libtorch_cuda_linalg.so Warning: Unused direct dependencies: libtorch_cpu.so.2.4 libtorch_cuda.so libc10_cuda.so /usr/local/cuda-12.3/lib64/libnvToolsExt.so.1 /lib64/libprotobuf.so.32 libc10.so.2.4 /lib64/libgflags.so.2.2 /lib64/libglog.so.0 /lib64/libqnnpack.so.1 /lib64/libgloo.so.1 /lib64/libgloo_cuda.so.1 [ 97%] Built target torch_cuda_linalg [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_3.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_functions_4.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_variable_methods.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_torch_functions_0.cpp.o [ 97%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_torch_functions_1.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_torch_functions_2.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_nn_functions.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_fft_functions.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_linalg_functions.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_nested_functions.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_sparse_functions.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_special_functions.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_return_types.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/generated/python_enum_tag.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/DataLoader.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Device.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Dtype.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/DynamicTypes.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Exceptions.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Generator.cpp.o [ 97%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Layout.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/MemoryFormat.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/QScheme.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Module.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/PyInterpreter.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/python_dimname.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Size.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Storage.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/StorageMethods.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/StorageSharing.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/Stream.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/TypeInfo.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/api/src/python/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/functions/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/profiler_python.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_anomaly_mode.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_saved_variable_hooks.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_cpp_function.cpp.o [ 98%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_engine.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_function.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_hook.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_legacy_variable.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_nested_functions_manual.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_torch_functions_manual.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_variable.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/autograd/python_variable_indexing.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/python_compiled_autograd.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/cache_entry.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/cpp_shim.cpp.o [ 98%] Building C object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/cpython_defs.c.o [ 98%] Building C object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/eval_frame.c.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/extra_state.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/guards.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/dynamo/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/functorch/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/mps/Module.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/inductor/aoti_runner/pybind.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/backends/backend_init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/init.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/cast_all_constant_to_floating.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/deduplicate_initializers.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/eval_peephole.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/constant_fold.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/constant_map.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/eliminate_unused_items.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/fixup_onnx_controlflow.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/list_model_parameters.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/function_substitution.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/helper.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/peephole.cpp.o [ 98%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/preprocess_for_onnx.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/prepare_division_for_onnx.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/scalar_type_analysis.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/unpack_quantized_weights.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/remove_inplace_ops_for_onnx.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/shape_type_inference.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/function_extraction.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/onnx_log.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/naming.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/pybind_utils.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/autograd_function_process.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/common.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/pattern_encapsulation.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/passes/onnx/pattern_conversion/pattern_conversion.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_arg_flatten.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_custom_class.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_dict.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_interpreter.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_ir.cpp.o [ 98%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_list.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_tracer.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/script_init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/frontend/concrete_module_type.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/frontend/tree_views.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_sugared_value.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/python/python_tree_views.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/runtime/static/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/tensorexpr/tensorexpr_init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/monitor/python_init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/multiprocessing/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/onnx/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/profiler/python/init.cpp.o [ 99%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/profiler/python/combined_traceback.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/serialization.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/tensor/python_tensor.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/throughput_benchmark.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/device_lazy_init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/invalid_arguments.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/nested.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/object_ptr.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/python_arg_parser.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/python_dispatch.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/python_symnode.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/pybind.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/pyobject_preservation.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/structseq.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_apply.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_dtypes.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_layouts.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_memoryformats.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_qschemes.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_list.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_new.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_numpy.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/tensor_types.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/disable_torch_function.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/utils/verbose.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cpu/Module.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/lazy/python/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/lazy/python/python_util.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/itt.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Event.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Module.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/python_comm.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Stream.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/Graph.cpp.o [ 99%] Building CXX object 
caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/shared/cudart.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/shared/nvtx.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/utils.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/CUDAPluggableAllocator.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/shared/cudnn.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/c10d/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/c10d/python_comm_hook.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/autograd/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/py_rref.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/python_functions.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/python_rpc_handler.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/request_callback_impl.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/testing/init.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/unpickled_python_call.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/distributed/rpc/unpickled_python_remote_call.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/jit/runtime/register_distributed_ops.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/torch_python.dir/csrc/cuda/python_nccl.cpp.o [ 99%] Linking CXX shared library ../../lib/libtorch_python.so Warning: Unused direct dependencies: libshm.so.2.4 libtorch.so.2.4 libtorch_cpu.so.2.4 libtorch_cuda.so libc10_cuda.so libc10.so.2.4 [ 99%] Built target torch_python [ 99%] Building CXX object caffe2/torch/CMakeFiles/nnapi_backend.dir/csrc/jit/backends/nnapi/nnapi_backend_preprocess.cpp.o [ 99%] Building CXX object caffe2/torch/CMakeFiles/nnapi_backend.dir/csrc/jit/backends/nnapi/nnapi_backend_lib.cpp.o [ 99%] Building C object caffe2/torch/CMakeFiles/_C.dir/csrc/stub.c.o [ 99%] Building CXX object functorch/CMakeFiles/functorch.dir/csrc/dim/dim.cpp.o [ 99%] Linking C shared library ../../lib/_C.so Warning: Unused direct dependencies: /lib64/libstdc++.so.6 libtorch_python.so.2.4 [ 99%] Built target _C [ 99%] Building C object functorch/CMakeFiles/functorch.dir/csrc/dim/dim_opcode.c.o [ 99%] Building CXX object functorch/CMakeFiles/functorch.dir/csrc/init_dim_only.cpp.o [ 99%] Linking CXX shared module functorch.so [ 99%] Built target functorch [100%] Linking CXX shared library ../../lib/libnnapi_backend.so Warning: Unused direct dependencies: libtorch.so.2.4 libtorch_python.so.2.4 libtorch_cpu.so.2.4 libtorch_cuda.so libc10.so.2.4 [100%] Built target nnapi_backend + popd ~/build/BUILD/pytorch + RPM_EC=0 ++ jobs -p + exit 0 Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.gcu3SM + umask 022 + cd /builddir/build/BUILD + '[' /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64 '!=' / ']' + rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64 ++ dirname 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64 + mkdir -p /builddir/build/BUILDROOT + mkdir /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64 + CFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 ' + export CFLAGS ~/build/BUILD/pytorch/build ~/build/BUILD/pytorch + CXXFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 ' + export CXXFLAGS + FFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules ' + export FFLAGS + FCFLAGS='-O2 -fexceptions -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -w -fpermissive -Wno-sign-compare -Wno-deprecated-declarations -Wno-nonnull -DEIGEN_HAS_CXX11_MATH=1 -I/usr/lib64/gfortran/modules ' + export FCFLAGS + VALAFLAGS=-g + export VALAFLAGS + RUSTFLAGS='-Copt-level=3 -Cdebuginfo=2 -Ccodegen-units=1 -Cstrip=none -Cforce-frame-pointers=yes -Clink-arg=-specs=/usr/lib/rpm/redhat/redhat-package-notes --cap-lints=warn' + export RUSTFLAGS + LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -Wl,-lstdc++' + export LDFLAGS + LT_SYS_LIBRARY_PATH=/usr/lib64: + export LT_SYS_LIBRARY_PATH + CC=gcc + export CC + CXX=g++ + export CXX + cd pytorch + pushd build + export PYTHON_EXECUTABLE=/usr/bin/python3 + PYTHON_EXECUTABLE=/usr/bin/python3 + make install DESTDIR=/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64 [ 0%] Built target clog [ 0%] Built target fp16 [ 1%] Built target pytorch_qnnpack [ 1%] Built target fxdiv [ 1%] Built target psimd [ 71%] Built target microkernels-all [ 71%] Built target microkernels-prod [ 71%] Built target logging [ 71%] Built target hardware-config [ 71%] Built target indirection [ 71%] Built target jit [ 71%] Built target microparams-init [ 71%] Built target normalization [ 71%] Built target packing [ 71%] Built target allocator [ 71%] Built target memory [ 72%] Built target cache [ 72%] Built target microkernel-utils [ 72%] Built target mutex [ 72%] Built target post-operation [ 72%] Built 
target operator-utils [ 72%] Built target operators [ 72%] Built target operator-run [ 73%] Built target subgraph [ 73%] Built target convolution-test-helpers [ 73%] Built target XNNPACK [ 73%] Built target ittnotify [ 73%] Built target fmt [ 74%] Built target c10 [ 74%] Built target c10_cuda [ 74%] Built target Caffe2_PROTO [ 74%] Built target caffe2_protos [ 74%] Built target caffe2_nvrtc [ 74%] Built target ATEN_CPU_FILES_GEN_TARGET [ 90%] Built target torch_cpu [ 90%] Built target ATEN_CUDA_FILES_GEN_TARGET [ 96%] Built target torch_cuda [ 96%] Built target torch [ 96%] Built target torch_cuda_linalg [ 96%] Built target torch_global_deps [ 96%] Built target python_copy_files [ 96%] Built target shm [ 96%] Built target generate-torch-sources [ 96%] Built target torch_python_stubs [ 96%] Built target gen_torch_version [ 98%] Built target torch_python [ 98%] Built target _C [ 99%] Built target nnapi_backend [100%] Built target torch_shm_manager [100%] Built target functorch Install the project... -- Install configuration: "Release" + mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64 + find /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/ -name '*.a' -type f -prune -exec rm -rf '{}' + + rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/python3.12 + mv -f /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libc10.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libc10.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libc10.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libc10_cuda.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libcaffe2_nvrtc.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libshm.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libshm.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libshm.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_cpu.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_cpu.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_cpu.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_cuda.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_cuda_linalg.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_global_deps.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_global_deps.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_global_deps.so.2.4.0 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_python.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_python.so.2.4 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib/libtorch_python.so.2.4.0 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/ + popd ~/build/BUILD/pytorch + install -D -pm 755 build/lib/libnnapi_backend.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/ + mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/torch/bin + install -D -pm 644 build/lib/_C.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/torch/ + install -D -pm 644 build/functorch/functorch.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/functorch/_C.so + install -D -pm 644 aten/src/THC/THCDeviceUtils.cuh /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/include/THC/ + ln -sf /usr/include /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/torch/include + ln -sf /usr/lib64 /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/torch/lib + ln -sf /usr/bin/torch_shm_manager /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/torch/bin/torch_shm_manager ++ find ./torch/ -name '*.py' + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/xpu/streams.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/xpu/streams.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/xpu/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/xpu/random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/xpu/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/xpu/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/xpu/_gpu_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/xpu/_gpu_trace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/xpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/xpu/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/weak.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/weak.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/viz/_cycles.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/viz/_cycles.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/viz/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/viz/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/throughput_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/throughput_benchmark.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/writer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/writer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/summary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/summary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_pytorch_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_pytorch_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_proto_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_proto_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_onnx_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_onnx_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_embedding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_embedding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_convert_np.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_convert_np.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/_caffe2_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/_caffe2_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/tensorboard/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/tensorboard/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/show_pickle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/show_pickle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/model_zoo.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/model_zoo.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/model_dump/__main__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/model_dump/__main__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/model_dump/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/model_dump/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/mobile_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/mobile_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/mkldnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/mkldnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/jit/log_extract.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/jit/log_extract.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/jit/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hipify/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hipify/hipify_python.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/hipify_python.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hipify/cuda_to_hip_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/cuda_to_hip_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hipify/constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/constants.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/hipify/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/hipify/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/flop_counter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/flop_counter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/file_baton.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/file_baton.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/dlpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/dlpack.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/deterministic.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/deterministic.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/sampler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/sampler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/graph_settings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/graph_settings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/dataset.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/dataset.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/utils/snapshot.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/snapshot.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/utils/decoder.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/decoder.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/utils/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/grouping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/grouping.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/combining.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/combining.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/combinatorics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/combinatorics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/callable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/callable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/map/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/map/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/streamreader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/streamreader.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/sharding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/sharding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/selecting.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/selecting.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/routeddecoder.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/routeddecoder.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/grouping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/grouping.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/fileopener.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/fileopener.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/filelister.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/filelister.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/combining.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/combining.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/utils/data/datapipes/iter/combinatorics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/combinatorics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/callable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/callable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/iter/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/iter/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/gen_pyi.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/gen_pyi.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/datapipe.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/datapipe.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/dataframe/structures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/structures.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/dataframe/datapipes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/datapipes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/dataframe/dataframes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/dataframes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/dataframe/dataframe_wrapper.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/dataframe_wrapper.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/dataframe/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/dataframe/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/_typing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/_typing.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/_hook_iterator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/_hook_iterator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/datapipes/_decorator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/_decorator.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/utils/data/datapipes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/datapipes/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/dataloader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/dataloader.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/backward_compatibility.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/backward_compatibility.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/worker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/worker.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/signal_handling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/signal_handling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/pin_memory.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/pin_memory.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/fetch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/fetch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/collate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/collate.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/_utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/_utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/data/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/data/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/cpp_extension.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/cpp_extension.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/cpp_backtrace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/cpp_backtrace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/collect_env.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/collect_env.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/checkpoint.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/checkpoint.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/utils/bundled_inputs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/bundled_inputs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/bottleneck/__main__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/bottleneck/__main__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/bottleneck/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/bottleneck/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/valgrind_wrapper/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/valgrind_wrapper/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/timer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/timer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/sparse_fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/sparse_fuzzer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/fuzzer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/cpp_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/cpp_jit.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/compile.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/compile.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/compare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/compare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/_stubs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/_stubs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/utils/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/unary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/unary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/spectral.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/spectral.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/sparse_unary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/sparse_unary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/sparse_binary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/sparse_binary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/binary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/binary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/op_fuzzers/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/op_fuzzers/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/spectral_ops_fuzz_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/spectral_ops_fuzz_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/sparse/op_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/sparse/op_benchmark.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/sparse/fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/sparse/fuzzer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/sparse/compare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/sparse/compare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/simple_timeit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/simple_timeit.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/op_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/op_benchmark.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/utils/benchmark/examples/fuzzer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/fuzzer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/compare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/compare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/blas_compare_setup.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/blas_compare_setup.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/examples/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/examples/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/benchmark/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/benchmark/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/backend_registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/backend_registration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/backcompat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/backcompat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_zip.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_zip.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_typing_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_typing_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_triton.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_triton.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_traceback.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_traceback.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/value_ranges.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/value_ranges.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/solve.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/solve.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/singleton_int.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/singleton_int.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/utils/_sympy/reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/reference.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/interp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/interp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_sympy/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_sympy/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_stats.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_stats.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_pytree.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_pytree.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_python_dispatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_python_dispatch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_mode_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_mode_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_import_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_import_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_freeze.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_freeze.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_foreach_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_foreach_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_exposed_in.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_exposed_in.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_device.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_device.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_cxx_pytree.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_cxx_pytree.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_cpp_extension_versioner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_cpp_extension_versioner.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_contextlib.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_contextlib.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_content_store.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_content_store.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/_config_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/_config_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/torch_version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/torch_version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/two_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/two_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/triton_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/triton_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/torchbind_impls.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/torchbind_impls.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/test_module/no_future_div.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/test_module/no_future_div.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/test_module/future_div.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/test_module/future_div.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/test_module/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/test_module/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/static_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/static_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/quantization_torch_package_models.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/quantization_torch_package_models.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/make_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/make_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/generate_tests.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/generate_tests.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/fake_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/fake_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/autograd_registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/autograd_registration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/aot_autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/aot_autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/optests/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/optests/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/refs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/refs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/special.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/special.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/signal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/signal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/linalg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/linalg.py + for f in 
`find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/fft.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/fft.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/_masked.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/_masked.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/definitions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/definitions/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/core.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/opinfo/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/opinfo/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/logging_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/logging_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/logging_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/logging_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/jit_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/jit_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/jit_metaprogramming_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/jit_metaprogramming_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/inductor_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/inductor_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/hypothesis_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/hypothesis_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/hop_db.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/hop_db.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/generated/annotated_fn_args.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/generated/annotated_fn_args.py + for f 
in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/generated/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/generated/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/dynamo_test_failures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/dynamo_test_failures.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/tensorpipe_rpc_agent_test_fixture.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/tensorpipe_rpc_agent_test_fixture.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/rpc_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/rpc_agent_test_fixture.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/rpc_agent_test_fixture.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/rpc_test_faulty.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/rpc_test_faulty.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/rpc_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/dist_autograd_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/dist_autograd_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/jit/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/faulty_rpc_agent_test_fixture.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/faulty_rpc_agent_test_fixture.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/faulty_agent_rpc_test.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/faulty_agent_rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/examples/reinforcement_learning_rpc_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/examples/reinforcement_learning_rpc_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/examples/parameter_server_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/examples/parameter_server_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/examples/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/examples/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/dist_optimizer_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/dist_optimizer_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/dist_autograd_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/dist_autograd_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/rpc/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/rpc/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/pipeline/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/pipeline/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/pipe_with_ddp_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/pipe_with_ddp_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/nn/api/remote_module_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/nn/api/remote_module_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/nn/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/nn/api/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/nn/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/multi_threaded_pg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/multi_threaded_pg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/fake_pg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/fake_pg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/distributed_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/distributed_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/distributed_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/distributed_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/ddp_under_dist_autograd_test.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/ddp_under_dist_autograd_test.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/common_state_dict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/common_state_dict.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/checkpoint_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/checkpoint_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_tensor/common_dtensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_tensor/common_dtensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/test_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/test_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_st_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_st_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_ops_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/sharded_tensor/_test_ops_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/sharded_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/sharded_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/_shard/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/_shard/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/distributed/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/distributed/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/dist_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/dist_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/data/network2.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/data/network2.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/data/network1.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/data/network1.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/data/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/data/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/custom_op_db.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/custom_op_db.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/composite_compliance.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/composite_compliance.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_subclass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_subclass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_quantized.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_quantized.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_quantization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_quantization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_pruning.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_pruning.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_optimizers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_optimizers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_nn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_nn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_mkldnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_mkldnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_methods_invocations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_methods_invocations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_jit.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_fsdp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_fsdp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_dtype.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_dtype.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_dist_composable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_dist_composable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_device_type.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_device_type.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/common_cuda.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/common_cuda.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/codegen/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/codegen/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/check_kernel_launches.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/check_kernel_launches.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/autograd_function_db.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/autograd_function_db.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/autocast_test_lists.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/autocast_test_lists.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_internal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_internal/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_creation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_creation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/_comparison.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/_comparison.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/testing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/testing/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/storage.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/storage.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/special/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/special/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/semi_structured.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/sparse/semi_structured.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/_triton_ops_meta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/sparse/_triton_ops_meta.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/_triton_ops.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/sparse/_triton_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/_semi_structured_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/sparse/_semi_structured_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/_semi_structured_conversions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/sparse/_semi_structured_conversions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/sparse/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/sparse/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/signal/windows/windows.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/signal/windows/windows.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/signal/windows/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/signal/windows/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/signal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/signal/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/serialization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/serialization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/return_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/return_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quasirandom.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quasirandom.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/stubs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/stubs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/quantize_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/quantize_jit.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/quantize_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/quantize_fx.py + for f in `find ./torch/ -name 
'*.py'` + install -D -pm 644 ./torch/quantization/quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/quantize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/quantization_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/quantization_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/quant_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/quant_type.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/qconfig.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/qconfig.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/observer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/quantization_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/quantization_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/quantization_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/quantization_patterns.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/prepare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/pattern_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/pattern_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/match_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/match_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/graph_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/fusion_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/fusion_patterns.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/fuse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/fuse.py + for 
f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/convert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/convert.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/_equalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/_equalize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fuser_method_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fuser_method_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fuse_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fuse_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/fake_quantize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/_quantized_conversions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/_quantized_conversions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/_numeric_suite_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/_numeric_suite_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/_numeric_suite.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/_numeric_suite.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/quantization/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/quantization/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/python_tracer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/profiler/python_tracer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/profiler/profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/itt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/profiler/itt.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/profiler/_utils.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/profiler/_pattern_matcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/profiler/_pattern_matcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/_memory_profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/profiler/_memory_profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/profiler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/profiler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/package_importer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/package_importer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/package_exporter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/package_exporter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/importer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/importer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/glob_group.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/glob_group.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/find_file_dependencies.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/find_file_dependencies.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/file_structure_representation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/file_structure_representation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/analyze/trace_dependencies.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/analyze/trace_dependencies.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/analyze/is_from_package.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/analyze/is_from_package.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/analyze/find_first_use_of_broken_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/analyze/find_first_use_of_broken_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/analyze/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/analyze/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_stdlib.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/_stdlib.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_package_unpickler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/_package_unpickler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_package_pickler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/_package_pickler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_mock.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/_mock.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_mangling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/_mangling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_importlib.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/_importlib.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_directory_reader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/_directory_reader.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/_digraph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/_digraph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/package/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/package/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/overrides.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/overrides.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/swa_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/swa_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/sparse_adam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/sparse_adam.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/sgd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/sgd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/rprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/rprop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/rmsprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/rmsprop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/radam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/radam.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/optim/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/nadam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/nadam.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/lr_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/lr_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/lbfgs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/lbfgs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/asgd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/asgd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/adamw.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/adamw.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/adamax.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/adamax.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/adam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/adam.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/adagrad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/adagrad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/adadelta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/adadelta.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/_multi_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/_multi_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/_functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/_functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/optim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/optim/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/verification.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/verification.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset9.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset9.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset8.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset8.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset7.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset7.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset20.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset20.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset19.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset19.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset18.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset18.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset17.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset17.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset16.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset16.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset15.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset15.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset14.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset14.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset13.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset13.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset12.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset12.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset11.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset11.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_opset10.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_opset10.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_helper.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_helper.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/symbolic_caffe2.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/symbolic_caffe2.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/operators.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/operators.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/errors.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/errors.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_type_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_type_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_onnx_supported_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_onnx_supported_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/registration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/registration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/onnxruntime.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/onnxruntime.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/onnx_proto_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/onnx_proto_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/jit_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/jit_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/io_adapter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/io_adapter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/type_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/type_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/torch_export_graph_extractor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/torch_export_graph_extractor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/serialization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/serialization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/registration.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/registration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/patcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/patcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/virtualization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/virtualization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/type_promotion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/type_promotion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/readability.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/readability.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/modularization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/modularization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/functionalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/functionalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/decomp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/decomp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/passes/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/op_validation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/op_validation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/onnxfunction_dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/onnxfunction_dispatcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/fx_symbolic_graph_extractor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/fx_symbolic_graph_extractor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/onnx/_internal/fx/fx_onnx_interpreter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/fx_onnx_interpreter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/dynamo_graph_extractor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/dynamo_graph_extractor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/diagnostics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/diagnostics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/decomposition_table.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/decomposition_table.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/decomposition_skip.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/decomposition_skip.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/analysis/unsupported_nodes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/analysis/unsupported_nodes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/analysis/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/analysis/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/fx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/exporter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/exporter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_web_response.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_web_response.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_web_request.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_web_request.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_version_control_details.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_version_control_details.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_translation_metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_translation_metadata.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component_reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component_reference.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_tool_component.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_tool.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_tool.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_suppression.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_suppression.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_stack_frame.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_stack_frame.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_stack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_stack.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_special_locations.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_special_locations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_sarif_log.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_sarif_log.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_run_automation_details.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_run_automation_details.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_run.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_run.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_result_provenance.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_result_provenance.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_result.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_result.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_relationship.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_relationship.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_reference.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_configuration.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_reporting_configuration.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_replacement.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_replacement.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_region.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_region.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_rectangle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_rectangle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_property_bag.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_property_bag.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_physical_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_physical_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_notification.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_notification.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_node.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_node.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_multiformat_message_string.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_multiformat_message_string.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_message.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_message.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_logical_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_logical_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_location_relationship.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_location_relationship.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_invocation.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_invocation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_graph_traversal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_graph_traversal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_fix.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_fix.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_references.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_references.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_reference.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_external_properties.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_external_properties.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_exception.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_exception.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_edge_traversal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_edge_traversal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_edge.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_edge.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_conversion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_conversion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_configuration_override.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_configuration_override.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_code_flow.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_code_flow.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_attachment.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_attachment.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_location.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_location.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_content.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_content.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_change.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact_change.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_artifact.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_artifact.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/_address.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/_address.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/sarif/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/sarif/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/formatter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/formatter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/decorator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/decorator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/context.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/onnx/_internal/diagnostics/infra/_infra.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/_infra.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/infra/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/infra/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/_rules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/_rules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/_diagnostic.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/_diagnostic.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/diagnostics/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/diagnostics/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/_beartype.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/_beartype.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_internal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_internal/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_globals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_globals.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_exporter_states.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_exporter_states.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_experimental.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_experimental.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_deprecation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_deprecation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/_constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/_constants.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/onnx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/onnx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/weight_norm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/weight_norm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/nn/utils/stateless.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/stateless.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/spectral_norm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/spectral_norm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/prune.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/prune.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/parametrize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/parametrize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/parametrizations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/parametrizations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/memory_format.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/memory_format.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/init.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/fusion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/convert_parameters.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/convert_parameters.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/clip_grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/clip_grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_per_sample_grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_per_sample_grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_named_member_accessor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_named_member_accessor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/linear_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/linear_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/layer_norm_expanded_weights.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/layer_norm_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/instance_norm_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/instance_norm_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/group_norm_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/group_norm_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/expanded_weights_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/expanded_weights_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/expanded_weights_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/expanded_weights_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/embedding_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/embedding_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/conv_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/conv_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/conv_expanded_weights.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/conv_expanded_weights.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_expanded_weights/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_expanded_weights/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/_deprecation_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/_deprecation_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/rnn.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/normalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/normalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/functional_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/functional_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/dropout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/dropout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/conv.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/_reference/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/_reference/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/modules/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/quantizable/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/quantizable/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/qat/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/qat/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/qat/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/qat/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/qat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parameter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/parameter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/scatter_gather.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/scatter_gather.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/replicate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/replicate.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/parallel_apply.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/parallel_apply.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/distributed.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/data_parallel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/data_parallel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/comm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/comm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/parallel/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/parallel/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/upsampling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/upsampling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/transformer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/transformer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/pooling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/pooling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/pixelshuffle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/pixelshuffle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/padding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/padding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/normalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/normalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/module.py + for f in 
`find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/loss.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/loss.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/lazy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/lazy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/instancenorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/instancenorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/fold.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/fold.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/flatten.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/flatten.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/dropout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/dropout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/distance.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/distance.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/container.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/container.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/channelshuffle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/channelshuffle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/adaptive.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/adaptive.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/_functions.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/conv_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/conv_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/bn_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/bn_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/dynamic/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/dynamic/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/qat/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/qat/modules/linear_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/linear_fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/qat/modules/conv_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/conv_fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/nn/intrinsic/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/qat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/modules/fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/modules/fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/intrinsic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/intrinsic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/init.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/cpp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/common_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/common_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/backends/thnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/backends/thnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/backends/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/attention/bias.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/attention/bias.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/attention/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/attention/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/attention/_templated_attention.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/attention/_templated_attention.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/attention/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/attention/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/_reduction.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/_reduction.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nested/_internal/sdpa.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/sdpa.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nested/_internal/ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nested/_internal/nested_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/nested_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nested/_internal/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nested/_internal/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/nested/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/nested/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/spawn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/spawn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/reductions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/reductions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/queue.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/queue.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/pool.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/pool.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/_atfork.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/_atfork.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/multiprocessing/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/multiprocessing/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/mps/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/mps/profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/mps/event.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/mps/event.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/mps/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/mps/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/monitor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/monitor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/unary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/unary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/reductions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/reductions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/passthrough.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/passthrough.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/creation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/creation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/core.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/binary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/binary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/_ops_refs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/_ops_refs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/maskedtensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/maskedtensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/_docs.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/_docs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/masked/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/masked/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/linalg/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/linalg/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/library.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/library.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/unsupported_tensor_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/unsupported_tensor_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/supported_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/supported_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/quantized.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/quantized.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/mobile/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/mobile/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/generate_bytecode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/generate_bytecode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/frontend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/frontend.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/annotations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/annotations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_trace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_state.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_shape_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_shape_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_serialization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_serialization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_script.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_script.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_recursive.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_recursive.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_pickle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_pickle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_passes/_property_propagation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_passes/_property_propagation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_passes/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_monkeytype_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_monkeytype_config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_logging.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_ir_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_ir_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_fuser.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_fuser.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_freeze.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_freeze.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_decompositions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_decompositions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_decomposition_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_decomposition_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_dataclass_impls.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_dataclass_impls.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_check.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_check.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_builtins.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_builtins.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_await.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_await.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/_async.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/_async.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/jit/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/hub.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/hub.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/traceback.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/traceback.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/tensor_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/tensor_type.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/subgraph_rewriter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/subgraph_rewriter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/proxy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/proxy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/utils/source_matcher_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/source_matcher_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/utils/matcher_with_name_node_map_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/matcher_with_name_node_map_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/utils/matcher_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/matcher_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/utils/fuser_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/fuser_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/utils/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/tools_common.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/tools_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/tests/test_pass_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/tests/test_pass_manager.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/tests/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/tests/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/splitter_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/splitter_base.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/split_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/split_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/split_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/split_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/shape_prop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/shape_prop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/reinplace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/reinplace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/pass_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/pass_manager.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/param_fetch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/param_fetch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/operator_support.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/operator_support.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/net_min_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/net_min_base.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/infra/pass_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/pass_manager.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/infra/pass_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/pass_base.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/infra/partitioner.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/partitioner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/infra/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/infra/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/graph_manipulation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/graph_manipulation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/graph_drawer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/graph_drawer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/fake_tensor_prop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/fake_tensor_prop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/dialect/common/cse_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/dialect/common/cse_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/dialect/common/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/dialect/common/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/dialect/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/dialect/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/backends/cudagraphs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/backends/cudagraphs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/backends/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/annotate_getitem_nodes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/annotate_getitem_nodes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/passes/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/operator_schemas.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/operator_schemas.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/node.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/node.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/interpreter.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/interpreter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/immutable_collections.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/immutable_collections.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/graph_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/validator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/validator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unify_refinements.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unify_refinements.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/variable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/variable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/unification_tools.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/unification_tools.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/variadic.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/variadic.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/dispatcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/core.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/fx/experimental/unification/multipledispatch/conflict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/conflict.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/multipledispatch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/multipledispatch/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/more.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/more.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/match.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/match.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/dispatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/dispatch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/core.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/core.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/unification/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/unification/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/symbolic_shapes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/symbolic_shapes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/sym_node.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/sym_node.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/shape_inference/infer_symbol_values.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/shape_inference/infer_symbol_values.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/shape_inference/infer_shape.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/shape_inference/infer_shape.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/schema_type_annotation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/schema_type_annotation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/rewriter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/rewriter.py + 
for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/refinement_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/refinement_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/recording.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/recording.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/proxy_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/proxy_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/partitioner_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/partitioner_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/optimization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/optimization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/normalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/normalize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/z3_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/z3_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/util.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/transform_to_z3.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/transform_to_z3.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/operation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/operation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/constraint_transformation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/constraint_transformation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/constraint_generator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/constraint_generator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/constraint.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/constraint.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/migrate_gradual_types/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/migrate_gradual_types/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/meta_tracer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/meta_tracer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/merge_matmul.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/merge_matmul.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/graph_gradual_typechecker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/graph_gradual_typechecker.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/debug.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/debug.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/const_fold.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/const_fold.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/accelerator_partitioner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/accelerator_partitioner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/_sym_dispatch_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/_sym_dispatch_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/_config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/_backward_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/_backward_state.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/experimental/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/annotate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/annotate.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/_symbolic_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/_symbolic_trace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/_pytree.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/_pytree.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/_lazy_graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/_lazy_graph_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/_compatibility.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/_compatibility.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/futures/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/futures/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/func/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/func/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/fft/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/fft/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/unflatten.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/unflatten.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/graph_signature.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/graph_signature.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/exported_program.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/exported_program.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/dynamic_shapes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/dynamic_shapes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/custom_obj.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/custom_obj.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/_unlift.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/_unlift.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/export/_tree_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/_tree_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/_trace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/_safeguard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/_safeguard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/_remove_effect_tokens_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/_remove_effect_tokens_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/_remove_auto_functionalized_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/_remove_auto_functionalized_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/export/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/export/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/wishart.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/wishart.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/weibull.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/weibull.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/von_mises.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/von_mises.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/uniform.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/uniform.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/transforms.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/transformed_distribution.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/transformed_distribution.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/studentT.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/studentT.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/relaxed_categorical.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/relaxed_categorical.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/relaxed_bernoulli.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/relaxed_bernoulli.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/poisson.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/poisson.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/pareto.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/pareto.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/one_hot_categorical.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/one_hot_categorical.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/negative_binomial.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/negative_binomial.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/multivariate_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/multivariate_normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/multinomial.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/multinomial.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/mixture_same_family.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/mixture_same_family.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/lowrank_multivariate_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/lowrank_multivariate_normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/logistic_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/logistic_normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/log_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/log_normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/lkj_cholesky.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/lkj_cholesky.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/laplace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/laplace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/kumaraswamy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/kumaraswamy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/kl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/kl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/inverse_gamma.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/inverse_gamma.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/independent.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/independent.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/half_normal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/half_normal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/half_cauchy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/half_cauchy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/gumbel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/gumbel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/geometric.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/geometric.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/gamma.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/gamma.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/fishersnedecor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/fishersnedecor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/exponential.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/exponential.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/exp_family.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/exp_family.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/distribution.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/distribution.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributions/dirichlet.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/dirichlet.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/continuous_bernoulli.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/continuous_bernoulli.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/constraints.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/constraints.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/constraint_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/constraint_registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/chi2.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/chi2.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/cauchy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/cauchy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/categorical.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/categorical.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/binomial.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/binomial.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/beta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/beta.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/bernoulli.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/bernoulli.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributions/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/style.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/style.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/loss.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/loss.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/tensor/parallel/input_reshard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/input_reshard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/fsdp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/fsdp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/ddp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/ddp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/_data_parallel_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/_data_parallel_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/parallel/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/parallel/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/run.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/run.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/server_process_global_profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/server_process_global_profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/rref_proxy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/rref_proxy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/options.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/options.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/internal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/functions.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/constants.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/backend_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/backend_registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/_testing/faulty_agent_backend_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/_testing/faulty_agent_backend_registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/_testing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/_testing/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rpc/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rpc/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/remote_device.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/remote_device.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/worker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/worker.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/stream.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/stream.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/tracker.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/tracker.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/skippable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/skippable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/portal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/portal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/namespace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/namespace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/layout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/layout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/skip/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/skip/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/pipeline.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/pipeline.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/pipe.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/pipe.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/phony.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/phony.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/microbatch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/microbatch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/dependency.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/dependency.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/copy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/copy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/checkpoint.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/checkpoint.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/pipeline/sync/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/_balance/profile.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/_balance/profile.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/_balance/blockpartition.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/_balance/blockpartition.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/_balance/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/_balance/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/sync/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/sync/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/pipeline/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/pipeline/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/zero_redundancy_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/zero_redundancy_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/post_localSGD_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/post_localSGD_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/named_optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/named_optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_sgd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_sgd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_rprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_rprop.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/distributed/optim/functional_rmsprop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_rmsprop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adamw.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adamw.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adamax.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adamax.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adam.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adam.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adagrad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adagrad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/functional_adadelta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/functional_adadelta.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/apply_optimizer_in_backward.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/apply_optimizer_in_backward.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/optim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/optim/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/templates/remote_module_template.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/templates/remote_module_template.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/templates/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/templates/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/instantiator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/instantiator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/jit/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/jit/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/functional.py + for f in `find ./torch/ -name '*.py'` + install 
-D -pm 644 ./torch/distributed/nn/api/remote_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/api/remote_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/api/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/logging_handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/logging_handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/launcher/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/launcher/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/launcher/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/launcher/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/launch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/launch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/wrap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/wrap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/sharded_grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/sharded_grad_scaler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/fully_sharded_data_parallel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/fully_sharded_data_parallel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_wrap_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_wrap_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_unshard_param_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_unshard_param_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_traversal_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_traversal_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_trace_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_trace_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_state_dict_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_state_dict_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_shard_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_shard_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_runtime_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_runtime_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_optim_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_optim_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_limiter_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_limiter_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_init_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_init_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_fsdp_extensions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_fsdp_extensions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_flat_param.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_flat_param.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_exec_order_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_exec_order_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_dynamo_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_dynamo_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_debug_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_debug_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/_common_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/_common_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/fsdp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/fsdp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/examples/memory_tracker_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/examples/memory_tracker_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/store.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/store.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/logging.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/log_level.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/log_level.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/data/elastic_distributed_sampler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/data/elastic_distributed_sampler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/data/cycling_iterator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/data/cycling_iterator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/data/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/data/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/utils/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/utils/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/local_timer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/local_timer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/elastic/timer/file_based_local_timer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/file_based_local_timer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/timer/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/timer/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/static_tcp_rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_store.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_store.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_server.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_server.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_rendezvous_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_rendezvous_backend.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/etcd_rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/etcd_rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/dynamic_rendezvous.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/dynamic_rendezvous.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/c10d_rendezvous_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/c10d_rendezvous_backend.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/api.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/rendezvous/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/rendezvous/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/tail_log.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/tail_log.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/subprocess_handler/subprocess_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/subprocess_handler/subprocess_handler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/subprocess_handler/handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/subprocess_handler/handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/subprocess_handler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/subprocess_handler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/redirects.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/redirects.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/errors/handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/errors/handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/errors/error_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/errors/error_handler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/errors/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/errors/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/multiprocessing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/multiprocessing/__init__.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/metrics/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/metrics/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/metrics/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/metrics/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/events/handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/events/handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/events/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/events/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/events/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/events/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/local_elastic_agent.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/local_elastic_agent.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/health_check_server.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/health_check_server.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/server/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/server/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/agent/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/agent/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/elastic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/elastic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/distributed_c10d.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/distributed_c10d.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/device_mesh.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/device_mesh.py + for f in 
`find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/constants.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/constants.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/collective_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/collective_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/storage.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/storage.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/stateful.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/stateful.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/state_dict_saver.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/state_dict_saver.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/state_dict_loader.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/state_dict_loader.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/state_dict.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/state_dict.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/resharding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/resharding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/planner_helpers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/planner_helpers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/planner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/planner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/metadata.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/distributed/checkpoint/logging_handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/logging_handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/logger.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/logger.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/format_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/format_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/filesystem.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/filesystem.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/examples/stateful_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/examples/stateful_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/examples/fsdp_checkpoint_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/examples/fsdp_checkpoint_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/examples/async_checkpointing_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/examples/async_checkpointing_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/default_planner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/default_planner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_traverse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_traverse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_storage_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_storage_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_sharded_tensor_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_sharded_tensor_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_nested_dict.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_nested_dict.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_fsspec_filesystem.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_fsspec_filesystem.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_dedup_tensors.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_dedup_tensors.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_dedup_save_plans.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_dedup_save_plans.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/_checkpointer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/_checkpointer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/checkpoint/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/checkpoint/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/c10d_logger.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/c10d_logger.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/benchmarks/benchmark_ddp_rpc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/benchmarks/benchmark_ddp_rpc.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/autograd/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/argparse_util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/argparse_util.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/averagers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/averagers.py + for f in `find ./torch/ -name 
'*.py'` + install -D -pm 644 ./torch/distributed/algorithms/model_averaging/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/model_averaging/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/join.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/join.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/optimizer_overlap_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/optimizer_overlap_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/mixed_precision_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/mixed_precision_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/ddp_zero_hook.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/ddp_zero_hook.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/ddp_comm_hooks/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/ddp_comm_hooks/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_quantization/quantization.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_quantization/quantization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_quantization/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_quantization/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_optimizer_overlap/optimizer_overlap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_optimizer_overlap/optimizer_overlap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_optimizer_overlap/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_optimizer_overlap/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_comm_hooks/default_hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_comm_hooks/default_hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_comm_hooks/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_comm_hooks/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/_checkpoint/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/_checkpoint/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/algorithms/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/algorithms/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tools/memory_tracker.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tools/memory_tracker.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tools/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tools/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/tp_conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/tp_conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/sharding_prop.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/sharding_prop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/redistribute.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/redistribute.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/placement_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/placement_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/view_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/view_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/tensor_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/tensor_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/random_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/random_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/pointwise_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/pointwise_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/matrix_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/matrix_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/math_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/math_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/experimental_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/experimental_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/conv_ops.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/conv_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/common_rules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/common_rules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/basic_strategy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/basic_strategy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/ops/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/op_schema.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/op_schema.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/experimental/tp_transform.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/experimental/tp_transform.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/experimental/attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/experimental/attention.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/experimental/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/examples/visualize_sharding_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/visualize_sharding_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/examples/torchrec_sharding_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/torchrec_sharding_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/examples/convnext_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/convnext_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/examples/checkpoint_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/examples/checkpoint_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/dispatch.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/dispatch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/device_mesh.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/device_mesh.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/debug/visualize_sharding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/visualize_sharding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/debug/op_coverage.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/op_coverage.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/debug/comm_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/comm_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/debug/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/debug/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/_collective_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/_collective_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_state_dict_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_state_dict_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/partial_lower.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/partial_lower.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/parallel_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/parallel_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/log_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/log_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/iter_graph_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/iter_graph_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/graph_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/graph_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/graph_optimization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/graph_optimization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/gm_transformation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/gm_transformation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/experimental_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/experimental_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/distribute.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/distribute.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/data_parallel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/data_parallel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/comm_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/comm_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/batch_dim_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/batch_dim_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_spmd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_spmd/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_sharding_spec/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_sharding_spec/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_sharded_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_sharded_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding_bag.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding_bag.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/embedding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec_ops/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/chunk_sharding_spec.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/chunk_sharding_spec.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/_internals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/_internals.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_spec/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_spec/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_plan/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_plan/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharding_plan/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharding_plan/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 
644 ./torch/distributed/_shard/sharder.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharder.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/shard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/shard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/reshard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/reshard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/metadata.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/logging_handlers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/logging_handlers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/logger.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/logger.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/tensor_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/tensor_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/misc_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/misc_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/init.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/binary_cmp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/binary_cmp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/_common.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/_ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/_ops/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_tensor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_tensor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_optim/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_optim/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/sharded_optim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/sharded_optim/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/op_registry_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/op_registry_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/metadata.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/common_op_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/common_op_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/checkpoint/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/checkpoint/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_shard/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_shard/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_functional_collectives_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_functional_collectives_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_functional_collectives.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_functional_collectives.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable_state.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/replicate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/replicate.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fully_shard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fully_shard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/fully_shard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/fully_shard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_state.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_state.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_param_group.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_param_group.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_param.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_param.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_init.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_init.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_collectives.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_collectives.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/_fsdp_api.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/_fsdp_api.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/fsdp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/fsdp/__init__.py + for f in `find ./torch/ 
-name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/contract.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/contract.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/checkpoint_activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/checkpoint_activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/_composable/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/_composable/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/distributed/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/distributed/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/streams.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/streams.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/sparse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/nvtx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/nvtx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/nccl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/nccl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/memory.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/memory.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/jiterator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/jiterator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/graphs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/graphs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/error.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/error.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/comm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/comm.py + for f in `find ./torch/ -name 
'*.py'` + install -D -pm 644 ./torch/cuda/amp/grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/grad_scaler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/amp/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/amp/autocast_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/autocast_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/amp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/amp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/_sanitizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/_sanitizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/_memory_viz.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/_memory_viz.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/_gpu_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/_gpu_trace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cuda/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cuda/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/csrc/lazy/test_mnist.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/csrc/lazy/test_mnist.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/csrc/jit/tensorexpr/scripts/bisect.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/csrc/jit/tensorexpr/scripts/bisect.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/csrc/jit/tensorexpr/codegen_external.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/csrc/jit/tensorexpr/codegen_external.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cpu/amp/grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cpu/amp/grad_scaler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cpu/amp/autocast_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cpu/amp/autocast_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cpu/amp/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cpu/amp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/cpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/cpu/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/contrib/_tensorboard_vis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/contrib/_tensorboard_vis.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/contrib/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/contrib/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/compiler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/compiler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/xnnpack/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/xnnpack/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/xeon/run_cpu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/xeon/run_cpu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/xeon/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/xeon/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/opt_einsum/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/opt_einsum/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/openmp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/openmp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/nnpack/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/nnpack/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/mps/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/mps/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/mkldnn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/mkldnn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/mkl/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/mkl/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/mha/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/mha/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/cudnn/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/cudnn/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/cudnn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/cudnn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/cuda/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/cuda/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/cpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/cpu/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/_nnapi/serializer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/_nnapi/serializer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/_nnapi/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/_nnapi/prepare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/_nnapi/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/_nnapi/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/_coreml/preprocess.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/_coreml/preprocess.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/_coreml/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/_coreml/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/backends/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/variable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/variable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/profiler_util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/profiler_util.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/profiler_legacy.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/profiler_legacy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/gradcheck.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/gradcheck.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/grad_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/grad_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/function.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/forward_ad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/forward_ad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/anomaly_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/anomaly_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/_functions/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/_functions/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/_functions/tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/_functions/tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/_functions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/_functions/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/autograd/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/stubs.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/stubs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantizer/xnnpack_quantizer_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/xnnpack_quantizer_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantizer/xnnpack_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/xnnpack_quantizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantizer/x86_inductor_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/x86_inductor_quantizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantizer/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantizer/quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/quantizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantizer/embedding_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/embedding_quantizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantizer/composable_quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/composable_quantizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantizer/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantizer/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantize_pt2e.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize_pt2e.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantize_jit.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize_jit.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantize_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/ao/quantization/quantization_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quantization_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/quant_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/quant_type.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/qconfig_mapping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/qconfig_mapping.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/qconfig.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/qconfig.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/pt2e/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/pt2e/representation/rewrite.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/representation/rewrite.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/pt2e/representation/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/representation/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/pt2e/qat_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/qat_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/pt2e/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/prepare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/pt2e/port_metadata_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/port_metadata_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/pt2e/graph_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/graph_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/pt2e/generate_numeric_debug_handle.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/generate_numeric_debug_handle.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/pt2e/export_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/export_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/ao/quantization/pt2e/duplicate_dq_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/duplicate_dq_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/pt2e/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/pt2e/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/observer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/tracer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/tracer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/quantize_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/quantize_handler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/qconfig_mapping_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/qconfig_mapping_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/prepare.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/prepare.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/pattern_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/pattern_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/match_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/match_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/lstm_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/lstm_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/lower_to_qnnpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/lower_to_qnnpack.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/lower_to_fbgemm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/lower_to_fbgemm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/graph_module.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/graph_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/fuse_handler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/fuse_handler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/fuse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/fuse.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/custom_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/custom_config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/convert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/convert.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/_model_report/model_report_visualizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/model_report_visualizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/_model_report/model_report_observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/model_report_observer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/_model_report/model_report.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/model_report.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/_model_report/detector.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/detector.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/_model_report/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_model_report/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/_lower_to_native_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_lower_to_native_backend.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/_equalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_equalize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fx/_decomposed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/_decomposed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 
644 ./torch/ao/quantization/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fuser_method_mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fuser_method_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fuse_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fuse_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/fake_quantize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/experimental/quantizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/quantizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/experimental/qconfig.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/qconfig.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/experimental/observer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/observer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/experimental/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/experimental/fake_quantize_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/fake_quantize_function.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/experimental/fake_quantize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/fake_quantize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/experimental/apot_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/apot_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/experimental/APoT_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/experimental/APoT_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/x86.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/x86.py + for f in `find ./torch/ 
-name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/tensorrt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/tensorrt.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/qnnpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/qnnpack.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/onednn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/onednn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/observation_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/observation_type.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/native.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/native.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/fbgemm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/fbgemm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/executorch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/executorch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/backend_config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/backend_config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/_qnnpack_pt2e.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/_qnnpack_pt2e.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/_common_operator_config_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/_common_operator_config_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/backend_config/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/backend_config/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/_learnable_fake_quantize.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/_learnable_fake_quantize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/_equalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/_equalize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/_correct_bias.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/_correct_bias.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/quantization/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/quantization/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/weight_norm_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/weight_norm_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/nearly_diagonal_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/nearly_diagonal_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/base_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/base_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/sparsifier/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/sparsifier/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/lambda_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/lambda_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/cubic_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/cubic_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/base_scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/base_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/scheduler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/scheduler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_mappings.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/saliency_pruner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/saliency_pruner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/prune_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/prune_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/parametrization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/parametrization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/match_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/match_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/lstm_saliency_pruner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/lstm_saliency_pruner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/base_structured_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/base_structured_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/pruner/FPGM_pruner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/pruner/FPGM_pruner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/quantization_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/quantization_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/tests/test_callbacks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/tests/test_callbacks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/data_sparsity.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/data_sparsity.py + for f 
in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/_data_sparstity_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/_data_sparstity_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/lightning/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/lightning/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/data_norm_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/data_norm_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_model_metrics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_model_metrics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_forward_time.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_forward_time.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_disk_savings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/evaluate_disk_savings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/dlrm_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/benchmarks/dlrm_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_sparsifier/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_sparsifier/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/data_scheduler/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/data_scheduler/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/activation_sparsifier/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/activation_sparsifier/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/_experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/_experimental/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/pruning/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/pruning/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/weight_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/weight_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/qconfig_multi_mapping.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/qconfig_multi_mapping.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/pattern_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/pattern_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/ns_types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/ns_types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/n_shadows_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/n_shadows_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/mappings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/mappings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/graph_passes.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/graph_passes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/graph_matcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/graph_matcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/fx/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/fx/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/_numeric_suite_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/_numeric_suite_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/_numeric_suite.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/_numeric_suite.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/ns/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/ns/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/dynamic/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/dynamic/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/sparse/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/sparse/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/sparse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/sparse.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/reference/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/reference/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/normalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/normalization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/functional_modules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/functional_modules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/dropout.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/dropout.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D 
-pm 644 ./torch/ao/nn/quantized/modules/batchnorm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/batchnorm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/modules/rnn.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/modules/rnn.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/modules/activation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/modules/activation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/quantizable/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/quantizable/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/embedding_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/embedding_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/dynamic/modules/linear.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/dynamic/modules/linear.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/qat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/conv_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/conv_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/conv_add.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/conv_add.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/bn_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/bn_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/modules/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/dynamic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/dynamic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/dynamic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/dynamic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/quantized/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/quantized/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/linear_relu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/linear_relu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/linear_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/linear_fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/conv_fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/conv_fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/qat/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/qat/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/qat/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/modules/fused.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/modules/fused.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/modules/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/modules/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/nn/intrinsic/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/intrinsic/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 
644 ./torch/ao/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/nn/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/ao/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/ao/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/amp/grad_scaler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/amp/grad_scaler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/amp/autocast_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/amp/autocast_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/amp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/amp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_weights_only_unpickler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_weights_only_unpickler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_vmap_internals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_vmap_internals.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_vendor/packaging/version.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_vendor/packaging/version.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_vendor/packaging/_structures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_vendor/packaging/_structures.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_vendor/packaging/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_vendor/packaging/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_vendor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_vendor/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_utils_internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_utils_internal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_torch_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_torch_docs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_tensor_str.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_tensor_str.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/_tensor_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_tensor_docs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/schema_check_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_subclasses/schema_check_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/meta_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_subclasses/meta_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/functional_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_subclasses/functional_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/fake_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_subclasses/fake_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/fake_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_subclasses/fake_tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/fake_impls.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_subclasses/fake_impls.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_subclasses/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_subclasses/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_streambase.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_streambase.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_storage_docs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_storage_docs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_sources.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_sources.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/special/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_refs/special/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/nn/functional/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_refs/nn/functional/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/nn/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_refs/nn/__init__.py 
+ for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/linalg/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_refs/linalg/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/fft.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_refs/fft.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/_conversions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_refs/_conversions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_refs/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_refs/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_python_dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_python_dispatcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims_common/wrappers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_prims_common/wrappers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims_common/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_prims_common/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims/rng_prims.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_prims/rng_prims.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims/executor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_prims/executor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims/debug_prims.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_prims/debug_prims.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims/context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_prims/context.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_prims/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_prims/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/testing/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/testing/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/testing/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/testing/__init__.py + for f in 
`find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/linalg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/linalg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/fft.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/fft.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_util.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_util.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_unary_ufuncs_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_unary_ufuncs_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_ufuncs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_ufuncs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_reductions_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_reductions_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_normalizations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_normalizations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_ndarray.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_ndarray.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_getlimits.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_getlimits.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_funcs_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_funcs_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_funcs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_funcs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_dtypes_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_dtypes_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_dtypes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_dtypes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_casting_dicts.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_casting_dicts.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/_binary_ufuncs_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/_binary_ufuncs_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_numpy/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_numpy/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_namedtensor_internals.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_namedtensor_internals.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_meta_registrations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_meta_registrations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lowrank.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lowrank.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_logging/structured.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_logging/structured.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_logging/_registrations.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_logging/_registrations.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_logging/_internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_logging/_internal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_logging/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_logging/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lobpcg.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lobpcg.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_linalg_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_linalg_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_library/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/simple_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_library/simple_registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/fake_class_registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_library/fake_class_registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/custom_ops.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_library/custom_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_library/autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/abstract_impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_library/abstract_impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_library/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_library/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/ts_backend.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/ts_backend.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/tensor_factory_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/tensor_factory_functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/metrics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/metrics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/ir_cache.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/ir_cache.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/extract_compiled_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/extract_compiled_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/device_context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/device_context.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/debug.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/debug.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/computation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/computation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/closure.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/closure.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_lazy/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_lazy/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/_jit_internal.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_jit_internal.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/wrapper_benchmark.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/wrapper_benchmark.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/virtualized.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/virtualized.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/triton_heuristics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/triton_heuristics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/triton_helpers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/triton_helpers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/test_operators.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/test_operators.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/test_case.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/test_case.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/sizevars.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/sizevars.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/select_algorithm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/select_algorithm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/scheduler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/scheduler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/quantized_lowerings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/quantized_lowerings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/pattern_matcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/pattern_matcher.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/optimize_indexing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/optimize_indexing.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/ops_handler.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/ops_handler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/mkldnn_lowerings.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/mkldnn_lowerings.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/metrics.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/metrics.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/lowering.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/lowering.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/kernel/unpack_mixed_mm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/unpack_mixed_mm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/kernel/templated_attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/templated_attention.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/kernel/mm_plus_mm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/mm_plus_mm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/kernel/mm_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/mm_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/kernel/mm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/mm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/kernel/conv.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/conv.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/kernel/bmm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/bmm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/kernel/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/kernel/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/ir.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/ir.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/inductor_prims.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/inductor_prims.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/index_propagation.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/index_propagation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/split_cat.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/split_cat.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/mm_pattern.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/mm_pattern.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/bmm_pattern.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/bmm_pattern.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/addmm_pattern.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/addmm_pattern.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_9.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_9.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_8.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_8.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_7.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_7.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_6.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_6.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_5.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_5.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_4.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_4.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_3.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_3.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_2.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_2.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_18.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_18.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_17.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_17.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_16.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_16.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_15.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_15.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_14.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_14.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_13.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_13.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_12.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_12.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_11.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_11.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_10.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_10.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_1.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/_sfdp_pattern_1.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/serialized_patterns/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/serialized_patterns/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/replace_random.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/replace_random.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/reinplace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/reinplace.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/quantization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/quantization.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/pre_grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/pre_grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/post_grad.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/post_grad.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/pad_mm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/pad_mm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/numeric_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/numeric_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/mkldnn_fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/mkldnn_fusion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/misc_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/misc_patterns.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 
./torch/_inductor/fx_passes/joint_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/joint_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/group_batch_fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/group_batch_fusion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/fuse_attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/fuse_attention.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/freezing_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/freezing_patterns.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/efficient_conv_bn_eval.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/efficient_conv_bn_eval.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/dedupe_symint_uses.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/dedupe_symint_uses.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/decompose_mem_bound_mm.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/decompose_mem_bound_mm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/ddp_fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/ddp_fusion.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/binary_folding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/binary_folding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/fx_passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/fx_passes/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/freezing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/freezing.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/exc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/exc.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/dependencies.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/dependencies.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/decomposition.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/decomposition.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/debug.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/debug.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/cudagraph_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/cudagraph_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/cudagraph_trees.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/cudagraph_trees.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/coordinate_descent_tuner.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/coordinate_descent_tuner.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/constant_folding.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/constant_folding.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/compile_fx.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/compile_fx.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/comms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/comms.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/comm_analysis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/comm_analysis.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/xpu/device_op_overrides.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/xpu/device_op_overrides.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/xpu/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/xpu/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/wrapper.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/wrapper.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/triton_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/triton_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/triton_split_scan.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/triton_split_scan.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/triton_foreach.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/triton_foreach.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/triton.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/triton.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/multi_kernel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/multi_kernel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/memory_planning.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/memory_planning.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda_combined_scheduling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda_combined_scheduling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/gemm_template.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/gemm_template.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/device_op_overrides.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/device_op_overrides.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cutlass_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cutlass_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cutlass_lib_extensions/gemm_operation_extensions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cutlass_lib_extensions/gemm_operation_extensions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cutlass_lib_extensions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cutlass_lib_extensions/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cutlass_epilogue_gen.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cutlass_epilogue_gen.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cuda_template.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cuda_template.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cuda_kernel.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cuda_kernel.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cuda_env.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cuda_env.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/cuda_cpp_scheduling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/cuda_cpp_scheduling.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cuda/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cuda/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cpp_wrapper_cuda.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cpp_wrapper_cuda.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cpp_wrapper_cpu.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cpp_wrapper_cpu.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/cpp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codegen/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codegen/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/codecache.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/codecache.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/bounds.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/bounds.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/autotune_process.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/autotune_process.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_inductor/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_inductor/__init__.py + for f in `find ./torch/ -name '*.py'` 
+ install -D -pm 644 ./torch/_higher_order_ops/wrap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/wrap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/while_loop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/while_loop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/triton_kernel_wrap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/triton_kernel_wrap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/torchbind.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/torchbind.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/templated_attention.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/templated_attention.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/strict_mode.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/strict_mode.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/out_dtype.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/out_dtype.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/map.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/map.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/effects.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/effects.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/cond.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/cond.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/auto_functionalize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/auto_functionalize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_higher_order_ops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_higher_order_ops/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_guards.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_guards.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/vmap.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/vmap.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/top_operators_github_usage.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/top_operators_github_usage.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/pytree_hacks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/pytree_hacks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/python_key.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/python_key.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/pyfunctorch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/pyfunctorch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/partitioners.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/partitioners.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/make_functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/make_functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/fx_minifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/fx_minifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/functional_call.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/functional_call.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/eager_transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/eager_transforms.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/deprecated.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/deprecated.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/compilers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/compilers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/compile_utils.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/compile_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/benchmark_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/benchmark_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/batch_norm_replacement.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/batch_norm_replacement.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/autograd_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/autograd_function.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/apis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/apis.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/aot_autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/aot_autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/traced_function_transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/traced_function_transforms.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/subclass_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/subclass_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/schemas.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/schemas.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/runtime_wrappers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/runtime_wrappers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/logging_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/logging_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/input_output_analysis.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/input_output_analysis.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/functional_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/functional_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/collect_metadata_analysis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/collect_metadata_analysis.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/_aot_autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/_aot_autograd/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_functorch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_functorch/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/wrappers.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/wrappers.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/verifier.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/verifier.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/serde/upgrade.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/serde/upgrade.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/serde/union.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/serde/union.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/serde/serialize.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/serde/serialize.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/serde/schema_check.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/serde/schema_check.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/serde/schema.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/serde/schema.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/_export/serde/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/serde/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/passes/replace_view_ops_with_view_copy_ops_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/passes/replace_view_ops_with_view_copy_ops_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/passes/replace_sym_size_ops_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/passes/replace_sym_size_ops_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/passes/replace_set_grad_with_hop_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/passes/replace_set_grad_with_hop_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/passes/remove_runtime_assertions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/passes/remove_runtime_assertions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/passes/lift_constants_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/passes/lift_constants_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/passes/functionalize_side_effectful_ops_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/passes/functionalize_side_effectful_ops_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/passes/collect_tracepoints_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/passes/collect_tracepoints_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/passes/add_runtime_assertions_for_constraints_pass.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/passes/add_runtime_assertions_for_constraints_pass.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/passes/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/passes/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/pass_infra/proxy_value.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/pass_infra/proxy_value.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/pass_infra/node_metadata.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/pass_infra/node_metadata.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/pass_infra/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/pass_infra/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/pass_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/pass_base.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/non_strict_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/non_strict_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/exported_program.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/exported_program.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/error.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/error.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/logging.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/gen_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/gen_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/user_input_mutation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/user_input_mutation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/type_reflection_method.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/type_reflection_method.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/torch_sym_min.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/torch_sym_min.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/tensor_setattr.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/tensor_setattr.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/static_if.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/static_if.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/static_for_loop.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/static_for_loop.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/specialized_attribute.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/specialized_attribute.py + for f in `find 
./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/scalar_output.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/scalar_output.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/pytree_flatten.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/pytree_flatten.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/optional_input.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/optional_input.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/null_context_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/null_context_manager.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/nested_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/nested_function.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/model_attr_mutation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/model_attr_mutation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/list_unpack.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/list_unpack.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/list_contains.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/list_contains.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/fn_with_kwargs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/fn_with_kwargs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_view.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_view.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_slicing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_slicing.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_round.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_round.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_map.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_map.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_if_guard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_if_guard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_constructor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_constructor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/dynamic_shape_assert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dynamic_shape_assert.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/dictionary.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/dictionary.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/decorator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/decorator.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/constrain_as_value_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/constrain_as_value_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/constrain_as_size_example.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/constrain_as_size_example.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/cond_predicate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_predicate.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/cond_operands.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_operands.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/cond_closed_over_variable.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_closed_over_variable.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/cond_branch_nonlocal_variables.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_branch_nonlocal_variables.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/cond_branch_nested_function.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_branch_nested_function.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/cond_branch_class_method.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/cond_branch_class_method.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/class_method.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/class_method.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/autograd_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/autograd_function.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/assume_constant_result.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/assume_constant_result.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/examples/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/examples/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/case.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/case.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/db/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/db/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_export/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_export/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/user_defined.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/user_defined.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/torch_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/torch_function.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/torch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/torch.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/tensor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/sdpa.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/sdpa.py + for f in 
`find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/optimizer.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/optimizer.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/nn_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/nn_module.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/misc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/misc.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/lists.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/lists.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/lazy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/lazy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/iter.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/iter.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/higher_order_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/higher_order_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/functions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/dicts.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/dicts.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/ctx_manager.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/ctx_manager.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/constant.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/constant.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/builtin.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/builtin.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/builder.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/builder.py + 
for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/base.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/variables/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/variables/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/types.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/trace_rules.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/trace_rules.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/testing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/testing.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/test_minifier_common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/test_minifier_common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/test_case.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/test_case.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/tensor_version_op.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/tensor_version_op.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/symbolic_convert.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/symbolic_convert.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/source.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/source.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/side_effects.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/side_effects.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/resume_execution.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/resume_execution.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/repro/after_dynamo.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/repro/after_dynamo.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/repro/after_aot.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/repro/after_aot.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/repro/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/repro/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/replay_record.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/replay_record.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/profiler.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/profiler.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/polyfill.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/polyfill.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/output_graph.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/output_graph.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/mutation_guard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/mutation_guard.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/logging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/logging.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/hooks.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/hooks.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/guards.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/guards.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/funcname_cache.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/funcname_cache.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/external_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/external_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/exc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/exc.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/eval_frame.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/eval_frame.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/device_interface.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/device_interface.py + for f in `find ./torch/ -name '*.py'` + 
install -D -pm 644 ./torch/_dynamo/decorators.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/decorators.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/debug_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/debug_utils.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/current_scope_id.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/current_scope_id.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/create_parameter_op.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/create_parameter_op.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/convert_frame.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/convert_frame.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/config.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/comptime.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/comptime.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/compiled_autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/compiled_autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/codegen.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/codegen.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/code_context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/code_context.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/callback.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/callback.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/cache_size.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/cache_size.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/bytecode_transformation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/bytecode_transformation.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/bytecode_analysis.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/bytecode_analysis.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/tvm.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/tvm.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/torchxla.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/torchxla.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/tensorrt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/tensorrt.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/registry.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/registry.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/onnxrt.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/onnxrt.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/inductor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/inductor.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/distributed.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/distributed.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/debugging.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/debugging.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/cudagraphs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/cudagraphs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/common.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/common.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/backends/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/backends/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/_trace_wrapped_higher_order_op.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/_trace_wrapped_higher_order_op.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dynamo/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dynamo/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dispatch/python.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dispatch/python.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_dispatch/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_dispatch/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_deploy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_deploy.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/decompositions_for_rng.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_decomp/decompositions_for_rng.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/decompositions_for_jvp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_decomp/decompositions_for_jvp.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/decompositions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_decomp/decompositions.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_decomp/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_decomp/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_custom_ops.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/impl.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_custom_op/impl.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/functional.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_custom_op/functional.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_custom_op/autograd.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_custom_op/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_custom_op/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_compile.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_compile.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_classes.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_classes.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_awaits/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_awaits/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_appdirs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_appdirs.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/__init__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/__future__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/__future__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/__config__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/__config__.py + for f in `find ./torch/ -name '*.py'` + install -D -pm 644 ./torch/_VF.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torch/_VF.py ++ find ./torchgen/ -name '*.py' + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/yaml_utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/yaml_utils.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/utils.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/utils.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/generator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/generator.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/gen_static_runtime_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/gen_static_runtime_ops.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/config.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/config.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/static_runtime/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/static_runtime/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/shape_functions/gen_jit_shape_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/shape_functions/gen_jit_shape_functions.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/selective_build/selector.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/selective_build/selector.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/selective_build/operator.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/selective_build/operator.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/selective_build/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/selective_build/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/operator_versions/gen_mobile_upgraders_constant.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/operator_versions/gen_mobile_upgraders_constant.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/operator_versions/gen_mobile_upgraders.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/operator_versions/gen_mobile_upgraders.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/operator_versions/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/operator_versions/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/native_function_generation.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/native_function_generation.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/model.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/model.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/local.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/local.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_vmap_plumbing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/gen_vmap_plumbing.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_lazy_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/gen_lazy_tensor.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_functionalization_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/gen_functionalization_type.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_executorch.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/gen_executorch.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_backend_stubs.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/gen_backend_stubs.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen_aoti_c_shim.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/gen_aoti_c_shim.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/gen.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/gen.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/fuse/gen_patterns.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/fuse/gen_patterns.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/parse.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/executorch/parse.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/model.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/executorch/model.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/unboxing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/unboxing.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/types/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/types/types.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/types/signatures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/types/signatures.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/types/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/types/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/et_cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/et_cpp.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/custom_ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/custom_ops.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/executorch/api/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/executorch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/executorch/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/ufunc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/dest/ufunc.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/register_dispatch_key.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/dest/register_dispatch_key.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/native_functions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/dest/native_functions.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/lazy_ts_lowering.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/dest/lazy_ts_lowering.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 
./torchgen/dest/lazy_ir.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/dest/lazy_ir.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/dest/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/dest/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/decompositions/gen_jit_decompositions.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/decompositions/gen_jit_decompositions.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/context.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/context.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/code_template.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/code_template.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/unboxing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/unboxing.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/ufunc.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/ufunc.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/types_base.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/types/types_base.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/types.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/types/types.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/signatures.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/types/signatures.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/types/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/types/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/translate.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/translate.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/structured.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/structured.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/python.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/python.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/native.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/native.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/meta.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/meta.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/lazy.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/lazy.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/functionalization.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/functionalization.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/dispatcher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/dispatcher.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/cpp.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/cpp.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/autograd.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/autograd.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/api/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/api/__init__.py + for f in `find ./torchgen/ -name '*.py'` + install -D -pm 644 ./torchgen/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./torchgen/__init__.py ++ find ./functorch/ -name '*.py' + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/op_analysis/gen_data.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/op_analysis/gen_data.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/notebooks/_src/plot_per_sample_gradients.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/notebooks/_src/plot_per_sample_gradients.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/notebooks/_src/plot_jacobians_and_hessians.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/notebooks/_src/plot_jacobians_and_hessians.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/notebooks/_src/plot_ensembling.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/notebooks/_src/plot_ensembling.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/experimental/ops.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/experimental/ops.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/experimental/control_flow.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/experimental/control_flow.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/experimental/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/experimental/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_regression/evjang_transforms_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_regression/evjang_transforms_module.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_regression/evjang_transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_regression/evjang_transforms.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_regression/evjang.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_regression/evjang.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/support/omniglot_loaders.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/support/omniglot_loaders.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/maml-omniglot-transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/maml-omniglot-transforms.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/maml-omniglot-ptonly.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/maml-omniglot-ptonly.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/maml_omniglot/maml-omniglot-higher.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/maml_omniglot/maml-omniglot-higher.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/lennard_jones/lennard_jones.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/lennard_jones/lennard_jones.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/ensembling/parallel_train.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/ensembling/parallel_train.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/dp_cifar10/cifar10_transforms.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/dp_cifar10/cifar10_transforms.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/dp_cifar10/cifar10_opacus.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/dp_cifar10/cifar10_opacus.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/compilation/simple_function.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/simple_function.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/compilation/linear_train.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/linear_train.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/compilation/fuse_module.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/fuse_module.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/examples/compilation/eager_fusion.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/examples/compilation/eager_fusion.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/einops/rearrange.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/einops/rearrange.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/einops/_parsing.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/einops/_parsing.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/einops/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/einops/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/docs/source/conf.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/docs/source/conf.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/wrap_type.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/dim/wrap_type.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/tree_map.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/dim/tree_map.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/reference.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/dim/reference.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/op_properties.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/dim/op_properties.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/magic_trace.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/dim/magic_trace.py + for f in `find ./functorch/ -name '*.py'` + install -D 
-pm 644 ./functorch/dim/dim.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/dim/dim.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/delayed_mul_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/dim/delayed_mul_tensor.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/batch_tensor.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/dim/batch_tensor.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/dim/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/dim/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/compile/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/compile/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/benchmarks/process_scorecard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/process_scorecard.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/benchmarks/pointwise_scorecard.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/pointwise_scorecard.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/benchmarks/per_sample_grads.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/per_sample_grads.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/benchmarks/operator_authoring.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/operator_authoring.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/benchmarks/cse.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/cse.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/benchmarks/chrome_trace_parser.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/benchmarks/chrome_trace_parser.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/_src/vmap/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/_src/vmap/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/_src/make_functional/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/_src/make_functional/__init__.py + for f in `find ./functorch/ -name '*.py'` + install -D -pm 644 ./functorch/_src/eager_transforms/__init__.py 
/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/_src/eager_transforms/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/aot_autograd/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/_src/aot_autograd/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/_src/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/_src/__init__.py
+ for f in `find ./functorch/ -name '*.py'`
+ install -D -pm 644 ./functorch/__init__.py /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/python3.12/site-packages/./functorch/__init__.py
++ /usr/local/cuda/bin/nvcc --version
++ grep release
++ awk '{print $2}'
++ cut -d, -f2
+ cuver=12.3
+ echo 'from typing import Optional'
+ echo '__all__ = ['\''__version__'\'', '\''debug'\'', '\''cuda'\'', '\''git_version'\'', '\''hip'\'']'
+ echo '__version__ = '\''2.4.0'\'''
+ echo 'debug = False'
+ echo 'cuda: Optional[str] = '\''12.3'\'''
+ echo 'git_version = '\''7efaf54dc46034189cb36b345764a5a9a5b693d4'\'''
+ echo 'hip: Optional[str] = None'
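The echo commands above assemble the torch/version.py that ends up in the buildroot. Reconstructed from the logged commands only (a sketch, not a copy of the generated file), it should read roughly:

# Sketch of torch/version.py as assembled by the echo calls above
# (inferred from the shell trace; the file in the buildroot may differ in whitespace).
from typing import Optional

__all__ = ['__version__', 'debug', 'cuda', 'git_version', 'hip']
__version__ = '2.4.0'
debug = False
cuda: Optional[str] = '12.3'  # CUDA release detected from nvcc --version above
git_version = '7efaf54dc46034189cb36b345764a5a9a5b693d4'
hip: Optional[str] = None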
+ mv -f /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//builddir/build/BUILD/pytorch/nvfuser/nvfuser.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/
mv: cannot stat '/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//builddir/build/BUILD/pytorch/nvfuser/nvfuser.so': No such file or directory
+ true
+ mv -f /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//builddir/build/BUILD/pytorch/torch/lib/libnvfuser_codegen.so /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/
mv: cannot stat '/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//builddir/build/BUILD/pytorch/torch/lib/libnvfuser_codegen.so': No such file or directory
+ true
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/include/fmt
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/include/clog.h
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/include/xnnpack.h
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//builddir/build/BUILD/pytorch/test
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//builddir/build/BUILD/pytorch/nvfuser
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/cmake/fmt
+ rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64//usr/lib64/pkgconfig/fmt.pc
+ find /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64 -name functorch.so -exec rm -f '{}' ';'
+ /usr/bin/python3 setup.py egg_info
Building wheel torch-2.4.0a0+git7efaf54
running egg_info
creating torch.egg-info
writing torch.egg-info/PKG-INFO
writing dependency_links to torch.egg-info/dependency_links.txt
writing entry points to torch.egg-info/entry_points.txt
writing requirements to torch.egg-info/requires.txt
writing top-level names to torch.egg-info/top_level.txt
writing manifest file 'torch.egg-info/SOURCES.txt'
reading manifest file 'torch.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*.o' found anywhere in distribution
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '*.dylib' found anywhere in distribution
warning: no previously-included files matching '*.a' found anywhere in distribution
warning: no previously-included files matching '*.swp' found anywhere in distribution
adding license file 'LICENSE'
adding license file 'NOTICE'
writing manifest file 'torch.egg-info/SOURCES.txt'
+ cp -r torch.egg-info /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/python3.12/site-packages/
+ sed -i '/^\[/!s/[<=>].*//g' /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/python3.12/site-packages/torch.egg-info/requires.txt
+ sed -i /triton/d /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/python3.12/site-packages/torch.egg-info/requires.txt
+ set +x
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/bin/torch_shm_manager
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libc10.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libc10_cuda.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libcaffe2_nvrtc.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libnnapi_backend.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libshm.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libtorch.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libtorch_cpu.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libtorch_cuda.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libtorch_cuda_linalg.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libtorch_global_deps.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/libtorch_python.so.2.4.0
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/python3.12/site-packages/functorch/_C.so
Stripping: /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/python3.12/site-packages/torch/_C.so
+ /usr/lib/rpm/check-buildroot
+ /usr/lib/rpm/redhat/brp-ldconfig
+ /usr/lib/rpm/brp-compress
+ /usr/lib/rpm/brp-strip /usr/bin/strip
+ /usr/lib/rpm/brp-strip-comment-note /usr/bin/strip /usr/bin/objdump
+ /usr/lib/rpm/redhat/brp-strip-lto /usr/bin/strip
+ /usr/lib/rpm/brp-strip-static-archive /usr/bin/strip
+ /usr/lib/rpm/check-rpaths
+ /usr/lib/rpm/redhat/brp-mangle-shebangs
+ /usr/lib/rpm/brp-remove-la-files
+ env /usr/lib/rpm/redhat/brp-python-bytecompile '' 1 0 -j4
Bytecompiling .py files below /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/lib64/python3.12 using python3.12
+
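The two sed edits in this step rewrite the torch.egg-info/requires.txt that was just copied into the buildroot: the first strips everything from the first '<', '=' or '>' on lines outside [section] headers, and the second drops the triton requirement. A rough Python equivalent of that transformation, for illustration only (the build itself uses sed):

import re

def clean_requires(text: str) -> str:
    # Mirrors: sed -i '/^\[/!s/[<=>].*//g' requires.txt  and  sed -i /triton/d requires.txt
    out = []
    for line in text.splitlines():
        if 'triton' in line:            # /triton/d -> delete the line entirely
            continue
        if not line.startswith('['):    # /^\[/! -> leave [section] headers untouched
            line = re.sub(r'[<=>].*', '', line)  # strip version constraints
        out.append(line)
    return '\n'.join(out) + '\n'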
/usr/lib/rpm/redhat/brp-python-hardlink
Processing files: pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64
Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.0qh5Ag
+ umask 022
+ cd /builddir/build/BUILD
+ cd pytorch
+ DOCDIR=/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/share/doc/pytorch
+ export LC_ALL=
+ LC_ALL=
+ export DOCDIR
+ /usr/bin/mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/share/doc/pytorch
+ cp -pr /builddir/build/BUILD/pytorch/README.md /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/share/doc/pytorch
+ cp -pr /builddir/build/BUILD/pytorch/CONTRIBUTING.md /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/share/doc/pytorch
+ RPM_EC=0
++ jobs -p
+ exit 0
Executing(%license): /bin/sh -e /var/tmp/rpm-tmp.09vRXJ
+ umask 022
+ cd /builddir/build/BUILD
+ cd pytorch
+ LICENSEDIR=/builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/share/licenses/pytorch
+ export LC_ALL=
+ LC_ALL=
+ export LICENSEDIR
+ /usr/bin/mkdir -p /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/share/licenses/pytorch
+ cp -pr /builddir/build/BUILD/pytorch/LICENSE /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64/usr/share/licenses/pytorch
+ RPM_EC=0
++ jobs -p
+ exit 0
Provides: libc10.so.2.4()(64bit) libc10_cuda.so()(64bit) libcaffe2_nvrtc.so()(64bit) libnnapi_backend.so()(64bit) libshm.so.2.4()(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_cuda.so()(64bit) libtorch_cuda_linalg.so()(64bit) libtorch_global_deps.so.2.4()(64bit) pytorch = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc41 pytorch(x86-64) = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc41
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: ld-linux-x86-64.so.2()(64bit) ld-linux-x86-64.so.2(GLIBC_2.3)(64bit) libc.so.6()(64bit) libc.so.6(GLIBC_2.11)(64bit) libc.so.6(GLIBC_2.14)(64bit) libc.so.6(GLIBC_2.16)(64bit) libc.so.6(GLIBC_2.17)(64bit) libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.28)(64bit) libc.so.6(GLIBC_2.3)(64bit) libc.so.6(GLIBC_2.3.2)(64bit) libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.32)(64bit) libc.so.6(GLIBC_2.33)(64bit) libc.so.6(GLIBC_2.34)(64bit) libc.so.6(GLIBC_2.38)(64bit) libc.so.6(GLIBC_2.6)(64bit) libc10.so.2.4()(64bit) libc10_cuda.so()(64bit) libcpuinfo.so.1()(64bit) libcublas.so.12()(64bit) libcublas.so.12(libcublas.so.12)(64bit) libcublasLt.so.12()(64bit) libcublasLt.so.12(libcublasLt.so.12)(64bit) libcuda.so.1()(64bit) libcudart.so.12()(64bit) libcudart.so.12(libcudart.so.12)(64bit) libcudnn.so.8()(64bit) libcudnn.so.8(libcudnn.so.8)(64bit) libcufft.so.11()(64bit) libcufft.so.11(libcufft.so.11)(64bit) libcurand.so.10()(64bit) libcusolver.so.11()(64bit) libcusolver.so.11(libcusolver.so.11)(64bit) libcusparse.so.12()(64bit) libcusparse.so.12(libcusparse.so.12)(64bit) libfbgemm.so.1()(64bit) libfoxi_loader.so.1.4.1()(64bit) libgcc_s.so.1()(64bit) libgcc_s.so.1(GCC_3.0)(64bit) libgcc_s.so.1(GCC_3.4)(64bit) libgflags.so.2.2()(64bit) libglog.so.0()(64bit) libgloo.so.1()(64bit) libgloo_cuda.so.1()(64bit) libgomp.so.1()(64bit) libgomp.so.1(GOMP_4.0)(64bit) libgomp.so.1(OMP_1.0)(64bit) libhiredis.so.1.0.0()(64bit) libkineto.so.1()(64bit) libleveldb.so.1()(64bit) liblmdb.so.0.0.0()(64bit) libm.so.6()(64bit) libm.so.6(GLIBC_2.2.5)(64bit) libm.so.6(GLIBC_2.23)(64bit) libm.so.6(GLIBC_2.27)(64bit) libm.so.6(GLIBC_2.29)(64bit) libm.so.6(GLIBC_2.35)(64bit) libm.so.6(GLIBC_2.38)(64bit) libmagma.so.1()(64bit) libnccl.so.2()(64bit) libnnpack.so.1()(64bit) libnuma.so.1()(64bit) libnuma.so.1(libnuma_1.1)(64bit) libnuma.so.1(libnuma_1.2)(64bit) libnvToolsExt.so.1()(64bit) libnvToolsExt.so.1(libnvToolsExt.so.1)(64bit) libnvrtc.so.12()(64bit) libnvrtc.so.12(libnvrtc.so.12)(64bit) libonnx.so()(64bit) libonnx_optimizer.so()(64bit) libonnx_proto.so()(64bit) libopenblaso.so.0()(64bit) libopencv_calib3d.so.409()(64bit) libopencv_core.so.409()(64bit) libopencv_cudev.so.409()(64bit) libopencv_dnn.so.409()(64bit) libopencv_features2d.so.409()(64bit) libopencv_flann.so.409()(64bit) libopencv_highgui.so.409()(64bit) libopencv_imgcodecs.so.409()(64bit) libopencv_imgproc.so.409()(64bit) libopencv_optflow.so.409()(64bit) libopencv_video.so.409()(64bit) libopencv_videoio.so.409()(64bit) libopencv_ximgproc.so.409()(64bit) libprotobuf.so.32()(64bit) libpthreadpool.so.1()(64bit) libqnnpack.so.1()(64bit) libshm.so.2.4()(64bit) libsleef.so.3()(64bit) libsnappy.so.1()(64bit) libstdc++.so.6()(64bit) libstdc++.so.6(CXXABI_1.3)(64bit) libstdc++.so.6(CXXABI_1.3.11)(64bit) libstdc++.so.6(CXXABI_1.3.13)(64bit) libstdc++.so.6(CXXABI_1.3.15)(64bit) libstdc++.so.6(CXXABI_1.3.2)(64bit) libstdc++.so.6(CXXABI_1.3.3)(64bit) libstdc++.so.6(CXXABI_1.3.5)(64bit) libstdc++.so.6(CXXABI_1.3.7)(64bit) libstdc++.so.6(CXXABI_1.3.8)(64bit) libstdc++.so.6(CXXABI_1.3.9)(64bit) libstdc++.so.6(GLIBCXX_3.4)(64bit) libstdc++.so.6(GLIBCXX_3.4.11)(64bit) libstdc++.so.6(GLIBCXX_3.4.14)(64bit) libstdc++.so.6(GLIBCXX_3.4.15)(64bit) libstdc++.so.6(GLIBCXX_3.4.17)(64bit) libstdc++.so.6(GLIBCXX_3.4.18)(64bit) libstdc++.so.6(GLIBCXX_3.4.19)(64bit) libstdc++.so.6(GLIBCXX_3.4.20)(64bit) libstdc++.so.6(GLIBCXX_3.4.21)(64bit) libstdc++.so.6(GLIBCXX_3.4.22)(64bit) libstdc++.so.6(GLIBCXX_3.4.26)(64bit) libstdc++.so.6(GLIBCXX_3.4.29)(64bit) libstdc++.so.6(GLIBCXX_3.4.30)(64bit) libstdc++.so.6(GLIBCXX_3.4.32)(64bit) libstdc++.so.6(GLIBCXX_3.4.9)(64bit) libtensorpipe.so.1()(64bit) libtensorpipe_cuda.so.1()(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_cuda.so()(64bit) libtorch_python.so.2.4()(64bit) libzmq.so.5()(64bit) rtld(GNU_HASH)
Processing files: pytorch-devel-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64
Provides: cmake(ATen) cmake(Caffe2) cmake(Torch) = 2.4.0 cmake(aten) cmake(caffe2) cmake(torch) = 2.4.0 pytorch-devel = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc41 pytorch-devel(x86-64) = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc41
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: cmake-filesystem libc10.so.2.4()(64bit) libshm.so.2.4()(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_global_deps.so.2.4()(64bit)
Processing files: pytorch-python3-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64
warning: absolute symlink: /usr/lib64/python3.12/site-packages/torch/bin/torch_shm_manager -> /usr/bin/torch_shm_manager
warning: absolute symlink: /usr/lib64/python3.12/site-packages/torch/include -> /usr/include
warning: absolute symlink: /usr/lib64/python3.12/site-packages/torch/lib -> /usr/lib64
Provides: libtorch_python.so.2.4()(64bit) python3.12dist(torch) = 2.4.0 python3.12dist(torch) = 2.4~a0 python3dist(torch) = 2.4~a0 pytorch-python3 = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc41 pytorch-python3(x86-64) = 2.4.0-20240412.0.git7efaf54d.cu12_3.fc41
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PartialHardlinkSets) <= 4.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.14)(64bit) libc.so.6(GLIBC_2.16)(64bit) libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3.2)(64bit) libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.32)(64bit) libc.so.6(GLIBC_2.34)(64bit) libc.so.6(GLIBC_2.38)(64bit) libc10.so.2.4()(64bit) libc10_cuda.so()(64bit) libcudart.so.12()(64bit) libcudart.so.12(libcudart.so.12)(64bit) libcudnn.so.8()(64bit) libcudnn.so.8(libcudnn.so.8)(64bit) libgcc_s.so.1()(64bit) libgcc_s.so.1(GCC_3.0)(64bit) libgcc_s.so.1(GCC_3.4)(64bit) libglog.so.0()(64bit) libnvToolsExt.so.1()(64bit) libnvToolsExt.so.1(libnvToolsExt.so.1)(64bit) libprotobuf.so.32()(64bit) libshm.so.2.4()(64bit) libstdc++.so.6()(64bit) libstdc++.so.6(CXXABI_1.3)(64bit) libstdc++.so.6(CXXABI_1.3.11)(64bit) libstdc++.so.6(CXXABI_1.3.13)(64bit) libstdc++.so.6(CXXABI_1.3.15)(64bit) libstdc++.so.6(CXXABI_1.3.2)(64bit) libstdc++.so.6(CXXABI_1.3.3)(64bit) libstdc++.so.6(CXXABI_1.3.5)(64bit) libstdc++.so.6(CXXABI_1.3.8)(64bit) libstdc++.so.6(CXXABI_1.3.9)(64bit) libstdc++.so.6(GLIBCXX_3.4)(64bit) libstdc++.so.6(GLIBCXX_3.4.11)(64bit) libstdc++.so.6(GLIBCXX_3.4.14)(64bit) libstdc++.so.6(GLIBCXX_3.4.15)(64bit) libstdc++.so.6(GLIBCXX_3.4.18)(64bit) libstdc++.so.6(GLIBCXX_3.4.19)(64bit) libstdc++.so.6(GLIBCXX_3.4.20)(64bit) libstdc++.so.6(GLIBCXX_3.4.21)(64bit) libstdc++.so.6(GLIBCXX_3.4.22)(64bit) libstdc++.so.6(GLIBCXX_3.4.26)(64bit) libstdc++.so.6(GLIBCXX_3.4.29)(64bit) libstdc++.so.6(GLIBCXX_3.4.30)(64bit) libstdc++.so.6(GLIBCXX_3.4.32)(64bit) libstdc++.so.6(GLIBCXX_3.4.9)(64bit) libtorch.so.2.4()(64bit) libtorch_cpu.so.2.4()(64bit) libtorch_cuda.so()(64bit) libtorch_python.so.2.4()(64bit) python(abi) = 3.12 python3.12dist(filelock) python3.12dist(fsspec) python3.12dist(jinja2) python3.12dist(networkx) python3.12dist(sympy) python3.12dist(typing-extensions) >= 4.8 rtld(GNU_HASH)
Checking for unpackaged file(s): /usr/lib/rpm/check-files /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64
Wrote: /builddir/build/RPMS/pytorch-devel-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64.rpm
Wrote: /builddir/build/RPMS/pytorch-python3-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64.rpm
Wrote: /builddir/build/RPMS/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.sckhkO
+ umask 022
+ cd /builddir/build/BUILD
+ cd pytorch
+ /usr/bin/rm -rf /builddir/build/BUILDROOT/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64
+ RPM_EC=0
++ jobs -p
+ exit 0
Executing(rmbuild): /bin/sh -e /var/tmp/rpm-tmp.xUxLt9
+ umask 022
+ cd /builddir/build/BUILD
+ rm -rf /builddir/build/BUILD/pytorch-SPECPARTS
+ rm -rf pytorch pytorch.gemspec
+ RPM_EC=0
++ jobs -p
+ exit 0
RPM build warnings:
    %patchN is deprecated (2 usages found), use %patch N (or %patch -P N)
    absolute symlink: /usr/lib64/python3.12/site-packages/torch/bin/torch_shm_manager -> /usr/bin/torch_shm_manager
    absolute symlink: /usr/lib64/python3.12/site-packages/torch/include -> /usr/include
    absolute symlink: /usr/lib64/python3.12/site-packages/torch/lib -> /usr/lib64
Finish: rpmbuild pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.src.rpm
Finish: build phase for pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.src.rpm
INFO: chroot_scan: 1 files copied to /var/lib/copr-rpmbuild/results/chroot_scan
INFO: /var/lib/mock/fedora-rawhide-x86_64-1712885339.434030/root/var/log/dnf5.log
INFO: Done(/var/lib/copr-rpmbuild/results/pytorch-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.src.rpm) Config(child) 413 minutes 0 seconds
INFO: Results and/or logs in: /var/lib/copr-rpmbuild/results
INFO: Cleaning up build root ('cleanup_on_success=True')
Start: clean chroot
INFO: unmounting tmpfs.
Finish: clean chroot
Finish: run
Running RPMResults tool
Package info: {
    "packages": [
        {
            "name": "pytorch",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc41",
            "arch": "x86_64"
        },
        {
            "name": "pytorch-python3",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc41",
            "arch": "x86_64"
        },
        {
            "name": "pytorch",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc41",
            "arch": "src"
        },
        {
            "name": "pytorch-devel",
            "epoch": null,
            "version": "2.4.0",
            "release": "20240412.0.git7efaf54d.cu12_3.fc41",
            "arch": "x86_64"
        }
    ]
}
RPMResults finished
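Note on the RPM build warnings above: both are routine spec-file cleanups rather than build failures. What follows is a minimal sketch only, assuming the standard Fedora macros (%{_bindir} = /usr/bin, %{_includedir} = /usr/include, %{_libdir} = /usr/lib64, %{python3_sitearch} = /usr/lib64/python3.12/site-packages) match the paths shown in the warnings, and using placeholder patch numbers; the actual pytorch.spec in the dist-git repo may look different.

# Sketch of possible spec fixes (illustrative, not the packager's actual spec):
#
# In %prep, replace the deprecated %patchN form flagged by rpmbuild:
#   %patch0 -p1   ->   %patch -P 0 -p1
#   %patch1 -p1   ->   %patch -P 1 -p1
#
# In %install, recreate the three flagged symlinks as relative links so the
# "absolute symlink" warnings disappear (-r computes a relative target,
# -n avoids descending into an existing directory symlink, -f replaces it):
ln -snrf %{buildroot}%{_bindir}/torch_shm_manager \
         %{buildroot}%{python3_sitearch}/torch/bin/torch_shm_manager
ln -snrf %{buildroot}%{_includedir} %{buildroot}%{python3_sitearch}/torch/include
ln -snrf %{buildroot}%{_libdir} %{buildroot}%{python3_sitearch}/torch/lib

After a rebuild, the written RPMs can be checked against the metadata in this log, for example with rpm -qp --provides /builddir/build/RPMS/pytorch-python3-2.4.0-20240412.0.git7efaf54d.cu12_3.fc41.x86_64.rpm (which should still list python3.12dist(torch) = 2.4.0), and rpmlint, if installed in the chroot, should no longer report the absolute-symlink warnings.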