linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Masahiro Yamada	56769ba4b2	kbuild: unify vdso_install rules Currently, there is no standard implementation for vdso_install, leading to various issues: 1. Code duplication Many architectures duplicate similar code just for copying files to the install destination. Some architectures (arm, sparc, x86) create build-id symlinks, introducing more code duplication. 2. Unintended updates of in-tree build artifacts The vdso_install rule depends on the vdso files to install. It may update in-tree build artifacts. This can be problematic, as explained in commit `19514fc665` ("arm, kbuild: make "make install" not depend on vmlinux"). 3. Broken code in some architectures Makefile code is often copied from one architecture to another without proper adaptation. 'make vdso_install' for parisc does not work. 'make vdso_install' for s390 installs vdso64, but not vdso32. To address these problems, this commit introduces a generic vdso_install rule. Architectures that support vdso_install need to define vdso-install-y in arch/*/Makefile. vdso-install-y lists the files to install. For example, arch/x86/Makefile looks like this: vdso-install-$(CONFIG_X86_64) += arch/x86/entry/vdso/vdso64.so.dbg vdso-install-$(CONFIG_X86_X32_ABI) += arch/x86/entry/vdso/vdsox32.so.dbg vdso-install-$(CONFIG_X86_32) += arch/x86/entry/vdso/vdso32.so.dbg vdso-install-$(CONFIG_IA32_EMULATION) += arch/x86/entry/vdso/vdso32.so.dbg These files will be installed to $(MODLIB)/vdso/ with the .dbg suffix, if exists, stripped away. vdso-install-y can optionally take the second field after the colon separator. This is needed because some architectures install a vdso file as a different base name. The following is a snippet from arch/arm64/Makefile. vdso-install-$(CONFIG_COMPAT_VDSO) += arch/arm64/kernel/vdso32/vdso.so.dbg:vdso32.so This will rename vdso.so.dbg to vdso32.so during installation. If such architectures change their implementation so that the base names match, this workaround will go away. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Sven Schnelle <svens@linux.ibm.com> # s390 Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Reviewed-by: Guo Ren <guoren@kernel.org> Acked-by: Helge Deller <deller@gmx.de> # parisc Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>	2023-10-28 21:09:02 +09:00
Catalin Marinas	14dcf78a6c	Merge branch 'for-next/cpus_have_const_cap' into for-next/core * for-next/cpus_have_const_cap: (38 commits) : cpus_have_const_cap() removal arm64: Remove cpus_have_const_cap() arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419 arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0 arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64} arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2 arm64: Avoid cpus_have_const_cap() for ARM64_SSBS arm64: Avoid cpus_have_const_cap() for ARM64_MTE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT ...	2023-10-26 17:10:18 +01:00
Catalin Marinas	2baca17e6a	Merge branch 'for-next/feat_lse128' into for-next/core * for-next/feat_lse128: : HWCAP for FEAT_LSE128 kselftest/arm64: add FEAT_LSE128 to hwcap test arm64: add FEAT_LSE128 HWCAP	2023-10-26 17:10:07 +01:00
Catalin Marinas	023113fe66	Merge branch 'for-next/feat_lrcpc3' into for-next/core * for-next/feat_lrcpc3: : HWCAP for FEAT_LRCPC3 selftests/arm64: add HWCAP2_LRCPC3 test arm64: add FEAT_LRCPC3 HWCAP	2023-10-26 17:10:05 +01:00
Catalin Marinas	2a3f8ce3bb	Merge branch 'for-next/feat_sve_b16b16' into for-next/core * for-next/feat_sve_b16b16: : Add support for FEAT_SVE_B16B16 (BFloat16) kselftest/arm64: Verify HWCAP2_SVE_B16B16 arm64/sve: Report FEAT_SVE_B16B16 to userspace	2023-10-26 17:10:01 +01:00
Catalin Marinas	1519018ccb	Merge branches 'for-next/sve-remove-pseudo-regs', 'for-next/backtrace-ipi', 'for-next/kselftest', 'for-next/misc' and 'for-next/cpufeat-display-cores', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: hisi: Fix use-after-free when register pmu fails drivers/perf: hisi_pcie: Initialize event->cpu only on success drivers/perf: hisi_pcie: Check the type first in pmu::event_init() perf/arm-cmn: Enable per-DTC counter allocation perf/arm-cmn: Rework DTC counters (again) perf/arm-cmn: Fix DTC domain detection drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init() drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process drivers/perf: xgene: Use device_get_match_data() perf/amlogic: add missing MODULE_DEVICE_TABLE docs/perf: Add ampere_cspmu to toctree to fix a build warning perf: arm_cspmu: ampere_cspmu: Add support for Ampere SoC PMU perf: arm_cspmu: Support implementation specific validation perf: arm_cspmu: Support implementation specific filters perf: arm_cspmu: Split 64-bit write to 32-bit writes perf: arm_cspmu: Separate Arm and vendor module * for-next/sve-remove-pseudo-regs: : arm64/fpsimd: Remove the vector length pseudo registers arm64/sve: Remove SMCR pseudo register from cpufeature code arm64/sve: Remove ZCR pseudo register from cpufeature code * for-next/backtrace-ipi: : Add IPI for backtraces/kgdb, use NMI arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup arm64: smp: avoid NMI IPIs with broken MediaTek FW arm64: smp: Mark IPI globals as __ro_after_init arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI arm64: smp: Add arch support for backtrace using pseudo-NMI arm64: smp: Remove dedicated wakeup IPI arm64: idle: Tag the arm64 idle functions as __cpuidle irqchip/gic-v3: Enable support for SGIs to act as NMIs * for-next/kselftest: : Various arm64 kselftest updates kselftest/arm64: Validate SVCR in streaming SVE stress test * for-next/misc: : Miscellaneous patches arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper clocksource/drivers/arm_arch_timer: limit XGene-1 workaround arm64: Remove system_uses_lse_atomics() arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused arm64/mm: Hoist synchronization out of set_ptes() loop arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed * for-next/cpufeat-display-cores: : arm64 cpufeature display enabled cores arm64: cpufeature: Change DBM to display enabled cores arm64: cpufeature: Display the set of cores with a feature	2023-10-26 17:09:52 +01:00
Maria Yu	d35686444f	arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n The counting of module PLTs has been broken when CONFIG_RANDOMIZE_BASE=n since commit: `3e35d303ab` ("arm64: module: rework module VA range selection") Prior to that commit, when CONFIG_RANDOMIZE_BASE=n, the kernel image and all modules were placed within a 128M region, and no PLTs were necessary for B or BL. Hence count_plts() and partition_branch_plt_relas() skipped handling B and BL when CONFIG_RANDOMIZE_BASE=n. After that commit, modules can be placed anywhere within a 2G window regardless of CONFIG_RANDOMIZE_BASE, and hence PLTs may be necessary for B and BL even when CONFIG_RANDOMIZE_BASE=n. Unfortunately that commit failed to update count_plts() and partition_branch_plt_relas() accordingly. Due to this, module_emit_plt_entry() may fail if an insufficient number of PLT entries have been reserved, resulting in modules failing to load with -ENOEXEC. Fix this by counting PLTs regardless of CONFIG_RANDOMIZE_BASE in count_plts() and partition_branch_plt_relas(). Fixes: `3e35d303ab` ("arm64: module: rework module VA range selection") Signed-off-by: Maria Yu <quic_aiquny@quicinc.com> Cc: <stable@vger.kernel.org> # 6.5.x Acked-by: Ard Biesheuvel <ardb@kernel.org> Fixes: `3e35d303ab` ("arm64: module: rework module VA range selection") Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231024010954.6768-1-quic_aiquny@quicinc.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-24 15:12:48 +01:00
James Morse	c54e52f84d	arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper ACPI, irqchip and the architecture code all inspect the MADT enabled bit for a GICC entry in the MADT. The addition of an 'online capable' bit means all these sites need updating. Move the current checks behind a helper to make future updates easier. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/E1quv5D-00AeNJ-U8@rmk-PC.armlinux.org.uk Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-24 15:12:09 +01:00
Jeremy Linton	04d402a453	arm64: cpufeature: Change DBM to display enabled cores Now that we have the ability to display the list of cores with a feature when its selectivly enabled, lets convert DBM to use that as well. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Link: https://lore.kernel.org/r/20231017052322.1211099-3-jeremy.linton@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-23 16:40:37 +01:00
Jeremy Linton	23b727dc20	arm64: cpufeature: Display the set of cores with a feature The AMU feature can be enabled on a subset of the cores in a system. Because of that, it prints a message for each core as it is detected. This becomes tedious when there are hundreds of cores. Instead, for CPU features which can be enabled on a subset of the present cores, lets wait until update_cpu_capabilities() and print the subset of cores the feature was enabled on. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com> Tested-by: Punit Agrawal <punit.agrawal@bytedance.com> Link: https://lore.kernel.org/r/20231017052322.1211099-2-jeremy.linton@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-23 16:40:37 +01:00
Lorenzo Stoakes	6a1960b8a8	mm/gup: adapt get_user_page_vma_remote() to never return NULL get_user_pages_remote() will never return 0 except in the case of FOLL_NOWAIT being specified, which we explicitly disallow. This simplifies error handling for the caller and avoids the awkwardness of dealing with both errors and failing to pin. Failing to pin here is an error. Link: https://lkml.kernel.org/r/00319ce292d27b3aae76a0eb220ce3f528187508.1696288092.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-10-18 14:34:15 -07:00
Arnd Bergmann	b8466fe82b	efi: move screen_info into efi init code After the vga console no longer relies on global screen_info, there are only two remaining use cases: - on the x86 architecture, it is used for multiple boot methods (bzImage, EFI, Xen, kexec) to commucate the initial VGA or framebuffer settings to a number of device drivers. - on other architectures, it is only used as part of the EFI stub, and only for the three sysfb framebuffers (simpledrm, simplefb, efifb). Remove the duplicate data structure definitions by moving it into the efi-init.c file that sets it up initially for the EFI case, leaving x86 as an exception that retains its own definition for non-EFI boots. The added #ifdefs here are optional, I added them to further limit the reach of screen_info to configurations that have at least one of the users enabled. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231017093947.3627976-1-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-10-17 16:33:39 +02:00
Ryan Roberts	3425cec42c	arm64/mm: Hoist synchronization out of set_ptes() loop set_ptes() sets a physically contiguous block of memory (which all belongs to the same folio) to a contiguous block of ptes. The arm64 implementation of this previously just looped, operating on each individual pte. But the __sync_icache_dcache() and mte_sync_tags() operations can both be hoisted out of the loop so that they are performed once for the contiguous set of pages (which may be less than the whole folio). This should result in minor performance gains. __sync_icache_dcache() already acts on the whole folio, and sets a flag in the folio so that it skips duplicate calls. But by hoisting the call, all the pte testing is done only once. mte_sync_tags() operates on each individual page with its own loop. But by passing the number of pages explicitly, we can rely solely on its loop and do the checks only once. This approach also makes it robust for the future, rather than assuming if a head page of a compound page is being mapped, then the whole compound page is being mapped, instead we explicitly know how many pages are being mapped. The old assumption may not continue to hold once the "anonymous large folios" feature is merged. Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20231005140730.2191134-1-ryan.roberts@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 18:27:31 +01:00
Mark Rutland	e8d4006dc2	arm64: Remove cpus_have_const_cap() There are no longer any users of cpus_have_const_cap(), and therefore it can be removed. Remove cpus_have_const_cap(). At the same time, remove __cpus_have_const_cap(), as this is a trivial wrapper of alternative_has_cap_unlikely(), which can be used directly instead. The comment for __system_matches_cap() is updated to no longer refer to cpus_have_const_cap(). As we have a number of ways to check the cpucaps, the specific suggestions are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kristina Martsenko <kristina.martsenko@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:07 +01:00
Mark Rutland	0d48058ef8	arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP In has_useable_cnp() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but this is not necessary and cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. We use has_useable_cnp() to determine whether we have the system-wide ARM64_HAS_CNP cpucap. Due to the structure of the cpufeature code, we call has_useable_cnp() in two distinct cases: 1) When finalizing system capabilities, setup_system_capabilities() will call has_useable_cnp() with SCOPE_SYSTEM to determine whether all CPUs have the feature. This is called after we've detected any local cpucaps including ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but prior to patching alternatives. If the ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not detect ARM64_HAS_CNP. 2) After finalizing system capabilties, verify_local_cpu_capabilities() will call has_useable_cnp() with SCOPE_LOCAL_CPU to verify that CPUs have CNP if we previously detected it. Note that if ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not have detected ARM64_HAS_CNP. For case 1 we must check the system_cpucaps bitmap as this occurs prior to patching the alternatives. For case 2 we'll only call has_useable_cnp() once per subsequent onlining of a CPU, and as this isn't a fast path it's not necessary to optimize for this case. This patch replaces the use of cpus_have_const_cap() with cpus_have_cap(), which will only generate the bitmap test and avoid generating an alternative sequence, resulting in slightly simpler annd smaller code being generated. The ARM64_WORKAROUND_NVIDIA_CARMEL_CNP cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:07 +01:00
Mark Rutland	48b57d9199	arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098 In elf_hwcap_fixup() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_1742098, but this is not necessary and cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_1742098 cpucap is detected and patched before elf_hwcap_fixup() can run, and hence it is not necessary to use cpus_have_const_cap(). We run cpus_have_const_cap() at most twice: once after finalizing system cpucaps, and potentially once more after detecting mismatched CPUs which support AArch32 at EL0. Due to this, it's not necessary to optimize for many calls to elf_hwcap_fixup(), and it's fine to use cpus_have_cap(). This patch replaces the use of cpus_have_const_cap() with cpus_have_cap(), which will only generate the bitmap test and avoid generating an alternative sequence, resulting in slightly simpler annd smaller code being generated. For consistenct with other cpucaps, the ARM64_WORKAROUND_1742098 cpucap is added to cpucap_is_possible() so that code can be elided when this is not possible. However, as we only define compat_elf_hwcap2 when CONFIG_COMPAT=y, some ifdeffery is still required within user_feature_fixup() to avoid build errors when CONFIG_COMPAT=n. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:06 +01:00
Mark Rutland	d1e40f8222	arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419 We use cpus_have_const_cap() to check for ARM64_WORKAROUND_1542419 but this is not necessary and cpus_have_final_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_1542419 cpucap is detected and patched before any userspace code can run, and the both __do_compat_cache_op() and ctr_read_handler() are only reachable from exceptions taken from userspace. Thus it is not necessary for either to use cpus_have_const_cap(), and cpus_have_final_cap() is equivalent. This patch replaces the use of cpus_have_const_cap() with cpus_have_final_cap(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Using cpus_have_final_cap() clearly documents that we do not expect this code to run before cpucaps are finalized, and will make it easier to spot issues if code is changed in future to allow these functions to be reached earlier. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:06 +01:00
Mark Rutland	0a285dfe87	arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419 In count_plts() and is_forbidden_offset_for_adrp() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is not necessary and cpus_have_final_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. It's not possible to load a module in the window between detecting the ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA range limits are initialized much later in module_init_limits() which is a subsys_initcall, and module loading cannot happen before this. Hence it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to use cpus_have_const_cap(). This patch replaces the use of cpus_have_const_cap() with cpus_have_final_cap() which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Using cpus_have_final_cap() clearly documents that we do not expect this code to run before cpucaps are finalized, and will make it easier to spot issues if code is changed in future to allow modules to be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible, and redundant IS_ENABLED() checks are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:06 +01:00
Mark Rutland	c2ef5f1e15	arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0 In arm64_kernel_unmapped_at_el0() we use cpus_have_const_cap() to check for ARM64_UNMAP_KERNEL_AT_EL0, but this is only necessary so that arm64_get_bp_hardening_vector() and this_cpu_set_vectors() can run prior to alternatives being patched. Otherwise this is not necessary and alternative_has_cap_() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is a system-wide feature that is detected and patched before any translation tables are created for userspace. In the window between detecting the ARM64_UNMAP_KERNEL_AT_EL0 cpucap and patching alternatives, most users of arm64_kernel_unmapped_at_el0() do not need to know that the cpucap has been detected: * As KVM is initialized after cpucaps are finalized, no usaef of arm64_kernel_unmapped_at_el0() in the KVM code is reachable during this window. * The arm64_mm_context_get() function in arch/arm64/mm/context.c is only called after the SMMU driver is brought up after alternatives have been patched. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_(). Similarly the asids_update_limit() function is called after alternatives have been patched as an arch_initcall, and this can safely use cpus_have_final_cap() or alternative_has_cap_(). Similarly we do not expect an ASID rollover to occur between cpucaps being detected and patching alternatives. Thus set_reserved_asid_bits() can safely use cpus_have_final_cap() or alternative_has_cap_(). The __tlbi_user() and __tlbi_user_level() macros are not used during this window, and only need to invalidate additional entries once userspace translation tables have been active on a CPU. Thus these can safely use alternative_has_cap_(). The xen_kernel_unmapped_at_usr() function is not used during this window as it is only used in a late_initcall. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_(). The arm64_get_meltdown_state() function is not used during this window. It only used by arm64_get_meltdown_state() and KVM code, both of which are only used after cpucaps have been finalized. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_(). The tls_thread_switch() uses arm64_kernel_unmapped_at_el0() as an optimization to avoid zeroing tpidrro_el0 when KPTI is enabled and this will be trampled by the KPTI trampoline. It doesn't matter if this continues to zero the register during the window between detecting the cpucap and patching alternatives, so this can safely use alternative_has_cap_(). The sdei_arch_get_entry_point() and do_sdei_event() functions aren't reachable at this time as the SDEI driver is registered later by acpi_init() -> acpi_ghes_init() -> sdei_init(), where acpi_init is a subsys_initcall. Thus these can safely use cpus_have_final_cap() or alternative_has_cap_(). The uses under drivers/ aren't reachable at this time as the drivers are registered later: - TRBE is registered via module_init() - SMMUv3 is registred via module_driver() - SPE is registred via module_init() * The arm64_get_bp_hardening_vector() and this_cpu_set_vectors() functions need to run on boot CPUs prior to patching alternatives. As these are only called during the onlining of a CPU, it's fine to perform a system_cpucaps bitmap test using cpus_have_cap(). This patch modifies this_cpu_set_vectors() to use cpus_have_cap(), and replaced all other use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:06 +01:00
Mark Rutland	a76521d160	arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64} In system_supports_{sve,sme,sme2,fa64}() we use cpus_have_const_cap() to check for the relevant cpucaps, but this is only necessary so that sve_setup() and sme_setup() can run prior to alternatives being patched, and otherwise alternative_has_cap_() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. All of system_supports_{sve,sme,sme2,fa64}() will return false prior to system cpucaps being detected. In the window between system cpucaps being detected and patching alternatives, we need system_supports_sve() and system_supports_sme() to run to initialize SVE and SME properties, but all other users of system_supports_{sve,sme,sme2,fa64}() don't depend on the relevant cpucap becoming true until alternatives are patched: * No KVM code runs until after alternatives are patched, and so this can safely use cpus_have_final_cap() or alternative_has_cap_(). The cpuid_cpu_online() callback in arch/arm64/kernel/cpuinfo.c is registered later from cpuinfo_regs_init() as a device_initcall, and so this can safely use cpus_have_final_cap() or alternative_has_cap_(). The entry, signal, and ptrace code isn't reachable until userspace has run, and so this can safely use cpus_have_final_cap() or alternative_has_cap_(). Currently perf_reg_validate() will un-reserve the PERF_REG_ARM64_VG pseudo-register before alternatives are patched, and before sve_setup() has run. If a sampling event is created early enough, this would allow perf_ext_reg_value() to sample (the as-yet uninitialized) thread_struct::vl[] prior to alternatives being patched. It would be preferable to defer this until alternatives are patched, and this can safely use alternative_has_cap_(). The context-switch code will run during this window as part of stop_machine() used during alternatives_patch_all(), and potentially for other work if other kernel threads are created early. No threads require the use of SVE/SME/SME2/FA64 prior to alternatives being patched, and it would be preferable for the related context-switch logic to take effect after alternatives are patched so that ths is guaranteed to see a consistent system-wide state (e.g. anything initialized by sve_setup() and sme_setup(). This can safely ues alternative_has_cap_*(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The sve_setup() and sme_setup() functions are modified to use cpus_have_cap() directly so that they can observe the cpucaps being set prior to alternatives being patched. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:06 +01:00
Mark Rutland	bc75d0c0f3	arm64: Avoid cpus_have_const_cap() for ARM64_SSBS In ssbs_thread_switch() we use cpus_have_const_cap() to check for ARM64_SSBS, but this is not necessary and alternative_has_cap_() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in ssbs_thread_switch() is an optimization to avoid the overhead of spectre_v4_enable_task_mitigation() where all CPUs implement SSBS and naturally preserve the SSBS bit in SPSR_ELx. In the window between detecting the ARM64_SSBS system-wide and patching alternative branches it is benign to continue to call spectre_v4_enable_task_mitigation(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:05 +01:00
Mark Rutland	53d62e995d	arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN In system_uses_hw_pan() we use cpus_have_const_cap() to check for ARM64_HAS_PAN, but this is only necessary so that the system_uses_ttbr0_pan() check in setup_cpu_features() can run prior to alternatives being patched, and otherwise this is not necessary and alternative_has_cap_() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_HAS_PAN cpucap is used by system_uses_hw_pan() and system_uses_ttbr0_pan() depending on whether CONFIG_ARM64_SW_TTBR0_PAN is selected, and: * We only use system_uses_hw_pan() directly in __sdei_handler(), which isn't reachable until after alternatives have been patched, and for this it is safe to use alternative_has_cap_(). We use system_uses_ttbr0_pan() in a few places: - In check_and_switch_context() and cpu_uninstall_idmap(), which will defer installing a translation table into TTBR0 when the ARM64_HAS_PAN cpucap is not detected. Prior to patching alternatives, all CPUs will be using init_mm with the reserved ttbr0 translation tables install in TTBR0, so these can safely use alternative_has_cap_(). - In update_saved_ttbr0(), which will only save the active TTBR0 into a per-thread variable when the ARM64_HAS_PAN cpucap is not detected. Prior to patching alternatives, all CPUs will be using init_mm with the reserved ttbr0 translation tables install in TTBR0, so these can safely use alternative_has_cap_(). - In efi_set_pgd(), which will handle check_and_switch_context() deferring the installation of TTBR0 when TTBR0 PAN is detected. The EFI runtime services are not initialized until after alternatives have been patched, and so this can safely use alternative_has_cap_() or cpus_have_final_cap(). - In uaccess_ttbr0_disable() and uaccess_ttbr0_enable(), where we'll avoid installing/uninstalling a translation table in TTBR0 when ARM64_HAS_PAN is detected. Prior to patching alternatives we will not perform any uaccess and will not call uaccess_ttbr0_disable() or uaccess_ttbr0_enable(), and so these can safely use alternative_has_cap_() or cpus_have_final_cap(). - In is_el1_permission_fault() where we will consider a translation fault on a TTBR0 address to be a permission fault when ARM64_HAS_PAN is not detected and we have set the PAN bit in the SPSR (which tells us that in the interrupted context, TTBR0 pointed at the reserved zero ttbr). In the window between detecting system cpucaps and patching alternatives we should not perform any accesses to TTBR0 addresses, and no userspace translation tables exist until after patching alternatives. Thus it is safe for this to use alternative_has_cap*(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. So that the check for TTBR0 PAN in setup_cpu_features() can run prior to alternatives being patched, the call to system_uses_ttbr0_pan() is replaced with an explicit check of the ARM64_HAS_PAN bit in the system_cpucaps bitmap. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:04 +01:00
Mark Rutland	25693f1771	arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT In __cpu_suspend_exit() we use cpus_have_const_cap() to check for ARM64_HAS_DIT but this is not necessary and cpus_have_final_cap() of alternative_has_cap_() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_HAS_DIT cpucap is detected and patched (along with all other cpucaps) before __cpu_suspend_exit() can run. We'll only use __cpu_suspend_exit() as part of PSCI cpuidle or hibernation, and both of these are intialized after system cpucaps are detected and patched: the PSCI cpuidle driver is registered with a device_initcall, hibernation restoration occurs in a late_initcall, and hibarnation saving is driven by usrspace. Therefore it is not necessary to use cpus_have_const_cap(), and using alternative_has_cap_*() or cpus_have_final_cap() is sufficient. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. To clearly document the ordering relationship between suspend/resume and alternatives patching, an explicit check for system_capabilities_finalized() is added to cpu_suspend() along with a comment block, which will make it easier to spot issues if code is changed in future to allow these functions to be reached earlier. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:04 +01:00
Mark Rutland	54c8818aa2	arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CNP In system_supports_cnp() we use cpus_have_const_cap() to check for ARM64_HAS_CNP, but this is only necessary so that the cpu_enable_cnp() callback can run prior to alternatives being patched, and otherwise this is not necessary and alternative_has_cap_() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpu_enable_cnp() callback is run immediately after the ARM64_HAS_CNP cpucap is detected system-wide under setup_system_capabilities(), prior to alternatives being patched. During this window cpu_enable_cnp() uses cpu_replace_ttbr1() to set the CNP bit for the swapper_pg_dir in TTBR1. No other users of the ARM64_HAS_CNP cpucap need the up-to-date value during this window: * As KVM isn't initialized yet, kvm_get_vttbr() isn't reachable. * As cpuidle isn't initialized yet, __cpu_suspend_exit() isn't reachable. * At this point all CPUs are using the swapper_pg_dir with a reserved ASID in TTBR1, and the idmap_pg_dir in TTBR0, so neither check_and_switch_context() nor cpu_do_switch_mm() need to do anything special. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. To allow cpu_enable_cnp() to function prior to alternatives being patched, cpu_replace_ttbr1() is split into cpu_replace_ttbr1() and cpu_enable_swapper_cnp(), with the former only used for early TTBR1 replacement, and the latter used by both cpu_enable_cnp() and __cpu_suspend_exit(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:04 +01:00
Mark Rutland	bbbb65770b	arm64: Avoid cpus_have_const_cap() for ARM64_HAS_BTI In system_supports_bti() we use cpus_have_const_cap() to check for ARM64_HAS_BTI, but this is not necessary and alternative_has_cap_() or cpus_have_final_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. When CONFIG_ARM64_BTI_KERNEL=y, the ARM64_HAS_BTI cpucap is a strict boot cpu feature which is detected and patched early on the boot cpu. All uses guarded by CONFIG_ARM64_BTI_KERNEL happen after the boot CPU has detected ARM64_HAS_BTI and patched boot alternatives, and hence can safely use alternative_has_cap_() or cpus_have_final_boot_cap(). Regardless of CONFIG_ARM64_BTI_KERNEL, all other uses of ARM64_HAS_BTI happen after system capabilities have been finalized and alternatives have been patched. Hence these can safely use alternative_has_cap_*) or cpus_have_final_cap(). This patch splits system_supports_bti() into system_supports_bti() and system_supports_bti_kernel(), with the former handling where the cpucap affects userspace functionality, and ther latter handling where the cpucap affects kernel functionality. The use of cpus_have_const_cap() is replaced by cpus_have_final_cap() in cpus_have_const_cap, and cpus_have_final_boot_cap() in system_supports_bti_kernel(). This will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The use of cpus_have_final_cap() and cpus_have_final_boot_cap() will make it easier to spot if code is chaanged such that these run before the ARM64_HAS_BTI cpucap is guaranteed to have been finalized. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:04 +01:00
Mark Rutland	34f66c4c4d	arm64: Use a positive cpucap for FP/SIMD Currently we have a negative cpucap which describes the absence of FP/SIMD rather than presence of FP/SIMD. This largely works, but is somewhat awkward relative to other cpucaps that describe the presence of a feature, and it would be nicer to have a cpucap which describes the presence of FP/SIMD: * This will allow the cpucap to be treated as a standard ARM64_CPUCAP_SYSTEM_FEATURE, which can be detected with the standard has_cpuid_feature() function and ARM64_CPUID_FIELDS() description. * This ensures that the cpucap will only transition from not-present to present, reducing the risk of unintentional and/or unsafe usage of FP/SIMD before cpucaps are finalized. * This will allow using arm64_cpu_capabilities::cpu_enable() to enable the use of FP/SIMD later, with FP/SIMD being disabled at boot time otherwise. This will ensure that any unintentional and/or unsafe usage of FP/SIMD prior to this is trapped, and will ensure that FP/SIMD is never unintentionally enabled for userspace in mismatched big.LITTLE systems. This patch replaces the negative ARM64_HAS_NO_FPSIMD cpucap with a positive ARM64_HAS_FPSIMD cpucap, making changes as described above. Note that as FP/SIMD will now be trapped when not supported system-wide, do_fpsimd_acc() must handle these traps in the same way as for SVE and SME. The commentary in fpsimd_restore_current_state() is updated to describe the new scheme. No users of system_supports_fpsimd() need to know that FP/SIMD is available prior to alternatives being patched, so this is updated to use alternative_has_cap_likely() to check for the ARM64_HAS_FPSIMD cpucap, without generating code to test the system_cpucaps bitmap. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:03 +01:00
Mark Rutland	14567ba42c	arm64: Rename SVE/SME cpu_enable functions The arm64_cpu_capabilities::cpu_enable() callbacks for SVE, SME, SME2, and FA64 are named with an unusual "${feature}_kernel_enable" pattern rather than the much more common "cpu_enable_${feature}". Now that we only use these as cpu_enable() callbacks, it would be nice to have them match the usual scheme. This patch renames the cpu_enable() callbacks to match this scheme. At the same time, the comment above cpu_enable_sve() is removed for consistency with the other cpu_enable() callbacks. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:03 +01:00
Mark Rutland	9077229170	arm64: Use build-time assertions for cpucap ordering Both sme2_kernel_enable() and fa64_kernel_enable() need to run after sme_kernel_enable(). This happens to be true today as ARM64_SME has a lower index than either ARM64_SME2 or ARM64_SME_FA64, and both functions have a comment to this effect. It would be nicer to have a build-time assertion like we for for can_use_gic_priorities() and has_gic_prio_relaxed_sync(), as that way it will be harder to miss any potential breakage. This patch replaces the comments with build-time assertions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:17:03 +01:00
Mark Rutland	bc9bbb7880	arm64: Explicitly save/restore CPACR when probing SVE and SME When a CPUs onlined we first probe for supported features and propetites, and then we subsequently enable features that have been detected. This is a little problematic for SVE and SME, as some properties (e.g. vector lengths) cannot be probed while they are disabled. Due to this, the code probing for SVE properties has to enable SVE for EL1 prior to proving, and the code probing for SME properties has to enable SME for EL1 prior to probing. We never disable SVE or SME for EL1 after probing. It would be a little nicer to transiently enable SVE and SME during probing, leaving them both disabled unless explicitly enabled, as this would make it much easier to catch unintentional usage (e.g. when they are not present system-wide). This patch reworks the SVE and SME feature probing code to only transiently enable support at EL1, disabling after probing is complete. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 14:16:53 +01:00
Mark Rutland	42c5a3b04b	arm64: Split kpti_install_ng_mappings() The arm64_cpu_capabilities::cpu_enable callbacks are intended for cpu-local feature enablement (e.g. poking system registers). These get called for each online CPU when boot/system cpucaps get finalized and enabled, and get called whenever a CPU is subsequently onlined. For KPTI with the ARM64_UNMAP_KERNEL_AT_EL0 cpucap, we use the kpti_install_ng_mappings() function as the cpu_enable callback. This does a mixture of cpu-local configuration (setting VBAR_EL1 to the appropriate trampoline vectors) and some global configuration (rewriting the swapper page tables to sue non-glboal mappings) that must happen at most once. This patch splits kpti_install_ng_mappings() into a cpu-local cpu_enable_kpti() initialization function and a system-wide kpti_install_ng_mappings() function. The cpu_enable_kpti() function is responsible for selecting the necessary cpu-local vectors each time a CPU is onlined, and the kpti_install_ng_mappings() function performs the one-time rewrite of the translation tables too use non-global mappings. Splitting the two makes the code a bit easier to follow and also allows the page table rewriting code to be marked as __init such that it can be freed after use. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 12:57:54 +01:00
Mark Rutland	7f632d331d	arm64: Fixup user features at boot time For ARM64_WORKAROUND_2658417, we use a cpu_enable() callback to hide the ID_AA64ISAR1_EL1.BF16 ID register field. This is a little awkward as CPUs may attempt to apply the workaround concurrently, requiring that we protect the bulk of the callback with a raw_spinlock, and requiring some pointless work every time a CPU is subsequently hotplugged in. This patch makes this a little simpler by handling the masking once at boot time. A new user_feature_fixup() function is called at the start of setup_user_features() to mask the feature, matching the style of elf_hwcap_fixup(). The ARM64_WORKAROUND_2658417 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Note that the ARM64_WORKAROUND_2658417 capability is matched with ERRATA_MIDR_RANGE(), which implicitly gives the capability a ARM64_CPUCAP_LOCAL_CPU_ERRATUM type, which forbids the late onlining of a CPU with the erratum if the erratum was not present at boot time. Therefore this patch doesn't change the behaviour for late onlining. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 12:57:52 +01:00
Mark Rutland	075f48c924	arm64: Rework setup_cpu_features() Currently setup_cpu_features() handles a mixture of one-time kernel feature setup (e.g. cpucaps) and one-time user feature setup (e.g. ELF hwcaps). Subsequent patches will rework other one-time setup and expand the logic currently in setup_cpu_features(), and in preparation for this it would be helpful to split the kernel and user setup into separate functions. This patch splits setup_user_features() out of setup_cpu_features(), with a few additional cleanups of note: * setup_cpu_features() is renamed to setup_system_features() to make it clear that it handles system-wide feature setup rather than cpu-local feature setup. * setup_system_capabilities() is folded into setup_system_features(). * Presence of TTBR0 pan is logged immediately after update_cpu_capabilities(), so that this is guaranteed to appear alongside all the other detected system cpucaps. * The 'cwg' variable is removed as its value is only consumed once and it's simpler to use cache_type_cwg() directly without assigning its return value to a variable. * The call to setup_user_features() is moved after alternatives are patched, which will allow user feature setup code to depend on alternative branches and allow for simplifications in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-16 12:57:51 +01:00
Joey Gouly	94d0657f9f	arm64: add FEAT_LSE128 HWCAP Add HWCAP for FEAT_LSE128 (128-bit Atomic instructions). Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20231003124544.858804-2-joey.gouly@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-13 19:12:34 +01:00
Joey Gouly	338a835f40	arm64: add FEAT_LRCPC3 HWCAP FEAT_LRCPC3 adds more instructions to support the Release Consistency model. Add a HWCAP so that userspace can make decisions about instructions it can use. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230919162757.2707023-2-joey.gouly@arm.com [catalin.marinas@arm.com: change the HWCAP number] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-13 19:11:35 +01:00
Joel Granados	de8a660b03	arm: Remove now superfluous sentinel elem from ctl_table arrays This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Removed the sentinel as well as the explicit size from ctl_isa_vars. The size is redundant as the initialization sets it. Changed insn_emulation->sysctl from a 2 element array of struct ctl_table to a simple struct. This has no consequence for the sysctl registration as it is forwarded as a pointer. Removed sentinel from sve_defatul_vl_table, sme_default_vl_table, tagged_addr_sysctl_table and armv8_pmu_sysctl_table. This removal is safe because register_sysctl_sz and register_sysctl use the array size in addition to checking for the sentinel. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>	2023-10-10 15:22:02 -07:00
Kristina Martsenko	2de451a329	KVM: arm64: Add handler for MOPS exceptions An Armv8.8 FEAT_MOPS main or epilogue instruction will take an exception if executed on a CPU with a different MOPS implementation option (A or B) than the CPU where the preceding prologue instruction ran. In this case the OS exception handler is expected to reset the registers and restart execution from the prologue instruction. A KVM guest may use the instructions at EL1 at times when the guest is not able to handle the exception, expecting that the instructions will only run on one CPU (e.g. when running UEFI boot services in the guest). As KVM may reschedule the guest between different types of CPUs at any time (on an asymmetric system), it needs to also handle the resulting exception itself in case the guest is not able to. A similar situation will also occur in the future when live migrating a guest from one type of CPU to another. Add handling for the MOPS exception to KVM. The handling can be shared with the EL0 exception handler, as the logic and register layouts are the same. The exception can be handled right after exiting a guest, which avoids the cost of returning to the host exit handler. Similarly to the EL0 exception handler, in case the main or epilogue instruction is being single stepped, it makes sense to finish the step before executing the prologue instruction, so advance the single step state machine. Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230922112508.1774352-2-kristina.martsenko@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2023-10-09 19:54:25 +00:00
Douglas Anderson	ef31b8ce31	arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup In commit `2b2d0a7a96` ("arm64: smp: Remove dedicated wakeup IPI") we started using a scheduler IPI to avoid a dedicated reschedule. When we did this, we used arch_smp_send_reschedule() directly rather than calling smp_send_reschedule(). The only difference is that calling arch_smp_send_reschedule() directly avoids tracing. Presumably we _don't_ want to avoid tracing here, so switch to smp_send_reschedule(). Fixes: `2b2d0a7a96` ("arm64: smp: Remove dedicated wakeup IPI") Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-06 12:35:01 +01:00
Mark Rutland	a07a594152	arm64: smp: avoid NMI IPIs with broken MediaTek FW Some MediaTek devices have broken firmware which corrupts some GICR registers behind the back of the OS, and pseudo-NMIs cannot be used on these devices. For more details see commit: `44bd78dd2b` ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") We did not take this problem into account in commit: `331a1b3a83` ("arm64: smp: Add arch support for backtrace using pseudo-NMI") Since that commit arm64's SMP code will try to setup some IPIs as pseudo-NMIs, even on systems with broken FW. The GICv3 code will (rightly) reject attempts to request interrupts as pseudo-NMIs, resulting in boot-time failures. Avoid the problem by taking the broken FW into account when deciding to request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key named "supports_pseudo_nmis" which is false on systems with broken FW, and we can consult this within ipi_should_be_nmi(). Fixes: `331a1b3a83` ("arm64: smp: Add arch support for backtrace using pseudo-NMI") Reported-by: Chen-Yu Tsai <wenst@chromium.org> Closes: https://issuetracker.google.com/issues/197061987#comment68 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-10-06 12:34:41 +01:00
Frederic Weisbecker	448e9f34d9	rcu: Standardize explicit CPU-hotplug calls rcu_report_dead() and rcutree_migrate_callbacks() have their headers in rcupdate.h while those are pure rcutree calls, like the other CPU-hotplug functions. Also rcu_cpu_starting() and rcu_report_dead() have different naming conventions while they mirror each other's effects. Fix the headers and propose a naming that relates both functions and aligns with the prefix of other rcutree CPU-hotplug functions. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>	2023-10-04 22:29:45 +02:00
Frederic Weisbecker	c964c1f5ee	rcu: Assume rcu_report_dead() is always called locally rcu_report_dead() has to be called locally by the CPU that is going to exit the RCU state machine. Passing a cpu argument here is error-prone and leaves the possibility for a racy remote call. Use local access instead. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>	2023-10-04 17:35:56 +02:00
Rob Herring	471470bc70	arm64: errata: Add Cortex-A520 speculative unprivileged load workaround Implement the workaround for ARM Cortex-A520 erratum 2966298. On an affected Cortex-A520 core, a speculatively executed unprivileged load might leak data from a privileged load via a cache side channel. The issue only exists for loads within a translation regime with the same translation (e.g. same ASID and VMID). Therefore, the issue only affects the return to EL0. The workaround is to execute a TLBI before returning to EL0 after all loads of privileged data. A non-shareable TLBI to any address is sufficient. The workaround isn't necessary if page table isolation (KPTI) is enabled, but for simplicity it will be. Page table isolation should normally be disabled for Cortex-A520 as it supports the CSV3 feature and the E0PD feature (used when KASLR is enabled). Cc: stable@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230921194156.1050055-2-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>	2023-09-29 16:31:33 +01:00
Mark Brown	5d5b4e8c2d	arm64/sve: Report FEAT_SVE_B16B16 to userspace SVE 2.1 introduced a new feature FEAT_SVE_B16B16 which adds instructions supporting the BFloat16 floating point format. Report this to userspace through the ID registers and hwcap. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230915-arm64-zfr-b16b16-el0-v1-1-f9aba807bdb5@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-09-29 15:56:17 +01:00
Douglas Anderson	62817d5ba2	arm64: smp: Mark IPI globals as __ro_after_init Mark the three IPI-related globals in smp.c as "__ro_after_init" since they are only ever set in set_smp_ipi_range(), which is marked "__init". This is a better and more secure marking than the old "__read_mostly". Suggested-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20230906090246.v13.7.I625d393afd71e1766ef73d3bfaac0b347a4afd19@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-09-25 17:15:29 +01:00
Douglas Anderson	2f5cd0c7ff	arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup Up until now we've been using the generic (weak) implementation for kgdb_roundup_cpus() when using kgdb on arm64. Let's move to a custom one. The advantage here is that, when pseudo-NMI is enabled on a device, we'll be able to round up CPUs using pseudo-NMI. This allows us to debug CPUs that are stuck with interrupts disabled. If pseudo-NMIs are not enabled then we'll fallback to just using an IPI, which is still slightly better than the generic implementation since it avoids the potential situation described in the generic kgdb_call_nmi_hook(). Co-developed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20230906090246.v13.6.I2ef26d1b3bfbed2d10a281942b0da7d9854de05e@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-09-25 17:15:29 +01:00
Douglas Anderson	d7402513c9	arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI There's no reason why IPI_CPU_STOP and IPI_CPU_CRASH_STOP can't be handled as NMI. They are very simple and everything in them is NMI-safe. Mark them as things to use NMI for if NMI is available. Suggested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Misono Tomohiro <misono.tomohiro@fujitsu.com> Reviewed-by: Sumit Garg <sumit.garg@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20230906090246.v13.5.Ifadbfd45b22c52edcb499034dd4783d096343260@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-09-25 17:15:29 +01:00
Douglas Anderson	331a1b3a83	arm64: smp: Add arch support for backtrace using pseudo-NMI Enable arch_trigger_cpumask_backtrace() support on arm64. This enables things much like they are enabled on arm32 (including some of the funky logic around NR_IPI, nr_ipi, and MAX_IPI) but with the difference that, unlike arm32, we'll try to enable the backtrace to use pseudo-NMI. NOTE: this patch is a squash of the little bit of code adding the ability to mark an IPI to try to use pseudo-NMI plus the little bit of code to hook things up for kgdb. This approach was decided upon in the discussion of v9 [1]. This patch depends on commit `8d539b84f1` ("nmi_backtrace: allow excluding an arbitrary CPU") since that commit changed the prototype of arch_trigger_cpumask_backtrace(), which this patch implements. [1] https://lore.kernel.org/r/ZORY51mF4alI41G1@FVFF77S0Q05N Co-developed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Misono Tomohiro <misono.tomohiro@fujitsu.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20230906090246.v13.4.Ie6c132b96ebbbcddbf6954b9469ed40a6960343c@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-09-25 17:15:28 +01:00
Mark Rutland	2b2d0a7a96	arm64: smp: Remove dedicated wakeup IPI To enable NMI backtrace and KGDB's NMI cpu roundup, we need to free up at least one dedicated IPI. On arm64 the IPI_WAKEUP IPI is only used for the ACPI parking protocol, which itself is only used on some very early ARMv8 systems which couldn't implement PSCI. Remove the IPI_WAKEUP IPI, and rely on the IPI_RESCHEDULE IPI to wake CPUs from the parked state. This will cause a tiny amonut of redundant work to check the thread flags, but this is miniscule in relation to the cost of taking and handling the IPI in the first place. We can safely handle redundant IPI_RESCHEDULE IPIs, so there should be no functional impact as a result of this change. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Sumit Garg <sumit.garg@linaro.org> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230906090246.v13.3.I7209db47ef8ec151d3de61f59005bbc59fe8f113@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-09-25 17:15:28 +01:00
Douglas Anderson	d0c14a7d36	arm64: idle: Tag the arm64 idle functions as __cpuidle As per the (somewhat recent) comment before the definition of `__cpuidle`, the tag is like `noinstr` but also marks a function so it can be identified by cpu_in_idle(). Let's add these markings to arm64 cpuidle functions With this change we get useful backtraces like: NMI backtrace for cpu N skipped: idling at cpu_do_idle+0x94/0x98 instead of useless backtraces when dumping all processors using nmi_cpu_backtrace(). NOTE: this patch won't make cpu_in_idle() work perfectly for arm64, but it doesn't hurt and does catch some cases. Specifically an example that wasn't caught in my testing looked like this: gic_cpu_sys_reg_init+0x1f8/0x314 gic_cpu_pm_notifier+0x40/0x78 raw_notifier_call_chain+0x5c/0x134 cpu_pm_notify+0x38/0x64 cpu_pm_exit+0x20/0x2c psci_enter_idle_state+0x48/0x70 cpuidle_enter_state+0xb8/0x260 cpuidle_enter+0x44/0x5c do_idle+0x188/0x30c Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Sumit Garg <sumit.garg@linaro.org> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20230906090246.v13.2.I4baba13e220bdd24d11400c67f137c35f07f82c7@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-09-25 17:15:28 +01:00
Mark Brown	391208485c	arm64/sve: Remove SMCR pseudo register from cpufeature code For reasons that are not currently apparent during cpufeature enumeration we maintain a pseudo register for SMCR which records the maximum supported vector length using the value that would be written to SMCR_EL1.LEN to configure it. This is not exposed to userspace and is not sufficient for detecting unsupportable configurations, we need the more detailed checks in vec_update_vq_map() for that since we can't cope with missing vector lengths on late CPUs and KVM requires an exactly matching set of supported vector lengths as EL1 can enumerate VLs directly with the hardware. Remove the code, replacing the usage in sme_setup() with a query of the vq_map. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230913-arm64-vec-len-cpufeature-v1-2-cc69b0600a8a@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-09-25 16:14:49 +01:00
Mark Brown	abef0695f9	arm64/sve: Remove ZCR pseudo register from cpufeature code For reasons that are not currently apparent during cpufeature enumeration we maintain a pseudo register for ZCR which records the maximum supported vector length using the value that would be written to ZCR_EL1.LEN to configure it. This is not exposed to userspace and is not sufficient for detecting unsupportable configurations, we need the more detailed checks in vec_update_vq_map() for that since we can't cope with missing vector lengths on late CPUs and KVM requires an exactly matching set of supported vector lengths as EL1 can enumerate VLs directly with the hardware. Remove the code, replacing the usage in sve_setup() with a query of the vq_map. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230913-arm64-vec-len-cpufeature-v1-1-cc69b0600a8a@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2023-09-25 16:14:49 +01:00

... 3 4 5 6 7 ...

4485 commits