Currently, there is no standard implementation for vdso_install,
leading to various issues:
1. Code duplication
Many architectures duplicate similar code just for copying files
to the install destination.
Some architectures (arm, sparc, x86) create build-id symlinks,
introducing more code duplication.
2. Unintended updates of in-tree build artifacts
The vdso_install rule depends on the vdso files to install.
It may update in-tree build artifacts. This can be problematic,
as explained in commit 19514fc665 ("arm, kbuild: make
"make install" not depend on vmlinux").
3. Broken code in some architectures
Makefile code is often copied from one architecture to another
without proper adaptation.
'make vdso_install' for parisc does not work.
'make vdso_install' for s390 installs vdso64, but not vdso32.
To address these problems, this commit introduces a generic vdso_install
rule.
Architectures that support vdso_install need to define vdso-install-y
in arch/*/Makefile. vdso-install-y lists the files to install.
For example, arch/x86/Makefile looks like this:
vdso-install-$(CONFIG_X86_64) += arch/x86/entry/vdso/vdso64.so.dbg
vdso-install-$(CONFIG_X86_X32_ABI) += arch/x86/entry/vdso/vdsox32.so.dbg
vdso-install-$(CONFIG_X86_32) += arch/x86/entry/vdso/vdso32.so.dbg
vdso-install-$(CONFIG_IA32_EMULATION) += arch/x86/entry/vdso/vdso32.so.dbg
These files will be installed to $(MODLIB)/vdso/ with the .dbg suffix,
if exists, stripped away.
vdso-install-y can optionally take the second field after the colon
separator. This is needed because some architectures install a vdso
file as a different base name.
The following is a snippet from arch/arm64/Makefile.
vdso-install-$(CONFIG_COMPAT_VDSO) += arch/arm64/kernel/vdso32/vdso.so.dbg:vdso32.so
This will rename vdso.so.dbg to vdso32.so during installation. If such
architectures change their implementation so that the base names match,
this workaround will go away.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Sven Schnelle <svens@linux.ibm.com> # s390
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Guo Ren <guoren@kernel.org>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
* arm64/for-next/perf:
perf: hisi: Fix use-after-free when register pmu fails
drivers/perf: hisi_pcie: Initialize event->cpu only on success
drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
perf/arm-cmn: Enable per-DTC counter allocation
perf/arm-cmn: Rework DTC counters (again)
perf/arm-cmn: Fix DTC domain detection
drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
drivers/perf: xgene: Use device_get_match_data()
perf/amlogic: add missing MODULE_DEVICE_TABLE
docs/perf: Add ampere_cspmu to toctree to fix a build warning
perf: arm_cspmu: ampere_cspmu: Add support for Ampere SoC PMU
perf: arm_cspmu: Support implementation specific validation
perf: arm_cspmu: Support implementation specific filters
perf: arm_cspmu: Split 64-bit write to 32-bit writes
perf: arm_cspmu: Separate Arm and vendor module
* for-next/sve-remove-pseudo-regs:
: arm64/fpsimd: Remove the vector length pseudo registers
arm64/sve: Remove SMCR pseudo register from cpufeature code
arm64/sve: Remove ZCR pseudo register from cpufeature code
* for-next/backtrace-ipi:
: Add IPI for backtraces/kgdb, use NMI
arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup
arm64: smp: avoid NMI IPIs with broken MediaTek FW
arm64: smp: Mark IPI globals as __ro_after_init
arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup
arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI
arm64: smp: Add arch support for backtrace using pseudo-NMI
arm64: smp: Remove dedicated wakeup IPI
arm64: idle: Tag the arm64 idle functions as __cpuidle
irqchip/gic-v3: Enable support for SGIs to act as NMIs
* for-next/kselftest:
: Various arm64 kselftest updates
kselftest/arm64: Validate SVCR in streaming SVE stress test
* for-next/misc:
: Miscellaneous patches
arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
arm64: Remove system_uses_lse_atomics()
arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
arm64/mm: Hoist synchronization out of set_ptes() loop
arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed
* for-next/cpufeat-display-cores:
: arm64 cpufeature display enabled cores
arm64: cpufeature: Change DBM to display enabled cores
arm64: cpufeature: Display the set of cores with a feature
The counting of module PLTs has been broken when CONFIG_RANDOMIZE_BASE=n
since commit:
3e35d303ab ("arm64: module: rework module VA range selection")
Prior to that commit, when CONFIG_RANDOMIZE_BASE=n, the kernel image and
all modules were placed within a 128M region, and no PLTs were necessary
for B or BL. Hence count_plts() and partition_branch_plt_relas() skipped
handling B and BL when CONFIG_RANDOMIZE_BASE=n.
After that commit, modules can be placed anywhere within a 2G window
regardless of CONFIG_RANDOMIZE_BASE, and hence PLTs may be necessary for
B and BL even when CONFIG_RANDOMIZE_BASE=n. Unfortunately that commit
failed to update count_plts() and partition_branch_plt_relas()
accordingly.
Due to this, module_emit_plt_entry() may fail if an insufficient number
of PLT entries have been reserved, resulting in modules failing to load
with -ENOEXEC.
Fix this by counting PLTs regardless of CONFIG_RANDOMIZE_BASE in
count_plts() and partition_branch_plt_relas().
Fixes: 3e35d303ab ("arm64: module: rework module VA range selection")
Signed-off-by: Maria Yu <quic_aiquny@quicinc.com>
Cc: <stable@vger.kernel.org> # 6.5.x
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Fixes: 3e35d303ab ("arm64: module: rework module VA range selection")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231024010954.6768-1-quic_aiquny@quicinc.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ACPI, irqchip and the architecture code all inspect the MADT
enabled bit for a GICC entry in the MADT.
The addition of an 'online capable' bit means all these sites need
updating.
Move the current checks behind a helper to make future updates easier.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/E1quv5D-00AeNJ-U8@rmk-PC.armlinux.org.uk
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that we have the ability to display the list of cores
with a feature when its selectivly enabled, lets convert
DBM to use that as well.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Link: https://lore.kernel.org/r/20231017052322.1211099-3-jeremy.linton@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The AMU feature can be enabled on a subset of the cores in a system.
Because of that, it prints a message for each core as it is detected.
This becomes tedious when there are hundreds of cores. Instead, for
CPU features which can be enabled on a subset of the present cores,
lets wait until update_cpu_capabilities() and print the subset of cores
the feature was enabled on.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com>
Tested-by: Punit Agrawal <punit.agrawal@bytedance.com>
Link: https://lore.kernel.org/r/20231017052322.1211099-2-jeremy.linton@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
get_user_pages_remote() will never return 0 except in the case of
FOLL_NOWAIT being specified, which we explicitly disallow.
This simplifies error handling for the caller and avoids the awkwardness
of dealing with both errors and failing to pin. Failing to pin here is an
error.
Link: https://lkml.kernel.org/r/00319ce292d27b3aae76a0eb220ce3f528187508.1696288092.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After the vga console no longer relies on global screen_info, there are
only two remaining use cases:
- on the x86 architecture, it is used for multiple boot methods
(bzImage, EFI, Xen, kexec) to commucate the initial VGA or framebuffer
settings to a number of device drivers.
- on other architectures, it is only used as part of the EFI stub,
and only for the three sysfb framebuffers (simpledrm, simplefb, efifb).
Remove the duplicate data structure definitions by moving it into the
efi-init.c file that sets it up initially for the EFI case, leaving x86
as an exception that retains its own definition for non-EFI boots.
The added #ifdefs here are optional, I added them to further limit the
reach of screen_info to configurations that have at least one of the
users enabled.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20231017093947.3627976-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
set_ptes() sets a physically contiguous block of memory (which all
belongs to the same folio) to a contiguous block of ptes. The arm64
implementation of this previously just looped, operating on each
individual pte. But the __sync_icache_dcache() and mte_sync_tags()
operations can both be hoisted out of the loop so that they are
performed once for the contiguous set of pages (which may be less than
the whole folio). This should result in minor performance gains.
__sync_icache_dcache() already acts on the whole folio, and sets a flag
in the folio so that it skips duplicate calls. But by hoisting the call,
all the pte testing is done only once.
mte_sync_tags() operates on each individual page with its own loop. But
by passing the number of pages explicitly, we can rely solely on its
loop and do the checks only once. This approach also makes it robust for
the future, rather than assuming if a head page of a compound page is
being mapped, then the whole compound page is being mapped, instead we
explicitly know how many pages are being mapped. The old assumption may
not continue to hold once the "anonymous large folios" feature is
merged.
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20231005140730.2191134-1-ryan.roberts@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There are no longer any users of cpus_have_const_cap(), and therefore it
can be removed.
Remove cpus_have_const_cap(). At the same time, remove
__cpus_have_const_cap(), as this is a trivial wrapper of
alternative_has_cap_unlikely(), which can be used directly instead.
The comment for __system_matches_cap() is updated to no longer refer to
cpus_have_const_cap(). As we have a number of ways to check the cpucaps,
the specific suggestions are removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In has_useable_cnp() we use cpus_have_const_cap() to check for
ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but this is not necessary and
cpus_have_cap() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
We use has_useable_cnp() to determine whether we have the system-wide
ARM64_HAS_CNP cpucap. Due to the structure of the cpufeature code, we
call has_useable_cnp() in two distinct cases:
1) When finalizing system capabilities, setup_system_capabilities() will
call has_useable_cnp() with SCOPE_SYSTEM to determine whether all
CPUs have the feature. This is called after we've detected any local
cpucaps including ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but prior to
patching alternatives.
If the ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not
detect ARM64_HAS_CNP.
2) After finalizing system capabilties, verify_local_cpu_capabilities()
will call has_useable_cnp() with SCOPE_LOCAL_CPU to verify that CPUs
have CNP if we previously detected it.
Note that if ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will
not have detected ARM64_HAS_CNP.
For case 1 we must check the system_cpucaps bitmap as this occurs prior
to patching the alternatives. For case 2 we'll only call
has_useable_cnp() once per subsequent onlining of a CPU, and as this
isn't a fast path it's not necessary to optimize for this case.
This patch replaces the use of cpus_have_const_cap() with
cpus_have_cap(), which will only generate the bitmap test and avoid
generating an alternative sequence, resulting in slightly simpler annd
smaller code being generated. The ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
cpucap is added to cpucap_is_possible() so that code can be elided
entirely when this is not possible.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In elf_hwcap_fixup() we use cpus_have_const_cap() to check for
ARM64_WORKAROUND_1742098, but this is not necessary and cpus_have_cap()
would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The ARM64_WORKAROUND_1742098 cpucap is detected and patched before
elf_hwcap_fixup() can run, and hence it is not necessary to use
cpus_have_const_cap(). We run cpus_have_const_cap() at most twice: once
after finalizing system cpucaps, and potentially once more after
detecting mismatched CPUs which support AArch32 at EL0. Due to this,
it's not necessary to optimize for many calls to elf_hwcap_fixup(), and
it's fine to use cpus_have_cap().
This patch replaces the use of cpus_have_const_cap() with
cpus_have_cap(), which will only generate the bitmap test and avoid
generating an alternative sequence, resulting in slightly simpler annd
smaller code being generated. For consistenct with other cpucaps, the
ARM64_WORKAROUND_1742098 cpucap is added to cpucap_is_possible() so that
code can be elided when this is not possible. However, as we only define
compat_elf_hwcap2 when CONFIG_COMPAT=y, some ifdeffery is still required
within user_feature_fixup() to avoid build errors when CONFIG_COMPAT=n.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We use cpus_have_const_cap() to check for ARM64_WORKAROUND_1542419 but
this is not necessary and cpus_have_final_cap() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The ARM64_WORKAROUND_1542419 cpucap is detected and patched before any
userspace code can run, and the both __do_compat_cache_op() and
ctr_read_handler() are only reachable from exceptions taken from
userspace. Thus it is not necessary for either to use
cpus_have_const_cap(), and cpus_have_final_cap() is equivalent.
This patch replaces the use of cpus_have_const_cap() with
cpus_have_final_cap(), which will avoid generating code to test the
system_cpucaps bitmap and should be better for all subsequent calls at
runtime. Using cpus_have_final_cap() clearly documents that we do not
expect this code to run before cpucaps are finalized, and will make it
easier to spot issues if code is changed in future to allow these
functions to be reached earlier.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In count_plts() and is_forbidden_offset_for_adrp() we use
cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is
not necessary and cpus_have_final_cap() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
It's not possible to load a module in the window between detecting the
ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA
range limits are initialized much later in module_init_limits() which is
a subsys_initcall, and module loading cannot happen before this. Hence
it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to
use cpus_have_const_cap().
This patch replaces the use of cpus_have_const_cap() with
cpus_have_final_cap() which will avoid generating code to test the
system_cpucaps bitmap and should be better for all subsequent calls at
runtime. Using cpus_have_final_cap() clearly documents that we do not
expect this code to run before cpucaps are finalized, and will make it
easier to spot issues if code is changed in future to allow modules to
be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is not
possible, and redundant IS_ENABLED() checks are removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In arm64_kernel_unmapped_at_el0() we use cpus_have_const_cap() to check
for ARM64_UNMAP_KERNEL_AT_EL0, but this is only necessary so that
arm64_get_bp_hardening_vector() and this_cpu_set_vectors() can run prior
to alternatives being patched. Otherwise this is not necessary and
alternative_has_cap_*() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is a system-wide feature that is
detected and patched before any translation tables are created for
userspace. In the window between detecting the ARM64_UNMAP_KERNEL_AT_EL0
cpucap and patching alternatives, most users of
arm64_kernel_unmapped_at_el0() do not need to know that the cpucap has
been detected:
* As KVM is initialized after cpucaps are finalized, no usaef of
arm64_kernel_unmapped_at_el0() in the KVM code is reachable during
this window.
* The arm64_mm_context_get() function in arch/arm64/mm/context.c is only
called after the SMMU driver is brought up after alternatives have
been patched. Thus this can safely use cpus_have_final_cap() or
alternative_has_cap_*().
Similarly the asids_update_limit() function is called after
alternatives have been patched as an arch_initcall, and this can
safely use cpus_have_final_cap() or alternative_has_cap_*().
Similarly we do not expect an ASID rollover to occur between cpucaps
being detected and patching alternatives. Thus
set_reserved_asid_bits() can safely use cpus_have_final_cap() or
alternative_has_cap_*().
* The __tlbi_user() and __tlbi_user_level() macros are not used during
this window, and only need to invalidate additional entries once
userspace translation tables have been active on a CPU. Thus these can
safely use alternative_has_cap_*().
* The xen_kernel_unmapped_at_usr() function is not used during this
window as it is only used in a late_initcall. Thus this can safely use
cpus_have_final_cap() or alternative_has_cap_*().
* The arm64_get_meltdown_state() function is not used during this
window. It only used by arm64_get_meltdown_state() and KVM code, both
of which are only used after cpucaps have been finalized. Thus this
can safely use cpus_have_final_cap() or alternative_has_cap_*().
* The tls_thread_switch() uses arm64_kernel_unmapped_at_el0() as an
optimization to avoid zeroing tpidrro_el0 when KPTI is enabled
and this will be trampled by the KPTI trampoline. It doesn't matter if
this continues to zero the register during the window between
detecting the cpucap and patching alternatives, so this can safely use
alternative_has_cap_*().
* The sdei_arch_get_entry_point() and do_sdei_event() functions aren't
reachable at this time as the SDEI driver is registered later by
acpi_init() -> acpi_ghes_init() -> sdei_init(), where acpi_init is a
subsys_initcall. Thus these can safely use cpus_have_final_cap() or
alternative_has_cap_*().
* The uses under drivers/ aren't reachable at this time as the drivers
are registered later:
- TRBE is registered via module_init()
- SMMUv3 is registred via module_driver()
- SPE is registred via module_init()
* The arm64_get_bp_hardening_vector() and this_cpu_set_vectors()
functions need to run on boot CPUs prior to patching alternatives.
As these are only called during the onlining of a CPU, it's fine to
perform a system_cpucaps bitmap test using cpus_have_cap().
This patch modifies this_cpu_set_vectors() to use cpus_have_cap(), and
replaced all other use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is
not possible.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In system_supports_{sve,sme,sme2,fa64}() we use cpus_have_const_cap() to
check for the relevant cpucaps, but this is only necessary so that
sve_setup() and sme_setup() can run prior to alternatives being patched,
and otherwise alternative_has_cap_*() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
All of system_supports_{sve,sme,sme2,fa64}() will return false prior to
system cpucaps being detected. In the window between system cpucaps being
detected and patching alternatives, we need system_supports_sve() and
system_supports_sme() to run to initialize SVE and SME properties, but
all other users of system_supports_{sve,sme,sme2,fa64}() don't depend on
the relevant cpucap becoming true until alternatives are patched:
* No KVM code runs until after alternatives are patched, and so this can
safely use cpus_have_final_cap() or alternative_has_cap_*().
* The cpuid_cpu_online() callback in arch/arm64/kernel/cpuinfo.c is
registered later from cpuinfo_regs_init() as a device_initcall, and so
this can safely use cpus_have_final_cap() or alternative_has_cap_*().
* The entry, signal, and ptrace code isn't reachable until userspace has
run, and so this can safely use cpus_have_final_cap() or
alternative_has_cap_*().
* Currently perf_reg_validate() will un-reserve the PERF_REG_ARM64_VG
pseudo-register before alternatives are patched, and before
sve_setup() has run. If a sampling event is created early enough, this
would allow perf_ext_reg_value() to sample (the as-yet uninitialized)
thread_struct::vl[] prior to alternatives being patched.
It would be preferable to defer this until alternatives are patched,
and this can safely use alternative_has_cap_*().
* The context-switch code will run during this window as part of
stop_machine() used during alternatives_patch_all(), and potentially
for other work if other kernel threads are created early. No threads
require the use of SVE/SME/SME2/FA64 prior to alternatives being
patched, and it would be preferable for the related context-switch
logic to take effect after alternatives are patched so that ths is
guaranteed to see a consistent system-wide state (e.g. anything
initialized by sve_setup() and sme_setup().
This can safely ues alternative_has_cap_*().
This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. The sve_setup() and sme_setup() functions are modified to
use cpus_have_cap() directly so that they can observe the cpucaps being
set prior to alternatives being patched.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In ssbs_thread_switch() we use cpus_have_const_cap() to check for
ARM64_SSBS, but this is not necessary and alternative_has_cap_*() would
be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The cpus_have_const_cap() check in ssbs_thread_switch() is an
optimization to avoid the overhead of
spectre_v4_enable_task_mitigation() where all CPUs implement SSBS and
naturally preserve the SSBS bit in SPSR_ELx. In the window between
detecting the ARM64_SSBS system-wide and patching alternative branches
it is benign to continue to call spectre_v4_enable_task_mitigation().
This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In system_uses_hw_pan() we use cpus_have_const_cap() to check for
ARM64_HAS_PAN, but this is only necessary so that the
system_uses_ttbr0_pan() check in setup_cpu_features() can run prior to
alternatives being patched, and otherwise this is not necessary and
alternative_has_cap_*() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The ARM64_HAS_PAN cpucap is used by system_uses_hw_pan() and
system_uses_ttbr0_pan() depending on whether CONFIG_ARM64_SW_TTBR0_PAN
is selected, and:
* We only use system_uses_hw_pan() directly in __sdei_handler(), which
isn't reachable until after alternatives have been patched, and for
this it is safe to use alternative_has_cap_*().
* We use system_uses_ttbr0_pan() in a few places:
- In check_and_switch_context() and cpu_uninstall_idmap(), which will
defer installing a translation table into TTBR0 when the
ARM64_HAS_PAN cpucap is not detected.
Prior to patching alternatives, all CPUs will be using init_mm with
the reserved ttbr0 translation tables install in TTBR0, so these can
safely use alternative_has_cap_*().
- In update_saved_ttbr0(), which will only save the active TTBR0 into
a per-thread variable when the ARM64_HAS_PAN cpucap is not detected.
Prior to patching alternatives, all CPUs will be using init_mm with
the reserved ttbr0 translation tables install in TTBR0, so these can
safely use alternative_has_cap_*().
- In efi_set_pgd(), which will handle check_and_switch_context()
deferring the installation of TTBR0 when TTBR0 PAN is detected.
The EFI runtime services are not initialized until after
alternatives have been patched, and so this can safely use
alternative_has_cap_*() or cpus_have_final_cap().
- In uaccess_ttbr0_disable() and uaccess_ttbr0_enable(), where we'll
avoid installing/uninstalling a translation table in TTBR0 when
ARM64_HAS_PAN is detected.
Prior to patching alternatives we will not perform any uaccess and
will not call uaccess_ttbr0_disable() or uaccess_ttbr0_enable(), and
so these can safely use alternative_has_cap_*() or
cpus_have_final_cap().
- In is_el1_permission_fault() where we will consider a translation
fault on a TTBR0 address to be a permission fault when ARM64_HAS_PAN
is not detected *and* we have set the PAN bit in the SPSR (which
tells us that in the interrupted context, TTBR0 pointed at the
reserved zero ttbr).
In the window between detecting system cpucaps and patching
alternatives we should not perform any accesses to TTBR0 addresses,
and no userspace translation tables exist until after patching
alternatives. Thus it is safe for this to use alternative_has_cap*().
This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.
So that the check for TTBR0 PAN in setup_cpu_features() can run prior to
alternatives being patched, the call to system_uses_ttbr0_pan() is
replaced with an explicit check of the ARM64_HAS_PAN bit in the
system_cpucaps bitmap.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In __cpu_suspend_exit() we use cpus_have_const_cap() to check for
ARM64_HAS_DIT but this is not necessary and cpus_have_final_cap() of
alternative_has_cap_*() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The ARM64_HAS_DIT cpucap is detected and patched (along with all other
cpucaps) before __cpu_suspend_exit() can run. We'll only use
__cpu_suspend_exit() as part of PSCI cpuidle or hibernation, and both of
these are intialized after system cpucaps are detected and patched: the
PSCI cpuidle driver is registered with a device_initcall, hibernation
restoration occurs in a late_initcall, and hibarnation saving is driven
by usrspace. Therefore it is not necessary to use cpus_have_const_cap(),
and using alternative_has_cap_*() or cpus_have_final_cap() is
sufficient.
This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. To clearly document the ordering relationship between
suspend/resume and alternatives patching, an explicit check for
system_capabilities_finalized() is added to cpu_suspend() along with a
comment block, which will make it easier to spot issues if code is
changed in future to allow these functions to be reached earlier.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In system_supports_cnp() we use cpus_have_const_cap() to check for
ARM64_HAS_CNP, but this is only necessary so that the cpu_enable_cnp()
callback can run prior to alternatives being patched, and otherwise this
is not necessary and alternative_has_cap_*() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The cpu_enable_cnp() callback is run immediately after the ARM64_HAS_CNP
cpucap is detected system-wide under setup_system_capabilities(), prior
to alternatives being patched. During this window cpu_enable_cnp() uses
cpu_replace_ttbr1() to set the CNP bit for the swapper_pg_dir in TTBR1.
No other users of the ARM64_HAS_CNP cpucap need the up-to-date value
during this window:
* As KVM isn't initialized yet, kvm_get_vttbr() isn't reachable.
* As cpuidle isn't initialized yet, __cpu_suspend_exit() isn't
reachable.
* At this point all CPUs are using the swapper_pg_dir with a reserved
ASID in TTBR1, and the idmap_pg_dir in TTBR0, so neither
check_and_switch_context() nor cpu_do_switch_mm() need to do anything
special.
This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. To allow cpu_enable_cnp() to function prior to alternatives
being patched, cpu_replace_ttbr1() is split into cpu_replace_ttbr1() and
cpu_enable_swapper_cnp(), with the former only used for early TTBR1
replacement, and the latter used by both cpu_enable_cnp() and
__cpu_suspend_exit().
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In system_supports_bti() we use cpus_have_const_cap() to check for
ARM64_HAS_BTI, but this is not necessary and alternative_has_cap_*() or
cpus_have_final_*cap() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
When CONFIG_ARM64_BTI_KERNEL=y, the ARM64_HAS_BTI cpucap is a strict
boot cpu feature which is detected and patched early on the boot cpu.
All uses guarded by CONFIG_ARM64_BTI_KERNEL happen after the boot CPU
has detected ARM64_HAS_BTI and patched boot alternatives, and hence can
safely use alternative_has_cap_*() or cpus_have_final_boot_cap().
Regardless of CONFIG_ARM64_BTI_KERNEL, all other uses of ARM64_HAS_BTI
happen after system capabilities have been finalized and alternatives
have been patched. Hence these can safely use alternative_has_cap_*) or
cpus_have_final_cap().
This patch splits system_supports_bti() into system_supports_bti() and
system_supports_bti_kernel(), with the former handling where the cpucap
affects userspace functionality, and ther latter handling where the
cpucap affects kernel functionality. The use of cpus_have_const_cap() is
replaced by cpus_have_final_cap() in cpus_have_const_cap, and
cpus_have_final_boot_cap() in system_supports_bti_kernel(). This will
avoid generating code to test the system_cpucaps bitmap and should be
better for all subsequent calls at runtime. The use of
cpus_have_final_cap() and cpus_have_final_boot_cap() will make it easier
to spot if code is chaanged such that these run before the ARM64_HAS_BTI
cpucap is guaranteed to have been finalized.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently we have a negative cpucap which describes the *absence* of
FP/SIMD rather than *presence* of FP/SIMD. This largely works, but is
somewhat awkward relative to other cpucaps that describe the presence of
a feature, and it would be nicer to have a cpucap which describes the
presence of FP/SIMD:
* This will allow the cpucap to be treated as a standard
ARM64_CPUCAP_SYSTEM_FEATURE, which can be detected with the standard
has_cpuid_feature() function and ARM64_CPUID_FIELDS() description.
* This ensures that the cpucap will only transition from not-present to
present, reducing the risk of unintentional and/or unsafe usage of
FP/SIMD before cpucaps are finalized.
* This will allow using arm64_cpu_capabilities::cpu_enable() to enable
the use of FP/SIMD later, with FP/SIMD being disabled at boot time
otherwise. This will ensure that any unintentional and/or unsafe usage
of FP/SIMD prior to this is trapped, and will ensure that FP/SIMD is
never unintentionally enabled for userspace in mismatched big.LITTLE
systems.
This patch replaces the negative ARM64_HAS_NO_FPSIMD cpucap with a
positive ARM64_HAS_FPSIMD cpucap, making changes as described above.
Note that as FP/SIMD will now be trapped when not supported system-wide,
do_fpsimd_acc() must handle these traps in the same way as for SVE and
SME. The commentary in fpsimd_restore_current_state() is updated to
describe the new scheme.
No users of system_supports_fpsimd() need to know that FP/SIMD is
available prior to alternatives being patched, so this is updated to
use alternative_has_cap_likely() to check for the ARM64_HAS_FPSIMD
cpucap, without generating code to test the system_cpucaps bitmap.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The arm64_cpu_capabilities::cpu_enable() callbacks for SVE, SME, SME2,
and FA64 are named with an unusual "${feature}_kernel_enable" pattern
rather than the much more common "cpu_enable_${feature}". Now that we
only use these as cpu_enable() callbacks, it would be nice to have them
match the usual scheme.
This patch renames the cpu_enable() callbacks to match this scheme. At
the same time, the comment above cpu_enable_sve() is removed for
consistency with the other cpu_enable() callbacks.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Both sme2_kernel_enable() and fa64_kernel_enable() need to run after
sme_kernel_enable(). This happens to be true today as ARM64_SME has a
lower index than either ARM64_SME2 or ARM64_SME_FA64, and both functions
have a comment to this effect.
It would be nicer to have a build-time assertion like we for for
can_use_gic_priorities() and has_gic_prio_relaxed_sync(), as that way
it will be harder to miss any potential breakage.
This patch replaces the comments with build-time assertions.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When a CPUs onlined we first probe for supported features and
propetites, and then we subsequently enable features that have been
detected. This is a little problematic for SVE and SME, as some
properties (e.g. vector lengths) cannot be probed while they are
disabled. Due to this, the code probing for SVE properties has to enable
SVE for EL1 prior to proving, and the code probing for SME properties
has to enable SME for EL1 prior to probing. We never disable SVE or SME
for EL1 after probing.
It would be a little nicer to transiently enable SVE and SME during
probing, leaving them both disabled unless explicitly enabled, as this
would make it much easier to catch unintentional usage (e.g. when they
are not present system-wide).
This patch reworks the SVE and SME feature probing code to only
transiently enable support at EL1, disabling after probing is complete.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The arm64_cpu_capabilities::cpu_enable callbacks are intended for
cpu-local feature enablement (e.g. poking system registers). These get
called for each online CPU when boot/system cpucaps get finalized and
enabled, and get called whenever a CPU is subsequently onlined.
For KPTI with the ARM64_UNMAP_KERNEL_AT_EL0 cpucap, we use the
kpti_install_ng_mappings() function as the cpu_enable callback. This
does a mixture of cpu-local configuration (setting VBAR_EL1 to the
appropriate trampoline vectors) and some global configuration (rewriting
the swapper page tables to sue non-glboal mappings) that must happen at
most once.
This patch splits kpti_install_ng_mappings() into a cpu-local
cpu_enable_kpti() initialization function and a system-wide
kpti_install_ng_mappings() function. The cpu_enable_kpti() function is
responsible for selecting the necessary cpu-local vectors each time a
CPU is onlined, and the kpti_install_ng_mappings() function performs the
one-time rewrite of the translation tables too use non-global mappings.
Splitting the two makes the code a bit easier to follow and also allows
the page table rewriting code to be marked as __init such that it can be
freed after use.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For ARM64_WORKAROUND_2658417, we use a cpu_enable() callback to hide the
ID_AA64ISAR1_EL1.BF16 ID register field. This is a little awkward as
CPUs may attempt to apply the workaround concurrently, requiring that we
protect the bulk of the callback with a raw_spinlock, and requiring some
pointless work every time a CPU is subsequently hotplugged in.
This patch makes this a little simpler by handling the masking once at
boot time. A new user_feature_fixup() function is called at the start of
setup_user_features() to mask the feature, matching the style of
elf_hwcap_fixup(). The ARM64_WORKAROUND_2658417 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is
not possible.
Note that the ARM64_WORKAROUND_2658417 capability is matched with
ERRATA_MIDR_RANGE(), which implicitly gives the capability a
ARM64_CPUCAP_LOCAL_CPU_ERRATUM type, which forbids the late onlining of
a CPU with the erratum if the erratum was not present at boot time.
Therefore this patch doesn't change the behaviour for late onlining.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently setup_cpu_features() handles a mixture of one-time kernel
feature setup (e.g. cpucaps) and one-time user feature setup (e.g. ELF
hwcaps). Subsequent patches will rework other one-time setup and expand
the logic currently in setup_cpu_features(), and in preparation for this
it would be helpful to split the kernel and user setup into separate
functions.
This patch splits setup_user_features() out of setup_cpu_features(),
with a few additional cleanups of note:
* setup_cpu_features() is renamed to setup_system_features() to make it
clear that it handles system-wide feature setup rather than cpu-local
feature setup.
* setup_system_capabilities() is folded into setup_system_features().
* Presence of TTBR0 pan is logged immediately after
update_cpu_capabilities(), so that this is guaranteed to appear
alongside all the other detected system cpucaps.
* The 'cwg' variable is removed as its value is only consumed once and
it's simpler to use cache_type_cwg() directly without assigning its
return value to a variable.
* The call to setup_user_features() is moved after alternatives are
patched, which will allow user feature setup code to depend on
alternative branches and allow for simplifications in subsequent
patches.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
FEAT_LRCPC3 adds more instructions to support the Release Consistency model.
Add a HWCAP so that userspace can make decisions about instructions it can use.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230919162757.2707023-2-joey.gouly@arm.com
[catalin.marinas@arm.com: change the HWCAP number]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
Removed the sentinel as well as the explicit size from ctl_isa_vars. The
size is redundant as the initialization sets it. Changed
insn_emulation->sysctl from a 2 element array of struct ctl_table to a
simple struct. This has no consequence for the sysctl registration as it
is forwarded as a pointer. Removed sentinel from sve_defatul_vl_table,
sme_default_vl_table, tagged_addr_sysctl_table and
armv8_pmu_sysctl_table.
This removal is safe because register_sysctl_sz and register_sysctl use
the array size in addition to checking for the sentinel.
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
An Armv8.8 FEAT_MOPS main or epilogue instruction will take an exception
if executed on a CPU with a different MOPS implementation option (A or
B) than the CPU where the preceding prologue instruction ran. In this
case the OS exception handler is expected to reset the registers and
restart execution from the prologue instruction.
A KVM guest may use the instructions at EL1 at times when the guest is
not able to handle the exception, expecting that the instructions will
only run on one CPU (e.g. when running UEFI boot services in the guest).
As KVM may reschedule the guest between different types of CPUs at any
time (on an asymmetric system), it needs to also handle the resulting
exception itself in case the guest is not able to. A similar situation
will also occur in the future when live migrating a guest from one type
of CPU to another.
Add handling for the MOPS exception to KVM. The handling can be shared
with the EL0 exception handler, as the logic and register layouts are
the same. The exception can be handled right after exiting a guest,
which avoids the cost of returning to the host exit handler.
Similarly to the EL0 exception handler, in case the main or epilogue
instruction is being single stepped, it makes sense to finish the step
before executing the prologue instruction, so advance the single step
state machine.
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230922112508.1774352-2-kristina.martsenko@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
In commit 2b2d0a7a96 ("arm64: smp: Remove dedicated wakeup IPI") we
started using a scheduler IPI to avoid a dedicated reschedule. When we
did this, we used arch_smp_send_reschedule() directly rather than
calling smp_send_reschedule(). The only difference is that calling
arch_smp_send_reschedule() directly avoids tracing. Presumably we
_don't_ want to avoid tracing here, so switch to
smp_send_reschedule().
Fixes: 2b2d0a7a96 ("arm64: smp: Remove dedicated wakeup IPI")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Some MediaTek devices have broken firmware which corrupts some GICR
registers behind the back of the OS, and pseudo-NMIs cannot be used on
these devices. For more details see commit:
44bd78dd2b ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues")
We did not take this problem into account in commit:
331a1b3a83 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
Since that commit arm64's SMP code will try to setup some IPIs as
pseudo-NMIs, even on systems with broken FW. The GICv3 code will
(rightly) reject attempts to request interrupts as pseudo-NMIs,
resulting in boot-time failures.
Avoid the problem by taking the broken FW into account when deciding to
request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key
named "supports_pseudo_nmis" which is false on systems with broken FW,
and we can consult this within ipi_should_be_nmi().
Fixes: 331a1b3a83 ("arm64: smp: Add arch support for backtrace using pseudo-NMI")
Reported-by: Chen-Yu Tsai <wenst@chromium.org>
Closes: https://issuetracker.google.com/issues/197061987#comment68
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
rcu_report_dead() and rcutree_migrate_callbacks() have their headers in
rcupdate.h while those are pure rcutree calls, like the other CPU-hotplug
functions.
Also rcu_cpu_starting() and rcu_report_dead() have different naming
conventions while they mirror each other's effects.
Fix the headers and propose a naming that relates both functions and
aligns with the prefix of other rcutree CPU-hotplug functions.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
rcu_report_dead() has to be called locally by the CPU that is going to
exit the RCU state machine. Passing a cpu argument here is error-prone
and leaves the possibility for a racy remote call.
Use local access instead.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Implement the workaround for ARM Cortex-A520 erratum 2966298. On an
affected Cortex-A520 core, a speculatively executed unprivileged load
might leak data from a privileged load via a cache side channel. The
issue only exists for loads within a translation regime with the same
translation (e.g. same ASID and VMID). Therefore, the issue only affects
the return to EL0.
The workaround is to execute a TLBI before returning to EL0 after all
loads of privileged data. A non-shareable TLBI to any address is
sufficient.
The workaround isn't necessary if page table isolation (KPTI) is
enabled, but for simplicity it will be. Page table isolation should
normally be disabled for Cortex-A520 as it supports the CSV3 feature
and the E0PD feature (used when KASLR is enabled).
Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230921194156.1050055-2-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
SVE 2.1 introduced a new feature FEAT_SVE_B16B16 which adds instructions
supporting the BFloat16 floating point format. Report this to userspace
through the ID registers and hwcap.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230915-arm64-zfr-b16b16-el0-v1-1-f9aba807bdb5@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark the three IPI-related globals in smp.c as "__ro_after_init" since
they are only ever set in set_smp_ipi_range(), which is marked
"__init". This is a better and more secure marking than the old
"__read_mostly".
Suggested-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20230906090246.v13.7.I625d393afd71e1766ef73d3bfaac0b347a4afd19@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Up until now we've been using the generic (weak) implementation for
kgdb_roundup_cpus() when using kgdb on arm64. Let's move to a custom
one. The advantage here is that, when pseudo-NMI is enabled on a
device, we'll be able to round up CPUs using pseudo-NMI. This allows
us to debug CPUs that are stuck with interrupts disabled. If
pseudo-NMIs are not enabled then we'll fallback to just using an IPI,
which is still slightly better than the generic implementation since
it avoids the potential situation described in the generic
kgdb_call_nmi_hook().
Co-developed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230906090246.v13.6.I2ef26d1b3bfbed2d10a281942b0da7d9854de05e@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There's no reason why IPI_CPU_STOP and IPI_CPU_CRASH_STOP can't be
handled as NMI. They are very simple and everything in them is
NMI-safe. Mark them as things to use NMI for if NMI is available.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Misono Tomohiro <misono.tomohiro@fujitsu.com>
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230906090246.v13.5.Ifadbfd45b22c52edcb499034dd4783d096343260@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Enable arch_trigger_cpumask_backtrace() support on arm64. This enables
things much like they are enabled on arm32 (including some of the
funky logic around NR_IPI, nr_ipi, and MAX_IPI) but with the
difference that, unlike arm32, we'll try to enable the backtrace to
use pseudo-NMI.
NOTE: this patch is a squash of the little bit of code adding the
ability to mark an IPI to try to use pseudo-NMI plus the little bit of
code to hook things up for kgdb. This approach was decided upon in the
discussion of v9 [1].
This patch depends on commit 8d539b84f1 ("nmi_backtrace: allow
excluding an arbitrary CPU") since that commit changed the prototype
of arch_trigger_cpumask_backtrace(), which this patch implements.
[1] https://lore.kernel.org/r/ZORY51mF4alI41G1@FVFF77S0Q05N
Co-developed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Misono Tomohiro <misono.tomohiro@fujitsu.com>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230906090246.v13.4.Ie6c132b96ebbbcddbf6954b9469ed40a6960343c@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To enable NMI backtrace and KGDB's NMI cpu roundup, we need to free up
at least one dedicated IPI.
On arm64 the IPI_WAKEUP IPI is only used for the ACPI parking protocol,
which itself is only used on some very early ARMv8 systems which
couldn't implement PSCI.
Remove the IPI_WAKEUP IPI, and rely on the IPI_RESCHEDULE IPI to wake
CPUs from the parked state. This will cause a tiny amonut of redundant
work to check the thread flags, but this is miniscule in relation to the
cost of taking and handling the IPI in the first place. We can safely
handle redundant IPI_RESCHEDULE IPIs, so there should be no functional
impact as a result of this change.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230906090246.v13.3.I7209db47ef8ec151d3de61f59005bbc59fe8f113@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As per the (somewhat recent) comment before the definition of
`__cpuidle`, the tag is like `noinstr` but also marks a function so it
can be identified by cpu_in_idle(). Let's add these markings to arm64
cpuidle functions
With this change we get useful backtraces like:
NMI backtrace for cpu N skipped: idling at cpu_do_idle+0x94/0x98
instead of useless backtraces when dumping all processors using
nmi_cpu_backtrace().
NOTE: this patch won't make cpu_in_idle() work perfectly for arm64,
but it doesn't hurt and does catch some cases. Specifically an example
that wasn't caught in my testing looked like this:
gic_cpu_sys_reg_init+0x1f8/0x314
gic_cpu_pm_notifier+0x40/0x78
raw_notifier_call_chain+0x5c/0x134
cpu_pm_notify+0x38/0x64
cpu_pm_exit+0x20/0x2c
psci_enter_idle_state+0x48/0x70
cpuidle_enter_state+0xb8/0x260
cpuidle_enter+0x44/0x5c
do_idle+0x188/0x30c
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Tested-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230906090246.v13.2.I4baba13e220bdd24d11400c67f137c35f07f82c7@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For reasons that are not currently apparent during cpufeature enumeration
we maintain a pseudo register for SMCR which records the maximum supported
vector length using the value that would be written to SMCR_EL1.LEN to
configure it. This is not exposed to userspace and is not sufficient for
detecting unsupportable configurations, we need the more detailed checks in
vec_update_vq_map() for that since we can't cope with missing vector
lengths on late CPUs and KVM requires an exactly matching set of supported
vector lengths as EL1 can enumerate VLs directly with the hardware.
Remove the code, replacing the usage in sme_setup() with a query of the
vq_map.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230913-arm64-vec-len-cpufeature-v1-2-cc69b0600a8a@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For reasons that are not currently apparent during cpufeature enumeration
we maintain a pseudo register for ZCR which records the maximum supported
vector length using the value that would be written to ZCR_EL1.LEN to
configure it. This is not exposed to userspace and is not sufficient for
detecting unsupportable configurations, we need the more detailed checks in
vec_update_vq_map() for that since we can't cope with missing vector
lengths on late CPUs and KVM requires an exactly matching set of supported
vector lengths as EL1 can enumerate VLs directly with the hardware.
Remove the code, replacing the usage in sve_setup() with a query of the
vq_map.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230913-arm64-vec-len-cpufeature-v1-1-cc69b0600a8a@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>