Now Kbuild provides reasonable defaults for objtool, sanitizers, and
profilers.
Remove redundant variables.
Note:
This commit changes the coverage for some objects:
- include arch/mips/vdso/vdso-image.o into UBSAN, GCOV, KCOV
- include arch/sparc/vdso/vdso-image-*.o into UBSAN
- include arch/sparc/vdso/vma.o into UBSAN
- include arch/x86/entry/vdso/extable.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
- include arch/x86/entry/vdso/vdso-image-*.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
- include arch/x86/entry/vdso/vdso32-setup.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
- include arch/x86/entry/vdso/vma.o into GCOV, KCOV
- include arch/x86/um/vdso/vma.o into KASAN, GCOV, KCOV
I believe these are positive effects because all of them are kernel
space objects.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Roberto Sassu <roberto.sassu@huawei.com>
LLVM moved their issue tracker from their own Bugzilla instance to GitHub
issues. While all of the links are still valid, they may not necessarily
show the most up to date information around the issues, as all updates
will occur on GitHub, not Bugzilla.
Another complication is that the Bugzilla issue number is not always the
same as the GitHub issue number. Thankfully, LLVM maintains this mapping
through two shortlinks:
https://llvm.org/bz<num> -> https://bugs.llvm.org/show_bug.cgi?id=<num>
https://llvm.org/pr<num> -> https://github.com/llvm/llvm-project/issues/<mapped_num>
Switch all "https://bugs.llvm.org/show_bug.cgi?id=<num>" links to the
"https://llvm.org/pr<num>" shortlink so that the links show the most up to
date information. Each migrated issue links back to the Bugzilla entry,
so there should be no loss of fidelity of information here.
Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-3-eb09b59db071@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Fangrui Song <maskray@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The EFI stub makefile contains logic to ensure that the objects that
make up the stub do not contain relocations that require runtime fixups
(typically to account for the runtime load address of the executable)
On RISC-V, we also avoid GP based relocations, as they require that GP
is assigned the correct base in the startup code, which is not
implemented in the EFI stub.
So add these relocation types to the grep expression that is used to
carry out this check.
Link: https://lkml.kernel.org/r/42c63cb9-87d0-49db-9af8-95771b186684%40siemens.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The cflags for the RISC-V efistub were missing -mno-relax, thus were
under the risk that the compiler could use GP-relative addressing. That
happened for _edata with binutils-2.41 and kernel 6.1, causing the
relocation to fail due to an invalid kernel_size in handle_kernel_image.
It was not yet observed with newer versions, but that may just be luck.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
* Support for handling misaligned accesses in S-mode.
* Probing for misaligned access support is now properly cached and
handled in parallel.
* PTDUMP now reflects the SW reserved bits, as well as the PBMT and
NAPOT extensions.
* Performance improvements for TLB flushing.
* Support for many new relocations in the module loader.
* Various bug fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmVOUCcTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYicJ2D/9S+9dnHYHVGTeJfr9Zf2T4r+qHBPyx
LXbTAbgHN6139MgcRLMRlcUaQ04RVxuBCWhxewJ6mQiHiYNlullgKmJO8oYMS4uZ
2yQGHKhzKEVluXxe+qT6VW+zsP0cY6pDQ+e59AqZgyWzvATxMU4VtFfCDdjFG03I
k/8Y3MUKSHAKzIHUsGHiMW5J2YRiM/iVehv2gZfanreulWlK6lyiV4AZ4KChu8Sa
gix9QkFJw+9+7RHnouHvczt4xTqLPJQcdecLJsbisEI4VaaPtTVzkvXx/kwbMwX0
qkQnZ7I60fPHrCb9ccuedjDMa1Z0lrfwRldBGz9f9QaW37Eppirn6LA5JiZ1cA47
wKTwba6gZJCTRXELFTJLcv+Cwdy003E0y3iL5UK2rkbLqcxfvLdq1WAJU2t05Lmh
aRQN10BtM2DZG+SNPlLoBpXPDw0Q3KOc20zGtuhmk010+X4yOK7WXlu8zNGLLE0+
yHamiZqAbpIUIEzwDdGbb95jywR1sUhNTbScuhj4Rc79ZqLtPxty1PUhnfqFat1R
i3ngQtCbeUUYFS2YV9tKkXjLf/xkQNRbt7kQBowuvFuvfksl9UwMdRAWcE/h0M9P
7uz7cBFhuG0v/XblB7bUhYLkKITvP+ltSMyxaGlfpGqCLAH2KIztdZ2PLWLRdKeU
+9dtZSQR6oBLqQ==
=NhdR
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.7-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- Support for handling misaligned accesses in S-mode
- Probing for misaligned access support is now properly cached and
handled in parallel
- PTDUMP now reflects the SW reserved bits, as well as the PBMT and
NAPOT extensions
- Performance improvements for TLB flushing
- Support for many new relocations in the module loader
- Various bug fixes and cleanups
* tag 'riscv-for-linus-6.7-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
riscv: Optimize bitops with Zbb extension
riscv: Rearrange hwcap.h and cpufeature.h
drivers: perf: Do not broadcast to other cpus when starting a counter
drivers: perf: Check find_first_bit() return value
of: property: Add fw_devlink support for msi-parent
RISC-V: Don't fail in riscv_of_parent_hartid() for disabled HARTs
riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings
riscv: Don't use PGD entries for the linear mapping
RISC-V: Probe misaligned access speed in parallel
RISC-V: Remove __init on unaligned_emulation_finish()
RISC-V: Show accurate per-hart isa in /proc/cpuinfo
RISC-V: Don't rely on positional structure initialization
riscv: Add tests for riscv module loading
riscv: Add remaining module relocations
riscv: Avoid unaligned access when relocating modules
riscv: split cache ops out of dma-noncoherent.c
riscv: Improve flush_tlb_kernel_range()
riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
riscv: Improve flush_tlb_range() for hugetlb pages
riscv: Improve tlb_flush()
...
This patch leverages the alternative mechanism to dynamically optimize
bitops (including __ffs, __fls, ffs, fls) with Zbb instructions. When
Zbb ext is not supported by the runtime CPU, legacy implementation is
used. If Zbb is supported, then the optimized variants will be selected
via alternative patching.
The legacy bitops support is taken from the generic C implementation as
fallback.
If the parameter is a build-time constant, we leverage compiler builtin to
calculate the result directly, this approach is inspired by x86 bitops
implementation.
EFI stub runs before the kernel, so alternative mechanism should not be
used there, this patch introduces a macro NO_ALTERNATIVE for this purpose.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231031064553.2319688-3-xiao.w.wang@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Now that the EFI stub always zero inits its BSS section upon entry,
there is no longer a need to place the BSS symbols carried by the stub
into the .data section.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-18-ardb@google.com
Alexandre Ghiti <alexghiti@rivosinc.com> says:
The following KASLR implementation allows to randomize the kernel mapping:
- virtually: we expect the bootloader to provide a seed in the device-tree
- physically: only implemented in the EFI stub, it relies on the firmware to
provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation
hence the patch 3 factorizes KASLR related functions for riscv to take
advantage.
The new virtual kernel location is limited by the early page table that only
has one PUD and with the PMD alignment constraint, the kernel can only take
< 512 positions.
* b4-shazam-merge:
riscv: libstub: Implement KASLR by using generic functions
libstub: Fix compilation warning for rv32
arm64: libstub: Move KASLR handling functions to kaslr.c
riscv: Dump out kernel offset information on panic
riscv: Introduce virtual kernel mapping KASLR
Link: https://lore.kernel.org/r/20230722123850.634544-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
We can now use arm64 functions to handle the move of the kernel physical
mapping: if KASLR is enabled, we will try to get a random seed from the
firmware, if not possible, the kernel will be moved to a location that
suits its alignment constraints.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230722123850.634544-6-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
This prepares for riscv to use the same functions to handle the pĥysical
kernel move when KASLR is enabled.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230722123850.634544-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
- one bugfix for x86 mixed mode that did not make it into v6.5
- first pass of cleanup for the EFI runtime wrappers
- some cosmetic touchups
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZOx+LAAKCRAwbglWLn0t
XKzLAPwK8wIZ14NlC55NCtdvKEzK3N/muxKRqg2MAHfrbnREFgD9GgUxSbIEU2gz
BXQM9GLaP86qCXkZlPYktjP6RVfDYAk=
=e2mx
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"This primarily covers some cleanup work on the EFI runtime wrappers,
which are shared between all EFI architectures except Itanium, and
which provide some level of isolation to prevent faults occurring in
the firmware code (which runs at the same privilege level as the
kernel) from bringing down the system.
Beyond that, there is a fix that did not make it into v6.5, and some
doc fixes and dead code cleanup.
- one bugfix for x86 mixed mode that did not make it into v6.5
- first pass of cleanup for the EFI runtime wrappers
- some cosmetic touchups"
* tag 'efi-next-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efistub: Fix PCI ROM preservation in mixed mode
efi/runtime-wrappers: Clean up white space and add __init annotation
acpi/prmt: Use EFI runtime sandbox to invoke PRM handlers
efi/runtime-wrappers: Don't duplicate setup/teardown code
efi/runtime-wrappers: Remove duplicated macro for service returning void
efi/runtime-wrapper: Move workqueue manipulation out of line
efi/runtime-wrappers: Use type safe encapsulation of call arguments
efi/riscv: Move EFI runtime call setup/teardown helpers out of line
efi/arm64: Move EFI runtime call setup/teardown helpers out of line
efi/riscv: libstub: Fix comment about absolute relocation
efi: memmap: Remove kernel-doc warnings
efi: Remove unused extern declaration efi_lookup_mapped_addr()
In preparation for updating the EFI stub boot flow to avoid the bare
metal decompressor code altogether, implement the support code for
switching between 4 and 5 levels of paging before jumping to the kernel
proper.
Reuse the newly refactored trampoline that the bare metal decompressor
uses, but relies on EFI APIs to allocate 32-bit addressable memory and
remap it with the appropriate permissions. Given that the bare metal
decompressor will no longer call into the trampoline if the number of
paging levels is already set correctly, it is no longer needed to remove
NX restrictions from the memory range where this trampoline may end up.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20230807162720.545787-17-ardb@kernel.org
We don't want absolute symbols references in the stub, so fix the double
negation in the comment.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
UEFI Specification version 2.9 introduces the concept of memory
acceptance: Some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, requiring memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific for the Virtual
Machine platform.
Accepting memory is costly and it makes VMM allocate memory for the
accepted guest physical address range. It's better to postpone memory
acceptance until memory is needed. It lowers boot time and reduces
memory overhead.
The kernel needs to know what memory has been accepted. Firmware
communicates this information via memory map: a new memory type --
EFI_UNACCEPTED_MEMORY -- indicates such memory.
Range-based tracking works fine for firmware, but it gets bulky for
the kernel: e820 (or whatever the arch uses) has to be modified on every
page acceptance. It leads to table fragmentation and there's a limited
number of entries in the e820 table.
Another option is to mark such memory as usable in e820 and track if the
range has been accepted in a bitmap. One bit in the bitmap represents a
naturally aligned power-2-sized region of address space -- unit.
For x86, unit size is 2MiB: 4k of the bitmap is enough to track 64GiB or
physical address space.
In the worst-case scenario -- a huge hole in the middle of the
address space -- It needs 256MiB to handle 4PiB of the address
space.
Any unaccepted memory that is not aligned to unit_size gets accepted
upfront.
The bitmap is allocated and constructed in the EFI stub and passed down
to the kernel via EFI configuration table. allocate_e820() allocates the
bitmap if unaccepted memory is present, according to the size of
unaccepted region.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230606142637.5171-4-kirill.shutemov@linux.intel.com
UEFI heavily relies on so-called protocols, which are essentially
tables populated with pointers to executable code, and these are invoked
indirectly using BR or BLR instructions.
This makes the EFI execution context vulnerable to attacks on forward
edge control flow, and so it would help if we could enable hardware
enforcement (BTI) on CPUs that implement it.
So let's no longer disable BTI codegen for the EFI stub, and set the
newly introduced PE/COFF header flag when the kernel is built with BTI
landing pads.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Instead of cleaning the entire loaded kernel image to the PoC and
disabling the MMU and caches before branching to the kernel's bare metal
entry point, we can leave the MMU and caches enabled, and rely on EFI's
cacheable 1:1 mapping of all of system RAM (which is mandated by the
spec) to populate the initial page tables.
This removes the need for managing coherency in software, which is
tedious and error prone.
Note that we still need to clean the executable region of the image to
the PoU if this is required for I/D coherency, but only if we actually
decided to move the image in memory, as otherwise, this will have been
taken care of by the loader.
This change affects both the builtin EFI stub as well as the zboot
decompressor, which now carries the entire EFI stub along with the
decompression code and the compressed image.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230111102236.1430401-7-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Refactor the zboot code so that it incorporates all the EFI stub
logic, rather than calling the decompressed kernel as a EFI app.
- Add support for initrd= command line option to x86 mixed mode.
- Allow initrd= to be used with arbitrary EFI accessible file systems
instead of just the one the kernel itself was loaded from.
- Move some x86-only handling and manipulation of the EFI memory map
into arch/x86, as it is not used anywhere else.
- More flexible handling of any random seeds provided by the boot
environment (i.e., systemd-boot) so that it becomes available much
earlier during the boot.
- Allow improved arch-agnostic EFI support in loaders, by setting a
uniform baseline of supported features, and adding a generic magic
number to the DOS/PE header. This should allow loaders such as GRUB or
systemd-boot to reduce the amount of arch-specific handling
substantially.
- (arm64) Run EFI runtime services from a dedicated stack, and use it to
recover from synchronous exceptions that might occur in the firmware
code.
- (arm64) Ensure that we don't allocate memory outside of the 48-bit
addressable physical range.
- Make EFI pstore record size configurable
- Add support for decoding CXL specific CPER records
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmOTQ1cACgkQw08iOZLZ
jyQRkAv+LqaZFWeVwhAQHiw/N3RnRM0nZHea6++D2p1y/ZbCpwv3pdLl2YHQ1KmW
wDG9Nr4C1ITLtfy1YZKeYpwloQtq9S1GZDWnFpVv/hdo7L924eRAwIlxowWn1OnP
ruxv2PaYXyb0plh1YD1f6E1BqrfUOtajET55Kxs9ZsxmnMtDpIX3NiYy4LKMBIZC
+Eywt41M3uBX+wgmSujFBMVVJjhOX60WhUYXqy0RXwDKOyrz/oW5td+eotSCreB6
FVbjvwQvUdtzn4s1FayOMlTrkxxLw4vLhsaUGAdDOHd3rg3sZT9Xh1HqFFD6nss6
ZAzAYQ6BzdiV/5WSB9meJe+BeG1hjTNKjJI6JPO2lctzYJqlnJJzI6JzBuH9vzQ0
dffLB8NITeEW2rphIh+q+PAKFFNbXWkJtV4BMRpqmzZ/w7HwupZbUXAzbWE8/5km
qlFpr0kmq8GlVcbXNOFjmnQVrJ8jPYn+O3AwmEiVAXKZJOsMH0sjlXHKsonme9oV
Sk71c6Em
=JEXz
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Another fairly sizable pull request, by EFI subsystem standards.
Most of the work was done by me, some of it in collaboration with the
distro and bootloader folks (GRUB, systemd-boot), where the main focus
has been on removing pointless per-arch differences in the way EFI
boots a Linux kernel.
- Refactor the zboot code so that it incorporates all the EFI stub
logic, rather than calling the decompressed kernel as a EFI app.
- Add support for initrd= command line option to x86 mixed mode.
- Allow initrd= to be used with arbitrary EFI accessible file systems
instead of just the one the kernel itself was loaded from.
- Move some x86-only handling and manipulation of the EFI memory map
into arch/x86, as it is not used anywhere else.
- More flexible handling of any random seeds provided by the boot
environment (i.e., systemd-boot) so that it becomes available much
earlier during the boot.
- Allow improved arch-agnostic EFI support in loaders, by setting a
uniform baseline of supported features, and adding a generic magic
number to the DOS/PE header. This should allow loaders such as GRUB
or systemd-boot to reduce the amount of arch-specific handling
substantially.
- (arm64) Run EFI runtime services from a dedicated stack, and use it
to recover from synchronous exceptions that might occur in the
firmware code.
- (arm64) Ensure that we don't allocate memory outside of the 48-bit
addressable physical range.
- Make EFI pstore record size configurable
- Add support for decoding CXL specific CPER records"
* tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits)
arm64: efi: Recover from synchronous exceptions occurring in firmware
arm64: efi: Execute runtime services from a dedicated stack
arm64: efi: Limit allocations to 48-bit addressable physical region
efi: Put Linux specific magic number in the DOS header
efi: libstub: Always enable initrd command line loader and bump version
efi: stub: use random seed from EFI variable
efi: vars: prohibit reading random seed variables
efi: random: combine bootloader provided RNG seed with RNG protocol output
efi/cper, cxl: Decode CXL Error Log
efi/cper, cxl: Decode CXL Protocol Error Section
efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment
efi: x86: Move EFI runtime map sysfs code to arch/x86
efi: runtime-maps: Clarify purpose and enable by default for kexec
efi: pstore: Add module parameter for setting the record size
efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures
efi: memmap: Move manipulation routines into x86 arch tree
efi: memmap: Move EFI fake memmap support into x86 arch tree
efi: libstub: Undeprecate the command line initrd loader
efi: libstub: Add mixed mode support to command line initrd loader
efi: libstub: Permit mixed mode return types other than efi_status_t
...
ACPI:
* Enable FPDT support for boot-time profiling
* Fix CPU PMU probing to work better with PREEMPT_RT
* Update SMMUv3 MSI DeviceID parsing to latest IORT spec
* APMT support for probing Arm CoreSight PMU devices
CPU features:
* Advertise new SVE instructions (v2.1)
* Advertise range prefetch instruction
* Advertise CSSC ("Common Short Sequence Compression") scalar
instructions, adding things like min, max, abs, popcount
* Enable DIT (Data Independent Timing) when running in the kernel
* More conversion of system register fields over to the generated
header
CPU misfeatures:
* Workaround for Cortex-A715 erratum #2645198
Dynamic SCS:
* Support for dynamic shadow call stacks to allow switching at
runtime between Clang's SCS implementation and the CPU's
pointer authentication feature when it is supported (complete
with scary DWARF parser!)
Tracing and debug:
* Remove static ftrace in favour of, err, dynamic ftrace!
* Seperate 'struct ftrace_regs' from 'struct pt_regs' in core
ftrace and existing arch code
* Introduce and implement FTRACE_WITH_ARGS on arm64 to replace
the old FTRACE_WITH_REGS
* Extend 'crashkernel=' parameter with default value and fallback
to placement above 4G physical if initial (low) allocation
fails
SVE:
* Optimisation to avoid disabling SVE unconditionally on syscall
entry and just zeroing the non-shared state on return instead
Exceptions:
* Rework of undefined instruction handling to avoid serialisation
on global lock (this includes emulation of user accesses to the
ID registers)
Perf and PMU:
* Support for TLP filters in Hisilicon's PCIe PMU device
* Support for the DDR PMU present in Amlogic Meson G12 SoCs
* Support for the terribly-named "CoreSight PMU" architecture
from Arm (and Nvidia's implementation of said architecture)
Misc:
* Tighten up our boot protocol for systems with memory above
52 bits physical
* Const-ify static keys to satisty jump label asm constraints
* Trivial FFA driver cleanups in preparation for v1.1 support
* Export the kernel_neon_* APIs as GPL symbols
* Harden our instruction generation routines against
instrumentation
* A bunch of robustness improvements to our arch-specific selftests
* Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmOPLFAQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNPRcCACLyDTvkimiqfoPxzzgdkx/6QOvw9s3/mXg
UcTORSZBR1VnYkiMYEKVz/tTfG99dnWtD8/0k/rz48NbhBfsF2sN4ukyBBXVf0zR
fjnaVyVC11LUgBgZKPo6maV+jf/JWf9hJtpPl06KTiPb2Hw2JX4DXg+PeF8t2hGx
NLH4ekQOrlDM8mlsN5mc0YsHbiuO7Xe/NRuet8TsgU4bEvLAwO6bzOLVUMqDQZNq
bQe2ENcGVAzAf7iRJb38lj9qB/5hrQTHRXqLXMSnJyyVjQEwYca0PeJMa7x30bXF
ZZ+xQ8Wq0mxiffZraf6SE34yD4gaYS4Fziw7rqvydC15vYhzJBH1
=hV+2
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The highlights this time are support for dynamically enabling and
disabling Clang's Shadow Call Stack at boot and a long-awaited
optimisation to the way in which we handle the SVE register state on
system call entry to avoid taking unnecessary traps from userspace.
Summary:
ACPI:
- Enable FPDT support for boot-time profiling
- Fix CPU PMU probing to work better with PREEMPT_RT
- Update SMMUv3 MSI DeviceID parsing to latest IORT spec
- APMT support for probing Arm CoreSight PMU devices
CPU features:
- Advertise new SVE instructions (v2.1)
- Advertise range prefetch instruction
- Advertise CSSC ("Common Short Sequence Compression") scalar
instructions, adding things like min, max, abs, popcount
- Enable DIT (Data Independent Timing) when running in the kernel
- More conversion of system register fields over to the generated
header
CPU misfeatures:
- Workaround for Cortex-A715 erratum #2645198
Dynamic SCS:
- Support for dynamic shadow call stacks to allow switching at
runtime between Clang's SCS implementation and the CPU's pointer
authentication feature when it is supported (complete with scary
DWARF parser!)
Tracing and debug:
- Remove static ftrace in favour of, err, dynamic ftrace!
- Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace
and existing arch code
- Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the
old FTRACE_WITH_REGS
- Extend 'crashkernel=' parameter with default value and fallback to
placement above 4G physical if initial (low) allocation fails
SVE:
- Optimisation to avoid disabling SVE unconditionally on syscall
entry and just zeroing the non-shared state on return instead
Exceptions:
- Rework of undefined instruction handling to avoid serialisation on
global lock (this includes emulation of user accesses to the ID
registers)
Perf and PMU:
- Support for TLP filters in Hisilicon's PCIe PMU device
- Support for the DDR PMU present in Amlogic Meson G12 SoCs
- Support for the terribly-named "CoreSight PMU" architecture from
Arm (and Nvidia's implementation of said architecture)
Misc:
- Tighten up our boot protocol for systems with memory above 52 bits
physical
- Const-ify static keys to satisty jump label asm constraints
- Trivial FFA driver cleanups in preparation for v1.1 support
- Export the kernel_neon_* APIs as GPL symbols
- Harden our instruction generation routines against instrumentation
- A bunch of robustness improvements to our arch-specific selftests
- Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits)
arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
arm64: Prohibit instrumentation on arch_stack_walk()
arm64:uprobe fix the uprobe SWBP_INSN in big-endian
arm64: alternatives: add __init/__initconst to some functions/variables
arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
kselftest/arm64: Allow epoll_wait() to return more than one result
kselftest/arm64: Don't drain output while spawning children
kselftest/arm64: Hold fp-stress children until they're all spawned
arm64/sysreg: Remove duplicate definitions from asm/sysreg.h
arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation
arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
arm64/sysreg: Convert MVFR2_EL1 to automatic generation
arm64/sysreg: Convert MVFR1_EL1 to automatic generation
arm64/sysreg: Convert MVFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
...
Ampere Altra machines are reported to misbehave when the SetTime() EFI
runtime service is called after ExitBootServices() but before calling
SetVirtualAddressMap(). Given that the latter is horrid, pointless and
explicitly documented as optional by the EFI spec, we no longer invoke
it at boot if the configured size of the VA space guarantees that the
EFI runtime memory regions can remain mapped 1:1 like they are at boot
time.
On Ampere Altra machines, this results in SetTime() calls issued by the
rtc-efi driver triggering synchronous exceptions during boot. We can
now recover from those without bringing down the system entirely, due to
commit 23715a26c8 ("arm64: efi: Recover from synchronous
exceptions occurring in firmware"). However, it would be better to avoid
the issue entirely, given that the firmware appears to remain in a funny
state after this.
So attempt to identify these machines based on the 'family' field in the
type #1 SMBIOS record, and call SetVirtualAddressMap() unconditionally
in that case.
Tested-by: Alexandru Elisei <alexandru.elisei@gmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Enable asynchronous unwind table generation for both the core kernel as
well as modules, and emit the resulting .eh_frame sections as init code
so we can use the unwind directives for code patching at boot or module
load time.
This will be used by dynamic shadow call stack support, which will rely
on code patching rather than compiler codegen to emit the shadow call
stack push and pop instructions.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The LoongArch build of the EFI stub is part of the core kernel image, and
therefore accesses section markers directly when it needs to figure out
the size of the various section.
The zboot decompressor does not have access to those symbols, but
doesn't really need that either. So let's move handle_kernel_image()
into a separate file (or rather, move everything else into a separate
file) so that the zboot build does not pull in unused code that links to
symbols that it does not define.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The arm64 build of the EFI stub is part of the core kernel image, and
therefore accesses section markers directly when it needs to figure out
the size of the various section.
The zboot decompressor does not have access to those symbols, but
doesn't really need that either. So let's move handle_kernel_image()
into a separate file (or rather, move everything else into a separate
file) so that the zboot build does not pull in unused code that links to
symbols that it does not define.
While at it, introduce a helper routine that the generic zboot loader
will need to invoke after decompressing the image but before invoking
it, to ensure that the I-side view of memory is consistent.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The RISC-V build of the EFI stub is part of the core kernel image, and
therefore accesses section markers directly when it needs to figure out
the size of the various section.
The zboot decompressor does not have access to those symbols, but
doesn't really need that either. So let's move handle_kernel_image()
into a separate file (or rather, move everything else into a separate
file) so that the zboot build does not pull in unused code that links to
symbols that it does not define.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
In preparation for allowing the EFI zboot decompressor to reuse most of
the EFI stub machinery, factor out the actual EFI PE/COFF entrypoint
into a separate file.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Clone the implementations of strrchr() and memchr() in lib/string.c so
we can use them in the standalone zboot decompressor app. These routines
are used by the FDT handling code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Currently, arm64, RISC-V and LoongArch rely on the fact that struct
screen_info can be accessed directly, due to the fact that the EFI stub
and the core kernel are part of the same image. This will change after a
future patch, so let's ensure that the screen_info handling is able to
deal with this, by adopting the arm32 approach of passing it as a
configuration table. While at it, switch to ACPI reclaim memory to hold
the screen_info data, which is more appropriate for this kind of
allocation.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Split the efi_printk() routine into its own source file, and provide
local implementations of strlen() and strnlen() so that the standalone
zboot app can efi_err and efi_info etc.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
We will be sharing efi-entry.S with the zboot decompressor build, which
does not link against vmlinux directly. So move it into the libstub
source directory so we can include in the libstub static library.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
No need for the same pattern to be used four times for each architecture
individually if we can just apply it once later.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
linux-next for a couple of months without, to my knowledge, any negative
reports (or any positive ones, come to that).
- Also the Maple Tree from Liam R. Howlett. An overlapping range-based
tree for vmas. It it apparently slight more efficient in its own right,
but is mainly targeted at enabling work to reduce mmap_lock contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
(https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com).
This has yet to be addressed due to Liam's unfortunately timed
vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down to
the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support
file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA
joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf
bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU=
=xfWx
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging
activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
- implement EFI boot support for LoongArch
- implement generic EFI compressed boot support for arm64, RISC-V and
LoongArch, none of which implement a decompressor today
- measure the kernel command line into the TPM if measured boot is in
effect
- refactor the EFI stub code in order to isolate DT dependencies for
architectures other than x86
- avoid calling SetVirtualAddressMap() on arm64 if the configured size
of the VA space guarantees that doing so is unnecessary
- move some ARM specific code out of the generic EFI source files
- unmap kernel code from the x86 mixed mode 1:1 page tables
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmM5mfEACgkQw08iOZLZ
jySnJwv9G2nBheSlK9bbWKvCpnDvVIExtlL+mg1wB64oxPrGiWRgjxeyA9+92bT0
Y6jYfKbGOGKnxkEJQl19ik6C3JfEwtGm4SnOVp4+osFeDRB7lFemfcIYN5dqz111
wkZA/Y15rnz3tZeGaXnq2jMoFuccQDXPJtOlqbdVqFQ5Py6YT92uMyuI079pN0T+
GSu7VVOX+SBsv4nGaUKIpSVwAP0gXkS/7s7CTf47QiR2+j8WMTlQEYZVjOKZjMJZ
/7hXY2/mduxnuVuT7cfx0mpZKEryUREJoBL5nDzjTnlhLb5X8cHKiaE1lx0aJ//G
JYTR8lDklJZl/7RUw/IW/YodcKcofr3F36NMzWB5vzM+KHOOpv4qEZhoGnaXv94u
auqhzYA83heaRjz7OISlk6kgFxdlIRE1VdrkEBXSlQeCQUv1woS+ZNVGYcKqgR0B
48b31Ogm2A0pAuba89+U9lz/n33lhIDtYvJqLO6AAPLGiVacD9ZdapN5kMftVg/1
SfhFqNzy
=d8Ps
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"A bit more going on than usual in the EFI subsystem. The main driver
for this has been the introduction of the LoonArch architecture last
cycle, which inspired some cleanup and refactoring of the EFI code.
Another driver for EFI changes this cycle and in the future is
confidential compute.
The LoongArch architecture does not use either struct bootparams or DT
natively [yet], and so passing information between the EFI stub and
the core kernel using either of those is undesirable. And in general,
overloading DT has been a source of issues on arm64, so using DT for
this on new architectures is a to avoid for the time being (even if we
might converge on something DT based for non-x86 architectures in the
future). For this reason, in addition to the patch that enables EFI
boot for LoongArch, there are a number of refactoring patches applied
on top of which separate the DT bits from the generic EFI stub bits.
These changes are on a separate topich branch that has been shared
with the LoongArch maintainers, who will include it in their pull
request as well. This is not ideal, but the best way to manage the
conflicts without stalling LoongArch for another cycle.
Another development inspired by LoongArch is the newly added support
for EFI based decompressors. Instead of adding yet another
arch-specific incarnation of this pattern for LoongArch, we are
introducing an EFI app based on the existing EFI libstub
infrastructure that encapulates the decompression code we use on other
architectures, but in a way that is fully generic. This has been
developed and tested in collaboration with distro and systemd folks,
who are eager to start using this for systemd-boot and also for arm64
secure boot on Fedora. Note that the EFI zimage files this introduces
can also be decompressed by non-EFI bootloaders if needed, as the
image header describes the location of the payload inside the image,
and the type of compression that was used. (Note that Fedora's arm64
GRUB is buggy [0] so you'll need a recent version or switch to
systemd-boot in order to use this.)
Finally, we are adding TPM measurement of the kernel command line
provided by EFI. There is an oversight in the TCG spec which results
in a blind spot for command line arguments passed to loaded images,
which means that either the loader or the stub needs to take the
measurement. Given the combinatorial explosion I am anticipating when
it comes to firmware/bootloader stacks and firmware based attestation
protocols (SEV-SNP, TDX, DICE, DRTM), it is good to set a baseline now
when it comes to EFI measured boot, which is that the kernel measures
the initrd and command line. Intermediate loaders can measure
additional assets if needed, but with the baseline in place, we can
deploy measured boot in a meaningful way even if you boot into Linux
straight from the EFI firmware.
Summary:
- implement EFI boot support for LoongArch
- implement generic EFI compressed boot support for arm64, RISC-V and
LoongArch, none of which implement a decompressor today
- measure the kernel command line into the TPM if measured boot is in
effect
- refactor the EFI stub code in order to isolate DT dependencies for
architectures other than x86
- avoid calling SetVirtualAddressMap() on arm64 if the configured
size of the VA space guarantees that doing so is unnecessary
- move some ARM specific code out of the generic EFI source files
- unmap kernel code from the x86 mixed mode 1:1 page tables"
* tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
efi/arm64: libstub: avoid SetVirtualAddressMap() when possible
efi: zboot: create MemoryMapped() device path for the parent if needed
efi: libstub: fix up the last remaining open coded boot service call
efi/arm: libstub: move ARM specific code out of generic routines
efi/libstub: measure EFI LoadOptions
efi/libstub: refactor the initrd measuring functions
efi/loongarch: libstub: remove dependency on flattened DT
efi: libstub: install boot-time memory map as config table
efi: libstub: remove DT dependency from generic stub
efi: libstub: unify initrd loading between architectures
efi: libstub: remove pointless goto kludge
efi: libstub: simplify efi_get_memory_map() and struct efi_boot_memmap
efi: libstub: avoid efi_get_memory_map() for allocating the virt map
efi: libstub: drop pointless get_memory_map() call
efi: libstub: fix type confusion for load_options_size
arm64: efi: enable generic EFI compressed boot
loongarch: efi: enable generic EFI compressed boot
riscv: efi: enable generic EFI compressed boot
efi/libstub: implement generic EFI zboot
efi/libstub: move efi_system_table global var into separate object
...
This replaces the prior support for Clang's standard Control Flow
Integrity (CFI) instrumentation, which has required a lot of special
conditions (e.g. LTO) and work-arounds. The current implementation
("Kernel CFI") is specific to C, directly designed for the Linux kernel,
and takes advantage of architectural features like x86's IBT. This
series retains arm64 support and adds x86 support. Additional "generic"
architectural support is expected soon:
https://github.com/samitolvanen/llvm-project/commits/kcfi_generic
- treewide: Remove old CFI support details
- arm64: Replace Clang CFI support with Clang KCFI support
- x86: Introduce Clang KCFI support
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmM4aAUWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJkgWD/4mUgb7xewNIG/+fuipGd620Iao
K0T8q4BNxLNRltOxNc3Q0WMDCggX0qJGCeds7EdFQJQOGxWcbifM8MAS4idAGM0G
fc3Gxl1imC/oF6goCAbQgndA6jYFIWXGsv8LsRjAXRidWLFr3GFAqVqYJyokSySr
8zMQsEDuF4I1gQnOhEWdtPZbV3MQ4ZjfFzpv+33agbq6Gb72vKvDh3G6g2VXlxjt
1qnMtS+eEpbBU65cJkOi4MSLgymWbnIAeTMb0dbsV4kJ08YoTl8uz1B+weeH6GgT
WP73ZJ4nqh1kkkT9EqS9oKozNB9fObhvCokEuAjuQ7i1eCEZsbShvRc0iL7OKTGG
UfuTJa5qQ4h7Z0JS35FCSJETa+fcG0lTyEd133nLXLMZP9K2antf+A6O//fd0J1V
Jg4VN7DQmZ+UNGOzRkL6dTtQUy4PkxhniIloaClfSYXxhNirA+v//sHTnTK3z2Bl
6qceYqmFmns2Laual7+lvnZgt6egMBcmAL/MOdbU74+KIR9Xw76wxQjifktHX+WF
FEUQkUJDB5XcUyKlbvHoqobRMxvEZ8RIlC5DIkgFiPRE3TI0MqfzNSFnQ/6+lFNg
Y0AS9HYJmcj8sVzAJ7ji24WPFCXzsbFn6baJa9usDNbWyQZokYeiv7ZPNPHPDVrv
YEBP6aYko0lVSUS9qw==
=Li4D
-----END PGP SIGNATURE-----
Merge tag 'kcfi-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kcfi updates from Kees Cook:
"This replaces the prior support for Clang's standard Control Flow
Integrity (CFI) instrumentation, which has required a lot of special
conditions (e.g. LTO) and work-arounds.
The new implementation ("Kernel CFI") is specific to C, directly
designed for the Linux kernel, and takes advantage of architectural
features like x86's IBT. This series retains arm64 support and adds
x86 support.
GCC support is expected in the future[1], and additional "generic"
architectural support is expected soon[2].
Summary:
- treewide: Remove old CFI support details
- arm64: Replace Clang CFI support with Clang KCFI support
- x86: Introduce Clang KCFI support"
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107048 [1]
Link: https://github.com/samitolvanen/llvm-project/commits/kcfi_generic [2]
* tag 'kcfi-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits)
x86: Add support for CONFIG_CFI_CLANG
x86/purgatory: Disable CFI
x86: Add types to indirectly called assembly functions
x86/tools/relocs: Ignore __kcfi_typeid_ relocations
kallsyms: Drop CONFIG_CFI_CLANG workarounds
objtool: Disable CFI warnings
objtool: Preserve special st_shndx indexes in elf_update_symbol
treewide: Drop __cficanonical
treewide: Drop WARN_ON_FUNCTION_MISMATCH
treewide: Drop function_nocfi
init: Drop __nocfi from __init
arm64: Drop unneeded __nocfi attributes
arm64: Add CFI error handling
arm64: Add types to indirect called assembly functions
psci: Fix the function type for psci_initcall_t
lkdtm: Emit an indirect call for CFI tests
cfi: Add type helper macros
cfi: Switch to -fsanitize=kcfi
cfi: Drop __CFI_ADDRESSABLE
cfi: Remove CONFIG_CFI_CLANG_SHADOW
...
This is necessary because the EFI libstub refactoring patches are mostly
directed at enabling LoongArch to wire up generic EFI boot support
without being forced to consume DT properties that conflict with
information that EFI also provides, e.g., memory map and reservations,
etc.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmMy3RMACgkQw08iOZLZ
jyTTEwwAxIsv4t82EIj0D2Ml37TjmuB1cenbKHqq8c6cK/2xPGk1Hprd0KbpXuWh
hF88DoBeDyZ68RTmncEzwSCu5ZSIlQwPNATaAGn4qIYi6uHpHufM6IUDspYrXJnO
5K6HE0Rc5PKIDVJdA2dqnXLIxrVe5IG1UTHwzqJAi6/eTal5S22Y7lCALa0MjeAZ
fsGEOztztrdoRRY0+H3VStg4oVbMGmMH24N3ODtM5yNS7qqmfKrEvAfGkTC+6wGG
O8klUF9EcluvGNiLT2c2YeCKsqTVKun0K1TYvY8ATjAONqj8zImja9g1wTjtZOgz
rnfS0RGmbrv9X5jkaC7k8KkCGfMWwcuQTnKNYmuzt51bzNQw7tJWltfVMFR3pqN8
+1594fBxLT0llnS9P9qXpZLfjvxhqeQuNMkOQr+gG1E+2h1N9CJQxhhLKLtesLMp
Pm6RHlpc8CrYrmHDL6oPHEg6oYTNe3NmuIVUB71wsl8czGMrU11JpxG/Q4iOtZOB
vA5hkpn2
=IlXm
-----END PGP SIGNATURE-----
Merge tag 'efi-loongarch-for-v6.1-2' into HEAD
Second shared stable tag between EFI and LoongArch trees
This is necessary because the EFI libstub refactoring patches are mostly
directed at enabling LoongArch to wire up generic EFI boot support
without being forced to consume DT properties that conflict with
information that EFI also provides, e.g., memory map and reservations,
etc.
LoongArch does not use FDT or DT natively [yet], and the only reason it
currently uses it is so that it can reuse the existing EFI stub code.
Overloading the DT with data passed between the EFI stub and the core
kernel has been a source of problems: there is the overlap between
information provided by EFI which DT can also provide (initrd base/size,
command line, memory descriptions), requiring us to reason about which
is which and what to prioritize. It has also resulted in ABI leaks,
i.e., internal ABI being promoted to external ABI inadvertently because
the bootloader can set the EFI stub's DT properties as well (e.g.,
"kaslr-seed"). This has become especially problematic with boot
environments that want to pretend that EFI boot is being done (to access
ACPI and SMBIOS tables, for instance) but have no ability to execute the
EFI stub, and so the environment that the EFI stub creates is emulated
[poorly, in some cases].
Another downside of treating DT like this is that the DT binary that the
kernel receives is different from the one created by the firmware, which
is undesirable in the context of secure and measured boot.
Given that LoongArch support in Linux is brand new, we can avoid these
pitfalls, and treat the DT strictly as a hardware description, and use a
separate handover method between the EFI stub and the kernel. Now that
initrd loading and passing the EFI memory map have been refactored into
pure EFI routines that use EFI configuration tables, the only thing we
need to pass directly is the kernel command line (even if we could pass
this via a config table as well, it is used extremely early, so passing
it directly is preferred in this case.)
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
In preparation for removing CC_FLAGS_CFI from CC_FLAGS_LTO, explicitly
filter out CC_FLAGS_CFI in all the makefiles where we currently filter
out CC_FLAGS_LTO.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220908215504.3686827-2-samitolvanen@google.com
Implement a minimal EFI app that decompresses the real kernel image and
launches it using the firmware's LoadImage and StartImage boot services.
This removes the need for any arch-specific hacks.
Note that on systems that have UEFI secure boot policies enabled,
LoadImage/StartImage require images to be signed, or their hashes known
a priori, in order to be permitted to boot.
There are various possible strategies to work around this requirement,
but they all rely either on overriding internal PI/DXE protocols (which
are not part of the EFI spec) or omitting the firmware provided
LoadImage() and StartImage() boot services, which is also undesirable,
given that they encapsulate platform specific policies related to secure
boot and measured boot, but also related to memory permissions (whether
or not and which types of heap allocations have both write and execute
permissions.)
The only generic and truly portable way around this is to simply sign
both the inner and the outer image with the same key/cert pair, so this
is what is implemented here.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
To avoid pulling in the wrong object when using the libstub static
library to build the decompressor, define efi_system_table in a separate
compilation unit.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The stub is used in different execution environments, but on arm64,
RISC-V and LoongArch, we still use the core kernel's implementation of
memcpy and memset, as they are just a branch instruction away, and can
generally be reused even from code such as the EFI stub that runs in a
completely different address space.
KAsan complicates this slightly, resulting in the need for some hacks to
expose the uninstrumented, __ prefixed versions as the normal ones, as
the latter are instrumented to include the KAsan checks, which only work
in the core kernel.
Unfortunately, #define'ing memcpy to __memcpy when building C code does
not guarantee that no explicit memcpy() calls will be emitted. And with
the upcoming zboot support, which consists of a separate binary which
therefore needs its own implementation of memcpy/memset anyway, it's
better to provide one explicitly instead of linking to the existing one.
Given that EFI exposes implementations of memmove() and memset() via the
boot services table, let's wire those up in the appropriate way, and
drop the references to the core kernel ones.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
This patch adds efistub booting support, which is the standard UEFI boot
protocol for LoongArch to use.
We use generic efistub, which means we can pass boot information (i.e.,
system table, memory map, kernel command line, initrd) via a light FDT
and drop a lot of non-standard code.
We use a flat mapping to map the efi runtime in the kernel's address
space. In efi, VA = PA; in kernel, VA = PA + PAGE_OFFSET. As a result,
flat mapping is not identity mapping, SetVirtualAddressMap() is still
needed for the efi runtime.
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
[ardb: change fpic to fpie as suggested by Xi Ruoyao]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
The EFI stub is a wrapper around the core kernel that makes it look like
a EFI compatible PE/COFF application to the EFI firmware. EFI
applications run on top of the EFI runtime, which is heavily based on
so-called protocols, which are struct types consisting [mostly] of
function pointer members that are instantiated and recorded in a
protocol database.
These structs look like the ideal randomization candidates to the
randstruct plugin (as they only carry function pointers), but of course,
these protocols are contracts between the firmware that exposes them,
and the EFI applications (including our stubbed kernel) that invoke
them. This means that struct randomization for EFI protocols is not a
great idea, and given that the stub shares very little data with the
core kernel that is represented as a randomizable struct, we're better
off just disabling it completely here.
Cc: <stable@vger.kernel.org> # v4.14+
Reported-by: Daniel Marth <daniel.marth@inso.tuwien.ac.at>
Tested-by: Daniel Marth <daniel.marth@inso.tuwien.ac.at>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
When cross compiling x86 on an ARM machine with clang, there are several
errors along the lines of:
arch/x86/include/asm/page_64.h:52:7: error: invalid output constraint '=D' in asm
This happens because the x86 flags in the EFI stub are not derived from
KBUILD_CFLAGS like the other architectures are and the clang flags that
set the target architecture ('--target=') and the path to the GNU cross
tools ('--prefix=') are not present, meaning that the host architecture
is targeted.
These flags are available as $(CLANG_FLAGS) from the main Makefile so
add them to the cflags for x86 so that cross compiling works as expected.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lkml.kernel.org/r/20210326000435.4785-4-nathan@kernel.org
With CONFIG_LTO_CLANG, we produce LLVM bitcode instead of ELF object
files. Since LTO is not really needed here and the Makefile assumes we
produce an object file, disable LTO for libstub.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201211184633.3213045-13-samitolvanen@google.com
This contains a handful of cleanups and new features, including:
* A handful of cleanups for our page fault handling.
* Improvements to how we fill out cacheinfo.
* Support for EFI-based systems.
---
This contains a merge from the EFI tree that was necessary as some of the EFI
support landed over there. It's my first time doing something like this,
I haven't included the set_fs stuff because the base branch it depends on
hasn't been merged yet. I'll probably have another merge window PR, as
there's more in flight (most notably the fix for new binutils I just sent out),
but I figured there was no reason to delay this any longer.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl+KQ6gTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYibmwD/4qWfOW7R/kUWi08ethcaAhNEWLvqIh
2/KjGLORw+NTZ1F4pEFyQG5LRd3yWDT/UXh/k8gXINqmdclNV01Z3T+O7WuRlISs
07i26W1qRpNeJ7lDVhr9foKpeOU/AXvidgoF330nGlyO4HZkYKhK2yB3t8uGWywr
Zt/EpMJeBIRKzWiLhOgLAdYJthhZ9AlnouNnr9myHnO5Ksel+AZ/BKYvn7ZbHMns
6vFUxp6392/LERRRIfDqPsTuxPIYMHjuEsGSESLsjAIyq/shgN1knG/C+zwU5DcK
zUDBt1DEP7Tb45w7VBASSjn1M+cUolz9/c2dBhlVcdBlk1GKF+KILSTmWUBpQ8oP
ETVAuQK5HTcjy9bVcJMj0Oa3mFshVAAByOH+Wyrdo+qSLkb7y3spPvsL4dyjrKjL
+pe6C7WvavaEFoQXVWO2sTUBGYt7qDLRdrDgOGBIHylTXhTxf2wYzAF4ZmDROECT
Qfc7Ac3aIWYvWDmxE+x8OniuclfZ0DndKLKQj6FJWUTIxFZzTxsHK75d47D1ID0S
ZwAmUd0eYjjwMTO/6AM/Aqu3o8IP4GOXjJf4ijxH9+LjpUhm/ibmHDAUY69sU1WX
kdX51gQzoEuW7XMVz1HoTSvaGGKtyFDuRxs8RG/tSFaRtznRz0Sro6BpLCeG968n
k/d5WL/vZZ/NDA==
=FYs/
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.10-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
"A handful of cleanups and new features:
- A handful of cleanups for our page fault handling
- Improvements to how we fill out cacheinfo
- Support for EFI-based systems"
* tag 'riscv-for-linus-5.10-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (22 commits)
RISC-V: Add page table dump support for uefi
RISC-V: Add EFI runtime services
RISC-V: Add EFI stub support.
RISC-V: Add PE/COFF header for EFI stub
RISC-V: Implement late mapping page table allocation functions
RISC-V: Add early ioremap support
RISC-V: Move DT mapping outof fixmap
RISC-V: Fix duplicate included thread_info.h
riscv/mm/fault: Set FAULT_FLAG_INSTRUCTION flag in do_page_fault()
riscv/mm/fault: Fix inline placement in vmalloc_fault() declaration
riscv: Add cache information in AUX vector
riscv: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
riscv: Set more data to cacheinfo
riscv/mm/fault: Move access error check to function
riscv/mm/fault: Move FAULT_FLAG_WRITE handling in do_page_fault()
riscv/mm/fault: Simplify mm_fault_error()
riscv/mm/fault: Move fault error handling to mm_fault_error()
riscv/mm/fault: Simplify fault error handling
riscv/mm/fault: Move vmalloc fault handling to vmalloc_fault()
riscv/mm/fault: Move bad area handling to bad_area()
...
because the heuristics that various linkers & compilers use to handle them
(include these bits into the output image vs discarding them silently)
are both highly idiosyncratic and also version dependent.
Instead of this historically problematic mess, this tree by Kees Cook (et al)
adds build time asserts and build time warnings if there's any orphan section
in the kernel or if a section is not sized as expected.
And because we relied on so many silent assumptions in this area, fix a metric
ton of dependencies and some outright bugs related to this, before we can
finally enable the checks on the x86, ARM and ARM64 platforms.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+Edv4RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hiKBAApdJEOaK7hMc3013DYNctklIxEPJL2mFJ
11YJRIh4pUJTF0TE+EHT/D+rSIuRsyuoSmOQBQ61/wVSnyG067GjjVJRqh/eYaJ1
fDhJi2FuHOjXl+CiN0KxzBjjp+V4NhF7jHT59tpQSvfZeg7FjteoxfztxaCp5ek3
S3wHB3CC4c4jE3lfjHem1E9/PwT4kwPYx1c3gAUdEqJdjkihjX9fWusfjLeqW6/d
Y5VkApi6bL9XiZUZj5l0dEIweLJJ86+PkKJqpo3spxxEak1LSn1MEix+lcJ8e1Kg
sb/bEEivDcmFlFWOJnn0QLquCR0Cx5bz1pwsL0tuf0yAd4+sXX5IMuGUysZlEdKM
BHL9h5HbevGF4BScwZwZH7lyEg7q67s5KnRu4hxy0Swfcj7y0oT/9lXqpbpZ2DqO
Hd+bRRQKIbqnTMp0hcit9LfpLp93vj0dBlaV5ocAJJlu62u9VnwGG5HQuZ5giLUr
kA1SLw63Y1wopFRxgFyER8les7eLsu0zxHeK44rRVlVnfI99OMTOgVNicmDFy3Fm
AfcnfJG0BqBEJGQz5es34uQQKKBwFPtC9NztopI62KiwOspYYZyrO1BNxdOc6DlS
mIHrmO89HMXuid5eolvLaFqUWirHoWO8TlycgZxUWVHc2txVPjAEU/axouU/dSSU
w/6GpzAa+7g=
=fXAw
-----END PGP SIGNATURE-----
Merge tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull orphan section checking from Ingo Molnar:
"Orphan link sections were a long-standing source of obscure bugs,
because the heuristics that various linkers & compilers use to handle
them (include these bits into the output image vs discarding them
silently) are both highly idiosyncratic and also version dependent.
Instead of this historically problematic mess, this tree by Kees Cook
(et al) adds build time asserts and build time warnings if there's any
orphan section in the kernel or if a section is not sized as expected.
And because we relied on so many silent assumptions in this area, fix
a metric ton of dependencies and some outright bugs related to this,
before we can finally enable the checks on the x86, ARM and ARM64
platforms"
* tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/boot/compressed: Warn on orphan section placement
x86/build: Warn on orphan section placement
arm/boot: Warn on orphan section placement
arm/build: Warn on orphan section placement
arm64/build: Warn on orphan section placement
x86/boot/compressed: Add missing debugging sections to output
x86/boot/compressed: Remove, discard, or assert for unwanted sections
x86/boot/compressed: Reorganize zero-size section asserts
x86/build: Add asserts for unwanted sections
x86/build: Enforce an empty .got.plt section
x86/asm: Avoid generating unused kprobe sections
arm/boot: Handle all sections explicitly
arm/build: Assert for unwanted sections
arm/build: Add missing sections
arm/build: Explicitly keep .ARM.attributes sections
arm/build: Refactor linker script headers
arm64/build: Assert for unwanted sections
arm64/build: Add missing DWARF sections
arm64/build: Use common DISCARDS in linker script
arm64/build: Remove .eh_frame* sections due to unwind tables
...
Add a RISC-V architecture specific stub code that actually copies the
actual kernel image to a valid address and jump to it after boot services
are terminated. Enable UEFI related kernel configs as well for RISC-V.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Link: https://lore.kernel.org/r/20200421033336.9663-4-atish.patra@wdc.com
[ardb: - move hartid fetch into check_platform_features()
- use image_size not reserve_size
- select ISA_C
- do not use dram_base]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
TEXT_OFFSET serves no purpose, and for this reason, it was redefined
as 0x0 in the v5.8 timeframe. Since this does not appear to have caused
any issues that require us to revisit that decision, let's get rid of the
macro entirely, along with any references to it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200825135440.11288-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
In preparation for adding --orphan-handling=warn to more architectures,
disable -mbranch-protection, as EFI does not yet support it[1]. This was
noticed due to it producing unwanted .note.gnu.property sections (prefixed
with .init due to the objcopy build step).
However, we must also work around a bug in Clang where the section is
still emitted for code-less object files[2], so also remove the section
during the objcopy.
[1] https://lore.kernel.org/lkml/CAMj1kXHck12juGi=E=P4hWP_8vQhQ+-x3vBMc3TGeRWdQ-XkxQ@mail.gmail.com
[2] https://bugs.llvm.org/show_bug.cgi?id=46480
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200821194310.3089815-8-keescook@chromium.org
Eliminate all GOT entries in the decompressor binary, by forcing hidden
visibility for all symbol references, which informs the compiler that
such references will be resolved at link time without the need for
allocating GOT entries.
To ensure that no GOT entries will creep back in, add an assertion to
the decompressor linker script that will fire if the .got section has
a non-zero size.
[Arvind: move hidden.h to include/linux instead of making a copy]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200731230820.1742553-3-keescook@chromium.org