Implement the AT_MINSIGSTKSZ AUXV entry, allowing userspace to
dynamically size stack allocations in a manner forward-compatible with
new processor state saved in the signal frame
For now these statically find the maximum signal frame size rather than
doing any runtime testing of features to minimise the size.
glibc 2.34 will take advantage of this, as will applications that use
use _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
References: 94b07c1f8c ("arm64: signal: Report signal frame size to userspace via auxv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220307182734.289289-2-npiggin@gmail.com
The PowerPC vDSO uses $(CC) to link, which differs from the rest of the
kernel, which uses $(LD) directly. As a result, the default linker of
the compiler is used, which may differ from the linker requested by the
builder. For example:
$ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/
...
$ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg
File: arch/powerpc/kernel/vdso/vdso32.so.dbg
String dump of section '.comment':
[ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
File: arch/powerpc/kernel/vdso/vdso64.so.dbg
String dump of section '.comment':
[ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
LLVM=1 sets LD=ld.lld but ld.lld is not used to link the vDSO; GNU ld is
because "ld" is the default linker for clang on most Linux platforms.
This is a problem for Clang's Link Time Optimization as implemented in
the kernel because use of GNU ld with LTO requires the LLVMgold plugin,
which is not technically supported for ld.bfd per
https://llvm.org/docs/GoldPlugin.html. Furthermore, if LLVMgold.so is
missing from a user's system, the build will fail, even though LTO as it
is implemented in the kernel requires ld.lld to avoid this dependency in
the first place.
Ultimately, the PowerPC vDSO should be converted to compiling and
linking with $(CC) and $(LD) respectively but there were issues last
time this was tried, potentially due to older but supported tool
versions. To avoid regressing GCC + binutils, use the compiler option
'-fuse-ld', which tells the compiler which linker to use when it is
invoked as both the compiler and linker. Use '-fuse-ld=lld' when
LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is
linked with the same linker as the rest of the kernel.
$ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg
File: arch/powerpc/kernel/vdso/vdso32.so.dbg
String dump of section '.comment':
[ 0] Linker: LLD 14.0.0
[ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
File: arch/powerpc/kernel/vdso/vdso64.so.dbg
String dump of section '.comment':
[ 0] Linker: LLD 14.0.0
[ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
LD can be a full path to ld.lld, which will not be handled properly by
'-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's
search path. '-fuse-ld' can take a path to the linker but it is
deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario.
Use '--ld-path' if it is supported, as it will handle a full path or
just 'ld.lld' properly. See the LLVM commit below for the full details
of '--ld-path'.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/774
Link: 1bc5c84710
Link: https://lore.kernel.org/r/20220511185001.3269404-3-nathan@kernel.org
When linking vdso{32,64}.so.dbg with ld.lld, there is a warning about
not finding _start for the starting address:
ld.lld: warning: cannot find entry symbol _start; not setting start address
ld.lld: warning: cannot find entry symbol _start; not setting start address
Looking at GCC + GNU ld, the entry point address is 0x0:
$ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
File: vdso32.so.dbg
Entry point address: 0x0
File: vdso64.so.dbg
Entry point address: 0x0
This matches what ld.lld emits:
$ powerpc64le-linux-gnu-readelf -p .comment vdso{32,64}.so.dbg
File: vdso32.so.dbg
String dump of section '.comment':
[ 0] Linker: LLD 14.0.0
[ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
File: vdso64.so.dbg
String dump of section '.comment':
[ 0] Linker: LLD 14.0.0
[ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
$ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
File: vdso32.so.dbg
Entry point address: 0x0
File: vdso64.so.dbg
Entry point address: 0x0
Remove ENTRY to remove the warning, as it is unnecessary for the vDSO to
function correctly.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220511185001.3269404-2-nathan@kernel.org
When the mmu_feature_keys[] was introduced in the commit c12e6f24d4
("powerpc: Add option to use jump label for mmu_has_feature()"),
it is unlikely that it would be used either directly or indirectly in
the out of tree modules. So we exported it as GPL only.
But with the evolution of the codes, especially the PPC_KUAP support, it
may be indirectly referenced by some primitive macro or inline functions
such as get_user() or __copy_from_user_inatomic(), this will make it
impossible to build many non GPL modules (such as ZFS) on ppc
architecture. Fix this by exposing the mmu_feature_keys[] to the non-GPL
modules too.
Fixes: 7613f5a66b ("powerpc/64s/kuap: Use mmu_has_feature()")
Reported-by: Nathaniel Filardo <nwfilardo@gmail.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220329085709.4132729-1-haokexin@gmail.com
The panic notifiers infrastructure is a bit limited in the scope of
the callbacks - basically every kind of functionality is dropped
in a list that runs in the same point during the kernel panic path.
This is not really on par with the complexities and particularities
of architecture / hypervisors' needs, and a refactor is ongoing.
As part of this refactor, it was observed that powerpc has 2 notifiers,
with mixed goals: one is just a KASLR offset dumper, whereas the other
aims to hard-disable IRQs (necessary on panic path), warn firmware of
the panic event (fadump) and run low-level platform-specific machinery
that might stop kernel execution and never come back.
Clearly, the 2nd notifier has opposed goals: disable IRQs / fadump
should run earlier while low-level platform actions should
run late since it might not even return. Hence, this patch decouples
the notifiers splitting them in three:
- First one is responsible for hard-disable IRQs and fadump,
should run early;
- The kernel KASLR offset dumper is really an informative notifier,
harmless and may run at any moment in the panic path;
- The last notifier should run last, since it aims to perform
low-level actions for specific platforms, and might never return.
It is also only registered for 2 platforms, pseries and ps3.
The patch better documents the notifiers and clears the code too,
also removing a useless header.
Currently no functionality change should be observed, but after
the planned panic refactor we should expect more panic reliability
with this patch.
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220427224924.592546-9-gpiccoli@igalia.com
LoPAPR defines guest visible IOMMU with hypercalls to use it -
H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap
in the KVM in the real mode (with MMU off). The problem with the real mode
is some memory is not available and some API usage crashed the host but
enabling MMU was an expensive operation.
The problems with the real mode handlers are:
1. Occasionally these cannot complete the request so the code is
copied+modified to work in the virtual mode, very little is shared;
2. The real mode handlers have to be linked into vmlinux to work;
3. An exception in real mode immediately reboots the machine.
If the small DMA window is used, the real mode handlers bring better
performance. However since POWER8, there has always been a bigger DMA
window which VMs use to map the entire VM memory to avoid calling
H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT
(a bulk version of H_PUT_TCE) which virtual mode handler is even closer
to its real mode version.
On POWER9 hypercalls trap straight to the virtual mode so the real mode
handlers never execute on POWER9 and later CPUs.
So with the current use of the DMA windows and MMU improvements in
POWER9 and later, there is no point in duplicating the code.
The 32bit passed through devices may slow down but we do not have many
of these in practice. For example, with this applied, a 1Gbit ethernet
adapter still demostrates above 800Mbit/s of actual throughput.
This removes the real mode handlers from KVM and related code from
the powernv platform.
This updates the list of implemented hcalls in KVM-HV as the realmode
handlers are removed.
This changes ABI - kvmppc_h_get_tce() moves to the KVM module and
kvmppc_find_table() is static now.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32-bit big
endian mode (MSR[SF,LE] unset).
The change in MSR is done in enter_rtas() in a relatively complex way,
since the MSR value could be hardcoded.
Furthermore, a panic has been reported when hitting the watchdog interrupt
while running in RTAS, this leads to the following stack trace:
watchdog: CPU 24 Hard LOCKUP
watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago)
...
Supported: No, Unreleased kernel
CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default)
MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020
CFAR: 000000000000011c IRQMASK: 1
GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
NIP [000000001fb41050] 0x1fb41050
LR [000000001fb4104c] 0x1fb4104c
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Oops: Unrecoverable System Reset, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
...
Supported: No, Unreleased kernel
CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default)
MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020
CFAR: 000000000000011c IRQMASK: 1
GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
NIP [000000001fb41050] 0x1fb41050
LR [000000001fb4104c] 0x1fb4104c
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 3ddec07f638c34a2 ]---
This happens because MSR[RI] is unset when entering RTAS but there is no
valid reason to not set it here.
RTAS is expected to be called with MSR[RI] as specified in PAPR+ section
"7.2.1 Machine State":
R1–7.2.1–9. If called with MSR[RI] equal to 1, then RTAS must protect
its own critical regions from recursion by setting the MSR[RI] bit to
0 when in the critical regions.
Fixing this by reviewing the way MSR is compute before calling RTAS. Now a
hardcoded value meaning real mode, 32 bits big endian mode and Recoverable
Interrupt is loaded. In the case MSR[S] is set, it will remain set while
entering RTAS as only urfid can unset it (thanks Fabiano).
In addition a check is added in do_enter_rtas() to detect calls made with
MSR[RI] unset, as we are forcing it on later.
This patch has been tested on the following machines:
Power KVM Guest
P8 S822L (host Ubuntu kernel 5.11.0-49-generic)
PowerVM LPAR
P8 9119-MME (FW860.A1)
p9 9008-22L (FW950.00)
P10 9080-HEX (FW1010.00)
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220504101244.12107-1-ldufour@linux.ibm.com
- Fix the DWARF CFI in our VDSO time functions, allowing gdb to backtrace through them
correctly.
- Fix a buffer overflow in the papr_scm driver, only triggerable by hypervisor input.
- A fix in the recently added QoS handling for VAS (used for communicating with
coprocessors).
Thanks to: Alan Modra, Haren Myneni, Kajol Jain, Segher Boessenkool.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmJ3r7kTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgM1LEACU87gXiSJJW99qD5IvoPncR1nRjd+3
zTAch9Dk70Kt/1Zkn0bEGjkua6YwcOdlA7QXiHBR2HAYli85VWK/w9Vz/TLTbL9/
SrDvSfzwmqbvU61A2ZppvH487z8pfEBmviv3SCrmB3xZWtttYkatEj4A1EjdBJUI
+xq0JgrXj8rAayRpkJS5XEUpkw8eXJ85X1WXdx+peIGvcRKB+n46HyCYhsl3/YVv
pAn6jcnNnPqYzXeE0sQ4ZpybzCkzqVHC4SyCemkp8PWfqyL8LqgIvz2qtCvXAzij
SJikxmHUN3XvHCF4aOgJG6OTkMEz92cpH0hGsb59c4usPCNlRFMuqhJxuYYmNGT3
FtDUrrptsQ5ZRdWttQzi0RFjK0klro5VKvMJRv7uDWIlaPwl3Vi8DxvSrzCSdQPE
Q1XsJS7B/io1JCCzrhrJyRQkIt+Z9e1/3xm618ide1UKQkqVYApZcImJDk7DlDCV
QL1mtqHgasQjDLRkQq/fil/vyt2byw2uzCy6j0X/WQYrKJlUmbXKlcbdbbeCH2qZ
8NndD96wlw+dd/q2u/ZdngQ8f/68n6+a/Zxv7eAbb01JY04VFsCXgdgGHpzhBiJ5
YWqp5qzzuTcvhSYhyfdqFd9KCxxGRIag6jmhs6vbfQbMlZMoo9Jr4vaVm1aIW27t
MRhuEwi2KzQI2A==
=sVeX
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix the DWARF CFI in our VDSO time functions, allowing gdb to
backtrace through them correctly.
- Fix a buffer overflow in the papr_scm driver, only triggerable by
hypervisor input.
- A fix in the recently added QoS handling for VAS (used for
communicating with coprocessors).
Thanks to Alan Modra, Haren Myneni, Kajol Jain, and Segher Boessenkool.
* tag 'powerpc-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE
powerpc/vdso: Fix incorrect CFI in gettimeofday.S
powerpc/pseries/vas: Use QoS credits from the userspace
__do_irq() inconditionnaly calls ppc_md.get_irq()
That's definitely a hot path.
At the time being ppc_md.get_irq address is read every time
from ppc_md structure.
Replace that call by a static call, and initialise that
call after ppc_md.init_IRQ() has set ppc_md.get_irq.
Emit a warning and don't set the static call if ppc_md.init_IRQ()
is still NULL, that way the kernel won't blow up if for some
reason ppc_md.get_irq() doesn't get properly set.
With the patch:
00000000 <__SCT__ppc_get_irq>:
0: 48 00 00 20 b 20 <__static_call_return0> <== Replaced by 'b <ppc_md.get_irq>' at runtime
...
00000020 <__static_call_return0>:
20: 38 60 00 00 li r3,0
24: 4e 80 00 20 blr
...
00000058 <__do_irq>:
...
64: 48 00 00 01 bl 64 <__do_irq+0xc>
64: R_PPC_REL24 __SCT__ppc_get_irq
68: 2c 03 00 00 cmpwi r3,0
...
Before the patch:
00000038 <__do_irq>:
...
3c: 3d 20 00 00 lis r9,0
3e: R_PPC_ADDR16_HA ppc_md+0x1c
...
44: 81 29 00 00 lwz r9,0(r9)
46: R_PPC_ADDR16_LO ppc_md+0x1c
...
4c: 7d 29 03 a6 mtctr r9
50: 4e 80 04 21 bctrl
54: 2c 03 00 00 cmpwi r3,0
...
On PPC64 which doesn't implement static calls yet we get:
00000000000000d0 <__do_irq>:
...
dc: 00 00 22 3d addis r9,r2,0
dc: R_PPC64_TOC16_HA .data+0x8
...
e4: 00 00 89 e9 ld r12,0(r9)
e4: R_PPC64_TOC16_LO_DS .data+0x8
...
f0: a6 03 89 7d mtctr r12
f4: 18 00 41 f8 std r2,24(r1)
f8: 21 04 80 4e bctrl
fc: 18 00 41 e8 ld r2,24(r1)
...
So on PPC64 that's similar to what we get without static calls.
But at least until ppc_md.get_irq() is set the call is to
__static_call_return0.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/afb92085f930651d8b1063e4d4bf0396c80ebc7d.1647002274.git.christophe.leroy@csgroup.eu
Add fn and fn_arg members into struct kernel_clone_args and test for
them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER).
This allows any task that wants to be a user space task that only runs
in kernel mode to use this functionality.
The code on x86 is an exception and still retains a PF_KTHREAD test
because x86 unlikely everything else handles kthreads slightly
differently than user space tasks that start with a function.
The functions that created tasks that start with a function
have been updated to set ".fn" and ".fn_arg" instead of
".stack" and ".stack_size". These functions are fork_idle(),
create_io_thread(), kernel_thread(), and user_mode_thread().
Link: https://lkml.kernel.org/r/20220506141512.516114-4-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
With io_uring we have started supporting tasks that are for most
purposes user space tasks that exclusively run code in kernel mode.
The kernel task that exec's init and tasks that exec user mode
helpers are also user mode tasks that just run kernel code
until they call kernel execve.
Pass kernel_clone_args into copy_thread so these oddball
tasks can be supported more cleanly and easily.
v2: Fix spelling of kenrel_clone_args on h8300
Link: https://lkml.kernel.org/r/20220506141512.516114-2-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Move pci_device_from_OF_node() in pci64.c because it needs definition
of struct device_node and is not worth inlining.
ppc32.c already has it in pci32.c.
That way pci-bridge.h doesn't need linux/of.h (Brought by asm/prom.h
via asm/pci.h)
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3c88286b55413730d7784133993a46ef4a3607ce.1646767214.git.christophe.leroy@csgroup.eu
emulate_step() instruction emulation including sc instruction emulation
initially appeared in xmon. It was then moved into sstep.c where kprobes
could use it too, and later hw_breakpoint and uprobes started to use it.
Until uprobes, the only instruction emulation users were for kernel
mode instructions.
- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.
At one point, there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there were any in-tree users. So system call emulation is not required
by the above users.
uprobes uses emulate_step and it appears possible to emulate sc
instruction in userspace. Userspace system call emulation is broken and
it's not clear it ever worked well.
The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the interrupt
return code takes the kernel->kernel restore path, which does not
restore everything as the system call interrupt handler would expect
coming from userspace. regs->iamr appears to get lost for example,
because the kernel->kernel return does not restore the user iamr.
Accounting such as irqflags tracing and CPU accounting does not get
flipped back to user mode as the system call handler expects, so those
appear to enter the kernel twice without returning to userspace.
These things may be individually fixable with various complication, but
it is a big complexity for unclear real benefit.
Furthermore, it is not possible to single step a system call instruction
since it causes an interrupt. As such, a separate patch disables probing
on system call instructions.
This patch removes system call emulation and disables stepping system
calls.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[minor commit log edit, and also get rid of '#ifdef CONFIG_PPC64']
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a412e3b3791ed83de18704c8d90f492e7a0049c0.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
Per the ISA, a Trace interrupt is not generated for:
- [h|u]rfi[d]
- rfscv
- sc, scv, and Trap instructions that trap
- Power-Saving Mode instructions
- other instructions that cause interrupts (other than Trace interrupts)
- the first instructions of any interrupt handler (applies to Branch and Single Step tracing;
CIABR matches may still occur)
- instructions that are emulated by software
Add a helper to check for instructions belonging to the first four
categories above and to reject kprobes, uprobes and xmon breakpoints on
such instructions. We reject probing on instructions belonging to these
categories across all ISA versions and across both BookS and BookE.
For trap instructions, we can't know in advance if they can cause a
trap, and there is no good reason to allow probing on those. Also,
uprobes already refuses to probe trap instructions and kprobes does not
allow probes on trap instructions used for kernel warnings and bugs. As
such, stop allowing any type of probes/breakpoints on trap instruction
across uprobes, kprobes and xmon.
For some of the fp/altivec instructions that can generate an interrupt
and which we emulate in the kernel (altivec assist, for example), we
check and turn off single stepping in emulate_single_step().
Instructions generating a DSI are restarted and single stepping normally
completes once the instruction is completed.
In uprobes, if a single stepped instruction results in a non-fatal
signal to be delivered to the task, such signals are "delayed" until
after the instruction completes. For fatal signals, single stepping is
cancelled and the instruction restarted in-place so that core dump
captures proper addresses.
In kprobes, we do not allow probes on instructions having an extable
entry and we also do not allow probing interrupt vectors.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f56ee979d50b8711fae350fc97870f3ca34acd75.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
arch_randomize_brk() is only needed for hash on book3s/64, for other
platforms the one provided by the default mmap layout is good enough.
Move it to hash_utils.c and use randomize_page() like the generic one.
And properly opt out the radix case instead of making an assumption
on mmu_highuser_ssize.
Also change to a 32M range like most other architectures instead of 8M.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eafa4d18ec8ac7b98dd02b40181e61643707cc7c.1649523076.git.christophe.leroy@csgroup.eu
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmJlxloeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGBoIH/3b1GGuTBq8XndVl
1EaCJe/3auE8cHklNpLyTWsQY7He9CcIe4b0fGmtkUlwqAE6E5fUPfqwzjjb3eux
MDPYYKbnjm/jA73dbTbr3lxMlc/caZpuRrwwuek0+vS0DLYhP917NmDvGX8q3l5U
84RCEHztrTmzOivS0BwNJV1XFpcqnODTDN4zNR43o9ZY9tVY4/OqL0+lcQIHM2Nh
6urEzWMMi+BRGaOqdgtt+NxmgKQNTRAkPan6FpJloRSxrOzl4LiYBKMfE2iyisND
91i1CGvOQjaQNIw9JYNvtSWGawMb+obTyCFHyj1Qm7LwD0VAZ+FQrvrdGz4rJrAY
Sq01XYs=
=Rl6l
-----END PGP SIGNATURE-----
Merge tag 'v5.18-rc4' into next
Merge master into next, to bring in commit 5f24d5a579 ("mm, hugetlb:
allow for "high" userspace addresses"), which is needed as a
prerequisite for the series converting powerpc to the generic mmap
logic.
As reported by Alan, the CFI (Call Frame Information) in the VDSO time
routines is incorrect since commit ce7d8056e3 ("powerpc/vdso: Prepare
for switching VDSO to generic C implementation.").
DWARF has a concept called the CFA (Canonical Frame Address), which on
powerpc is calculated as an offset from the stack pointer (r1). That
means when the stack pointer is changed there must be a corresponding
CFI directive to update the calculation of the CFA.
The current code is missing those directives for the changes to r1,
which prevents gdb from being able to generate a backtrace from inside
VDSO functions, eg:
Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
(gdb) bt
#0 0x00007ffff7f804dc in __kernel_clock_gettime ()
#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
#2 0x00007fffffffd960 in ?? ()
#3 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
Backtrace stopped: frame did not save the PC
Alan helpfully describes some rules for correctly maintaining the CFI information:
1) Every adjustment to the current frame address reg (ie. r1) must be
described, and exactly at the instruction where r1 changes. Why?
Because stack unwinding might want to access previous frames.
2) If a function changes LR or any non-volatile register, the save
location for those regs must be given. The CFI can be at any
instruction after the saves up to the point that the reg is
changed.
(Exception: LR save should be described before a bl. not after)
3) If asychronous unwind info is needed then restores of LR and
non-volatile regs must also be described. The CFI can be at any
instruction after the reg is restored up to the point where the
save location is (potentially) trashed.
Fix the inability to backtrace by adding CFI directives describing the
changes to r1, ie. satisfying rule 1.
Also change the information for LR to point to the copy saved on the
stack, not the value in r0 that will be overwritten by the function
call.
Finally, add CFI directives describing the save/restore of r2.
With the fix gdb can correctly back trace and navigate up and down the stack:
Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
(gdb) bt
#0 0x00007ffff7f804dc in __kernel_clock_gettime ()
#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
#2 0x0000000100015b60 in gettime ()
#3 0x000000010000c8bc in print_long_format ()
#4 0x000000010000d180 in print_current_files ()
#5 0x00000001000054ac in main ()
(gdb) up
#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
(gdb)
#2 0x0000000100015b60 in gettime ()
(gdb)
#3 0x000000010000c8bc in print_long_format ()
(gdb)
#4 0x000000010000d180 in print_current_files ()
(gdb)
#5 0x00000001000054ac in main ()
(gdb)
Initial frame selected; you cannot go up.
(gdb) down
#4 0x000000010000d180 in print_current_files ()
(gdb)
#3 0x000000010000c8bc in print_long_format ()
(gdb)
#2 0x0000000100015b60 in gettime ()
(gdb)
#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
(gdb)
#0 0x00007ffff7f804dc in __kernel_clock_gettime ()
(gdb)
Fixes: ce7d8056e3 ("powerpc/vdso: Prepare for switching VDSO to generic C implementation.")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: Alan Modra <amodra@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/r/20220502125010.1319370-1-mpe@ellerman.id.au
__setup() handlers should return 1 to obsolete_checksetup() in
init/main.c to indicate that the boot option has been handled.
A return of 0 causes the boot option/value to be listed as an Unknown
kernel parameter and added to init's (limited) argument or environment
strings.
Also, error return codes don't mean anything to obsolete_checksetup() --
only non-zero (usually 1) or zero. So return 1 from powersave_off().
Fixes: 302eca184f ("[POWERPC] cell: use ppc_md->power_save instead of cbe_idle_loop")
Reported-by: Igor Zhbanov <izh1979@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220502192925.19954-1-rdunlap@infradead.org
These one line of code don't meet the kernel coding style, so remove the
redundant space.
Signed-off-by: maqiang <maqianga@uniontech.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210303115710.30886-1-maqianga@uniontech.com
The simple_strtoull() function is deprecated in some situation, since
it does not check for the range overflow, use kstrtoull() instead.
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526092020.554341-1-chenhuang5@huawei.com
The sparse tool complains as follow:
arch/powerpc/kernel/btext.c:48:5: warning:
symbol 'boot_text_mapped' was not declared. Should it be static?
This symbol is not used outside of btext.c, so this commit make
it static.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408011801.557004-3-yukuai3@huawei.com
Fixes gcc '-Wunused-but-set-variable' warning:
arch/powerpc/kernel/btext.c:49:12: error: 'force_printk_to_btext'
defined but not used.
It is never used, and so can be removed.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408011801.557004-2-yukuai3@huawei.com
We found these warnings in arch/powerpc/kernel/time.c as follows:
warning: symbol 'decrementer_max' was not declared. Should it be static?
warning: symbol 'rtc_lock' was not declared. Should it be static?
warning: symbol 'dtl_consumer' was not declared. Should it be static?
Declare 'decrementer_max' in powerpc asm/time.h.
Include linux/mc146818rtc.h in powerpc kernel/time.c where 'rtc_lock' is
declared. And remove duplicated declaration of 'rtc_lock' in powerpc
platforms/chrp/time.c because it has included linux/mc146818rtc.h.
Move 'dtl_consumer' definition after "include <asm/dtl.h>" because it is
declared there.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: He Ying <heying24@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210324090939.143477-1-heying24@huawei.com
Patch series "Convert vmcore to use an iov_iter", v5.
For some reason several people have been sending bad patches to fix
compiler warnings in vmcore recently. Here's how it should be done.
Compile-tested only on x86. As noted in the first patch, s390 should take
this conversion a bit further, but I'm not inclined to do that work
myself.
This patch (of 3):
Instead of passing in a 'buf' and 'userbuf' argument, pass in an iov_iter.
s390 needs more work to pass the iov_iter down further, or refactor, but
I'd be more comfortable if someone who can test on s390 did that work.
It's more convenient to convert the whole of read_from_oldmem() to take an
iov_iter at the same time, so rename it to read_from_oldmem_iter() and add
a temporary read_from_oldmem() wrapper that creates an iov_iter.
Link: https://lkml.kernel.org/r/20220408090636.560886-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20220408090636.560886-2-bhe@redhat.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
On crash, boot memory area is copied to a destination address by f/w.
This region is setup as separate PT_LOAD segment with appropriate
offset to handle the different physical address and offset in vmcore.
If this destination address is not page aligned, reading the vmcore
with mmap is likely to fail forcing tools like makedumpfile to fall
back to regular read. Avoid mmap read failure by ensuring that the
destination address is always page aligned.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220406093839.206608-3-hbathini@linux.ibm.com
Boot memory area is setup as separate PT_LOAD segment in the vmcore
as it is moved by f/w, on crash, to a destination address provided by
the kernel. Having separate PT_LOAD segment helps in handling the
different physical address and offset for boot memory area in the
vmcore.
Commit ced1bf52f4 ("powerpc/fadump: merge adjacent memory ranges to
reduce PT_LOAD segements") inadvertly broke this pre-condition for
cases where some of the first kernel memory is available adjacent to
boot memory area. This scenario is rare but possible when memory for
fadump could not be reserved adjacent to boot memory area owing to
memory hole or such. Reading memory from a vmcore exported in such
scenario provides incorrect data. Fix it by ensuring no other region
is folded into boot memory area.
Fixes: ced1bf52f4 ("powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements")
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220406093839.206608-2-hbathini@linux.ibm.com
An LPAR can be terminated by the POWER Hypervisor (PHYP) for various
reasons. If FADump was configured when PHYP terminates the LPAR,
platform-assisted dump is initiated to save the kernel dump. But CPU
register data would not be processed/saved in the vmcore in such case
because CPU mask is set in crash_fadump() at the time of kernel crash
and it remains unset in this case with LPAR being terminated by PHYP
abruptly.
To get around the problem, initialize cpu_mask to cpu_possible_mask
so as to ensure all possible CPUs' register data is processed for the
vmcore generated on PHYP terminated LPAR. Also, rename the crash info
member variable from online_mask to cpu_mask as it doesn't necessarily
have to be online CPU mask always.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220404182137.59231-1-hbathini@linux.ibm.com
- Partly revert a change to our timer_interrupt() that caused lockups with high res
timers disabled.
- Fix a bug in KVM TCE handling that could corrupt kernel memory.
- Two commits fixing Power9/Power10 perf alternative event selection.
Thanks to: Alexey Kardashevskiy, Athira Rajeev, David Gibson, Frederic Barrat, Madhavan
Srinivasan, Miguel Ojeda, Nicholas Piggin.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmJlSCATHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgM7CD/9KH+mjtSwF3hSdun/WxMcWawdNY24g
f+eMI/vABVqN1RvmO3oC5Z1ruMUw4AxL7BMugAa/SlTjQXOyCuyHQP7vIe4ax3rZ
4TMfsRm8W4xlgI4k9d9q/unrIHko2k1OhY/wvfGMFhFdG0LWt4qJDL5vbccG5CKb
xikrutQ5+t8fNLtGH+fJVDeK9q2CU4inJRuyD4m3sfKnXygLI13l1GhcOebxN/p1
W8qBIac+YJqeezbqiwLl4BC+yXAEDixvFpTh9NuvWdoJaQHdvrltYSLQxCFMIE4B
dSp5EaBTXemalZ4F8fnGyKf4eTbtO9VIfWq3hicjfnJiFRodbFZOo7dnSpDrYlfJ
EysGdmI2HxpmWC8DgQQFv+xwZxKW/ExvPiPYb49n+j/4hKJ724wTi9Z8r3XP5fkg
lD/th40NDhe/sjCSPNWoK3l/UJb3gexd+Ict8iUp2fgNEq3FoJkTR4QlWGj6BeP3
3pOBoqmWjSXR8tWNShvyK6mLn6fclD0IA7cwTIsZZVmqI+nNR4nR0fC2Ah66Rj+R
EOof4kCBOcZ2getDyk+Hv97EFNbkDcIm6IE291Vp9hgilp0n2VnPbwwwEdexp6Jv
KpsYCHosCchnHcu7P1VFFt9w46JFSN7/euq8YZe6znFua2qhV6AGeI7H/uA2X7yL
KvuO+c+ORhrVKQ==
=xieK
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Partly revert a change to our timer_interrupt() that caused lockups
with high res timers disabled.
- Fix a bug in KVM TCE handling that could corrupt kernel memory.
- Two commits fixing Power9/Power10 perf alternative event selection.
Thanks to Alexey Kardashevskiy, Athira Rajeev, David Gibson, Frederic
Barrat, Madhavan Srinivasan, Miguel Ojeda, and Nicholas Piggin.
* tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf: Fix 32bit compile
powerpc/perf: Fix power10 event alternatives
powerpc/perf: Fix power9 event alternatives
KVM: PPC: Fix TCE handling for VFIO
powerpc/time: Always set decrementer in timer_interrupt()
This is a partial revert of commit 0faf20a1ad ("powerpc/64s/interrupt:
Don't enable MSR[EE] in irq handlers unless perf is in use").
Prior to that commit, we always set the decrementer in
timer_interrupt(), to clear the timer interrupt. Otherwise we could end
up continuously taking timer interrupts.
When high res timers are enabled there is no problem seen with leaving
the decrementer untouched in timer_interrupt(), because it will be
programmed via hrtimer_interrupt() -> tick_program_event() ->
clockevents_program_event() -> decrementer_set_next_event().
However with CONFIG_HIGH_RES_TIMERS=n or booting with highres=off, we
see a stall/lockup, because tick_nohz_handler() does not cause a
reprogram of the decrementer, leading to endless timer interrupts.
Example trace:
[ 1.898617][ T7] Freeing initrd memory: 2624K^M
[ 22.680919][ C1] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:^M
[ 22.682281][ C1] rcu: 0-....: (25 ticks this GP) idle=073/0/0x1 softirq=10/16 fqs=1050 ^M
[ 22.682851][ C1] (detected by 1, t=2102 jiffies, g=-1179, q=476)^M
[ 22.683649][ C1] Sending NMI from CPU 1 to CPUs 0:^M
[ 22.685252][ C0] NMI backtrace for cpu 0^M
[ 22.685649][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc2-00185-g0faf20a1ad16 #145^M
[ 22.686393][ C0] NIP: c000000000016d64 LR: c000000000f6cca4 CTR: c00000000019c6e0^M
[ 22.686774][ C0] REGS: c000000002833590 TRAP: 0500 Not tainted (5.16.0-rc2-00185-g0faf20a1ad16)^M
[ 22.687222][ C0] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24000222 XER: 00000000^M
[ 22.688297][ C0] CFAR: c00000000000c854 IRQMASK: 0 ^M
...
[ 22.692637][ C0] NIP [c000000000016d64] arch_local_irq_restore+0x174/0x250^M
[ 22.694443][ C0] LR [c000000000f6cca4] __do_softirq+0xe4/0x3dc^M
[ 22.695762][ C0] Call Trace:^M
[ 22.696050][ C0] [c000000002833830] [c000000000f6cc80] __do_softirq+0xc0/0x3dc (unreliable)^M
[ 22.697377][ C0] [c000000002833920] [c000000000151508] __irq_exit_rcu+0xd8/0x130^M
[ 22.698739][ C0] [c000000002833950] [c000000000151730] irq_exit+0x20/0x40^M
[ 22.699938][ C0] [c000000002833970] [c000000000027f40] timer_interrupt+0x270/0x460^M
[ 22.701119][ C0] [c0000000028339d0] [c0000000000099a8] decrementer_common_virt+0x208/0x210^M
Possibly this should be fixed in the lowres timing code, but that would
be a generic change and could take some time and may not backport
easily, so for now make the programming of the decrementer unconditional
again in timer_interrupt() to avoid the stall/lockup.
Fixes: 0faf20a1ad ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use")
Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/r/20220420141657.771442-1-mpe@ellerman.id.au
Huge page backed vmalloc memory could benefit performance in many cases.
However, some users of vmalloc may not be ready to handle huge pages for
various reasons: hardware constraints, potential pages split, etc.
VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt-out huge
pages. However, it is not easy to track down all the users that require
the opt-out, as the allocation are passed different stacks and may cause
issues in different layers.
To address this issue, replace VM_NO_HUGE_VMAP with an opt-in flag,
VM_ALLOW_HUGE_VMAP, so that users that benefit from huge pages could ask
specificially.
Also, remove vmalloc_no_huge() and add opt-in helper vmalloc_huge().
Fixes: fac54e2bfb ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP")
Link: https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/"
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Power SVM wants to allocate a swiotlb buffer that is not restricted to
low memory for the trusted hypervisor scheme. Consolidate the support
for this into the swiotlb_init interface by adding a new flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
x86-32 was the last architecture that implemented separate user and
kernel registers.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220325153953.162643-3-brgerst@gmail.com
- Fix KVM "lost kick" race, where an attempt to pull a vcpu out of the guest could be
lost (or delayed until the next guest exit).
- Disable SCV (system call vectored) when PR KVM guests could be run.
- Fix KVM PR guests using SCV, by disallowing AIL != 0 for KVM PR guests.
- Add a new KVM CAP to indicate if AIL == 3 is supported.
- Fix a regression when hotplugging a CPU to a memoryless/cpuless node.
- Make virt_addr_valid() stricter for 64-bit Book3E & 32-bit, which fixes crashes seen
due to hardened usercopy.
- Revert a change to max_mapnr which broke HIGHMEM.
Thanks to: Christophe Leroy, Fabiano Rosas, Kefeng Wang, Nicholas Piggin, Srikar Dronamraju.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmJSzCYTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFqxD/98cokv9ZFbXoPApT0rbZo/5Re5GWGj
IzSI4kuBI7j5oPqDdwusfF/pqKt+zFmr0fVsnhz2WYZ4gX4xr9B48OpmuIQvNNbx
46gz4wWIPE2C9xVnOtU829DTOXfFoBOQo16TFzE8wfiLFx9M8gF2oogTzvF14LML
+tbE2STL3ga6MGje8oZ3VOvvXrt9zrynTRt4W/SsfpkXvhQYRdGSPC2Rw6IkbN1k
XDoFPt+vN9C+g6ItW7OzBrkMvCSYNxmsptWAA48zCqbGOawXomYoZyFTS7fooX5E
nhGM9wAQGVNRlbnLgEtOAUv/Djz4yVz1gjR+4b7LF26AN3bd3CrQJ+whZJAAqw+G
I6wtRZI6DrZ4UH5sfjsUQaOIT6DcGlt2MTidGmG2hY+XlanKgiLCdIisnxAMa4+x
kBD1zqSuThPWgpryfKMex4r1WBZyZ27bcwQ9L9Z9GeCQN0V9cNfD8OHwyeKEuQEb
hA941h2qq9bzzVL/wrDxVesRSzXRXoBed77RCL2YUYLonybW+mxijqbaWNVcqqB0
Hr3/hhgq+0uYid5Ld9rxHnXl9yrJI9itakXNFU6dmzqZtQ7b4xaha21IME5zoIcJ
DRkTWGnub0wjp2Re1rdJVpTDREP19k+gPu/dVJFNlW16SG4/Lhg1xOLTkRNQ+gnt
Ayp4o27CPzoTJg==
=uNqF
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix KVM "lost kick" race, where an attempt to pull a vcpu out of the
guest could be lost (or delayed until the next guest exit).
- Disable SCV (system call vectored) when PR KVM guests could be run.
- Fix KVM PR guests using SCV, by disallowing AIL != 0 for KVM PR
guests.
- Add a new KVM CAP to indicate if AIL == 3 is supported.
- Fix a regression when hotplugging a CPU to a memoryless/cpuless node.
- Make virt_addr_valid() stricter for 64-bit Book3E & 32-bit, which
fixes crashes seen due to hardened usercopy.
- Revert a change to max_mapnr which broke HIGHMEM.
Thanks to Christophe Leroy, Fabiano Rosas, Kefeng Wang, Nicholas Piggin,
and Srikar Dronamraju.
* tag 'powerpc-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
Revert "powerpc: Set max_mapnr correctly"
powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit
KVM: PPC: Move kvmhv_on_pseries() into kvm_ppc.h
powerpc/numa: Handle partially initialized numa nodes
powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S
KVM: PPC: Use KVM_CAP_PPC_AIL_MODE_3
KVM: PPC: Book3S PR: Disallow AIL != 0
KVM: PPC: Book3S PR: Disable SCV when AIL could be disabled
KVM: PPC: Book3S HV P9: Fix "lost kick" race
- Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
additional flags to be passed to user-space programs.
- Fix missing fflush() bugs in Kconfig and fixdep
- Fix a minor bug in the comment format of the .config file
- Make kallsyms ignore llvm's local labels, .L*
- Fix UAPI compile-test for cross-compiling with Clang
- Extend the LLVM= syntax to support LLVM=<suffix> form for using a
particular version of LLVm, and LLVM=<prefix> form for using custom
LLVM in a particular directory path.
- Clean up Makefiles
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmJFGloVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGH0kP/j6Vx5BqEv3tP2Q+UANxLqITleJs
IFpbSesz/BhlG7I/IapWmCDSqFbYd5uJTO4ko8CsPmZHcxr6Gw3y+DN5yQACKaG/
p9xiF6GjPyKR8+VdcT2tV50+dVY8ANe/DxCyzKrJd/uyYxgARPKJh0KRMNz+d9lj
ixUpCXDhx/XlKzPIlcxrvhhjevKz+NnHmN0fe6rzcOw9KzBGBTsf20Q3PqUuBOKa
rWHsRGcBPA8eKLfWT1Us1jjic6cT2g4aMpWjF20YgUWKHgWVKcNHpxYKGXASVo/z
ewdDnNfmwo7f7fKMCDDro9iwFWV/BumGtn43U00tnqdBcTpFojPlEOga37UPbZDF
nmTblGVUhR0vn4PmfBy8WkAkbW+IpVatKwJGV4J3KjSvdWvZOmVj9VUGLVAR0TXW
/YcgRs6EtG8Hn0IlCj0fvZ5wRWoDLbP2DSZ67R/44EP0GaNQPwUe4FI1izEE4EYX
oVUAIxcKixWGj4RmdtmtMMdUcZzTpbgS9uloMUmS3u9LK0Ir/8tcWaf2zfMO6Jl2
p4Q31s1dUUKCnFnj0xDKRyKGUkxYebrHLfuBqi0RIc0xRpSlxoXe3Dynm9aHEQoD
ZSV0eouQJxnaxM1ck5Bu4AHLgEebHfEGjWVyUHno7jFU5EI9Wpbqpe4pCYEEDTm1
+LJMEpdZO0dFvpF+
=84rW
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
additional flags to be passed to user-space programs.
- Fix missing fflush() bugs in Kconfig and fixdep
- Fix a minor bug in the comment format of the .config file
- Make kallsyms ignore llvm's local labels, .L*
- Fix UAPI compile-test for cross-compiling with Clang
- Extend the LLVM= syntax to support LLVM=<suffix> form for using a
particular version of LLVm, and LLVM=<prefix> form for using custom
LLVM in a particular directory path.
- Clean up Makefiles
* tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: Make $(LLVM) more flexible
kbuild: add --target to correctly cross-compile UAPI headers with Clang
fixdep: use fflush() and ferror() to ensure successful write to files
arch: syscalls: simplify uapi/kapi directory creation
usr/include: replace extra-y with always-y
certs: simplify empty certs creation in certs/Makefile
certs: include certs/signing_key.x509 unconditionally
kallsyms: ignore all local labels prefixed by '.L'
kconfig: fix missing '# end of' for empty menu
kconfig: add fflush() before ferror() check
kbuild: replace $(if A,A,B) with $(or A,B)
kbuild: Add environment variables for userprogs flags
kbuild: unify cmd_copy and cmd_shipped
$(shell ...) expands to empty. There is no need to assign it to _dummy.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was
around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
making the semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where now anything left in tracehook.h is
some weird strange thing that is difficult to understand.
Eric W. Biederman (15):
ptrace: Move ptrace_report_syscall into ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Remove tracehook_signal_handler
task_work: Remove unnecessary include from posix_timers.h
task_work: Introduce task_work_pending
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
resume_user_mode: Move to resume_user_mode.h
tracehook: Remove tracehook.h
ptrace: Move setting/clearing ptrace_message into ptrace_stop
ptrace: Return the signal to continue with from ptrace_stop
Jann Horn (1):
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Yang Li (1):
ptrace: Remove duplicated include in ptrace.c
MAINTAINERS | 1 -
arch/Kconfig | 5 +-
arch/alpha/kernel/ptrace.c | 5 +-
arch/alpha/kernel/signal.c | 4 +-
arch/arc/kernel/ptrace.c | 5 +-
arch/arc/kernel/signal.c | 4 +-
arch/arm/kernel/ptrace.c | 12 +-
arch/arm/kernel/signal.c | 4 +-
arch/arm64/kernel/ptrace.c | 14 +--
arch/arm64/kernel/signal.c | 4 +-
arch/csky/kernel/ptrace.c | 5 +-
arch/csky/kernel/signal.c | 4 +-
arch/h8300/kernel/ptrace.c | 5 +-
arch/h8300/kernel/signal.c | 4 +-
arch/hexagon/kernel/process.c | 4 +-
arch/hexagon/kernel/signal.c | 1 -
arch/hexagon/kernel/traps.c | 6 +-
arch/ia64/kernel/process.c | 4 +-
arch/ia64/kernel/ptrace.c | 6 +-
arch/ia64/kernel/signal.c | 1 -
arch/m68k/kernel/ptrace.c | 5 +-
arch/m68k/kernel/signal.c | 4 +-
arch/microblaze/kernel/ptrace.c | 5 +-
arch/microblaze/kernel/signal.c | 4 +-
arch/mips/kernel/ptrace.c | 5 +-
arch/mips/kernel/signal.c | 4 +-
arch/nds32/include/asm/syscall.h | 2 +-
arch/nds32/kernel/ptrace.c | 5 +-
arch/nds32/kernel/signal.c | 4 +-
arch/nios2/kernel/ptrace.c | 5 +-
arch/nios2/kernel/signal.c | 4 +-
arch/openrisc/kernel/ptrace.c | 5 +-
arch/openrisc/kernel/signal.c | 4 +-
arch/parisc/kernel/ptrace.c | 7 +-
arch/parisc/kernel/signal.c | 4 +-
arch/powerpc/kernel/ptrace/ptrace.c | 8 +-
arch/powerpc/kernel/signal.c | 4 +-
arch/riscv/kernel/ptrace.c | 5 +-
arch/riscv/kernel/signal.c | 4 +-
arch/s390/include/asm/entry-common.h | 1 -
arch/s390/kernel/ptrace.c | 1 -
arch/s390/kernel/signal.c | 5 +-
arch/sh/kernel/ptrace_32.c | 5 +-
arch/sh/kernel/signal_32.c | 4 +-
arch/sparc/kernel/ptrace_32.c | 5 +-
arch/sparc/kernel/ptrace_64.c | 5 +-
arch/sparc/kernel/signal32.c | 1 -
arch/sparc/kernel/signal_32.c | 4 +-
arch/sparc/kernel/signal_64.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/um/kernel/ptrace.c | 5 +-
arch/x86/kernel/ptrace.c | 1 -
arch/x86/kernel/signal.c | 5 +-
arch/x86/mm/tlb.c | 1 +
arch/xtensa/kernel/ptrace.c | 5 +-
arch/xtensa/kernel/signal.c | 4 +-
block/blk-cgroup.c | 2 +-
fs/coredump.c | 1 -
fs/exec.c | 1 -
fs/io-wq.c | 6 +-
fs/io_uring.c | 11 +-
fs/proc/array.c | 1 -
fs/proc/base.c | 1 -
include/asm-generic/syscall.h | 2 +-
include/linux/entry-common.h | 47 +-------
include/linux/entry-kvm.h | 2 +-
include/linux/posix-timers.h | 1 -
include/linux/ptrace.h | 81 ++++++++++++-
include/linux/resume_user_mode.h | 64 ++++++++++
include/linux/sched/signal.h | 17 +++
include/linux/task_work.h | 5 +
include/linux/tracehook.h | 226 -----------------------------------
include/uapi/linux/ptrace.h | 2 +-
kernel/entry/common.c | 19 +--
kernel/entry/kvm.c | 9 +-
kernel/exit.c | 3 +-
kernel/livepatch/transition.c | 1 -
kernel/ptrace.c | 47 +++++---
kernel/seccomp.c | 1 -
kernel/signal.c | 62 +++++-----
kernel/task_work.c | 4 +-
kernel/time/posix-cpu-timers.c | 1 +
mm/memcontrol.c | 2 +-
security/apparmor/domain.c | 1 -
security/selinux/hooks.c | 1 -
85 files changed, 372 insertions(+), 495 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
/HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
=uEro
-----END PGP SIGNATURE-----
Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was around
task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where anything left in tracehook.h was
some weird strange thing that was difficult to understand"
* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: Remove duplicated include in ptrace.c
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
ptrace: Return the signal to continue with from ptrace_stop
ptrace: Move setting/clearing ptrace_message into ptrace_stop
tracehook: Remove tracehook.h
resume_user_mode: Move to resume_user_mode.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Introduce task_work_pending
task_work: Remove unnecessary include from posix_timers.h
ptrace: Remove tracehook_signal_handler
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Move ptrace_report_syscall into ptrace.h
Merge some more commits from our KVM topic branch. In particular this
brings in some commits that depend on a new capability that was merged
via the KVM tree for v5.18.
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism
where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP.
Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is
limited to 2 instructions (and typically fewer) on branch targets not starting
with ENDBR. CET-IBT also limits speculation of the next sequential instruction
after the indirect CALL/JMP [1].
CET-IBT is fundamentally incompatible with retpolines, but provides, as
described above, speculation limits itself.
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp
bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD
1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX
urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF
BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b
y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde
go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht
9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW
ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT
x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T
dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL
EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz
beB/7u2KIUbKEkSN
=jZfK
-----END PGP SIGNATURE-----
Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
"Add support for Intel CET-IBT, available since Tigerlake (11th gen),
which is a coarse grained, hardware based, forward edge
Control-Flow-Integrity mechanism where any indirect CALL/JMP must
target an ENDBR instruction or suffer #CP.
Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
is limited to 2 instructions (and typically fewer) on branch targets
not starting with ENDBR. CET-IBT also limits speculation of the next
sequential instruction after the indirect CALL/JMP [1].
CET-IBT is fundamentally incompatible with retpolines, but provides,
as described above, speculation limits itself"
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
kvm/emulate: Fix SETcc emulation for ENDBR
x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0
x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0
kbuild: Fixup the IBT kbuild changes
x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
x86: Remove toolchain check for X32 ABI capability
x86/alternative: Use .ibt_endbr_seal to seal indirect calls
objtool: Find unused ENDBR instructions
objtool: Validate IBT assumptions
objtool: Add IBT/ENDBR decoding
objtool: Read the NOENDBR annotation
x86: Annotate idtentry_df()
x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
x86: Annotate call_on_stack()
objtool: Rework ASM_REACHABLE
x86: Mark __invalid_creds() __noreturn
exit: Mark do_group_exit() __noreturn
x86: Mark stop_this_cpu() __noreturn
objtool: Ignore extra-symbol code
objtool: Rename --duplicate to --lto
...
- Enforce kernel RO, and implement STRICT_MODULE_RWX for 603.
- Add support for livepatch to 32-bit.
- Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
- Merge vdso64 and vdso32 into a single directory.
- Fix build errors with newer binutils.
- Add support for UADDR64 relocations, which are emitted by some toolchains. This allows
powerpc to build with the latest lld.
- Fix (another) potential userspace r13 corruption in transactional memory handling.
- Cleanups of function descriptor handling & related fixes to LKDTM.
Thanks to: Abdul Haleem, Alexey Kardashevskiy, Anders Roxell, Aneesh Kumar K.V, Anton
Blanchard, Arnd Bergmann, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chen
Jingwen, Christophe JAILLET, Christophe Leroy, Corentin Labbe, Daniel Axtens, Daniel
Henrique Barboza, David Dai, Fabiano Rosas, Ganesh Goudar, Guo Zhengkui, Hangyu Hua, Haren
Myneni, Hari Bathini, Igor Zhbanov, Jakob Koschel, Jason Wang, Jeremy Kerr, Joachim
Wiberg, Jordan Niethe, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan
Srinivasan, Mamatha Inamdar, Maxime Bizon, Maxim Kiselev, Maxim Kochetkov, Michal
Suchanek, Nageswara R Sastry, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nour-eddine
Taleb, Paul Menzel, Ping Fang, Pratik R. Sampat, Randy Dunlap, Ritesh Harjani, Rohan
McLure, Russell Currey, Sachin Sant, Segher Boessenkool, Shivaprasad G Bhat, Sourabh Jain,
Thierry Reding, Tobias Waldekranz, Tyrel Datwyler, Vaibhav Jain, Vladimir Oltean, Wedson
Almeida Filho, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmI9TtQTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLp2D/0dwoliEJubRCfoawYUGhxTRZuo6ZYw
EQzprOiFA/MtrZyPfbrX/FwxeeetzQWysaw2r5JAuQwx5Jb7Od9dNIrVmueFEktC
hD4fkO8YT+QuOD3Xhp/rDQTImdw4fkeofIjnWIqEAtz0XGInmiRQKOnojVe/Po7f
72Yi1u80LxYBAnkN/Hhpmi/BsVmu0Nh3wELu+JZopQXjINj4RyD49ayCBSLbmiNc
uo7oYzJ0/WsZHNTpX9kAzzCq+XmI3dKZPyf2AOCvoRxJTmUPCRZF9QCwsmQFikiI
vZOdz4fI5e+C0aYJj8ODmWMrXiS+JUQdEShjGg9t9K6EN8idC8joKWpAuXjTA9KN
kRjzXX7AvjxaMEGbLe8gjU0PmEjY3eSzMOy15Oc/C0DRRswXRzrXdx2AF+/J6bQb
MWMM4aCKfcYs5/TENkEnV0xpbOCOy4ikHM1KZbxvVrShvjSlNIL9XTOnl/pNK5BJ
XSSI2mfnjKkbI1+l0KQ4NBXIRTo6HLpu5jwY3Xh97Tq7kaEfqDbO5p2P2HoOCiLa
ZjdzmpP99zM6wnqUSj+lyvjob7btyhoq6TKmPtxfKbR6OaSfRJ760BCJ5y15Y9Hc
rHey4Y/NL7LqsVYFZxi4/T6Ncq1hNeYr2Fiis4gH+/1zjr6Cd4othnvw3Slaxhst
AaHpN3pyx1QI6g==
=8r2c
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Livepatch support for 32-bit is probably the standout new feature,
otherwise mostly just lots of bits and pieces all over the board.
There's a series of commits cleaning up function descriptor handling,
which touches a few other arches as well as LKDTM. It has acks from
Arnd, Kees and Helge.
Summary:
- Enforce kernel RO, and implement STRICT_MODULE_RWX for 603.
- Add support for livepatch to 32-bit.
- Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
- Merge vdso64 and vdso32 into a single directory.
- Fix build errors with newer binutils.
- Add support for UADDR64 relocations, which are emitted by some
toolchains. This allows powerpc to build with the latest lld.
- Fix (another) potential userspace r13 corruption in transactional
memory handling.
- Cleanups of function descriptor handling & related fixes to LKDTM.
Thanks to Abdul Haleem, Alexey Kardashevskiy, Anders Roxell, Aneesh
Kumar K.V, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Bhaskar
Chowdhury, Cédric Le Goater, Chen Jingwen, Christophe JAILLET,
Christophe Leroy, Corentin Labbe, Daniel Axtens, Daniel Henrique
Barboza, David Dai, Fabiano Rosas, Ganesh Goudar, Guo Zhengkui, Hangyu
Hua, Haren Myneni, Hari Bathini, Igor Zhbanov, Jakob Koschel, Jason
Wang, Jeremy Kerr, Joachim Wiberg, Jordan Niethe, Julia Lawall, Kajol
Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mamatha Inamdar,
Maxime Bizon, Maxim Kiselev, Maxim Kochetkov, Michal Suchanek,
Nageswara R Sastry, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Nour-eddine Taleb, Paul Menzel, Ping Fang, Pratik R. Sampat, Randy
Dunlap, Ritesh Harjani, Rohan McLure, Russell Currey, Sachin Sant,
Segher Boessenkool, Shivaprasad G Bhat, Sourabh Jain, Thierry Reding,
Tobias Waldekranz, Tyrel Datwyler, Vaibhav Jain, Vladimir Oltean,
Wedson Almeida Filho, and YueHaibing"
* tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
powerpc/pseries: Fix use after free in remove_phb_dynamic()
powerpc/time: improve decrementer clockevent processing
powerpc/time: Fix KVM host re-arming a timer beyond decrementer range
powerpc/tm: Fix more userspace r13 corruption
powerpc/xive: fix return value of __setup handler
powerpc/64: Add UADDR64 relocation support
powerpc: 8xx: fix a return value error in mpc8xx_pic_init
powerpc/ps3: remove unneeded semicolons
powerpc/64: Force inlining of prevent_user_access() and set_kuap()
powerpc/bitops: Force inlining of fls()
powerpc: declare unmodified attribute_group usages const
powerpc/spufs: Fix build warning when CONFIG_PROC_FS=n
powerpc/secvar: fix refcount leak in format_show()
powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E
powerpc: Move C prototypes out of asm-prototypes.h
powerpc/kexec: Declare kexec_paca static
powerpc/smp: Declare current_set static
powerpc: Cleanup asm-prototypes.c
powerpc/ftrace: Use STK_GOT in ftrace_mprofile.S
powerpc/ftrace: Regroup PPC64 specific operations in ftrace_mprofile.S
...
... and call node_dev_init() after memory_dev_init() from driver_init(),
so before any of the existing arch/subsys calls. All online nodes should
be known at that point: early during boot, arch code determines node and
zone ranges and sets the relevant nodes online; usually this happens in
setup_arch().
This is in line with memory_dev_init(), which initializes the memory
device subsystem and creates all memory block devices.
Similar to memory_dev_init(), panic() if anything goes wrong, we don't
want to continue with such basic initialization errors.
The important part is that node_dev_init() gets called after
memory_dev_init() and after cpu_dev_init(), but before any of the relevant
archs call register_cpu() to register the new cpu device under the node
device. The latter should be the case for the current users of
topology_init().
Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Anatoly Pugachev <matorola@gmail.com> (sparc64)
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With commit a4e92ce8e4 ("powerpc/fadump: Reservationless firmware
assisted dump"), Linux kernel's Contiguous Memory Allocator (CMA) based
reservation was introduced in fadump. That change was aimed at using CMA
to let applications utilize the memory reserved for fadump while blocking
it from being used for kernel pages. The assumption was, even if CMA
activation fails for whatever reason, the memory still remains reserved to
avoid it from being used for kernel pages. But commit 072355c1cf
("mm/cma: expose all pages to the buddy if activation of an area fails")
breaks this assumption as it started exposing all pages to buddy allocator
on CMA activation failure. It led to warning messages like below while
running crash-utility on vmcore of a kernel having above two commits:
crash: seek error: kernel virtual address: <from reserved region>
To fix this problem, opt out from exposing pages to buddy allocator on CMA
activation failure for fadump reserved memory.
Link: https://lkml.kernel.org/r/20220117075246.36072-3-hbathini@linux.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>