-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDPuyMeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvxgH/RKvSuRPwkJ2Jcp9
VLi5kCbqtJlYLq6tB6peSJ8otKgxkcRwC0pIY4LlYIAWYboktLQ5RKp/9nB2h2FN
aMZUMu6AI/lVJyFMI5MnKnJIUiUq+WXR3lSSlw68vwFLFdzqUZFNq+bqeiVvnIy1
yqA6naj24Tu/RbYffQoPvdSJcU2SLXRMxwD8HRGiU2d51RaFsOvsZvF+P5TVcsEV
ZmttJeER21CaI/A809eqaFmyGrUOcZZK9roZEbMwanTZOMw18biEsLu/UH4kBX01
JC4+RlGxcWjQ5YNZgChsgoOK/CHzc6ITztTntdeDWAvwZjQFzV7pCy4/3BWne3O+
5178yHM=
=o8cN
-----END PGP SIGNATURE-----
Backmerge tag 'v5.13-rc7' into drm-next
Backmerge Linux 5.13-rc7 to make some pulls from later bases apply,
and to bake in the conflicts so far.
(There's a lot more in the pipe):
- Prevent corruption of the XSTATE buffer in signal handling by
validating what is being copied from userspace first.
- Invalidate other task's preserved FPU registers on XRSTOR failure
(#PF) because latter can still modify some of them.
- Restore the proper PKRU value in case userspace modified it
- Reset FPU state when signal restoring fails
Other:
- Map EFI boot services data memory as encrypted in a SEV guest so that
the guest can access it and actually boot properly
- Two SGX correctness fixes: proper resources freeing and a NUMA fix
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmDO5vQACgkQEsHwGGHe
VUrUjw//fRU8BPZ3/SWNQO188QhHdFpm3jqtjRJsZD1FfnnLdxIg2SCP4RjFxv+Y
eFyN0nYLekG8a3CMV081H9Rhr5tt3bflk0oTcGAar7m2qQiCiqaAH0wptIlQonSu
nQCSs+PeaaK4nRCtW+TUJnwG0ZU/y7fEXa3pxJ6hSMnxZjz3lj70zKhpA1nQtqRZ
OOStvBNtaWcDdTTE4r8XuFIxuMUUEuwHlQQmkAVHQYUf6vxGYfnDYEg83Wddvq1E
1leSRNFlLcCAbPUV/fax3KGvaekeJ1U411uWqXlain6m105+mk+irmrLxtur/lJ5
cWTVb5CbIHFZnJvC5jzNPv/03GbIIQaVm4jPI2qB1AZbjcVlAPKj1Ne+U1fzvmDT
wNUob/rnIXiGptvtUMNYGURxBTj65Nnom3iAJV+AdMOThDwYMvsJJjFkMnC5wO2n
ZAexumWPnUzWoxSMTraT7a6b/kilFUrcPljxSrFd9yVeU8E6a1OSW35oWoQ3itrc
xx/ne8RodLmCPC9DjecFcQR+qUuXsF+XCCj07QpfKNTAObr17e9nsKJneR6MX79C
Lpc7Ka/CiTGYcebWX7tqtjwGPfa6iqekswxYRRp7j54bQ4sHmKyordZy0Q8+c079
gmMlPdNbqQg3YwHyXW2yeJETDS1HBp61RRojAP15BsL73wyYQNE=
=AuXr
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"A first set of urgent fixes to the FPU/XSTATE handling mess^W code.
(There's a lot more in the pipe):
- Prevent corruption of the XSTATE buffer in signal handling by
validating what is being copied from userspace first.
- Invalidate other task's preserved FPU registers on XRSTOR failure
(#PF) because latter can still modify some of them.
- Restore the proper PKRU value in case userspace modified it
- Reset FPU state when signal restoring fails
Other:
- Map EFI boot services data memory as encrypted in a SEV guest so
that the guest can access it and actually boot properly
- Two SGX correctness fixes: proper resources freeing and a NUMA fix"
* tag 'x86_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Avoid truncating memblocks for SGX memory
x86/sgx: Add missing xa_destroy() when virtual EPC is destroyed
x86/fpu: Reset state for all signal restore failures
x86/pkru: Write hardware init value to PKRU when xstate is init
x86/process: Check PF_KTHREAD and not current->mm for kernel threads
x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer
x86/fpu: Prevent state corruption in __fpu__restore_sig()
x86/ioremap: Map EFI-reserved memory as encrypted for SEV
xa_destroy() needs to be called to destroy a virtual EPC's page array
before calling kfree() to free the virtual EPC. Currently it is not
called so add the missing xa_destroy().
Fixes: 540745ddbc ("x86/sgx: Introduce virtual EPC for use by KVM guests")
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Yang Zhong <yang.zhong@intel.com>
Link: https://lkml.kernel.org/r/20210615101639.291929-1-kai.huang@intel.com
- Fix the NMI watchdog on ancient Intel CPUs
- Remove a misguided, NMI-unsafe KASAN callback
from the NMI-safe irq_work path used by perf.
- Fix uncore events on Ice Lake servers.
- Someone booted maxcpus=1 on an SNB-EP, and the
uncore driver emitted warnings and was probably
buggy. Fix it.
- KCSAN found a genuine data race in the core perf
code. Somewhat ironically the bug was introduced
through a recent race fix. :-/ In our defense, the
new race window was much more narrow. Fix it.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDErJkRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gjNxAAhWPl+zsVr+bMZGQVnjPf7swXSaqsphtU
LrP0hrs4nH0JiB7lZJVjPhCMQKXb+gvP0CTmxkOXmNORDKDK3slIS/zp9uyH1F+d
nXhmWi7c1bHU0vortnv87LGJpeeI1E7rQ/uBxK6b2v6kOBmCnjvQEiPvJEIGTtpE
YimVBERdPDTBQiW5EQbbyL3VScwm5QUN2STnLPjUtVc9HES/zCdhXNlsASfhn/Tn
8rlSAqVEOUcsTpTXYadHckNi1zn4zrpuhWKpSHXrvXCo3qU8QpISjYNwAJ/0IGBj
CMdg2r+MneF6gop76R5aRcA0JDvDgtv56LKFVhi9gEkE5em9YAni17HU0IeTvJmT
mL9j64h8oUErC/TpAU1vXCJjIxH7jLq8YQoNwHUvF0pSvcNGsaFeWu1ADQuTEIi9
fyKHRpFwPMBhwc+AMaRepgQ9FlvE4567fQmwlrUDUKlCU0x0dfvFCM2z/o61YFlH
oFgB0h0SNxdoj5EXny50LtokP1Kp/oBNVhhNsUpH8wVxWLi61BHJOslcc7nzdP6t
JBqVE6bLQlxmlKt2AwiOkxe9xVv34o3AMxUYtUBYgCTZSlRjL//7pcqgG5r+CZH/
eXEU3wWcGtRPEItGXtiGT9Vm2ZYSaUMFF7k7OrTPCHgkW51oEW4FUoaV7M+9fl43
638x9Wnse4Q=
=9LoT
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes:
- Fix the NMI watchdog on ancient Intel CPUs
- Remove a misguided, NMI-unsafe KASAN callback from the NMI-safe
irq_work path used by perf.
- Fix uncore events on Ice Lake servers.
- Someone booted maxcpus=1 on an SNB-EP, and the uncore driver
emitted warnings and was probably buggy. Fix it.
- KCSAN found a genuine data race in the core perf code. Somewhat
ironically the bug was introduced through a recent race fix. :-/
In our defense, the new race window was much more narrow. Fix it"
* tag 'perf-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUs
irq_work: Make irq_work_queue() NMI-safe again
perf/x86/intel/uncore: Fix M2M event umask for Ice Lake server
perf/x86/intel/uncore: Fix a kernel WARNING triggered by maxcpus=1
perf: Fix data race between pin_count increment/decrement
The following commit:
3a4ac121c2 ("x86/perf: Add hardware performance events support for Zhaoxin CPU.")
Got the old-style NMI watchdog logic wrong and broke it for basically every
Intel CPU where it was active. Which is only truly old CPUs, so few people noticed.
On CPUs with perf events support we turn off the old-style NMI watchdog, so it
was pretty pointless to add the logic for X86_VENDOR_ZHAOXIN to begin with ... :-/
Anyway, the fix is to restore the old logic and add a 'break'.
[ mingo: Wrote a new changelog. ]
Fixes: 3a4ac121c2 ("x86/perf: Add hardware performance events support for Zhaoxin CPU.")
Signed-off-by: CodyYao-oc <CodyYao-oc@zhaoxin.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210607025335.9643-1-CodyYao-oc@zhaoxin.com
If access_ok() or fpregs_soft_set() fails in __fpu__restore_sig() then the
function just returns but does not clear the FPU state as it does for all
other fatal failures.
Clear the FPU state for these failures as well.
Fixes: 72a671ced6 ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87mtryyhhz.ffs@nanos.tec.linutronix.de
Both Intel and AMD consider it to be architecturally valid for XRSTOR to
fail with #PF but nonetheless change the register state. The actual
conditions under which this might occur are unclear [1], but it seems
plausible that this might be triggered if one sibling thread unmaps a page
and invalidates the shared TLB while another sibling thread is executing
XRSTOR on the page in question.
__fpu__restore_sig() can execute XRSTOR while the hardware registers
are preserved on behalf of a different victim task (using the
fpu_fpregs_owner_ctx mechanism), and, in theory, XRSTOR could fail but
modify the registers.
If this happens, then there is a window in which __fpu__restore_sig()
could schedule out and the victim task could schedule back in without
reloading its own FPU registers. This would result in part of the FPU
state that __fpu__restore_sig() was attempting to load leaking into the
victim task's user-visible state.
Invalidate preserved FPU registers on XRSTOR failure to prevent this
situation from corrupting any state.
[1] Frequent readers of the errata lists might imagine "complex
microarchitectural conditions".
Fixes: 1d731e731c ("x86/fpu: Add a fastpath to __fpu__restore_sig()")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210608144345.758116583@linutronix.de
The non-compacted slowpath uses __copy_from_user() and copies the entire
user buffer into the kernel buffer, verbatim. This means that the kernel
buffer may now contain entirely invalid state on which XRSTOR will #GP.
validate_user_xstate_header() can detect some of that corruption, but that
leaves the onus on callers to clear the buffer.
Prior to XSAVES support, it was possible just to reinitialize the buffer,
completely, but with supervisor states that is not longer possible as the
buffer clearing code split got it backwards. Fixing that is possible but
not corrupting the state in the first place is more robust.
Avoid corruption of the kernel XSAVE buffer by using copy_user_to_xstate()
which validates the XSAVE header contents before copying the actual states
to the kernel. copy_user_to_xstate() was previously only called for
compacted-format kernel buffers, but it works for both compacted and
non-compacted forms.
Using it for the non-compacted form is slower because of multiple
__copy_from_user() operations, but that cost is less important than robust
code in an already slow path.
[ Changelog polished by Dave Hansen ]
Fixes: b860eb8dce ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210608144345.611833074@linutronix.de
Let's reserve JSL stolen memory for graphics.
JasperLake is a gen11 platform which is compatible with
ICL/EHL changes.
This was missed in commit 24ea098b7c ("drm/i915/jsl: Split
EHL/JSL platform info and PCI ids")
V2:
- Added maintainer list in cc
- Added patch ref in commit message
V1:
- Added Cc: x86@kernel.org
Fixes: 24ea098b7c ("drm/i915/jsl: Split EHL/JSL platform info and PCI ids")
Cc: <stable@vger.kernel.org> # v5.11+
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210608053411.394166-1-tejaskumarx.surendrakumar.upadhyay@intel.com
There are BIOSes that are known to corrupt the memory under 1M, or more
precisely under 640K because the memory above 640K is anyway reserved
for the EGA/VGA frame buffer and BIOS.
To prevent usage of the memory that will be potentially clobbered by the
kernel, the beginning of the memory is always reserved. The exact size
of the reserved area is determined by CONFIG_X86_RESERVE_LOW build time
and the "reservelow=" command line option. The reserved range may be
from 4K to 640K with the default of 64K. There are also configurations
that reserve the entire 1M range, like machines with SandyBridge graphic
devices or systems that enable crash kernel.
In addition to the potentially clobbered memory, EBDA of unknown size may
be as low as 128K and the memory above that EBDA start is also reserved
early.
It would have been possible to reserve the entire range under 1M unless for
the real mode trampoline that must reside in that area.
To accommodate placement of the real mode trampoline and keep the memory
safe from being clobbered by BIOS, reserve the first 64K of RAM before
memory allocations are possible and then, after the real mode trampoline
is allocated, reserve the entire range from 0 to 1M.
Update trim_snb_memory() and reserve_real_mode() to avoid redundant
reservations of the same memory range.
Also make sure the memory under 1M is not getting freed by
efi_free_boot_services().
[ bp: Massage commit message and comments. ]
Fixes: a799c2bd29 ("x86/setup: Consolidate early memory reservations")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Hugh Dickins <hughd@google.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213177
Link: https://lkml.kernel.org/r/20210601075354.5149-2-rppt@kernel.org
Up until now the assumption was that an alternative patching site would
have some instructions at the beginning and trailing single-byte NOPs
(0x90) padding. Therefore, the patching machinery would go and optimize
those single-byte NOPs into longer ones.
However, this assumption is broken on 32-bit when code like
hv_do_hypercall() in hyperv_init() would use the ratpoline speculation
killer CALL_NOSPEC. The 32-bit version of that macro would align certain
insns to 16 bytes, leading to the compiler issuing a one or more
single-byte NOPs, depending on the holes it needs to fill for alignment.
That would lead to the warning in optimize_nops() to fire:
------------[ cut here ]------------
Not a NOP at 0xc27fb598
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:211 optimize_nops.isra.13
due to that function verifying whether all of the following bytes really
are single-byte NOPs.
Therefore, carve out the NOP padding into a separate function and call
it for each NOP range beginning with a single-byte NOP.
Fixes: 23c1ad538f ("x86/alternatives: Optimize optimize_nops()")
Reported-by: Richard Narron <richard@aaazen.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213301
Link: https://lkml.kernel.org/r/20210601212125.17145-1-bp@alien8.de
While digesting the XSAVE-related horrors which got introduced with
the supervisor/user split, the recent addition of ENQCMD-related
functionality got on the radar and turned out to be similarly broken.
update_pasid(), which is only required when X86_FEATURE_ENQCMD is
available, is invoked from two places:
1) From switch_to() for the incoming task
2) Via a SMP function call from the IOMMU/SMV code
#1 is half-ways correct as it hacks around the brokenness of get_xsave_addr()
by enforcing the state to be 'present', but all the conditionals in that
code are completely pointless for that.
Also the invocation is just useless overhead because at that point
it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task
and all of this can be handled at return to user space.
#2 is broken beyond repair. The comment in the code claims that it is safe
to invoke this in an IPI, but that's just wishful thinking.
FPU state of a running task is protected by fregs_lock() which is
nothing else than a local_bh_disable(). As BH-disabled regions run
usually with interrupts enabled the IPI can hit a code section which
modifies FPU state and there is absolutely no guarantee that any of the
assumptions which are made for the IPI case is true.
Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is
invoked with a NULL pointer argument, so it can hit a completely
unrelated task and unconditionally force an update for nothing.
Worse, it can hit a kernel thread which operates on a user space
address space and set a random PASID for it.
The offending commit does not cleanly revert, but it's sufficient to
force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid()
code to make this dysfunctional all over the place. Anything more
complex would require more surgery and none of the related functions
outside of the x86 core code are blatantly wrong, so removing those
would be overkill.
As nothing enables the PASID bit in the IA32_XSS MSR yet, which is
required to make this actually work, this cannot result in a regression
except for related out of tree train-wrecks, but they are broken already
today.
Fixes: 20f0afd1fb ("x86/mmu: Allocate/free a PASID")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.de
There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):
[ 0.033858] read lvtthmr: 0x330, val: 0x200
which means:
- bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
- bits [10:8] have 010b which means SMI delivery mode.
Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:
setup_local_APIC:
...
/*
* If this comes from kexec/kcrash the APIC might be enabled in
* SPIV. Soft disable it before doing further initialization.
*/
value = apic_read(APIC_SPIV);
value &= ~APIC_SPIV_APIC_ENABLED;
apic_write(APIC_SPIV, value);
which means (from the SDM):
"10.4.7.2 Local APIC State After It Has Been Software Disabled
...
* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."
And this happens too:
[ 0.124111] APIC: Switch to symmetric I/O mode setup
[ 0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
[ 0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0
This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't doesn't fire at all,
leading to CPU 0 stuck in SMM forever...
Now, before
4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.
But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.
Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.
Fixes: 4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney <james@nurealm.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
PIC interrupts do not support affinity setting and they can end up on
any online CPU. Therefore, it's required to mark the associated vectors
as system-wide reserved. Otherwise, the corresponding irq descriptors
are copied to the secondary CPUs but the vectors are not marked as
assigned or reserved. This works correctly for the IO/APIC case.
When the IO/APIC is disabled via config, kernel command line or lack of
enumeration then all legacy interrupts are routed through the PIC, but
nothing marks them as system-wide reserved vectors.
As a consequence, a subsequent allocation on a secondary CPU can result in
allocating one of these vectors, which triggers the BUG() in
apic_update_vector() because the interrupt descriptor slot is not empty.
Imran tried to work around that by marking those interrupts as allocated
when a CPU comes online. But that's wrong in case that the IO/APIC is
available and one of the legacy interrupts, e.g. IRQ0, has been switched to
PIC mode because then marking them as allocated will fail as they are
already marked as system vectors.
Stay consistent and update the legacy vectors after attempting IO/APIC
initialization and mark them as system vectors in case that no IO/APIC is
available.
Fixes: 69cde0004a ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Imran Khan <imran.f.khan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmCqzFgeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGIgQH/3nAV/fYbUCubEQe
RXUcjMGznIpdHeMiY/hPezObYnpBI3UAi2JwHCvQfoE8ckbx4tq8Xp+TUWebsdaf
zpDhKXDj2jHha1f5AixHCn1UFxiqOSn3d2muY2Bh1Nhg7iJuzU8xjIMCcOdss+fp
8e4wqidOHkpWvGJ96CQ5zCNxeXI+/f7VX2IgdJ+RCDwzbqJlIvvXwAkg1KrguUEz
EPmhpODqjPbVVc/mhtguMLMWl78WKCTBOSHCcYBolatXfm2ojsnX1hXprypWY4Mg
vKXxF/91AS8InCC08Jw+puz+fXDBx1jtNmFFhDOFTyz/TvwPaKZiWbAeXOZFJA2Z
Wm4su7g=
=cqxg
-----END PGP SIGNATURE-----
Merge v5.13-rc3 into drm-next
drm/i915 is extremely on fire without the below revert from -rc3:
commit 293837b9ac
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed May 19 05:55:57 2021 -1000
Revert "i915: fix remap_io_sg to verify the pgprot"
Backmerge so we don't have a too wide bisect window for anything
that's a more involved workload than booting the driver.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
of killing the machine and by using the accessors with the exact functionality
needed when accessing memory.
- Fix a confusion with Clang LTO compiler switches passed to the it
- Handle the case gracefully when VMGEXIT has been executed in userspace
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCqKdwACgkQEsHwGGHe
VUrnfBAAitJ9ytn5PzrLhg9cKt+BRVg8QQExWUYqOrSDXHus5+X/21YKey7BBhIj
rMJSHi7qytO5rrfj5nw3dIH30hnat8nn5GWcNMG0hi1ptep+GP0xMG1nGw7INJDW
85FpQI9jpO+vz0AcoZYAtSOWbwonVqbhjdHGzDhIi2e0Qt+1uKbjsT+iPxANBpyB
fyEU3biPyWfKY4JSr1n0EHBywR329IW5I+yZInb2SBEU42V4vDBGFCXgdS8eFGo5
KPz/bikERC/gZuDIRXDP6riKIpy1yCO1JZb0EgukwDddbzNz/ox7dX9JL+dEeRzl
0zr28cJSoZgYQjdi3LU412CMVa8eYw7Ca0/mbhADdZK6Wd7xUNEiUR7FFoBA2Jxp
+oYzYe4KvlsaFQyPrt8mfJDA36r+FZcqr3WJF+LYmPbRi+cbNDbKSoeDqShAh+Fq
uUVNloWiOltsRuCS5/du8qzhmJLdIH1uFqtYK37PGLzAHz+KJ9SAdLWaYaLx4GFd
rrFuCnk5DmoDf3I5lQvIzIEmYysEQOloGgDR6dDaPFRymOgor7BsCdR+dtxVQ6P6
SMSUzyJLq4tC4dzT5PxWfZDlO+wIxu5QAOhu95oWIdZbsaoABZYCuLf7T7XQr9PA
DLil4v4i7/FGpDBh+2s3V5hTXHKATuI7SGXnMNfx1eLurChg07k=
=51BK
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Fix how SEV handles MMIO accesses by forwarding potential page faults
instead of killing the machine and by using the accessors with the
exact functionality needed when accessing memory.
- Fix a confusion with Clang LTO compiler switches passed to the it
- Handle the case gracefully when VMGEXIT has been executed in
userspace
* tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use __put_user()/__get_user() for data accesses
x86/sev-es: Forward page-faults which happen during emulation
x86/sev-es: Don't return NULL from sev_es_get_ghcb()
x86/build: Fix location of '-plugin-opt=' flags
x86/sev-es: Invalidate the GHCB after completing VMGEXIT
x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
Pull siginfo fix from Eric Biederman:
"During the merge window an issue with si_perf and the siginfo ABI came
up. The alpha and sparc siginfo structure layout had changed with the
addition of SIGTRAP TRAP_PERF and the new field si_perf.
The reason only alpha and sparc were affected is that they are the
only architectures that use si_trapno.
Looking deeper it was discovered that si_trapno is used for only a few
select signals on alpha and sparc, and that none of the other
_sigfault fields past si_addr are used at all. Which means technically
no regression on alpha and sparc.
While the alignment concerns might be dismissed the abuse of si_errno
by SIGTRAP TRAP_PERF does have the potential to cause regressions in
existing userspace.
While we still have time before userspace starts using and depending
on the new definition siginfo for SIGTRAP TRAP_PERF this set of
changes cleans up siginfo_t.
- The si_trapno field is demoted from magic alpha and sparc status
and made an ordinary union member of the _sigfault member of
siginfo_t. Without moving it of course.
- si_perf is replaced with si_perf_data and si_perf_type ending the
abuse of si_errno.
- Unnecessary additions to signalfd_siginfo are removed"
* 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo
signal: Deliver all of the siginfo perf data in _perf
signal: Factor force_sig_perf out of perf_sigtrap
signal: Implement SIL_FAULT_TRAPNO
siginfo: Move si_trapno inside the union inside _si_fault
- drm: Rename DP_PSR_SELECTIVE_UPDATE to better mach eDP spec (Jose).
Driver Changes:
- Display plane clock rates fixes and improvements (Ville).
- Uninint DMC FW loader state during shutdown (Imre).
- Convert snprintf to sysfs_emit (Xuezhi).
- Fix invalid access to ACPI _DSM objects (Takashi).
- A big refactor around how i915 addresses the graphics
and display IP versions. (Matt, Lucas).
- Backlight fix (Lyude).
- Display watermark and DBUF fixes (Ville).
- HDCP fix (Anshuman).
- Improve cases where display is not available (Jose).
- Defeature PSR2 for RKL and ALD-S (Jose).
- VLV DSI panel power fixes and improvements (Hans).
- display-12 workaround (Jose).
- Fix modesetting (Imre).
- Drop redundant address-of op before lttpr_common_caps array (Imre).
- Fix compiler checks (Jose, Jason).
- GLK display fixes (Ville).
- Fix error code returns (Dan).
- eDP novel: back again to slow and wide link training everywhere (Kai-Heng).
- Abstract DMC FW path (Rodrigo).
- Preparation and changes for upcoming
XeLPD display IP (Jose, Matt, Ville, Juha-Pekka, Animesh).
- Fix comment typo in DSI code (zuoqilin).
- Simplify CCS and UV plane alignment handling (Imre).
- PSR Fixes on TGL (Gwan-gyeong, Jose).
- Add intel_dp_hdcp.h and rename init (Jani).
- Move crtc and dpll declarations around (Jani).
- Fix pre-skl DP AUX precharge length (Ville).
- Remove stray newlines from random files (Ville).
- crtc->index and intel_crtc+drm_crtc pointer clean-up (Ville).
- Add frontbuffer tracking tracepoints (Ville).
- ADL-S PCI ID updates (Anand).
- Use unique backlight device names (Jani).
- A few clean-ups on i915/audio (Jani).
- Use intel_framebuffer instead of drm one on intel_fb functions (Imre).
- Add the missing MC CCS/XYUV8888 format support on display >= 12 (Imre).
- Nuke display error state (Ville).
- ADL-P initial enablement patches
starting to land (Clint, Imre, Jose, Umesh, Vandita, Mika).
- Display clean-up around VBT and the strap bits (Lucas).
- Try YCbCr420 color when RGB fails (Werner).
- More PSR fixes and improvements (Jose).
- Other generic display code clean-up (Jose, Ville).
- Use correct downstream caps for check Src-Ctl mode for PCON (Ankit).
- Disable HiZ Raw Stall Optimization on broken gen7 (Simon).
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmClYcoACgkQ+mJfZA7r
E8oXBwf/Rfb8o/4WZeoc3vxtFlWenA/9QJA2Xs4ui6U3vJScpaHFLq5Ki6aOSxIO
WudQvatS1Bw+QzzAjSZFZx+WhCwop4BLhFJJxVK2RD4REeSjJvPZ6oovgndMOGY4
RvyeXoIJoXoHPQ7uJXMZZGRthYTWR83Aw93hi3uTd4jU+JB8WtHgvvycKTVKIkVB
T6V3PSuTmXwhHNURfev8d/JyiZMphRDJLD3esamwn2XRYtPDZjfkavwYQVeUlbms
TstymTGZXjNvPnX9HkzoURdF4F394iNyx3lX1j5nyYm0QgyHJKJI8moy8Dfv4+AB
JlL5vE7cTKtnKC5OUPCh9NZRH4pNZw==
=uO7R
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-next-2021-05-19-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Core Changes:
- drm: Rename DP_PSR_SELECTIVE_UPDATE to better mach eDP spec (Jose).
Driver Changes:
- Display plane clock rates fixes and improvements (Ville).
- Uninint DMC FW loader state during shutdown (Imre).
- Convert snprintf to sysfs_emit (Xuezhi).
- Fix invalid access to ACPI _DSM objects (Takashi).
- A big refactor around how i915 addresses the graphics
and display IP versions. (Matt, Lucas).
- Backlight fix (Lyude).
- Display watermark and DBUF fixes (Ville).
- HDCP fix (Anshuman).
- Improve cases where display is not available (Jose).
- Defeature PSR2 for RKL and ALD-S (Jose).
- VLV DSI panel power fixes and improvements (Hans).
- display-12 workaround (Jose).
- Fix modesetting (Imre).
- Drop redundant address-of op before lttpr_common_caps array (Imre).
- Fix compiler checks (Jose, Jason).
- GLK display fixes (Ville).
- Fix error code returns (Dan).
- eDP novel: back again to slow and wide link training everywhere (Kai-Heng).
- Abstract DMC FW path (Rodrigo).
- Preparation and changes for upcoming
XeLPD display IP (Jose, Matt, Ville, Juha-Pekka, Animesh).
- Fix comment typo in DSI code (zuoqilin).
- Simplify CCS and UV plane alignment handling (Imre).
- PSR Fixes on TGL (Gwan-gyeong, Jose).
- Add intel_dp_hdcp.h and rename init (Jani).
- Move crtc and dpll declarations around (Jani).
- Fix pre-skl DP AUX precharge length (Ville).
- Remove stray newlines from random files (Ville).
- crtc->index and intel_crtc+drm_crtc pointer clean-up (Ville).
- Add frontbuffer tracking tracepoints (Ville).
- ADL-S PCI ID updates (Anand).
- Use unique backlight device names (Jani).
- A few clean-ups on i915/audio (Jani).
- Use intel_framebuffer instead of drm one on intel_fb functions (Imre).
- Add the missing MC CCS/XYUV8888 format support on display >= 12 (Imre).
- Nuke display error state (Ville).
- ADL-P initial enablement patches
starting to land (Clint, Imre, Jose, Umesh, Vandita, Mika).
- Display clean-up around VBT and the strap bits (Lucas).
- Try YCbCr420 color when RGB fails (Werner).
- More PSR fixes and improvements (Jose).
- Other generic display code clean-up (Jose, Ville).
- Use correct downstream caps for check Src-Ctl mode for PCON (Ankit).
- Disable HiZ Raw Stall Optimization on broken gen7 (Simon).
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YKVioeu0JkUAlR7y@intel.com
The put_user() and get_user() functions do checks on the address which is
passed to them. They check whether the address is actually a user-space
address and whether its fine to access it. They also call might_fault()
to indicate that they could fault and possibly sleep.
All of these checks are neither wanted nor needed in the #VC exception
handler, which can be invoked from almost any context and also for MMIO
instructions from kernel space on kernel memory. All the #VC handler
wants to know is whether a fault happened when the access was tried.
This is provided by __put_user()/__get_user(), which just do the access
no matter what. Also add comments explaining why __get_user() and
__put_user() are the best choice here and why it is safe to use them
in this context. Also explain why copy_to/from_user can't be used.
In addition, also revert commit
7024f60d65 ("x86/sev-es: Handle string port IO to kernel memory properly")
because using __get_user()/__put_user() fixes the same problem while
the above commit introduced several problems:
1) It uses access_ok() which is only allowed in task context.
2) It uses memcpy() which has no fault handling at all and is
thus unsafe to use here.
[ bp: Fix up commit ID of the reverted commit above. ]
Fixes: f980f9c31a ("x86/sev-es: Compile early handler code into kernel image")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210519135251.30093-4-joro@8bytes.org
When emulating guest instructions for MMIO or IOIO accesses, the #VC
handler might get a page-fault and will not be able to complete. Forward
the page-fault in this case to the correct handler instead of killing
the machine.
Fixes: 0786138c78 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210519135251.30093-3-joro@8bytes.org
sev_es_get_ghcb() is called from several places but only one of them
checks the return value. The reaction to returning NULL is always the
same: calling panic() and kill the machine.
Instead of adding checks to all call sites, move the panic() into the
function itself so that it will no longer return NULL.
Fixes: 0786138c78 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210519135251.30093-2-joro@8bytes.org
It turns out that linux uses si_trapno very sparingly, and as such it
can be considered extra information for a very narrow selection of
signals, rather than information that is present with every fault
reported in siginfo.
As such move si_trapno inside the union inside of _si_fault. This
results in no change in placement, and makes it eaiser
to extend _si_fault in the future as this reduces the number of
special cases. In particular with si_trapno included in the union it
is no longer a concern that the union must be pointer aligned on most
architectures because the union follows immediately after si_addr
which is a pointer.
This change results in a difference in siginfo field placement on
sparc and alpha for the fields si_addr_lsb, si_lower, si_upper,
si_pkey, and si_perf. These architectures do not implement the
signals that would use si_addr_lsb, si_lower, si_upper, si_pkey, and
si_perf. Further these architecture have not yet implemented the
userspace that would use si_perf.
The point of this change is in fact to correct these placement issues
before sparc or alpha grow userspace that cares. This change was
discussed[1] and the agreement is that this change is currently safe.
[1]: https://lkml.kernel.org/r/CAK8P3a0+uKYwL1NhY6Hvtieghba2hKYGD6hcKx5n8=4Gtt+pHA@mail.gmail.com
Acked-by: Marco Elver <elver@google.com>
v1: https://lkml.kernel.org/r/m1tunns7yf.fsf_-_@fess.ebiederm.org
v2: https://lkml.kernel.org/r/20210505141101.11519-5-ebiederm@xmission.com
Link: https://lkml.kernel.org/r/20210517195748.8880-1-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Since the VMGEXIT instruction can be issued from userspace, invalidate
the GHCB after performing VMGEXIT processing in the kernel.
Invalidation is only required after userspace is available, so call
vc_ghcb_invalidate() from sev_es_put_ghcb(). Update vc_ghcb_invalidate()
to additionally clear the GHCB exit code so that it is always presented
as 0 when VMGEXIT has been issued by anything else besides the kernel.
Fixes: 0786138c78 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/5a8130462e4f0057ee1184509cd056eedd78742b.1621273353.git.thomas.lendacky@amd.com
- Reorganize SEV code to streamline and simplify future development
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCg1XQACgkQEsHwGGHe
VUpRKA//dwzDD1QU16JucfhgFlv/9OTm48ukSwAb9lZjDEy4H1CtVL3xEHFd7L3G
LJp0LTW+OQf0/0aGlQp/cP6sBF6G9Bf4mydx70Id4SyCQt8eZDodB+ZOOWbeteWq
p92fJPbX8CzAglutbE+3v/MD8CCAllTiLZnJZPVj4Kux2/wF6EryDgF1+rb5q8jp
ObTT9817mHVwWVUYzbgceZtd43IocOlKZRmF1qivwScMGylQTe1wfMjunpD5pVt8
Zg4UDNknNfYduqpaG546E6e1zerGNaJK7SHnsuzHRUVU5icNqtgBk061CehP9Ksq
DvYXLUl4xF16j6xJAqIZPNrBkJGdQf4q1g5x2FiBm7rSQU5owzqh5rkVk4EBFFzn
UtzeXpqbStbsZHXycyxBNdq2HXxkFPf2NXZ+bkripPg+DifOGots1uwvAft+6iAE
GudK6qxAvr8phR1cRyy6BahGtgOStXbZYEz0ZdU6t7qFfZMz+DomD5Jimj0kAe6B
s6ras5xm8q3/Py87N/KNjKtSEpgsHv/7F+idde7ODtHhpRL5HCBqhkZOSRkMMZqI
ptX1oSTvBXwRKyi5x9YhkKHUFqfFSUTfJhiRFCWK+IEAv3Y7SipJtfkqxRbI6fEV
FfCeueKDDdViBtseaRceVLJ8Tlr6Qjy27fkPPTqJpthqPpCdoZ0=
=ENfF
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"The three SEV commits are not really urgent material. But we figured
since getting them in now will avoid a huge amount of conflicts
between future SEV changes touching tip, the kvm and probably other
trees, sending them to you now would be best.
The idea is that the tip, kvm etc branches for 5.14 will all base
ontop of -rc2 and thus everything will be peachy. What is more, those
changes are purely mechanical and defines movement so they should be
fine to go now (famous last words).
Summary:
- Enable -Wundef for the compressed kernel build stage
- Reorganize SEV code to streamline and simplify future development"
* tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed: Enable -Wundef
x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG
x86/sev: Move GHCB MSR protocol and NAE definitions in a common header
x86/sev-es: Rename sev-es.{ch} to sev.{ch}
Some AMD Ryzen generations has different calculation method on maximum
performance. 255 is not for all ASICs, some specific generations should use 166
as the maximum performance. Otherwise, it will report incorrect frequency value
like below:
~ → lscpu | grep MHz
CPU MHz: 3400.000
CPU max MHz: 7228.3198
CPU min MHz: 2200.0000
[ mingo: Tidied up whitespace use. ]
[ Alexander Monakov <amonakov@ispras.ru>: fix 225 -> 255 typo. ]
Fixes: 41ea667227 ("x86, sched: Calculate frequency invariance for AMD systems")
Fixes: 3c55e94c0a ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies")
Reported-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com>
Fixed-by: Alexander Monakov <amonakov@ispras.ru>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210425073451.2557394-1-ray.huang@amd.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211791
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Fix virtualization of RDPID
* Virtualization of DR6_BUS_LOCK, which on bare metal is new in
the 5.13 merge window
* More nested virtualization migration fixes (nSVM and eVMCS)
* Fix for KVM guest hibernation
* Fix for warning in SEV-ES SRCU usage
* Block KVM from loading on AMD machines with 5-level page tables,
due to the APM not mentioning how host CR4.LA57 exactly impacts
the guest.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCZWwgUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOE9wgAk7Io8cuvnhC9ogVqzZWrPweWqFg8
fJcPMB584JRnMqYHBVYbkTPGe8SsCHKR2MKsNdc4cEP111cyr3suWsxOdmjJn58i
7ahy6PcKx7wWeWwEt7O599l6CeoX5XB9ExvA6eiXAv7iZeOJHFa+Ny2GlWgauy6Y
DELryEomx1r4IUkZaSR+2fYjzvOWTXQixwU/jwx8NcTJz0DrzknzLE7XOciPBfn0
t0Q2rCXdL2nF1uPksZbntx8Qoa6t6GDVIyrH/ZCPQYJtAX6cjxNAh3zwCe+hMnOd
fW8ntBH1nZRiNnberA4IICAzqnUokgPWdKBrZT2ntWHBK+aqxXHznrlPJA==
=e+gD
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
- Lots of bug fixes.
- Fix virtualization of RDPID
- Virtualization of DR6_BUS_LOCK, which on bare metal is new to this
release
- More nested virtualization migration fixes (nSVM and eVMCS)
- Fix for KVM guest hibernation
- Fix for warning in SEV-ES SRCU usage
- Block KVM from loading on AMD machines with 5-level page tables, due
to the APM not mentioning how host CR4.LA57 exactly impacts the
guest.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (48 commits)
KVM: SVM: Move GHCB unmapping to fix RCU warning
KVM: SVM: Invert user pointer casting in SEV {en,de}crypt helpers
kvm: Cap halt polling at kvm->max_halt_poll_ns
tools/kvm_stat: Fix documentation typo
KVM: x86: Prevent deadlock against tk_core.seq
KVM: x86: Cancel pvclock_gtod_work on module removal
KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging
KVM: X86: Expose bus lock debug exception to guest
KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit
KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks
KVM: x86: Hide RDTSCP and RDPID if MSR_TSC_AUX probing failed
KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model
KVM: x86: Move uret MSR slot management to common x86
KVM: x86: Export the number of uret MSRs to vendor modules
KVM: VMX: Disable loading of TSX_CTRL MSR the more conventional way
KVM: VMX: Use common x86's uret MSR list as the one true list
KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list
KVM: VMX: Configure list of user return MSRs at module init
KVM: x86: Add support for RDPID without RDTSCP
KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in host
...
The SYSCFG MSR continued being updated beyond the K8 family; drop the K8
name from it.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20210427111636.1207-4-brijesh.singh@amd.com
The guest and the hypervisor contain separate macros to get and set
the GHCB MSR protocol and NAE event fields. Consolidate the GHCB
protocol definitions and helper macros in one place.
Leave the supported protocol version define in separate files to keep
the guest and hypervisor flexibility to support different GHCB version
in the same release.
There is no functional change intended.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20210427111636.1207-3-brijesh.singh@amd.com
SEV-SNP builds upon the SEV-ES functionality while adding new hardware
protection. Version 2 of the GHCB specification adds new NAE events that
are SEV-SNP specific. Rename the sev-es.{ch} to sev.{ch} so that all
SEV* functionality can be consolidated in one place.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20210427111636.1207-2-brijesh.singh@amd.com
can also be accounted to it. Along with a consolidation to the guest-specific context
tracking helpers.
- Provide for the host NMI handler running after a VMX VMEXIT to be able to run
on the kernel stack correctly.
- Initialize MSR_TSC_AUX when RDPID is supported and not RDTSCP (virt relevant -
real hw supports both)
- A code generation improvement to TASK_SIZE_MAX through the use of alternatives
- The usual misc. and related cleanups and improvements
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCXqpcACgkQEsHwGGHe
VUp6Kg/+O0y2PvL6dhfYnUvTmQD7be0DOfWeFSLfBBA0c6yaHL1INbFHWDDptNuJ
ZV50V+vyqXWV9q0AWF94fYHBs2kB0S79/En0Pwt1a3kb/xlfVTh8VAMPr36utnTY
VWvOwHgixfPbY+8g1AoqIm/IeFuYWubXQ9CyBrLx/zkJjszfot1eooGRYKDPc2qi
dNEqBO4IKzw24OdO+oIzW1/owLfnBF+GnXrwCb8fFC2U7luyFAJmp9c1bYnyNuCm
BdQySOTfm8nnE2RpN4wfc8Akvu/ETKHOPSQOqHIb5glzv6lVfRKXu3CgpYbzoCNl
Iohb6z8xmgAG29g2VpBjNvCWyyO79y4Ckf94ibWl+qt01EdeYefcP0euK+MGi85A
cN/MrMt7QjHHEO7ok5J9rBSeKobOtng6A4MHenSOLvjifOYoupRFijaLVxRluATW
3NsC2IhL10u1c69Zsq6JJFJKoAytInKSigEN9VFZp+4NdE/FzDxfebC/6rSKznGi
XoaEjOOX0JQ5TXM1gDoyzowAvt2vgndvldpwJTnPY5NP3X9fdiHhoOF9cU2yvl+x
ZjgD1VxRWLGZKBojNfAa+0oDMZ/cTwPoeZ5Rr5p7SMr/Xw2fsUQ68KVjhOR7ZbaU
8zEV//JtetwGSN86NhQ/V32hqiF2fni62yBZjYGZ8XM/AnDqaMQ=
=O3BS
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"A bunch of things accumulated for x86 in the last two weeks:
- Fix guest vtime accounting so that ticks happening while the guest
is running can also be accounted to it. Along with a consolidation
to the guest-specific context tracking helpers.
- Provide for the host NMI handler running after a VMX VMEXIT to be
able to run on the kernel stack correctly.
- Initialize MSR_TSC_AUX when RDPID is supported and not RDTSCP (virt
relevant - real hw supports both)
- A code generation improvement to TASK_SIZE_MAX through the use of
alternatives
- The usual misc and related cleanups and improvements"
* tag 'x86_urgent_for_v5.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
KVM: x86: Consolidate guest enter/exit logic to common helpers
context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain
context_tracking: Consolidate guest enter/exit wrappers
sched/vtime: Move guest enter/exit vtime accounting to vtime.h
sched/vtime: Move vtime accounting external declarations above inlines
KVM: x86: Defer vtime accounting 'til after IRQ handling
context_tracking: Move guest exit vtime accounting to separate helpers
context_tracking: Move guest exit context tracking to separate helpers
KVM/VMX: Invoke NMI non-IST entry instead of IST entry
x86/cpu: Remove write_tsc() and write_rdtscp_aux() wrappers
x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported
x86/resctrl: Fix init const confusion
x86: Delete UD0, UD1 traces
x86/smpboot: Remove duplicate includes
x86/cpu: Use alternative to generate the TASK_SIZE_MAX constant
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCVVmMQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpv7yEAC/WV1alcH9XdEqLrc2aDwlaScMmSlrMQhY
ihtDCR9BsX11E3QcUB7D+VYjBo68uKR+ksa1/GN2Xp+vvqmdjQvZindgto/5b6u1
ko0Dradl2zulCAc7QIdjb2tbmL+Q+JOX5wxv14/+2XabEcce3OegWIvIgX+56NFW
ZHg80SQzXUhEtQcAUVCoPeBN+H+xzadgz38VlOI08gOG7/M6tS965GH3tZqTjh2K
P7dLjUn0WcxZ3euAYAsQzNN2O2ObJfpCsQtsG2eSf8DGpanPe4gQjAud1BstDtN0
CJ0+b6DHgzQYOAgPFjm7l0jjs+VnIYIMnoBBxm5EkIoktsj0hHdqTnEugoz4wTnS
T8WgojaU6jYNx+Jj6vciCLk0lb5c3O3nxmw3w84/rtTwtaEChCAbWdAkl4cleNaw
3/Z2bksCVrQWDVskmu4FP7+kGYpjpV+ZiA2+6OGwILTCN+W7vi079NByQAzdLaRb
K/4lEGM7VYEXtq/I7C6VzjtY7gq46TJmpFW+OdQnPIguavp+7vlUl2pLV3oTeGBc
E6c+xltgIN+sbbDc/57EJEvhHQod4A6HYOGwBMyjHrhr/sdQ4xvUaJPNmG9HfqRK
SM3TOlwpHRWFTgbO+6qoJQSMvACQyE/SDqiPi08q75zFVTNCcYM7uYV3fJMsQ9sj
vA+5HAaRKQ==
=YwTw
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.13-2021-05-07' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Mostly fixes for merge window merged code. In detail:
- Error case memory leak fixes (Colin, Zqiang)
- Add the tools/io_uring/ to the list of maintained files (Lukas)
- Set of fixes for the modified buffer registration API (Pavel)
- Sanitize io thread setup on x86 (Stefan)
- Ensure we truncate transfer count for registered buffers (Thadeu)"
* tag 'io_uring-5.13-2021-05-07' of git://git.kernel.dk/linux-block:
x86/process: setup io_threads more like normal user space threads
MAINTAINERS: add io_uring tool to IO_URING
io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers
io_uring: Fix memory leak in io_sqe_buffers_register()
io_uring: Fix premature return from loop and memory leak
io_uring: fix unchecked error in switch_start()
io_uring: allow empty slots for reg buffers
io_uring: add more build check for uapi
io_uring: dont overlap internal and user req flags
io_uring: fix drain with rsrc CQEs
Simplify the code by making PV features shutdown happen in one place.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Crash shutdown handler only disables kvmclock and steal time, other PV
features remain active so we risk corrupting memory or getting some
side-effects in kdump kernel. Move crash handler to kvm.c and unify
with CPU offline.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currenly, we disable kvmclock from machine_shutdown() hook and this
only happens for boot CPU. We need to disable it for all CPUs to
guard against memory corruption e.g. on restore from hibernate.
Note, writing '0' to kvmclock MSR doesn't clear memory location, it
just prevents hypervisor from updating the location so for the short
while after write and while CPU is still alive, the clock remains usable
and correct so we don't need to switch to some other clocksource.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Various PV features (Async PF, PV EOI, steal time) work through memory
shared with hypervisor and when we restore from hibernation we must
properly teardown all these features to make sure hypervisor doesn't
write to stale locations after we jump to the previously hibernated kernel
(which can try to place anything there). For secondary CPUs the job is
already done by kvm_cpu_down_prepare(), register syscore ops to do
the same for boot CPU.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add ADL-P to the device_info table and support MACROS.
Bspec: 49185, 55372, 55373
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Clinton Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210506161930.309688-4-imre.deak@intel.com
As io_threads are fully set up USER threads it's clearer to separate the
code path from the KTHREAD logic.
The only remaining difference to user space threads is that io_threads
never return to user space again. Instead they loop within the given
worker function.
The fact that they never return to user space means they don't have an
user space thread stack. In order to indicate that to tools like gdb we
reset the stack and instruction pointers to 0.
This allows gdb attach to user space processes using io-uring, which like
means that they have io_threads, without printing worrying message like
this:
warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386
warning: Architecture rejected target-supplied description
The output will be something like this:
(gdb) info threads
Id Target Id Frame
* 1 LWP 4863 "io_uring-cp-for" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
2 LWP 4864 "iou-mgr-4863" 0x0000000000000000 in ?? ()
3 LWP 4865 "iou-wrk-4863" 0x0000000000000000 in ?? ()
(gdb) thread 3
[Switching to thread 3 (LWP 4865)]
#0 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0
Fixes: 4727dc20e0 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD")
Link: https://lore.kernel.org/io-uring/044d0bad-6888-a211-e1d3-159a4aeed52d@polymtl.ca/T/#m1bbf5727e3d4e839603f6ec7ed79c7eebfba6267
Signed-off-by: Stefan Metzmacher <metze@samba.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Andy Lutomirski <luto@kernel.org>
cc: linux-kernel@vger.kernel.org
cc: io-uring@vger.kernel.org
cc: x86@kernel.org
Link: https://lore.kernel.org/r/20210505110310.237537-1-metze@samba.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
Before commit 1a5488ef0d ("KVM: VMX: Invoke NMI handler via indirect
call instead of INTn"), this was done by INTn ("int $2"). But INTn
microcode is relatively expensive, so the commit reworked NMI VM-Exit
handling to invoke the kernel handler by function call.
But this missed a detail. The NMI entry point for direct invocation is
fetched from the IDT table and called on the kernel stack. But on 64-bit
the NMI entry installed in the IDT expects to be invoked on the IST stack.
It relies on the "NMI executing" variable on the IST stack to work
correctly, which is at a fixed position in the IST stack. When the entry
point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
executing" variable is obviously also on the kernel stack and is
"uninitialized" and can cause the NMI entry code to run in the wrong way.
Provide a non-ist entry point for VMX which shares the C-function with
the regular NMI entry and invoke the new asm entry point instead.
On 32-bit this just maps to the regular NMI entry point as 32-bit has no
ISTs and is not affected.
[ tglx: Made it independent for backporting, massaged changelog ]
Fixes: 1a5488ef0d ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lai Jiangshan <laijs@linux.alibaba.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87r1imi8i1.ffs@nanos.tec.linutronix.de
Drop write_tsc() and write_rdtscp_aux(); the former has no users, and the
latter has only a single user and is slightly misleading since the only
in-kernel consumer of MSR_TSC_AUX is RDPID, not RDTSCP.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210504225632.1532621-3-seanjc@google.com
Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is
supported. This fixes a bug where vdso_read_cpunode() will read garbage
via RDPID if RDPID is supported but RDTSCP is not. While no known CPU
supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for
RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model
for a virtual machine.
Note, technically MSR_TSC_AUX could be initialized if and only if RDPID
is supported since RDTSCP is currently not used to retrieve the CPU node.
But, the cost of the superfluous WRMSR is negigible, whereas leaving
MSR_TSC_AUX uninitialized is just asking for future breakage if someone
decides to utilize RDTSCP.
Fixes: a582c540ac ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com
'pr_fmt' already has 'kvm-guest: ' so 'KVM' prefix is redundant.
"Unregister pv shared memory" is very ambiguous, it's hard to
say which particular PV feature it relates to.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation,
zap under read lock, enable/disably dirty page logging under
read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing
the architecture-specific code
- Some selftests improvements
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK
y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b
c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+
Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY
+2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b
M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ==
=AXUi
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This is a large update by KVM standards, including AMD PSP (Platform
Security Processor, aka "AMD Secure Technology") and ARM CoreSight
(debug and trace) changes.
ARM:
- CoreSight: Add support for ETE and TRBE
- Stage-2 isolation for the host kernel when running in protected
mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- AMD PSP driver changes
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation, zap under
read lock, enable/disably dirty page logging under read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing the
architecture-specific code
- a handful of "Get rid of oprofile leftovers" patches
- Some selftests improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
KVM: selftests: Speed up set_memory_region_test
selftests: kvm: Fix the check of return value
KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
KVM: SVM: Skip SEV cache flush if no ASIDs have been used
KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
KVM: SVM: Drop redundant svm_sev_enabled() helper
KVM: SVM: Move SEV VMCB tracking allocation to sev.c
KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
KVM: SVM: Unconditionally invoke sev_hardware_teardown()
KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
KVM: SVM: Move SEV module params/variables to sev.c
KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
KVM: SVM: Zero out the VMCB array used to track SEV ASID association
x86/sev: Drop redundant and potentially misleading 'sev_enabled'
KVM: x86: Move reverse CPUID helpers to separate header file
KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
...
This reverts commit cd544fd1dc.
As discussed in [1] this commit was a no-op because the mapping type was
checked in vma_to_resize before move_vma is ever called. This meant that
vm_ops->mremap() would never be called on such mappings. Furthermore,
we've since expanded support of MREMAP_DONTUNMAP to non-anonymous
mappings, and these special mappings are still protected by the existing
check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL.
1. https://lkml.org/lkml/2020/12/28/2340
Link: https://lkml.kernel.org/r/20210323182520.2712101-2-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Implement concurrent TLB flushes, which overlaps the local TLB flush with the
remote TLB flush. In testing this improved sysbench performance measurably by
a couple of percentage points, especially if TLB-heavy security mitigations
are active.
- Further micro-optimizations to improve the performance of TLB flushes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCKbNcRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hjYBAAsyNUa/gOu0g6/Cx8R86w9HtHHmm5vso/
6nJjWj2fd2qJ9JShlddxvXEMeXtPTYabVWQkiiriFMuofk6JeKnlHm1Jzl6keABX
OQFwjIFeNASPRcdXvuuYPOVWAJJdr2oL9QUr6OOK1ccQJTz/Cd0zA+VQ5YqcsCon
yaWbkxELwKXpgql+qt66eAZ6Q2Y1TKXyrTW7ZgxQi0yeeWqMaEOub0/oyS7Ax1Rg
qEJMwm1prb76NPzeqR/G3e4KTrDZfQ/B/KnSsz36GTJpl4eye6XqWDUgm1nAGNIc
5dbc4Vx7JtZsUOuC0AmzWb3hsDyzVcN/lQvijdZ2RsYR3gvuYGaBhKqExqV0XH6P
oqaWOKWCz+LqWbsgJmxCpqkt1LZl5+VUOcfJ97WkIS7DyIPtSHTzQXbBMZqKLeat
mn5UcKYB2Gi7wsUPv6VC2ChKbDqN0VT8G86XbYylGo4BE46KoZKPUNY/QWKLUPd6
0UKcVeNM2HFyf1C73p/tO/z7hzu3qLuMMnsphP6/c2pKLpdgawEXgbnVKNId1B/c
NrzyhTvVaMt+Um28bBRhHONIlzPJwWcnZbdY7NqMnu+LBKQ68cL/h4FOIV/RDLNb
GJLgfAr8fIw/zIpqYuFHiiMNo9wWqVtZko1MvXhGceXUL69QuzTra2XR/6aDxkPf
6gQVesetTvo=
=3Cyp
-----END PGP SIGNATURE-----
Merge tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tlb updates from Ingo Molnar:
"The x86 MM changes in this cycle were:
- Implement concurrent TLB flushes, which overlaps the local TLB
flush with the remote TLB flush.
In testing this improved sysbench performance measurably by a
couple of percentage points, especially if TLB-heavy security
mitigations are active.
- Further micro-optimizations to improve the performance of TLB
flushes"
* tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Micro-optimize smp_call_function_many_cond()
smp: Inline on_each_cpu_cond() and on_each_cpu()
x86/mm/tlb: Remove unnecessary uses of the inline keyword
cpumask: Mark functions as pure
x86/mm/tlb: Do not make is_lazy dirty for no reason
x86/mm/tlb: Privatize cpu_tlbstate
x86/mm/tlb: Flush remote and local TLBs concurrently
x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
smp: Run functions concurrently in smp_call_function_many_cond()