1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

17622 commits

Author SHA1 Message Date
Joerg Roedel
d187f21733 x86/sev: Make sure IRQs are disabled while GHCB is active
The #VC handler only cares about IRQs being disabled while the GHCB is
active, as it must not be interrupted by something which could cause
another #VC while it holds the GHCB (NMI is the exception for which the
backup GHCB exits).

Make sure nothing interrupts the code path while the GHCB is active
by making sure that callers of __sev_{get,put}_ghcb() have disabled
interrupts upfront.

 [ bp: Massage commit message. ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210618115409.22735-2-joro@8bytes.org
2021-06-21 15:51:21 +02:00
Linus Torvalds
8363e795eb A first set of urgent fixes to the FPU/XSTATE handling mess^W code.
(There's a lot more in the pipe):
 
 - Prevent corruption of the XSTATE buffer in signal handling by
   validating what is being copied from userspace first.
 
 - Invalidate other task's preserved FPU registers on XRSTOR failure
   (#PF) because latter can still modify some of them.
 
 - Restore the proper PKRU value in case userspace modified it
 
 - Reset FPU state when signal restoring fails
 
 Other:
 
 - Map EFI boot services data memory as encrypted in a SEV guest so that
   the guest can access it and actually boot properly
 
 - Two SGX correctness fixes: proper resources freeing and a NUMA fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmDO5vQACgkQEsHwGGHe
 VUrUjw//fRU8BPZ3/SWNQO188QhHdFpm3jqtjRJsZD1FfnnLdxIg2SCP4RjFxv+Y
 eFyN0nYLekG8a3CMV081H9Rhr5tt3bflk0oTcGAar7m2qQiCiqaAH0wptIlQonSu
 nQCSs+PeaaK4nRCtW+TUJnwG0ZU/y7fEXa3pxJ6hSMnxZjz3lj70zKhpA1nQtqRZ
 OOStvBNtaWcDdTTE4r8XuFIxuMUUEuwHlQQmkAVHQYUf6vxGYfnDYEg83Wddvq1E
 1leSRNFlLcCAbPUV/fax3KGvaekeJ1U411uWqXlain6m105+mk+irmrLxtur/lJ5
 cWTVb5CbIHFZnJvC5jzNPv/03GbIIQaVm4jPI2qB1AZbjcVlAPKj1Ne+U1fzvmDT
 wNUob/rnIXiGptvtUMNYGURxBTj65Nnom3iAJV+AdMOThDwYMvsJJjFkMnC5wO2n
 ZAexumWPnUzWoxSMTraT7a6b/kilFUrcPljxSrFd9yVeU8E6a1OSW35oWoQ3itrc
 xx/ne8RodLmCPC9DjecFcQR+qUuXsF+XCCj07QpfKNTAObr17e9nsKJneR6MX79C
 Lpc7Ka/CiTGYcebWX7tqtjwGPfa6iqekswxYRRp7j54bQ4sHmKyordZy0Q8+c079
 gmMlPdNbqQg3YwHyXW2yeJETDS1HBp61RRojAP15BsL73wyYQNE=
 =AuXr
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "A first set of urgent fixes to the FPU/XSTATE handling mess^W code.
  (There's a lot more in the pipe):

   - Prevent corruption of the XSTATE buffer in signal handling by
     validating what is being copied from userspace first.

   - Invalidate other task's preserved FPU registers on XRSTOR failure
     (#PF) because latter can still modify some of them.

   - Restore the proper PKRU value in case userspace modified it

   - Reset FPU state when signal restoring fails

  Other:

   - Map EFI boot services data memory as encrypted in a SEV guest so
     that the guest can access it and actually boot properly

   - Two SGX correctness fixes: proper resources freeing and a NUMA fix"

* tag 'x86_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Avoid truncating memblocks for SGX memory
  x86/sgx: Add missing xa_destroy() when virtual EPC is destroyed
  x86/fpu: Reset state for all signal restore failures
  x86/pkru: Write hardware init value to PKRU when xstate is init
  x86/process: Check PF_KTHREAD and not current->mm for kernel threads
  x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer
  x86/fpu: Prevent state corruption in __fpu__restore_sig()
  x86/ioremap: Map EFI-reserved memory as encrypted for SEV
2021-06-20 09:09:58 -07:00
Peter Zijlstra
b03fbd4ff2 sched: Introduce task_is_running()
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
2021-06-18 11:43:07 +02:00
Ingo Molnar
b2c0931a07 Merge branch 'sched/urgent' into sched/core, to resolve conflicts
This commit in sched/urgent moved the cfs_rq_is_decayed() function:

  a7b359fc6a: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")

and this fresh commit in sched/core modified it in the old location:

  9e077b52d8: ("sched/pelt: Check that *_avg are null when *_sum are")

Merge the two variants.

Conflicts:
	kernel/sched/fair.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-18 11:31:25 +02:00
Vineeth Pillai
a6c776a952 hyperv: Detect Nested virtualization support for SVM
Previously, to detect nested virtualization enlightenment support,
we were using HV_X64_ENLIGHTENED_VMCS_RECOMMENDED feature bit of
HYPERV_CPUID_ENLIGHTMENT_INFO.EAX CPUID as docuemented in TLFS:
 "Bit 14: Recommend a nested hypervisor using the enlightened VMCS
  interface. Also indicates that additional nested enlightenments
  may be available (see leaf 0x4000000A)".

Enlightened VMCS, however, is an Intel only feature so the above
detection method doesn't work for AMD. So, use the
HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS.EAX CPUID information ("The
maximum input value for hypervisor CPUID information.") and this
works for both AMD and Intel.

Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Message-Id: <43b25ff21cd2d9a51582033c9bdd895afefac056.1622730232.git.viremana@linux.microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 13:09:36 -04:00
Kai Huang
4692bc775d x86/sgx: Add missing xa_destroy() when virtual EPC is destroyed
xa_destroy() needs to be called to destroy a virtual EPC's page array
before calling kfree() to free the virtual EPC. Currently it is not
called so add the missing xa_destroy().

Fixes: 540745ddbc ("x86/sgx: Introduce virtual EPC for use by KVM guests")
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Yang Zhong <yang.zhong@intel.com>
Link: https://lkml.kernel.org/r/20210615101639.291929-1-kai.huang@intel.com
2021-06-15 18:03:45 +02:00
Pawan Gupta
293649307e x86/tsx: Clear CPUID bits when TSX always force aborts
As a result of TSX deprecation, some processors always abort TSX
transactions by default after a microcode update.

When TSX feature cannot be used it is better to hide it. Clear CPUID.RTM
and CPUID.HLE bits when TSX transactions always abort.

 [ bp: Massage commit message and comments. ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Link: https://lkml.kernel.org/r/5209b3d72ffe5bd3cafdcc803f5b883f785329c3.1623704845.git-series.pawan.kumar.gupta@linux.intel.com
2021-06-15 17:46:48 +02:00
Joerg Roedel
07570cef5e x86/sev: Propagate #GP if getting linear instruction address failed
When an instruction is fetched from user-space, segmentation needs to
be taken into account. This means that getting the linear address of an
instruction can fail. Hardware would raise a #GP exception in that case,
but the #VC exception handler would emulate it as a page-fault.

The insn_fetch_from_user*() functions now provide the relevant
information in case of a failure. Use that and propagate a #GP when the
linear address of an instruction to fetch could not be calculated.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210614135327.9921-7-joro@8bytes.org
2021-06-15 11:55:26 +02:00
Joerg Roedel
4aaa7eacd7 x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]()
The error reporting from the insn_fetch_from_user*() functions is not
very verbose. Extend it to include information on whether the linear
RIP could not be calculated or whether the memory access faulted.

This will be used in the SEV-ES code to propagate the correct
exception depending on what went wrong during instruction fetch.

 [ bp: Massage comments. ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210614135327.9921-6-joro@8bytes.org
2021-06-15 11:39:30 +02:00
Joerg Roedel
4aca2d99fd x86/sev: Fix error message in runtime #VC handler
The runtime #VC handler is not "early" anymore. Fix the copy&paste error
and remove that word from the error message.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210614135327.9921-2-joro@8bytes.org
2021-06-15 11:24:07 +02:00
Linus Torvalds
191aaf6cc4 Misc fixes:
- Fix the NMI watchdog on ancient Intel CPUs
 
  - Remove a misguided, NMI-unsafe KASAN callback
    from the NMI-safe irq_work path used by perf.
 
  - Fix uncore events on Ice Lake servers.
 
  - Someone booted maxcpus=1 on an SNB-EP, and the
    uncore driver emitted warnings and was probably
    buggy. Fix it.
 
  - KCSAN found a genuine data race in the core perf
    code. Somewhat ironically the bug was introduced
    through a recent race fix. :-/ In our defense, the
    new race window was much more narrow. Fix it.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDErJkRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gjNxAAhWPl+zsVr+bMZGQVnjPf7swXSaqsphtU
 LrP0hrs4nH0JiB7lZJVjPhCMQKXb+gvP0CTmxkOXmNORDKDK3slIS/zp9uyH1F+d
 nXhmWi7c1bHU0vortnv87LGJpeeI1E7rQ/uBxK6b2v6kOBmCnjvQEiPvJEIGTtpE
 YimVBERdPDTBQiW5EQbbyL3VScwm5QUN2STnLPjUtVc9HES/zCdhXNlsASfhn/Tn
 8rlSAqVEOUcsTpTXYadHckNi1zn4zrpuhWKpSHXrvXCo3qU8QpISjYNwAJ/0IGBj
 CMdg2r+MneF6gop76R5aRcA0JDvDgtv56LKFVhi9gEkE5em9YAni17HU0IeTvJmT
 mL9j64h8oUErC/TpAU1vXCJjIxH7jLq8YQoNwHUvF0pSvcNGsaFeWu1ADQuTEIi9
 fyKHRpFwPMBhwc+AMaRepgQ9FlvE4567fQmwlrUDUKlCU0x0dfvFCM2z/o61YFlH
 oFgB0h0SNxdoj5EXny50LtokP1Kp/oBNVhhNsUpH8wVxWLi61BHJOslcc7nzdP6t
 JBqVE6bLQlxmlKt2AwiOkxe9xVv34o3AMxUYtUBYgCTZSlRjL//7pcqgG5r+CZH/
 eXEU3wWcGtRPEItGXtiGT9Vm2ZYSaUMFF7k7OrTPCHgkW51oEW4FUoaV7M+9fl43
 638x9Wnse4Q=
 =9LoT
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Misc fixes:

   - Fix the NMI watchdog on ancient Intel CPUs

   - Remove a misguided, NMI-unsafe KASAN callback from the NMI-safe
     irq_work path used by perf.

   - Fix uncore events on Ice Lake servers.

   - Someone booted maxcpus=1 on an SNB-EP, and the uncore driver
     emitted warnings and was probably buggy. Fix it.

   - KCSAN found a genuine data race in the core perf code. Somewhat
     ironically the bug was introduced through a recent race fix. :-/
     In our defense, the new race window was much more narrow. Fix it"

* tag 'perf-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUs
  irq_work: Make irq_work_queue() NMI-safe again
  perf/x86/intel/uncore: Fix M2M event umask for Ice Lake server
  perf/x86/intel/uncore: Fix a kernel WARNING triggered by maxcpus=1
  perf: Fix data race between pin_count increment/decrement
2021-06-12 11:34:49 -07:00
ChenXiaoSong
1d3156396c x86/sgx: Correct kernel-doc's arg name in sgx_encl_release()
Fix the following kernel-doc warning:

  arch/x86/kernel/cpu/sgx/encl.c:392: warning: Function parameter \
    or member 'ref' not described in 'sgx_encl_release'

 [ bp: Massage commit message. ]

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210609035510.2083694-1-chenxiaosong2@huawei.com
2021-06-11 10:42:38 +02:00
CodyYao-oc
a8383dfb21 x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUs
The following commit:

   3a4ac121c2 ("x86/perf: Add hardware performance events support for Zhaoxin CPU.")

Got the old-style NMI watchdog logic wrong and broke it for basically every
Intel CPU where it was active. Which is only truly old CPUs, so few people noticed.

On CPUs with perf events support we turn off the old-style NMI watchdog, so it
was pretty pointless to add the logic for X86_VENDOR_ZHAOXIN to begin with ... :-/

Anyway, the fix is to restore the old logic and add a 'break'.

[ mingo: Wrote a new changelog. ]

Fixes: 3a4ac121c2 ("x86/perf: Add hardware performance events support for Zhaoxin CPU.")
Signed-off-by: CodyYao-oc <CodyYao-oc@zhaoxin.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210607025335.9643-1-CodyYao-oc@zhaoxin.com
2021-06-10 10:04:40 +02:00
Thomas Gleixner
efa1655049 x86/fpu: Reset state for all signal restore failures
If access_ok() or fpregs_soft_set() fails in __fpu__restore_sig() then the
function just returns but does not clear the FPU state as it does for all
other fatal failures.

Clear the FPU state for these failures as well.

Fixes: 72a671ced6 ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87mtryyhhz.ffs@nanos.tec.linutronix.de
2021-06-10 08:04:24 +02:00
Dave Airlie
a2098e857b Cross-subsystem Changes:
-  x86/gpu: add JasperLake to gen11 early quirks
   (Although the patch lacks the Ack info, it has been Acked by Borislav)
 
 Driver Changes:
 
 - General DMC improves (Anusha)
 - More ADL-P enabling (Vandita, Matt, Jose, Mika, Anusha, Imre, Lucas, Jani, Manasi, Ville, Stanislav)
 - Introduce MBUS relative dbuf offset (Ville)
 - PSR fixes and improvements (Gwan, Jose, Ville)
 - Re-enable LTTPR non-transparent LT mode for DPCD_REV < 1.4 (Ville)
 - Remove duplicated declarations (Shaokun, Wan)
 - Check HDMI sink deep color capabilities during .mode_valid (Ville)
 - Fix display flicker screan related to console and FBC (Chris)
 - Remaining conversions of GRAPHICS_VER (Lucas)
 - Drop invalid FIXME (Jose)
 - Fix bigjoiner check in dsc_disable (Vandita)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmDBMp8ACgkQ+mJfZA7r
 E8rkngf/cq6JI3nLmQpNSoDJ1VosuuLgVKGMeL+NR4UmHqsjzaxTL7evaJzf38mS
 wDaTvB3eEUKAFuvIY/US6xO3gPXb1TtmJ4UBizzkK7DOeh53LXvrxX+ifdg6RXx9
 7WsNvnUMItGX5+CRtHeWqmqptBCXTup1ntjAvTOKc9S20gshDHX0/eyk04Ub5FOb
 cVgt9FoDhTVY6Z2wWG9G0pezbuWc3rDMei+cboXUXCx+QEjjdYNyrb32UT6e1Qfm
 oBWRhOMTe+aJtbGen+l134I1uS3XCfjZ8zHVqLXMUhCJ443yB0LEhPdk56PJSD9F
 MoKujBlyxF1dM7SDQ/h6+7uhpvOkvA==
 =0nIT
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-next-2021-06-09' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

Cross-subsystem Changes:

-  x86/gpu: add JasperLake to gen11 early quirks
  (Although the patch lacks the Ack info, it has been Acked by Borislav)

Driver Changes:

- General DMC improves (Anusha)
- More ADL-P enabling (Vandita, Matt, Jose, Mika, Anusha, Imre, Lucas, Jani, Manasi, Ville, Stanislav)
- Introduce MBUS relative dbuf offset (Ville)
- PSR fixes and improvements (Gwan, Jose, Ville)
- Re-enable LTTPR non-transparent LT mode for DPCD_REV < 1.4 (Ville)
- Remove duplicated declarations (Shaokun, Wan)
- Check HDMI sink deep color capabilities during .mode_valid (Ville)
- Fix display flicker screan related to console and FBC (Chris)
- Remaining conversions of GRAPHICS_VER (Lucas)
- Drop invalid FIXME (Jose)
- Fix bigjoiner check in dsc_disable (Vandita)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YMEy2Ew82BeL/hDK@intel.com
2021-06-10 13:45:11 +10:00
Andy Lutomirski
f72a249b0b x86/fpu: Add address range checks to copy_user_to_xstate()
copy_user_to_xstate() uses __copy_from_user(), which provides a negligible
speedup.  Fortunately, both call sites are at least almost correct.

__fpu__restore_sig() checks access_ok() with xstate_sigframe_size()
length and ptrace regset access uses fpu_user_xstate_size. These should
be valid upper bounds on the length, so, at worst, this would cause
spurious failures and not accesses to kernel memory.

Nonetheless, this is far more fragile than necessary and none of these
callers are in a hotpath.

Use copy_from_user() instead.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Link: https://lkml.kernel.org/r/20210608144346.140254130@linutronix.de
2021-06-09 14:46:20 +02:00
Andy Lutomirski
d8778e393a x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer
Both Intel and AMD consider it to be architecturally valid for XRSTOR to
fail with #PF but nonetheless change the register state.  The actual
conditions under which this might occur are unclear [1], but it seems
plausible that this might be triggered if one sibling thread unmaps a page
and invalidates the shared TLB while another sibling thread is executing
XRSTOR on the page in question.

__fpu__restore_sig() can execute XRSTOR while the hardware registers
are preserved on behalf of a different victim task (using the
fpu_fpregs_owner_ctx mechanism), and, in theory, XRSTOR could fail but
modify the registers.

If this happens, then there is a window in which __fpu__restore_sig()
could schedule out and the victim task could schedule back in without
reloading its own FPU registers. This would result in part of the FPU
state that __fpu__restore_sig() was attempting to load leaking into the
victim task's user-visible state.

Invalidate preserved FPU registers on XRSTOR failure to prevent this
situation from corrupting any state.

[1] Frequent readers of the errata lists might imagine "complex
    microarchitectural conditions".

Fixes: 1d731e731c ("x86/fpu: Add a fastpath to __fpu__restore_sig()")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210608144345.758116583@linutronix.de
2021-06-09 09:49:38 +02:00
Thomas Gleixner
484cea4f36 x86/fpu: Prevent state corruption in __fpu__restore_sig()
The non-compacted slowpath uses __copy_from_user() and copies the entire
user buffer into the kernel buffer, verbatim.  This means that the kernel
buffer may now contain entirely invalid state on which XRSTOR will #GP.
validate_user_xstate_header() can detect some of that corruption, but that
leaves the onus on callers to clear the buffer.

Prior to XSAVES support, it was possible just to reinitialize the buffer,
completely, but with supervisor states that is not longer possible as the
buffer clearing code split got it backwards. Fixing that is possible but
not corrupting the state in the first place is more robust.

Avoid corruption of the kernel XSAVE buffer by using copy_user_to_xstate()
which validates the XSAVE header contents before copying the actual states
to the kernel. copy_user_to_xstate() was previously only called for
compacted-format kernel buffers, but it works for both compacted and
non-compacted forms.

Using it for the non-compacted form is slower because of multiple
__copy_from_user() operations, but that cost is less important than robust
code in an already slow path.

[ Changelog polished by Dave Hansen ]

Fixes: b860eb8dce ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210608144345.611833074@linutronix.de
2021-06-09 09:28:21 +02:00
Borislav Petkov
ec35d1d93b x86/setup: Document that Windows reserves the first MiB
It does so unconditionally too, on Intel and AMD machines, to work
around BIOS bugs, as confirmed by Microsoft folks (see Link for full
details).

Reflow the paragraph, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/MWHPR21MB159330952629D36EEDE706B3D7379@MWHPR21MB1593.namprd21.prod.outlook.com
2021-06-08 22:26:43 +02:00
Tejas Upadhyay
31b77c70d9 x86/gpu: add JasperLake to gen11 early quirks
Let's reserve JSL stolen memory for graphics.

JasperLake is a gen11 platform which is compatible with
ICL/EHL changes.

This was missed in commit 24ea098b7c ("drm/i915/jsl: Split
EHL/JSL platform info and PCI ids")

V2:
    - Added maintainer list in cc
    - Added patch ref in commit message
V1:
    - Added Cc: x86@kernel.org

Fixes: 24ea098b7c ("drm/i915/jsl: Split EHL/JSL platform info and PCI ids")
Cc: <stable@vger.kernel.org> # v5.11+
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210608053411.394166-1-tejaskumarx.surendrakumar.upadhyay@intel.com
2021-06-08 12:24:43 +02:00
Mike Rapoport
23721c8e92 x86/crash: Remove crash_reserve_low_1M()
The entire memory range under 1M is unconditionally reserved in
setup_arch(), so there is no need for crash_reserve_low_1M() anymore.

Remove this function.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210601075354.5149-4-rppt@kernel.org
2021-06-07 12:14:45 +02:00
Mike Rapoport
1a6a9044b9 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
The CONFIG_X86_RESERVE_LOW build time and reservelow= command line option
allowed to control the amount of memory under 1M that would be reserved at
boot to avoid using memory that can be potentially clobbered by BIOS.

Since the entire range under 1M is always reserved there is no need for
these options anymore and they can be removed.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210601075354.5149-3-rppt@kernel.org
2021-06-07 11:12:25 +02:00
Borislav Petkov
0a5f38c81e Linux 5.13-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmC9UH8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGRDYH/3WgnRz5DfVhjmlD
 Lg38mPmbZWhFibXghrYrpbVpTyhjGFRuNtXAt2p7/nYnM71wzI6Qkx6cRKZeB5HE
 /SqeksPWUEgJaUuoXeQBrBaG7q/+9ph7Rgaf2wP7k+E00RI3E4pbMubuqFAUeikr
 itKFD9aTUsgT5XbG2hH5Ddwh5hBD2C/1PVt3jpLnJkXRCn91uEh+R7SHXP/fsjAd
 ZaGOVbAGm+jePCQDBXpVUn+8fJdxvQg7rxWVRRRhi5LXG+pnAezbkGl746zBwaSw
 K6lmVSA+eAiVkKu6nR4HJv9Hax1juFbp9xpcCo4jzxO5NJF4jsmytjLEaYFdi4NX
 G542808=
 =BPDL
 -----END PGP SIGNATURE-----

Merge tag 'v5.13-rc5' into x86/cleanups

Pick up dependent changes in order to base further cleanups ontop.

Signed-off-by: Borislav Petkov <bp@suse.de>
2021-06-07 11:02:30 +02:00
Mike Rapoport
f1d4d47c58 x86/setup: Always reserve the first 1M of RAM
There are BIOSes that are known to corrupt the memory under 1M, or more
precisely under 640K because the memory above 640K is anyway reserved
for the EGA/VGA frame buffer and BIOS.

To prevent usage of the memory that will be potentially clobbered by the
kernel, the beginning of the memory is always reserved. The exact size
of the reserved area is determined by CONFIG_X86_RESERVE_LOW build time
and the "reservelow=" command line option. The reserved range may be
from 4K to 640K with the default of 64K. There are also configurations
that reserve the entire 1M range, like machines with SandyBridge graphic
devices or systems that enable crash kernel.

In addition to the potentially clobbered memory, EBDA of unknown size may
be as low as 128K and the memory above that EBDA start is also reserved
early.

It would have been possible to reserve the entire range under 1M unless for
the real mode trampoline that must reside in that area.

To accommodate placement of the real mode trampoline and keep the memory
safe from being clobbered by BIOS, reserve the first 64K of RAM before
memory allocations are possible and then, after the real mode trampoline
is allocated, reserve the entire range from 0 to 1M.

Update trim_snb_memory() and reserve_real_mode() to avoid redundant
reservations of the same memory range.

Also make sure the memory under 1M is not getting freed by
efi_free_boot_services().

 [ bp: Massage commit message and comments. ]

Fixes: a799c2bd29 ("x86/setup: Consolidate early memory reservations")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Hugh Dickins <hughd@google.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213177
Link: https://lkml.kernel.org/r/20210601075354.5149-2-rppt@kernel.org
2021-06-03 19:57:55 +02:00
Ingo Molnar
a9e906b71f Merge branch 'sched/urgent' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-03 19:00:49 +02:00
Borislav Petkov
2b31e8ed96 x86/alternative: Optimize single-byte NOPs at an arbitrary position
Up until now the assumption was that an alternative patching site would
have some instructions at the beginning and trailing single-byte NOPs
(0x90) padding. Therefore, the patching machinery would go and optimize
those single-byte NOPs into longer ones.

However, this assumption is broken on 32-bit when code like
hv_do_hypercall() in hyperv_init() would use the ratpoline speculation
killer CALL_NOSPEC. The 32-bit version of that macro would align certain
insns to 16 bytes, leading to the compiler issuing a one or more
single-byte NOPs, depending on the holes it needs to fill for alignment.

That would lead to the warning in optimize_nops() to fire:

  ------------[ cut here ]------------
  Not a NOP at 0xc27fb598
   WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:211 optimize_nops.isra.13

due to that function verifying whether all of the following bytes really
are single-byte NOPs.

Therefore, carve out the NOP padding into a separate function and call
it for each NOP range beginning with a single-byte NOP.

Fixes: 23c1ad538f ("x86/alternatives: Optimize optimize_nops()")
Reported-by: Richard Narron <richard@aaazen.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213301
Link: https://lkml.kernel.org/r/20210601212125.17145-1-bp@alien8.de
2021-06-03 16:33:09 +02:00
Thomas Gleixner
9bfecd0583 x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
While digesting the XSAVE-related horrors which got introduced with
the supervisor/user split, the recent addition of ENQCMD-related
functionality got on the radar and turned out to be similarly broken.

update_pasid(), which is only required when X86_FEATURE_ENQCMD is
available, is invoked from two places:

 1) From switch_to() for the incoming task

 2) Via a SMP function call from the IOMMU/SMV code

#1 is half-ways correct as it hacks around the brokenness of get_xsave_addr()
   by enforcing the state to be 'present', but all the conditionals in that
   code are completely pointless for that.

   Also the invocation is just useless overhead because at that point
   it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task
   and all of this can be handled at return to user space.

#2 is broken beyond repair. The comment in the code claims that it is safe
   to invoke this in an IPI, but that's just wishful thinking.

   FPU state of a running task is protected by fregs_lock() which is
   nothing else than a local_bh_disable(). As BH-disabled regions run
   usually with interrupts enabled the IPI can hit a code section which
   modifies FPU state and there is absolutely no guarantee that any of the
   assumptions which are made for the IPI case is true.

   Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is
   invoked with a NULL pointer argument, so it can hit a completely
   unrelated task and unconditionally force an update for nothing.
   Worse, it can hit a kernel thread which operates on a user space
   address space and set a random PASID for it.

The offending commit does not cleanly revert, but it's sufficient to
force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid()
code to make this dysfunctional all over the place. Anything more
complex would require more surgery and none of the related functions
outside of the x86 core code are blatantly wrong, so removing those
would be overkill.

As nothing enables the PASID bit in the IA32_XSS MSR yet, which is
required to make this actually work, this cannot result in a regression
except for related out of tree train-wrecks, but they are broken already
today.

Fixes: 20f0afd1fb ("x86/mmu: Allocate/free a PASID")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.de
2021-06-03 16:33:09 +02:00
Naveen N. Rao
2e38eb04c9 kprobes: Do not increment probe miss count in the fault handler
Kprobes has a counter 'nmissed', that is used to count the number of
times a probe handler was not called. This generally happens when we hit
a kprobe while handling another kprobe.

However, if one of the probe handlers causes a fault, we are currently
incrementing 'nmissed'. The comment in fault handler indicates that this
can be used to account faults taken by the probe handlers. But, this has
never been the intention as is evident from the comment above 'nmissed'
in 'struct kprobe':

	/*count the number of times this probe was temporarily disarmed */
	unsigned long nmissed;

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210601120150.672652-1-naveen.n.rao@linux.vnet.ibm.com
2021-06-03 15:47:26 +02:00
Borislav Petkov
7ee0e638a5 x86/alternative: Align insn bytes vertically
For easier inspection which bytes have changed.

For example:

  feat: 7*32+12, old: (__x86_indirect_thunk_r10+0x0/0x20 (ffffffff81c02480) len: 17), repl: (ffffffff897813aa, len: 17)
  ffffffff81c02480:   old_insn: 41 ff e2 90 90 90 90 90 90 90 90 90 90 90 90 90 90
  ffffffff897813aa:   rpl_insn: e8 07 00 00 00 f3 90 0f ae e8 eb f9 4c 89 14 24 c3
  ffffffff81c02480: final_insn: e8 07 00 00 00 f3 90 0f ae e8 eb f9 4c 89 14 24 c3

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210601193713.16190-1-bp@alien8.de
2021-06-03 14:24:44 +02:00
Praveen Kumar
450605c28d x86/hyperv: fix logical processor creation
Microsoft Hypervisor expects the logical processor index to be the same
as CPU's index during logical processor creation. Using cpu_physical_id
confuses hypervisor's scheduler. That causes the root partition not boot
when core scheduler is used.

This patch removes the call to cpu_physical_id and uses the CPU index
directly for bringing up logical processor. This scheme works for both
classic scheduler and core scheduler.

Fixes: 333abaf5ab (x86/hyperv: implement and use hv_smp_prepare_cpus)
Signed-off-by: Praveen Kumar <kumarpraveen@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210531074046.113452-1-kumarpraveen@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-06-02 15:13:20 +00:00
Andrew Cooper
cbcddaa33d perf/x86/rapl: Use CPUID bit on AMD and Hygon parts
AMD and Hygon CPUs have a CPUID bit for RAPL.  Drop the fam17h suffix as
it is stale already.

Make use of this instead of a model check to work more nicely in virtual
environments where RAPL typically isn't available.

 [ bp: drop the ../cpu/powerflags.c hunk which is superfluous as the
   "rapl" bit name appears already in flags. ]

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210514135920.16093-1-andrew.cooper3@citrix.com
2021-06-01 21:10:33 +02:00
Peter Zijlstra
ec6aba3d2b kprobes: Remove kprobe::fault_handler
The reason for kprobe::fault_handler(), as given by their comment:

 * We come here because instructions in the pre/post
 * handler caused the page_fault, this could happen
 * if handler tries to access user space by
 * copy_from_user(), get_user() etc. Let the
 * user-specified handler try to fix it first.

Is just plain bad. Those other handlers are ran from non-preemptible
context and had better use _nofault() functions. Also, there is no
upstream usage of this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org
2021-06-01 16:00:08 +02:00
Borislav Petkov
9a90ed065a x86/thermal: Fix LVT thermal setup for SMI delivery mode
There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):

  [    0.033858] read lvtthmr: 0x330, val: 0x200

which means:

 - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
 - bits [10:8] have 010b which means SMI delivery mode.

Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:

  setup_local_APIC:

  	...

	/*
	 * If this comes from kexec/kcrash the APIC might be enabled in
	 * SPIV. Soft disable it before doing further initialization.
	 */
	value = apic_read(APIC_SPIV);
	value &= ~APIC_SPIV_APIC_ENABLED;
	apic_write(APIC_SPIV, value);

which means (from the SDM):

"10.4.7.2 Local APIC State After It Has Been Software Disabled

...

* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."

And this happens too:

  [    0.124111] APIC: Switch to symmetric I/O mode setup
  [    0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
  [    0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0

This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't doesn't fire at all,
leading to CPU 0 stuck in SMM forever...

Now, before

  4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")

due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.

But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.

Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.

Fixes: 4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney <james@nurealm.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
2021-05-31 22:32:26 +02:00
Pu Wen
280b68a3b3 x86/cstate: Allow ACPI C1 FFH MWAIT use on Hygon systems
Hygon systems support the MONITOR/MWAIT instructions and these can be
used for ACPI C1 in the same way as on AMD and Intel systems.

The BIOS declares a C1 state in _CST to use FFH and CPUID_Fn00000005_EDX
is non-zero on Hygon systems.

Allow ffh_cstate_init() to succeed on Hygon systems to default using FFH
MWAIT instead of HALT for ACPI C1.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210528081417.31474-1-puwen@hygon.cn
2021-05-31 10:47:04 +02:00
Thomas Gleixner
7d65f9e806 x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
PIC interrupts do not support affinity setting and they can end up on
any online CPU. Therefore, it's required to mark the associated vectors
as system-wide reserved. Otherwise, the corresponding irq descriptors
are copied to the secondary CPUs but the vectors are not marked as
assigned or reserved. This works correctly for the IO/APIC case.

When the IO/APIC is disabled via config, kernel command line or lack of
enumeration then all legacy interrupts are routed through the PIC, but
nothing marks them as system-wide reserved vectors.

As a consequence, a subsequent allocation on a secondary CPU can result in
allocating one of these vectors, which triggers the BUG() in
apic_update_vector() because the interrupt descriptor slot is not empty.

Imran tried to work around that by marking those interrupts as allocated
when a CPU comes online. But that's wrong in case that the IO/APIC is
available and one of the legacy interrupts, e.g. IRQ0, has been switched to
PIC mode because then marking them as allocated will fail as they are
already marked as system vectors.

Stay consistent and update the legacy vectors after attempting IO/APIC
initialization and mark them as system vectors in case that no IO/APIC is
available.

Fixes: 69cde0004a ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Imran Khan <imran.f.khan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com
2021-05-29 11:41:14 +02:00
Tony Luck
40cd0aae59 x86/mce: Include a MCi_MISC value in faked mce logs
When BIOS reports memory errors to Linux using the ACPI/APEI
error reporting method Linux creates a "struct mce" to pass
to the normal reporting code path.

The constructed record doesn't include a value for the "misc"
field of the structure, and so mce_usable_address() says this
record doesn't include a valid address.

Net result is that functions like uc_decode_notifier() will
just ignore this record instead of taking action to offline
a page.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210527222846.931851-1-tony.luck@intel.com
2021-05-28 16:57:16 +02:00
Muralidhara M K
94a311ce24 x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
Add the (HWID, MCATYPE) tuples and names for new SMCA bank types.

Also, add their respective error descriptions to the MCE decoding module
edac_mce_amd. Also while at it, optimize the string names for some SMCA
banks.

 [ bp: Drop repeated comments, explain why UMC_V2 is a separate entry. ]

Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi  <nchatrad@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lkml.kernel.org/r/20210526164601.66228-1-nchatrad@amd.com
2021-05-27 20:08:14 +02:00
Daniel Vetter
5522e9f7b0 Linux 5.13-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmCqzFgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGIgQH/3nAV/fYbUCubEQe
 RXUcjMGznIpdHeMiY/hPezObYnpBI3UAi2JwHCvQfoE8ckbx4tq8Xp+TUWebsdaf
 zpDhKXDj2jHha1f5AixHCn1UFxiqOSn3d2muY2Bh1Nhg7iJuzU8xjIMCcOdss+fp
 8e4wqidOHkpWvGJ96CQ5zCNxeXI+/f7VX2IgdJ+RCDwzbqJlIvvXwAkg1KrguUEz
 EPmhpODqjPbVVc/mhtguMLMWl78WKCTBOSHCcYBolatXfm2ojsnX1hXprypWY4Mg
 vKXxF/91AS8InCC08Jw+puz+fXDBx1jtNmFFhDOFTyz/TvwPaKZiWbAeXOZFJA2Z
 Wm4su7g=
 =cqxg
 -----END PGP SIGNATURE-----

Merge v5.13-rc3 into drm-next

drm/i915 is extremely on fire without the below revert from -rc3:

commit 293837b9ac
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed May 19 05:55:57 2021 -1000

    Revert "i915: fix remap_io_sg to verify the pgprot"

Backmerge so we don't have a too wide bisect window for anything
that's a more involved workload than booting the driver.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-05-27 13:07:47 +02:00
Linus Torvalds
7de7ac8d60 - Fix how SEV handles MMIO accesses by forwarding potential page faults instead
of killing the machine and by using the accessors with the exact functionality
 needed when accessing memory.
 
 - Fix a confusion with Clang LTO compiler switches passed to the it
 
 - Handle the case gracefully when VMGEXIT has been executed in userspace
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCqKdwACgkQEsHwGGHe
 VUrnfBAAitJ9ytn5PzrLhg9cKt+BRVg8QQExWUYqOrSDXHus5+X/21YKey7BBhIj
 rMJSHi7qytO5rrfj5nw3dIH30hnat8nn5GWcNMG0hi1ptep+GP0xMG1nGw7INJDW
 85FpQI9jpO+vz0AcoZYAtSOWbwonVqbhjdHGzDhIi2e0Qt+1uKbjsT+iPxANBpyB
 fyEU3biPyWfKY4JSr1n0EHBywR329IW5I+yZInb2SBEU42V4vDBGFCXgdS8eFGo5
 KPz/bikERC/gZuDIRXDP6riKIpy1yCO1JZb0EgukwDddbzNz/ox7dX9JL+dEeRzl
 0zr28cJSoZgYQjdi3LU412CMVa8eYw7Ca0/mbhADdZK6Wd7xUNEiUR7FFoBA2Jxp
 +oYzYe4KvlsaFQyPrt8mfJDA36r+FZcqr3WJF+LYmPbRi+cbNDbKSoeDqShAh+Fq
 uUVNloWiOltsRuCS5/du8qzhmJLdIH1uFqtYK37PGLzAHz+KJ9SAdLWaYaLx4GFd
 rrFuCnk5DmoDf3I5lQvIzIEmYysEQOloGgDR6dDaPFRymOgor7BsCdR+dtxVQ6P6
 SMSUzyJLq4tC4dzT5PxWfZDlO+wIxu5QAOhu95oWIdZbsaoABZYCuLf7T7XQr9PA
 DLil4v4i7/FGpDBh+2s3V5hTXHKATuI7SGXnMNfx1eLurChg07k=
 =51BK
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Fix how SEV handles MMIO accesses by forwarding potential page faults
   instead of killing the machine and by using the accessors with the
   exact functionality needed when accessing memory.

 - Fix a confusion with Clang LTO compiler switches passed to the it

 - Handle the case gracefully when VMGEXIT has been executed in
   userspace

* tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev-es: Use __put_user()/__get_user() for data accesses
  x86/sev-es: Forward page-faults which happen during emulation
  x86/sev-es: Don't return NULL from sev_es_get_ghcb()
  x86/build: Fix location of '-plugin-opt=' flags
  x86/sev-es: Invalidate the GHCB after completing VMGEXIT
  x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
2021-05-23 06:12:25 -10:00
Linus Torvalds
a0e31f3a38 Merge branch 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo fix from Eric Biederman:
 "During the merge window an issue with si_perf and the siginfo ABI came
  up. The alpha and sparc siginfo structure layout had changed with the
  addition of SIGTRAP TRAP_PERF and the new field si_perf.

  The reason only alpha and sparc were affected is that they are the
  only architectures that use si_trapno.

  Looking deeper it was discovered that si_trapno is used for only a few
  select signals on alpha and sparc, and that none of the other
  _sigfault fields past si_addr are used at all. Which means technically
  no regression on alpha and sparc.

  While the alignment concerns might be dismissed the abuse of si_errno
  by SIGTRAP TRAP_PERF does have the potential to cause regressions in
  existing userspace.

  While we still have time before userspace starts using and depending
  on the new definition siginfo for SIGTRAP TRAP_PERF this set of
  changes cleans up siginfo_t.

   - The si_trapno field is demoted from magic alpha and sparc status
     and made an ordinary union member of the _sigfault member of
     siginfo_t. Without moving it of course.

   - si_perf is replaced with si_perf_data and si_perf_type ending the
     abuse of si_errno.

   - Unnecessary additions to signalfd_siginfo are removed"

* 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo
  signal: Deliver all of the siginfo perf data in _perf
  signal: Factor force_sig_perf out of perf_sigtrap
  signal: Implement SIL_FAULT_TRAPNO
  siginfo: Move si_trapno inside the union inside _si_fault
2021-05-21 06:12:52 -10:00
H. Peter Anvin (Intel)
056c52f5e8 x86/kexec: Set_[gi]dt() -> native_[gi]dt_invalidate() in machine_kexec_*.c
These files contain private set_gdt() functions which are only used to
invalid the gdt; machine_kexec_64.c also contains a set_idt()
function to invalidate the idt.

phys_to_virt(0) *really* doesn't make any sense for creating an
invalid GDT. A NULL pointer (virtual 0) makes a lot more sense;
although neither will allow any actual memory reference, a NULL
pointer stands out more.

Replace these calls with native_[gi]dt_invalidate().

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210519212154.511983-7-hpa@zytor.com
2021-05-21 12:36:45 +02:00
H. Peter Anvin (Intel)
8ec9069a43 x86/idt: Remove address argument from idt_invalidate()
There is no reason to specify any specific address to idt_invalidate(). It
looks mostly like an artifact of unifying code done differently by
accident. The most "sensible" address to set here is a NULL pointer -
virtual address zero, just as a visual marker.

This also makes it possible to mark the struct desc_ptr in idt_invalidate()
as static const.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210519212154.511983-5-hpa@zytor.com
2021-05-21 12:36:45 +02:00
David Bartley
2ade8fc650 x86/amd_nb: Add AMD family 19h model 50h PCI ids
This is required to support Zen3 APUs in k10temp.

Signed-off-by: David Bartley <andareed@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Wei Huang <wei.huang2@amd.com>
Link: https://lkml.kernel.org/r/20210520174130.94954-1-andareed@gmail.com
2021-05-21 12:01:38 +02:00
Dave Airlie
2ba0478550 Core Changes:
- drm: Rename DP_PSR_SELECTIVE_UPDATE to better mach eDP spec (Jose).
 
 Driver Changes:
 
 - Display plane clock rates fixes and improvements (Ville).
 - Uninint DMC FW loader state during shutdown (Imre).
 - Convert snprintf to sysfs_emit (Xuezhi).
 - Fix invalid access to ACPI _DSM objects (Takashi).
 - A big refactor around how i915 addresses the graphics
   and display IP versions. (Matt, Lucas).
 - Backlight fix (Lyude).
 - Display watermark and DBUF fixes (Ville).
 - HDCP fix (Anshuman).
 - Improve cases where display is not available (Jose).
 - Defeature PSR2 for RKL and ALD-S (Jose).
 - VLV DSI panel power fixes and improvements (Hans).
 - display-12 workaround (Jose).
 - Fix modesetting (Imre).
 - Drop redundant address-of op before lttpr_common_caps array (Imre).
 - Fix compiler checks (Jose, Jason).
 - GLK display fixes (Ville).
 - Fix error code returns (Dan).
 - eDP novel: back again to slow and wide link training everywhere (Kai-Heng).
 - Abstract DMC FW path (Rodrigo).
 - Preparation and changes for upcoming
   XeLPD display IP (Jose, Matt, Ville, Juha-Pekka, Animesh).
 - Fix comment typo in DSI code (zuoqilin).
 - Simplify CCS and UV plane alignment handling (Imre).
 - PSR Fixes on TGL (Gwan-gyeong, Jose).
 - Add intel_dp_hdcp.h and rename init (Jani).
 - Move crtc and dpll declarations around (Jani).
 - Fix pre-skl DP AUX precharge length (Ville).
 - Remove stray newlines from random files (Ville).
 - crtc->index and intel_crtc+drm_crtc pointer clean-up (Ville).
 - Add frontbuffer tracking tracepoints (Ville).
 - ADL-S PCI ID updates (Anand).
 - Use unique backlight device names (Jani).
 - A few clean-ups on i915/audio (Jani).
 - Use intel_framebuffer instead of drm one on intel_fb functions (Imre).
 - Add the missing MC CCS/XYUV8888 format support on display >= 12 (Imre).
 - Nuke display error state (Ville).
 - ADL-P initial enablement patches
   starting to land (Clint, Imre, Jose, Umesh, Vandita, Mika).
 - Display clean-up around VBT and the strap bits (Lucas).
 - Try YCbCr420 color when RGB fails (Werner).
 - More PSR fixes and improvements (Jose).
 - Other generic display code clean-up (Jose, Ville).
 - Use correct downstream caps for check Src-Ctl mode for PCON (Ankit).
 - Disable HiZ Raw Stall Optimization on broken gen7 (Simon).
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmClYcoACgkQ+mJfZA7r
 E8oXBwf/Rfb8o/4WZeoc3vxtFlWenA/9QJA2Xs4ui6U3vJScpaHFLq5Ki6aOSxIO
 WudQvatS1Bw+QzzAjSZFZx+WhCwop4BLhFJJxVK2RD4REeSjJvPZ6oovgndMOGY4
 RvyeXoIJoXoHPQ7uJXMZZGRthYTWR83Aw93hi3uTd4jU+JB8WtHgvvycKTVKIkVB
 T6V3PSuTmXwhHNURfev8d/JyiZMphRDJLD3esamwn2XRYtPDZjfkavwYQVeUlbms
 TstymTGZXjNvPnX9HkzoURdF4F394iNyx3lX1j5nyYm0QgyHJKJI8moy8Dfv4+AB
 JlL5vE7cTKtnKC5OUPCh9NZRH4pNZw==
 =uO7R
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-next-2021-05-19-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

Core Changes:

- drm: Rename DP_PSR_SELECTIVE_UPDATE to better mach eDP spec (Jose).

Driver Changes:

- Display plane clock rates fixes and improvements (Ville).
- Uninint DMC FW loader state during shutdown (Imre).
- Convert snprintf to sysfs_emit (Xuezhi).
- Fix invalid access to ACPI _DSM objects (Takashi).
- A big refactor around how i915 addresses the graphics
  and display IP versions. (Matt, Lucas).
- Backlight fix (Lyude).
- Display watermark and DBUF fixes (Ville).
- HDCP fix (Anshuman).
- Improve cases where display is not available (Jose).
- Defeature PSR2 for RKL and ALD-S (Jose).
- VLV DSI panel power fixes and improvements (Hans).
- display-12 workaround (Jose).
- Fix modesetting (Imre).
- Drop redundant address-of op before lttpr_common_caps array (Imre).
- Fix compiler checks (Jose, Jason).
- GLK display fixes (Ville).
- Fix error code returns (Dan).
- eDP novel: back again to slow and wide link training everywhere (Kai-Heng).
- Abstract DMC FW path (Rodrigo).
- Preparation and changes for upcoming
  XeLPD display IP (Jose, Matt, Ville, Juha-Pekka, Animesh).
- Fix comment typo in DSI code (zuoqilin).
- Simplify CCS and UV plane alignment handling (Imre).
- PSR Fixes on TGL (Gwan-gyeong, Jose).
- Add intel_dp_hdcp.h and rename init (Jani).
- Move crtc and dpll declarations around (Jani).
- Fix pre-skl DP AUX precharge length (Ville).
- Remove stray newlines from random files (Ville).
- crtc->index and intel_crtc+drm_crtc pointer clean-up (Ville).
- Add frontbuffer tracking tracepoints (Ville).
- ADL-S PCI ID updates (Anand).
- Use unique backlight device names (Jani).
- A few clean-ups on i915/audio (Jani).
- Use intel_framebuffer instead of drm one on intel_fb functions (Imre).
- Add the missing MC CCS/XYUV8888 format support on display >= 12 (Imre).
- Nuke display error state (Ville).
- ADL-P initial enablement patches
  starting to land (Clint, Imre, Jose, Umesh, Vandita, Mika).
- Display clean-up around VBT and the strap bits (Lucas).
- Try YCbCr420 color when RGB fails (Werner).
- More PSR fixes and improvements (Jose).
- Other generic display code clean-up (Jose, Ville).
- Use correct downstream caps for check Src-Ctl mode for PCON (Ankit).
- Disable HiZ Raw Stall Optimization on broken gen7 (Simon).

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YKVioeu0JkUAlR7y@intel.com
2021-05-21 08:55:23 +10:00
Joerg Roedel
4954f5b8ef x86/sev-es: Use __put_user()/__get_user() for data accesses
The put_user() and get_user() functions do checks on the address which is
passed to them. They check whether the address is actually a user-space
address and whether its fine to access it. They also call might_fault()
to indicate that they could fault and possibly sleep.

All of these checks are neither wanted nor needed in the #VC exception
handler, which can be invoked from almost any context and also for MMIO
instructions from kernel space on kernel memory. All the #VC handler
wants to know is whether a fault happened when the access was tried.

This is provided by __put_user()/__get_user(), which just do the access
no matter what. Also add comments explaining why __get_user() and
__put_user() are the best choice here and why it is safe to use them
in this context. Also explain why copy_to/from_user can't be used.

In addition, also revert commit

  7024f60d65 ("x86/sev-es: Handle string port IO to kernel memory properly")

because using __get_user()/__put_user() fixes the same problem while
the above commit introduced several problems:

  1) It uses access_ok() which is only allowed in task context.

  2) It uses memcpy() which has no fault handling at all and is
     thus unsafe to use here.

  [ bp: Fix up commit ID of the reverted commit above. ]

Fixes: f980f9c31a ("x86/sev-es: Compile early handler code into kernel image")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210519135251.30093-4-joro@8bytes.org
2021-05-19 18:45:37 +02:00
Joerg Roedel
c25bbdb564 x86/sev-es: Forward page-faults which happen during emulation
When emulating guest instructions for MMIO or IOIO accesses, the #VC
handler might get a page-fault and will not be able to complete. Forward
the page-fault in this case to the correct handler instead of killing
the machine.

Fixes: 0786138c78 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210519135251.30093-3-joro@8bytes.org
2021-05-19 17:13:04 +02:00
Joerg Roedel
b250f2f779 x86/sev-es: Don't return NULL from sev_es_get_ghcb()
sev_es_get_ghcb() is called from several places but only one of them
checks the return value. The reaction to returning NULL is always the
same: calling panic() and kill the machine.

Instead of adding checks to all call sites, move the panic() into the
function itself so that it will no longer return NULL.

Fixes: 0786138c78 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210519135251.30093-2-joro@8bytes.org
2021-05-19 17:05:13 +02:00
Chang S. Bae
2beb4a53fc x86/signal: Detect and prevent an alternate signal stack overflow
The kernel pushes context on to the userspace stack to prepare for the
user's signal handler. When the user has supplied an alternate signal
stack, via sigaltstack(2), it is easy for the kernel to verify that the
stack size is sufficient for the current hardware context.

Check if writing the hardware context to the alternate stack will exceed
it's size. If yes, then instead of corrupting user-data and proceeding with
the original signal handler, an immediate SIGSEGV signal is delivered.

Refactor the stack pointer check code from on_sig_stack() and use the new
helper.

While the kernel allows new source code to discover and use a sufficient
alternate signal stack size, this check is still necessary to protect
binaries with insufficient alternate signal stack size from data
corruption.

Fixes: c2bc11f10a ("x86, AVX-512: Enable AVX-512 States Context Switch")
Reported-by: Florian Weimer <fweimer@redhat.com>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Len Brown <len.brown@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20210518200320.17239-6-chang.seok.bae@intel.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=153531
2021-05-19 12:40:30 +02:00
Chang S. Bae
1c33bb0507 x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZ
Historically, signal.h defines MINSIGSTKSZ (2KB) and SIGSTKSZ (8KB), for
use by all architectures with sigaltstack(2). Over time, the hardware state
size grew, but these constants did not evolve. Today, literal use of these
constants on several architectures may result in signal stack overflow, and
thus user data corruption.

A few years ago, the ARM team addressed this issue by establishing
getauxval(AT_MINSIGSTKSZ). This enables the kernel to supply a value
at runtime that is an appropriate replacement on current and future
hardware.

Add getauxval(AT_MINSIGSTKSZ) support to x86, analogous to the support
added for ARM in

  94b07c1f8c ("arm64: signal: Report signal frame size to userspace via auxv").

Also, include a documentation to describe x86-specific auxiliary vectors.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Len Brown <len.brown@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20210518200320.17239-4-chang.seok.bae@intel.com
2021-05-19 12:18:45 +02:00
Chang S. Bae
939ef71329 x86/signal: Introduce helpers to get the maximum signal frame size
Signal frames do not have a fixed format and can vary in size when a number
of things change: supported XSAVE features, 32 vs. 64-bit apps, etc.

Add support for a runtime method for userspace to dynamically discover
how large a signal stack needs to be.

Introduce a new variable, max_frame_size, and helper functions for the
calculation to be used in a new user interface. Set max_frame_size to a
system-wide worst-case value, instead of storing multiple app-specific
values.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Len Brown <len.brown@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H.J. Lu <hjl.tools@gmail.com>
Link: https://lkml.kernel.org/r/20210518200320.17239-3-chang.seok.bae@intel.com
2021-05-19 11:46:27 +02:00