1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

19645 commits

Author SHA1 Message Date
Thomas Gleixner
e95256335d x86/cpu: Move cpu_core_id into topology info
Rename it to core_id and stick it to the other ID fields.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.566519388@linutronix.de
2023-10-10 14:38:17 +02:00
Thomas Gleixner
8a169ed40f x86/cpu: Move cpu_die_id into topology info
Move the next member.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.388185134@linutronix.de
2023-10-10 14:38:17 +02:00
Thomas Gleixner
02fb601d27 x86/cpu: Move phys_proc_id into topology info
Rename it to pkg_id which is the terminology used in the kernel.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.329006989@linutronix.de
2023-10-10 14:38:17 +02:00
Thomas Gleixner
b9655e702d x86/cpu: Encapsulate topology information in cpuinfo_x86
The topology related information is randomly scattered across cpuinfo_x86.

Create a new structure cpuinfo_topo and move in a first step initial_apicid
and apicid into it.

Aside of being better readable this is in preparation for replacing the
horribly fragile CPU topology evaluation code further down the road.

Consolidate APIC ID fields to u32 as that represents the hardware type.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.269787744@linutronix.de
2023-10-10 14:38:17 +02:00
Thomas Gleixner
965e05ff8a x86/apic: Fake primary thread mask for XEN/PV
The SMT control mechanism got added as speculation attack vector
mitigation. The implemented logic relies on the primary thread mask to
be set up properly.

This turns out to be an issue with XEN/PV guests because their CPU hotplug
mechanics do not enumerate APICs and therefore the mask is never correctly
populated.

This went unnoticed so far because by chance XEN/PV ends up with
smp_num_siblings == 2. So cpu_smt_control stays at its default value
CPU_SMT_ENABLED and the primary thread mask is never evaluated in the
context of CPU hotplug.

This stopped "working" with the upcoming overhaul of the topology
evaluation which legitimately provides a fake topology for XEN/PV. That
sets smp_num_siblings to 1, which causes the core CPU hot-plug core to
refuse to bring up the APs.

This happens because cpu_smt_control is set to CPU_SMT_NOT_SUPPORTED which
causes cpu_bootable() to evaluate the unpopulated primary thread mask with
the conclusion that all non-boot CPUs are not valid to be plugged.

The core code has already been made more robust against this kind of fail,
but the primary thread mask really wants to be populated to avoid other
issues all over the place.

Just fake the mask by pretending that all XEN/PV vCPUs are primary threads,
which is consistent because all of XEN/PVs topology is fake or non-existent.

Fixes: 6a4d2657e0 ("x86/smp: Provide topology_is_primary_thread()")
Fixes: f54d4434c2 ("x86/apic: Provide cpu_primary_thread mask")
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.210011520@linutronix.de
2023-10-10 14:38:17 +02:00
Pu Wen
ee545b94d3 x86/cpu/hygon: Fix the CPU topology evaluation for real
Hygon processors with a model ID > 3 have CPUID leaf 0xB correctly
populated and don't need the fixed package ID shift workaround. The fixup
is also incorrect when running in a guest.

Fixes: e0ceeae708 ("x86/CPU/hygon: Fix phys_proc_id calculation logic for multi-die processors")
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/tencent_594804A808BD93A4EBF50A994F228E3A7F07@qq.com
Link: https://lore.kernel.org/r/20230814085112.089607918@linutronix.de
2023-10-10 14:38:16 +02:00
Joerg Roedel
b9cb9c4558 x86/sev: Check IOBM for IOIO exceptions from user-space
Check the IO permission bitmap (if present) before emulating IOIO #VC
exceptions for user-space. These permissions are checked by hardware
already before the #VC is raised, but due to the VC-handler decoding
race it needs to be checked again in software.

Fixes: 25189d08e5 ("x86/sev-es: Add support for handling IOIO exceptions")
Reported-by: Tom Dohrmann <erbse.13@gmx.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tom Dohrmann <erbse.13@gmx.de>
Cc: <stable@kernel.org>
2023-10-09 15:47:57 +02:00
Borislav Petkov (AMD)
a37cd2a59d x86/sev: Disable MMIO emulation from user mode
A virt scenario can be constructed where MMIO memory can be user memory.
When that happens, a race condition opens between when the hardware
raises the #VC and when the #VC handler gets to emulate the instruction.

If the MOVS is replaced with a MOVS accessing kernel memory in that
small race window, then write to kernel memory happens as the access
checks are not done at emulation time.

Disable MMIO emulation in user mode temporarily until a sensible use
case appears and justifies properly handling the race window.

Fixes: 0118b604c2 ("x86/sev-es: Handle MMIO String Instructions")
Reported-by: Tom Dohrmann <erbse.13@gmx.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Tom Dohrmann <erbse.13@gmx.de>
Cc: <stable@kernel.org>
2023-10-09 15:45:34 +02:00
Randy Dunlap
025d5ac978 x86/resctrl: Fix kernel-doc warnings
The kernel test robot reported kernel-doc warnings here:

  monitor.c:34: warning: Cannot understand  * @rmid_free_lru    A least recently used list of free RMIDs on line 34 - I thought it was a doc line
  monitor.c:41: warning: Cannot understand  * @rmid_limbo_count     count of currently unused but (potentially) on line 41 - I thought it was a doc line
  monitor.c:50: warning: Cannot understand  * @rmid_entry - The entry in the limbo and free lists.  on line 50 - I thought it was a doc line

We don't have a syntax for documenting individual data items via
kernel-doc, so remove the "/**" kernel-doc markers and add a hyphen
for consistency.

Fixes: 6a445edce6 ("x86/intel_rdt/cqm: Add RDT monitoring initialization")
Fixes: 24247aeeab ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231006235132.16227-1-rdunlap@infradead.org
2023-10-08 11:45:16 +02:00
Waiman Long
2743fe89d4 x86/idle: Disable IBRS when CPU is offline to improve single-threaded performance
Commit bf5835bcdb ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle. However, when a CPU
becomes offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS
is enabled. That will impact the performance of a sibling CPU. Mitigate
this performance impact by clearing all the mitigation bits in SPEC_CTRL
MSR when offline. When the CPU is online again, it will be re-initialized
and so restoring the SPEC_CTRL value isn't needed.

Add a comment to say that native_play_dead() is a __noreturn function,
but it can't be marked as such to avoid confusion about the missing
MSR restoration code.

When DPDK is running on an isolated CPU thread processing network packets
in user space while its sibling thread is idle. The performance of the
busy DPDK thread with IBRS on and off in the sibling idle thread are:

                                IBRS on         IBRS off
                                -------         --------
  packets/second:                  7.8M           10.4M
  avg tsc cycles/packet:         282.26          209.86

This is a 25% performance degradation. The test system is a Intel Xeon
4114 CPU @ 2.20GHz.

[ mingo: Extended the changelog with performance data from the 0/4 mail. ]

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20230727184600.26768-3-longman@redhat.com
2023-10-07 11:33:28 +02:00
Ingo Molnar
3fc18b06b8 Linux 6.6-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmUZ4WEeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGnIYH/07zef2U1nlqI+ro
 HRL2GlWGIs9yE70Oax+A3eYUYsjJIPu0yiDhFHUgOV3VyAALo44ZX/WNwKCGsI3e
 zhuOeItyyVcLGZXVC/jxSu0uveyfEiEYIWRYGyQ6Sna8Ksdk/qwhNgQNotdWdQG5
 7xt8z32couglu0uOkxcGqjTxmbjO6WSM5qi7Ts+xLsgrcS5cRuNhAg/vezp9bfeL
 1IUieCih4RJFgar/6LPOiB8uoVXEBonVbtlTRRqYdnqcsSIC+ACR9ZFk/+X88b5z
 S+Ta5VTcOAPu+2M/lSGe+PlUECvoBNK0SIYnaVCP2paPmDxfDXOFvSy/qJE87/7L
 9BeonFw=
 =8FTr
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-rc4' into x86/entry, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-05 10:05:51 +02:00
Frederic Weisbecker
448e9f34d9 rcu: Standardize explicit CPU-hotplug calls
rcu_report_dead() and rcutree_migrate_callbacks() have their headers in
rcupdate.h while those are pure rcutree calls, like the other CPU-hotplug
functions.

Also rcu_cpu_starting() and rcu_report_dead() have different naming
conventions while they mirror each other's effects.

Fix the headers and propose a naming that relates both functions and
aligns with the prefix of other rcutree CPU-hotplug functions.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-10-04 22:29:45 +02:00
Baoquan He
9c08a2a139 x86: kdump: use generic interface to simplify crashkernel reservation code
With the help of newly changed function parse_crashkernel() and generic
reserve_crashkernel_generic(), crashkernel reservation can be simplified
by steps:

1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN,
   CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and
   DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>;

2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
   reserve_crashkernel_generic(), and do the ARCH specific work if
   needed.

3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
   arch/x86/Kconfig.

When adding DEFAULT_CRASH_KERNEL_LOW_SIZE, add crash_low_size_default() to
calculate crashkernel low memory because x86_64 has special requirement.

The old reserve_crashkernel_low() and reserve_crashkernel() can be
removed.

[bhe@redhat.com: move crash_low_size_default() code into <asm/crash_core.h>]
  Link: https://lkml.kernel.org/r/ZQpeAjOmuMJBFw1/@MiWiFi-R3L-srv
Link: https://lkml.kernel.org/r/20230914033142.676708-7-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04 10:41:58 -07:00
Baoquan He
a9e1a3d84e crash_core: change the prototype of function parse_crashkernel()
Add two parameters 'low_size' and 'high' to function parse_crashkernel(),
later crashkernel=,high|low parsing will be added.  Make adjustments in
all call sites of parse_crashkernel() in arch.

Link: https://lkml.kernel.org/r/20230914033142.676708-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jiahao <chenjiahao16@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04 10:41:58 -07:00
Zhu Wang
90879f5dfc x86/fpu/xstate: Address kernel-doc warning
Fix kernel-doc warning:

  arch/x86/kernel/fpu/xstate.c:1753: warning: Excess function parameter 'tsk' description in 'fpu_xstate_prctl'

Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
2023-10-03 22:46:12 +02:00
Wang Jinchao
9f76d60626 x86/boot: Harmonize the style of array-type parameter for fixup_pointer() calls
The usage of '&' before the array parameter is redundant because '&array'
is equivalent to 'array'. Therefore, there is no need to include '&'
before the array parameter. In fact, using '&' can cause more confusion,
especially for individuals who are not familiar with the address-of
operation for arrays. They might mistakenly believe that one is different
from the other and spend additional time realizing that they are actually
the same.

Harmonizing the style by removing the unnecessary '&' would save time for
those individuals.

Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/ZMt24BGEX9IhPSY6@fedora
2023-10-03 11:28:38 +02:00
Yazen Ghannam
2a565258b3 x86/amd_nb: Use Family 19h Models 60h-7Fh Function 4 IDs
Three PCI IDs for DF Function 4 were defined but not used.

Add them to the "link" list.

Fixes: f8faf34966 ("x86/amd_nb: Add AMD PCI IDs for SMN communication")
Fixes: 23a5b8bb02 ("x86/amd_nb: Add PCI ID for family 19h model 78h")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803150430.3542854-1-yazen.ghannam@amd.com
2023-10-03 11:25:01 +02:00
Masahiro Yamada
94ea9c0521 x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>
The following commit:

  ddb5cdbafa ("kbuild: generate KSYMTAB entries by modpost")

deprecated <asm/export.h>, which is now a wrapper of <linux/export.h>.

Use <linux/export.h> in *.S as well as in *.c files.

After all the <asm/export.h> lines are replaced, <asm/export.h> and
<asm-generic/export.h> will be removed.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230806145958.380314-2-masahiroy@kernel.org
2023-10-03 10:38:07 +02:00
Yuntao Wang
001470fed5 x86/boot: Fix incorrect startup_gdt_descr.size
Since the size value is added to the base address to yield the last valid
byte address of the GDT, the current size value of startup_gdt_descr is
incorrect (too large by one), fix it.

[ mingo: This probably never mattered, because startup_gdt[] is only used
         in a very controlled fashion - but make it consistent nevertheless. ]

Fixes: 866b556efa ("x86/head/64: Install startup GDT")
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20230807084547.217390-1-ytcoode@gmail.com
2023-10-03 10:28:29 +02:00
Dave Hansen
3e32552652 x86/boot: Move x86_cache_alignment initialization to correct spot
c->x86_cache_alignment is initialized from c->x86_clflush_size.
However, commit fbf6449f84 moved c->x86_clflush_size initialization
to later in boot without moving the c->x86_cache_alignment assignment:

  fbf6449f84 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")

This presumably left c->x86_cache_alignment set to zero for longer
than it should be.

The result was an oops on 32-bit kernels while accessing a pointer
at 0x20.  The 0x20 came from accessing a structure member at offset
0x10 (buffer->cpumask) from a ZERO_SIZE_PTR=0x10.  kmalloc() can
evidently return ZERO_SIZE_PTR when it's given 0 as its alignment
requirement.

Move the c->x86_cache_alignment initialization to be after
c->x86_clflush_size has an actual value.

Fixes: fbf6449f84 ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20231002220045.1014760-1-dave.hansen@linux.intel.com
2023-10-03 09:27:12 +02:00
Saurabh Sengar
0d294c8c4e x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init()
Fetching the device tree configuration before initmem_init() is necessary
to allow the parsing of NUMA node information. However moving the entire
x86_dtb_init() call before initmem_init() is not correct as APIC/IO-APIC enumeration
has to be after initmem_init().

Thus, move the x86_flattree_get_config() call out of x86_dtb_init(),
into setup_arch(), to call it before initmem_init(), and
leave the ACPI/IOAPIC registration sequence as-is.

[ mingo: Updated the changelog for clarity. ]

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/1692949657-16446-1-git-send-email-ssengar@linux.microsoft.com
2023-10-02 21:30:09 +02:00
Tom Lendacky
62d5e970d0 x86/sev: Change npages to unsigned long in snp_accept_memory()
In snp_accept_memory(), the npages variables value is calculated from
phys_addr_t variables but is an unsigned int. A very large range passed
into snp_accept_memory() could lead to truncating npages to zero. This
doesn't happen at the moment but let's be prepared.

Fixes: 6c32117963 ("x86/sev: Add SNP-specific unaccepted memory support")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/6d511c25576494f682063c9fb6c705b526a3757e.1687441505.git.thomas.lendacky@amd.com
2023-10-02 14:55:41 +02:00
Tom Lendacky
6bc6f7d9d7 x86/sev: Use the GHCB protocol when available for SNP CPUID requests
SNP retrieves the majority of CPUID information from the SNP CPUID page.
But there are times when that information needs to be supplemented by the
hypervisor, for example, obtaining the initial APIC ID of the vCPU from
leaf 1.

The current implementation uses the MSR protocol to retrieve the data from
the hypervisor, even when a GHCB exists. The problem arises when an NMI
arrives on return from the VMGEXIT. The NMI will be immediately serviced
and may generate a #VC requiring communication with the hypervisor.

Since a GHCB exists in this case, it will be used. As part of using the
GHCB, the #VC handler will write the GHCB physical address into the GHCB
MSR and the #VC will be handled.

When the NMI completes, processing resumes at the site of the VMGEXIT
which is expecting to read the GHCB MSR and find a CPUID MSR protocol
response. Since the NMI handling overwrote the GHCB MSR response, the
guest will see an invalid reply from the hypervisor and self-terminate.

Fix this problem by using the GHCB when it is available. Any NMI
received is properly handled because the GHCB contents are copied into
a backup page and restored on NMI exit, thus preserving the active GHCB
request or result.

  [ bp: Touchups. ]

Fixes: ee0bfa08a3 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/a5856fa1ebe3879de91a8f6298b6bbd901c61881.1690578565.git.thomas.lendacky@amd.com
2023-10-02 14:55:39 +02:00
Baolin Liu
b5034c6385 x86/cpu/amd: Remove redundant 'break' statement
This break is after the return statement, so it is redundant & confusing,
and should be deleted.

Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/396ba14d.2726.189d957b74b.Coremail.liubaolin12138@163.com
2023-09-29 11:24:09 +02:00
Haitao Huang
c6c2adcba5 x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race
The SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC page for an
enclave and set secs.epc_page to NULL. The SECS page is used for EAUG
and ELDU in the SGX page fault handler. However, the NULL check for
secs.epc_page is only done for ELDU, not EAUG before being used.

Fix this by doing the same NULL check and reloading of the SECS page as
needed for both EAUG and ELDU.

The SECS page holds global enclave metadata. It can only be reclaimed
when there are no other enclave pages remaining. At that point,
virtually nothing can be done with the enclave until the SECS page is
paged back in.

An enclave can not run nor generate page faults without a resident SECS
page. But it is still possible for a #PF for a non-SECS page to race
with paging out the SECS page: when the last resident non-SECS page A
triggers a #PF in a non-resident page B, and then page A and the SECS
both are paged out before the #PF on B is handled.

Hitting this bug requires that race triggered with a #PF for EAUG.
Following is a trace when it happens.

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:sgx_encl_eaug_page+0xc7/0x210
Call Trace:
 ? __kmem_cache_alloc_node+0x16a/0x440
 ? xa_load+0x6e/0xa0
 sgx_vma_fault+0x119/0x230
 __do_fault+0x36/0x140
 do_fault+0x12f/0x400
 __handle_mm_fault+0x728/0x1110
 handle_mm_fault+0x105/0x310
 do_user_addr_fault+0x1ee/0x750
 ? __this_cpu_preempt_check+0x13/0x20
 exc_page_fault+0x76/0x180
 asm_exc_page_fault+0x27/0x30

Fixes: 5a90d2c3f5 ("x86/sgx: Support adding of pages to an initialized enclave")
Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20230728051024.33063-1-haitao.huang%40linux.intel.com
2023-09-28 16:16:40 -07:00
Adam Dunlap
fbf6449f84 x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach
Instead of setting x86_virt_bits to a possibly-correct value and then
correcting it later, do all the necessary checks before setting it.

At this point, the #VC handler references boot_cpu_data.x86_virt_bits,
and in the previous version, it would be triggered by the CPUIDs between
the point at which it is set to 48 and when it is set to the correct
value.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Adam Dunlap <acdunlap@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jacob Xu <jacobhxu@google.com>
Link: https://lore.kernel.org/r/20230912002703.3924521-3-acdunlap@google.com
2023-09-28 22:49:35 +02:00
Pu Wen
a5ef7d68ce x86/srso: Add SRSO mitigation for Hygon processors
Add mitigation for the speculative return stack overflow vulnerability
which exists on Hygon processors too.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/tencent_4A14812842F104E93AA722EC939483CEFF05@qq.com
2023-09-28 09:57:07 +02:00
Muralidhara M K
24775700ea x86/amd_nb: Add AMD Family MI300 PCI IDs
Add new Root, Device 18h Function 3, and Function 4 PCI IDS
for AMD F19h Model 90h-9fh (MI300A).

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Suma Hegde <suma.hegde@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20230926051932.193239-1-suma.hegde@amd.com
2023-09-27 09:53:23 +02:00
Hugh Dickins
f4c5ca9850 x86_64: Show CR4.PSE on auxiliaries like on BSP
Set CR4.PSE in secondary_startup_64: the Intel SDM is clear that it does
not matter whether it's 0 or 1 when 4-level-pts are enabled, but it's
distracting to find CR4 different on BSP and auxiliaries - on x86_64,
BSP alone got to add the PSE bit, in probe_page_size_mask().

Peter Zijlstra adds:

   "I think the point is that PSE bit is completely without
    meaning in long mode.

    But yes, having the same CR4 bits set across BSP and APs is
    definitely sane."

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/103ad03a-8c93-c3e2-4226-f79af4d9a074@google.com
2023-09-24 13:23:54 +02:00
Christophe JAILLET
94adf495e7 x86/kgdb: Fix a kerneldoc warning when build with W=1
When compiled with W=1, the following warning is generated:

  arch/x86/kernel/kgdb.c:698: warning: Cannot understand  *
   on line 698 - I thought it was a doc line

Remove the corresponding empty comment line to fix the warning.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/aad659537c1d4ebd86912a6f0be458676c8e69af.1695401178.git.christophe.jaillet@wanadoo.fr
2023-09-24 11:00:13 +02:00
Linus Torvalds
b61ec8d0f1 Fix the patching ordering between static calls and return thunks.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmUN5ZsACgkQEsHwGGHe
 VUrx/BAAwPv/CY946qbSyhl1AvpvjkWEi2O6KJfz0OUPCkL0qBfX9eLTJjDKiDoS
 qlqO+yT8qmapf4oZ1rKyVH7AIHmy0b2N9Ks3T0I6lcvZ9JP5ADtoeT6RdnS+ALQb
 l82HlkhmIBJLC6r2wnxlRPKGv+VrRXuk0uiqvFVhslHyAeslxp3kkK6pAASBqEuf
 k0Hc8+s/qh8lpXi/O898QJYNRL0Abfxd/Jfnn5JHedA/SB6XAPwN7s5Uk2mQL7s8
 nujV8msV7EgRz6Svjwph+3BaU3ul2bIsbm27R0GVEhFJ9eYbjsn8pr62/HRh2X+v
 8lgijBDlbB7fcZQBvIzyyP2UFNbM6QN7uFzZOYL92fl+JAqOQ5B6SQeyrLo65w7Y
 5zxPZxMPKkuzCyRYdZUJx0KPQS1cagtn/Q8D0fy61jvAufh6sxl9BXXYZimaI0rD
 uN6AlTTR8nTMM+9KlHUG1wSC3Yz9UYwk5GVKk7dhhorpJuVj5yj+hEVHsP8AXceJ
 WLxh/pUItIiPuYc45eEwNGo6x4rS3ARrXYQezXyAyyLv4tP2Zu7iJx+3NVkRFHIx
 kU4H0aC33Au0lvFkeO6g/qZ+u46sGf2VsuVPBiVreARY+CM5HRKA16soPIdh0Uzw
 anq8nMaRNL/myVeVAMdJi6mO9sxtmJwCzQWN7fh6fH0c+8/NmeY=
 =xLBC
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 rethunk fixes from Borislav Petkov:
 "Fix the patching ordering between static calls and return thunks"

* tag 'x86_urgent_for_v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86,static_call: Fix static-call vs return-thunk
  x86/alternatives: Remove faulty optimization
2023-09-22 12:35:56 -07:00
Linus Torvalds
e583bffeb8 Misc x86 fixes:
- Fix a kexec bug,
  - Fix an UML build bug,
  - Fix a handful of SRSO related bugs,
  - Fix a shadow stacks handling bug & robustify related code.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmUNbQIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jVIg/9EChW7qFTda8joR41Uayg07VIOpGirDLu
 7hjzOnt4Ni93cfFNUBkKDKXoWxGAiOD+cRDnT6+zsJAvAZR26Y3UNVLYlAy+lFKK
 9kRxeDM7nOEKqCC+zneinMFcIKmZRttMLpj8O901jB2S08x4UarnNx5SaWEcqYbn
 rf1XIEuEvlxqMfafNueS/TRadV52qVU8Y+2+inkDnM7dDNwt+rCs5tQ9ebJos8QO
 tsAoQes1G+0mjPrpqAgsIic5e3QCHliwVr8ASQrKZ9KR+fokEJRbSBNjgHUCNeVN
 0bHBhuDJBSniC7QmAQGEizrZWmHl2HxwYYnCE0gd0g24b7sDTwWBFSXWratCrPdX
 e4qYq36xonWdQcbpVF8ijMXF/R810vDyis/yc/BTeo5EBWf6aM/cx1dmS9DUxRpA
 QsIW2amSCPVYwYE5MAC+K/DM9nq1tk8YnKi4Mob3L28+W3CmVwSwT9S86z2QLlZu
 +KgVV4yBtJY1FJqVur5w3awhFtqLfBdfIvs6uQCd9VZXVPbBfS8+rOQmmhFixEDu
 FSPlAChmXYTAM2R+UxcEvw1Zckrtd2BCOa8UrY2lq57mduBK1EymdpfjlrUAChLG
 x7fQBOGNgOTLwYcsurIdS5jAqiudpnJ/KDG8ZAmKsVoW96JCPp9B3tVZMp9tT30C
 8HRwSPX4384=
 =58St
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Fix a kexec bug

 - Fix an UML build bug

 - Fix a handful of SRSO related bugs

 - Fix a shadow stacks handling bug & robustify related code

* tag 'x86-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/shstk: Add warning for shadow stack double unmap
  x86/shstk: Remove useless clone error handling
  x86/shstk: Handle vfork clone failure correctly
  x86/srso: Fix SBPB enablement for spec_rstack_overflow=off
  x86/srso: Don't probe microcode in a guest
  x86/srso: Set CPUID feature bits independently of bug or mitigation status
  x86/srso: Fix srso_show_state() side effect
  x86/asm: Fix build of UML with KASAN
  x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()
2023-09-22 12:26:42 -07:00
Peter Zijlstra
aee9d30b97 x86,static_call: Fix static-call vs return-thunk
Commit

  7825451fa4 ("static_call: Add call depth tracking support")

failed to realize the problem fixed there is not specific to call depth
tracking but applies to all return-thunk uses.

Move the fix to the appropriate place and condition.

Fixes: ee88d363d1 ("x86,static_call: Use alternative RET encoding")
Reported-by: David Kaplan <David.Kaplan@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
2023-09-22 18:58:24 +02:00
Josh Poimboeuf
4ba89dd6dd x86/alternatives: Remove faulty optimization
The following commit

  095b8303f3 ("x86/alternative: Make custom return thunk unconditional")

made '__x86_return_thunk' a placeholder value.  All code setting
X86_FEATURE_RETHUNK also changes the value of 'x86_return_thunk'.  So
the optimization at the beginning of apply_returns() is dead code.

Also, before the above-mentioned commit, the optimization actually had a
bug It bypassed __static_call_fixup(), causing some raw returns to
remain unpatched in static call trampolines.  Thus the 'Fixes' tag.

Fixes: d2408e043e ("x86/alternative: Optimize returns patching")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/16d19d2249d4485d8380fb215ffaae81e6b8119e.1693889988.git.jpoimboe@kernel.org
2023-09-22 18:52:39 +02:00
Paolo Bonzini
7deda2ce5b x86/cpu: Clear SVM feature if disabled by BIOS
When SVM is disabled by BIOS, one cannot use KVM but the
SVM feature is still shown in the output of /proc/cpuinfo.
On Intel machines, VMX is cleared by init_ia32_feat_ctl(),
so do the same on AMD and Hygon processors.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230921114940.957141-1-pbonzini@redhat.com
2023-09-22 10:55:26 +02:00
Yang Li
57baabe365 x86/platform/uv/apic: Clean up inconsistent indenting
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230816003842.116574-1-yang.lee@linux.alibaba.com
2023-09-21 10:22:13 +02:00
Colin Ian King
fef44ebaf6 x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()
The 'mid' pointer is being initialized with a value that is never read,
it is being re-assigned and used inside a for-loop. Remove the
redundant initialization.

Cleans up clang scan build warning:

  arch/x86/kernel/unwind_orc.c:88:7: warning: Value stored to 'mid' during its initialization is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20230920114141.118919-1-colin.i.king@gmail.com
2023-09-21 08:41:23 +02:00
Rick Edgecombe
509ff51ee6 x86/shstk: Add warning for shadow stack double unmap
There are several ways a thread's shadow stacks can get unmapped. This
can happen on exit or exec, as well as error handling in exec or clone.
The task struct already keeps track of the thread's shadow stack. Use the
size variable to keep track of if the shadow stack has already been freed.

When an attempt to double unmap the thread shadow stack is caught, warn
about it and abort the operation.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: H.J. Lu <hjl.tools@gmail.com>
Link: https://lore.kernel.org/all/20230908203655.543765-4-rick.p.edgecombe%40intel.com
2023-09-19 09:18:34 -07:00
Rick Edgecombe
748c90c693 x86/shstk: Remove useless clone error handling
When clone fails after the shadow stack is allocated, any allocated shadow
stack is cleaned up in exit_thread() in copy_process(). So the logic in
copy_thread() is unneeded, and also will not handle failures that happen
outside of copy_thread().

In addition, since there is a second attempt to unmap the same shadow
stack, there is a race where an newly mapped region could get unmapped.

So remove the logic in copy_thread() and rely on exit_thread() to handle
clone failure.

Fixes: b2926a36b9 ("x86/shstk: Handle thread shadow stack")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: H.J. Lu <hjl.tools@gmail.com>
Link: https://lore.kernel.org/all/20230908203655.543765-3-rick.p.edgecombe%40intel.com
2023-09-19 09:18:34 -07:00
Rick Edgecombe
331955600d x86/shstk: Handle vfork clone failure correctly
Shadow stacks are allocated automatically and freed on exit, depending
on the clone flags. The two cases where new shadow stacks are not
allocated are !CLONE_VM (fork()) and CLONE_VFORK (vfork()). For
!CLONE_VM, although a new stack is not allocated, it can be freed normally
because it will happen in the child's copy of the VM.

However, for CLONE_VFORK the parent and the child are actually using the
same shadow stack. So the kernel doesn't need to allocate *or* free a
shadow stack for a CLONE_VFORK child. CLONE_VFORK children already need
special tracking to avoid returning to userspace until the child exits or
execs. Shadow stack uses this same tracking to avoid freeing CLONE_VFORK
shadow stacks.

However, the tracking is not setup until the clone has succeeded
(internally). Which means, if a CLONE_VFORK fails, the existing logic will
not know it is a CLONE_VFORK and proceed to unmap the parents shadow stack.
This error handling cleanup logic runs via exit_thread() in the
bad_fork_cleanup_thread label in copy_process(). The issue was seen in
the glibc test "posix/tst-spawn3-pidfd" while running with shadow stack
using currently out-of-tree glibc patches.

Fix it by not unmapping the vfork shadow stack in the error case as well.
Since clone is implemented in core code, it is not ideal to pass the clone
flags along the error path in order to have shadow stack code have
symmetric logic in the freeing half of the thread shadow stack handling.

Instead use the existing state for thread shadow stacks to track whether
the thread is managing its own shadow stack. For CLONE_VFORK, simply set
shstk->base and shstk->size to 0, and have it mean the thread is not
managing a shadow stack and so should skip cleanup work. Implement this
by breaking up the CLONE_VFORK and !CLONE_VM cases in
shstk_alloc_thread_stack() to separate conditionals since, the logic is
now different between them. In the case of CLONE_VFORK && !CLONE_VM, the
existing behavior is to not clean up the shadow stack in the child (which
should go away quickly with either be exit or exec), so maintain that
behavior by handling the CLONE_VFORK case first in the allocation path.

This new logioc cleanly handles the case of normal, successful
CLONE_VFORK's skipping cleaning up their shadow stack's on exit as well.
So remove the existing, vfork shadow stack freeing logic. This is in
deactivate_mm() where vfork_done is used to tell if it is a vfork child
that can skip cleaning up the thread shadow stack.

Fixes: b2926a36b9 ("x86/shstk: Handle thread shadow stack")
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: H.J. Lu <hjl.tools@gmail.com>
Link: https://lore.kernel.org/all/20230908203655.543765-2-rick.p.edgecombe%40intel.com
2023-09-19 09:18:34 -07:00
Josh Poimboeuf
01b057b2f4 x86/srso: Fix SBPB enablement for spec_rstack_overflow=off
If the user has requested no SRSO mitigation, other mitigations can use
the lighter-weight SBPB instead of IBPB.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/b20820c3cfd1003171135ec8d762a0b957348497.1693889988.git.jpoimboe@kernel.org
2023-09-19 10:54:39 +02:00
Josh Poimboeuf
02428d0366 x86/srso: Don't probe microcode in a guest
To support live migration, the hypervisor sets the "lowest common
denominator" of features.  Probing the microcode isn't allowed because
any detected features might go away after a migration.

As Andy Cooper states:

  "Linux must not probe microcode when virtualised.  What it may see
  instantaneously on boot (owing to MSR_PRED_CMD being fully passed
  through) is not accurate for the lifetime of the VM."

Rely on the hypervisor to set the needed IBPB_BRTYPE and SBPB bits.

Fixes: 1b5277c0ea ("x86/srso: Add SRSO_NO support")
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/3938a7209606c045a3f50305d201d840e8c834c7.1693889988.git.jpoimboe@kernel.org
2023-09-19 10:54:23 +02:00
Josh Poimboeuf
91857ae203 x86/srso: Set CPUID feature bits independently of bug or mitigation status
Booting with mitigations=off incorrectly prevents the
X86_FEATURE_{IBPB_BRTYPE,SBPB} CPUID bits from getting set.

Also, future CPUs without X86_BUG_SRSO might still have IBPB with branch
type prediction flushing, in which case SBPB should be used instead of
IBPB.  The current code doesn't allow for that.

Also, cpu_has_ibpb_brtype_microcode() has some surprising side effects
and the setting of these feature bits really doesn't belong in the
mitigation code anyway.  Move it to earlier.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/869a1709abfe13b673bdd10c2f4332ca253a40bc.1693889988.git.jpoimboe@kernel.org
2023-09-19 10:54:07 +02:00
Josh Poimboeuf
a8cf700c17 x86/srso: Fix srso_show_state() side effect
Reading the 'spec_rstack_overflow' sysfs file can trigger an unnecessary
MSR write, and possibly even a (handled) exception if the microcode
hasn't been updated.

Avoid all that by just checking X86_FEATURE_IBPB_BRTYPE instead, which
gets set by srso_select_mitigation() if the updated microcode exists.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/27d128899cb8aee9eb2b57ddc996742b0c1d776b.1693889988.git.jpoimboe@kernel.org
2023-09-19 10:53:34 +02:00
Juergen Gross
a4a7644c15 x86/xen: move paravirt lazy code
Only Xen is using the paravirt lazy mode code, so it can be moved to
Xen specific sources.

This allows to make some of the functions static or to merge them into
their only call sites.

While at it do a rename from "paravirt" to "xen" for all moved
specifiers.

No functional change.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20230913113828.18421-3-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2023-09-19 07:04:49 +02:00
Rik van Riel
34cf99c250 x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()
The code calling ima_free_kexec_buffer() runs long after the memblock
allocator has already been torn down, potentially resulting in a use
after free in memblock_isolate_range().

With KASAN or KFENCE, this use after free will result in a BUG
from the idle task, and a subsequent kernel panic.

Switch ima_free_kexec_buffer() over to memblock_free_late() to avoid
that bug.

Fixes: fee3ff99bc ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c")
Suggested-by: Mike Rappoport <rppt@kernel.org>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230817135558.67274c83@imladris.surriel.com
2023-09-18 09:24:15 +02:00
Linus Torvalds
e789286468 Misc fixes:
- Fix an UV boot crash,
 - Skip spurious ENDBR generation on _THIS_IP_,
 - Fix ENDBR use in putuser() asm methods,
 - Fix corner case boot crashes on 5-level paging,
 - and fix a false positive WARNING on LTO kernels.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmUHOrwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j6Jw/+PjUfh/4+KM/Z8VzcBy2UhY3NMF2ptGCN
 47FPLy+8ADqOvIfgsPsBEO9VXwdvkHfH64YWRUlULjvPNOSs+37drBYMe9AI9xKE
 u6NhoBHmsnOtoLkBFIQYlJys9GyAeoMlwSSHxzRwQS+3VztRjoH636jiBcg/h7DR
 zhakfnJD1SSOZuEyyDPnO0uIUarrcqC2jdZwucSqZnvZFdA/pexUHQEe2RtMXLln
 EIA5kuEo7UdADcSr/Lbty7MKO+6xpRTjxF0yNItPtwPWsnxrSAC7P+dQ37uB975U
 Z0CJ/kw54XG5Sdoech7XCWYmJzDxSPhiziA1USKpZJMo5phAG/apTJK6NG4TG94r
 U+3QhLopUoSd8N74/VtZn0FjOrMsk7YKD5o8twlTbpCd2VaiJk4X5oBIns6Tvq05
 0Vmsx15XO3mEzg1wWbbdjhjHW0czRgBpikS9QyexZKVkukYcW8QT6bk1gK1ypg94
 do4PHKB53QIt31dedy/ddpQj4u1sJ4+a9/+IG29kjUB33M4+uFedtw11vfe+CDN0
 XLRc6HfPblogIIEO/figJgwSq/TPCOsNHTwKkejq+D1Ey6DsrnX9Gg4BWVz/3dDW
 84SW7TaO2FGEDh4NkR2ijkZlbpofFnCvhCh/uohodPlqDrTVTuRKCZW9I5plmGVa
 qeXUpNDFs1E=
 =BMjT
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - Fix an UV boot crash

   - Skip spurious ENDBR generation on _THIS_IP_

   - Fix ENDBR use in putuser() asm methods

   - Fix corner case boot crashes on 5-level paging

   - and fix a false positive WARNING on LTO kernels"

* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/purgatory: Remove LTO flags
  x86/boot/compressed: Reserve more memory for page tables
  x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
  x86/ibt: Suppress spurious ENDBR
  x86/platform/uv: Use alternate source for socket to node data
2023-09-17 11:13:37 -07:00
Linus Torvalds
e5a710d132 Fix a performance regression on large SMT systems, an Intel SMT4
balancing bug, and a topology setup bug on (Intel) hybrid processors.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmUHOVQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iOahAAj3YsoNbT/k6m9yp622n1OopaNEQvsK+/
 F2Q5g/hJrm3+W5764rF8CvjhDbmrv6owjp3yUyZLDIfSAFZYMvwoNody3a373Yr3
 VFBMJ00jNIv/TAFCJZYeybg3yViwObKKfpu4JBj//QU+4uGWCoBMolkVekU2bBti
 r50fMxBPgg2Yic57DCC8Y+JZzHI/ydQ3rvVXMzkrTZCO/zY4/YmERM9d+vp4wl4B
 uG9cfXQ4Yf/1gZo0XDlTUkOJUXPnkMgi+N4eHYlGuyOCoIZOfATI24hRaPBoQcdx
 PDwHcKmyNxH9iaRppNQMvi797g3KrKVEmZwlZg1IfsILhKC0F4GsQ85tw8qQWE8j
 brFPkWVUxAUSl4LXoqVInaxDHmJWR2UC3RA7b+BxFF/GMLTow0z4a+JMC6eKlNyR
 uBisZnuEuecqwF9TLhyD3KBHh7PihUPz8PuFHk+Um5sggaUE82I+VwX6uxEi5y8r
 ke2kAkpuMxPWT5lwDmFPAXWfvpZz5vvTIRUxGGj2+s4d8b0YfLtZsx5+uOIacaub
 Gw+wYFfSowph72tR/SUVq0An/UTSPPBxty8eFIVeE6sW9bw3ghTtkf8300xjV7Rj
 sKVxXy/podAo8wG7R6aZfTfsCpohmeEjskiatYdThYamPPx7V0R5pq4twmTXTHLJ
 bFvQ1GFCOu0=
 =jIeN
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Fix a performance regression on large SMT systems, an Intel SMT4
  balancing bug, and a topology setup bug on (Intel) hybrid processors"

* tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain
  sched/fair: Fix SMT4 group_smt_balance handling
  sched/fair: Optimize should_we_balance() for large SMT systems
2023-09-17 11:10:23 -07:00
Nikolay Borisov
61382281e9 x86/entry: Make IA32 syscalls' availability depend on ia32_enabled()
Another major aspect of supporting running of 32bit processes is the
ability to access 32bit syscalls. Such syscalls can be invoked by
using the legacy int 0x80 handler and  sysenter/syscall instructions.

If IA32 emulation is disabled ensure that each of those 3 distinct
mechanisms are also disabled. For int 0x80 a #GP exception would be
generated since the respective descriptor is not going to be loaded at
all. Invoking sysenter will also result in a #GP since IA32_SYSENTER_CS
contains an invalid segment. Finally, syscall instruction cannot really
be disabled so it's configured to execute a minimal handler.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230623111409.3047467-6-nik.borisov@suse.com
2023-09-14 13:19:53 +02:00
Nikolay Borisov
f71e1d2ff8 x86/entry: Rename ignore_sysret()
The SYSCALL instruction cannot really be disabled in compatibility mode.
The best that can be done is to configure the CSTAR msr to point to a
minimal handler. Currently this handler has a rather misleading name -
ignore_sysret() as it's not really doing anything with sysret.

Give it a more descriptive name.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230623111409.3047467-3-nik.borisov@suse.com
2023-09-14 13:19:53 +02:00