x2apic_disable() clears x2apic_state and x2apic_mode unconditionally, even
when the state is X2APIC_ON_LOCKED, which prevents the kernel to disable
it thereby creating inconsistent state.
Due to the early state check for X2APIC_ON, the code path which warns about
a locked X2APIC cannot be reached.
Test for state < X2APIC_ON instead and move the clearing of the state and
mode variables to the place which actually disables X2APIC.
[ tglx: Massaged change log. Added Fixes tag. Moved clearing so it's at the
right place for back ports ]
Fixes: a57e456a7b ("x86/apic: Fix fallout from x2apic cleanup")
Signed-off-by: Yuntao Wang <yuntao.wang@linux.dev>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240813014827.895381-1-yuntao.wang@linux.dev
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmZCbbIACgkQaDWVMHDJ
krC5Dw/9GsRdLN4lGkUhJJ9p1iNbB3qXiOgNIS9ROsh0VHI0FXf36XTsuoWE5+nT
eXb/4s0oWApqs2YbriquILfk7Q7fj0/fOYIJJ182sB/WlXBWNIjoR42rNPqDbxA8
yJVoY+RuJ1TClMYMOt6vN3hLbcOd7TU61mXmZexvI48p4GsKKYjqh9LHcaLuTUFU
Ibo4tJKxJQRCvY1pX9aJ6ICkMBpewFTrNcnm5iDnGGG+7f73Pq7+9ZdhWjPPt3K5
u/BaQtjTvPB7PlwiKV29c37Ud6FMv/CwOI9rdwRrJRHog6jzqcJd5IV2XlfaYoXL
se0yOptSP96MljguWFJ2yIynWtFYJqsmrHOSvytT9VxM10izUqEhU+qrkWzP1sqR
NN701cH4FUZV9odYx5kwhquAVecgpW//W47hp4Ghq3BBQvX8Y158Blqem+VDZzYU
6fxz/IxX1SURN/yEAompReMdoBdI+vFwwM3ZKCLMY3ZfUaDaZiWGXjhSBHzzGSVD
tg3Clq7AXHcVbRe8h1d0CTh4O8XRLMJDcUM5rq06agrDuJf4F8tXqNZoYSku4Haw
dd+U61bWAOFwv7MkePNjg8xglBSV3gyS4uoCxolZvScHz/nxLD+wLWnb6Z5oMlLn
ZroyXUMM2FMhoK4ODp3l1OzSRMj780QCvJCRW6SeJ4u2wuCoXdE=
=4t5X
-----END PGP SIGNATURE-----
Merge tag 'x86_apic_for_6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC update from Dave Hansen:
"Coccinelle complained about some 64-bit divisions, but the divisor was
really just a 32-bit value being stored as 'unsigned long'.
Fixing the types fixes the warning"
* tag 'x86_apic_for_6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Improve data types to fix Coccinelle warnings
- Rework the x86 CPU vendor/family/model code: introduce the 'VFM'
value that is an 8+8+8 bit concatenation of the vendor/family/model
value, and add macros that work on VFM values. This simplifies the
addition of new Intel models & families, and simplifies existing
enumeration & quirk code.
- Add support for the AMD 0x80000026 leaf, to better parse topology
information.
- Optimize the NUMA allocation layout of more per-CPU data structures
- Improve the workaround for AMD erratum 1386
- Clear TME from /proc/cpuinfo as well, when disabled by the firmware
- Improve x86 self-tests
- Extend the mce_record tracepoint with the ::ppin and ::microcode fields
- Implement recovery for MCE errors in TDX/SEAM non-root mode
- Misc cleanups and fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZBwL0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gfuBAAkfVxMAfXvI4Vn3Em9Pix5zgvOoEshPoI
Pti8+fqgKAaR/Nn+ZCEUk6nou8E6R0Lyo7yDk4aZ0zGmUwQS0IoRTvj721YojCTS
Chr7butXH2xkYYQVBiJvKdHVhPBgs6jvExLyRL4WJ6s6zunS86Xka3nVRKD9QqW6
RpEc83wW9b/oSzxn/Cwzxk9RvXatLL82EMOYPL2B40Lde8EM+zoYsfOwGndGlCB2
gHpnSL1Jzry5kTeG7rromWWVp6YrDW63R2KO+DB0r7rrrtEyXtoCr7OdxruUijPB
sSpzN6etRbUuH0ijMbh7EW8KlUkGBx46Y+1eRMeN/qYy0vuwP9v0vP9n/7fXLjvu
FEI82W07lHjY3OvHh2FzvcHMTWaHVYqwDRLki7ortjtg53F/0l07Cbqxf2zJg+r3
jIaVCifk4qo6Rq+TvHtGcuDYi36u93UKVcfjQN1K/a2WdzJvpDL63PklzBeTno5s
7QBSG1FxEbfIXeQaf/AwfjnfzlQhI9ws1F+GuFAP7mGH8vEnDlGhLv5vsnloxcMB
HnHJE1wOzq6A3ixCFreXccikfsTUgsfmrLExhVs9Er/MsKRsGfSySyFUHA4L/Ygm
6zqfgYwSJzbn5EnfPmiO1R+tNhlcAi0YENeAOle4HQTeBwqebKl+Zh3zbzpgM2I3
cppkgnY/HTQ=
=Zrlk
-----END PGP SIGNATURE-----
Merge tag 'x86-cpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
- Rework the x86 CPU vendor/family/model code: introduce the 'VFM'
value that is an 8+8+8 bit concatenation of the vendor/family/model
value, and add macros that work on VFM values. This simplifies the
addition of new Intel models & families, and simplifies existing
enumeration & quirk code.
- Add support for the AMD 0x80000026 leaf, to better parse topology
information
- Optimize the NUMA allocation layout of more per-CPU data structures
- Improve the workaround for AMD erratum 1386
- Clear TME from /proc/cpuinfo as well, when disabled by the firmware
- Improve x86 self-tests
- Extend the mce_record tracepoint with the ::ppin and ::microcode fields
- Implement recovery for MCE errors in TDX/SEAM non-root mode
- Misc cleanups and fixes
* tag 'x86-cpu-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
x86/mm: Switch to new Intel CPU model defines
x86/tsc_msr: Switch to new Intel CPU model defines
x86/tsc: Switch to new Intel CPU model defines
x86/cpu: Switch to new Intel CPU model defines
x86/resctrl: Switch to new Intel CPU model defines
x86/microcode/intel: Switch to new Intel CPU model defines
x86/mce: Switch to new Intel CPU model defines
x86/cpu: Switch to new Intel CPU model defines
x86/cpu/intel_epb: Switch to new Intel CPU model defines
x86/aperfmperf: Switch to new Intel CPU model defines
x86/apic: Switch to new Intel CPU model defines
perf/x86/msr: Switch to new Intel CPU model defines
perf/x86/intel/uncore: Switch to new Intel CPU model defines
perf/x86/intel/pt: Switch to new Intel CPU model defines
perf/x86/lbr: Switch to new Intel CPU model defines
perf/x86/intel/cstate: Switch to new Intel CPU model defines
x86/bugs: Switch to new Intel CPU model defines
x86/bugs: Switch to new Intel CPU model defines
x86/cpu/vfm: Update arch/x86/include/asm/intel-family.h
x86/cpu/vfm: Add new macros to work with (vendor/family/model) values
...
With 'iommu=off' on the kernel command line and x2APIC enabled by the BIOS
the code which disables the x2APIC triggers an unchecked MSR access error:
RDMSR from 0x802 at rIP: 0xffffffff94079992 (native_apic_msr_read+0x12/0x50)
This is happens because default_acpi_madt_oem_check() selects an x2APIC
driver before the x2APIC is disabled.
When the x2APIC is disabled because interrupt remapping cannot be enabled
due to 'iommu=off' on the command line, x2apic_disable() invokes
apic_set_fixmap() which in turn tries to read the APIC ID. This triggers
the MSR warning because x2APIC is disabled, but the APIC driver is still
x2APIC based.
Prevent that by adding an argument to apic_set_fixmap() which makes the
APIC ID read out conditional and set it to false from the x2APIC disable
path. That's correct as the APIC ID has already been read out during early
discovery.
Fixes: d10a904435 ("x86/apic: Consolidate boot_cpu_physical_apicid initialization sites")
Reported-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/875xw5t6r7.ffs@tglx
New CPU #defines encode vendor and family as well as model.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/20240424181504.41634-1-tony.luck%40intel.com
So we are using the 'ia32_cap' value in a number of places,
which got its name from MSR_IA32_ARCH_CAPABILITIES MSR register.
But there's very little 'IA32' about it - this isn't 32-bit only
code, nor does it originate from there, it's just a historic
quirk that many Intel MSR names are prefixed with IA32_.
This is already clear from the helper method around the MSR:
x86_read_arch_cap_msr(), which doesn't have the IA32 prefix.
So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with
its role and with the naming of the helper function.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Nikolay Borisov <nik.borisov@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
Given that acpi_pm_read_early() returns a u32 (masked to 24 bits), several
variables that store its return value are improved by adjusting their data
types from unsigned long to u32. Specifically, change deltapm's type from
long to u32 because its value fits into 32 bits and it cannot be negative.
These data type improvements resolve the following two Coccinelle/
coccicheck warnings reported by do_div.cocci:
arch/x86/kernel/apic/apic.c:734:1-7: WARNING: do_div() does a 64-by-32
division, please consider using div64_long instead.
arch/x86/kernel/apic/apic.c:742:2-8: WARNING: do_div() does a 64-by-32
division, please consider using div64_long instead.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20240318104721.117741-3-thorsten.blum%40toblux.com
Now that all external fiddling with num_processors and disabled_cpus is
gone, move the last user prefill_possible_map() into the topology code too
and remove the global visibility of these variables.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210251.994756960@linutronix.de
The APIC/CPU registration sits in the middle of the APIC code. In fact this
is a topology evaluation function and has nothing to do with the inner
workings of the local APIC.
Move it out into a file which reflects what this is about.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210251.543948812@linutronix.de
The ACPI ID for CPUs is preset with U32_MAX which is completely non
obvious. Use a proper define for it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154640.177504138@linutronix.de
Paranoia is not wrong, but having an APIC callback which is in most
implementations a complete NOOP and in one actually looking whether the
APICID of an upcoming CPU has been registered. The same APICID which was
used to bring the CPU out of wait for startup.
That's paranoia for the paranoia sake. Remove the voodoo.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154640.116510935@linutronix.de
There is absolutely no point to write the APIC ID which was read from the
local APIC earlier, back into the local APIC for the 64-bit UP case.
Remove that along with the apic callback which is solely there for this
pointless exercise.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154640.055288922@linutronix.de
physid_t is a wrapper around bitmap. Just remove the onion layer and use
bitmap functionality directly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.994904510@linutronix.de
There is no point for this function. The only case where this is used is
when there is no XAPIC available, which means the broadcast address is 0xF.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240212154639.057209154@linutronix.de
APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.
Make it all consistently use u32 because that reflects the hardware
register width and fixup the most obvious usage sites of that.
The APIC callbacks will be addressed separately.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.922905727@linutronix.de
APIC ID checks compare with BAD_APICID all over the place, but some
initializers and some code which fiddles with global data structure use
-1[U] instead. That simply cannot work at all.
Fix it up and use BAD_APICID consistently all over the place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.862835121@linutronix.de
The SMT control mechanism got added as speculation attack vector
mitigation. The implemented logic relies on the primary thread mask to
be set up properly.
This turns out to be an issue with XEN/PV guests because their CPU hotplug
mechanics do not enumerate APICs and therefore the mask is never correctly
populated.
This went unnoticed so far because by chance XEN/PV ends up with
smp_num_siblings == 2. So cpu_smt_control stays at its default value
CPU_SMT_ENABLED and the primary thread mask is never evaluated in the
context of CPU hotplug.
This stopped "working" with the upcoming overhaul of the topology
evaluation which legitimately provides a fake topology for XEN/PV. That
sets smp_num_siblings to 1, which causes the core CPU hot-plug core to
refuse to bring up the APs.
This happens because cpu_smt_control is set to CPU_SMT_NOT_SUPPORTED which
causes cpu_bootable() to evaluate the unpopulated primary thread mask with
the conclusion that all non-boot CPUs are not valid to be plugged.
The core code has already been made more robust against this kind of fail,
but the primary thread mask really wants to be populated to avoid other
issues all over the place.
Just fake the mask by pretending that all XEN/PV vCPUs are primary threads,
which is consistent because all of XEN/PVs topology is fake or non-existent.
Fixes: 6a4d2657e0 ("x86/smp: Provide topology_is_primary_thread()")
Fixes: f54d4434c2 ("x86/apic: Provide cpu_primary_thread mask")
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.210011520@linutronix.de
Move them to one place so the static call conversion gets simpler.
No functional change.
[ dhansen: merge against recent x86/apic changes ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
In preparation for converting the hotpath APIC callbacks to static keys,
provide common initialization infrastructure.
Lift apic_install_drivers() from probe_64.c and convert all places which
switch the apic instance by storing the pointer to use apic_install_driver()
as a first step.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Yet another wrapper of a wrapper gone along with the outdated comment
that this compiles to a single instruction.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Every callsite hands in the same constants which is a pointless exercise
and cannot be optimized by the compiler due to the indirect calls.
Use the constants in the eoi() callbacks and remove the arguments.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Move it next to apic_mem_wait_icr_idle(), rename it so that it's clear what
it does and rewrite it in readable form.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Two copies and also needlessly public. Move it into ipi.c so it can be
inlined. Rename it to apic_mem_wait_icr_idle().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Really not a hotpath and again no reason for having a gazillion of empty
callbacks returning 1. Make it return bool and provide one shared
implementation for the remaining users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
default_setup_apic_routing() is a complete misnomer. On 64bit it does the
actual APIC probing and on 32bit it is used to force select the bigsmp APIC
and to emit a redundant message in the apic::setup_apic_routing() callback.
Rename the 64bit and 32bit function so they reflect what they are doing and
remove the useless APIC callback.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
apic::init_apic_ldr() is only invoked when the APIC is initialized. So
there is really no point in having:
- Default empty callbacks all over the place
- Two implementations of the actual LDR init function where one is
just unreadable gunk but does exactly the same as the other.
Make the apic::init_apic_ldr() invocation conditional, remove the empty
callbacks and consolidate the two implementation into one.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
If the system has more than 8 CPUs then XAPIC and the bigsmp APIC driver is
required. This is ensured via:
1) Enumerating all possible CPUs up to NR_CPUS
2) Checking at boot CPU APIC setup time whether the system has more than
8 CPUs and has an XAPIC.
If that's the case then it's attempted to install the bigsmp APIC
driver and a magic variable 'def_to_bigsmp' is set to one.
3) If that magic variable is set and CONFIG_X86_BIGSMP=n and the system
has more than 8 CPUs smp_sanity_check() removes all CPUs >= #8 from
the present and possible mask in the most convoluted way.
This logic is completely broken for the case where the bigsmp driver is
enabled, but not selected due to a command line option specifying the
default APIC. In that case the system boots with default APIC in logical
destination mode and fails to reduce the number of CPUs.
That aside the above which is sprinkled over 3 different places is yet
another piece of art.
It would have been too obvious to check the requirements upfront and limit
nr_cpu_ids _before_ enumerating tons of CPUs and then removing them again.
Implement exactly this. Check the bigsmp requirement when the boot APIC is
registered which happens _before_ ACPI/MPTABLE parsing and limit the number
of CPUs to 8 if it can't be used. Switch it over when the boot CPU apic is
set up if necessary.
[ dhansen: fix nr_cpu_ids off-by-one in default_setup_apic_routing() ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
This per CPU variable is just yet another form of voodoo programming. The
boot ordering is:
per_cpu(x86_cpu_to_logical_apicid, cpu) = 1U << cpu;
.....
setup_apic()
apic->init_apic_ldr()
default_init_apic_ldr()
apic_write(SET_APIC_LOGICAL_ID(1UL << smp_processor_id(), APIC_LDR);
id = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR);
WARN_ON(id != per_cpu(x86_cpu_to_logical_apicid, cpu));
per_cpu(x86_cpu_to_logical_apicid, cpu) = id;
So first write the default into LDR and then validate it against the same default
which was set up during early boot APIC enumeration.
Brilliant, isn't it?
The comment above the per CPU variable declaration describes it well:
'Let's keep it ugly for now.'
Remove the useless gunk and use '1U << cpu' consistently all over the place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
apic::x86_32_early_logical_apicid() is yet another historical joke.
It is used to preset the x86_cpu_to_logical_apicid per CPU variable during
APIC enumeration with:
- 1 shifted left by the CPU number
- the physical APIC ID in case of bigsmp
The latter is hillarious because bigsmp uses physical destination mode
which never can use the logical APIC ID.
It gets even worse. As bigsmp can be enforced late in the boot process the
probe function overwrites the per CPU variable which is never used for this
APIC type once again.
Remove that gunk and store 1 << cpunr unconditionally if and only if the
CPU number is less than 8, because the default logical destination mode
only allows up to 8 CPUs.
This is just an intermediate step before removing the per CPU insanity
completely. Stay tuned.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
No need for an extra variable to find out whether the APIC has been mapped
or is accessible (X2APIC mode).
Provide an inline for this and check apic_mmio_base which is only set when
the local APIC has been mapped.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
num_processors is 0 by default and only gets incremented when local APICs
are registered.
Make init_apic_mappings(), which tries to enable the local APIC in the case
that no SMP configuration was found set num_processors to 1.
This allows to remove yet another check for the local APIC and yet another
place which registers the boot CPUs local APIC ID.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Convert places which just write mp_lapic_addr and let them register the
local APIC address directly instead of relying on magic other code to do
so.
Add a WARN_ON() into register_lapic_address() which is raised when
register_lapic_address() is invoked more than once during boot.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Split the fixmap setup out of register_lapic_address() and reuse it when
the X2APIC is disabled during setup.
This avoids registering the APIC ID (setting 'mp_lapic_addr') twice.
[ dhansen: changelog wording tweak ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Quite some APIC init functions are pure boolean, but use the success = 0,
fail < 0 model. That's confusing as hell when reading through the code.
Convert them to boolean.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
This historical leftover is really uninteresting today. Whatever MPTABLE or
MADT delivers we only trust the hardware anyway.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Register the boot CPU APIC right when the boot CPUs APIC is read from the
hardware. No point is doing this on random places and having wild
heuristics to save the boot CPU APIC ID slot and CPU number 0 reserved.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
boot_cpu_physical_apicid is written in random places and in the last
consequence filled with the APIC ID read from the local APIC. That causes
it to have inconsistent state when the MPTABLE is broken. As a consequence
tons of moronic checks are sprinkled all over the place.
Consolidate the code and read it exactly once when either X2APIC mode is
detected early or when the APIC mapping is established.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
max_physical_apicid is assigned but never read.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
No point in having a wrapper around read_apic_id().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Another variable name which is confusing at best. Convert to bool.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
It reflects a state and not a command. Make it bool while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
Marking primary threads in the cpumask during early boot is only correct in
certain configurations, but broken e.g. for the legacy hyperthreading
detection.
This is due to the complete mess in the CPUID evaluation code which
initializes smp_num_siblings only half during early init and fixes it up
later when identify_boot_cpu() is invoked.
So using smp_num_siblings before identify_boot_cpu() leads to incorrect
results.
Fixing the early CPU init code to provide the proper data is a larger scale
surgery as the code has dependencies on data structures which are not
initialized during early boot.
Move the initialization of cpu_primary_thread_mask wich depends on
smp_num_siblings being correct to an early initcall so that it is set up
correctly before SMP bringup.
Fixes: f54d4434c2 ("x86/apic: Provide cpu_primary_thread mask")
Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/87sfbhlwp9.ffs@tglx
In parallel startup mode the APs are kicked alive by the control CPU
quickly after each other and run through the early startup code in
parallel. The real-mode startup code is already serialized with a
bit-spinlock to protect the real-mode stack.
In parallel startup mode the smpboot_control variable obviously cannot
contain the Linux CPU number so the APs have to determine their Linux CPU
number on their own. This is required to find the CPUs per CPU offset in
order to find the idle task stack and other per CPU data.
To achieve this, export the cpuid_to_apicid[] array so that each AP can
find its own CPU number by searching therein based on its APIC ID.
Introduce a flag in the top bits of smpboot_control which indicates that
the AP should find its CPU number by reading the APIC ID from the APIC.
This is required because CPUID based APIC ID retrieval can only provide the
initial APIC ID, which might have been overruled by the firmware. Some AMD
APUs come up with APIC ID = initial APIC ID + 0x10, so the APIC ID to CPU
number lookup would fail miserably if based on CPUID. Also virtualization
can make its own APIC ID assignements. The only requirement is that the
APIC IDs are consistent with the APCI/MADT table.
For the boot CPU or in case parallel bringup is disabled the control bits
are empty and the CPU number is directly available in bit 0-23 of
smpboot_control.
[ tglx: Initial proof of concept patch with bitlock and APIC ID lookup ]
[ dwmw2: Rework and testing, commit message, CPUID 0x1 and CPU0 support ]
[ seanc: Fix stray override of initial_gs in common_cpu_up() ]
[ Oleksandr Natalenko: reported suspend/resume issue fixed in
x86_acpi_suspend_lowlevel ]
[ tglx: Make it read the APIC ID from the APIC instead of using CPUID,
split the bitlock part out ]
Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.411554373@linutronix.de
For parallel CPU brinugp it's required to read the APIC ID in the low level
startup code. The virtual APIC base address is a constant because its a
fix-mapped address. Exposing that constant which is composed via macros to
assembly code is non-trivial due to header inclusion hell.
Aside of that it's constant only because of the vsyscall ABI
requirement. Once vsyscall is out of the picture the fixmap can be placed
at runtime.
Avoid header hell, stay flexible and store the address in a variable which
can be exposed to the low level startup code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.299231005@linutronix.de
Make the primary thread tracking CPU mask based in preparation for simpler
handling of parallel bootup.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.186599880@linutronix.de
The detection of atomic update failure in reserve_eilvt_offset() is
not correct. The value returned by atomic_cmpxchg() should be compared
to the old value from the location to be updated.
If these two are the same, then atomic update succeeded and
"eilvt_offsets[offset]" location is updated to "new" in an atomic way.
Otherwise, the atomic update failed and it should be retried with the
value from "eilvt_offsets[offset]" - exactly what atomic_try_cmpxchg()
does in a correct and more optimal way.
Fixes: a68c439b19 ("apic, x86: Check if EILVT APIC registers are available (AMD only)")
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230227160917.107820-1-ubizjak@gmail.com
A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on
platforms that have x2APIC already enabled in the BIOS before starting the
kernel.
The kernel was supposed to panic with an approprite error message in
validate_x2apic() due to the missing X2APIC support.
However, validate_x2apic() was run too late in the boot cycle, and the
kernel tried to initialize the APIC nonetheless. This resulted in an
earlier panic in setup_local_APIC() because the APIC was not registered.
In my experiments, a panic message in setup_local_APIC() was not visible
in the graphical console, which resulted in a hang with no indication
what has gone wrong.
Instead of calling panic(), disable the APIC, which results in a somewhat
working system with the PIC only (and no SMP). This way the user is able to
diagnose the problem more easily.
Disabling X2APIC mode is not an option because it's impossible on systems
with locked x2APIC.
The proper place to disable the APIC in this case is in check_x2apic(),
which is called early from setup_arch(). Doing this in
__apic_intr_mode_select() is too late.
Make check_x2apic() unconditionally available and remove the empty stub.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Robert Elliott (Servers) <elliott@hpe.com>
Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/d573ba1c-0dc4-3016-712a-cc23a8a33d42@molgen.mpg.de
Link: https://lore.kernel.org/lkml/20220911084711.13694-3-mat.jonczyk@o2.pl
Link: https://lore.kernel.org/all/20221129215008.7247-1-mat.jonczyk@o2.pl
The APIC supports two modes, legacy APIC (or xAPIC), and Extended APIC
(or x2APIC). X2APIC mode is mostly compatible with legacy APIC, but
it disables the memory-mapped APIC interface in favor of one that uses
MSRs. The APIC mode is controlled by the EXT bit in the APIC MSR.
The MMIO/xAPIC interface has some problems, most notably the APIC LEAK
[1]. This bug allows an attacker to use the APIC MMIO interface to
extract data from the SGX enclave.
Introduce support for a new feature that will allow the BIOS to lock
the APIC in x2APIC mode. If the APIC is locked in x2APIC mode and the
kernel tries to disable the APIC or revert to legacy APIC mode a GP
fault will occur.
Introduce support for a new MSR (IA32_XAPIC_DISABLE_STATUS) and handle
the new locked mode when the LEGACY_XAPIC_DISABLED bit is set by
preventing the kernel from trying to disable the x2APIC.
On platforms with the IA32_XAPIC_DISABLE_STATUS MSR, if SGX or TDX are
enabled the LEGACY_XAPIC_DISABLED will be set by the BIOS. If
legacy APIC is required, then it SGX and TDX need to be disabled in the
BIOS.
[1]: https://aepicleak.com/aepicleak.pdf
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Link: https://lkml.kernel.org/r/20220816231943.1152579-1-daniel.sneddon@linux.intel.com