1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
linux/arch/x86/kernel/cpu
Linus Torvalds 7fef099702 x86/resctl: fix scheduler confusion with 'current'
The implementation of 'current' on x86 is very intentionally special: it
is a very common thing to look up, and it uses 'this_cpu_read_stable()'
to get the current thread pointer efficiently from per-cpu storage.

And the keyword in there is 'stable': the current thread pointer never
changes as far as a single thread is concerned.  Even if when a thread
is preempted, or moved to another CPU, or even across an explicit call
'schedule()' that thread will still have the same value for 'current'.

It is, after all, the kernel base pointer to thread-local storage.
That's why it's stable to begin with, but it's also why it's important
enough that we have that special 'this_cpu_read_stable()' access for it.

So this is all done very intentionally to allow the compiler to treat
'current' as a value that never visibly changes, so that the compiler
can do CSE and combine multiple different 'current' accesses into one.

However, there is obviously one very special situation when the
currently running thread does actually change: inside the scheduler
itself.

So the scheduler code paths are special, and do not have a 'current'
thread at all.  Instead there are _two_ threads: the previous and the
next thread - typically called 'prev' and 'next' (or prev_p/next_p)
internally.

So this is all actually quite straightforward and simple, and not all
that complicated.

Except for when you then have special code that is run in scheduler
context, that code then has to be aware that 'current' isn't really a
valid thing.  Did you mean 'prev'? Did you mean 'next'?

In fact, even if then look at the code, and you use 'current' after the
new value has been assigned to the percpu variable, we have explicitly
told the compiler that 'current' is magical and always stable.  So the
compiler is quite free to use an older (or newer) value of 'current',
and the actual assignment to the percpu storage is not relevant even if
it might look that way.

Which is exactly what happened in the resctl code, that blithely used
'current' in '__resctrl_sched_in()' when it really wanted the new
process state (as implied by the name: we're scheduling 'into' that new
resctl state).  And clang would end up just using the old thread pointer
value at least in some configurations.

This could have happened with gcc too, and purely depends on random
compiler details.  Clang just seems to have been more aggressive about
moving the read of the per-cpu current_task pointer around.

The fix is trivial: just make the resctl code adhere to the scheduler
rules of using the prev/next thread pointer explicitly, instead of using
'current' in a situation where it just wasn't valid.

That same code is then also used outside of the scheduler context (when
a thread resctl state is explicitly changed), and then we will just pass
in 'current' as that pointer, of course.  There is no ambiguity in that
case.

The fix may be trivial, but noticing and figuring out what went wrong
was not.  The credit for that goes to Stephane Eranian.

Reported-by: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/
Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Stephane Eranian <eranian@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-08 11:48:11 -08:00
..
mce - Add support for reporting more bits of the physical address on error, 2023-02-21 08:04:51 -08:00
microcode x86/microcode/core: Return an error only when necessary 2023-02-06 13:41:31 +01:00
mtrr x86/mtrr: Make message for disabled MTRRs more descriptive 2022-12-05 11:08:25 +01:00
resctrl x86/resctl: fix scheduler confusion with 'current' 2023-03-08 11:48:11 -08:00
sgx mm: replace vma->vm_flags direct modifications with modifier calls 2023-02-09 16:51:39 -08:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
acrn.c x86/acrn: Set up timekeeping 2022-08-04 11:11:59 +02:00
amd.c x86/amd: Cache debug register values in percpu variables 2023-01-31 20:09:26 +01:00
aperfmperf.c x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings 2023-01-16 10:19:15 +01:00
bugs.c x86/speculation: Allow enabling STIBP with legacy IBRS 2023-02-27 18:57:09 +01:00
cacheinfo.c x86/cacheinfo: Remove unused trace variable 2023-02-11 10:43:31 +01:00
centaur.c x86/cpu/centaur: Add Centaur family >=7 CPUs initialization support 2020-09-11 10:53:19 +02:00
common.c A healthy mix of EFI contributions this time: 2023-02-23 14:41:48 -08:00
cpu.h x86/cpu: Remove redundant extern x86_read_arch_cap_msr() 2023-01-10 12:40:24 +01:00
cpuid-deps.c x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag 2023-01-23 17:38:31 +01:00
cyrix.c x86/cyrix: include header linux/isa-dma.h 2022-07-26 14:03:12 -05:00
feat_ctl.c x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype 2022-09-26 17:06:27 +02:00
hygon.c - Split MTRR and PAT init code to accomodate at least Xen PV and TDX 2022-12-13 14:56:56 -08:00
hypervisor.c x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable 2019-07-17 08:09:59 +02:00
intel.c - Add support for multiple testing sequences to the Intel In-Field Scan 2022-12-13 15:05:29 -08:00
intel_epb.c x86/intel_epb: Set Alder Lake N and Raptor Lake P normal EPB 2022-11-03 11:31:01 -07:00
intel_pconfig.c x86/pconfig: Detect PCONFIG targets 2018-03-12 12:10:54 +01:00
Makefile x86/cpu: Re-enable stackprotector 2022-10-17 16:40:56 +02:00
match.c x86/cpu: Add a steppings field to struct x86_cpu_id 2020-04-20 12:19:21 +02:00
mkcapflags.sh x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_* 2020-01-13 18:36:02 +01:00
mshyperv.c ARM: 2023-02-25 11:30:21 -08:00
perfctr-watchdog.c x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUs 2021-06-10 10:04:40 +02:00
powerflags.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
proc.c x86/aperfmperf: Integrate the fallback code from show_cpuinfo() 2022-04-27 20:22:20 +02:00
rdrand.c x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu" 2022-07-18 15:04:04 +02:00
scattered.c x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag 2023-01-23 17:38:31 +01:00
topology.c x86/topology: Fix duplicated core ID within a package 2022-10-17 11:58:52 -07:00
transmeta.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tsx.c x86/cpu: Remove redundant extern x86_read_arch_cap_msr() 2023-01-10 12:40:24 +01:00
umc.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
umwait.c KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL 2020-06-22 20:54:57 -04:00
vmware.c sched/clock/x86: Mark sched_clock() noinstr 2023-01-31 15:01:47 +01:00
vortex.c x86/CPU: Add support for Vortex CPUs 2021-10-21 15:49:07 +02:00
zhaoxin.c x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup 2020-06-15 14:18:37 +02:00