mirror of synced 2025-03-06 20:59:54 +01:00
linux/arch/x86/kernel
Linus Torvalds 7fef099702 x86/resctl: fix scheduler confusion with 'current'
The implementation of 'current' on x86 is very intentionally special: it
is a very common thing to look up, and it uses 'this_cpu_read_stable()'
to get the current thread pointer efficiently from per-cpu storage.

And the keyword in there is 'stable': the current thread pointer never
changes as far as a single thread is concerned.  Even when a thread
is preempted, or moved to another CPU, or even across an explicit call
to 'schedule()', that thread will still have the same value for 'current'.

It is, after all, the kernel base pointer to thread-local storage.
That's why it's stable to begin with, but it's also why it's important
enough that we have that special 'this_cpu_read_stable()' access for it.

So this is all done very intentionally to allow the compiler to treat
'current' as a value that never visibly changes, so that the compiler
can do CSE and combine multiple different 'current' accesses into one.

However, there is obviously one very special situation when the
currently running thread does actually change: inside the scheduler
itself.

So the scheduler code paths are special, and do not have a 'current'
thread at all.  Instead there are _two_ threads: the previous and the
next thread - typically called 'prev' and 'next' (or prev_p/next_p)
internally.

So this is all actually quite straightforward and simple, and not all
that complicated.

Except that when you have special code that runs in scheduler
context, that code has to be aware that 'current' isn't really a
valid thing.  Did you mean 'prev'? Did you mean 'next'?

In fact, even if you then look at the code, and you use 'current' after the
new value has been assigned to the percpu variable, we have explicitly
told the compiler that 'current' is magical and always stable.  So the
compiler is quite free to use an older (or newer) value of 'current',
and the actual assignment to the percpu storage is not relevant even if
it might look that way.

Which is exactly what happened in the resctl code, that blithely used
'current' in '__resctrl_sched_in()' when it really wanted the new
process state (as implied by the name: we're scheduling 'into' that new
resctl state).  And clang would end up just using the old thread pointer
value at least in some configurations.

This could have happened with gcc too, and purely depends on random
compiler details.  Clang just seems to have been more aggressive about
moving the read of the per-cpu current_task pointer around.
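
The failure mode can be modelled in plain user-space C.  This is only a
sketch: 'struct task', 'current_slot', 'programmed_closid' and the
function names are hypothetical stand-ins, not kernel identifiers, and
the early read is written out by hand to show one ordering the compiler
is permitted to pick when the read has been declared stable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names are the kernel's. */
struct task {
    int closid;                    /* per-task value programmed on switch-in */
};

static struct task *current_slot;  /* models the per-cpu current_task slot */
static int programmed_closid;      /* models the hardware register */

static struct task task_prev = { .closid = 1 };
static struct task task_next = { .closid = 2 };

/*
 * Hand-written model of one legal compilation of a hook that reads
 * 'current': because the read is declared stable, it may be performed
 * once, early, and reused after the slot has been updated.
 */
static void switch_with_hoisted_read(struct task *next)
{
    struct task *cur = current_slot;   /* read moved above the store */

    assert(next != NULL);
    current_slot = next;               /* the switch itself */
    programmed_closid = cur->closid;   /* stale: still the previous task */
}

/* Runs one switch from task_prev to task_next and reports what was programmed. */
int demo_stale_closid(void)
{
    current_slot = &task_prev;
    switch_with_hoisted_read(&task_next);
    return programmed_closid;
}
```

In this model the register ends up holding the previous task's value
even though the slot already points at the next task.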

The fix is trivial: just make the resctl code adhere to the scheduler
rules of using the prev/next thread pointer explicitly, instead of using
'current' in a situation where it just wasn't valid.

That same code is then also used outside of the scheduler context (when
a thread's resctl state is explicitly changed), and then we will just pass
in 'current' as that pointer, of course.  There is no ambiguity in that
case.
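
The shape of the fix, with both kinds of caller, can be sketched in
user-space C as follows.  All names here ('struct task', 'current_slot',
'sched_in_hook' and so on) are hypothetical stand-ins and not the
kernel's actual identifiers or signatures; the point is only that the
hook takes the task as an argument and never consults 'current' itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names are the kernel's. */
struct task {
    int closid;                    /* per-task value programmed on switch-in */
};

static struct task *current_slot;  /* models the per-cpu current_task slot */
static int programmed_closid;      /* models the hardware register */

static struct task task_a = { .closid = 1 };
static struct task task_b = { .closid = 2 };

/* The hook is told which task it is scheduling in. */
static void sched_in_hook(const struct task *tsk)
{
    assert(tsk != NULL);
    programmed_closid = tsk->closid;
}

/* Scheduler context: there is no 'current', only prev and next. */
static void switch_to_task(struct task *prev, struct task *next)
{
    (void)prev;                    /* unused in this sketch */
    current_slot = next;
    sched_in_hook(next);           /* explicit: we mean 'next' */
}

/* Outside the scheduler the running task is unambiguous, so it is passed in. */
static void update_own_closid(int closid)
{
    current_slot->closid = closid;
    sched_in_hook(current_slot);
}

/* Switches from task_a to task_b and reports what was programmed. */
int demo_switch_programs_next(void)
{
    current_slot = &task_a;
    switch_to_task(&task_a, &task_b);
    return programmed_closid;
}

/* The running task changes its own state and reports what was programmed. */
int demo_self_update(int closid)
{
    current_slot = &task_a;
    update_own_closid(closid);
    return programmed_closid;
}
```

Because the hook only looks at its argument, the result no longer
depends on where the compiler places the read of the per-cpu slot.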

The fix may be trivial, but noticing and figuring out what went wrong
was not.  The credit for that goes to Stephane Eranian.

Reported-by: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/
Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Stephane Eranian <eranian@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-08 11:48:11 -08:00
acpi Changes in this cycle: 2023-02-20 18:32:55 -08:00
apic x86/ioapic: Use irq_domain_create_hierarchy() 2023-02-13 19:31:24 +00:00
cpu x86/resctl: fix scheduler confusion with 'current' 2023-03-08 11:48:11 -08:00
fpu Updates in this cycle: 2023-02-20 18:50:02 -08:00
kprobes probes updates for 6.3: 2023-02-23 13:03:08 -08:00
.gitignore
alternative.c x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions 2023-01-31 15:05:31 +01:00
amd_gart_64.c x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros 2022-12-15 10:37:27 -08:00
amd_nb.c x86/amd_nb: Add AMD PCI IDs for SMN communication 2022-07-20 17:35:40 +02:00
aperture_64.c x86: Fix various duplicate-word comment typos 2022-08-15 19:17:52 +02:00
apm_32.c efi: x86: Wire up IBT annotation in memory attributes table 2023-02-09 19:30:54 +01:00
asm-offsets.c - Fixup comment typo 2023-02-25 09:11:30 -08:00
asm-offsets_32.c x86/stackprotector/32: Make the canary into a regular percpu variable 2021-03-08 13:19:05 +01:00
asm-offsets_64.c x86: Fixup asm-offsets duplicate 2022-10-17 16:41:06 +02:00
audit_64.c audit: add support for the openat2 syscall 2021-10-01 16:52:48 -04:00
bootflag.c
callthunks.c x86/calldepth: Fix incorrect init section references 2022-12-27 12:51:58 +01:00
cfi.c x86: Add support for CONFIG_CFI_CLANG 2022-09-26 10:13:16 -07:00
check.c
cpuid.c driver core: make struct class.devnode() take a const * 2022-11-24 17:12:27 +01:00
crash.c x86/crash: Disable virt in core NMI crash handler to avoid double shootdown 2023-01-24 10:05:21 -08:00
crash_core_32.c
crash_core_64.c
crash_dump_32.c vmcore: convert copy_oldmem_page() to take an iov_iter 2022-04-29 14:37:59 -07:00
crash_dump_64.c use less confusing names for iov_iter direction initializers 2022-11-25 13:01:55 -05:00
devicetree.c x86/of: Add support for boot time interrupt delivery mode configuration 2022-12-02 14:57:14 +01:00
doublefault_32.c exit/doublefault: Remove apparently bogus comment about rewind_stack_do_exit 2021-10-20 13:09:43 -05:00
dumpstack.c - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
dumpstack_32.c x86/percpu: Move irq_stack variables next to current_task 2022-10-17 16:41:05 +02:00
dumpstack_64.c x86/percpu: Move irq_stack variables next to current_task 2022-10-17 16:41:05 +02:00
e820.c x86/setup: Move duplicate boot_cpu_data definition out of the ifdeffery 2023-01-11 12:45:16 +01:00
early-quirks.c drm/i915/rpl-p: Add PCI IDs 2022-04-19 17:14:09 -07:00
early_printk.c x86/earlyprintk: Clean up pciserial 2022-08-29 12:19:25 +02:00
ebda.c
eisa.c
espfix_64.c x86/espfix: Use get_random_long() rather than archrandom 2022-10-31 20:12:50 +01:00
ftrace.c New Feature: 2022-12-17 14:06:53 -06:00
ftrace_32.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
ftrace_64.S Merge branch 'x86/urgent' into x86/core, to resolve conflict 2022-10-22 10:06:18 +02:00
head32.c
head64.c x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros 2022-12-15 10:37:27 -08:00
head_32.S x86/asm/32: Remove setup_once() 2022-12-02 14:06:34 +01:00
head_64.S x86/callthunks: Add call patching for call depth tracking 2022-10-17 16:41:13 +02:00
hpet.c clocksource: Verify HPET and PMTMR when TSC unverified 2023-02-02 14:23:02 -08:00
hw_breakpoint.c x86/amd: Cache debug register values in percpu variables 2023-01-31 20:09:26 +01:00
i8237.c
i8253.c
i8259.c x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL 2023-01-16 17:24:56 +01:00
idt.c x86/traps: Add #VE support for TDX guest 2022-04-07 08:27:51 -07:00
io_delay.c
ioport.c
irq.c x86/irq: Ensure PI wakeup handler is unregistered before module unload 2021-10-22 12:45:35 -04:00
irq_32.c x86/percpu: Move irq_stack variables next to current_task 2022-10-17 16:41:05 +02:00
irq_64.c x86/percpu: Move irq_stack variables next to current_task 2022-10-17 16:41:05 +02:00
irq_work.c
irqflags.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
irqinit.c x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL 2023-01-16 17:24:56 +01:00
itmt.c x86/sched: Decrease further the priorities of SMT siblings 2021-10-05 15:51:59 +02:00
jailhouse.c
jump_label.c jump_label: make initial NOP patching the special case 2022-06-24 09:48:55 +02:00
kdebugfs.c x86/boot: Fix memremap of setup_indirect structures 2022-03-09 12:49:44 +01:00
kexec-bzimage64.c integrity-v6.0 2022-08-02 15:21:18 -07:00
kgdb.c x86: Fix various typos in comments 2021-03-18 15:31:53 +01:00
ksysfs.c x86/boot: Fix memremap of setup_indirect structures 2022-03-09 12:49:44 +01:00
kvm.c ARM64: 2022-12-15 11:12:21 -08:00
kvmclock.c sched/clock/x86: Mark sched_clock() noinstr 2023-01-31 15:01:47 +01:00
ldt.c memcg: enable accounting for ldt_struct objects 2021-09-03 09:58:13 -07:00
machine_kexec_32.c x86/kexec: Set_[gi]dt() -> native_[gi]dt_invalidate() in machine_kexec_*.c 2021-05-21 12:36:45 +02:00
machine_kexec_64.c x86/kexec: fix memory leak of elf header buffer 2022-06-01 15:57:16 -07:00
Makefile x86/signal/compat: Move sigaction_compat_abi() to signal_64.c 2023-01-06 04:16:02 +01:00
mmconf-fam10h_64.c x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG 2021-05-10 07:51:38 +02:00
module.c Livepatching changes for 6.3 2023-02-23 14:00:10 -08:00
mpparse.c x86: Avoid magic number with ELCR register accesses 2021-08-10 23:31:43 +02:00
msr.c driver core: make struct class.devnode() take a const * 2022-11-24 17:12:27 +01:00
nmi.c ARM: 2023-02-25 11:30:21 -08:00
nmi_selftest.c
paravirt-spinlocks.c x86/paravirt: Add new features for paravirt patching 2021-03-11 19:51:49 +01:00
paravirt.c - Cache the AMD debug registers in per-CPU variables to avoid MSR writes 2023-02-21 14:51:40 -08:00
pci-dma.c swiotlb: merge swiotlb-xen initialization into swiotlb 2022-04-18 07:21:13 +02:00
pcspeaker.c
perf_regs.c
platform-quirks.c
pmem.c x86/pmem: Fix platform-device leak in error path 2022-06-20 18:01:16 +02:00
probe_roms.c x86/kernel: Validate ROM memory before accessing when SEV-SNP is active 2022-04-06 13:23:09 +02:00
process.c Power management updates for 6.3-rc1 2023-02-21 12:13:58 -08:00
process.h x86: Snapshot thread flags 2021-12-01 00:06:43 +01:00
process_32.c x86/resctl: fix scheduler confusion with 'current' 2023-03-08 11:48:11 -08:00
process_64.c x86/resctl: fix scheduler confusion with 'current' 2023-03-08 11:48:11 -08:00
ptrace.c x86: Improve formatting of user_regset arrays 2022-11-01 15:36:52 -07:00
pvclock.c sched/clock/x86: Mark sched_clock() noinstr 2023-01-31 15:01:47 +01:00
quirks.c
reboot.c x86/reboot: Disable virtualization in an emergency if SVM is supported 2023-01-24 10:05:22 -08:00
reboot_fixups_32.c
relocate_kernel_32.S x86/kexec: Disable RET on kexec 2022-07-09 13:12:32 +02:00
relocate_kernel_64.S x86/callthunks: Add call patching for call depth tracking 2022-10-17 16:41:13 +02:00
resource.c x86/PCI: Tidy E820 removal messages 2022-12-10 10:33:11 -06:00
rethook.c x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs 2022-03-28 19:38:51 -07:00
rtc.c x86/rtc: Simplify PNP ids check 2023-01-06 04:22:34 +01:00
setup.c x86/setup: Move duplicate boot_cpu_data definition out of the ifdeffery 2023-01-11 12:45:16 +01:00
setup_percpu.c - Add the call depth tracking mitigation for Retbleed which has 2022-12-14 15:03:00 -08:00
sev-shared.c Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV" 2022-07-27 18:09:13 +02:00
sev.c x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type 2023-01-03 18:46:06 +01:00
sev_verify_cbit.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
signal.c x86/signal: Fix the value returned by strict_sas_size() 2023-01-15 09:54:27 +01:00
signal_32.c - Cache the AMD debug registers in per-CPU variables to avoid MSR writes 2023-02-21 14:51:40 -08:00
signal_64.c x86/signal/compat: Move sigaction_compat_abi() to signal_64.c 2023-01-06 04:16:02 +01:00
smp.c x86/reboot: Disable SVM, not just VMX, when stopping CPUs 2023-01-24 10:05:22 -08:00
smpboot.c x86/hotplug: Remove incorrect comment about mwait_play_dead() 2023-02-14 23:44:34 +01:00
stacktrace.c x86: remove __range_not_ok() 2022-02-25 09:36:05 +01:00
static_call.c x86/static_call: Add support for Jcc tail-calls 2023-01-31 15:05:31 +01:00
step.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-05-11 14:34:28 -05:00
sys_ia32.c
sys_x86_64.c x86/mm: Cleanup the control_va_addr_alignment() __setup handler 2022-05-04 18:20:42 +02:00
tboot.c mm: remove rb tree. 2022-09-26 19:46:16 -07:00
time.c
tls.c x86/gsseg: Move load_gs_index() to its own new header file 2023-01-12 13:06:36 +01:00
tls.h
topology.c x86/cpu: Switch to cpu_feature_enabled() for X86_FEATURE_XENPV 2022-11-22 16:18:19 +01:00
trace.c trace/osnoise: Fix an ifdef comment 2021-10-25 23:02:36 -04:00
trace_clock.c
tracepoint.c x86/traceponit: Fix comment about irq vector tracepoints 2022-05-26 22:03:52 -04:00
traps.c - Add the call depth tracking mitigation for Retbleed which has 2022-12-14 15:03:00 -08:00
tsc.c Updates for timekeeping, timers and clockevent/source drivers: 2023-02-21 09:45:13 -08:00
tsc_msr.c
tsc_sync.c x86/tsc: Add a timer to make sure TSC_adjust is always checked 2021-12-02 00:40:35 +01:00
umip.c x86/umip: Downgrade warning messages to debug loglevel 2021-09-25 13:23:28 +02:00
unwind_frame.c x86: kmsan: don't instrument stack walking functions 2022-10-03 14:03:25 -07:00
unwind_guess.c x86/unwind: Recover kretprobe trampoline entry 2021-09-30 21:24:07 -04:00
unwind_orc.c x86/unwind/orc: Add 'signal' field to ORC metadata 2023-02-11 12:37:51 +01:00
uprobes.c uprobes/x86: Allow to probe a NOP instruction with 0x66 prefix 2022-12-05 11:55:18 +01:00
verify_cpu.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
vm86_32.c x86/32: Remove lazy GS macros 2022-04-14 14:09:43 +02:00
vmlinux.lds.S objtool/idle: Validate __cpuidle code as noinstr 2023-01-13 11:48:15 +01:00
vsmp_64.c
x86_init.c x86/boot: Skip realmode init code when running as Xen PV guest 2022-11-25 12:05:22 +01:00