... to avoid unwanted gcc optimizations
SMP kernels fail to boot with commit 596ff4a09b
("cpumask: re-introduce constant-sized cpumask optimizations").
|
| percpu: BUG: failure at mm/percpu.c:2981/pcpu_build_alloc_info()!
|
The write performed by the SCOND instruction in the atomic inline asm
code is not properly communicated to the compiler. As a result, the
compiler mis-optimizes a nested loop that runs through the cpumask
in the pcpu_build_alloc_info() function.
Fix this by adding a compiler barrier (memory clobber in inline asm).
Apparently atomic ops used to have a memory clobber implicitly via
surrounding smp_mb(). However commit b64be68369
("ARC: atomics: implement relaxed variants") removed the smp_mb() for
the relaxed variants, but failed to add the explicit compiler barrier.
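For illustration only, a minimal host-side sketch of what the memory
clobber buys. The names asm_touch() and count_bits() are hypothetical and
the asm body is empty, so this is not the ARC LLOCK/SCOND sequence; it only
mirrors the pattern of passing a pointer to inline asm in a register. It
assumes GCC or Clang extended asm syntax.

```c
/*
 * Hypothetical stand-in for an inline-asm atomic: the pointer is passed
 * as a register input, so the compiler cannot infer from the operands
 * alone that the asm may write through it. The "memory" clobber is what
 * tells the compiler that memory may have changed.
 */
static inline void asm_touch(unsigned int *p)
{
	__asm__ __volatile__("" : : "r"(p) : "memory");
}

/* Loop over a mask word, loosely resembling a walk over a cpumask. */
static int count_bits(unsigned int *mask)
{
	int n = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		/* With the clobber, *mask must be reloaded after this call. */
		asm_touch(mask);
		if (*mask & (1u << bit))
			n++;
	}
	return n;
}
```

Dropping "memory" from the clobber list would permit the compiler to keep
*mask cached across the asm statement, which is the class of optimization
this fix is meant to prevent.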
Link: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/135
Cc: <stable@vger.kernel.org> # v6.3+
Fixes: b64be68369 ("ARC: atomics: implement relaxed variants")
Signed-off-by: Pavel Kozlov <pavel.kozlov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
[vgupta: tweaked the changelog and added Fixes tag]
The current ARC fetch/return atomics provide fully ordered semantics
only with 2 full barriers around the operation.
Instead, implement them as relaxed variants without any barriers and
rely on generic code to generate the fully-ordered, acquire and release
variants by adding the appropriate full barriers.
This helps elide some extra barriers in case of acquire/release/relaxed
calls.
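A rough sketch of the layering described above, written against C11
<stdatomic.h> rather than the kernel's own headers. All my_* names are
hypothetical, and the C11 fences are only stand-ins for the barriers the
generic kernel code would add around the relaxed op.

```c
#include <stdatomic.h>

/* Arch-level primitive: no ordering beyond atomicity of the update. */
static inline int my_add_return_relaxed(int i, atomic_int *v)
{
	return atomic_fetch_add_explicit(v, i, memory_order_relaxed) + i;
}

/* Acquire: relaxed op, then a fence after it. */
static inline int my_add_return_acquire(int i, atomic_int *v)
{
	int ret = my_add_return_relaxed(i, v);

	atomic_thread_fence(memory_order_acquire);
	return ret;
}

/* Release: fence first, then the relaxed op. */
static inline int my_add_return_release(int i, atomic_int *v)
{
	atomic_thread_fence(memory_order_release);
	return my_add_return_relaxed(i, v);
}

/* Fully ordered: fences on both sides of the relaxed op. */
static inline int my_add_return(int i, atomic_int *v)
{
	int ret;

	atomic_thread_fence(memory_order_seq_cst);
	ret = my_add_return_relaxed(i, v);
	atomic_thread_fence(memory_order_seq_cst);
	return ret;
}
```

Only the fully ordered variant pays for two fences; acquire and release
callers pay for one and relaxed callers for none.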
bloat-o-meter for hsdk defconfig shows codegen improvements, although
the numbers below are inflated by unrelated inlining heuristic changes:
| bloat-o-meter vmlinux-643babe34fd7-non-relaxed vmlinux-45aa05cb44d7-relaxed
| add/remove: 2/5 grow/shrink: 42/1222 up/down: 4158/-14312 (-10154)
| Function old new delta
| ..
| sys_renameat 462 476 +14
| ip_mc_inc_group 424 436 +12
| do_read_cache_page 1882 1894 +12
| ..
| refcount_dec_and_mutex_lock 254 250 -4
| refcount_dec_and_lock_irqsave 258 254 -4
| refcount_dec_and_lock 254 250 -4
| ..
| tcp_v6_route_req 246 238 -8
| tcp_v4_destroy_sock 286 278 -8
| tcp_twsk_unique 352 344 -8
Link: https://lore.kernel.org/r/20180830144344.GW24142@hirez.programming.kicks-ass.net
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@kernel.org>