There's no need to keep checking the same starting node for whether a
grace period is in progress as we advance up the funnel lock loop. Its
sufficient if we just checked it in the start, and then subsequently
checked the internal nodes as we advanced up the combining tree. This
also makes sense because the grace-period updates propogate from the
root to the leaf, so there's a chance we may find a grace period has
started as we advance up, lets check for the same.
Reported-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The funnel locking loop in rcu_start_this_gp uses rcu_root as a
temporary variable while walking the combining tree. This causes a
tiresome exercise of a code reader reminding themselves that rcu_root
may not be root. Lets just call it rnp, and rename other variables as
well to be more appropriate.
Original patch: https://patchwork.kernel.org/patch/10396577/
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix name in comment as well. ]
The name 'c' is used for variables and parameters holding the requested
grace-period sequence number. However it is no longer very meaningful
given the conversions from ->gpnum and (especially) ->completed to
->gp_seq. This commit therefore renames 'c' to 'gp_seq_req'.
Previous patch discussion is at:
https://patchwork.kernel.org/patch/10396579/
Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_data structure's ->gpwrap indicator is currently reset only
when the CPU in question detects a new grace period. This is in theory
sufficient because any CPU that has been out of action for long enough
that its ->gpwrap indicator is set is guaranteed to see both the end
of an old grace period and the start of a new one.
However, the current code leaves a short window during which the ->gpwrap
indicator has been reset but the corresponding ->gp_seq counter has not
yet been brought up to date. This is harmless because interrupts are
disabled, but it is likely to (at the very least) cause confusion.
This commit therefore moves the resetting of ->gpwrap to follow the
updating of ->gp_seq. While in the area, it also resets ->gp_seq_needed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The new ->gq_seq grace-period sequence numbers must be shifted down,
which give artifacts when these numbers wrap. This commit therefore
enables rcutorture and rcuperf to handle grace-period sequence numbers
even if they do wrap. It does this by allowing a special subtraction
function to be specified, and this function subtracts before shifting.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
In the old days of ->gpnum and ->completed, the code requesting a new
grace period checked to see if that grace period had already started,
bailing early if so. The new-age ->gp_seq approach instead checks
whether the grace period has already finished. A compensating change
pushed the requested grace period down to the bottom of the tree, thus
reducing lock contention and even eliminating it in some cases. But why
not further reduce contention, especially on large systems, by doing both,
especially given that the cost of doing both is extremely small?
This commit therefore adds a new rcu_seq_started() function that checks
whether a specified grace period has already started. It then uses
this new function in place of rcu_seq_done() in the rcu_start_this_gp()
function's funnel locking code.
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The "cpustart" trace event shows a stale gp_seq. This is because it uses
rdp->gp_seq, which is updated only at the end of the __note_gp_changes()
function. This commit therefore instead uses rnp->gp_seq.
An alternative fix would be to update rdp->gp_seq earlier, but this would
break RCU's detection of the beginning of a new-to-this-CPU grace period.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently Tree RCU's clean-up code emits a "CleanupMore" trace event in
response to late-arriving grace-period requests even if the grace period
was already requested. This makes "CleanupMore" show up an extra time (in
addition to once for each rcu_node structure that was previously marked
with the request), and for no good reason. This commit therefore avoids
emitting this trace message unless the the only request for this next
grace period arrived during or after the cleanup scan of the rcu_node
structures.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The old grace-period start code would acquire only the leaf's rcu_node
structure's ->lock if that structure believed that a grace period was
in progress. The new code advances to the leaf's parent in this case,
needlessly acquiring then leaf's parent's ->lock. This commit therefore
checks the grace-period state after marking the leaf with the need for
the specified grace period, and if the leaf believes that a grace period
is in progress, takes an early exit.
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add "Startedleaf" tracing as suggested by Joel Fernandes. ]
Now that the rcu_data structure contains ->gp_seq_needed, create an
rcu_accelerate_cbs_unlocked() helper function that locklessly checks to
see if new callbacks' required grace period has already been requested.
If so, update the callback list locally and again locklessly. (Though
interrupts must be and are disabled to avoid racing with conflicting
updates in interrupt handlers.)
Otherwise, call rcu_accelerate_cbs() as before.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that everything has been converted to use ->gp_seq instead of
->gpnum and ->completed, this commit removes ->gpnum and ->completed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the rcu_quiescent_state_report tracepoint use ->gp_seq
instead of ->gpnum.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the rcu_unlock_preempted_task tracepoint use ->gp_seq
instead of ->gpnum.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the rcu_future_grace_period tracepoint use gp_seq
instead of ->gpnum and ->completed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the rcu_grace_period tracepoint use gp_seq instead
of ->gpnum or ->completed. It also introduces a "cpuofl-bgp" string to
less obscurely indicate when a CPU has gone offline while a grace period
is waiting on it.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_nocb_wait_gp() check rdp->gp_seq_needed to see
if the current CPU already knows about the needed grace period having
already been requested. If so, it avoids acquiring the corresponding
leaf rcu_node structure's ->lock, thus decreasing contention. This
optimization is intended for cases where either multiple leader rcuo
kthreads are running on the same CPU or these kthreads are running on
a non-offloaded (e.g., housekeeping) CPU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Move lock release past "if" as suggested by Joel Fernandes. ]
[ paulmck: Fix caching of furthest-future requested grace period. ]
One problem with the ->need_future_gp[] array is that the grace-period
assignment of each element changes as the grace periods complete.
This means that it is necessary to hold a lock when checking this
array to learn if a given grace period has already been requested.
This increase lock contention, which is the opposite of helpful.
This commit therefore replaces the ->need_future_gp[] with a single
->gp_seq_needed value and keeps it updated in the rcu_data structure.
This will enable reliable lockless checking of whether or not a given
grace period has already been requested.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
SRCU has long used ->srcu_gp_seq, and now RCU uses ->gp_seq. This
commit therefore moves the rcutorture_get_gp_data() function from
a ->gpnum / ->completed pair to ->gp_seq.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the RCU CPU stall-warning code in print_other_cpu_stall(),
print_cpu_stall(), and check_cpu_stall() use ->gp_seq instead of ->gpnum
and ->completed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit converts the grace-period request code paths from ->completed
and ->gpnum to ->gp_seq. The need_future_gp_element() macro encapsulates
the shift operation required to use ->gp_seq as an index to the
->need_future_gp[] array. The rcu_cbs_completed() function is removed
in favor of the rcu_seq_snap() function. The rcu_start_this_gp()
gets some temporary consistency checks and uses rcu_seq_done(),
rcu_seq_current(), rcu_seq_state(), and rcu_gp_in_progress() in place
of the earlier open-coded comparisons of ->gpnum and ->completed.
The rcu_future_gp_cleanup() function replaces use of ->completed
with ->gp_seq. The rcu_accelerate_cbs() function replaces a call to
rcu_cbs_completed() with one to rcu_seq_snap(). The rcu_advance_cbs()
function replaces an access to >completed with one to ->gp_seq and adds
some temporary warnings. The rcu_nocb_wait_gp() function replaces a
call to rcu_cbs_completed() with one to rcu_seq_snap() and an open-coded
comparison with rcu_seq_done().
The temporary warnings will be removed when the various ->gpnum and
->completed fields are removed. Their purpose is to locate code who
might still be using ->gpnum and ->completed. (Much easier that way
than trying to trace down the causes of too-short grace periods and
grace-period hangs!)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches the quiescent-state no-backtracking checks from
->gpnum and ->completed to ->gp_seq.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches the interrupt-disabled detection mechanism to
->gp_seq. This mechanism is used as part of RCU CPU stall warnings,
and detects cases where the stall is due to a CPU having interrupts
disabled.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_gp_in_progress() use ->gp_seq instead of
->completed and ->gpnum. The READ_ONCE() invocations are buried
in rcu_seq_current().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_try_advance_all_cbs() use ->gp_seq. It uses
rcu_seq_ctr() in order to shift away the state bits, so that the
low-order bits of the result may safely be used to index ->nocb_gp_wq[].
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_try_advance_all_cbs() use ->gp_seq, with the
exception of tracing, which will be converted later.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_implicit_dynticks_qs() use ->gp_seq, with the
exception of tracing, which will be converted later.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit converts rcu_gpnum_ovf() to use ->gp_seq instead of ->gpnum.
Same size unsigned long, so same approach.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit moves __note_gp_changes(), note_gp_changes(), and
__rcu_pending() to ->gp_seq, creating new rcu_seq_completed_gp() and
rcu_seq_new_gp() functions for this purpose.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Reinstate "cpuend: trace as suggested by Joel Fernandes. ]
This commit converts get_state_synchronize_rcu(), cond_synchronize_rcu(),
get_state_synchronize_sched(), and cond_synchronize_sched() from ->gpnum
and ->completed to ->gp_seq. Note that this also introduces a full
memory barrier in the already-done paths off cond_synchronize_rcu() and
cond_synchronize_sched(), as work with LKMM indicates that the earlier
smp_load_acquire() were insufficiently strong in some situations where
these two functions were called just as the grace period ended. In such
cases, these two functions would not gain the benefit of memory ordering
at the end of the grace period.
Please note that the performance impact is negligible, as you shouldn't
be using either function anywhere near a fastpath in any case.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches the functions reporting quiescent states from
use of ->gpnum to ->gp_seq. In either case, the point is to handle
races where a given grace period ends before a quiescent state can
be reported. Failing to catch these races would result in too-short
grace periods, hence the checking.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches rcu_check_gp_kthread_starvation() from printing
->gpnum and ->completed to printing ->gp_seq upon detecting a starving
RCU grace-period kthread during an RCU CPU stall warning.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcutorture test invokes rcu_batches_started(),
rcu_batches_completed(), rcu_batches_started_bh(),
rcu_batches_completed_bh(), rcu_batches_started_sched(), and
rcu_batches_completed_sched() to do grace-period consistency checks,
and rcuperf uses the _completed variants for statistics.
These functions use ->gpnum and ->completed. This commit therefore
replaces them with rcu_get_gp_seq(), rcu_bh_get_gp_seq(), and
rcu_sched_get_gp_seq(), adjusting rcutorture and rcuperf to make
use of them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit moves rcu_gp_slow() to ->gp_seq. This function only uses
the grace-period number to modulate delay, so rcu_seq_ctr(rsp->gp_seq)
gets the same effect, at least in cases where the delay is to happen
more than four times per wrap of an unsigned long.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds grace-period sequence numbers (->gp_seq) to the
rcu_state, rcu_node, and rcu_data structures, and updates them.
It also checks for consistency between rsp->gpnum and rsp->gp_seq.
These ->gp_seq counters will eventually replace the existing ->gpnum
and ->completed counters, allowing a single memory access to determine
whether or not a grace period is in progress and if so, which one.
This in turn will enable changes that will reduce ->lock contention on
the leaf rcu_node structures.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
At the end of rcu_gp_cleanup(), if another grace period is needed, but
not via rcu_accelerate_cbs(), the ->gp_flags field is written twice,
once when making the new grace-period request, and once when clearing
all other types of requests. This commit therefore adds an else-clause
to avoid this double write.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit causes a splat if RCU is idle and a request for a new grace
period is ignored for more than one second. This splat normally indicates
that some code path asked for a new grace period, but failed to wake up
the RCU grace-period kthread.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix bug located by Dan Carpenter and his static checker. ]
[ paulmck: Fix self-deadlock bug located 0day test robot. ]
[ paulmck: Disable unless CONFIG_PROVE_RCU=y. ]
syzkaller managed to trigger the following bug through fault injection:
[...]
[ 141.043668] verifier bug. No program starts at insn 3
[ 141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
[ 141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
fixup_call_args kernel/bpf/verifier.c:5587 [inline]
[ 141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
[ 141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
[ 141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 1.10.2-1 04/01/2014
[ 141.049877] Call Trace:
[ 141.050324] __dump_stack lib/dump_stack.c:77 [inline]
[ 141.050324] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
[ 141.050950] ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
[ 141.051837] panic+0x238/0x4e7 kernel/panic.c:184
[ 141.052386] ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
[ 141.053101] ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
[ 141.053814] ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
[ 141.054506] ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
[ 141.054506] ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
[ 141.054506] ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
[ 141.055163] __warn.cold.8+0x163/0x1ba kernel/panic.c:538
[ 141.055820] ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
[ 141.055820] ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
[ 141.055820] ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
[...]
What happens in jit_subprogs() is that kcalloc() for the subprog func
buffer is failing with NULL where we then bail out. Latter is a plain
return -ENOMEM, and this is definitely not okay since earlier in the
loop we are walking all subprogs and temporarily rewrite insn->off to
remember the subprog id as well as insn->imm to temporarily point the
call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing
out in such state and handing this over to the interpreter is troublesome
since later/subsequent e.g. find_subprog() lookups are based on wrong
insn->imm.
Therefore, once we hit this point, we need to jump to out_free path
where we undo all changes from earlier loop, so that interpreter can
work on unmodified insn->{off,imm}.
Another point is that should find_subprog() fail in jit_subprogs() due
to a verifier bug, then we also should not simply defer the program to
the interpreter since also here we did partial modifications. Instead
we should just bail out entirely and return an error to the user who is
trying to load the program.
Fixes: 1c2a088a66 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+7d427828b2ea6e592804@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull timekeeping updates from John Stultz:
- Make the timekeeping update more precise when NTP frequency is set
directly by updating the multiplier.
- Adjust selftests
Currently, the parallelized initialization of expedited grace periods uses
the workqueue associated with each rcu_node structure's ->grplo field.
This works fine unless that CPU is offline. This commit therefore uses
the CPU corresponding to the lowest-numbered online CPU, or just queues
the work on WORK_CPU_UNBOUND if there are no online CPUs corresponding
to this rcu_node structure.
Note that this patch uses cpu_is_offline() instead of the usual approach
of checking bits in the rcu_node structure's ->qsmaskinitnext field. This
is safe because preemption is disabled across both the cpu_is_offline()
check and the call to queue_work_on().
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
[ paulmck: Disable preemption to close offline race window. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply Peter Zijlstra feedback on CPU selection. ]
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
- Join split message for easier grepping,
- Use pr_*() instead of printk*(),
- Use %u to format unsigned cpu numbers.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180712144118.8819-1-geert+renesas@glider.be
When extracting bitfield from a number, btf_int_bits_seq_show() builds
a mask and accesses least significant byte of the number in a way
specific to little-endian. This patch fixes that by checking endianness
of the machine and then shifting left and right the unneeded bits.
Thanks to Martin Lau for the help in navigating potential pitfalls when
dealing with endianess and for the final solution.
Fixes: b00b8daec8 ("bpf: btf: Add pretty print capability for data with BTF type info")
Signed-off-by: Okash Khawaja <osk@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW0ZhphQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qlxwAP4jKOp86ljtdqHSlh9mOHChnvwdMprQ
kaQ82WiamGikIwD/Vo/6ynsVl5eYMDO2VQSv6W1DHmhV9I/KgdqcQBl8jAk=
=RF9o
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.18-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull kprobe fix from Steven Rostedt:
"This fixes a memory leak in the kprobe code"
* tag 'trace-v4.18-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/kprobe: Release kprobe print_fmt properly
We don't release tk->tp.call.print_fmt when destroying
local uprobe. Also there's missing print_fmt kfree in
create_local_trace_kprobe error path.
Link: http://lkml.kernel.org/r/20180709141906.2390-1-jolsa@kernel.org
Cc: stable@vger.kernel.org
Fixes: e12f03d703 ("perf/core: Implement the 'perf_kprobe' PMU")
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
It is unwise to take spin locks from the handlers of trace events.
Mainly, because they can introduce lockups, because it introduces locks
in places that are normally not tested. Worse yet, because trace events
are tucked away in the include/trace/events/ directory, locks that are
taken there are forgotten about.
As a general rule, I tell people never to take any locks in a trace
event handler.
Several cgroup trace event handlers call cgroup_path() which eventually
takes the kernfs_rename_lock spinlock. This injects the spinlock in the
code without people realizing it. It also can cause issues for the
PREEMPT_RT patch, as the spinlock becomes a mutex, and the trace event
handlers are called with preemption disabled.
By moving the calculation of the cgroup_path() out of the trace event
handlers and into a macro (surrounded by a
trace_cgroup_##type##_enabled()), then we could place the cgroup_path
into a string, and pass that to the trace event. Not only does this
remove the taking of the spinlock out of the trace event handler, but
it also means that the cgroup_path() only needs to be called once (it
is currently called twice, once to get the length to reserver the
buffer for, and once again to get the path itself. Now it only needs to
be done once.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
suppress_message_printing() is not longer called in console_unlock().
Therefore it is not longer needed with disabled CONFIG_PRINTK.
This fixes the warning:
kernel/printk/printk.c:2033:13: warning: ‘suppress_message_printing’ defined but not used [-Wunused-function]
static bool suppress_message_printing(int level) { return false; }
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Maninder Singh <maninder1.s@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Declaring the rseq_cs field as a union between __u64 and two __u32
allows both 32-bit and 64-bit kernels to read the full __u64, and
therefore validate that a 32-bit user-space cleared the upper 32
bits, thus ensuring a consistent behavior between native 32-bit
kernels and 32-bit compat tasks on 64-bit kernels.
Check that the rseq_cs value read is < TASK_SIZE.
The asm/byteorder.h header needs to be included by rseq.h, now
that it is not using linux/types_32_64.h anymore.
Considering that only __32 and __u64 types are declared in linux/rseq.h,
the linux/types.h header should always be included for both kernel and
user-space code: including stdint.h is just for u64 and u32, which are
not used in this header at all.
Use copy_from_user()/clear_user() to interact with a 64-bit field,
because arm32 does not implement 64-bit __get_user, and ppc32 does not
64-bit get_user. Considering that the rseq_cs pointer does not need to
be loaded/stored with single-copy atomicity from the kernel anymore, we
can simply use copy_from_user()/clear_user().
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Paul Turner <pjt@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chris Lameter <cl@linux.com>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180709195155.7654-5-mathieu.desnoyers@efficios.com
Update rseq uapi header comments to reflect that user-space need to do
thread-local loads/stores from/to the struct rseq fields.
As a consequence of this added requirement, the kernel does not need
to perform loads/stores with single-copy atomicity.
Update the comment associated to the "flags" fields to describe
more accurately that it's only useful to facilitate single-stepping
through rseq critical sections with debuggers.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Paul Turner <pjt@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chris Lameter <cl@linux.com>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180709195155.7654-4-mathieu.desnoyers@efficios.com