1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

28896 commits

Author SHA1 Message Date
Joel Fernandes (Google)
226ca5e766 rcu: Identify grace period is in progress as we advance up the tree
There's no need to keep checking the same starting node for whether a
grace period is in progress as we advance up the funnel lock loop. Its
sufficient if we just checked it in the start, and then subsequently
checked the internal nodes as we advanced up the combining tree. This
also makes sense because the grace-period updates propogate from the
root to the leaf, so there's a chance we may find a grace period has
started as we advance up, lets check for the same.

Reported-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:57 -07:00
Joel Fernandes (Google)
df2bf8f7f7 rcu: Use better variable names in funnel locking loop
The funnel locking loop in rcu_start_this_gp uses rcu_root as a
temporary variable while walking the combining tree. This causes a
tiresome exercise of a code reader reminding themselves that rcu_root
may not be root. Lets just call it rnp, and rename other variables as
well to be more appropriate.

Original patch: https://patchwork.kernel.org/patch/10396577/

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix name in comment as well. ]
2018-07-12 15:38:57 -07:00
Joel Fernandes
b73de91d6a rcu: Rename the grace-period-request variables and parameters
The name 'c' is used for variables and parameters holding the requested
grace-period sequence number.  However it is no longer very meaningful
given the conversions from ->gpnum and (especially) ->completed to
->gp_seq. This commit therefore renames 'c' to 'gp_seq_req'.

Previous patch discussion is at:
https://patchwork.kernel.org/patch/10396579/

Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:56 -07:00
Paul E. McKenney
3d18469a2b rcu: Regularize resetting of rcu_data wrap indicator
The rcu_data structure's ->gpwrap indicator is currently reset only
when the CPU in question detects a new grace period.  This is in theory
sufficient because any CPU that has been out of action for long enough
that its ->gpwrap indicator is set is guaranteed to see both the end
of an old grace period and the start of a new one.

However, the current code leaves a short window during which the ->gpwrap
indicator has been reset but the corresponding ->gp_seq counter has not
yet been brought up to date.  This is harmless because interrupts are
disabled, but it is likely to (at the very least) cause confusion.

This commit therefore moves the resetting of ->gpwrap to follow the
updating of ->gp_seq.  While in the area, it also resets ->gp_seq_needed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:56 -07:00
Paul E. McKenney
d72193123c rcutorture: Correctly handle grace-period sequence wrap
The new ->gq_seq grace-period sequence numbers must be shifted down,
which give artifacts when these numbers wrap.  This commit therefore
enables rcutorture and rcuperf to handle grace-period sequence numbers
even if they do wrap.  It does this by allowing a special subtraction
function to be specified, and this function subtracts before shifting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:55 -07:00
Paul E. McKenney
2e3e5e5501 rcu: Make rcu_start_this_gp() check for grace period already started
In the old days of ->gpnum and ->completed, the code requesting a new
grace period checked to see if that grace period had already started,
bailing early if so.  The new-age ->gp_seq approach instead checks
whether the grace period has already finished.  A compensating change
pushed the requested grace period down to the bottom of the tree, thus
reducing lock contention and even eliminating it in some cases.  But why
not further reduce contention, especially on large systems, by doing both,
especially given that the cost of doing both is extremely small?

This commit therefore adds a new rcu_seq_started() function that checks
whether a specified grace period has already started.  It then uses
this new function in place of rcu_seq_done() in the rcu_start_this_gp()
function's funnel locking code.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:54 -07:00
Joel Fernandes (Google)
5ca0905f67 rcu: Fix cpustart tracepoint gp_seq number
The "cpustart" trace event shows a stale gp_seq. This is because it uses
rdp->gp_seq, which is updated only at the end of the __note_gp_changes()
function. This commit therefore instead uses rnp->gp_seq.

An alternative fix would be to update rdp->gp_seq earlier, but this would
break RCU's detection of the beginning of a new-to-this-CPU grace period.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:53 -07:00
Joel Fernandes (Google)
5b55072f22 rcu: Produce last "CleanupMore" trace only if late-breaking request
Currently Tree RCU's clean-up code emits a "CleanupMore" trace event in
response to late-arriving grace-period requests even if the grace period
was already requested. This makes "CleanupMore" show up an extra time (in
addition to once for each rcu_node structure that was previously marked
with the request), and for no good reason.  This commit therefore avoids
emitting this trace message unless the the only request for this next
grace period arrived during or after the cleanup scan of the rcu_node
structures.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:53 -07:00
Paul E. McKenney
a2165e4168 rcu: Don't funnel-lock above leaf node if GP in progress
The old grace-period start code would acquire only the leaf's rcu_node
structure's ->lock if that structure believed that a grace period was
in progress.  The new code advances to the leaf's parent in this case,
needlessly acquiring then leaf's parent's ->lock.  This commit therefore
checks the grace-period state after marking the leaf with the need for
the specified grace period, and if the leaf believes that a grace period
is in progress, takes an early exit.

Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add "Startedleaf" tracing as suggested by Joel Fernandes. ]
2018-07-12 15:38:52 -07:00
Paul E. McKenney
e44e73ca47 rcu: Make simple callback acceleration refer to rdp->gp_seq_needed
Now that the rcu_data structure contains ->gp_seq_needed, create an
rcu_accelerate_cbs_unlocked() helper function that locklessly checks to
see if new callbacks' required grace period has already been requested.
If so, update the callback list locally and again locklessly.  (Though
interrupts must be and are disabled to avoid racing with conflicting
updates in interrupt handlers.)

Otherwise, call rcu_accelerate_cbs() as before.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:49 -07:00
Paul E. McKenney
ff3bb6f4d0 rcu: Remove ->gpnum and ->completed
Now that everything has been converted to use ->gp_seq instead of
->gpnum and ->completed, this commit removes ->gpnum and ->completed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:48 -07:00
Paul E. McKenney
fee5997c17 rcu: Convert rcu_fqs tracepoint to ->gp_seq
This commit makes the rcu_fqs tracepoint use ->gp_seq instead of ->gpnum.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:47 -07:00
Paul E. McKenney
db023296f0 rcu: Convert rcu_quiescent_state_report tracepoint to ->gp_seq
This commit makes the rcu_quiescent_state_report tracepoint use ->gp_seq
instead of ->gpnum.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:47 -07:00
Paul E. McKenney
865aa1e08d rcu: Convert rcu_unlock_preempted_task tracepoint to ->gp_seq
This commit makes the rcu_unlock_preempted_task tracepoint use ->gp_seq
instead of ->gpnum.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:46 -07:00
Paul E. McKenney
598ce09480 rcu: Convert rcu_preempt_task tracepoint to ->gp_seq
This commit makes the rcu_preempt_task tracepoint use ->gp_seq instead
of ->gpnum.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:45 -07:00
Paul E. McKenney
abd13fdd95 rcu: Convert rcu_future_grace_period tracepoint to gp_seq
This commit makes the rcu_future_grace_period tracepoint use gp_seq
instead of ->gpnum and ->completed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:44 -07:00
Paul E. McKenney
477351f782 rcu: Convert rcu_grace_period tracepoint to gp_seq
This commit makes the rcu_grace_period tracepoint use gp_seq instead
of ->gpnum or ->completed.  It also introduces a "cpuofl-bgp" string to
less obscurely indicate when a CPU has gone offline while a grace period
is waiting on it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:38:43 -07:00
Paul E. McKenney
ab5e869c1f rcu: Make rcu_nocb_wait_gp() check if GP already requested
This commit makes rcu_nocb_wait_gp() check rdp->gp_seq_needed to see
if the current CPU already knows about the needed grace period having
already been requested.  If so, it avoids acquiring the corresponding
leaf rcu_node structure's ->lock, thus decreasing contention.  This
optimization is intended for cases where either multiple leader rcuo
kthreads are running on the same CPU or these kthreads are running on
a non-offloaded (e.g., housekeeping) CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Move lock release past "if" as suggested by Joel Fernandes. ]
[ paulmck: Fix caching of furthest-future requested grace period. ]
2018-07-12 15:38:42 -07:00
Paul E. McKenney
7a1d0f23ad rcu: Move from ->need_future_gp[] to ->gp_seq_needed
One problem with the ->need_future_gp[] array is that the grace-period
assignment of each element changes as the grace periods complete.
This means that it is necessary to hold a lock when checking this
array to learn if a given grace period has already been requested.
This increase lock contention, which is the opposite of helpful.
This commit therefore replaces the ->need_future_gp[] with a single
->gp_seq_needed value and keeps it updated in the rcu_data structure.

This will enable reliable lockless checking of whether or not a given
grace period has already been requested.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 15:37:48 -07:00
Paul E. McKenney
aebc82644b rcutorture: Convert rcutorture_get_gp_data() to ->gp_seq
SRCU has long used ->srcu_gp_seq, and now RCU uses ->gp_seq.  This
commit therefore moves the rcutorture_get_gp_data() function from
a ->gpnum / ->completed pair to ->gp_seq.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:57 -07:00
Paul E. McKenney
471f87c3d9 rcu: Make RCU CPU stall warnings use ->gp_seq
This commit makes the RCU CPU stall-warning code in print_other_cpu_stall(),
print_cpu_stall(), and check_cpu_stall() use ->gp_seq instead of ->gpnum
and ->completed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:56 -07:00
Paul E. McKenney
29365e563b rcu: Convert grace-period requests to ->gp_seq
This commit converts the grace-period request code paths from ->completed
and ->gpnum to ->gp_seq.  The need_future_gp_element() macro encapsulates
the shift operation required to use ->gp_seq as an index to the
->need_future_gp[] array.  The rcu_cbs_completed() function is removed
in favor of the rcu_seq_snap() function.  The rcu_start_this_gp()
gets some temporary consistency checks and uses rcu_seq_done(),
rcu_seq_current(), rcu_seq_state(), and rcu_gp_in_progress() in place
of the earlier open-coded comparisons of ->gpnum and ->completed.
The rcu_future_gp_cleanup() function replaces use of ->completed
with ->gp_seq.  The rcu_accelerate_cbs() function replaces a call to
rcu_cbs_completed() with one to rcu_seq_snap().  The rcu_advance_cbs()
function replaces an access to >completed with one to ->gp_seq and adds
some temporary warnings.  The rcu_nocb_wait_gp() function replaces a
call to rcu_cbs_completed() with one to rcu_seq_snap() and an open-coded
comparison with rcu_seq_done().

The temporary warnings will be removed when the various ->gpnum and
->completed fields are removed.  Their purpose is to locate code who
might still be using ->gpnum and ->completed.  (Much easier that way
than trying to trace down the causes of too-short grace periods and
grace-period hangs!)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:55 -07:00
Paul E. McKenney
d43a5d32e1 rcu: Convert ->completedqs to ->gp_seq
This commit switches the quiescent-state no-backtracking checks from
->gpnum and ->completed to ->gp_seq.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:54 -07:00
Paul E. McKenney
8aa670cdac rcu: Convert ->rcu_iw_gpnum to ->gp_seq
This commit switches the interrupt-disabled detection mechanism to
->gp_seq.  This mechanism is used as part of RCU CPU stall warnings,
and detects cases where the stall is due to a CPU having interrupts
disabled.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:53 -07:00
Paul E. McKenney
ba04107fc9 rcu: Move rcu_gp_in_progress() to ->gp_seq
This commit makes rcu_gp_in_progress() use ->gp_seq instead of
->completed and ->gpnum.  The READ_ONCE() invocations are buried
in rcu_seq_current().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:52 -07:00
Paul E. McKenney
e0da2374c3 rcu: Move rcu_nocb_gp_get() to ->gp_seq
This commit makes rcu_try_advance_all_cbs() use ->gp_seq.  It uses
rcu_seq_ctr() in order to shift away the state bits, so that the
low-order bits of the result may safely be used to index ->nocb_gp_wq[].

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:52 -07:00
Paul E. McKenney
03c8cb765a rcu: Move rcu_try_advance_all_cbs() to ->gp_seq
This commit makes rcu_try_advance_all_cbs() use ->gp_seq, with the
exception of tracing, which will be converted later.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:51 -07:00
Paul E. McKenney
e05720b097 rcu: Move rcu_implicit_dynticks_qs() to ->gp_seq
This commit makes rcu_implicit_dynticks_qs() use ->gp_seq, with the
exception of tracing, which will be converted later.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:51 -07:00
Paul E. McKenney
a66ae8ae35 rcu: Convert rcu_gpnum_ovf() to ->gp_seq
This commit converts rcu_gpnum_ovf() to use ->gp_seq instead of ->gpnum.
Same size unsigned long, so same approach.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:50 -07:00
Paul E. McKenney
67e14c1e39 rcu: Move RCU's grace-period-change code to ->gp_seq
This commit moves __note_gp_changes(), note_gp_changes(), and
__rcu_pending() to ->gp_seq, creating new rcu_seq_completed_gp() and
rcu_seq_new_gp() functions for this purpose.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Reinstate "cpuend: trace as suggested by Joel Fernandes. ]
2018-07-12 14:27:50 -07:00
Paul E. McKenney
e4be81a2ed rcu: Convert conditional grace-period primitives to ->gp_seq
This commit converts get_state_synchronize_rcu(), cond_synchronize_rcu(),
get_state_synchronize_sched(), and cond_synchronize_sched() from ->gpnum
and ->completed to ->gp_seq.  Note that this also introduces a full
memory barrier in the already-done paths off cond_synchronize_rcu() and
cond_synchronize_sched(), as work with LKMM indicates that the earlier
smp_load_acquire() were insufficiently strong in some situations where
these two functions were called just as the grace period ended.  In such
cases, these two functions would not gain the benefit of memory ordering
at the end of the grace period.

Please note that the performance impact is negligible, as you shouldn't
be using either function anywhere near a fastpath in any case.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:49 -07:00
Paul E. McKenney
c9a24e2d0c rcu: Make quiescent-state reporting use ->gp_seq
This commit switches the functions reporting quiescent states from
use of ->gpnum to ->gp_seq.  In either case, the point is to handle
races where a given grace period ends before a quiescent state can
be reported.  Failing to catch these races would result in too-short
grace periods, hence the checking.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:48 -07:00
Paul E. McKenney
78c5a67f17 rcu: Convert rcu_check_gp_kthread_starvation() to GP sequence number
This commit switches rcu_check_gp_kthread_starvation() from printing
->gpnum and ->completed to printing ->gp_seq upon detecting a starving
RCU grace-period kthread during an RCU CPU stall warning.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:48 -07:00
Paul E. McKenney
17ef2fe97c rcu: Make rcutorture's batches-completed API use ->gp_seq
The rcutorture test invokes rcu_batches_started(),
rcu_batches_completed(), rcu_batches_started_bh(),
rcu_batches_completed_bh(), rcu_batches_started_sched(), and
rcu_batches_completed_sched() to do grace-period consistency checks,
and rcuperf uses the _completed variants for statistics.
These functions use ->gpnum and ->completed.  This commit therefore
replaces them with rcu_get_gp_seq(), rcu_bh_get_gp_seq(), and
rcu_sched_get_gp_seq(), adjusting rcutorture and rcuperf to make
use of them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:47 -07:00
Paul E. McKenney
dee4f42298 rcu: Move rcu_gp_slow() to ->gp_seq
This commit moves rcu_gp_slow() to ->gp_seq.  This function only uses
the grace-period number to modulate delay, so rcu_seq_ctr(rsp->gp_seq)
gets the same effect, at least in cases where the delay is to happen
more than four times per wrap of an unsigned long.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:46 -07:00
Paul E. McKenney
de30ad512a rcu: Introduce grace-period sequence numbers
This commit adds grace-period sequence numbers (->gp_seq) to the
rcu_state, rcu_node, and rcu_data structures, and updates them.
It also checks for consistency between rsp->gpnum and rsp->gp_seq.
These ->gp_seq counters will eventually replace the existing ->gpnum
and ->completed counters, allowing a single memory access to determine
whether or not a grace period is in progress and if so, which one.
This in turn will enable changes that will reduce ->lock contention on
the leaf rcu_node structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:27:46 -07:00
Paul E. McKenney
609af1cdf0 Merge branches 'expedited.2018.07.12a', 'fixes.2018.07.12a', 'srcu.2018.06.25b' and 'torture.2018.06.25b' into HEAD
expedited.2018.07.12a: Expedited grace-period updates.
fixes.2018.07.12a: Pre-gp_seq miscellaneous fixes.
srcu.2018.06.25b: SRCU updates.
torture.2018.06.25b: Pre-gp_seq torture-test updates.
2018-07-12 14:26:14 -07:00
Paul E. McKenney
18390aeae7 rcu: Make rcu_gp_cleanup() write only once to ->gp_flags
At the end of rcu_gp_cleanup(), if another grace period is needed, but
not via rcu_accelerate_cbs(), the ->gp_flags field is written twice,
once when making the new grace-period request, and once when clearing
all other types of requests.  This commit therefore adds an else-clause
to avoid this double write.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:25:17 -07:00
Paul E. McKenney
26d950a945 rcu: Diagnostics for grace-period startup hangs
This commit causes a splat if RCU is idle and a request for a new grace
period is ignored for more than one second.  This splat normally indicates
that some code path asked for a new grace period, but failed to wake up
the RCU grace-period kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix bug located by Dan Carpenter and his static checker. ]
[ paulmck: Fix self-deadlock bug located 0day test robot. ]
[ paulmck: Disable unless CONFIG_PROVE_RCU=y. ]
2018-07-12 14:24:42 -07:00
Daniel Borkmann
c7a8978432 bpf: don't leave partial mangled prog in jit_subprogs error path
syzkaller managed to trigger the following bug through fault injection:

  [...]
  [  141.043668] verifier bug. No program starts at insn 3
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                 get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                 fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                 bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [  141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
  [  141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 1.10.2-1 04/01/2014
  [  141.049877] Call Trace:
  [  141.050324]  __dump_stack lib/dump_stack.c:77 [inline]
  [  141.050324]  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  [  141.050950]  ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
  [  141.051837]  panic+0x238/0x4e7 kernel/panic.c:184
  [  141.052386]  ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
  [  141.053101]  ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
  [  141.053814]  ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
  [  141.054506]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.054506]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.054506]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [  141.055163]  __warn.cold.8+0x163/0x1ba kernel/panic.c:538
  [  141.055820]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
  [  141.055820]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
  [  141.055820]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
  [...]

What happens in jit_subprogs() is that kcalloc() for the subprog func
buffer is failing with NULL where we then bail out. Latter is a plain
return -ENOMEM, and this is definitely not okay since earlier in the
loop we are walking all subprogs and temporarily rewrite insn->off to
remember the subprog id as well as insn->imm to temporarily point the
call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing
out in such state and handing this over to the interpreter is troublesome
since later/subsequent e.g. find_subprog() lookups are based on wrong
insn->imm.

Therefore, once we hit this point, we need to jump to out_free path
where we undo all changes from earlier loop, so that interpreter can
work on unmodified insn->{off,imm}.

Another point is that should find_subprog() fail in jit_subprogs() due
to a verifier bug, then we also should not simply defer the program to
the interpreter since also here we did partial modifications. Instead
we should just bail out entirely and return an error to the user who is
trying to load the program.

Fixes: 1c2a088a66 ("bpf: x64: add JIT support for multi-function programs")
Reported-by: syzbot+7d427828b2ea6e592804@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-07-12 14:00:54 -07:00
Thomas Gleixner
c6bb11147e Merge branch 'fortglx/4.19/time' of https://git.linaro.org/people/john.stultz/linux into timers/core
Pull timekeeping updates from John Stultz:

  - Make the timekeeping update more precise when NTP frequency is set
    directly by updating the multiplier.

  - Adjust selftests
2018-07-12 22:19:58 +02:00
Boqun Feng
fcc6354365 rcu: Make expedited GPs handle CPU 0 being offline
Currently, the parallelized initialization of expedited grace periods uses
the workqueue associated with each rcu_node structure's ->grplo field.
This works fine unless that CPU is offline.  This commit therefore uses
the CPU corresponding to the lowest-numbered online CPU, or just queues
the work on WORK_CPU_UNBOUND if there are no online CPUs corresponding
to this rcu_node structure.

Note that this patch uses cpu_is_offline() instead of the usual approach
of checking bits in the rcu_node structure's ->qsmaskinitnext field.  This
is safe because preemption is disabled across both the cpu_is_offline()
check and the call to queue_work_on().

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
[ paulmck: Disable preemption to close offline race window. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply Peter Zijlstra feedback on CPU selection. ]
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2018-07-12 12:36:06 -07:00
Geert Uytterhoeven
7a6e55375d hrtimer: Improve kernel message printing
- Join split message for easier grepping,
  - Use pr_*() instead of printk*(),
  - Use %u to format unsigned cpu numbers.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180712144118.8819-1-geert+renesas@glider.be
2018-07-12 21:29:30 +02:00
Okash Khawaja
b65f370d06 bpf: btf: Fix bitfield extraction for big endian
When extracting bitfield from a number, btf_int_bits_seq_show() builds
a mask and accesses least significant byte of the number in a way
specific to little-endian. This patch fixes that by checking endianness
of the machine and then shifting left and right the unneeded bits.

Thanks to Martin Lau for the help in navigating potential pitfalls when
dealing with endianess and for the final solution.

Fixes: b00b8daec8 ("bpf: btf: Add pretty print capability for data with BTF type info")
Signed-off-by: Okash Khawaja <osk@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-11 22:36:08 +02:00
Linus Torvalds
c25c74b747 This fixes a memory leak in the kprobe code.
-----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW0ZhphQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlxwAP4jKOp86ljtdqHSlh9mOHChnvwdMprQ
 kaQ82WiamGikIwD/Vo/6ynsVl5eYMDO2VQSv6W1DHmhV9I/KgdqcQBl8jAk=
 =RF9o
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.18-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull kprobe fix from Steven Rostedt:
 "This fixes a memory leak in the kprobe code"

* tag 'trace-v4.18-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/kprobe: Release kprobe print_fmt properly
2018-07-11 13:03:51 -07:00
Jiri Olsa
0fc8c3581d tracing/kprobe: Release kprobe print_fmt properly
We don't release tk->tp.call.print_fmt when destroying
local uprobe. Also there's missing print_fmt kfree in
create_local_trace_kprobe error path.

Link: http://lkml.kernel.org/r/20180709141906.2390-1-jolsa@kernel.org

Cc: stable@vger.kernel.org
Fixes: e12f03d703 ("perf/core: Implement the 'perf_kprobe' PMU")
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-11 15:50:52 -04:00
Steven Rostedt (VMware)
e4f8d81c73 cgroup/tracing: Move taking of spin lock out of trace event handlers
It is unwise to take spin locks from the handlers of trace events.
Mainly, because they can introduce lockups, because it introduces locks
in places that are normally not tested. Worse yet, because trace events
are tucked away in the include/trace/events/ directory, locks that are
taken there are forgotten about.

As a general rule, I tell people never to take any locks in a trace
event handler.

Several cgroup trace event handlers call cgroup_path() which eventually
takes the kernfs_rename_lock spinlock. This injects the spinlock in the
code without people realizing it. It also can cause issues for the
PREEMPT_RT patch, as the spinlock becomes a mutex, and the trace event
handlers are called with preemption disabled.

By moving the calculation of the cgroup_path() out of the trace event
handlers and into a macro (surrounded by a
trace_cgroup_##type##_enabled()), then we could place the cgroup_path
into a string, and pass that to the trace event. Not only does this
remove the taking of the spinlock out of the trace event handler, but
it also means that the cgroup_path() only needs to be called once (it
is currently called twice, once to get the length to reserver the
buffer for, and once again to get the path itself. Now it only needs to
be done once.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-07-11 10:48:47 -07:00
Petr Mladek
ffaa619af1 printk: Fix warning about unused suppress_message_printing
suppress_message_printing() is not longer called in console_unlock().
Therefore it is not longer needed with disabled CONFIG_PRINTK.

This fixes the warning:

kernel/printk/printk.c:2033:13: warning: ‘suppress_message_printing’ defined but not used [-Wunused-function]
 static bool suppress_message_printing(int level) { return false; }

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Maninder Singh <maninder1.s@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-07-11 10:38:23 +02:00
Mathieu Desnoyers
ec9c82e03a rseq: uapi: Declare rseq_cs field as union, update includes
Declaring the rseq_cs field as a union between __u64 and two __u32
allows both 32-bit and 64-bit kernels to read the full __u64, and
therefore validate that a 32-bit user-space cleared the upper 32
bits, thus ensuring a consistent behavior between native 32-bit
kernels and 32-bit compat tasks on 64-bit kernels.

Check that the rseq_cs value read is < TASK_SIZE.

The asm/byteorder.h header needs to be included by rseq.h, now
that it is not using linux/types_32_64.h anymore.

Considering that only __32 and __u64 types are declared in linux/rseq.h,
the linux/types.h header should always be included for both kernel and
user-space code: including stdint.h is just for u64 and u32, which are
not used in this header at all.

Use copy_from_user()/clear_user() to interact with a 64-bit field,
because arm32 does not implement 64-bit __get_user, and ppc32 does not
64-bit get_user. Considering that the rseq_cs pointer does not need to
be loaded/stored with single-copy atomicity from the kernel anymore, we
can simply use copy_from_user()/clear_user().

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Paul Turner <pjt@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chris Lameter <cl@linux.com>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180709195155.7654-5-mathieu.desnoyers@efficios.com
2018-07-10 22:18:52 +02:00
Mathieu Desnoyers
0fb9a1abc8 rseq: uapi: Update uapi comments
Update rseq uapi header comments to reflect that user-space need to do
thread-local loads/stores from/to the struct rseq fields.

As a consequence of this added requirement, the kernel does not need
to perform loads/stores with single-copy atomicity.

Update the comment associated to the "flags" fields to describe
more accurately that it's only useful to facilitate single-stepping
through rseq critical sections with debuggers.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Paul Turner <pjt@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chris Lameter <cl@linux.com>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180709195155.7654-4-mathieu.desnoyers@efficios.com
2018-07-10 22:18:52 +02:00