1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

28415 commits

Author SHA1 Message Date
Peter Zijlstra
f83ee19be4 kthread: Simplify kthread_park() completion
Oleg explains the reason we could hit park+park is that
smpboot_update_cpumask_percpu_thread()'s

  for_each_cpu_and(cpu, &tmp, cpu_online_mask)
	smpboot_park_kthread();

turns into:

  for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)and)
	smpboot_park_kthread();

on UP, ignoring the mask. But since we just completely removed that
function, this is no longer relevant.

So revert commit:

  b1f5b378e1 ("kthread: Allow kthread_park() on a parked kthread")

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:20:44 +02:00
Peter Zijlstra
167a88677b smpboot: Remove cpumask from the API
Now that the sole use of the whole smpboot_*cpumask() API is gone,
remove it.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:20:44 +02:00
Peter Zijlstra
9cf57731b6 watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work
Oleg suggested to replace the "watchdog/%u" threads with
cpu_stop_work. That removes one thread per CPU while at the same time
fixes softlockup vs SCHED_DEADLINE.

But more importantly, it does away with the single
smpboot_update_cpumask_percpu_thread() user, which allows
cleanups/shrinkage of the smpboot interface.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:20:43 +02:00
Ingo Molnar
4520843dfa Merge branch 'sched/urgent' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:20:22 +02:00
Peter Zijlstra
1cef1150ef kthread, sched/core: Fix kthread_parkme() (again...)
Gaurav reports that commit:

  85f1abe001 ("kthread, sched/wait: Fix kthread_parkme() completion issue")

isn't working for him. Because of the following race:

> controller Thread                               CPUHP Thread
> takedown_cpu
> kthread_park
> kthread_parkme
> Set KTHREAD_SHOULD_PARK
>                                                 smpboot_thread_fn
>                                                 set Task interruptible
>
>
> wake_up_process
>  if (!(p->state & state))
>                 goto out;
>
>                                                 Kthread_parkme
>                                                 SET TASK_PARKED
>                                                 schedule
>                                                 raw_spin_lock(&rq->lock)
> ttwu_remote
> waiting for __task_rq_lock
>                                                 context_switch
>
>                                                 finish_lock_switch
>
>
>
>                                                 Case TASK_PARKED
>                                                 kthread_park_complete
>
>
> SET Running

Furthermore, Oleg noticed that the whole scheduler TASK_PARKED
handling is buggered because the TASK_DEAD thing is done with
preemption disabled, the current code can still complete early on
preemption :/

So basically revert that earlier fix and go with a variant of the
alternative mentioned in the commit. Promote TASK_PARKED to special
state to avoid the store-store issue on task->state leading to the
WARN in kthread_unpark() -> __kthread_bind().

But in addition, add wait_task_inactive() to kthread_park() to ensure
the task really is PARKED when we return from kthread_park(). This
avoids the whole kthread still gets migrated nonsense -- although it
would be really good to get this done differently.

Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 85f1abe001 ("kthread, sched/wait: Fix kthread_parkme() completion issue")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:17:30 +02:00
Vincent Guittot
3482d98bbc sched/util_est: Fix util_est_dequeue() for throttled cfs_rq
When a cfs_rq is throttled, parent cfs_rq->nr_running is decreased and
everything happens at cfs_rq level. Currently util_est stays unchanged
in such case and it keeps accounting the utilization of throttled tasks.
This can somewhat make sense as we don't dequeue tasks but only throttled
cfs_rq.

If a task of another group is enqueued/dequeued and root cfs_rq becomes
idle during the dequeue, util_est will be cleared whereas it was
accounting util_est of throttled tasks before. So the behavior of util_est
is not always the same regarding throttled tasks and depends of side
activity. Furthermore, util_est will not be updated when the cfs_rq is
unthrottled as everything happens at cfs_rq level. Main results is that
util_est will stay null whereas we now have running tasks. We have to wait
for the next dequeue/enqueue of the previously throttled tasks to get an
up to date util_est.

Remove the assumption that cfs_rq's estimated utilization of a CPU is 0
if there is no running task so the util_est of a task remains until the
latter is dequeued even if its cfs_rq has been throttled.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7f65ea42eb ("sched/fair: Add util_est on top of PELT")
Link: http://lkml.kernel.org/r/1528972380-16268-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:17:30 +02:00
Xunlei Pang
f1d1be8aee sched/fair: Advance global expiration when period timer is restarted
When period gets restarted after some idle time, start_cfs_bandwidth()
doesn't update the expiration information, expire_cfs_rq_runtime() will
see cfs_rq->runtime_expires smaller than rq clock and go to the clock
drift logic, wasting needless CPU cycles on the scheduler hot path.

Update the global expiration in start_cfs_bandwidth() to avoid frequent
expire_cfs_rq_runtime() calls once a new period begins.

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180620101834.24455-2-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:17:29 +02:00
Xunlei Pang
512ac999d2 sched/fair: Fix bandwidth timer clock drift condition
I noticed that cgroup task groups constantly get throttled even
if they have low CPU usage, this causes some jitters on the response
time to some of our business containers when enabling CPU quotas.

It's very simple to reproduce:

  mkdir /sys/fs/cgroup/cpu/test
  cd /sys/fs/cgroup/cpu/test
  echo 100000 > cpu.cfs_quota_us
  echo $$ > tasks

then repeat:

  cat cpu.stat | grep nr_throttled  # nr_throttled will increase steadily

After some analysis, we found that cfs_rq::runtime_remaining will
be cleared by expire_cfs_rq_runtime() due to two equal but stale
"cfs_{b|q}->runtime_expires" after period timer is re-armed.

The current condition to judge clock drift in expire_cfs_rq_runtime()
is wrong, the two runtime_expires are actually the same when clock
drift happens, so this condtion can never hit. The orginal design was
correctly done by this commit:

  a9cf55b286 ("sched: Expire invalid runtime")

... but was changed to be the current implementation due to its locking bug.

This patch introduces another way, it adds a new field in both structures
cfs_rq and cfs_bandwidth to record the expiration update sequence, and
uses them to figure out if clock drift happens (true if they are equal).

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 51f2176d74 ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:17:29 +02:00
Vincent Guittot
296b2ffe7f sched/rt: Fix call to cpufreq_update_util()
With commit:

  8f111bc357 ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")

the schedutil governor uses rq->rt.rt_nr_running to detect whether an
RT task is currently running on the CPU and to set frequency to max
if necessary.

cpufreq_update_util() is called in enqueue/dequeue_top_rt_rq() but
rq->rt.rt_nr_running has not been updated yet when dequeue_top_rt_rq() is
called so schedutil still considers that an RT task is running when the
last task is dequeued. The update of rq->rt.rt_nr_running happens later
in dequeue_rt_stack().

In fact, we can take advantage of the sequence that the dequeue then
re-enqueue rt entities when a rt task is enqueued or dequeued;
As a result enqueue_top_rt_rq() is always called when a task is
enqueued or dequeued and also when groups are throttled or unthrottled.
The only place that not use enqueue_top_rt_rq() is when root rt_rq is
throttled.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: juri.lelli@redhat.com
Cc: patrick.bellasi@arm.com
Cc: viresh.kumar@linaro.org
Fixes: 8f111bc357 ('cpufreq/schedutil: Rewrite CPUFREQ_RT support')
Link: http://lkml.kernel.org/r/1530021202-21695-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:17:28 +02:00
Frederic Weisbecker
d9c0ffcabd sched/nohz: Skip remote tick on idle task entirely
Some people have reported that the warning in sched_tick_remote()
occasionally triggers, especially in favour of some RCU-Torture
pressure:

	WARNING: CPU: 11 PID: 906 at kernel/sched/core.c:3138 sched_tick_remote+0xb6/0xc0
	Modules linked in:
	CPU: 11 PID: 906 Comm: kworker/u32:3 Not tainted 4.18.0-rc2+ #1
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	Workqueue: events_unbound sched_tick_remote
	RIP: 0010:sched_tick_remote+0xb6/0xc0
	Code: e8 0f 06 b8 00 c6 03 00 fb eb 9d 8b 43 04 85 c0 75 8d 48 8b 83 e0 0a 00 00 48 85 c0 75 81 eb 88 48 89 df e8 bc fe ff ff eb aa <0f> 0b eb
	+c5 66 0f 1f 44 00 00 bf 17 00 00 00 e8 b6 2e fe ff 0f b6
	Call Trace:
	 process_one_work+0x1df/0x3b0
	 worker_thread+0x44/0x3d0
	 kthread+0xf3/0x130
	 ? set_worker_desc+0xb0/0xb0
	 ? kthread_create_worker_on_cpu+0x70/0x70
	 ret_from_fork+0x35/0x40

This happens when the remote tick applies on an idle task. Usually the
idle_cpu() check avoids that, but it is performed before we lock the
runqueue and it is therefore racy. It was intended to be that way in
order to prevent from useless runqueue locks since idle task tick
callback is a no-op.

Now if the racy check slips out of our hands and we end up remotely
ticking an idle task, the empty task_tick_idle() is harmless. Still
it won't pass the WARN_ON_ONCE() test that ensures rq_clock_task() is
not too far from curr->se.exec_start because update_curr_idle() doesn't
update the exec_start value like other scheduler policies. Hence the
reported false positive.

So let's have another check, while the rq is locked, to make sure we
don't remote tick on an idle task. The lockless idle_cpu() still applies
to avoid unecessary rq lock contention.

Reported-by: Jacek Tomaka <jacekt@dug.com>
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1530203381-31234-1-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:17:28 +02:00
Linus Torvalds
4e33d7d479 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Verify netlink attributes properly in nf_queue, from Eric Dumazet.

 2) Need to bump memory lock rlimit for test_sockmap bpf test, from
    Yonghong Song.

 3) Fix VLAN handling in lan78xx driver, from Dave Stevenson.

 4) Fix uninitialized read in nf_log, from Jann Horn.

 5) Fix raw command length parsing in mlx5, from Alex Vesker.

 6) Cleanup loopback RDS connections upon netns deletion, from Sowmini
    Varadhan.

 7) Fix regressions in FIB rule matching during create, from Jason A.
    Donenfeld and Roopa Prabhu.

 8) Fix mpls ether type detection in nfp, from Pieter Jansen van Vuuren.

 9) More bpfilter build fixes/adjustments from Masahiro Yamada.

10) Fix XDP_{TX,REDIRECT} flushing in various drivers, from Jesper
    Dangaard Brouer.

11) fib_tests.sh file permissions were broken, from Shuah Khan.

12) Make sure BH/preemption is disabled in data path of mac80211, from
    Denis Kenzior.

13) Don't ignore nla_parse_nested() return values in nl80211, from
    Johannes berg.

14) Properly account sock objects ot kmemcg, from Shakeel Butt.

15) Adjustments to setting bpf program permissions to read-only, from
    Daniel Borkmann.

16) TCP Fast Open key endianness was broken, it always took on the host
    endiannness. Whoops. Explicitly make it little endian. From Yuching
    Cheng.

17) Fix prefix route setting for link local addresses in ipv6, from
    David Ahern.

18) Potential Spectre v1 in zatm driver, from Gustavo A. R. Silva.

19) Various bpf sockmap fixes, from John Fastabend.

20) Use after free for GRO with ESP, from Sabrina Dubroca.

21) Passing bogus flags to crypto_alloc_shash() in ipv6 SR code, from
    Eric Biggers.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  qede: Adverstise software timestamp caps when PHC is not available.
  qed: Fix use of incorrect size in memcpy call.
  qed: Fix setting of incorrect eswitch mode.
  qed: Limit msix vectors in kdump kernel to the minimum required count.
  ipvlan: call dev_change_flags when ipvlan mode is reset
  ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
  net: fix use-after-free in GRO with ESP
  tcp: prevent bogus FRTO undos with non-SACK flows
  bpf: sockhash, add release routine
  bpf: sockhash fix omitted bucket lock in sock_close
  bpf: sockmap, fix smap_list_map_remove when psock is in many maps
  bpf: sockmap, fix crash when ipv6 sock is added
  net: fib_rules: bring back rule_exists to match rule during add
  hv_netvsc: split sub-channel setup into async and sync
  net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN
  atm: zatm: Fix potential Spectre v1
  s390/qeth: consistently re-enable device features
  s390/qeth: don't clobber buffer on async TX completion
  s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]
  s390/qeth: fix race when setting MAC address
  ...
2018-07-02 11:18:28 -07:00
Mauro Carvalho Chehab
ea272257cc docs: histogram.txt: convert it to ReST file format
Despite being mentioned at Documentation/trace/ftrace.rst as a
rst file, this file was still a text one, with several issues.

Convert it to ReST and add it to the trace index:
- Mark the document title as such;
- Identify and indent the literal blocks;
- Use the proper markups for table.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-07-02 11:26:02 -06:00
Chengguang Xu
d5641c64c4 PM / hibernate: cast PAGE_SIZE to int when comparing with error code
If PAGE_SIZE is unsigned type then negative error code will be
larger than PAGE_SIZE.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-07-02 11:48:30 +02:00
Jessica Yu
f314dfea16 modsign: log module name in the event of an error
Now that we have the load_info struct all initialized (including
info->name, which contains the name of the module) before
module_sig_check(), make the load_info struct and hence module name
available to mod_verify_sig() so that we can log the module name in the
event of an error.

Signed-off-by: Jessica Yu <jeyu@kernel.org>
2018-07-02 11:36:17 +02:00
Thomas Gleixner
5f936e19cc alarmtimer: Prevent overflow for relative nanosleep
Air Icy reported:

  UBSAN: Undefined behaviour in kernel/time/alarmtimer.c:811:7
  signed integer overflow:
  1529859276030040771 + 9223372036854775807 cannot be represented in type 'long long int'
  Call Trace:
   alarm_timer_nsleep+0x44c/0x510 kernel/time/alarmtimer.c:811
   __do_sys_clock_nanosleep kernel/time/posix-timers.c:1235 [inline]
   __se_sys_clock_nanosleep kernel/time/posix-timers.c:1213 [inline]
   __x64_sys_clock_nanosleep+0x326/0x4e0 kernel/time/posix-timers.c:1213
   do_syscall_64+0xb8/0x3a0 arch/x86/entry/common.c:290

alarm_timer_nsleep() uses ktime_add() to add the current time and the
relative expiry value. ktime_add() has no sanity checks so the addition
can overflow when the relative timeout is large enough.

Use ktime_add_safe() which has the necessary sanity checks in place and
limits the result to the valid range.

Fixes: 9a7adcf5c6 ("timers: Posix interface for alarm-timers")
Reported-by: Team OWL337 <icytxw@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1807020926360.1595@nanos.tec.linutronix.de
2018-07-02 11:33:26 +02:00
Thomas Gleixner
78c9c4dfbf posix-timers: Sanitize overrun handling
The posix timer overrun handling is broken because the forwarding functions
can return a huge number of overruns which does not fit in an int. As a
consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
random number generators.

The k_clock::timer_forward() callbacks return a 64 bit value now. Make
k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal
accounting is correct. 3Remove the temporary (int) casts.

Add a helper function which clamps the overrun value returned to user space
via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value
between 0 and INT_MAX. INT_MAX is an indicator for user space that the
overrun value has been clamped.

Reported-by: Team OWL337 <icytxw@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
2018-07-02 11:33:25 +02:00
Thomas Gleixner
6fec64e1c9 posix-timers: Make forward callback return s64
The posix timer ti_overrun handling is broken because the forwarding
functions can return a huge number of overruns which does not fit in an
int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn
into random number generators.

As a first step to address that let the timer_forward() callbacks return
the full 64 bit value.

Cast it to (int) temporarily until k_itimer::ti_overrun is converted to
64bit and the conversion to user space visible values is sanitized.

Reported-by: Team OWL337 <icytxw@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Link: https://lkml.kernel.org/r/20180626132704.922098090@linutronix.de
2018-07-02 11:33:25 +02:00
Thomas Gleixner
0cc3cd2165 cpu/hotplug: Boot HT siblings at least once
Due to the way Machine Check Exceptions work on X86 hyperthreads it's
required to boot up _all_ logical cores at least once in order to set the
CR4.MCE bit.

So instead of ignoring the sibling threads right away, let them boot up
once so they can configure themselves. After they came out of the initial
boot stage check whether its a "secondary" sibling and cancel the operation
which puts the CPU back into offline state.

Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-07-02 11:25:29 +02:00
Linus Torvalds
c350d6d1d7 Add a missing export required by riscv and unicore
-----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAls4b+ILHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNrVw//SiXXoTDekxaQmyExQcEhac4ZzTjERoDFaE9zo14D
 x7iwdOXFQN9pRJwtUX70QDxQxc8iJT1j6CXMRUJZL9ZKdEOVbqfX2k7P8W4TJkhU
 id2PeZTxmreWOKL8EJB1mEjckZxP/qQLKBp9qwNET6rPf18WmSDb6wQuUcaLgDhl
 n4UJrzXuiCaEQNor30BvGDM4jQ4IOFh2JVjWqwPubcgPgOoopZ0TWSPeAxRo38MH
 2hDTSMFEV/QqM/slDhHxJacwFyMc2mWxyHg66/YGzTmX/ZciABq2eMiTwhSMumSp
 wDP/4/9TvaM6QGPkWblTFr18sRECBaNo59e5Lz0g/KtJCXwIeAiBqCKaIct6CYAf
 hHdSaxvYzPgIOs91uSgGbhtlweg3TdjSRJIbB8k1xkub7uAnT0hpeHnqPy4DUfjy
 DB25YTt0Qabnmfyz37UpbCQN0PPYY4TLvZc9N81WXiKW2BOIogvNeiCNbS06rvXa
 wxn/haocbMSbh3k41jxbkrFBKiALiz1QaZRgH0omdOvXSN//WgI8qAt6F8begDc8
 dBu5aTu50oDv6kLoe4aONTH72Jrt/q6xBI9dN5cD73luBm1p+QyoYhekHUPNMef7
 7ek/lBJLo1Xq+bEuqgnaPZRC7mWYGTvnEe+r0uYW3cOw+PGx/VNfPc6B43J/YaCS
 Hv4=
 =IUWC
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma mapping fixlet from Christoph Hellwig:
 "Add a missing export required by riscv and unicore"

* tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: export swiotlb_dma_ops
2018-07-01 10:45:13 -07:00
David S. Miller
271b955e52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-07-01

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) A bpf_fib_lookup() helper fix to change the API before freeze to
   return an encoding of the FIB lookup result and return the nexthop
   device index in the params struct (instead of device index as return
   code that we had before), from David.

2) Various BPF JIT fixes to address syzkaller fallout, that is, do not
   reject progs when set_memory_*() fails since it could still be RO.
   Also arm32 JIT was not using bpf_jit_binary_lock_ro() API which was
   an issue, and a memory leak in s390 JIT found during review, from
   Daniel.

3) Multiple fixes for sockmap/hash to address most of the syzkaller
   triggered bugs. Usage with IPv6 was crashing, a GPF in bpf_tcp_close(),
   a missing sock_map_release() routine to hook up to callbacks, and a
   fix for an omitted bucket lock in sock_close(), from John.

4) Two bpftool fixes to remove duplicated error message on program load,
   and another one to close the libbpf object after program load. One
   additional fix for nfp driver's BPF offload to avoid stopping offload
   completely if replace of program failed, from Jakub.

5) Couple of BPF selftest fixes that bail out in some of the test
   scripts if the user does not have the right privileges, from Jeffrin.

6) Fixes in test_bpf for s390 when CONFIG_BPF_JIT_ALWAYS_ON is set
   where we need to set the flag that some of the test cases are expected
   to fail, from Kleber.

7) Fix to detangle BPF_LIRC_MODE2 dependency from CONFIG_CGROUP_BPF
   since it has no relation to it and lirc2 users often have configs
   without cgroups enabled and thus would not be able to use it, from Sean.

8) Fix a selftest failure in sockmap by removing a useless setrlimit()
   call that would set a too low limit where at the same time we are
   already including bpf_rlimit.h that does the job, from Yonghong.

9) Fix BPF selftest config with missing missing NET_SCHED, from Anders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-01 09:27:44 +09:00
John Fastabend
caac76a517 bpf: sockhash, add release routine
Add map_release_uref pointer to hashmap ops. This was dropped when
original sockhash code was ported into bpf-next before initial
commit.

Fixes: 8111038444 ("bpf: sockmap, add hash map support")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:32 +02:00
John Fastabend
e9db4ef6bf bpf: sockhash fix omitted bucket lock in sock_close
First the sk_callback_lock() was being used to protect both the
sock callback hooks and the psock->maps list. This got overly
convoluted after the addition of sockhash (in sockmap it made
some sense because masp and callbacks were tightly coupled) so
lets split out a specific lock for maps and only use the callback
lock for its intended purpose. This fixes a couple cases where
we missed using maps lock when it was in fact needed. Also this
makes it easier to follow the code because now we can put the
locking closer to the actual code its serializing.

Next, in sock_hash_delete_elem() the pattern was as follows,

  sock_hash_delete_elem()
     [...]
     spin_lock(bucket_lock)
     l = lookup_elem_raw()
     if (l)
        hlist_del_rcu()
        write_lock(sk_callback_lock)
         .... destroy psock ...
        write_unlock(sk_callback_lock)
     spin_unlock(bucket_lock)

The ordering is necessary because we only know the {p}sock after
dereferencing the hash table which we can't do unless we have the
bucket lock held. Once we have the bucket lock and the psock element
it is deleted from the hashmap to ensure any other path doing a lookup
will fail. Finally, the refcnt is decremented and if zero the psock
is destroyed.

In parallel with the above (or free'ing the map) a tcp close event
may trigger tcp_close(). Which at the moment omits the bucket lock
altogether (oops!) where the flow looks like this,

  bpf_tcp_close()
     [...]
     write_lock(sk_callback_lock)
     for each psock->maps // list of maps this sock is part of
         hlist_del_rcu(ref_hash_node);
         .... destroy psock ...
     write_unlock(sk_callback_lock)

Obviously, and demonstrated by syzbot, this is broken because
we can have multiple threads deleting entries via hlist_del_rcu().

To fix this we might be tempted to wrap the hlist operation in a
bucket lock but that would create a lock inversion problem. In
summary to follow locking rules the psocks maps list needs the
sk_callback_lock (after this patch maps_lock) but we need the bucket
lock to do the hlist_del_rcu.

To resolve the lock inversion problem pop the head of the maps list
repeatedly and remove the reference until no more are left. If a
delete happens in parallel from the BPF API that is OK as well because
it will do a similar action, lookup the lock in the map/hash, delete
it from the map/hash, and dec the refcnt. We check for this case
before doing a destroy on the psock to ensure we don't have two
threads tearing down a psock. The new logic is as follows,

  bpf_tcp_close()
  e = psock_map_pop(psock->maps) // done with map lock
  bucket_lock() // lock hash list bucket
  l = lookup_elem_raw(head, hash, key, key_size);
  if (l) {
     //only get here if elmnt was not already removed
     hlist_del_rcu()
     ... destroy psock...
  }
  bucket_unlock()

And finally for all the above to work add missing locking around  map
operations per above. Then add RCU annotations and use
rcu_dereference/rcu_assign_pointer to manage values relying on RCU so
that the object is not free'd from sock_hash_free() while it is being
referenced in bpf_tcp_close().

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 8111038444 ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:32 +02:00
John Fastabend
54fedb42c6 bpf: sockmap, fix smap_list_map_remove when psock is in many maps
If a hashmap is free'd with open socks it removes the reference to
the hash entry from the psock. If that is the last reference to the
psock then it will also be free'd by the reference counting logic.
However the current logic that removes the hash reference from the
list of references is broken. In smap_list_remove() we first check
if the sockmap entry matches and then check if the hashmap entry
matches. But, the sockmap entry sill always match because its NULL in
this case which causes the first entry to be removed from the list.
If this is always the "right" entry (because the user adds/removes
entries in order) then everything is OK but otherwise a subsequent
bpf_tcp_close() may reference a free'd object.

To fix this create two list handlers one for sockmap and one for
sockhash.

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 8111038444 ("bpf: sockmap, add hash map support")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:31 +02:00
John Fastabend
9901c5d77e bpf: sockmap, fix crash when ipv6 sock is added
This fixes a crash where we assign tcp_prot to IPv6 sockets instead
of tcpv6_prot.

Previously we overwrote the sk->prot field with tcp_prot even in the
AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
are used.

Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
crashing case here.

Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:31 +02:00
Daniel Borkmann
85782e037f bpf: undo prog rejection on read-only lock failure
Partially undo commit 9facc33687 ("bpf: reject any prog that failed
read-only lock") since it caused a regression, that is, syzkaller was
able to manage to cause a panic via fault injection deep in set_memory_ro()
path by letting an allocation fail: In x86's __change_page_attr_set_clr()
it was able to change the attributes of the primary mapping but not in
the alias mapping via cpa_process_alias(), so the second, inner call
to the __change_page_attr() via __change_page_attr_set_clr() had to split
a larger page and failed in the alloc_pages() with the artifically triggered
allocation error which is then propagated down to the call site.

Thus, for set_memory_ro() this means that it returned with an error, but
from debugging a probe_kernel_write() revealed EFAULT on that memory since
the primary mapping succeeded to get changed. Therefore the subsequent
hdr->locked = 0 reset triggered the panic as it was performed on read-only
memory, so call-site assumptions were infact wrong to assume that it would
either succeed /or/ not succeed at all since there's no such rollback in
set_memory_*() calls from partial change of mappings, in other words, we're
left in a state that is "half done". A later undo via set_memory_rw() is
succeeding though due to matching permissions on that part (aka due to the
try_preserve_large_page() succeeding). While reproducing locally with
explicitly triggering this error, the initial splitting only happens on
rare occasions and in real world it would additionally need oom conditions,
but that said, it could partially fail. Therefore, it is definitely wrong
to bail out on set_memory_ro() error and reject the program with the
set_memory_*() semantics we have today. Shouldn't have gone the extra mile
since no other user in tree today infact checks for any set_memory_*()
errors, e.g. neither module_enable_ro() / module_disable_ro() for module
RO/NX handling which is mostly default these days nor kprobes core with
alloc_insn_page() / free_insn_page() as examples that could be invoked long
after bootup and original 314beb9bca ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks") did neither when it got first introduced to BPF
so "improving" with bailing out was clearly not right when set_memory_*()
cannot handle it today.

Kees suggested that if set_memory_*() can fail, we should annotate it with
__must_check, and all callers need to deal with it gracefully given those
set_memory_*() markings aren't "advisory", but they're expected to actually
do what they say. This might be an option worth to move forward in future
but would at the same time require that set_memory_*() calls from supporting
archs are guaranteed to be "atomic" in that they provide rollback if part
of the range fails, once that happened, the transition from RW -> RO could
be made more robust that way, while subsequent RO -> RW transition /must/
continue guaranteeing to always succeed the undo part.

Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com
Reported-by: syzbot+d866d1925855328eac3b@syzkaller.appspotmail.com
Fixes: 9facc33687 ("bpf: reject any prog that failed read-only lock")
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-29 10:47:35 -07:00
Richard Guy Briggs
4fa7f08699 audit: simplify audit_enabled check in audit_watch_log_rule_change()
Check the audit_enabled flag and bail immediately.  This does not change
the functionality, but brings the code format in line with similar
checks in audit_tree_log_remove_rule(), audit_mark_log_rule_change(),
and elsewhere in the audit code.

See: https://github.com/linux-audit/audit-kernel/issues/50

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-06-28 11:44:31 -04:00
Richard Guy Briggs
65a8766f5f audit: check audit_enabled in audit_tree_log_remove_rule()
Respect the audit_enabled flag when printing tree rule config change
records.

See: https://github.com/linux-audit/audit-kernel/issues/50

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: tweak the subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-06-28 11:41:02 -04:00
Hans de Goede
d48de54a9d printk: Export is_console_locked
This is a preparation patch for adding a number of WARN_CONSOLE_UNLOCKED()
calls to the fbcon code, which may be built as a module (event though
usually it is not).

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-06-28 15:20:27 +02:00
Christoph Hellwig
210d0797c9 swiotlb: export swiotlb_dma_ops
For architectures that do not use per-device dma ops we need to export
the dma_map_ops structure returned from get_arch_dma_ops().

Fixes: 10314e09 ("riscv: add swiotlb support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Andreas Schwab <schwab@suse.de>
2018-06-28 14:00:40 +02:00
Namit Gupta
63842c2134 printk: Remove unnecessary kmalloc() from syslog during clear
When the request is only for clearing logs, there is no need for
allocation/deallocation. Only the indexes need to be reset and returned.
Rest of the patch is mostly made up of changes because of indention.

Link: http://lkml.kernel.org/r/20180620135951epcas5p3bd2a8f25ec689ca333bce861b527dba2~54wyKcT0_3155531555epcas5p3y@epcas5p3.samsung.com
Cc: linux-kernel@vger.kernel.org
Cc: pankaj.m@samsung.com
Cc: a.sahrawat@samsung.com
Signed-off-by: Namit Gupta <gupta.namit@samsung.com>
Signed-off-by: Himanshu Maithani <himanshu.m@samsung.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-06-27 16:16:28 +02:00
Maninder Singh
375899cddc printk: make sure to print log on console.
This patch make sure printing of log on console if loglevel
at time of storing log is less than current console loglevel.

@why
In SMP printk can work asynchronously, logs can be missed on console
because it checks current log level at time of console_unlock,
not at time of storing logs.

func()
{
....
....
        console_verbose();  // user wants to have all the logs on console.
        pr_alert();
	dump_backtrace(); //prints with default loglevel.
        ...
        console_silent(); // stop all logs from printing on console.
}

Now if console_lock was owned by another process, the messages might
be handled after the consoles were silenced.

Reused flag LOG_NOCONS as its usage is gone long back by the commit
5c2992ee7f ("printk: remove console flushing special cases
for partial buffered lines").

Note that there are still some corner cases where this patch is not enough.
For example, when the messages are flushed later from printk_safe buffers
or when there are races between console_verbose() and console_silent()
callers.

Link: http://lkml.kernel.org/r/20180601090029epcas5p3cc93d4bfbebb3199f0a2684058da7e26~z-a_jkmrI2993329933epcas5p3q@epcas5p3.samsung.com
Cc: linux-kernel@vger.kernel.org
Cc: a.sahrawat@samsung.com
Cc: pankaj.m@samsung.com
Cc: v.narang@samsung.com
Cc: <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-06-27 16:14:28 +02:00
Amir Goldstein
36f10f55ff fsnotify: let connector point to an abstract object
Make the code to attach/detach a connector to object more generic
by letting the fsnotify connector point to an abstract fsnotify_connp_t.
Code that needs to dereference an inode or mount object now uses the
helpers fsnotify_conn_{inode,mount}.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2018-06-27 13:45:05 +02:00
Mathieu Malaterre
9331510135 perf/core: Move inline keyword at the beginning of declaration
Fix non-fatal warning triggered during compilation with W=1:

  kernel/events/core.c:6106:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
   static void __always_inline
   ^~~~~~

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180626202301.20270-1-malat@debian.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-27 09:55:58 +02:00
Paul E. McKenney
8c42b1f39f rcu: Exclude near-simultaneous RCU CPU stall warnings
There is a two-jiffy delay between the time that a CPU will self-report
an RCU CPU stall warning and the time that some other CPU will report a
warning on behalf of the first CPU.  This has worked well in the past,
but on busy systems, it is possible for the two warnings to overlap,
which makes interpreting them extremely difficult.

This commit therefore uses a cmpxchg-based timing decision that
allows only one report in a given one-minute period (assuming default
stall-warning Kconfig parameters).  This approach will of course fail
if you are seeing minute-long vCPU preemption, but in that case the
overlapping RCU CPU stall warnings are the least of your worries.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:56 -07:00
Boqun Feng
ce11fae8d4 rcu: Use the proper lockdep annotation in dump_blkd_tasks()
Sparse reported this:

| kernel/rcu/tree_plugin.h:814:9: warning: incorrect type in argument 1 (different modifiers)
| kernel/rcu/tree_plugin.h:814:9:    expected struct lockdep_map const *lock
| kernel/rcu/tree_plugin.h:814:9:    got struct lockdep_map [noderef] *<noident>

This is caused by using vanilla lockdep annotations on rcu_node::lock,
and that requires accessing ->lock of rcu_node directly. However we need
to keep rcu_node::lock __private to avoid breaking its extra ordering
guarantee. And we have a dedicated lockdep annotation for
rcu_node::lock, so use it.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:55 -07:00
Paul E. McKenney
4bc8d55574 rcu: Add debugging info to assertion
The WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp()) in
rcu_gp_cleanup() triggers (inexplicably, of course) every so often.
This commit therefore extracts more information.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:55 -07:00
Sean Young
fdb5c4531c bpf: fix attach type BPF_LIRC_MODE2 dependency wrt CONFIG_CGROUP_BPF
If the kernel is compiled with CONFIG_CGROUP_BPF not enabled, it is not
possible to attach, detach or query IR BPF programs to /dev/lircN devices,
making them impossible to use. For embedded devices, it should be possible
to use IR decoding without cgroups or CONFIG_CGROUP_BPF enabled.

This change requires some refactoring, since bpf_prog_{attach,detach,query}
functions are now always compiled, but their code paths for cgroups need
moving out. Rather than a #ifdef CONFIG_CGROUP_BPF in kernel/bpf/syscall.c,
moving them to kernel/bpf/cgroup.c and kernel/bpf/sockmap.c does not
require #ifdefs since that is already conditionally compiled.

Fixes: f4364dcfc8 ("media: rc: introduce BPF_PROG_LIRC_MODE2")
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-26 11:28:38 +02:00
Frederic Weisbecker
26c6ccdf5c perf/hw_breakpoint: Clean up and consolidate modify_user_hw_breakpoint_check()
Remove the dance around old and new attributes. Just don't modify the
previous breakpoint at all until we have verified everything.

Original-patch-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes <joel.opensrc@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1529981939-8231-13-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-26 09:07:59 +02:00
Frederic Weisbecker
cb8b78815b perf/hw_breakpoint: Pass new breakpoint type to modify_breakpoint_slot()
We soon won't be able to rely on bp->attr anymore to get the new
type of the modifying breakpoint because the new attributes are going
to be copied only once we successfully modified the breakpoint slot.

This will fix the current misdesigned layout where the new attr are
copied to the modifying breakpoint before we actually know if the
modification will be validated.

In order to prepare for that, allow modify_breakpoint_slot() to take
the new breakpoint type.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes <joel.opensrc@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1529981939-8231-12-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-26 09:07:59 +02:00
Frederic Weisbecker
cffbb3bd44 perf/hw_breakpoint: Remove default hw_breakpoint_arch_parse()
All architectures have implemented it, we can now remove the poor weak
version.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes <joel.opensrc@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1529981939-8231-11-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-26 09:07:58 +02:00
Frederic Weisbecker
8e983ff9ac perf/hw_breakpoint: Pass arch breakpoint struct to arch_check_bp_in_kernelspace()
We can't pass the breakpoint directly on arch_check_bp_in_kernelspace()
anymore because its architecture internal datas (struct arch_hw_breakpoint)
are not yet filled by the time we call the function, and most
implementation need this backend to be up to date. So arrange the
function to take the probing struct instead.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes <joel.opensrc@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1529981939-8231-3-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-26 09:07:54 +02:00
Frederic Weisbecker
9a4903dde2 perf/hw_breakpoint: Split attribute parse and commit
arch_validate_hwbkpt_settings() mixes up attribute check and commit into
a single code entity. Therefore the validation may return an error due to
incorrect atributes while still leaving halfway modified architecture
breakpoint data.

This is harmless when we deal with a new breakpoint but it becomes a
problem when we modify an existing breakpoint.

Split attribute parse and commit to fix that. The architecture is
passed a "struct arch_hw_breakpoint" to fill on top of the new attr
and the core takes care about copying the backend data once it's fully
validated. The architectures then need to implement the new API.

Original-patch-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joel Fernandes <joel.opensrc@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1529981939-8231-2-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-26 09:07:53 +02:00
Ingo Molnar
f446474889 Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-26 09:02:41 +02:00
Paul E. McKenney
6050003763 torture: Keep old-school dmesg format
This commit adds "#define pr_fmt(fmt) fmt" to the torture-test files
in order to keep the current dmesg format.  Once Joe's commits have
hit mainline, these definitions will be changed in order to automatically
generate the dmesg line prefix that the scripts expect.  This will have
the beneficial side-effect of allowing printk() formats to be used more
widely and of shortening some pr_*() lines.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Joe Perches <joe@perches.com>
2018-06-25 11:30:10 -07:00
Paul E. McKenney
90127d605f torture: Make online/offline messages appear only for verbose=2
Some bugs reproduce quickly only at high CPU-hotplug rates, so the
rcutorture TREE03 scenario now has only 200 milliseconds spacing between
CPU-hotplug operations.  At this rate, the torture-test pair of console
messages per operation becomes a bit voluminous.  This commit therefore
converts the torture-test set of "verbose" kernel-boot arguments from
bool to int, and prints the extra console messages only when verbose=2.
The default is still verbose=1.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:30:10 -07:00
Paul E. McKenney
5ab07a8df4 srcu: Add address of first callback to rcutorture output
This commit adds the address of the first callback to the per-CPU rcutorture
output in order to allow lost wakeups to be more efficiently tracked down.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:26:24 -07:00
Paul E. McKenney
17294ce6a4 srcu: Document that srcu_funnel_gp_start() implies srcu_funnel_exp_start()
This commit updates the header comment of srcu_funnel_gp_start() to
document the fact that srcu_funnel_gp_start() does the work of
srcu_funnel_exp_start(), in some cases by invoking it directly.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:26:24 -07:00
Paul E. McKenney
5ef98a6328 srcu: Fix typos in __call_srcu() header comment
This commit simply changes some copy-pasta call_rcu() instances to
the correct call_srcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:26:24 -07:00
Paul E. McKenney
5257514d88 rcu: Make expedited grace period use direct call on last leaf
During expedited grace-period initialization, a work item is scheduled
for each leaf rcu_node structure.  However, that initialization code
is itself (normally) executing from a workqueue, so one of the leaf
rcu_node structures could just as well be handled by that pre-existing
workqueue, and with less overhead.  This commit therefore uses a
shiny new rcu_is_leaf_node() macro to execute the last leaf rcu_node
structure's initialization directly from the pre-existing workqueue.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:25:41 -07:00
Masahiro Yamada
996302c5e8 module: replace VMLINUX_SYMBOL_STR() with __stringify() or string literal
With the special case handling for Blackfin and Metag was removed by
commit 94e58e0ac3 ("export.h: remove code for prefixing symbols with
underscore"), VMLINUX_SYMBOL_STR() is now equivalent to __stringify().

Replace the remaining usages to prepare for the entire removal of
VMLINUX_SYMBOL_STR().

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2018-06-25 11:18:29 +02:00