When NEWLY_IDLE load balance is not triggered, we might need to update the
blocked load anyway. We can kick an ilb so an idle CPU will take care of
updating blocked load or we can try to update them locally before entering
idle. In the latter case, we reuse part of the nohz_idle_balance.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: brendan.jackman@arm.com
Cc: dietmar.eggemann@arm.com
Cc: morten.rasmussen@foss.arm.com
Cc: valentin.schneider@arm.com
Link: http://lkml.kernel.org/r/1518622006-16089-4-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We're going to want to call nohz_idle_balance() or parts thereof from
idle_balance(). Since we already have a forward declaration of
idle_balance() move it down such that it's below nohz_idle_balance()
avoiding the need for a forward declaration for that.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that we have two back-to-back NO_HZ_COMMON blocks, merge them.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This pure code movement results in two #ifdef CONFIG_NO_HZ_COMMON
sections landing next to each other.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Avoid calling update_blocked_averages() when it does not in fact have
any by re-using/extending update_nohz_stats().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of using the cfs_rq_is_decayed() which monitors all *_avg
and *_sum, we create a cfs_rq_has_blocked() which only takes care of
util_avg and load_avg. We are only interested by these 2 values which are
decaying faster than the *_sum so we can stop the periodic update earlier.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: brendan.jackman@arm.com
Cc: dietmar.eggemann@arm.com
Cc: morten.rasmussen@foss.arm.com
Cc: valentin.schneider@arm.com
Link: http://lkml.kernel.org/r/1518517879-2280-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Stopped the periodic update of blocked load when all idle CPUs have fully
decayed. We introduce a new nohz.has_blocked that reflect if some idle
CPUs has blocked load that have to be periodiccally updated. nohz.has_blocked
is set everytime that a Idle CPU can have blocked load and it is then clear
when no more blocked load has been detected during an update. We don't need
atomic operation but only to make cure of the right ordering when updating
nohz.idle_cpus_mask and nohz.has_blocked.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: brendan.jackman@arm.com
Cc: dietmar.eggemann@arm.com
Cc: morten.rasmussen@foss.arm.com
Cc: valentin.schneider@arm.com
Link: http://lkml.kernel.org/r/1518517879-2280-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It was suggested that a migration hint might be usefull for the
CPU-freq governors.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The primary observation is that nohz enter/exit is always from the
current CPU, therefore NOHZ_TICK_STOPPED does not in fact need to be
an atomic.
Secondary is that we appear to have 2 nearly identical hooks in the
nohz enter code, set_cpu_sd_state_idle() and
nohz_balance_enter_idle(). Fold the whole set_cpu_sd_state thing into
nohz_balance_{enter,exit}_idle.
Removes an atomic op from both enter and exit paths.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since we already iterate CPUs looking for work on NEWIDLE, use this
iteration to age the blocked load. If the domain for which this is
done completely spand the idle set, we can push the ILB based aging
forward.
Suggested-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Teach the idle balancer about the need to update statistics which have
a different periodicity from regular balancing.
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current:
if (nohz_kick_needed())
nohz_balancer_kick()
is pointless complexity, fold them into a single call and avoid the
various conditions at the call site.
When we introduce multiple different needs to kick the ilb, the above
construct also becomes a problem.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Split the NOHZ idle balancer into doing two separate actions:
- update blocked load statistic
- actually load-balance
Since the latter requires the former, ensure this happens. For now
always tag both bits at the same time.
Prepares for a future where we can toggle only the STATS bit.
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Using atomic_t allows us to use the more flexible bitops provided
there. Also its smaller.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Instead of trying to duplicate scheduler state to track if an RT task
is running, directly use the scheduler runqueue state for it.
This vastly simplifies things and fixes a number of bugs related to
sugov and the scheduler getting out of sync wrt this state.
As a consequence we not also update the remove cfs/dl state when
iterating the shared mask.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Due to using GCC defines for configuration, some labels might be unused in
certain configurations. While adding a __maybe_unused to the label is
fine in general, the line has to be terminated with ';'. This is also
reflected in the GCC documentation, but GCC parsed the previous variant
without an error message.
This has been spotted while compiling with goto-cc, the compiler for the
CPROVER tool suite.
Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Signed-off-by: Michael Tautschnig <tautschn@amazon.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1519717660-16157-1-git-send-email-nmanthey@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Audit link denied events generate duplicate PATH records which disagree
in different ways from symlink and hardlink denials.
audit_log_link_denied() should not directly generate PATH records.
See: https://github.com/linux-audit/audit-kernel/issues/21
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Audit link denied events emit disjointed records when audit is disabled.
No records should be emitted when audit is disabled.
See: https://github.com/linux-audit/audit-kernel/issues/21
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
otherwise kernel can oops later in seq_release() due to dereferencing null
file->private_data which is only set if seq_open() succeeds.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: seq_release+0xc/0x30
Call Trace:
close_pdeo+0x37/0xd0
proc_reg_release+0x5d/0x60
__fput+0x9d/0x1d0
____fput+0x9/0x10
task_work_run+0x75/0x90
do_exit+0x252/0xa00
do_group_exit+0x36/0xb0
SyS_exit_group+0xf/0x10
Fixes: 516fb7f2e7 ("/proc/module: use the same logic as /proc/kallsyms for address exposure")
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org # 4.15+
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
This commit adds new field "addr" to bpf_perf_event_data which could be
read and used by bpf programs attached to perf events. The value of the
field is copied from bpf_perf_event_data_kern.addr and contains the
address value recorded by specifying sample_type with PERF_SAMPLE_ADDR
when calling perf_event_open.
Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
commit d178bc3a70 ("user namespace: usb:
make usb urbs user namespace aware (v2)") changed kill_pid_info_as_uid
to kill_pid_info_as_cred, saving and passing a cred structure instead of
uids. Since the secid can be obtained from the cred, drop the secid fields
from the usb_dev_state and async structures, and drop the secid argument to
kill_pid_info_as_cred. Replace the secid argument to security_task_kill
with the cred. Update SELinux, Smack, and AppArmor to use the cred, which
avoids the need for Smack and AppArmor to use a secid at all in this hook.
Further changes to Smack might still be required to take full advantage of
this change, since it should now be possible to perform capability
checking based on the supplied cred. The changes to Smack and AppArmor
have only been compile-tested.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
devm_memremap_pages() was re-worked in e8d5134833 "memremap: change
devm_memremap_pages interface to use struct dev_pagemap" to take a
caller allocated struct dev_pagemap as a function parameter. A call to
devres_free() was left in the error cleanup path which results in a
kernel panic if the remap fails for some reason. Remove it to fix the
panic and let devm_memremap_pages() fail gracefully.
Fixes: e8d5134833 ("memremap: change devm_memremap_pages interface...")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If you pass in an invalid audit boot parameter value, e.g. "audit=off",
the kernel panics very early in boot before the regular console is
initialized. Unless you have earlyprintk enabled, there is no
indication of what the problem is on the console.
Convert the panic() calls to pr_err(), and leave auditing enabled if an
invalid parameter value was passed in.
Modify the parameter to also accept "on" or "off" as valid values, and
update the documentation accordingly.
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
All of the conflicts were cases of overlapping changes.
In net/core/devlink.c, we have to make care that the
resouce size_params have become a struct member rather
than a pointer to such an object.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Use an appropriate TSQ pacing shift in mac80211, from Toke
Høiland-Jørgensen.
2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk
in ip6_route_me_harder, from Eric Dumazet.
3) Fix several shutdown races and similar other problems in l2tp, from
James Chapman.
4) Handle missing XDP flush properly in tuntap, for real this time.
From Jason Wang.
5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel
Borkmann.
6) Fix phy_resume() locking, from Andrew Lunn.
7) IFLA_MTU values are ignored on newlink for some tunnel types, fix
from Xin Long.
8) Revert F-RTO middle box workarounds, they only handle one dimension
of the problem. From Yuchung Cheng.
9) Fix socket refcounting in RDS, from Ka-Cheong Poon.
10) Don't allow ppp unit registration to an unregistered channel, from
Guillaume Nault.
11) Various hv_netvsc fixes from Stephen Hemminger.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits)
hv_netvsc: propagate rx filters to VF
hv_netvsc: filter multicast/broadcast
hv_netvsc: defer queue selection to VF
hv_netvsc: use napi_schedule_irqoff
hv_netvsc: fix race in napi poll when rescheduling
hv_netvsc: cancel subchannel setup before halting device
hv_netvsc: fix error unwind handling if vmbus_open fails
hv_netvsc: only wake transmit queue if link is up
hv_netvsc: avoid retry on send during shutdown
virtio-net: re enable XDP_REDIRECT for mergeable buffer
ppp: prevent unregistered channels from connecting to PPP units
tc-testing: skbmod: fix match value of ethertype
mlxsw: spectrum_switchdev: Check success of FDB add operation
net: make skb_gso_*_seglen functions private
net: xfrm: use skb_gso_validate_network_len() to check gso sizes
net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
rds: Incorrect reference counting in TCP socket creation
net: ethtool: don't ignore return from driver get_fecparam method
vrf: check forwarding on the original netdevice when generating ICMP dest unreachable
...
Pull timer fixes from Thomas Gleixner:
"A small set of fixes from the timer departement:
- Add a missing timer wheel clock forward when migrating timers off a
unplugged CPU to prevent operating on a stale clock base and
missing timer deadlines.
- Use the proper shift count to extract data from a register value to
prevent evaluating unrelated bits
- Make the error return check in the FSL timer driver work correctly.
Checking an unsigned variable for less than zero does not really
work well.
- Clarify the confusing comments in the ARC timer code"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Forward timer base before migrating timers
clocksource/drivers/arc_timer: Update some comments
clocksource/drivers/mips-gic-timer: Use correct shift count to extract data
clocksource/drivers/fsl_ftm_timer: Fix error return checking
Make it easier to concatenate all the scheduler .c files for single-module
compilation.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge these two small .c modules as they implement two aspects
of idle task handling.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Do the following cleanups and simplifications:
- sched/sched.h already includes <asm/paravirt.h>, so no need to
include it in sched/core.c again.
- order the <linux/sched/*.h> headers alphabetically
- add all <linux/sched/*.h> headers to kernel/sched/sched.h
- remove all unnecessary includes from the .c files that
are already included in kernel/sched/sched.h.
Finally, make all scheduler .c files use a single common header:
#include "sched.h"
... which now contains a union of the relied upon headers.
This makes the various .c files easier to read and easier to handle.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull libnvdimm fixes from Dan Williams:
"A 4.16 regression fix, three fixes for -stable, and a cleanup fix:
- During the merge window support for the new ACPI NVDIMM Platform
Capabilities structure disabled support for "deep flush", a
force-unit- access like mechanism for persistent memory. Restore
that mechanism.
- VFIO like RDMA is yet one more memory registration / pinning
interface that is incompatible with Filesystem-DAX. Disable long
term pins of Filesystem-DAX mappings via VFIO.
- The Filesystem-DAX detection to prevent long terms pins mistakenly
also disabled Device-DAX pins which are not subject to the same
block- map collision concerns.
- Similar to the setup path, softlockup warnings can trigger in the
shutdown path for large persistent memory namespaces. Teach
for_each_device_pfn() to perform cond_resched() in all cases.
- Boaz noticed that the might_sleep() in dax_direct_access() is stale
as of the v4.15 kernel.
These have received a build success notification from the 0day robot,
and the longterm pin fixes have appeared in -next. However, I recently
rebased the tree to remove some other fixes that need to be reworked
after review feedback.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
memremap: fix softlockup reports at teardown
libnvdimm: re-enable deep flush for pmem devices via fsync()
vfio: disable filesystem-dax page pinning
dax: fix vma_is_fsdax() helper
dax: ->direct_access does not sleep anymore
A good number of small style inconsistencies have accumulated
in the scheduler core, so do a pass over them to harmonize
all these details:
- fix speling in comments,
- use curly braces for multi-line statements,
- remove unnecessary parentheses from integer literals,
- capitalize consistently,
- remove stray newlines,
- add comments where necessary,
- remove invalid/unnecessary comments,
- align structure definitions and other data types vertically,
- add missing newlines for increased readability,
- fix vertical tabulation where it's misaligned,
- harmonize preprocessor conditional block labeling
and vertical alignment,
- remove line-breaks where they uglify the code,
- add newline after local variable definitions,
No change in functionality:
md5:
1191fa0a890cfa8132156d2959d7e9e2 built-in.o.before.asm
1191fa0a890cfa8132156d2959d7e9e2 built-in.o.after.asm
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Fixed style error: Missing space before the open parenthesis
- Fixed style warnings: 2x Missing blank line after declaration
One warning left: else after return
(I don't feel comfortable fixing that without side effects)
Signed-off-by: Mario Leinweber <marioleinweber@web.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20180302182007.28691-1-marioleinweber@web.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The cond_resched() currently in the setup path needs to be duplicated in
the teardown path. Rather than require each instance of
for_each_device_pfn() to open code the same sequence, embed it in the
helper.
Link: https://github.com/intel/ixpdimm_sw/issues/11
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Fixes: 7138970383 ("mm, zone_device: Replace {get, put}_zone_device_page()...")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Since commit afcc90f862 ("usercopy: WARN() on slab cache usercopy
region violations"), MIPS systems booting with a compat root filesystem
emit a warning when copying compat siginfo to userspace:
WARNING: CPU: 0 PID: 953 at mm/usercopy.c:81 usercopy_warn+0x98/0xe8
Bad or missing usercopy whitelist? Kernel memory exposure attempt
detected from SLAB object 'task_struct' (offset 1432, size 16)!
Modules linked in:
CPU: 0 PID: 953 Comm: S01logging Not tainted 4.16.0-rc2 #10
Stack : ffffffff808c0000 0000000000000000 0000000000000001 65ac85163f3bdc4a
65ac85163f3bdc4a 0000000000000000 90000000ff667ab8 ffffffff808c0000
00000000000003f8 ffffffff808d0000 00000000000000d1 0000000000000000
000000000000003c 0000000000000000 ffffffff808c8ca8 ffffffff808d0000
ffffffff808d0000 ffffffff80810000 fffffc0000000000 ffffffff80785c30
0000000000000009 0000000000000051 90000000ff667eb0 90000000ff667db0
000000007fe0d938 0000000000000018 ffffffff80449958 0000000020052798
ffffffff808c0000 90000000ff664000 90000000ff667ab0 00000000100c0000
ffffffff80698810 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 ffffffff8010d02c 65ac85163f3bdc4a
...
Call Trace:
[<ffffffff8010d02c>] show_stack+0x9c/0x130
[<ffffffff80698810>] dump_stack+0x90/0xd0
[<ffffffff80137b78>] __warn+0x100/0x118
[<ffffffff80137bdc>] warn_slowpath_fmt+0x4c/0x70
[<ffffffff8021e4a8>] usercopy_warn+0x98/0xe8
[<ffffffff8021e68c>] __check_object_size+0xfc/0x250
[<ffffffff801bbfb8>] put_compat_sigset+0x30/0x88
[<ffffffff8011af24>] setup_rt_frame_n32+0xc4/0x160
[<ffffffff8010b8b4>] do_signal+0x19c/0x230
[<ffffffff8010c408>] do_notify_resume+0x60/0x78
[<ffffffff80106f50>] work_notifysig+0x10/0x18
---[ end trace 88fffbf69147f48a ]---
Commit 5905429ad8 ("fork: Provide usercopy whitelisting for
task_struct") noted that:
"While the blocked and saved_sigmask fields of task_struct are copied to
userspace (via sigmask_to_save() and setup_rt_frame()), it is always
copied with a static length (i.e. sizeof(sigset_t))."
However, this is not true in the case of compat signals, whose sigset
is copied by put_compat_sigset and receives size as an argument.
At most call sites, put_compat_sigset is copying a sigset from the
current task_struct. This triggers a warning when
CONFIG_HARDENED_USERCOPY is active. However, by marking this function as
static inline, the warning can be avoided because in all of these cases
the size is constant at compile time, which is allowed. The only site
where this is not the case is handling the rt_sigpending syscall, but
there the copy is being made from a stack local variable so does not
trigger the warning.
Move put_compat_sigset to compat.h, and mark it static inline. This
fixes the WARN on MIPS.
Fixes: afcc90f862 ("usercopy: WARN() on slab cache usercopy region violations")
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Dmitry V . Levin" <ldv@altlinux.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/18639/
Signed-off-by: James Hogan <jhogan@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2018-02-28
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Add schedule points and reduce the number of loop iterations
the test_bpf kernel module is performing in order to not hog
the CPU for too long, from Eric.
2) Fix an out of bounds access in tail calls in the ppc64 BPF
JIT compiler, from Daniel.
3) Fix a crash on arm64 on unaligned BPF xadd operations that
could be triggered via interpreter and JIT, from Daniel.
Please not that once you merge net into net-next at some point, there
is a minor merge conflict in test_verifier.c since test cases had
been added at the end in both trees. Resolution is trivial: keep all
the test cases from both trees.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull printk fix from Petr Mladek:
"Make sure that we wake up userspace loggers. This fixes a race
introduced by the console waiter logic during this merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk: Wake klogd when passing console_lock owner
On CPU hotunplug the enqueued timers of the unplugged CPU are migrated to a
live CPU. This happens from the control thread which initiated the unplug.
If the CPU on which the control thread runs came out from a longer idle
period then the base clock of that CPU might be stale because the control
thread runs prior to any event which forwards the clock.
In such a case the timers from the unplugged CPU are queued on the live CPU
based on the stale clock which can cause large delays due to increased
granularity of the outer timer wheels which are far away from base:;clock.
But there is a worse problem than that. The following sequence of events
illustrates it:
- CPU0 timer1 is queued expires = 59969 and base->clk = 59131.
The timer is queued at wheel level 2, with resulting expiry time = 60032
(due to level granularity).
- CPU1 enters idle @60007, with next timer expiry @60020.
- CPU0 is hotplugged at @60009
- CPU1 exits idle and runs the control thread which migrates the
timers from CPU0
timer1 is now queued in level 0 for immediate handling in the next
softirq because the requested expiry time 59969 is before CPU1 base->clk
60007
- CPU1 runs code which forwards the base clock which succeeds because the
next expiring timer. which was collected at idle entry time is still set
to 60020.
So it forwards beyond 60007 and therefore misses to expire the migrated
timer1. That timer gets expired when the wheel wraps around again, which
takes between 63 and 630ms depending on the HZ setting.
Address both problems by invoking forward_timer_base() for the control CPUs
timer base. All other places, which might run into a similar problem
(mod_timer()/add_timer_on()) already invoke forward_timer_base() to avoid
that.
[ tglx: Massaged comment and changelog ]
Fixes: a683f390b9 ("timers: Forward the wheel clock whenever possible")
Co-developed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: linux-arm-msm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180118115022.6368-1-clingutla@codeaurora.org
Surprisingly there is no simple way to see if the IRQ line in question
is wakeup source or not.
Note that wakeup might be an OOB (out-of-band) source like GPIO line
which makes things slightly more complicated.
Add a sysfs node to cover this case.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Lindgren <tony@atomide.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20180226155043.67937-1-andriy.shevchenko@linux.intel.com
Show unadorned pointers in lockdep reports - lockdep is a debugging
facility and hashing pointers there doesn't make a whole lotta sense.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20180226134926.23069-1-bp@alien8.de
wake_klogd is a local variable in console_unlock(). The information
is lost when the console_lock owner using the busy wait added by
the commit dbdda842fe ("printk: Add console owner and waiter
logic to load balance console writes"). The following race is
possible:
CPU0 CPU1
console_unlock()
for (;;)
/* calling console for last message */
printk()
log_store()
log_next_seq++;
/* see new message */
if (seen_seq != log_next_seq) {
wake_klogd = true;
seen_seq = log_next_seq;
}
console_lock_spinning_enable();
if (console_trylock_spinning())
/* spinning */
if (console_lock_spinning_disable_and_check()) {
printk_safe_exit_irqrestore(flags);
return;
console_unlock()
if (seen_seq != log_next_seq) {
/* already seen */
/* nothing to do */
Result: Nobody would wakeup klogd.
One solution would be to make a global variable from wake_klogd.
But then we would need to manipulate it under a lock or so.
This patch wakes klogd also when console_lock is passed to the
spinning waiter. It looks like the right way to go. Also userspace
should have a chance to see and store any "flood" of messages.
Note that the very late klogd wake up was a historic solution.
It made sense on single CPU systems or when sys_syslog() operations
were synchronized using the big kernel lock like in v2.1.113.
But it is questionable these days.
Fixes: dbdda842fe ("printk: Add console owner and waiter logic to load balance console writes")
Link: http://lkml.kernel.org/r/20180226155734.dzwg3aovqnwtvkoy@pathway.suse.cz
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>
Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Pull x86 fixes from Thomas Gleixner:
"Yet another pile of melted spectrum related changes:
- sanitize the array_index_nospec protection mechanism: Remove the
overengineered array_index_nospec_mask_check() magic and allow
const-qualified types as index to avoid temporary storage in a
non-const local variable.
- make the microcode loader more robust by properly propagating error
codes. Provide information about new feature bits after micro code
was updated so administrators can act upon.
- optimizations of the entry ASM code which reduce code footprint and
make the code simpler and faster.
- fix the {pmd,pud}_{set,clear}_flags() implementations to work
properly on paravirt kernels by removing the address translation
operations.
- revert the harmful vmexit_fill_RSB() optimization
- use IBRS around firmware calls
- teach objtool about retpolines and add annotations for indirect
jumps and calls.
- explicitly disable jumplabel patching in __init code and handle
patching failures properly instead of silently ignoring them.
- remove indirect paravirt calls for writing the speculation control
MSR as these calls are obviously proving the same attack vector
which is tried to be mitigated.
- a few small fixes which address build issues with recent compiler
and assembler versions"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
objtool, retpolines: Integrate objtool with retpoline support more closely
x86/entry/64: Simplify ENCODE_FRAME_POINTER
extable: Make init_kernel_text() global
jump_label: Warn on failed jump_label patching attempt
jump_label: Explicitly disable jump labels in __init code
x86/entry/64: Open-code switch_to_thread_stack()
x86/entry/64: Move ASM_CLAC to interrupt_entry()
x86/entry/64: Remove 'interrupt' macro
x86/entry/64: Move the switch_to_thread_stack() call to interrupt_entry()
x86/entry/64: Move ENTER_IRQ_STACK from interrupt macro to interrupt_entry
x86/entry/64: Move PUSH_AND_CLEAR_REGS from interrupt macro to helper function
x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
objtool: Add module specific retpoline rules
objtool: Add retpoline validation
objtool: Use existing global variables for options
x86/mm/sme, objtool: Annotate indirect call in sme_encrypt_execute()
x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
x86/paravirt, objtool: Annotate indirect calls
...