1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

23773 commits

Author SHA1 Message Date
Linus Torvalds
6c3c1eb3c3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Receive packet length needs to be adjust by 2 on RX to accomodate
    the two padding bytes in altera_tse driver.  From Vlastimil Setka.

 2) If rx frame is dropped due to out of memory in macb driver, we leave
    the receive ring descriptors in an undefined state.  From Punnaiah
    Choudary Kalluri

 3) Some netlink subsystems erroneously signal NLM_F_MULTI.  That is
    only for dumps.  Fix from Nicolas Dichtel.

 4) Fix mis-use of raw rt->rt_pmtu value in ipv4, one must always go via
    the ipv4_mtu() helper.  From Herbert Xu.

 5) Fix null deref in bridge netfilter, and miscalculated lengths in
    jump/goto nf_tables verdicts.  From Florian Westphal.

 6) Unhash ping sockets properly.

 7) Software implementation of BPF divide did 64/32 rather than 64/64
    bit divide.  The JITs got it right.  Fix from Alexei Starovoitov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
  ipv4: Missing sk_nulls_node_init() in ping_unhash().
  net: fec: Fix RGMII-ID mode
  net/mlx4_en: Schedule napi when RX buffers allocation fails
  netxen_nic: use spin_[un]lock_bh around tx_clean_lock
  net/mlx4_core: Fix unaligned accesses
  mlx4_en: Use correct loop cursor in error path.
  cxgb4: Fix MC1 memory offset calculation
  bnx2x: Delay during kdump load
  net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table
  net: dsa: Fix scope of eeprom-length property
  net: macb: Fix race condition in driver when Rx frame is dropped
  hv_netvsc: Fix a bug in netvsc_start_xmit()
  altera_tse: Correct rx packet length
  mlx4: Fix tx ring affinity_mask creation
  tipc: fix problem with parallel link synchronization mechanism
  tipc: remove wrong use of NLM_F_MULTI
  bridge/nl: remove wrong use of NLM_F_MULTI
  bridge/mdb: remove wrong use of NLM_F_MULTI
  net: sched: act_connmark: don't zap skb->nfct
  trivial: net: systemport: bcmsysport.h: fix 0x0x prefix
  ...
2015-05-01 20:51:04 -07:00
Linus Torvalds
4a152c3913 Power management and ACPI fixes for v4.1-rc2
- Fix for a regression in the cpuidle core introduced by one of
    the recent commits in the clockevents_notify() removal series
    that put a call to a function which had to be executed with
    disabled interrupts into a code path running with enabled
    interrupts (Rafael J Wysocki).
 
  - Fix for a build problem in ACPICA (with GCC 4.5) introduced by one
    of the recent ACPICA tools commits that added a duplicate typedef
    to one of the ACPICA's header files by mistake (Olaf Hering).
 
  - Fix for a regression in the ACPI SBS (Smart Battery Subsystem)
    driver introduced during the 3.18 development cycle causing the
    smart battery manager to be marked as not present when it should
    be marked as present (Chris Bainbridge).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJVQoeSAAoJEILEb/54YlRxuscP/3Vz98JY0P7sMLnxVU8RPBxW
 cBN7G7F1gZup66/yV3MzvYHbgOFwNdQ2aMVxN+qV7rVMw27uwHljVr3svB+/GS15
 TeyKxvFWZYwhPKdsNAoHdkGBsptAK4DdyA+N2wH3MZ4dd8HepxpBG6QvzaEt4s43
 wFhhFrzEP3HDjrPoF/7TwKbsSrFuU5/U5PTd3dIuukj6JAl0PWnjczuVsg1PRyCW
 9kvefs3U56s8GbWOhDC+QyQ3eYJ2y35j/XnCKFWwljcdnis6HpsKj+oxSG2o0K4o
 9LLL3MgnTQuFbenuFIJUsMdFFdFrFkEeiV1EAdKCzQcXdlipxvi2cPttdNWVd6eZ
 JobXyhq7iYUQK5E5Dp6TCokxhzOyjvTo7VmEkkkqLVmfeO0FnggUAduRMAIacYXS
 ZML45m4UtvCJRoQXXxyzwYQXFeRD/nJQlOavC95jGGqtAij0Xk4xqTauQKIiqiaE
 zobOJ5ZxJMP+njFfzyDxjm68LjobYm1fUJTpWzpUSExjEKImRcw/QMzw23MyHHUL
 IJ7hP8FcELglkZWk/0ZwoNhbwM5Q2qGX5WMtqyunNFcf83aBsCr8bSjOrm4zmuL7
 NWjjeXuYwzPuC7SnFqf2K/CCfdkrdnU1frKLTpOm8t8W0P4mr1ckYrPemNemr8sN
 e5GozK5m13T2gTT4+q6M
 =v78a
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "Three regression fixes this time, one for a recent regression in the
  cpuidle core affecting multiple systems, one for an inadvertently
  added duplicate typedef in ACPICA that breaks compilation with GCC 4.5
  and one for an ACPI Smart Battery Subsystem driver regression
  introduced during the 3.18 cycle (stable-candidate).

  Specifics:

   - Fix for a regression in the cpuidle core introduced by one of the
     recent commits in the clockevents_notify() removal series that put
     a call to a function which had to be executed with disabled
     interrupts into a code path running with enabled interrupts (Rafael
     J Wysocki)

   - Fix for a build problem in ACPICA (with GCC 4.5) introduced by one
     of the recent ACPICA tools commits that added a duplicate typedef
     to one of the ACPICA's header files by mistake (Olaf Hering)

   - Fix for a regression in the ACPI SBS (Smart Battery Subsystem)
     driver introduced during the 3.18 development cycle causing the
     smart battery manager to be marked as not present when it should be
     marked as present (Chris Bainbridge)"

* tag 'pm+acpi-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: Run tick_broadcast_exit() with disabled interrupts
  ACPI / SBS: Enable battery manager when present
  ACPICA: remove duplicate u8 typedef
2015-04-30 14:23:31 -07:00
Linus Torvalds
9dbbe3cfc3 Remove from guest code the handling of task migration during a
pvclock read; instead use the correct protocol in KVM.
 
 This removes the need for task migration notifiers in core
 scheduler code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJVQkUWAAoJEL/70l94x66DhfcH/A8RTHUOELtoy+v2weahn21m
 FFWEnEUlCWzYgmiddgFdlr6+ub386W3ryFsXKPqjrn/8LVv3yS7tK1NJF8d03LQw
 n7HtIsrF01E9UI8CIWO4S/mUxWQev6vEJ9NXtNrsJcRmhSeLaIZkPjTH8Zqyx4i9
 ZvG4731WHXmxvbJ03bfJU9Y8OwHXe55GMi614aTxPndVBGdvIRu2Oj6aTfQTeab/
 7tEujub0MKWp74a7eyNU4GItcvIAXZCQt2wMc5dN1VK3ma5FTOnHIOuhAb8mACFF
 qEeGhtxAnOf7W+s9J8i7zVBdA5MOS0vUKng361ZOVGDb0OLqcVADW7GpuTZfRAM=
 =2A7v
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm changes from Paolo Bonzini:
 "Remove from guest code the handling of task migration during a pvclock
  read; instead use the correct protocol in KVM.

  This removes the need for task migration notifiers in core scheduler
  code"

[ The scheduler people really hated the migration notifiers, so this was
  kind of required  - Linus ]

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86: pvclock: Really remove the sched notifier for cross-cpu migrations
  kvm: x86: fix kvmclock update protocol
2015-04-30 09:44:04 -07:00
David Howells
9c4249c8e0 modsign: change default key details
Change default key details to be more obviously unspecified.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-30 09:35:41 -07:00
Lai Jiangshan
042f7df15a workqueue: Allow modifying low level unbound workqueue cpumask
Allow to modify the low-level unbound workqueues cpumask through
sysfs. This is performed by traversing the entire workqueue list
and calling apply_wqattrs_prepare() on the unbound workqueues
with the new low level mask. Only after all the preparation are done,
we commit them all together.

Ordered workqueues are ignored from the low level unbound workqueue
cpumask, it will be handled in near future.

All the (default & per-node) pwqs are mandatorily controlled by
the low level cpumask. If the user configured cpumask doesn't overlap
with the low level cpumask, the low level cpumask will be used for the
wq instead.

The comment of wq_calc_node_cpumask() is updated and explicitly
requires that its first argument should be the attrs of the default
pwq.

The default wq_unbound_cpumask is cpu_possible_mask.  The workqueue
subsystem doesn't know its best default value, let the system manager
or the other subsystem set it when needed.

Changed from V8:
  merge the calculating code for the attrs of the default pwq together.
  minor change the code&comments for saving the user configured attrs.
  remove unnecessary list_del().
  minor update the comment of wq_calc_node_cpumask().
  update the comment of workqueue_set_unbound_cpumask();

Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-04-30 09:24:29 -04:00
Jiri Kosina
5d4351ba65 livepatch: x86: make kASLR logic more accurate
We give up old_addr hint from the coming patch module in cases when kernel load
base has been randomized (as in such case, the coming module has no idea about
the exact randomization offset).

We are currently too pessimistic, and give up immediately as soon as
CONFIG_RANDOMIZE_BASE is set; this doesn't however directly imply that the
load base has actually been randomized. There are config options that
disable kASLR (such as hibernation), user could have disabled kaslr on
kernel command-line, etc.

The loader propagates the information whether kernel has been randomized
through bootparams. This allows us to have the condition more accurate.

On top of that, it seems unnecessary to give up old_addr hints even if
randomization is active. The relocation offset can be computed using
kaslr_ofsset(), and therefore old_addr can be adjusted accordingly.

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-04-29 16:51:33 +02:00
Rafael J. Wysocki
df8d9eeadd cpuidle: Run tick_broadcast_exit() with disabled interrupts
Commit 335f49196f (sched/idle: Use explicit broadcast oneshot
control function) replaced clockevents_notify() invocations in
cpuidle_idle_call() with direct calls to tick_broadcast_enter()
and tick_broadcast_exit(), but it overlooked the fact that
interrupts were already enabled before calling the latter which
led to functional breakage on systems using idle states with the
CPUIDLE_FLAG_TIMER_STOP flag set.

Fix that by moving the invocations of tick_broadcast_enter()
and tick_broadcast_exit() down into cpuidle_enter_state() where
interrupts are still disabled when tick_broadcast_exit() is
called.  Also ensure that interrupts will be disabled before
running tick_broadcast_exit() even if they have been enabled by
the idle state's ->enter callback.  Trigger a WARN_ON_ONCE() in
that case, as we generally don't want that to happen for states
with CPUIDLE_FLAG_TIMER_STOP set.

Fixes: 335f49196f (sched/idle: Use explicit broadcast oneshot control function)
Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-29 15:19:21 +02:00
Linus Torvalds
14bc84ce0b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "One additional new feature for 4.1, a new PRNG based on SHA-512 for
  the zcrypt driver.

  Two memory management related changes, the page table reallocation for
  KVM is removed, and with file ptes gone the encoding of page table
  entries is improved.

  And three bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: Introduce new SHA-512 based Pseudo Random Generator.
  s390/mm: change swap pte encoding and pgtable cleanup
  s390/mm: correct transfer of dirty & young bits in __pmd_to_pte
  s390/bpf: add dependency to z196 features
  s390/3215: free memory in error path
  s390/kvm: remove delayed reallocation of page tables for KVM
  kexec: allocate the kexec control page with KEXEC_CONTROL_MEMORY_GFP
2015-04-28 09:58:46 -07:00
Alexei Starovoitov
876a7ae65b bpf: fix 64-bit divide
ALU64_DIV instruction should be dividing 64-bit by 64-bit,
whereas do_div() does 64-bit by 32-bit divide.
x64 and arm64 JITs correctly implement 64 by 64 unsigned divide.
llvm BPF backend emits code assuming that ALU64_DIV does 64 by 64.

Fixes: 89aa075832 ("net: sock: allow eBPF programs to be attached to sockets")
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-27 23:11:49 -04:00
Frederic Weisbecker
b05a79280b workqueue: Create low-level unbound workqueues cpumask
Create a cpumask that limits the affinity of all unbound workqueues.
This cpumask is controlled through a file at the root of the workqueue
sysfs directory.

It works on a lower-level than the per WQ_SYSFS workqueues cpumask files
such that the effective cpumask applied for a given unbound workqueue is
the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and
the new /sys/devices/virtual/workqueue/cpumask file.

This patch implements the basic infrastructure and the read interface.
wq_unbound_cpumask is initially set to cpu_possible_mask.

Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-04-27 11:13:40 -04:00
Lai Jiangshan
2d5f0764b5 workqueue: split apply_workqueue_attrs() into 3 stages
Current apply_workqueue_attrs() includes pwqs-allocation and pwqs-installation,
so when we batch multiple apply_workqueue_attrs()s as a transaction, we can't
ensure the transaction must succeed or fail as a complete unit.

To solve this, we split apply_workqueue_attrs() into three stages.
The first stage does the preparation: allocation memory, pwqs.
The second stage does the attrs-installaion and pwqs-installation.
The third stage frees the allocated memory and (old or unused) pwqs.

As the result, batching multiple apply_workqueue_attrs()s can
succeed or fail as a complete unit:
	1) batch do all the first stage for all the workqueues
	2) only commit all when all the above succeed.

This patch is a preparation for the next patch ("Allow modifying low level
unbound workqueue cpumask") which will do a multiple apply_workqueue_attrs().

The patch doesn't have functionality changed except two minor adjustment:
	1) free_unbound_pwq() for the error path is removed, we use the
	   heavier version put_pwq_unlocked() instead since the error path
	   is rare. this adjustment simplifies the code.
	2) the memory-allocation is also moved into wq_pool_mutex.
	   this is needed to avoid to do the further splitting.

tj: minor updates to comments.

Suggested-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-04-27 11:13:40 -04:00
Paolo Bonzini
73459e2a1a x86: pvclock: Really remove the sched notifier for cross-cpu migrations
This reverts commits 0a4e6be9ca
and 80f7fdb1c7.

The task migration notifier was originally introduced in order to support
the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
with synchronized TSC.  Hence, on KVM the race condition is only needed
due to a bad implementation on the host side, and even then it's so rare
that it's mostly theoretical.

As far as KVM is concerned it's possible to fix the host, avoiding the
additional complexity in the vDSO and the (re)introduction of the task
migration notifier.

Xen, on the other hand, hasn't yet implemented vsyscall support at
all, so we do not care about its plans for non-synchronized TSC.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-27 15:49:30 +02:00
Linus Torvalds
9ec3a646fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
 "d_inode() annotations from David Howells (sat in for-next since before
  the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RCU pathwalk breakage when running into a symlink overmounting something
  fix I_DIO_WAKEUP definition
  direct-io: only inc/dec inode->i_dio_count for file systems
  fs/9p: fix readdir()
  VFS: assorted d_backing_inode() annotations
  VFS: fs/inode.c helpers: d_inode() annotations
  VFS: fs/cachefiles: d_backing_inode() annotations
  VFS: fs library helpers: d_inode() annotations
  VFS: assorted weird filesystems: d_inode() annotations
  VFS: normal filesystems (and lustre): d_inode() annotations
  VFS: security/: d_inode() annotations
  VFS: security/: d_backing_inode() annotations
  VFS: net/: d_inode() annotations
  VFS: net/unix: d_backing_inode() annotations
  VFS: kernel/: d_inode() annotations
  VFS: audit: d_backing_inode() annotations
  VFS: Fix up some ->d_inode accesses in the chelsio driver
  VFS: Cachefiles should perform fs modifications on the top layer only
  VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-26 17:22:07 -07:00
Viresh Kumar
149aabcc44 clockevents: Shutdown detached clockevent device
A clockevent device is marked DETACHED when it is replaced by another
clockevent device.

The device is shutdown properly for drivers that implement legacy
->set_mode() callback, as we call ->set_mode() for CLOCK_EVT_MODE_UNUSED
as well.

But for the new per-state callback interface, we skip shutting down the
device, as we thought its an internal state change. That wasn't correct.

The effect is that the device is left programmed in oneshot or periodic
mode.

Fall-back to 'case CLOCK_EVT_STATE_SHUTDOWN', to shutdown the device.

Fixes: bd624d75db "clockevents: Introduce mode specific callbacks"
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/eef0a91c51b74d4e52c8e5a95eca27b5a0563f07.1428650683.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 21:01:48 +02:00
Roger Quadros
10a50f1ab5 genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip
Without this system suspend is broken on systems that have
drivers calling enable/disable_irq_wake() for interrupts based off
the dummy irq hook. (e.g. drivers/gpio/gpio-pcf857x.c)

Signed-off-by: Roger Quadros <rogerq@ti.com>
Cc: <cw00.choi@samsung.com>
Cc: <balbi@ti.com>
Cc: <tony@atomide.com>
Cc: Gregory Clement <gregory.clement@free-electrons.com>
Link: http://lkml.kernel.org/r/552E1DD3.4040106@ti.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-24 20:57:06 +02:00
Chen Hanxiao
d0f702e648 cgroup: fix some comment typos
s/effctive/effective
s/hierarhcy/hierarchy
s/shoulid/should

Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-04-23 11:09:36 -04:00
Martin Schwidefsky
7e01b5acd8 kexec: allocate the kexec control page with KEXEC_CONTROL_MEMORY_GFP
Introduce KEXEC_CONTROL_MEMORY_GFP to allow the architecture code
to override the gfp flags of the allocation for the kexec control
page. The loop in kimage_alloc_normal_control_pages allocates pages
with GFP_KERNEL until a page is found that happens to have an
address smaller than the KEXEC_CONTROL_MEMORY_LIMIT. On systems
with a large memory size but a small KEXEC_CONTROL_MEMORY_LIMIT
the loop will keep allocating memory until the oom killer steps in.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-04-23 16:52:01 +02:00
Thomas Gleixner
b484403b9a sched: debug: Remove the cfs bandwidth timer_active printout
The struct member is gone.

Reported-by: fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2015-04-23 13:58:09 +02:00
Linus Torvalds
27cf3a16b2 Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit
Pull audit fixes from Paul Moore:
 "Seven audit patches for v4.1, all bug fixes.

  The largest, and perhaps most significant commit helps resolve some
  memory pressure issues related to the inode cache and audit, there are
  also a few small commits which help resolve some timing issues with
  the audit log queue, and the rest fall into the always popular "code
  clean-up" category.

  In general, nothing really substantial, just a nice set of maintenance
  patches"

* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
  audit: Remove condition which always evaluates to false
  audit: reduce mmap_sem hold for mm->exe_file
  audit: consolidate handling of mm->exe_file
  audit: code clean up
  audit: don't reset working wait time accidentally with auditd
  audit: don't lose set wait time on first successful call to audit_log_start()
  audit: move the tree pruning to a dedicated thread
2015-04-22 14:49:23 -07:00
kbuild test robot
9183034879 perf: perf_mux_hrtimer_cancel() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150422200000.GA122603@lkp-sb04
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 22:06:34 +02:00
Linus Torvalds
4f2112351b This adds three fixes for the tracing code.
The first is a bug when ftrace_dump_on_oops is triggered in atomic context
 and function graph tracer is the tracer that is being reported.
 
 The second fix is bad parsing of the trace_events from the kernel
 command line, where it would ignore specific events if the system
 name is used with defining the event(it enables all events within the
 system).
 
 The last one is a fix to the TRACE_DEFINE_ENUM(), where a check was missing
 to see if the ptr was incremented to the end of the string, but the loop
 increments it again and can miss the nul delimiter to stop processing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVN7g7AAoJEEjnJuOKh9ldLCkIAJj6wsj59wR8aIxtxnD5bJZZ
 Y9dKkas6BOaUCGp0MVBCkpm3uMgh0uPO10jOuthgc3LgQy0piwKJrIzGkI5gcRuA
 JvZw2X08/jCSNu8BHydes2A0XMkZobMuWFAeWz3CSzNBfbI3sDDqgnRQ9eyMM66R
 +sPV7ZELQLK/ZFs93gFoaZe/OKGYJcamhMtG2v7p9h1qBJZUDakRFo478bAwu5SB
 Kh6xpXA76WnmkeQekAAsWGdqIQzoNy3IoVePmdhVpZvoiKLUrf1JWVYFoNkO39Ik
 kC1gqJro0EmHQFOo5rt8yQmqXjvSU0sS/sAW3HZ0Szl5OtiUNnag+Wt3ku7lr8U=
 =VORa
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "This adds three fixes for the tracing code.

  The first is a bug when ftrace_dump_on_oops is triggered in atomic
  context and function graph tracer is the tracer that is being
  reported.

  The second fix is bad parsing of the trace_events from the kernel
  command line, where it would ignore specific events if the system name
  is used with defining the event(it enables all events within the
  system).

  The last one is a fix to the TRACE_DEFINE_ENUM(), where a check was
  missing to see if the ptr was incremented to the end of the string,
  but the loop increments it again and can miss the nul delimiter to
  stop processing"

* tag 'trace-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix possible out of bounds memory access when parsing enums
  tracing: Fix incorrect enabling of trace events by boot cmdline
  tracing: Handle ftrace_dump() atomic context in graph_trace_open()
2015-04-22 11:27:36 -07:00
Linus Torvalds
15ce2658dd Quentin opened a can of worms by adding extable entry checking to modpost,
but most architectures seem fixed now.  Thanks to all involved.
 
 Last minute rebase because I noticed a "[PATCH]" had snuck into a commit
 message somehow.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVN1X3AAoJENkgDmzRrbjxmBEP/1Kv1wCrUGg07TfqNTJgr5Pe
 R81C+8P2wSEsA94lBhLNbn2DsN7uWsygcSXZMQ+5CrmbKAG45kRlMUzSlg6IdGXo
 usA9rOlH4hR+Dqn3zyx543xMJyp0a0LoSCLWGMmr/U71AXNp5kehvbIv64gJHrwg
 4CALI9qCwHH4FXF7WBYdLknlfpmD7KzCjIVckkfUfwf2A8mZXKdqBFQ+MPqbChEm
 vpFWkgp1oJgK4OcvvBeiNT2PwFZy1vmJtQWc2lieJ6HCjbqvwgM0g3EYTIQmccIm
 O40mEFMNkUuliMtrCZgOJBP2bmWy79JydiUq5Mu5Cb4bL6jGl2r6Z+w+iQ7Sggv9
 DTqTrk0/Cn02krJXDXAHVOXXgTqggf1xev6wr+3dbZn5sytQ8ddImuuVai8KALoC
 aEXAsnAK+3GnkAn9qqKVFhnrnCcI69I6srUAn5DKVWu36+Ye05FsZboIFYcxcK1F
 NZ+31xA17xA+yoo62ly/Auz7DXMa2QZfJcoQLMx2fMgL/+wMqzfGbcrZ5ZaVIj4w
 ePig1SbozAweAAfX7+W+QCQ3wqNN/nJgMO8vPldlcq7eaXH0/Dqu72cC2d82dK+H
 xl/1pEj83W0AkUkfNS8j3Z4zSzg31g8bK5SmPPRID00mx8roCfsP/Cfx5hRiO8U2
 ECtbKDxSyUWwF9vnOLVR
 =EYrW
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Quentin opened a can of worms by adding extable entry checking to
  modpost, but most architectures seem fixed now.  Thanks to all
  involved.

  Last minute rebase because I noticed a "[PATCH]" had snuck into a
  commit message somehow"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  modpost: don't emit section mismatch warnings for compiler optimizations
  modpost: expand pattern matching to support substring matches
  modpost: do not try to match the SHT_NUL section.
  modpost: fix extable entry size calculation.
  modpost: fix inverted logic in is_extable_fault_address().
  modpost: handle -ffunction-sections
  modpost: Whitelist .text.fixup and .exception.text
  params: handle quotes properly for values not of form foo="bar".
  modpost: document the use of struct section_check.
  modpost: handle relocations mismatch in __ex_table.
  scripts: add check_extable.sh script.
  modpost: mismatch_handler: retrieve tosym information only when needed.
  modpost: factorize symbol pretty print in get_pretty_name().
  modpost: add handler function pointer to sectioncheck.
  modpost: add .sched.text and .kprobes.text to the TEXT_SECTIONS list.
  modpost: add strict white-listing when referencing sections.
  module: do not print allocation-fail warning on bogus user buffer size
  kernel/module.c: fix typos in message about unused symbols
2015-04-22 09:49:24 -07:00
Peter Zijlstra
272325c482 perf: Fix mux_interval hrtimer wreckage
Thomas stumbled over the hrtimer_forward_now() in
perf_event_mux_interval_ms_store() and noticed its broken-ness.

You cannot just change the expiry time of an active timer, it will
destroy the red-black tree order and cause havoc.

Change it to (re)start the timer instead, (re)starting a timer will
dequeue and enqueue a timer and therefore preserve rb-tree order.

Since we cannot enqueue remotely, wrap the thing in
cpu_function_call(), this however mandates that we restrict ourselves
to online cpus. Also serialize the entire setting so we don't get
multiple concurrent threads trying to update to different values.

Also fix a problem in perf_mux_hrtimer_restart(), checking against
hrtimer_active() can actually loose us the timer when timer->state ==
HRTIMER_STATE_CALLBACK and the callback has already decided NORESTART.

Furthermore it doesn't make any sense to test
hrtimer_callback_running() when we already tested hrtimer_active(),
but with the above change, we explicitly must call it when
callback_running.

Lastly, rename a few functions:

  s/perf_cpu_hrtimer_/perf_mux_hrtimer_/ -- because I could not find
                                            the mux timer function

  s/\<hr\>/timer/ -- because that's the normal way of calling things.

Fixes: 62b8563979 ("perf: Add sysfs entry to adjust multiplexing interval per PMU")
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150415095011.863052571@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:12:22 +02:00
Peter Zijlstra
77a4d1a1b9 sched: Cleanup bandwidth timers
Roman reported a 3 cpu lockup scenario involving __start_cfs_bandwidth().

The more I look at that code the more I'm convinced its crack, that
entire __start_cfs_bandwidth() thing is brain melting, we don't need to
cancel a timer before starting it, *hrtimer_start*() will happily remove
the timer for you if its still enqueued.

Removing that, removes a big part of the problem, no more ugly cancel
loop to get stuck in.

So now, if I understand things right, the entire reason you have this
cfs_b->lock guarded ->timer_active nonsense is to make sure we don't
accidentally lose the timer.

It appears to me that it should be possible to guarantee that same by
unconditionally (re)starting the timer when !queued. Because regardless
what hrtimer::function will return, if we beat it to (re)enqueue the
timer, it doesn't matter.

Now, because hrtimers don't come with any serialization guarantees we
must ensure both handler and (re)start loop serialize their access to
the hrtimer to avoid both trying to forward the timer at the same
time.

Update the rt bandwidth timer to match.

This effectively reverts: 09dc4ab039 ("sched/fair: Fix
tg_set_cfs_bandwidth() deadlock on rq->lock").

Reported-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/20150415095011.804589208@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:53 +02:00
Peter Zijlstra
5de2755c8c hrtimer: Allow concurrent hrtimer_start() for self restarting timers
Because we drop cpu_base->lock around calling hrtimer::function, it is
possible for hrtimer_start() to come in between and enqueue the timer.

If hrtimer::function then returns HRTIMER_RESTART we'll hit the BUG_ON
because HRTIMER_STATE_ENQUEUED will be set.

Since the above is a perfectly valid scenario, remove the BUG_ON and
make the enqueue_hrtimer() call conditional on the timer not being
enqueued already.

NOTE: in that concurrent scenario its entirely common for both sites
to want to modify the hrtimer, since hrtimers don't provide
serialization themselves be sure to provide some such that the
hrtimer::function and the hrtimer_start() caller don't both try and
fudge the expiration state at the same time.

To that effect, add a WARN when someone tries to forward an already
enqueued timer, the most common way to change the expiry of self
restarting timers. Ideally we'd put the WARN in everything modifying
the expiry but most of that is inlines and we don't need the bloat.

Fixes: 2d44ae4d71 ("hrtimer: clean up cpu->base locking tricks")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/20150415113105.GT5029@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:52 +02:00
Thomas Gleixner
2ad5d3272d timer: Put usleep_range into the __sched section
do_usleep_range() and schedule_hrtimeout_range() are __sched as
well. So it makes no sense to have the exported function in a
different section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203503.833709502@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:52 +02:00
Thomas Gleixner
6deba083e1 timer: Remove pointless return value of do_usleep_range()
The only user ignores it anyway and rightfully so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203503.756060258@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:52 +02:00
Thomas Gleixner
19d9f4225d hrtimer: Avoid locking in hrtimer_cancel() if timer not active
We can do a lockless check for hrtimer_active before actually taking
the lock in hrtimer[_try_to]_cancel. This is useful for hotpath users
like nanosleep as they avoid the lock dance when the timer has
expired.

This is safe because active is true when the timer is enqueued or the
callback is running. Taking the hrtimer base lock does not protect
against concurrent hrtimer_start calls, the callsite has to do the
proper serialization itself.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203503.580273114@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:52 +02:00
Thomas Gleixner
61699e1307 hrtimer: Remove hrtimer_start() return value
No user was ever interested whether the timer was active or not when
it was started. All abusers of the return value are gone, so get rid
of it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203503.483556394@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:52 +02:00
Thomas Gleixner
b8a62f1ff0 tick: broadcast-hrtimer: Remove overly clever return value abuse
The assignment of bc_moved in the conditional construct relies on the
fact that in the case of hrtimer_start() invocation the return value
is always 0. It took me a while to understand it.

We want to get rid of the hrtimer_start() return value. Open code the
logic which makes it readable as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203503.404751457@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:52 +02:00
Thomas Gleixner
b193217e6d alarmtimer: Get rid of unused return value
We want to get rid of the hrtimer_start() return value and the alarm
timer return value is nowhere used. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20150414203503.243910615@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:52 +02:00
Thomas Gleixner
ccdd92c17e rtmutex: Remove bogus hrtimer_active() check
The check for hrtimer_active() after starting the timer is
pointless. If the timer is inactive it has expired already and
therefor the task pointer is already NULL.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203503.081830481@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Thomas Gleixner
2e4b0d3fe8 futex: Remove bogus hrtimer_active() check
The check for hrtimer_active() after starting the timer is
pointless. If the timer is inactive it has expired already and
therefor the task pointer is already NULL.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.985825453@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Thomas Gleixner
3f7b349ac1 hrtimer: Remove bogus hrtimer_active() check
The check for hrtimer_active() after starting the timer is
pointless. If the timer is inactive it has expired already and
therefor the task pointer is already NULL.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.907149271@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Thomas Gleixner
02a171af1a hrtimer: Make hrtimer_start() a inline wrapper
No point for an extra export just to set the extra argument of
hrtimer_start_range_ns() to 0.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.808544539@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Thomas Gleixner
58f1f803f1 hrtimer: Get rid of __hrtimer_start_range_ns()
No more callers. Remove the leftovers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.707871492@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Thomas Gleixner
cc9684d3c1 sched: deadline: Use hrtimer_start()
hrtimer_start() does not longer defer already expired timers to the
softirq. Get rid of the __hrtimer_start_range_ns() invocation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.627353666@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Thomas Gleixner
4961b6e118 sched: core: Use hrtimer_start[_expires]()
hrtimer_start() now enforces a timer interrupt when an already expired
timer is enqueued.

Get rid of the __hrtimer_start_range_ns() invocations and the loops
around it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.531131739@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Thomas Gleixner
3497d206c4 perf: core: Use hrtimer_start()
hrtimer_start() does not longer defer already expired timers to the
softirq. Get rid of the __hrtimer_start_range_ns() invocation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.452104213@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Thomas Gleixner
c1ad348b45 tick: Nohz: Rework next timer evaluation
The evaluation of the next timer in the nohz code is based on jiffies
while all the tick internals are nano seconds based. We have also to
convert hrtimer nanoseconds to jiffies in the !highres case. That's
just wrong and introduces interesting corner cases.

Turn it around and convert the next timer wheel timer expiry and the
rcu event to clock monotonic and base all calculations on
nanoseconds. That identifies the case where no timer is pending
clearly with an absolute expiry value of KTIME_MAX.

Makes the code more readable and gets rid of the jiffies magic in the
nohz code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
157d29e101 tick: Sched: Restructure code
Get rid of one indentation level. Preparatory patch for a major
rework. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203502.101563235@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
0ff53d0964 tick: sched: Force tick interrupt and get rid of softirq magic
We already got rid of the hrtimer reprogramming loops and hoops as
hrtimer now enforces an interrupt if the enqueued time is in the past.

Do the same for the nohz non highres mode. That gets rid of the need
to raise the softirq which only serves the purpose of getting the
machine out of the inner idle loop.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203502.023464878@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
afc08b15cc tick: sched: Remove hrtimer_active() checks
hrtimer_start() enforces a timer interrupt if the timer is already
expired. Get rid of the checks and the forward loop.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203501.943658239@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
c6eb3f70d4 hrtimer: Get rid of hrtimer softirq
hrtimer softirq is a leftover from the initial implementation and
serves only the purpose to handle the enqueueing of already expired
timers in the high resolution timer mode. We discussed whether we
change the return value and force all start sites to handle that the
timer is already expired, but that would be a Herculean task and I'm
not sure whether its a good idea to enforce that handling on
everyone.

A simpler solution is to enforce a timer interrupt instead of raising
and scheduling a softirq. Just use the existing infrastructure to do
so and remove all the softirq leftovers.

The HRTIMER softirq enum is now unused, but kept around because trace
parsers rely on the existing numbering.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.840834708@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
895bdfa793 hrtimer: Keep pointer to first timer and simplify __remove_hrtimer()
__remove_hrtimer() needs to evaluate the expiry time to figure out
whether the timer which is removed is eventually the first expiring
timer on the cpu. Keep a pointer to it, which is lazily updated, so we
can avoid the evaluation dance and retrieve the information from there.

Generates slightly better code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.752838019@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
b97f44c9b6 hrtimer: Make use of timerqueue_add/del return values
Use the return value instead of reevaluating the information.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.658152945@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Thomas Gleixner
34aee88a02 hrtimer: Use cpu_base->active_base for hotpath iterators
The active_bases field is guaranteed to be in sync with the timerqueue
of the corresponding clock base. So we can use it for iterating over
the clock bases. This allows to break out early if no more active
clock bases are available and avoids touching the cache lines of
inactive clock bases.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.322887675@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:49 +02:00
Thomas Gleixner
e19ffe8be2 hrtimer: Use bits for various boolean indicators
No point in wasting 12 byte storage space. Generates better code as well.

Text size reduction:
       x8664 -64, i386 -16, ARM -132, ARM64 -0, power64 -48

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.227955358@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:49 +02:00
Thomas Gleixner
868a3e915f hrtimer: Make offset update smarter
On every tick/hrtimer interrupt we update the offset variables of the
clock bases. That's silly because these offsets change very seldom.

Add a sequence counter to the time keeping code which keeps track of
the offset updates (clock_was_set()). Have a sequence cache in the
hrtimer cpu bases to evaluate whether the offsets must be updated or
not. This allows us later to avoid pointless cacheline pollution.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20150414203501.132820245@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
2015-04-22 17:06:49 +02:00
Thomas Gleixner
21d6d52a1b hrtimer: Get rid of softirq time
The softirq time field in the clock bases is an optimization from the
early days of hrtimers. It provides a coarse "jiffies" like time
mostly for self rearming timers.

But that comes with a price:
    - Larger code size
    - Extra storage space
    - Duplicated functions with really small differences
   
The benefit of this is optimization is marginal for contemporary
systems.

Consolidate everything on the high resolution timer
implementation. This makes further optimizations possible.

Text size reduction:
       x8664 -95, i386 -356, ARM -148, ARM64 -40, power64 -16

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.039977424@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:49 +02:00