There are cases where an RCU callback needs to be bounced to a sleepable
context. This is currently done by the RCU callback queueing a work
item, which can be cumbersome to write and confusing to read.
This patch introduces rcu_work, a workqueue work variant which gets
executed after an RCU grace period, and converts the open coded
bouncing in fs/aio and kernel/cgroup.
v3: Dropped queue_rcu_work_on(). Documented rcu grace period behavior
after queue_rcu_work().
v2: Use rcu_barrier() instead of synchronize_rcu() to wait for
completion of previously queued rcu callback as per Paul.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Dealing with 'struct timeval' users in the y2038 series is a bit tricky:
We have two definitions of timeval that are visible to user space,
one comes from glibc (or some other C library), the other comes from
linux/time.h. The kernel copy is what we want to be used for a number of
structures defined by the kernel itself, e.g. elf_prstatus (used in core
dumps), sysinfo and rusage (used in system calls). These generally tend
to be used for passing time intervals rather than absolute (epoch-based)
times, so they do not suffer from the y2038 overflow. Some of them
could be changed to use 64-bit timestamps by creating new system calls,
others like the core files cannot easily be changed.
An application using these interfaces likely also uses gettimeofday()
or other interfaces that use absolute times, and passes 'struct timeval'
pointers directly into kernel interfaces, so glibc must redefine its
timeval based on a 64-bit time_t when it introduces its y2038-safe
interfaces.
The only reasonable way forward I see is to remove the 'timeval'
definition from the kernel's uapi headers, and change the interfaces that
we do not want to (or cannot) duplicate for 64-bit times to use a new
__kernel_old_timeval definition instead. This type should be avoided
for all new interfaces (those can use 64-bit nanoseconds, or the 64-bit
version of timespec instead), and should be used with great care when
converting existing interfaces from timeval, to be sure they don't suffer
from the y2038 overflow, and only with consensus for the particular user
that using __kernel_old_timeval is better than moving to a 64-bit based
interface. The structure name is intentionally chosen to not conflict
with user space types, and to be ugly enough to discourage its use.
Note that ioctl based interfaces that pass a bare 'timeval' pointer
cannot change to '__kernel_old_timeval' because the user space source
code refers to 'timeval' instead, and we don't want to modify the user
space sources if possible. However, any application that relies on a
structure to contain an embedded 'timeval' (e.g. by passing a pointer
to the member into a function call that expects a timeval pointer) is
broken when that structure gets converted to __kernel_old_timeval. I
don't see any way around that, and we have to rely on the compiler to
produce a warning or compile failure that will alert users when they
recompile their sources against a new libc.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20180315161739.576085-1-arnd@arndb.de
Pull x86/pti updates from Thomas Gleixner:
"Another set of melted spectrum updates:
- Iron out the last late microcode loading issues by actually
checking whether new microcode is present and preventing the CPU
synchronization from running into a timeout induced hang.
- Remove Skylake C2 from the microcode blacklist according to the
latest Intel documentation
- Fix the VM86 POPF emulation which traps if VIP is set, but VIF is
not. Enhance the selftests to catch that kind of issue
- Annotate indirect calls/jumps for objtool on 32bit. This is not a
functional issue, but for consistency's sake it's the right thing to
do.
- Fix a jump label build warning observed on SPARC64 which uses 32bit
storage for the code location which is cast to a 64bit pointer w/o
extending it to 64bit first.
- Add two new cpufeature bits. Not really an urgent issue, but
provides them for both x86 and x86/kvm work. No impact on the
current kernel"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Fix CPU synchronization routine
x86/microcode: Attempt late loading only when new microcode is present
x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklist
jump_label: Fix sparc64 warning
x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32-bit kernels
x86/vm86/32: Fix POPF emulation
selftests/x86/entry_from_vm86: Add test cases for POPF
selftests/x86/entry_from_vm86: Exit with 1 if we fail
x86/cpufeatures: Add Intel PCONFIG cpufeature
x86/cpufeatures: Add Intel Total Memory Encryption cpufeature
When perf_group_detach() is called on a group leader, it updates each
sibling's group_leader field to point to that sibling, effectively
upgrading each sibling to a group leader. After perf_group_detach() has
completed, the caller may free the leader event.
We only remove siblings from the group leader's sibling_list when the
leader has a non-empty group_node. This was fine prior to commit:
8343aae661 ("perf/core: Remove perf_event::group_entry")
... as the sibling's sibling_list would be empty. However, now that we
use the sibling_list field as both the list head and the list entry,
this leaves each sibling with a non-empty sibling list, including the
stale leader event.
If perf_group_detach() is subsequently called on a sibling, it will
appear to be a group leader, and we'll walk the sibling_list,
potentially dereferencing these stale events. In 0day testing, this has
been observed to result in kernel panics.
Let's avoid this by always removing siblings from the sibling list when
we promote them to leaders.
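The effect of always unlinking can be illustrated with a small user space
sketch. The list helpers below only mimic the kernel's list_head API, and
struct event_sketch and detach_leader() are hypothetical stand-ins, not the
actual perf code:

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink @entry and leave it as an empty list of its own. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

/* Hypothetical stand-in for struct perf_event. */
struct event_sketch {
	struct event_sketch *group_leader;
	struct list_head sibling_list;	/* head on a leader, entry on a sibling */
};

/* Promote every sibling of @leader to a leader of its own. */
static void detach_leader(struct event_sketch *leader)
{
	struct list_head *pos = leader->sibling_list.next;

	while (pos != &leader->sibling_list) {
		struct list_head *next = pos->next;
		struct event_sketch *sibling = (struct event_sketch *)
			((char *)pos - offsetof(struct event_sketch, sibling_list));

		sibling->group_leader = sibling;
		/* Always unlink, so no stale pointer to @leader survives. */
		list_del_init(&sibling->sibling_list);
		pos = next;
	}
}
```

After detach_leader() returns, every promoted sibling has an empty
sibling_list, so a later detach on it has nothing stale to walk.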
Fixes: 8343aae661 ("perf/core: Remove perf_event::group_entry")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: vincent.weaver@maine.edu
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: valery.cherepennikov@intel.com
Cc: linux-tip-commits@vger.kernel.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: alexander.shishkin@linux.intel.com
Cc: davidcc@google.com
Cc: kan.liang@intel.com
Cc: Dmitry.Prohorov@intel.com
Cc: Jiri Olsa <jolsa@redhat.com>
Link: https://lkml.kernel.org/r/20180316131741.3svgr64yibc6vsid@lakrids.cambridge.arm.com
The CONFIG_MPU option was only defined on blackfin, and that architecture
is now being removed, so the respective code can be simplified.
A lot of other microcontrollers have an MPU, but I suspect that if we
want to bring that support back, we'd do it differently anyway.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The cpuhp_is_ap_state() function is no longer called outside of the
CONFIG_SMP #ifdef section, causing a harmless warning:
kernel/cpu.c:129:13: error: 'cpuhp_is_ap_state' defined but not used [-Werror=unused-function]
This moves the function into the #ifdef to get a clean build again.
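The pattern is roughly the following. This is a simplified user space sketch:
SKETCH_SMP stands in for CONFIG_SMP, and the state names and both helpers are
made up for illustration:

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_SMP; remove it to build the UP variant. */
#define SKETCH_SMP 1

enum sketch_state {
	SKETCH_OFFLINE,
	SKETCH_BRINGUP_CPU,
	SKETCH_AP_ONLINE,
	SKETCH_ONLINE,
};

#ifdef SKETCH_SMP
/* Only called from SMP code, so it is defined inside the same #ifdef. */
static bool sketch_is_ap_state(enum sketch_state state)
{
	return state > SKETCH_BRINGUP_CPU;
}

static bool sketch_needs_ap_thread(enum sketch_state state)
{
	return sketch_is_ap_state(state);
}
#else
static bool sketch_needs_ap_thread(enum sketch_state state)
{
	(void)state;
	return false;	/* no AP threads without SMP */
}
#endif
```

With the helper inside the #ifdef, the non-SMP build never sees an unused
static function.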
Fixes: 17a2f1ced0 ("cpu/hotplug: Merge cpuhp_bp_states and cpuhp_ap_states")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20180315153829.3819606-1-arnd@arndb.de
The dump_stack related code belongs in lib/dump_stack.c, so move it
there. Also compile lib/dump_stack.c conditionally, since the dump_stack
code does not make sense if printk is disabled.
Link: http://lkml.kernel.org/r/20180213072834.GA24784@dhcp-128-65.nay.redhat.com
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Currently, bpf stackmap stores an address for each entry in the call trace.
To map these addresses to user space files, it is necessary to maintain
the mapping from these virtual addresses to symbols in the binary. Usually,
the user space profiler (such as perf) has to scan /proc/pid/maps at the
beginning of profiling, and monitor mmap2() calls afterwards. Given the
cost of maintaining the address map, this solution is not practical for
system wide profiling that is always on.
This patch tries to solve this problem with a variation of stackmap. This
variation is enabled by flag BPF_F_STACK_BUILD_ID. Instead of storing
addresses, the variation stores ELF file build_id + offset.
Build ID is a 20-byte unique identifier for ELF files. The following
command shows the Build ID of /bin/bash:
[user@]$ readelf -n /bin/bash
...
Build ID: XXXXXXXXXX
...
With BPF_F_STACK_BUILD_ID, bpf_get_stackid() tries to parse Build ID
for each entry in the call trace, and translate it into the following
struct:
struct bpf_stack_build_id_offset {
__s32 status;
unsigned char build_id[BPF_BUILD_ID_SIZE];
union {
__u64 offset;
__u64 ip;
};
};
The search of build_id is limited to the first page of the file, and this
page should be in page cache. Otherwise, we fall back to storing ip for this
entry (ip field in struct bpf_stack_build_id_offset). This requires the
build_id to be stored in the first page. A quick survey of binary and
dynamic library files in a few different systems shows that almost all
binary and dynamic library files have build_id in the first page.
Build_id is only meaningful for user stack. If a kernel stack is added to
a stackmap with BPF_F_STACK_BUILD_ID, it will automatically fall back to
only storing ip (status == BPF_STACK_BUILD_ID_IP). Similarly, if build_id
lookup fails for some reason, it will also fall back to storing ip.
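A consumer of such entries has to look at the status field before picking a
union member. A minimal user space sketch, assuming local copies of the
struct and constants (the numeric status values and format_entry() are
assumptions of this sketch, not taken from the uapi header):

```c
#include <stdint.h>
#include <stdio.h>

#define BPF_BUILD_ID_SIZE 20

/* Status values: numeric assignments are assumptions of this sketch. */
enum {
	BPF_STACK_BUILD_ID_EMPTY = 0,	/* unused slot */
	BPF_STACK_BUILD_ID_VALID = 1,	/* build_id + offset are valid */
	BPF_STACK_BUILD_ID_IP    = 2,	/* fell back to a plain ip */
};

/* Local copy of the layout described above, using stdint types. */
struct bpf_stack_build_id_offset {
	int32_t status;
	unsigned char build_id[BPF_BUILD_ID_SIZE];
	union {
		uint64_t offset;
		uint64_t ip;
	};
};

/* Hypothetical helper: render one entry as text into @buf. */
static void format_entry(const struct bpf_stack_build_id_offset *e,
			 char *buf, size_t len)
{
	char hex[2 * BPF_BUILD_ID_SIZE + 1];
	int i;

	switch (e->status) {
	case BPF_STACK_BUILD_ID_VALID:
		/* Two hex digits per build ID byte, then the file offset. */
		for (i = 0; i < BPF_BUILD_ID_SIZE; i++)
			snprintf(hex + 2 * i, 3, "%02x", e->build_id[i]);
		snprintf(buf, len, "%s+0x%llx", hex,
			 (unsigned long long)e->offset);
		break;
	case BPF_STACK_BUILD_ID_IP:
		snprintf(buf, len, "ip 0x%llx", (unsigned long long)e->ip);
		break;
	default:
		snprintf(buf, len, "(empty)");
		break;
	}
}
```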
User space can access struct bpf_stack_build_id_offset with bpf
syscall BPF_MAP_LOOKUP_ELEM. It is necessary for user space to
maintain mapping from build id to binary files. This mostly static
mapping is much easier to maintain than per process address maps.
Note: Stackmap with build_id only works in non-nmi context at this time.
This is because we need to take mm->mmap_sem for find_vma(). If this
changes, we would like to allow build_id lookup in nmi context.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Al Viro reviewed the filter logic of ftrace trace events and found it to be
very troubling. It creates a binary tree based on the logic operators and
walks it during tracing. He sent Tom Zanussi and me a long explanation
(and formal proof) of how to do the string parsing better and end up with a
program array that can be simply iterated to come up with the correct
results.
I took his ideas and his pseudo code and rewrote the filter logic based on
them. In doing so, I was able to remove a lot of code, and have a much more
condensed filter logic in the process. I wrote a very long comment
describing the methodology that Al proposed in my own words. For more info
on how this works, read the comment above predicate_parse().
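To give a rough idea of what iterating a program array looks like, here is a
user space sketch. The entry layout, the terminal-entry convention and all
names are assumptions made for this illustration; the authoritative
description is the comment above predicate_parse():

```c
#include <stddef.h>

/* One step of the flattened filter program (hypothetical layout). */
struct prog_entry {
	int (*pred)(int val);	/* NULL marks a terminal entry */
	int when_to_branch;	/* branch if the predicate result equals this */
	int target;		/* index to jump to; result for a terminal entry */
};

static int gt10(int val) { return val > 10; }
static int lt20(int val) { return val < 20; }
static int eq42(int val) { return val == 42; }

/* Program for: (val > 10 && val < 20) || val == 42 */
static const struct prog_entry sketch_prog[] = {
	{ gt10, 0, 2 },	/* false: the && fails, try the right side of || */
	{ lt20, 1, 3 },	/* true: whole filter is true */
	{ eq42, 0, 4 },	/* false: whole filter is false */
	{ NULL, 0, 1 },	/* terminal: result true */
	{ NULL, 0, 0 },	/* terminal: result false */
};

/* Walk the program: branch on a match, otherwise fall through. */
static int run_prog(const struct prog_entry *prog, int val)
{
	int i = 0;

	while (prog[i].pred) {
		int match = prog[i].pred(val) != 0;

		if (match == prog[i].when_to_branch)
			i = prog[i].target;
		else
			i++;
	}
	return prog[i].target;
}
```

No tree is walked at match time; each record only costs a short forward scan
over the array.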
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The pred_funcs_##type arrays consist of five functions that are assigned
based on the ops. The array must be in the same order of the ops each
function represents. The PRED_FUNC_START macro denotes the op enum that
starts the op that maps to the pred_funcs_##type arrays. This is all very
subtle and prone to bugs if the code is changed.
Add comments describing how PRED_FUNC_START and the pred_funcs_##type arrays
are used, and also add a PRED_FUNC_MAX that is the maximum number of functions
in the arrays.
Clean up select_comparison_fn(), which assigns the predicates from the
pred_funcs_##type arrays, and add protection in case an op is
passed in that does not map correctly to the array.
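The idea of the protection, as a user space sketch. The op list, the way
PRED_FUNC_MAX is computed here and select_fn() are assumptions for
illustration and may differ from the actual kernel definitions:

```c
#include <stddef.h>

/* Hypothetical op list; only OP_LE..OP_BAND map into the function table. */
enum filter_op {
	OP_AND,
	OP_OR,
	OP_LE,		/* first op with an entry in the table */
	OP_LT,
	OP_GE,
	OP_GT,
	OP_BAND,	/* last op with an entry in the table */
	OP_MAX,
};

#define PRED_FUNC_START	OP_LE
#define PRED_FUNC_MAX	(OP_BAND - PRED_FUNC_START + 1)

typedef int (*pred_fn)(long val, long ref);

static int pred_le(long val, long ref)   { return val <= ref; }
static int pred_lt(long val, long ref)   { return val < ref; }
static int pred_ge(long val, long ref)   { return val >= ref; }
static int pred_gt(long val, long ref)   { return val > ref; }
static int pred_band(long val, long ref) { return (val & ref) != 0; }

/* Must stay in the same order as the ops from PRED_FUNC_START onwards. */
static const pred_fn pred_funcs[PRED_FUNC_MAX] = {
	pred_le, pred_lt, pred_ge, pred_gt, pred_band,
};

/* Return the comparison function for @op, or NULL if @op has none. */
static pred_fn select_fn(int op)
{
	int idx = op - PRED_FUNC_START;

	if (idx < 0 || idx >= PRED_FUNC_MAX)
		return NULL;
	return pred_funcs[idx];
}
```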
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Instead of having a separate enum that is the index into another array, like
a string array, make a single macro that combines them into a single list,
and then the two cannot get out of sync. This makes it easier to add and
remove items.
The macro trick is:
#define DOGS \
C( JACK, "Jack Russell") \
C( ITALIAN, "Italian Greyhound") \
C( GERMAN, "German Shepherd")
#undef C
#define C(a, b) a,
enum { DOGS };
#undef C
#define C(a, b) b,
static const char *dogs[] = { DOGS };
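A self-contained user space version of the trick, for reference. Note that
each expansion of C() ends in a comma so that the list entries are separated.
The names dog_id, DOG_MAX, dog_names and dog_name() are illustrative
assumptions and not taken from the kernel sources:

```c
#include <stddef.h>

/* Single source of truth: each entry pairs an enum name with its string. */
#define DOGS \
	C(JACK,    "Jack Russell")      \
	C(ITALIAN, "Italian Greyhound") \
	C(GERMAN,  "German Shepherd")

/* First expansion: keep only the enum names. */
#undef C
#define C(a, b) a,
enum dog_id { DOGS DOG_MAX };

/* Second expansion: keep only the strings, in the same order. */
#undef C
#define C(a, b) b,
static const char *const dog_names[] = { DOGS };

/* Hypothetical helper: map an id to its name, NULL if out of range. */
static const char *dog_name(int id)
{
	if (id < 0 || id >= DOG_MAX)
		return NULL;
	return dog_names[id];
}
```

Adding a dog now means adding a single C() line; the enum and the string
table follow automatically.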
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
cpuhp_bp_states and cpuhp_ap_states have different sets of steps without any
conflicting steps, so they can be merged.
The original `[CPUHP_BRINGUP_CPU] = { },` is removed, because the new
cpuhp_hp_states has CPUHP_ONLINE index which is larger than
CPUHP_BRINGUP_CPU.
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20171201135008.21633-1-jiangshanlai@gmail.com
The kbuild test robot reported the following warning on sparc64:
kernel/jump_label.c: In function '__jump_label_update':
kernel/jump_label.c:376:51: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
WARN_ONCE(1, "can't patch jump_label at %pS", (void *)entry->code);
On sparc64, the jump_label entry->code field is of type u32, but
pointers are 64-bit. Silence the warning by casting entry->code to an
unsigned long before casting it to a pointer. This is also what the
sparc jump label code does.
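In isolation the double cast looks like this. It is a user space sketch:
struct entry_sketch and entry_code_ptr() are hypothetical, and it relies on
unsigned long being pointer-sized, as the kernel assumes:

```c
#include <stdint.h>

/* Hypothetical entry whose code field is narrower than a pointer. */
struct entry_sketch {
	uint32_t code;
};

static void *entry_code_ptr(const struct entry_sketch *e)
{
	/* Widen to a pointer-sized integer first, then convert to a pointer. */
	return (void *)(unsigned long)e->code;
}
```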
Fixes: dc1dd184c2 ("jump_label: Warn on failed jump_label patching attempt")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: "David S . Miller" <davem@davemloft.net>
Link: https://lkml.kernel.org/r/c966fed42be6611254a62d46579ec7416548d572.1521041026.git.jpoimboe@redhat.com
Found this by accident.
There are no usages of bare cancel_work() in current kernel source.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Never directly free @dev after calling device_register(), even
if it returned an error! Always use put_device() to give up the
reference initialized in this function instead.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Problem and motivation: Once a breakpoint perf event (PERF_TYPE_BREAKPOINT)
is created, there is no flexibility to change the breakpoint type
(bp_type), breakpoint address (bp_addr), or breakpoint length (bp_len). The
only option is to close the perf event and configure a new breakpoint
event. This inflexibility has a significant performance overhead. For
example, sampling-based, lightweight performance profilers (and also
concurrency bug detection tools), monitor different addresses for a short
duration using PERF_TYPE_BREAKPOINT and change the address (bp_addr) to
another address or change the kind of breakpoint (bp_type) from "write" to
a "read" or vice-versa or change the length (bp_len) of the address being
monitored. The cost of these modifications is prohibitive since it involves
unmapping the circular buffer associated with the perf event, closing the
perf event, opening another perf event and mmaping another circular buffer.
Solution: The new ioctl flag for perf events,
PERF_EVENT_IOC_MODIFY_ATTRIBUTES, introduced in this patch takes a pointer
to a struct perf_event_attr as an argument to update an old breakpoint
event with new address, type, and size. This facility allows retaining a
previously mmaped perf event ring buffer and avoids having to close and
reopen another perf event.
This patch supports only changing PERF_TYPE_BREAKPOINT event type; future
implementations can extend this feature. The patch replicates some of the
functionality of modify_user_hw_breakpoint() in
kernel/events/hw_breakpoint.c. modify_user_hw_breakpoint() cannot be called
directly since perf_event_ctx_lock() is already held in _perf_ioctl().
Evidence: Experiments show that the baseline (not able to modify an already
created breakpoint) costs an order of magnitude (~10x) more than the
suggested optimization (having the ability to dynamically modify a
configured breakpoint via ioctl). When the breakpoints typically do not
trap, the speedup due to the suggested optimization is ~10x; even when the
breakpoints always trap, the speedup is ~4x due to the suggested
optimization.
Testing: tests posted at
https://github.com/linux-contrib/perf_event_modify_bp demonstrate the
performance significance of this patch. Tests also check the functional
correctness of the patch.
Signed-off-by: Milind Chabbi <chabbi.milind@gmail.com>
[ Using modify_user_hw_breakpoint_check function. ]
[ Reformatted PERF_EVENT_IOC_*, so the values are all in one column. ]
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180312134548.31532-8-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Unify the "boot" and "mono" tracing clocks and document the new behaviour.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20180301165150.489635255@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that the MONOTONIC and BOOTTIME clocks are identical remove all the special
casing.
The user space visible interfaces still support both clocks, but their behavior
is identical.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20180301165150.410218515@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that the MONOTONIC and BOOTTIME clocks are identical remove all the special
casing.
The user space visible interfaces still support both clocks, but their behavior
is identical.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20180301165150.315745557@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that the MONOTONIC and BOOTTIME clocks are the same, remove all the
special handling from timekeeping. Keep wrappers for the existing users of
the *boot* timekeeper interfaces.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20180301165150.236279497@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The MONOTONIC clock is not fast forwarded by the time spent in suspend on
resume. This is only done for the BOOTTIME clock. The reason why the
MONOTONIC clock is not forwarded is historical: the original Linux
implementation was using jiffies as a base for the MONOTONIC clock and
jiffies have never been advanced after resume.
At some point when timekeeping was unified in the core code, the
MONOTONIC clock was advanced after resume which also advanced jiffies causing
interesting side effects. As a consequence the MONOTONIC clock forwarding
was disabled again and the BOOTTIME clock was introduced, which allows reading
the time since boot.
Back then it was not possible to completely disentangle the MONOTONIC clock and
jiffies because there were still interfaces which exposed the MONOTONIC clock
behaviour based on the timer wheel and therefore jiffies.
As of today none of the MONOTONIC clock facilities depends on jiffies
anymore so the forwarding can be done separately. This is achieved by
forwarding the variables which are used for the jiffies update after resume
before the tick is restarted.
In timekeeping resume, the change is rather simple. Instead of updating the
offset between the MONOTONIC clock and the REALTIME/BOOTTIME clocks, advance the
time keeper base for the MONOTONIC and the MONOTONIC_RAW clocks by the time
spent in suspend.
The MONOTONIC clock is now the same as the BOOTTIME clock and the offset between
the REALTIME and the MONOTONIC clocks is the same as before suspend.
There might be side effects in applications, which rely on the
(unfortunately) well documented behaviour of the MONOTONIC clock, but the
downsides of the existing behaviour are probably worse.
There is one obvious issue. Up to now it was possible to retrieve the time
spent in suspend by observing the delta between the MONOTONIC clock and the
BOOTTIME clock. This is no longer available, but the previously introduced
mechanism to read the active non-suspended monotonic time can mitigate that
in a detectable fashion.
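For reference, the delta mentioned above could be observed from user space
roughly as follows. This is a sketch assuming Linux and a libc that exposes
CLOCK_BOOTTIME; ts_to_ns() and suspend_delta_ns() are made-up helper names.
Once both clocks are identical the result is expected to stay close to zero:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for CLOCK_BOOTTIME on some libc versions */
#endif
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Store BOOTTIME - MONOTONIC in *delta_ns.
 * Returns 0 on success, -1 if either clock cannot be read.
 */
static int suspend_delta_ns(int64_t *delta_ns)
{
	struct timespec mono, boot;

	/*
	 * Read MONOTONIC first, so the time between the two reads only
	 * adds to the delta instead of subtracting from it.
	 */
	if (clock_gettime(CLOCK_MONOTONIC, &mono) ||
	    clock_gettime(CLOCK_BOOTTIME, &boot))
		return -1;

	*delta_ns = ts_to_ns(&boot) - ts_to_ns(&mono);
	return 0;
}
```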
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20180301165150.062975504@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The planned change to unify the behaviour of the MONOTONIC and BOOTTIME
clocks vs. suspend removes the ability to retrieve the active
non-suspended time of a system.
Provide a new CLOCK_MONOTONIC_ACTIVE clock which returns the active
non-suspended time of the system via clock_gettime().
This preserves the old behaviour of CLOCK_MONOTONIC before the
BOOTTIME/MONOTONIC unification.
This new clock also allows applications to detect programmatically that
the MONOTONIC and BOOTTIME clocks are identical.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20180301165149.965235774@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the sample_max_stack check and setup into perf_copy_attr(),
so we have all perf_event_attr initial setup in one place
and can easily compare attrs in the new ioctl introduced
in the following change.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180312134548.31532-7-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
And rename it to modify_user_hw_breakpoint_check().
We are about to use modify_user_hw_breakpoint_check() to modify user
space breakpoints, so we must strictly check that only the fields we
allow to change have changed. As Peter explained:
"Suppose someone does:
attr = malloc(sizeof(*attr)); // uninitialized memory
attr->type = BP;
attr->bp_addr = new_addr;
attr->bp_type = bp_type;
attr->bp_len = bp_len;
ioctl(fd, PERF_IOC_MOD_ATTR, &attr);
And feeds absolute shite for the rest of the fields.
Then we later want to extend IOC_MOD_ATTR to allow changing
attr::sample_type but we can't, because that would break the
above application."
I'm making this check optional because we already export
modify_user_hw_breakpoint() and with this check we could
break existing users.
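One way to implement such a strict check is to take the old attr, overwrite
only the fields that may change, and compare the result with what the caller
passed in. A user space sketch of that idea, where struct attr_sketch is a
hypothetical padding-free stand-in for struct perf_event_attr and
check_only_bp_changed() is a made-up name:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, padding-free stand-in for struct perf_event_attr. */
struct attr_sketch {
	uint64_t type;
	uint64_t sample_type;
	uint64_t bp_addr;
	uint64_t bp_type;
	uint64_t bp_len;
};

/*
 * Return 0 if @req differs from @old only in the breakpoint fields,
 * -1 if any other field was touched.
 */
static int check_only_bp_changed(const struct attr_sketch *old,
				 const struct attr_sketch *req)
{
	struct attr_sketch expected = *old;

	/* These are the only fields a modification may change. */
	expected.bp_addr = req->bp_addr;
	expected.bp_type = req->bp_type;
	expected.bp_len  = req->bp_len;

	/* Everything else must be byte-for-byte identical. */
	return memcmp(&expected, req, sizeof(expected)) ? -1 : 0;
}
```

A request built on uninitialized memory fails this check, which keeps the
remaining fields free for future extensions.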
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180312134548.31532-6-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move out all the functionality except the event
disabling/enabling calls, because we want to call different
disabling/enabling functions in the following change.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180312134548.31532-5-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the modify_bp_slot() function to keep slot numbers
correct when changing the breakpoint type.
Use the existing __release_bp_slot()/__reserve_bp_slot()
call sequence to update the slot counts.
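The release/reserve sequence can be pictured with a simplified user space
model. The slot table, the two slot types and all function names below are
assumptions of this sketch; the real accounting is per CPU and per task:

```c
/* Hypothetical slot accounting: one counter per breakpoint slot type. */
enum slot_type { SLOT_INST, SLOT_DATA, SLOT_TYPE_MAX };

struct slot_table {
	int used[SLOT_TYPE_MAX];
	int max[SLOT_TYPE_MAX];
};

/* Take one slot of @type; fails with -1 when none is left. */
static int reserve_slot(struct slot_table *t, enum slot_type type)
{
	if (t->used[type] >= t->max[type])
		return -1;
	t->used[type]++;
	return 0;
}

static void release_slot(struct slot_table *t, enum slot_type type)
{
	t->used[type]--;
}

/* Move one breakpoint from @old_type to @new_type, keeping counts right. */
static int modify_slot(struct slot_table *t, enum slot_type old_type,
		       enum slot_type new_type)
{
	release_slot(t, old_type);
	if (reserve_slot(t, new_type)) {
		/* No room for the new type: take the old slot back. */
		reserve_slot(t, old_type);
		return -1;
	}
	return 0;
}
```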
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180312134548.31532-4-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pass the bp_type argument to the __reserve_bp_slot() and __release_bp_slot()
functions, so we can pass a bp_type other than the one defined in
bp->attr.bp_type. This will be handy in the following change, which fixes
breakpoint slot counts during modification.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180312134548.31532-3-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pass bp_type directly as a find_slot_idx() argument,
so we don't need the whole event to get the
breakpoint slot type. It will be used in the following
changes.
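As a rough illustration of the idea, the lookup can be modelled in plain
userspace C as below. The MODEL_* names are hypothetical, and the bit
values are only assumed to mirror the uapi breakpoint types (R=1, W=2,
X=4); this is a sketch, not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants, assumed to mirror the uapi breakpoint type bits. */
enum {
	MODEL_BP_R  = 1,
	MODEL_BP_W  = 2,
	MODEL_BP_RW = MODEL_BP_R | MODEL_BP_W,
	MODEL_BP_X  = 4,
};

enum model_slot_idx { MODEL_TYPE_INST = 0, MODEL_TYPE_DATA = 1 };

/*
 * The slot type depends only on the type bits, so the caller can hand in
 * any bp_type it likes instead of a whole event.
 */
enum model_slot_idx model_find_slot_idx(uint64_t bp_type)
{
	/* Only the known type bits may be set. */
	assert((bp_type & ~(uint64_t)(MODEL_BP_RW | MODEL_BP_X)) == 0);

	if (bp_type & MODEL_BP_RW)
		return MODEL_TYPE_DATA;
	return MODEL_TYPE_INST;
}
```

Taking the type as a plain value is what lets a caller ask about a type
the event does not (yet) carry.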
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Milind Chabbi <chabbi.milind@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20180312134548.31532-2-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since a jump-optimized kprobe cannot change the execution path,
the kprobe for error-injection must not be optimized. To prohibit
optimization, set a dummy post-handler as officially stated in
Documentation/kprobes.txt.
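The rule can be pictured with a small userspace model. Everything named
model_* here is hypothetical and only stands in for the kernel's struct
kprobe and its optimizer check; the point is that an empty post-handler
is enough to keep the probe unoptimized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kprobe; only the handlers are modelled. */
struct model_kprobe {
	int (*pre_handler)(struct model_kprobe *p);
	void (*post_handler)(struct model_kprobe *p);
};

/* Empty on purpose: its mere presence blocks jump optimization. */
void model_dummy_post_handler(struct model_kprobe *p)
{
	(void)p;
}

/* Simplified model of the rule: a probe with a post-handler stays unoptimized. */
bool model_may_optimize(const struct model_kprobe *p)
{
	assert(p != NULL);
	return p->post_handler == NULL;
}
```

A probe that wants to redirect execution would therefore set
post_handler to the dummy before registering.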
Fixes: 4b1a29a7f5 ("error-injection: Support fault injection framework")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There are two problems when installing cgroup events on CPUs: firstly,
list_update_cgroup_event() only tries to set cpuctx->cgrp for the
first event; if that mismatches on @cgrp we'll not try again for later
additions.
Secondly, when we install a cgroup event into an active context, only
issue an event reprogram when the event matches the current cgroup
context. This avoids a pointless event reprogramming.
Signed-off-by: leilei.lin <leilei.lin@alibaba-inc.com>
[ Improved the changelog and comments. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: brendan.d.gregg@gmail.com
Cc: eranian@gmail.com
Cc: linux-kernel@vger.kernel.org
Cc: yang_oliver@hotmail.com
Link: http://lkml.kernel.org/r/20180306093637.28247-1-linxiulei@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The event schedule order (as per perf_event_sched_in()) is:
- cpu pinned
- task pinned
- cpu flexible
- task flexible
But perf_rotate_context() will unschedule cpu-flexible even if it
doesn't need a rotation.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Similar to how first programming cpu=-1 and then cpu=# is wrong, so is
rotating both. It was especially wrong when we were still programming
the PMU in this same order, because in that scenario we might never
actually end up running cpu=# events at all.
Cure this by using the active_list to pick the rotation event; since
at programming we already select the left-most event.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The last argument is, and always must be, the same.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When an event group contains more events than can be scheduled on the
hardware, iterating the full event group for ctx_sched_out is a waste
of time.
Keep track of the events that got programmed on the hardware, such
that we can iterate this smaller list in order to schedule them out.
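A minimal userspace sketch of the bookkeeping, assuming a singly linked
active list (the kernel uses its own list primitives; all model_* names
are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

struct model_event {
	int active;
	struct model_event *next_active;
};

struct model_ctx {
	struct model_event *active_head; /* only events programmed on the hardware */
};

/* Record an event on the active list when it gets programmed. */
void model_sched_in(struct model_ctx *ctx, struct model_event *e)
{
	assert(!e->active);
	e->active = 1;
	e->next_active = ctx->active_head;
	ctx->active_head = e;
}

/* Schedule out by walking the active list only; returns events visited. */
size_t model_sched_out_all(struct model_ctx *ctx)
{
	size_t visited = 0;
	struct model_event *e = ctx->active_head;

	while (e) {
		struct model_event *next = e->next_active;

		e->active = 0;
		e->next_active = NULL;
		visited++;
		e = next;
	}
	ctx->active_head = NULL;
	return visited;
}
```

With many events in the context but only a few programmed, the
schedule-out cost follows the number of active events rather than the
size of the whole set.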
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that all the grouping is done with RB trees, we no longer need
group_entry and can replace the whole thing with sibling_list.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Scheduling in events with cpu=-1 before events with cpu=# changes
semantics and is undesirable in that it would prioritize these events.
Given that groups->index is across all groups we actually have an
inter-group ordering, meaning we can merge-sort two groups, which is
just what we need to preserve semantics.
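The merge step can be sketched in userspace C as follows. This is an
illustration under assumptions, not the kernel iterator: the two inputs
are plain arrays already sorted by index, and the model_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_event {
	int cpu;              /* -1 means "any CPU" */
	uint64_t group_index; /* global order across all groups */
};

/*
 * Visit the cpu=-1 list and the cpu=# list in global index order.
 * Both inputs must already be sorted by group_index; out must have
 * room for n_any + n_cpu pointers. Returns the number of events visited.
 */
size_t model_merge_visit(const struct model_event *any, size_t n_any,
			 const struct model_event *cpu, size_t n_cpu,
			 const struct model_event **out)
{
	size_t i = 0, j = 0, k = 0;

	assert(out != NULL);
	while (i < n_any || j < n_cpu) {
		if (j == n_cpu ||
		    (i < n_any && any[i].group_index < cpu[j].group_index))
			out[k++] = &any[i++];
		else
			out[k++] = &cpu[j++];
	}
	return k;
}
```

Because the index is shared by both lists, neither list is favoured:
events come out in the order they would have had in a single list.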
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change event groups into RB trees sorted by CPU and then by a 64bit
index, so that the multiplexing hrtimer interrupt handler is able to
skip to the current CPU's list and ignore groups allocated for the
other CPUs.
A new API for manipulating event groups in the trees is implemented, as
well as adoption of the API in the current implementation.
pinned_group_sched_in() and flexible_group_sched_in() API are
introduced to consolidate code enabling the whole group from pinned
and flexible groups appropriately.
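The ordering described above amounts to a two-level key comparison. A
minimal sketch, with a hypothetical key struct standing in for the
fields of the real event:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sort key: CPU first, then a 64bit insertion index. */
struct model_group_key {
	int cpu;
	uint64_t group_index;
};

/* Strict "less than": all groups of one CPU end up adjacent in the tree. */
bool model_group_less(const struct model_group_key *l,
		      const struct model_group_key *r)
{
	assert(l != NULL && r != NULL);
	if (l->cpu != r->cpu)
		return l->cpu < r->cpu;
	return l->group_index < r->group_index;
}
```

Since groups of one CPU are contiguous under this order, a lookup for
the left-most node of a given CPU followed by in-order traversal visits
exactly that CPU's groups.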
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/372f9c8b-0cfe-4240-e44d-83d863d40813@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mark reported his arm64 perf fuzzer runs sometimes splat like:
armv8pmu_read_counter+0x1e8/0x2d8
armpmu_event_update+0x8c/0x188
armpmu_read+0xc/0x18
perf_output_read+0x550/0x11e8
perf_event_read_event+0x1d0/0x248
perf_event_exit_task+0x468/0xbb8
do_exit+0x690/0x1310
do_group_exit+0xd0/0x2b0
get_signal+0x2e8/0x17a8
do_signal+0x144/0x4f8
do_notify_resume+0x148/0x1e8
work_pending+0x8/0x14
which asserts that we only call pmu::read() on ACTIVE events.
The above callchain does:
  perf_event_exit_task()
    perf_event_exit_task_context()
      task_ctx_sched_out() // INACTIVE
      perf_event_exit_event()
        perf_event_set_state(EXIT) // EXIT
        sync_child_event()
          perf_event_read_event()
            perf_output_read()
              perf_output_read_group()
                leader->pmu->read()
Which results in doing a pmu::read() on an !ACTIVE event.
I _think_ this is 'new' since we added attr.inherit_stat, which added
the perf_event_read_event() to the exit path, without that
perf_event_read_output() would only trigger from samples and for
@event to trigger a sample, its leader _must_ be ACTIVE too.
Still, adding this check makes it consistent with the @sub case for
the siblings.
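The effect of the check can be shown with a small userspace model. The
model_* names and state values are hypothetical; the assert in
model_pmu_read() plays the role of the assertion that produced the
splat.

```c
#include <assert.h>
#include <stddef.h>

enum model_state {
	MODEL_STATE_EXIT = -1,
	MODEL_STATE_INACTIVE = 0,
	MODEL_STATE_ACTIVE = 1,
};

struct model_event {
	enum model_state state;
	unsigned long hw_count;  /* value held by the pretend hardware counter */
	unsigned long count;     /* last value folded into the event */
	unsigned int pmu_reads;  /* how often the PMU was touched */
};

/* Stand-in for pmu::read(): only valid on ACTIVE events. */
void model_pmu_read(struct model_event *e)
{
	assert(e->state == MODEL_STATE_ACTIVE);
	e->count = e->hw_count;
	e->pmu_reads++;
}

/* Refresh the leader from the PMU only while it is ACTIVE. */
unsigned long model_output_read(struct model_event *leader)
{
	if (leader->state == MODEL_STATE_ACTIVE)
		model_pmu_read(leader);
	return leader->count;
}
```

On the exit path the leader is no longer ACTIVE, so the read falls back
to the count that was saved when the event was scheduled out.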
Reported-and-Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf updates from Thomas Gleixner:
"Another set of perf updates:
- Fix a Skylake Uncore event format declaration
- Prevent perf pipe mode from crashing which was caused by a missing
buffer allocation
- Make the perf top popup message which tells the user that it uses
fallback mode on older kernels a debug message.
- Make perf context rescheduling work correctly
- Robustify the jump error drawing in perf browser mode so it does
not try to create references to NULL initialized offset entries
- Make trigger_on() robust so it does not enable the trigger before
everything is set up correctly to handle it
- Make perf auxtrace respect the --no-itrace option so it does not
try to queue AUX data for decoding.
- Prevent having a different number of field separators in CSV output
lines when a counter is not supported.
- Make the perf kallsyms man page usage behave like it does for all
other perf commands.
- Synchronize the kernel headers"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix ctx_event_type in ctx_resched()
perf tools: Fix trigger class trigger_on()
perf auxtrace: Prevent decoding when --no-itrace
perf stat: Fix CVS output format for non-supported counters
tools headers: Sync x86's cpufeatures.h
tools headers: Sync copy of kvm UAPI headers
perf record: Fix crash in pipe mode
perf annotate browser: Be more robust when drawing jump arrows
perf top: Fix annoying fallback message on older kernels
perf kallsyms: Fix the usage on the man page
perf/x86/intel/uncore: Fix Skylake UPI event format
Pull locking fix from Thomas Gleixner:
"rt_mutex_futex_unlock() grew a new irq-off call site, but the function
assumes that it's always called from irq enabled context.
Use (un)lock_irqsafe() to handle the new call site correctly"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites
Pull RCU updates from Paul E. McKenney:
- Miscellaneous fixes, perhaps most notably removing obsolete
code whose only purpose in life was to gather information for
the now-removed RCU debugfs facility. Other notable changes
include removing NO_HZ_FULL_ALL in favor of the nohz_full kernel
boot parameter, minor optimizations for expedited grace periods,
some added tracing, creating an RCU-specific workqueue using Tejun's
new WQ_MEM_RECLAIM flag, and several cleanups to code and comments.
- SRCU cleanups and optimizations.
- Torture-test updates, perhaps most notably the adding of ARMv8
support, but also including numerous cleanups and usability fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The replace_filter_string() frees the current string and then copies a given
string. But in the two locations that it was used, the allocation happened
right after the filter was allocated (nothing to replace). There's no need
for this to be a helper function. Embedding the allocation in the two places
where it was called will make changing the code in the future easier.
Also make the variable consistent (always use "filter_string" as the name,
as it was used in one instance as "filter_str").
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
replace_system_preds() creates a filter list to free even when it doesn't
really need to have it. Only save filters that require synchronize_sched()
in the filter list to free. This will allow the code to be updated a bit
more easily in the future.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The __alloc_filter() function does nothing more than allocate the filter.
There's no reason to have it as a helper function.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>