The usefulness of having a standard way of testing syscall performance
has come up from time to time[0]. Furthermore, some of our testing
machinery (such as 'mmtests') already makes use of a simplified version
of the microbenchmark. This patch mainly takes the same idea to measure
syscall throughput compatible with 'perf-bench' via getppid(2), yet
without any of the additional template stuff from Ingo's version (based
on numa.c). The code is identical to what mmtests uses.
[0] https://lore.kernel.org/lkml/20160201074156.GA27156@gmail.com/
Committer notes:
Add mising stdlib.h and unistd.h to get the prototypes for exit() and
getppid().
Committer testing:
$ perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
syscall: System call benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
internals: Perf-internals benchmarks
all: All benchmarks
$
$ perf bench syscall
# List of available benchmarks for collection 'syscall':
basic: Benchmark for basic getppid(2) calls
all: Run all syscall benchmarks
$ perf bench syscall basic
# Running 'syscall/basic' benchmark:
# Executed 10000000 getppid() calls
Total time: 3.679 [sec]
0.367957 usecs/op
2717708 ops/sec
$ perf bench syscall all
# Running syscall/basic benchmark...
# Executed 10000000 getppid() calls
Total time: 3.644 [sec]
0.364456 usecs/op
2743815 ops/sec
$
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20190308181747.l36zqz2avtivrr3c@linux-r8p5
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Event synthesis may occur at the start or end (tail) of a perf command.
In system-wide mode it can scan every process in /proc, which may add
seconds of latency before event recording. Add a new benchmark that
times how long event synthesis takes with and without data synthesis.
An example execution looks like:
$ perf bench internals synthesize
# Running 'internals/synthesize' benchmark:
Average synthesis took: 168.253800 usec
Average data synthesis took: 208.104700 usec
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200402154357.107873-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This program benchmarks concurrent epoll_wait(2) for file descriptors
that are monitored with with EPOLLIN along various semantics, by a
single epoll instance. Such conditions can be found when using
single/combined or multiple queuing when load balancing.
Each thread has a number of private, nonblocking file descriptors,
referred to as fdmap. A writer thread will constantly be writing to the
fdmaps of all threads, minimizing each threads's chances of epoll_wait
not finding any ready read events and blocking as this is not what we
want to stress. Full details in the start of the C file.
Committer testing:
# perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
epoll: Epoll stressing benchmarks
all: All benchmarks
# perf bench epoll
# List of available benchmarks for collection 'epoll':
wait: Benchmark epoll concurrent epoll_waits
all: Run all futex benchmarks
# perf bench epoll wait
# Running 'epoll/wait' benchmark:
Run summary [PID 19295]: 3 threads monitoring on 64 file-descriptors for 8 secs.
[thread 0] fdmap: 0xdaa650 ... 0xdaa74c [ 328241 ops/sec ]
[thread 1] fdmap: 0xdaa900 ... 0xdaa9fc [ 351695 ops/sec ]
[thread 2] fdmap: 0xdaabb0 ... 0xdaacac [ 381423 ops/sec ]
Averaged 353786 operations/sec (+- 4.35%), total secs = 8
#
Committer notes:
Fix the build on debian:experimental-x-mips, debian:experimental-x-mipsel
and others:
CC /tmp/build/perf/bench/epoll-wait.o
bench/epoll-wait.c: In function 'writerfn':
bench/epoll-wait.c:399:12: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
printinfo("exiting writer-thread (total full-loops: %ld)\n", iter);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~
bench/epoll-wait.c:86:31: note: in definition of macro 'printinfo'
do { if (__verbose) { printf(fmt, ## arg); fflush(stdout); } } while (0)
^~~
cc1: all warnings being treated as errors
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com> <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20181106152226.20883-2-dave@stgolabs.net
Link: http://lkml.kernel.org/r/20181106182349.thdkpvshkna5vd7o@linux-r8p5>
[ Applied above fixup as per Davidlohr's request ]
[ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
[ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To cope with the changes in:
12c89130a5 ("x86/asm/memcpy_mcsafe: Add write-protection-fault handling")
60622d6822 ("x86/asm/memcpy_mcsafe: Return bytes remaining")
bd131544aa ("x86/asm/memcpy_mcsafe: Add labels for __memcpy_mcsafe() write fault handling")
da7bc9c57e ("x86/asm/memcpy_mcsafe: Remove loop unrolling")
This needed introducing a file with a copy of the mcsafe_handle_tail()
function, that is used in the new memcpy_64.S file, as well as a dummy
mcsafe_test.h header.
Testing it:
$ nm ~/bin/perf | grep mcsafe
0000000000484130 T mcsafe_handle_tail
0000000000484300 T __memcpy_mcsafe
$
$ perf bench mem memcpy
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 1MB bytes ...
44.389205 GB/sec
# function 'x86-64-unrolled' (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
# Copying 1MB bytes ...
22.710756 GB/sec
# function 'x86-64-movsq' (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
# Copying 1MB bytes ...
42.459239 GB/sec
# function 'x86-64-movsb' (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
# Copying 1MB bytes ...
42.459239 GB/sec
$
This silences this perf tools build warning:
Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-igdpciheradk3gb3qqal52d0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So mem-memcpy.c started out as a simple memcpy() benchmark, then it grew
memset() functionality and now I plan to add string copy benchmarks as
well.
This makes the file name a misnomer: rename it to the more generic
mem-functions.c name.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-5-git-send-email-mingo@kernel.org
[ The "rename" was introducing __unused, wasn't removing the old file,
and didn't update tools/perf/bench/Build, fix it ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Allows a way of measuring low level kernel implementation of FUTEX_LOCK_PI and
FUTEX_UNLOCK_PI.
The program comes in two flavors:
(i) single futex (default), all threads contend on the same uaddr. For the
sake of the benchmark, we call into kernel space even when the lock is
uncontended. The kernel will set it to TID, any waters that come in and
contend for the pi futex will be handled respectively by the kernel.
(ii) -M option for multiple futexes, each thread deals with its own futex. This
is a trivial scenario and only measures kernel handling of 0->TID transition.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1436259353.12255.78.camel@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The futex-wake benchmark only measures wakeups done within a single
process. While this has value in its own, it does not really generate
any hb->lock contention.
A new benchmark 'wake-parallel' is added, by extending the futex-wake
code such that we can measure parallel waker threads. The program output
shows the avg per-thread latency in order to complete its share of
wakeups:
Run summary [PID 13474]: blocking on 512 threads (at [private] futex 0xa88668), 8 threads waking up 64 at a time.
[Run 1]: Avg per-thread latency (waking 64/512 threads) in 0.6230 ms (+-15.31%)
[Run 2]: Avg per-thread latency (waking 64/512 threads) in 0.5175 ms (+-29.95%)
[Run 3]: Avg per-thread latency (waking 64/512 threads) in 0.7578 ms (+-18.03%)
[Run 4]: Avg per-thread latency (waking 64/512 threads) in 0.8944 ms (+-12.54%)
[Run 5]: Avg per-thread latency (waking 64/512 threads) in 1.1204 ms (+-23.85%)
Avg per-thread latency (waking 64/512 threads) in 0.7826 ms (+-9.91%)
Naturally, different combinations of numbers of blocking and waker
threads will exhibit different information.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Link: http://lkml.kernel.org/r/1431110280-20231-1-git-send-email-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>