linux/tools/perf/tests/shell/lib/perf_json_output_lint.py
Yicong Yang cbc917a1b0 perf stat: Support per-cluster aggregation
Some platforms have 'cluster' topology and CPUs in the cluster will
share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2
cache (for Intel Jacobsville). Parsing and building the cluster
topology has been supported since [1].

perf stat already supports aggregation for other topologies like
die or socket, etc. It'll be useful to aggregate per-cluster to find
problems like L3T bandwidth contention.

This patch adds support for the "--per-cluster" option for per-cluster
aggregation. It also updates the docs and the related test. The output
looks like:

[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5

 Performance counter stats for 'system wide':

S56-D0-CLS158    4      1,321,521,570      LLC-load
S56-D0-CLS594    4        794,211,453      LLC-load
S56-D0-CLS1030    4             41,623      LLC-load
S56-D0-CLS1466    4             41,646      LLC-load
S56-D0-CLS1902    4             16,863      LLC-load
S56-D0-CLS2338    4             15,721      LLC-load
S56-D0-CLS2774    4             22,671      LLC-load
[...]

On a legacy system without clusters or cluster support, the output will
look like:
[root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1

 Performance counter stats for 'system wide':

S56-D0-CLS0   64         18,011,485      cycles
S7182-D0-CLS0   64         16,548,835      cycles
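
The aggregated rows above follow a regular shape, so a script can pick out
the busiest cluster. The following is only a minimal sketch, not part of the
patch: the regular expression, the field names, and the reading of the second
column as the number of aggregated CPUs are assumptions made for
illustration. Rows holding markers such as <not counted> are skipped.

```python
import re

# Assumed row shape: "S<socket>-D<die>-CLS<cluster>  <aggr-nr>  <value>  <event>"
ROW = re.compile(
    r'^S(?P<socket>\d+)-D(?P<die>\d+)-CLS(?P<cluster>\d+)\s+'
    r'(?P<aggr_nr>\d+)\s+(?P<value>[\d,]+)\s+(?P<event>\S+)\s*$'
)

def parse_per_cluster(text):
    """Return one dict per aggregated row; every other line is skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue
        rows.append({
            'socket': int(m['socket']),
            'die': int(m['die']),
            'cluster': int(m['cluster']),
            'aggr_nr': int(m['aggr_nr']),
            # Counter values are printed with thousands separators.
            'value': int(m['value'].replace(',', '')),
            'event': m['event'],
        })
    return rows

# Rows copied from the example output above.
sample = """\
 Performance counter stats for 'system wide':

S56-D0-CLS158    4      1,321,521,570      LLC-load
S56-D0-CLS594    4        794,211,453      LLC-load
S56-D0-CLS1030    4             41,623      LLC-load
"""

rows = parse_per_cluster(sample)
busiest = max(rows, key=lambda r: r['value'])
print(busiest['cluster'], busiest['value'])
```

For anything beyond a quick look, the JSON output checked by the lint script
in this file is likely a sturdier input than the human-readable table.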

Note that this patch doesn't mix the cluster information in the outputs
of --per-core to avoid breaking any tools/scripts using it.

Note that perf recently added "--per-cache" aggregation, but it is not
the same as per-cluster aggregation, although cluster CPUs may share some cache
resources. For example on my machine all clusters within a die share the
same L3 cache:
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
0-31
$ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
0-3
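
The same comparison can be scripted. This is a rough sketch that assumes the
two sysfs files shown above; the helper names parse_cpu_list and
read_cpu_list are made up for this example. On kernels or machines that do
not expose either file, the sketch just reports that and stops.

```python
from pathlib import Path

def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(','):
        if not part:
            continue
        lo, _, hi = part.partition('-')
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def read_cpu_list(path):
    """Return the CPU set stored in a sysfs file, or None if unreadable."""
    try:
        return parse_cpu_list(Path(path).read_text())
    except OSError:
        return None

base = '/sys/devices/system/cpu/cpu0'
cluster = read_cpu_list(base + '/topology/cluster_cpus_list')
l3 = read_cpu_list(base + '/cache/index3/shared_cpu_list')

if cluster is None or l3 is None:
    print('cluster or L3 information is not exposed on this system')
elif cluster == l3:
    print('cluster and L3 sharing domains coincide')
else:
    print(f'cluster spans {len(cluster)} CPUs, L3 spans {len(l3)} CPUs')
```

On the machine described above this would report 4 CPUs per cluster against
32 CPUs per L3, which is why the two aggregation modes give different rows.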

[1] commit c5e22feffd ("topology: Represent clusters of CPUs within a die")

Tested-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: james.clark@arm.com
Cc: 21cnbao@gmail.com
Cc: prime.zeng@hisilicon.com
Cc: Jonathan.Cameron@huawei.com
Cc: fanghao11@huawei.com
Cc: linuxarm@huawei.com
Cc: tim.c.chen@intel.com
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com
2024-02-09 14:59:53 -08:00

101 lines
3.4 KiB
Python

#!/usr/bin/python
# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
# Basic sanity check of perf JSON output as specified in the man page.
import argparse
import sys
import json
ap = argparse.ArgumentParser()
ap.add_argument('--no-args', action='store_true')
ap.add_argument('--interval', action='store_true')
ap.add_argument('--system-wide-no-aggr', action='store_true')
ap.add_argument('--system-wide', action='store_true')
ap.add_argument('--event', action='store_true')
ap.add_argument('--per-core', action='store_true')
ap.add_argument('--per-thread', action='store_true')
ap.add_argument('--per-cache', action='store_true')
ap.add_argument('--per-cluster', action='store_true')
ap.add_argument('--per-die', action='store_true')
ap.add_argument('--per-node', action='store_true')
ap.add_argument('--per-socket', action='store_true')
ap.add_argument('--file', type=argparse.FileType('r'), default=sys.stdin)
args = ap.parse_args()
Lines = args.file.readlines()

def isfloat(num):
  try:
    float(num)
    return True
  except ValueError:
    return False


def isint(num):
  try:
    int(num)
    return True
  except ValueError:
    return False

def is_counter_value(num):
  return isfloat(num) or num == '<not counted>' or num == '<not supported>'

def check_json_output(expected_items):
  checks = {
      'aggregate-number': lambda x: isfloat(x),
      'core': lambda x: True,
      'counter-value': lambda x: is_counter_value(x),
      'cgroup': lambda x: True,
      'cpu': lambda x: isint(x),
      'cache': lambda x: True,
      'cluster': lambda x: True,
      'die': lambda x: True,
      'event': lambda x: True,
      'event-runtime': lambda x: isfloat(x),
      'interval': lambda x: isfloat(x),
      'metric-unit': lambda x: True,
      'metric-value': lambda x: isfloat(x),
      'metricgroup': lambda x: True,
      'node': lambda x: True,
      'pcnt-running': lambda x: isfloat(x),
      'socket': lambda x: True,
      'thread': lambda x: True,
      'unit': lambda x: True,
  }
  input = '[\n' + ','.join(Lines) + '\n]'
  for item in json.loads(input):
    if expected_items != -1:
      count = len(item)
      if count != expected_items and count >= 1 and count <= 6 and 'metric-value' in item:
        # Events that generate >1 metric may have isolated metric
        # values and possibly other prefixes like interval, core,
        # aggregate-number, or event-runtime/pcnt-running from multiplexing.
        pass
      elif count != expected_items and count >= 1 and count <= 5 and 'metricgroup' in item:
        pass
      elif count != expected_items:
        raise RuntimeError(f'wrong number of fields. counted {count} expected {expected_items}'
                           f' in \'{item}\'')
    for key, value in item.items():
      if key not in checks:
        raise RuntimeError(f'Unexpected key: key={key} value={value}')
      if not checks[key](value):
        raise RuntimeError(f'Check failed for: key={key} value={value}')


try:
  if args.no_args or args.system_wide or args.event:
    expected_items = 7
  elif args.interval or args.per_thread or args.system_wide_no_aggr:
    expected_items = 8
  elif args.per_core or args.per_socket or args.per_node or args.per_die or args.per_cluster or args.per_cache:
    expected_items = 9
  else:
    # If no option is specified, don't check the number of items.
    expected_items = -1
  check_json_output(expected_items)
except:
  print('Test failed for input:\n' + '\n'.join(Lines))
  raise
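
The script treats every input line as one JSON object and wraps the lines
into a single array before parsing. The standalone sketch below illustrates
that step and the counter-value check. The two sample lines are hypothetical:
their keys are taken from the checks table above, but the values are invented,
and real perf output may carry more fields per line.

```python
import json

# Hypothetical input lines, one JSON object per line (values are invented).
lines = [
    '{"cluster" : "S56-D0-CLS158", "aggregate-number" : 4, '
    '"counter-value" : "1321521570.000000", "event" : "LLC-load"}\n',
    '{"cluster" : "S56-D0-CLS594", "aggregate-number" : 4, '
    '"counter-value" : "<not counted>", "event" : "LLC-load"}\n',
]

# Same wrapping as check_json_output(): join the objects with commas and
# surround them with brackets so json.loads() sees one array.
document = '[\n' + ','.join(lines) + '\n]'
items = json.loads(document)

def isfloat(value):
    """True if value converts to float (also tolerates non-string input)."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

for item in items:
    value = item['counter-value']
    # A counter value is either numeric or one of the two marker strings.
    ok = isfloat(value) or value in ('<not counted>', '<not supported>')
    print(item['cluster'], len(item), ok)
```

Wrapping the lines this way keeps the check to a single json.loads() call, at
the cost of reporting a syntax error for the whole input rather than for the
offending line.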