1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

3 commits

Author SHA1 Message Date
Ian Rogers
94dbfd6781 perf parse-events: Architecture specific leader override
Currently topdown events must appear after a slots event:

  $ perf stat -e '{slots,topdown-fe-bound}' /bin/true

   Performance counter stats for '/bin/true':

         3,183,090      slots
           986,133      topdown-fe-bound

Reversing the events yields:

  $ perf stat -e '{topdown-fe-bound,slots}' /bin/true
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-fe-bound).

For metrics the order of events is determined by iterating over a
hashmap, and so slots isn't guaranteed to be first which can yield this
error.

Change the set_leader in parse-events, called when a group is closed, so
that rather than always making the first event the leader, if the slots
event exists then it is made the leader. It is then moved to the head of
the evlist otherwise it won't be opened in the correct order.

The result is:

  $ perf stat -e '{topdown-fe-bound,slots}' /bin/true

   Performance counter stats for '/bin/true':

         3,274,795      slots
         1,001,702      topdown-fe-bound

A problem with this approach is the slots event is identified by name,
names can be overwritten like 'cpu/slots,name=foo/' and this causes the
leader change to fail.

The change also modifies and fixes mixed groups like, with the change:

  $ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2

   Performance counter stats for 'system wide':

        5574985410      slots
         971981616      instructions
        1348461887      topdown-fe-bound

       2.001263120 seconds time elapsed

Without the change:

  $ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2

   Performance counter stats for 'system wide':

     <not counted>      instructions
     <not counted>      slots
   <not supported>      topdown-fe-bound

       2.006247990 seconds time elapsed

Something that may be undesirable here is that the events are reordered
in the output.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Link: http://lore.kernel.org/lkml/20211130174945.247604-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07 22:18:24 -03:00
Kan Liang
5f148e7c6a perf stat: Add Topdown metrics L2 events as default events
The Topdown Microarchitecture Analysis (TMA) Method is a structured
analysis methodology to identify critical performance bottlenecks in
out-of-order processors.

The Topdown metrics L1 event was added as default in 42641d6f4d
("perf stat: Add Topdown metrics events as default events")

From the Sapphire Rapids server and later platforms, the same dedicated
"metrics" register is extended to support both L1 and L2 events.

Add both L1 and L2 Topdown metrics events as default to enrich the
default measuring information if the new measurement register is
available.

On legacy systems there is no change to avoid extra multiplexing.

The topdown_level indicates the max metrics level for the top-down
statistics. Set it to 2 to display all L1 and L2 Topdown metrics events.

With the patch:

  $ perf stat sleep 1

  Performance counter stats for 'sleep 1':

           0.59 msec task-clock             #   0.001 CPUs utilized
              1      context-switches       #   1.687 K/sec
              0      cpu-migrations         #   0.000 /sec
             76      page-faults            # 128.198 K/sec
      1,405,318      cycles                 #   2.371 GHz
      1,471,136      instructions           #   1.05  insn per cycle
        310,132      branches               # 523.136 M/sec
         10,435      branch-misses          #   3.36% of all branches
      8,431,908      slots                  #  14.223 G/sec
      1,554,116      topdown-retiring       #    18.4% retiring
      1,289,585      topdown-bad-spec       #    15.2% bad speculation
      2,810,636      topdown-fe-bound       #    33.2% frontend bound
      2,810,636      topdown-be-bound       #    33.2% backend bound
        231,464      topdown-heavy-ops      #     2.7% heavy operations   #  15.6% light operations
      1,223,453      topdown-br-mispredict  #    14.5% branch mispredict  #   0.8% machine clears
      1,884,779      topdown-fetch-lat      #    22.3% fetch latency      #  10.9% fetch bandwidth
      1,454,917      topdown-mem-bound      #    17.2% memory bound       #  16.0% Core bound

    1.001179699 seconds time elapsed

    0.000000000 seconds user
    0.001238000 seconds sys

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/1625760169-18396-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:32 -03:00
Kan Liang
42641d6f4d perf stat: Add Topdown metrics events as default events
The Topdown Microarchitecture Analysis (TMA) Method is a structured
analysis methodology to identify critical performance bottlenecks in
out-of-order processors. From the Ice Lake and later platforms, the
Topdown information can be retrieved from the dedicated "metrics"
register, which isn't impacted by other events. Also, the Topdown
metrics support both per thread/process and per core measuring.  Adding
Topdown metrics events as default events can enrich the default
measuring information, and would not cost any extra multiplexing.

Introduce arch_evlist__add_default_attrs() to allow architecture
specific default events. Add the Topdown metrics events in the X86
specific arch_evlist__add_default_attrs(). Other architectures can add
their own default events later separately.

With the patch:

 $ perf stat sleep 1

 Performance counter stats for 'sleep 1':

           0.82 msec task-clock:u              #    0.001 CPUs utilized
              0      context-switches:u        #    0.000 K/sec
              0      cpu-migrations:u          #    0.000 K/sec
             61      page-faults:u             #    0.074 M/sec
        319,941      cycles:u                  #    0.388 GHz
        242,802      instructions:u            #    0.76  insn per cycle
         54,380      branches:u                #   66.028 M/sec
          4,043      branch-misses:u           #    7.43% of all branches
      1,585,555      slots:u                   # 1925.189 M/sec
        238,941      topdown-retiring:u        #     15.0% retiring
        410,378      topdown-bad-spec:u        #     25.8% bad speculation
        634,222      topdown-fe-bound:u        #     39.9% frontend bound
        304,675      topdown-be-bound:u        #     19.2% backend bound

       1.001791625 seconds time elapsed

       0.000000000 seconds user
       0.001572000 seconds sys

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210121133752.118327-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:43 -03:00