Perf tool sets exclude_guest by default while calling perf_event_open().
Because IBS does not have filtering capability, it always gets rejected
by IBS PMU driver and thus perf falls back to non-precise sampling. Fix
it by not setting exclude_guest by default on AMD.

Before:

  $ sudo ./perf record -C 0 -vvv true |& grep precise
  precise_ip 3
  decreasing precise_ip by one (2)
  precise_ip 2
  decreasing precise_ip by one (1)
  precise_ip 1
  decreasing precise_ip by one (0)

After:

  $ sudo ./perf record -C 0 -vvv true |& grep precise
  precise_ip 3
  decreasing precise_ip by one (2)
  precise_ip 2

Committer notes:

Fixup init to zero for perf_env in older compilers:

  arch/x86/util/evsel.c:15:26: error: missing field 'os_release' initializer [-Werror,-Wmissing-field-initializers]
          struct perf_env env = {0};
                                  ^

Committer notes:

Namhyung remarked:

  It'd be nice if it can cover explicit "-e cycles:pp" as well.

Ravi clarified:

  For explicit :pp modifier, evsel->precise_max does not get set and thus perf
  does not try with different attr->precise_ip values while exclude_guest set.
  So no issue with explicit :pp:

    $ sudo ./perf record -C 0 -e cycles:pp -vvv |& grep "precise_ip\|exclude_guest"
    precise_ip 2
    exclude_guest 1
    precise_ip 2
    exclude_guest 1
    switching off exclude_guest, exclude_host
    precise_ip 2
    ^C

  Also, with :P modifier, evsel->precise_max gets set but exclude_guest does
  not and thus :P also works fine:

    $ sudo ./perf record -C 0 -e cycles:P -vvv |& grep "precise_ip\|exclude_guest"
    precise_ip 3
    decreasing precise_ip by one (2)
    precise_ip 2
    ^C

Reported-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211103072112.32312-1-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
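The "Before" and "After" logs can be read as a retry loop that lowers precise_ip one step at a time until the event opens. The following is a minimal sketch of that behaviour under stated assumptions, not perf's actual implementation: `struct mock_attr`, `mock_event_open()` and `open_with_fallback()` are hypothetical names, and the mock models level 3 as unsupported and levels 1 and 2 as rejecting exclude_guest, which is only inferred from the logs in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct perf_event_attr. */
struct mock_attr {
	unsigned int precise_ip;	/* requested precision level, 0..3 */
	bool exclude_guest;		/* ask the PMU to filter out guest samples */
};

/*
 * Hypothetical model of event init on an AMD system. Assumptions:
 * level 3 is not supported at all, and any precise level (> 0) is
 * served by a PMU without filtering, so it rejects exclude_guest.
 */
static bool mock_event_open(const struct mock_attr *attr)
{
	if (attr->precise_ip > 2)
		return false;
	if (attr->precise_ip > 0 && attr->exclude_guest)
		return false;
	return true;
}

/*
 * Lower precise_ip by one after every failed open. Returns the level
 * that was finally accepted, or -1 if even level 0 was rejected.
 */
static int open_with_fallback(struct mock_attr *attr)
{
	assert(attr != NULL);

	for (;;) {
		if (mock_event_open(attr))
			return (int)attr->precise_ip;
		if (attr->precise_ip == 0)
			return -1;
		attr->precise_ip--;
	}
}
```

Starting from precise_ip 3, the mock ends at level 0 when exclude_guest is set and at level 2 when it is cleared, which mirrors the two logs.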
31 lines
813 B
C
// SPDX-License-Identifier: GPL-2.0
#include <stdio.h>
#include <stdlib.h>
#include "util/evsel.h"
#include "util/env.h"
#include "linux/string.h"

void arch_evsel__set_sample_weight(struct evsel *evsel)
{
	evsel__set_sample_bit(evsel, WEIGHT_STRUCT);
}

void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr)
{
	struct perf_env env = { .total_mem = 0, } ;

	if (!perf_env__cpuid(&env))
		return;

	/*
	 * On AMD, precise cycles event sampling internally uses IBS pmu.
	 * But IBS does not have filtering capabilities and perf by default
	 * sets exclude_guest = 1. This makes IBS pmu event init fail and
	 * thus perf ends up doing non-precise sampling. Avoid it by clearing
	 * exclude_guest.
	 */
	if (env.cpuid && strstarts(env.cpuid, "AuthenticAMD"))
		attr->exclude_guest = 0;

	free(env.cpuid);
}
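The vendor check in arch_evsel__fixup_new_cycles() can be tried outside the perf tree with a small stand-alone sketch. Everything here is an assumption made for illustration: `struct vendor_attr`, `has_prefix()` and `fixup_for_vendor()` are hypothetical names, `has_prefix()` only approximates what strstarts() is used for above, and the cpuid strings in the usage note are made-up samples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in holding the one attr field this file touches. */
struct vendor_attr {
	bool exclude_guest;
};

/* Prefix test, playing the role strstarts() plays in the file above. */
static bool has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Clear exclude_guest only when the cpuid string names an AMD CPU.
 * A NULL cpuid (lookup failed) leaves the attribute untouched.
 */
static void fixup_for_vendor(struct vendor_attr *attr, const char *cpuid)
{
	assert(attr != NULL);

	if (cpuid && has_prefix(cpuid, "AuthenticAMD"))
		attr->exclude_guest = false;
}
```

Calling `fixup_for_vendor()` with a string such as "AuthenticAMD,25,1,1" clears the flag, while a string starting with "GenuineIntel" or a NULL pointer leaves it set.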