The new generation of CPUs have granular control at a cluster level.
Each package/die can have multiple power domains, which further can
have multiple fabric clusters. The TPMI interface allows control at
fabric cluster level.
Use the updated uncore sysfs feature to expose controls at cluster
level. At each cluster level there is a control for maximum and minimum
uncore frequency. Also present current uncore frequency at a cluster
level.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Link: https://lore.kernel.org/r/20230418171340.681662-4-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
An SoC can contain multiple power domains with individual or collection
of mesh partitions. This partition is called fabric cluster.
Certain type of meshes will need to run at the same frequency, they will
be placed in the same fabric cluster. Benefit of fabric cluster is that
it offers a scalable mechanism to deal with partitioned fabrics in a SoC.
The current sysfs interface supports control at package and die level.
This interface is not enough to support more granular control at
fabric cluster level.
SoCs with the support of TPMI (Topology Aware Register and PM Capsule
Interface), can have multiple power domains. Each power domain can
contain one or more fabric clusters.
To support such granular controls, enhance uncore common to optionally
create new directories to provide controls at fabric cluster level. It
is also important to have flexibility to change granularity for future
version of SoCs. If the directory name contains scope like:
"package_*_die_*_power_domain_*_cluster_*", then this is not expandable.
The cpufreq policies also have different scopes. There the scope of the
policy (affected_cpus) specified by attributes inside each policy.
So, follow the same model for uncore frequency scaling sysfs as:
"sys/devices/system/cpu/cpufreq/policy*"
Allow client drivers to optionally support granular control for each
fabric cluster. Here, the directory name will be "uncore" suffixed with
an unique instance number. For example: uncore00, uncore01 etc.
Attributes in the directory identify package id, power domain and
fabric cluster id. This interface is expandable even if some new level
of granularity is introduced. A new sysfs attribute can identify new
level.
For compatibility with the existing sysfs and provide easy way to set
limits for each fabric cluster in the package/die, the existing control
at package/die levels are still provided. For majority of users, this is
an easy approach.
For example: On a single package/die system, with three power domains
and one fabric cluster per power domain:
$tree -L 2 /sys/devices/system/cpu/intel_uncore_frequency/
/sys/devices/system/cpu/intel_uncore_frequency/
├── package_00_die_00
│ ├── current_freq_khz
│ ├── initial_max_freq_khz
│ ├── initial_min_freq_khz
│ ├── max_freq_khz
│ └── min_freq_khz
├── uncore00
│ ├── current_freq_khz
│ ├── domain_id
│ ├── fabric_cluster_id
│ ├── initial_max_freq_khz
│ ├── initial_min_freq_khz
│ ├── max_freq_khz
│ ├── min_freq_khz
│ └── package_id
├── uncore01
│ ├── current_freq_khz
│ ├── domain_id
│ ├── fabric_cluster_id
│ ├── initial_max_freq_khz
│ ├── initial_min_freq_khz
│ ├── max_freq_khz
│ ├── min_freq_khz
│ └── package_id
└── uncore02
├── current_freq_khz
├── domain_id
├── fabric_cluster_id
├── initial_max_freq_khz
├── initial_min_freq_khz
├── max_freq_khz
├── min_freq_khz
└── package_id
The attribute for cluster id is "fabric_cluster_id" instead of just
"cluster_id" is to avoid confusion with usage of term clusters in
other part of the Linux kernel.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Link: https://lore.kernel.org/r/20230418171340.681662-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Implement support of uncore frequency control via TPMI (Topology Aware
Register and PM Capsule Interface). This driver provides the similar
functionality as the current uncore frequency driver using MSRs.
The hardware interface to read/write is basically substitution of MSR
0x620 and 0x621. There are specific MMIO offset and bits to get/set
minimum and maximum uncore ratio, similar to MSRs.
The scope of the uncore MSRs is package/die. But new generation of CPUs
have more granular control at a cluster level. Each package/die can have
multiple power domains, which further can have multiple clusters. The
TPMI interface allows control at cluster level.
The primary use case for uncore sysfs is to set maximum and minimum
uncore frequency to reduce power consumption or latency. The current
uncore sysfs control is per package/die. This is enough for the majority
of users as workload will move to different power domains as it moves
between different CPUs.
The current uncore sysfs provides controls at package/die level. When
user sets maximum/minimum limits, the driver sets the same limits to
each cluster.
Here number of power domains = number of resources in this aux device.
There are offsets and bits to discover number of clusters and offset for
each cluster level controls.
The TPMI documentation can be downloaded from:
https://github.com/intel/tpmi_power_management
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Link: https://lore.kernel.org/r/20230420220514.747573-1-srinivas.pandruvada@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Currently when the uncore_write() returns error, it is silently
ignored. Return error to user space when uncore_write() fails.
Fixes: 49a474c7ba ("platform/x86: Add support for Uncore frequency control")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230418153230.679094-1-srinivas.pandruvada@linux.intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening in
the driver core in the quest to be able to move "struct bus" and "struct
class" into read-only memory, a task now complete with these changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules for
all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most of
them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
LEGadNS38k5fs+73UaxV
=7K4B
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
Make Intel uncore frequency driver support to client processor starting
from Alder Lake.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20230330145939.1022261-1-srinivas.pandruvada@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.
Cc: Mark Gross <markgross@kernel.org>
Cc: platform-driver-x86@vger.kernel.org
Acked-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20230313182918.1312597-5-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make Intel uncore frequency driver support Emerald Rapids by adding its
CPU model to the match table.
Emerald Rapids uncore frequency control is the same as in Sapphire
Rapids.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replace the open-code with sysfs_emit() to simplify the code.
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Link: https://lore.kernel.org/r/20220923063314.239146-1-ye.xingchen@zte.com.cn
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Loading this driver in guests results in unchecked MSR access error for
MSR 0x620.
There is no use of reading and modifying package/die scope uncore MSRs
in guests. So check for CPU feature X86_FEATURE_HYPERVISOR to prevent
loading of this driver in guests.
Fixes: dbce412a77 ("platform/x86/intel-uncore-freq: Split common and enumeration part")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215870
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20220427100304.2562990-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Currently the uncore_freq_common_init() return one on success and
zero on failure. There is only one caller and it has a "forgot to set
the error code" bug. Change uncore_freq_common_init() to return
negative error codes which makes the code simpler and avoids this kind
of bug in the future.
Fixes: dbce412a77 ("platform/x86/intel-uncore-freq: Split common and enumeration part")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20220304131925.GG28739@kili
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Split the current driver in two parts:
- Common part: All the commom function other than enumeration function.
- Enumeration/HW specific part: The current enumeration using CPU model
is left in the old module. This uses service of common driver to register
sysfs objects. Also provide callbacks for MSR access related to uncore.
- Add MODULE_DEVICE_TABLE to uncore-frequency.c
No functional changes are expected.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220204000306.2517447-5-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Add a new sysfs attribute "current_freq_khz" to display current uncore
frequency. This value is read from MSR 0x621.
Root user permission is required to read uncore current frequency.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220204000306.2517447-4-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Use of sysfs API is always preferable over using kobject calls to create
attributes. Remove usage of kobject_init_and_add() and use
sysfs_create_group(). To create relationship between sysfs attribute
and uncore instance use device_attribute*, which is defined per
uncore instance.
To create uniform locking for both read and write attributes take
lock in the sysfs callbacks, not in the actual functions where
the MSRs are read or updated.
No functional changes are expected.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220204000306.2517447-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Move the current driver from platform/x86/intel/uncore-frequency.c
to platform/x86/intel/uncore-frequency/uncore-frequency.c.
No functional changes are expected.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220204000306.2517447-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>