1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

24 commits

Author SHA1 Message Date
Dave Jiang
001c5d1934 cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coord
The driver stores access_coordinate for host bridge in ->hb_coord and
switch CDAT access_coordinate in ->sw_coord. Since neither of these
access_coordinate clobber each other, the variable name can be consolidated
into ->coord to simplify the code.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-5-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08 08:25:21 -07:00
Dave Jiang
51293c565c cxl: Fix incorrect region perf data calculation
Current math in cxl_region_perf_data_calculate divides the latency by 1000
every time the function gets called. This causes the region latency to be
divided by 1000 per memory device and the math is incorrect. This is user
visible as the latency access_coordinate exposed via sysfs will show
incorrect latency data.

Normalize values from CDAT to nanoseconds. Adjust sub-nanoseconds latency
to at least 1. Remove adjustment of perf numbers from the generic target
since hmat handling code has already normalized those numbers. Now all
computation and stored numbers should be in nanoseconds.

cxl_hb_get_perf_coordinates() is removed and HB coords are calculated
in the port access_coordinate calculation path since it no longer need
to be treated special.

Fixes: 3d9f4a1972 ("cxl/region: Calculate performance data for a region")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240403154844.3403859-4-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08 08:25:21 -07:00
Dan Williams
88482878c3 Merge branch 'for-6.9/cxl-fixes' into for-6.9/cxl
Pick up a parsing fix for the CDAT SSLBIS structure for v6.9.
2024-03-13 00:17:18 -07:00
Dan Williams
d5c0078033 Merge branch 'for-6.9/cxl-qos' into for-6.9/cxl
Pick up support for CXL "HMEM reporting" for v6.9, i.e. build an HMAT
from CXL CDAT and PCIe switch information.
2024-03-13 00:07:36 -07:00
Robert Richter
c6c3187d66 lib/firmware_table: Provide buffer length argument to cdat_table_parse()
There exist card implementations with a CDAT table using a fixed size
buffer, but with entries filled in that do not fill the whole table
length size. Then, the last entry in the CDAT table may not mark the
end of the CDAT table buffer specified by the length field in the CDAT
header. It can be shorter with trailing unused (zero'ed) data. The
actual table length is determined while reading all CDAT entries of
the table with DOE.

If the table is greater than expected (containing zero'ed trailing
data), the CDAT parser fails with:

 [   48.691717] Malformed DSMAS table length: (24:0)
 [   48.702084] [CDAT:0x00] Invalid zero length
 [   48.711460] cxl_port endpoint1: Failed to parse CDAT: -22

In addition, a check of the table buffer length is missing to prevent
an out-of-bound access then parsing the CDAT table.

Hardening code against device returning borked table. Fix that by
providing an optional buffer length argument to
acpi_parse_entries_array() that can be used by cdat_table_parse() to
propagate the buffer size down to its users to check the buffer
length. This also prevents a possible out-of-bound access mentioned.

Add a check to warn about a malformed CDAT table length.

Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/ZdEnopFO0Tl3t2O1@rric.localdomain
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-13 00:03:21 -07:00
Dave Jiang
99b52aac2d cxl: Fix the incorrect assignment of SSLBIS entry pointer initial location
The 'entry' pointer in cdat_sslbis_handler() is set to header +
sizeof(common header). However, the math missed the addition of the SSLBIS
main header. It should be header + sizeof(common header) + sizeof(*sslbis).
Use a defined struct for all the SSLBIS parts in order to avoid pointer
math errors.

The bug causes incorrect parsing of the SSLBIS table and introduces incorrect
performance values to the access_coordinates during the CXL access_coordinate
calculation path if there are CXL switches present in the topology.

The issue was found during testing of new code being added to add additional
checks for invalid CDAT values during CXL access_coordinate calculation. The
testing was done on qemu with a CXL topology including a CXL switch.

Fixes: 80aa780dda ("cxl: Add callback to parse the SSLBIS subtable from CDAT")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20240301210948.1298075-1-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 23:41:59 -07:00
Dave Jiang
debdce20c4 cxl/region: Deal with numa nodes not enumerated by SRAT
For the numa nodes that are not created by SRAT, no memory_target is
allocated and is not managed by the HMAT_REPORTING code. Therefore
hmat_callback() memory hotplug notifier will exit early on those NUMA
nodes. The CXL memory hotplug notifier will need to call
node_set_perf_attrs() directly in order to setup the access sysfs
attributes.

In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is
stored. Add a helper function acpi_node_backed_by_real_pxm() in order to
check if a NUMA node id is defined by SRAT or created by CFMWS.

node_set_perf_attrs() symbol is exported to allow update of perf attribs
for a node. The sysfs path of
/sys/devices/system/node/nodeX/access0/initiators/* is created by
node_set_perf_attrs() for the various attributes where nodeX is matched
to the NUMA node of the CXL region.

Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-13-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 14:54:03 -07:00
Dave Jiang
067353a46d cxl/region: Add memory hotplug notifier for cxl region
When the CXL region is formed, the driver computes the performance data
for the region. However this data is not available at the node data
collection that has been populated by the HMAT during kernel
initialization. Add a memory hotplug notifier to update the access
coordinates to the 'struct memory_target' context kept by the
HMAT_REPORTING code.

Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the
priority number to be called before HMAT_CALLBACK_PRI. The CXL update must
happen before hmat_callback().

A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in
order to allow CXL to update the memory_target access coordinates.

A new ext_updated member is added to the memory_target to indicate that
the access coordinates within the memory_target has been updated by an
external agent such as CXL. This prevents data being overwritten by the
hmat_update_target_attrs() triggered by hmat_callback().

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Huang, Ying <ying.huang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-12-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:12 -07:00
Dave Jiang
3d9f4a1972 cxl/region: Calculate performance data for a region
Calculate and store the performance data for a CXL region. Find the worst
read and write latency for all the included ranges from each of the devices
that attributes to the region and designate that as the latency data. Sum
all the read and write bandwidth data for each of the device region and
that is the total bandwidth for the region.

The perf list is expected to be constructed before the endpoint decoders
are registered and thus there should be no early reading of the entries
from the region assemble action. The calling of the region qos calculate
function is under the protection of cxl_dpa_rwsem and will ensure that
all DPA associated work has completed.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-10-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
Dave Jiang
6ef83c4e19 cxl: Move QoS class to be calculated from the nearest CPU
Retrieve the qos_class (QTG ID) using the access coordinates from the
nearest CPU rather than the nearst initiator that may not be a CPU.
This may be the more appropriate number that applications care about.

For most cases, access0 and access1 have the same values.

Link: https://lore.kernel.org/linux-cxl/20240112113023.00006c50@Huawei.com/
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-8-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
Dave Jiang
863027d409 cxl: Split out host bridge access coordinates
The difference between access class 0 and access class 1 for 'struct
access_coordinate', if any, is that class 0 is for the distance from
the target to the closest initiator and that class 1 is for the distance
from the target to the closest CPU. For CXL memory, the nearest initiator
may not necessarily be a CPU node. The performance path from the CXL
endpoint to the host bridge should remain the same. However, the numbers
extracted and stored from HMAT is the difference for the two access
classes. Split out the performance numbers for the host bridge (generic
target) from the calculation of the entire path in order to allow
calculation of both access classes for a CXL region.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-7-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
Dave Jiang
032f7b37ad cxl: Split out combine_coordinates() for common shared usage
Refactor the common code of combining coordinates in order to reduce code.
Create a new function cxl_cooordinates_combine() it combine two 'struct
access_coordinate'.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-6-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12 12:34:11 -07:00
Dave Jiang
cc214417f0 cxl: Fix sysfs export of qos_class for memdev
Current implementation exports only to
/sys/bus/cxl/devices/.../memN/qos_class. With both ram and pmem exposed,
the second registered sysfs attribute is rejected as duplicate. It's not
possible to create qos_class under the dev_groups via the driver due to
the ram and pmem sysfs sub-directories already created by the device sysfs
groups. Move the ram and pmem qos_class to the device sysfs groups and add
a call to sysfs_update() after the perf data are validated so the
qos_class can be visible. The end results should be
/sys/bus/cxl/devices/.../memN/ram/qos_class and
/sys/bus/cxl/devices/.../memN/pmem/qos_class.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-4-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16 23:20:34 -08:00
Dave Jiang
10cb393d76 cxl: Remove unnecessary type cast in cxl_qos_class_verify()
The passed in host bridge parameter for device_for_each_child() has
unnecessary void * type cast. Remove the type cast.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-3-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16 23:20:34 -08:00
Dave Jiang
00413c1506 cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf'
In order to address the issue with being able to expose qos_class sysfs
attributes under 'ram' and 'pmem' sub-directories, the attributes must
be defined as static attributes rather than under driver->dev_groups.
To avoid implementing locking for accessing the 'struct cxl_dpa_perf`
lists, convert the list to a single 'struct cxl_dpa_perf' entry in
preparation to move the attributes to statically defined.

While theoretically a partition may have multiple qos_class via CDAT, this
has not been encountered with testing on available hardware. The code is
simplified for now to not support the complex case until a use case is
needed to support that.

Link: https://lore.kernel.org/linux-cxl/65b200ba228f_2d43c29468@dwillia2-mobl3.amr.corp.intel.com.notmuch/
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240206190431.1810289-2-dave.jiang@intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16 23:20:34 -08:00
Dan Williams
e16bf7e015 Merge branch 'for-6.7/cxl' into for-6.8/cxl
Pick up a late locking change + fixup that is better as merge window
material than rc material.
2024-01-05 19:24:33 -08:00
Dave Jiang
98e7ab3345 cxl: Fix device reference leak in cxl_port_perf_data_calculate()
cxl_port_perf_data_calculate() calls find_cxl_root() and does not
dereference the 'struct device' in the cxl_root->port. find_cxl_root()
calls get_device() and takes a reference on the port 'struct device'
member. Use the __free() macro to ensure the dereference happens.

Fixes: 7a4f148dd8 ("cxl: Compute the entire CXL path latency and bandwidth data")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170449246681.3779673.2288926019977963333.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05 14:36:29 -08:00
Dave Jiang
44cd71ef7b cxl: Convert find_cxl_root() to return a 'struct cxl_root *'
Commit 790815902e ("cxl: Add support for _DSM Function for retrieving QTG ID")
introduced 'struct cxl_root', however all usages have been worked
indirectly through cxl_port. Refactor code such as find_cxl_root()
function to use 'struct cxl_root' directly.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170449246044.3779673.13035770941393418591.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05 14:36:29 -08:00
Dave Jiang
185c1a489f cxl: Check qos_class validity on memdev probe
Add a check to make sure the qos_class for the device will match one of
the root decoders qos_class. If no match is found, then the qos_class for
the device is set to invalid. Also add a check to ensure that the device's
host bridge matches to one of the root decoder's downstream targets.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319626313.2212653.9021004640856081917.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 15:46:02 -08:00
Dave Jiang
86557b7edf cxl: Store QTG IDs and related info to the CXL memory device context
Once the QTG ID _DSM is executed successfully, the QTG ID is retrieved from
the return package. Create a list of entries in the cxl_memdev context and
store the QTG ID as qos_class token and the associated DPA range. This
information can be exposed to user space via sysfs in order to help region
setup for hot-plugged CXL memory devices.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319625109.2212653.11872111896220384056.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 15:31:53 -08:00
Dave Jiang
7a4f148dd8 cxl: Compute the entire CXL path latency and bandwidth data
CXL Memory Device SW Guide [1] rev1.0 2.11.2 provides instruction on how to
calculate latency and bandwidth for CXL memory device. Calculate minimum
bandwidth and total latency for the path from the CXL device to the root
port. The QTG id is retrieved by providing the performance data as input
and calling the root port callback ->get_qos_class(). The retrieved id is
stored with the cxl_port of the CXL device.

For example for a device that is directly attached to a host bus:
Total Latency = Device Latency (from CDAT) + Dev to Host Bus (HB) Link
		Latency + Generic Port Latency
Min Bandwidth = Min bandwidth for link bandwidth between HB
		and CXL device, device CDAT bandwidth, and Generic Port
		Bandwidth

For a device that has a switch in between host bus and CXL device:
Total Latency = Device (CDAT) Latency + Dev to Switch Link Latency +
		Switch (CDAT) Latency + Switch to HB Link Latency +
		Generic Port Latency
Min Bandwidth = Min bandwidth for link bandwidth between CXL device
		to CXL switch, CXL device CDAT bandwidth, CXL switch CDAT
		bandwidth, CXL switch to HB bandwidth, and Generic Port
		Bandwidth.

[1]: https://cdrdv2-public.intel.com/643805/643805_CXL%20Memory%20Device%20SW%20Guide_Rev1p0.pdf

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319624458.2212653.13252496567443656371.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 15:31:52 -08:00
Dave Jiang
80aa780dda cxl: Add callback to parse the SSLBIS subtable from CDAT
Provide a callback to parse the Switched Scoped Latency and Bandwidth
Information Structure (SSLBIS) in the CDAT structures. The SSLBIS
contains the bandwidth and latency information that's tied to the
CXL switch that the data table has been read from. The extracted
values are stored to the cxl_dport correlated by the port_id
depending on the SSLBIS entry.

Coherent Device Attribute Table 1.03 2.1 Switched Scoped Latency
and Bandwidth Information Structure (DSLBIS)

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319620635.2212653.5194389158785365150.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 14:33:28 -08:00
Dave Jiang
63cef81b9d cxl: Add callback to parse the DSLBIS subtable from CDAT
Provide a callback to parse the Device Scoped Latency and Bandwidth
Information Structure (DSLBIS) in the CDAT structures. The DSLBIS
contains the bandwidth and latency information that's tied to a DSMAS
handle. The driver will retrieve the read and write latency and
bandwidth associated with the DSMAS which is tied to a DPA range.

Coherent Device Attribute Table 1.03 2.1 Device Scoped Latency and
Bandwidth Information Structure (DSLBIS)

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319620005.2212653.7475488478229720542.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 14:33:28 -08:00
Dave Jiang
ad6f04c026 cxl: Add callback to parse the DSMAS subtables from CDAT
Provide a callback function to the CDAT parser in order to parse the
Device Scoped Memory Affinity Structure (DSMAS). Each DSMAS structure
contains the DPA range and its associated attributes in each entry. See
the CDAT specification for details. The device handle and the DPA range
is saved and to be associated with the DSLBIS locality data when the
DSLBIS entries are parsed. The xarray is a local variable. When the
total path performance data is calculated and storred this xarray can be
discarded.

Coherent Device Attribute Table 1.03 2.1 Device Scoped memory Affinity
Structure (DSMAS)

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319619355.2212653.2675953129671561293.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22 14:33:10 -08:00