linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
James Morse	fbc06c6980	x86/resctrl: Remove rdt_cdp_peer_get() When CDP is enabled, rdt_cdp_peer_get() finds the alternative CODE/DATA resource and returns the alternative domain. This is used to determine if bitmaps overlap when there are aliased entries in the two struct rdt_hw_resources. Now that the ctrl_val[] used by the CODE/DATA resources is the same, the search for an alternate resource/domain is not needed. Replace rdt_cdp_peer_get() with resctrl_peer_type(), which returns the alternative type. This can be passed to resctrl_arch_get_config() with the same resource and domain. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-23-james.morse@arm.com	2021-08-11 18:33:48 +02:00
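The peer lookup described in the entry above can be sketched in a few lines of user-space C. This is an illustration only, not the kernel source: the enumerator names CDP_NONE, CDP_CODE and CDP_DATA and the function name peer_type() are assumptions, since the log names only enum resctrl_conf_type and resctrl_peer_type().

```c
#include <assert.h>

/* Assumed enumerators; the log only names enum resctrl_conf_type. */
enum resctrl_conf_type {
	CDP_NONE,
	CDP_CODE,
	CDP_DATA,
};

/*
 * Return the other half of a CODE/DATA pair. A configuration that is
 * not split by CDP has no alternative type, so it maps to itself.
 */
static enum resctrl_conf_type peer_type(enum resctrl_conf_type my_type)
{
	/* Reject values outside the three known types. */
	assert(my_type <= CDP_DATA);

	switch (my_type) {
	case CDP_CODE:
		return CDP_DATA;
	case CDP_DATA:
		return CDP_CODE;
	case CDP_NONE:
	default:
		return CDP_NONE;
	}
}
```

Because the CODE and DATA values now share one array, a caller only needs the peer type, not a second resource and domain, to fetch the value it must compare against.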
James Morse	43ac1dbf61	x86/resctrl: Merge the ctrl_val arrays Each struct rdt_hw_resource has its own ctrl_val[] array. When CDP is enabled, two resources are in use, each with its own ctrl_val[] array that holds half of the configuration used by hardware. One uses the odd slots, the other the even. rdt_cdp_peer_get() is the helper to find the alternate resource, its domain, and corresponding entry in the other ctrl_val[] array. Once the CDP resources are merged there will be one struct rdt_hw_resource and one ctrl_val[] array for each hardware resource. This will include changes to rdt_cdp_peer_get(), making it hard to bisect any issue. Merge the ctrl_val[] arrays for three CODE/DATA/NONE resources first. Doing this before merging the resources temporarily complicates allocating and freeing the ctrl_val arrays. Add a helper to allocate the ctrl_val array, that returns the value on the L2 or L3 resource if it already exists. This gets removed once the resources are merged, and there really is only one ctrl_val[] array. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-22-james.morse@arm.com	2021-08-11 18:31:04 +02:00
James Morse	2b8dd4ab65	x86/resctrl: Calculate the index from the configuration type resctrl uses cbm_idx() to map a closid to an index in the configuration array. This is based on a multiplier and offset that are held in the resource. To merge the resources, the resctrl arch code needs to calculate the index from something else, as there will only be one resource. Decide based on the staged configuration type. This makes the static mult and offset parameters redundant. [ bp: Remove superfluous brackets in get_config_index() ] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-21-james.morse@arm.com	2021-08-11 18:19:06 +02:00
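A minimal sketch of deriving the array index from the configuration type, as the entry above describes. The names config_index(), stage_value() and NUM_SLOTS are hypothetical, and the choice of CODE for the odd slots and DATA for the even slots is an assumption: the log only states that one type uses the odd slots and the other the even.

```c
#include <assert.h>
#include <stdint.h>

enum resctrl_conf_type { CDP_NONE, CDP_CODE, CDP_DATA };

#define NUM_SLOTS 8	/* hypothetical size of the shared array */

/*
 * Map a closid to a slot in the shared configuration array.
 * Assumption: CODE takes the odd slots and DATA the even ones.
 */
static uint32_t config_index(uint32_t closid, enum resctrl_conf_type type)
{
	switch (type) {
	case CDP_CODE:
		return closid * 2 + 1;
	case CDP_DATA:
		return closid * 2;
	case CDP_NONE:
	default:
		return closid;
	}
}

/* Apply the index correction before the value is written to the array. */
static void stage_value(uint32_t *ctrl_val, uint32_t closid,
			enum resctrl_conf_type type, uint32_t val)
{
	uint32_t idx = config_index(closid, type);

	assert(idx < NUM_SLOTS);
	ctrl_val[idx] = val;
}
```

With the index computed from the type, a per-resource multiplier and offset are no longer needed, which is the point the commit message makes.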
James Morse	2e7df368fc	x86/resctrl: Apply offset correction when config is staged When resctrl comes to copy the CAT MSR values from the ctrl_val[] array into hardware, it applies an offset adjustment based on the type of the resource. CODE and DATA resources have their closid mapped into an odd/even range. This mapping is based on a property of the resource. This happens once the new control value has been written to the ctrl_val[] array. Once the CDP resources are merged, there will only be a single property that needs to cover both odd/even mappings to the single ctrl_val[] array. The offset adjustment must be applied before the new value is written to the array. Move the logic from cat_wrmsr() to resctrl_arch_update_domains(). The value provided to apply_config() is now an index in the array, not the closid. The parameters provided via struct msr_param are now indexes too. As resctrl's use of closid is a u32, struct msr_param's type is changed to match. With this, the CODE and DATA resources only use the odd or even indexes in the array. This allows the temporary num_closid/2 fixes in domain_setup_ctrlval() and reset_all_ctrls() to be removed. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-20-james.morse@arm.com	2021-08-11 18:03:28 +02:00
James Morse	141739aa73	x86/resctrl: Make ctrlval arrays the same size The CODE and DATA resources report a num_closid that is half the actual size supported by the hardware. This behaviour is visible to user-space when CDP is enabled. The CODE and DATA resources have their own ctrlval arrays which are half the size of the underlying hardware because num_closid was already adjusted. One holds the odd configuration values, the other even. Before the CDP resources can be merged, the 'half the closids' behaviour needs to be implemented by schemata_list_create(), but this causes the ctrl_val[] array to be full sized. Remove the logic from the architecture specific rdt_get_cdp_config() setup, and add it to schemata_list_create(). Functions that walk all the configurations, such as domain_setup_ctrlval() and reset_all_ctrls(), which take num_closid directly from struct rdt_hw_resource, also have to halve num_closid as only the lower half of each array is in use. domain_setup_ctrlval() and reset_all_ctrls() both copy struct rdt_hw_resource's num_closid to a struct msr_param. Correct the value here. This is temporary as a subsequent patch will merge all three ctrl_val[] arrays such that when CDP is in use, the CODE/DATA layout in the array matches the hardware. reset_all_ctrls()'s loop over the whole of ctrl_val[] is not touched as this is harmless, and will be required as it is once the resources are merged. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-19-james.morse@arm.com	2021-08-11 17:58:33 +02:00
James Morse	fa8f711d2f	x86/resctrl: Pass configuration type to resctrl_arch_get_config() The ctrl_val[] array for a struct rdt_hw_resource only holds configurations of one type. The type is implicit. Once the CDP resources are merged, the ctrl_val[] array will hold all the configurations for the hardware resource. When a particular type of configuration is needed, it must be specified explicitly. Pass the expected type from the schema into resctrl_arch_get_config(). Nothing uses this yet, but once a single ctrl_val[] array is used for the three struct rdt_hw_resources that share hardware, the type will be used to return the correct configuration value from the shared array. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-18-james.morse@arm.com	2021-08-11 17:53:53 +02:00
James Morse	f07e9d0250	x86/resctrl: Add a helper to read a closid's configuration Functions like show_doms() reach into the architecture's private structure to retrieve the configuration from the struct rdt_hw_resource. The hardware configuration may look completely different to the values resctrl gets from user-space. The staged configuration and resctrl_arch_update_domains() allow the architecture to convert or translate these values. Resctrl shouldn't read or write the ctrl_val[] values directly. Add a helper to read the current configuration. This will allow another architecture to scale the bitmaps if necessary, and possibly use controls that don't take the user-space control format at all. Of the remaining functions that access ctrl_val[] directly, apply_config() is part of the architecture-specific code, and is called via resctrl_arch_update_domains(). reset_all_ctrls() will be an architecture specific helper. update_mba_bw() manipulates both ctrl_val[], mbps_val[] and the hardware. The mbps_val[] that matches the mba_sc state of the resource is changed, but the other is left unchanged. Abstracting this is the subject of later patches that affect set_mba_sc() too. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-17-james.morse@arm.com	2021-08-11 17:46:34 +02:00
James Morse	2e6678195d	x86/resctrl: Rename update_domains() to resctrl_arch_update_domains() update_domains() merges the staged configuration changes into the arch code's configuration array. Rename to make it clear it is part of the arch code interface to resctrl. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-16-james.morse@arm.com	2021-08-11 16:36:58 +02:00
James Morse	75408e4350	x86/resctrl: Allow different CODE/DATA configurations to be staged Before the CDP resources can be merged, struct rdt_domain will need an array of struct resctrl_staged_config, one per type of configuration. Use the type as an index to the array to ensure that a schema configuration string can't specify the same domain twice. This will allow two schemata to apply configuration changes to one resource. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-15-james.morse@arm.com	2021-08-11 16:33:42 +02:00
James Morse	e8f7282552	x86/resctrl: Group staged configuration into a separate struct When configuration changes are made, the new value is written to struct rdt_domain's new_ctrl field and the have_new_ctrl flag is set. Later new_ctrl is copied to hardware by a call to update_domains(). Once the CDP resources are merged, there will be one new_ctrl field in use by two struct resctrl_schema requiring a per-schema IPI to copy the value to hardware. Move new_ctrl and have_new_ctrl into a new struct resctrl_staged_config. Before the CDP resources can be merged, struct rdt_domain will need an array of these, one per type of configuration. Using the type as an index to the array will ensure that a schema configuration string can't specify the same domain twice. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-14-james.morse@arm.com	2021-08-11 16:32:32 +02:00
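The staged-configuration layout described in the two entries above can be sketched as follows. The struct and field names new_ctrl and have_new_ctrl come from the log; struct domain, stage_config(), the enumerator names and the -1 error value are stand-ins invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed enumerators, with a trailing count used to size the array. */
enum resctrl_conf_type { CDP_NONE, CDP_CODE, CDP_DATA, CDP_NUM_TYPES };

struct resctrl_staged_config {
	uint32_t new_ctrl;	/* value waiting to be copied to hardware */
	bool have_new_ctrl;	/* set once a value has been staged */
};

/* Hypothetical stand-in for the domain: one staging slot per type. */
struct domain {
	struct resctrl_staged_config staged[CDP_NUM_TYPES];
};

/*
 * Stage a value for one configuration type. Returns 0 on success, or -1
 * if this domain already has a value staged for that type, which is how
 * a schema string naming the same domain twice would be caught.
 */
static int stage_config(struct domain *d, enum resctrl_conf_type type,
			uint32_t val)
{
	struct resctrl_staged_config *cfg;

	assert(type < CDP_NUM_TYPES);
	cfg = &d->staged[type];
	if (cfg->have_new_ctrl)
		return -1;

	cfg->new_ctrl = val;
	cfg->have_new_ctrl = true;
	return 0;
}
```

Indexing by type lets a CODE schema and a DATA schema stage changes against the same domain without overwriting each other.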
James Morse	e198fde3fe	x86/resctrl: Move the schemata names into struct resctrl_schema resctrl 'info' directories and schema parsing use the schema name. This lives in the struct rdt_resource, and is specified by the architecture code. Once the CDP resources are merged, there will only be one resource (and one name) in use by two schemata. To allow the CDP CODE/DATA property to be the type of configuration the schema uses, the name should also be per-schema. Add a name field to struct resctrl_schema, and use this wherever the schema name is exposed (or read from) user-space. Calculating max_name_width for padding the schemata file also moves as this is visible to user-space. As the names in struct rdt_resource already include the CDP information, schemata_list_create() copies them. schemata_list_create() includes the length of the CDP suffix when calculating max_name_width in preparation for CDP resources being merged. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-13-james.morse@arm.com	2021-08-11 16:21:35 +02:00
James Morse	c091e90721	x86/resctrl: Add a helper to read/set the CDP configuration Whether CDP is enabled for a hardware resource like the L3 cache can be found by inspecting the alloc_enabled flags of the L3CODE/L3DATA struct rdt_hw_resources, even if they aren't in use. Once these resources are merged, the flags can't be compared. Whether CDP is enabled needs tracking explicitly. If another architecture is emulating CDP the behaviour may not be per-resource. 'cdp_capable' needs to be visible to resctrl, even if it's not in use, as this affects the padding of the schemata table visible to user-space. Add cdp_enabled to struct rdt_hw_resource and cdp_capable to struct rdt_resource. Add resctrl_arch_set_cdp_enabled() to let resctrl enable or disable CDP on a resource. resctrl_arch_get_cdp_enabled() lets it read the current state. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-12-james.morse@arm.com	2021-08-11 15:54:26 +02:00
James Morse	32150edd3f	x86/resctrl: Swizzle rdt_resource and resctrl_schema in pseudo_lock_region struct pseudo_lock_region points to the rdt_resource. Once the resources are merged, this won't be unique. The resource name is moving into the schema, so that the filesystem portions of resctrl can generate it. Swap pseudo_lock_region's rdt_resource pointer for a schema pointer. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-11-james.morse@arm.com	2021-08-11 15:51:45 +02:00
James Morse	1c290682c0	x86/resctrl: Pass the schema to resctrl filesystem functions Once the CDP resources are merged, there will be two struct resctrl_schema for one struct rdt_resource. CDP becomes a type of configuration that belongs to the schema. Helpers like rdtgroup_cbm_overlaps() need access to the schema to query the configuration (or configurations) based on schema properties. Change these functions to take a struct resctrl_schema instead of the struct rdt_resource. All the modified functions are part of the filesystem code that will move to /fs/resctrl once it is possible to support a second architecture. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-10-james.morse@arm.com	2021-08-11 15:43:54 +02:00
James Morse	eb6f318769	x86/resctrl: Add resctrl_arch_get_num_closid() To initialise struct resctrl_schema's num_closid, schemata_list_create() reaches into the architecture's private structure to retrieve num_closid from the struct rdt_hw_resource. The 'half the closids' behaviour should be part of the filesystem parts of resctrl that are the same on any architecture. struct resctrl_schema's num_closid should include any correction for CDP. Having two properties called num_closid is likely to be confusing when they have different values. Add a helper to read the resource's num_closid from the arch code. This should return the number of closid that the resource supports, regardless of whether CDP is in use. Once the CDP resources are merged, schemata_list_create() can apply the correction itself. Using a type with an obvious size for the arch helper means changing the type of num_closid to u32, which matches the type already used by struct rdtgroup. reset_all_ctrls() does not use resctrl_arch_get_num_closid(), even though it sets up a structure for modifying the hardware. This function will be part of the architecture code, the maximum closid should be the maximum value the hardware has, regardless of the way resctrl is using it. All the uses of num_closid in core.c are naturally part of the architecture specific code. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-9-james.morse@arm.com	2021-08-11 15:35:42 +02:00
James Morse	3183e87c1b	x86/resctrl: Store the effective num_closid in the schema Struct resctrl_schema holds properties that vary with the style of configuration that resctrl applies to a resource. There are already two values for the hardware's num_closid, depending on whether the architecture presents the L3 or L3CODE/L3DATA resources. As the way CDP changes the number of control groups that resctrl can create is part of the user-space interface, it should be managed by the filesystem parts of resctrl. This allows the architecture code to only describe the value the hardware supports. Add num_closid to resctrl_schema. This is the value seen by the filesystem, which may be different to the maximum value described by the arch code when CDP is enabled. These functions operate on the num_closid value that is exposed to user-space: * rdtgroup_parse_resource() * rdtgroup_schemata_show() * rdt_num_closids_show() * closid_init() Change them to use the schema value instead. schemata_list_create() sets this value, and reaches into the architecture-specific structure to get the value. This will eventually be replaced with a helper. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-8-james.morse@arm.com	2021-08-11 15:24:27 +02:00
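The split between the hardware's closid count and the count the filesystem exposes can be illustrated with a short sketch. struct schema and schema_set_num_closid() are hypothetical stand-ins; only the 'half the closids when CDP is enabled' rule is taken from the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-schema structure. */
struct schema {
	uint32_t num_closid;	/* value exposed to user-space */
};

/*
 * Derive the user-visible closid count from the hardware count.
 * With CDP enabled each control group consumes a CODE and a DATA
 * entry, so only half as many groups can be created.
 */
static void schema_set_num_closid(struct schema *s, uint32_t hw_num_closid,
				  bool cdp_enabled)
{
	assert(s != NULL);
	s->num_closid = cdp_enabled ? hw_num_closid / 2 : hw_num_closid;
}
```

Keeping the halving in the schema code means the architecture side only ever reports what the hardware supports.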
James Morse	331ebe4c43	x86/resctrl: Walk the resctrl schema list instead of an arch list When parsing a schema configuration value from user-space, resctrl walks the architecture's rdt_resources_all[] array to find a matching struct rdt_resource. Once the CDP resources are merged there will be one resource in use by two schemata. Anything walking rdt_resources_all[] on behalf of a user-space request should walk the list of struct resctrl_schema instead. Change the users of for_each_alloc_enabled_rdt_resource() to walk the schema instead. Schemata were only created for alloc_enabled resources so these two lists are currently equivalent. schemata_list_create() and rdt_kill_sb() are ignored. The first creates the schema list, and will eventually loop over the resource indexes using an arch helper to retrieve the resource. rdt_kill_sb() will eventually make use of an arch 'reset everything' helper. After the filesystem code is moved, rdtgroup_pseudo_locked_in_hierarchy() remains part of the x86 specific hooks to support pseudo lock. This code walks each domain, and still does this after the separate resources are merged. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-7-james.morse@arm.com	2021-08-11 13:20:43 +02:00
James Morse	208ab16847	x86/resctrl: Label the resources with their configuration type The names of resources are used for the schema name presented to user-space. The name used is rooted in a structure provided by the architecture code because the names are different when CDP is enabled. x86 implements this by swapping between two sets of resource structures based on their alloc_enabled flag. The type of configuration in-use is encoded in the name (and cbm_idx_offset). Once the CDP behaviour is moved into the parts of resctrl that will move to /fs/, there will be two struct resctrl_schema for one struct rdt_resource. The schema describes the type of configuration being applied to the resource. The name of the schema should be generated by resctrl, based on the type of configuration. To do this struct resctrl_schema needs to store the type of configuration in use for a schema. Create an enum resctrl_conf_type describing the options, and add it to struct resctrl_schema. The underlying resources are still separate, as cbm_idx_offset is still in use. Temporarily label all the entries in rdt_resources_all[] and copy that value to struct resctrl_schema. Copying the value ensures there is no mismatch while the filesystem parts of resctrl are modified to use the schema. Once the resources are merged, the filesystem code can assign this value based on the schema being created. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-6-james.morse@arm.com	2021-08-11 13:13:18 +02:00
James Morse	f259449230	x86/resctrl: Pass the schema in info dir's private pointer Many of resctrl's per-schema files return a value from struct rdt_resource, which they take as their 'priv' pointer. Moving properties that resctrl exposes to user-space into the core 'fs' code, (e.g. the name of the schema), means some of the functions that back the filesystem need the schema struct (to where the properties are moved), but currently take struct rdt_resource. For example, once the CDP resources are merged, struct rdt_resource no longer reflects all the properties of the schema. For the info dirs that represent a control, the information needed will be accessed via struct resctrl_schema, as this is how the resource is being used. For the monitors, it's still struct rdt_resource as the monitors aren't described as schema. This difference means the type of the private pointers varies between control and monitor info dirs. Change the 'priv' pointer to point to struct resctrl_schema for the per-schema files that represent a control. The type can be determined from the fflags field. If the flags are RF_MON_INFO, it's a struct rdt_resource. If the flags are RF_CTRL_INFO, it's a struct resctrl_schema. No entry in res_common_files[] has both flags. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-5-james.morse@arm.com	2021-08-11 12:41:19 +02:00
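The flag-based interpretation of the 'priv' pointer in the entry above might look like the sketch below. The flag names RF_CTRL_INFO and RF_MON_INFO come from the log, but their numeric values here are made up, the two structs are stripped-down stand-ins, and info_dir_name() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names from the log; the values are placeholders for this sketch. */
#define RF_CTRL_INFO 0x1u
#define RF_MON_INFO  0x2u

/* Stripped-down stand-ins for the kernel structures. */
struct rdt_resource {
	const char *name;
};

struct resctrl_schema {
	const char *name;
	struct rdt_resource *res;
};

/*
 * Pick the name to show for an info directory. Control directories carry
 * a schema in 'priv', monitor directories carry the resource itself; the
 * flags say which, and no file is expected to have both set.
 */
static const char *info_dir_name(unsigned int fflags, void *priv)
{
	assert(priv != NULL);

	if (fflags & RF_CTRL_INFO)
		return ((struct resctrl_schema *)priv)->name;
	if (fflags & RF_MON_INFO)
		return ((struct rdt_resource *)priv)->name;
	return NULL;
}
```

The cast is only safe because the flags and the pointer type are set together when the file is created, which is the invariant the commit message spells out.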
James Morse	cdb9ebc917	x86/resctrl: Add a separate schema list for resctrl Resctrl exposes schemata to user-space, which allow the control values to be specified for a group of tasks. User-visible properties of the interface, (such as the schemata names and how the values are parsed) are rooted in a struct provided by the architecture code. (struct rdt_hw_resource). Once a second architecture uses resctrl, this would allow user-visible properties to diverge between architectures. These properties should come from the resctrl code that will be common to all architectures. Resctrl has no per-schema structure, only struct rdt_{hw_,}resource. Create a struct resctrl_schema to hold the rdt_resource. Before a second architecture can be supported, this structure will also need to hold the schema name visible to user-space and the type of configuration values for resctrl. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-4-james.morse@arm.com	2021-08-11 12:28:01 +02:00
James Morse	792e0f6f78	x86/resctrl: Split struct rdt_domain resctrl is the de facto Linux ABI for SoC resource partitioning features. To support it on another architecture, it needs to be abstracted from the features provided by Intel RDT and AMD PQoS, and moved to /fs/. struct rdt_domain contains a mix of architecture private details and properties of the filesystem interface user-space uses. Continue by splitting struct rdt_domain, into an architecture private 'hw' struct, which contains the common resctrl structure that would be used by any architecture. The hardware values in ctrl_val and mbps_val need to be accessed via helpers to allow another architecture to convert these into a different format if necessary. After this split, filesystem code paths touching a 'hw' struct indicate where an abstraction is needed. Splitting this structure only moves types around, and should not lead to any change in behaviour. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-3-james.morse@arm.com	2021-08-11 12:00:43 +02:00
James Morse	63c8b12319	x86/resctrl: Split struct rdt_resource resctrl is the de facto Linux ABI for SoC resource partitioning features. To support it on another architecture, it needs to be abstracted from the features provided by Intel RDT and AMD PQoS, and moved to /fs/. struct rdt_resource contains a mix of architecture private details and properties of the filesystem interface user-space uses. Start by splitting struct rdt_resource, into an architecture private 'hw' struct, which contains the common resctrl structure that would be used by any architecture. The foreach helpers are most commonly used by the filesystem code, and should return the common resctrl structure. for_each_rdt_resource() is changed to walk the common structure in its parent arch private structure. Move as much of the structure as possible into the common structure in the core code's header file. The x86 hardware accessors remain part of the architecture private code, as do num_closid, mon_scale and mbm_width. mon_scale and mbm_width are used to detect overflow of the hardware counters, and convert them from their native size to bytes. Any cross-architecture abstraction should be in terms of bytes, making these properties private. The hardware's num_closid is kept in the private structure to force the filesystem code to use a helper to access it. MPAM would return a single value for the system, regardless of the resource. Using the helper prevents this field from being confused with the version of num_closid that is being exposed to user-space (added in a later patch). After this split, filesystem code touching a 'hw' struct indicates where an abstraction is needed. Splitting this structure only moves types around, and should not lead to any change in behaviour. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lkml.kernel.org/r/20210728170637.25610-2-james.morse@arm.com	2021-08-11 11:51:34 +02:00
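The 'hw' struct that contains the common structure, as described in the two split commits above, is the usual embed-and-recover pattern. The sketch below shows it in plain user-space C; the member name r_resctrl and the helpers to_hw_res() and arch_get_num_closid() are assumptions for illustration, and the structs are reduced to a couple of fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Common part, visible to the filesystem code (reduced for the sketch). */
struct rdt_resource {
	int rid;
	const char *name;
};

/* Architecture-private wrapper that embeds the common part. */
struct rdt_hw_resource {
	struct rdt_resource r_resctrl;	/* assumed member name */
	uint32_t num_closid;		/* hardware value, kept private */
};

/* Recover the private wrapper from a pointer to the embedded common part. */
static struct rdt_hw_resource *to_hw_res(struct rdt_resource *r)
{
	assert(r != NULL);
	return (struct rdt_hw_resource *)
		((char *)r - offsetof(struct rdt_hw_resource, r_resctrl));
}

/* Helper the filesystem side would call instead of touching the field. */
static uint32_t arch_get_num_closid(struct rdt_resource *r)
{
	return to_hw_res(r)->num_closid;
}
```

Filesystem code that only ever holds a struct rdt_resource pointer then has to go through a helper to reach hardware details, which makes the places that still need an abstraction easy to spot.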
Maciej W. Rozycki	34739a2809	x86: Fix typo s/ECLR/ELCR/ for the PIC register The proper spelling for the acronym referring to the Edge/Level Control Register is ELCR rather than ECLR. Adjust references accordingly. No functional change. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2107200251080.9461@angie.orcam.me.uk	2021-08-10 23:31:44 +02:00
Maciej W. Rozycki	d253166168	x86: Avoid magic number with ELCR register accesses Define PIC_ELCR1 and PIC_ELCR2 macros for accesses to the ELCR registers implemented by many chipsets in their embedded 8259A PIC cores, avoiding magic numbers that are difficult to handle, and complementing the macros we already have for registers originally defined with discrete 8259A PIC implementations. No functional change. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2107200237300.9461@angie.orcam.me.uk	2021-08-10 23:31:43 +02:00
Maciej W. Rozycki	fb6a0408ea	x86: Add support for 0x22/0x23 port I/O configuration space Define macros and accessors for the configuration space addressed indirectly with an index register and a data register at the port I/O locations of 0x22 and 0x23 respectively. This space is defined by the Intel MultiProcessor Specification for the IMCR register used to switch between the PIC and the APIC mode[1], by Cyrix processors for their configuration[2][3], and also some chipsets. Given the lack of atomicity with the indirect addressing a spinlock is required to protect accesses, although for Cyrix processors it is enough if accesses are executed with interrupts locally disabled, because the registers are local to the accessing CPU, and IMCR is only ever poked at by the BSP and early enough for interrupts not to have been configured yet. Therefore existing code does not have to change or use the new spinlock, nor does it. Put the spinlock in a library file then, so that it does not get pulled in unnecessarily for configurations that do not refer to it. Convert Cyrix accessors to wrappers so as to retain the brevity and clarity of the `getCx86' and `setCx86' calls. References: [1] "MultiProcessor Specification", Version 1.4, Intel Corporation, Order Number: 242016-006, May 1997, Section 3.6.2.1 "PIC Mode", pp. 3-7, 3-8 [2] "5x86 Microprocessor", Cyrix Corporation, Order Number: 94192-00, July 1995, Section 2.3.2.4 "Configuration Registers", p. 2-23 [3] "6x86 Processor", Cyrix Corporation, Order Number: 94175-01, March 1996, Section 2.4.4 "6x86 Configuration Registers", p. 2-23 Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2107182353140.9461@angie.orcam.me.uk	2021-08-10 23:31:43 +02:00
Sebastian Andrzej Siewior	8ae9e3f638	x86/mce/inject: Replace deprecated CPU-hotplug functions. The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210803141621.780504-10-bigeasy@linutronix.de	2021-08-10 14:46:27 +02:00
Sebastian Andrzej Siewior	2089f34f8c	x86/microcode: Replace deprecated CPU-hotplug functions. The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210803141621.780504-9-bigeasy@linutronix.de	2021-08-10 14:46:27 +02:00
Sebastian Andrzej Siewior	1a351eefd4	x86/mtrr: Replace deprecated CPU-hotplug functions. The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210803141621.780504-8-bigeasy@linutronix.de	2021-08-10 14:46:27 +02:00
Thomas Gleixner	ff363f480e	x86/msi: Force affinity setup before startup The X86 MSI mechanism cannot handle interrupt affinity changes safely after startup other than from an interrupt handler, unless interrupt remapping is enabled. The startup sequence in the generic interrupt code violates that assumption. Mark the irq chips with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time. While the interrupt remapping MSI chip does not require this, there is no point in treating it differently as this might spare an interrupt to a CPU which is not in the default affinity mask. For the non-remapping case go to the direct write path when the interrupt is not yet started similar to the not yet activated case. Fixes: `1840475676` ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.886722080@linutronix.de	2021-08-10 10:59:21 +02:00
Thomas Gleixner	0c0e37dc11	x86/ioapic: Force affinity setup before startup The IO/APIC cannot handle interrupt affinity changes safely after startup other than from an interrupt handler. The startup sequence in the generic interrupt code violates that assumption. Mark the irq chip with the new IRQCHIP_AFFINITY_PRE_STARTUP flag so that the default interrupt setting happens before the interrupt is started up for the first time. Fixes: `1840475676` ("genirq: Expose default irq affinity mask (take 3)") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210729222542.832143400@linutronix.de	2021-08-10 10:59:21 +02:00
Logan Gunthorpe	183dc86335	x86/amd_gart: don't set failed sg dma_address to DMA_MAPPING_ERROR Setting the ->dma_address to DMA_MAPPING_ERROR is not part of the ->map_sg calling convention, so remove it. Link: https://lore.kernel.org/linux-mips/20210716063241.GC13345@lst.de/ Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-08-09 17:13:06 +02:00
Martin Oliveira	fcacc8a614	x86/amd_gart: return error code from gart_map_sg() The .map_sg() op now expects an error code instead of zero on failure. So make __dma_map_cont() return a valid errno (which is then propagated to gart_map_sg() via dma_map_cont()) and return it in case of failure. Also, return -EINVAL in case of invalid nents. Signed-off-by: Martin Oliveira <martin.oliveira@eideticom.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-08-09 17:13:06 +02:00
Maurizio Lombardi	342f43af70	iscsi_ibft: fix crash due to KASLR physical memory remapping Starting with commit `a799c2bd29` ("x86/setup: Consolidate early memory reservations") memory reservations have been moved earlier during the boot process, before the execution of the Kernel Address Space Layout Randomization code. setup_arch() calls the iscsi_ibft's find_ibft_region() function to find and reserve the memory dedicated to the iBFT and this function also saves a virtual pointer to the iBFT table for later use. The problem is that if KASLR is active, the physical memory gets remapped somewhere else in the virtual address space and the pointer is no longer valid; this will cause a kernel panic when the iscsi driver tries to dereference it. iBFT detected. BUG: unable to handle page fault for address: ffff888000099fd8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ..snip.. Call Trace: ? ibft_create_kobject+0x1d2/0x1d2 [iscsi_ibft] do_one_initcall+0x44/0x1d0 ? kmem_cache_alloc_trace+0x119/0x220 do_init_module+0x5c/0x270 __do_sys_init_module+0x12e/0x1b0 do_syscall_64+0x40/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fix this bug by saving the address of the physical location of the ibft; later the driver will use isa_bus_to_virt() to get the correct virtual address. N.B. On each reboot KASLR randomizes the virtual addresses so assuming phys_to_virt before KASLR does its deed is incorrect. Simplify the code by renaming find_ibft_region() to reserve_ibft_region() and remove all the wrappers. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>	2021-07-31 22:20:24 -04:00
Balbir Singh	e893bb1bb4	x86, prctl: Hook L1D flushing in via prctl Use the existing PR_GET/SET_SPECULATION_CTRL API to expose the L1D flush capability. For L1D flushing PR_SPEC_FORCE_DISABLE and PR_SPEC_DISABLE_NOEXEC are not supported. Enabling L1D flush does not check if the task is running on an SMT enabled core, rather a check is done at runtime (at the time of flush), if the task runs on a SMT sibling then the task is sent a SIGBUS which is executed before the task returns to user space or to a guest. This is better than the other alternatives of: a. Ensuring strict affinity of the task (hard to enforce without further changes in the scheduler) b. Silently skipping flush for tasks that move to SMT enabled cores. Hook up the core prctl and implement the x86 specific parts which in turn makes it functional. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Balbir Singh <sblbir@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210108121056.21940-5-sblbir@amazon.com	2021-07-28 11:42:25 +02:00
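From userspace, the opt-in described in the commit above goes through prctl(2). The following is a minimal sketch rather than a reference implementation: the numeric constants are copied from the Linux UAPI header `linux/prctl.h`, the helper names (`describe`, `query_l1d_flush`, `enable_l1d_flush`) are invented for illustration, and calling prctl through `ctypes` is just one way to reach it from Python. On kernels without this feature, or without it enabled on the command line, the calls are expected to fail or report the control as disabled.

```python
import ctypes
import errno

# Values from <linux/prctl.h>.
PR_GET_SPECULATION_CTRL = 52
PR_SET_SPECULATION_CTRL = 53
PR_SPEC_L1D_FLUSH = 2
PR_SPEC_PRCTL = 1 << 0
PR_SPEC_ENABLE = 1 << 1
PR_SPEC_DISABLE = 1 << 2
PR_SPEC_FORCE_DISABLE = 1 << 3

_FLAG_NAMES = [
    (PR_SPEC_PRCTL, "PR_SPEC_PRCTL"),
    (PR_SPEC_ENABLE, "PR_SPEC_ENABLE"),
    (PR_SPEC_DISABLE, "PR_SPEC_DISABLE"),
    (PR_SPEC_FORCE_DISABLE, "PR_SPEC_FORCE_DISABLE"),
]


def describe(state):
    """Turn the bitmask returned by PR_GET_SPECULATION_CTRL into flag names."""
    return [name for bit, name in _FLAG_NAMES if state & bit]


def _prctl(option, arg2, arg3):
    """Call prctl(2); return (value, "") or (None, reason) on failure."""
    try:
        func = ctypes.CDLL(None, use_errno=True).prctl
    except (OSError, AttributeError, TypeError):
        return None, "prctl(2) is not available on this platform"
    func.restype = ctypes.c_int
    rc = func(ctypes.c_int(option), ctypes.c_ulong(arg2), ctypes.c_ulong(arg3),
              ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc < 0:
        err = ctypes.get_errno()
        return None, errno.errorcode.get(err, str(err))
    return rc, ""


def query_l1d_flush():
    """Query the L1D flush control state for the calling task."""
    return _prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, 0)


def enable_l1d_flush():
    """Opt the calling task in to L1D flushing on context switch.

    Per the commit message, an opted-in task that ends up on an SMT
    sibling is sent SIGBUS, so only call this deliberately.
    """
    return _prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_ENABLE)


if __name__ == "__main__":
    state, reason = query_l1d_flush()
    if state is None:
        print("query failed:", reason)
    else:
        print("L1D flush state:", describe(state))
```

The sketch only queries by default; `enable_l1d_flush()` is left uncalled because of the SIGBUS behaviour described in the commit message.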
Balbir Singh	b5f06f64e2	x86/mm: Prepare for opt-in based L1D flush in switch_mm() The goal of this is to allow tasks that want to protect sensitive information, against e.g. the recently found snoop assisted data sampling vulnerabilities, to flush their L1D on being switched out. This protects their data from being snooped or leaked via side channels after the task has context switched out. This could also be used to wipe L1D when an untrusted task is switched in, but that's not a really well defined scenario while the opt-in variant is clearly defined. The mechanism is default disabled and can be enabled on the kernel command line. Prepare for the actual prctl based opt-in: 1) Provide the necessary setup functionality similar to the other mitigations and enable the static branch when the command line option is set and the CPU provides support for hardware assisted L1D flushing. Software based L1D flush is not supported because it's CPU model specific and not really well defined. This does not come with a sysfs file like the other mitigations because it is not bound to any specific vulnerability. Support has to be queried via the prctl(2) interface. 2) Add TIF_SPEC_L1D_FLUSH next to L1D_SPEC_IB so the two bits can be mangled into the mm pointer in one go which allows to reuse the existing mechanism in switch_mm() for the conditional IBPB speculation barrier efficiently. 3) Add the L1D flush specific functionality which flushes L1D when the outgoing task opted in. Also check whether the incoming task has requested L1D flush and if so validate that it is not accidentally running on an SMT sibling as this makes the whole exercise moot because SMT siblings share L1D which opens tons of other attack vectors. If that happens schedule task work which signals the incoming task on return to user/guest with SIGBUS as this is part of the paranoid L1D flush contract.
Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Balbir Singh <sblbir@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210108121056.21940-1-sblbir@amazon.com	2021-07-28 11:42:24 +02:00
Balbir Singh	c52787b590	x86/smp: Add a per-cpu view of SMT state A new field smt_active in cpuinfo_x86 identifies if the current core/cpu is in SMT mode or not. This is helpful when the system has some of its cores with threads offlined and can be used for cases where action is taken based on the state of SMT. The upcoming support for paranoid L1D flush will make use of this information. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Balbir Singh <sblbir@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210108121056.21940-2-sblbir@amazon.com	2021-07-28 11:42:23 +02:00
Dave Airlie	35482f9dc5	Backmerge tag 'v5.14-rc3' into drm-next Linux 5.14-rc3 Daniel said we should pull the nouveau fix from fixes in here, probably a good plan. Signed-off-by: Dave Airlie <airlied@redhat.com>	2021-07-26 09:27:59 +10:00
Linus Torvalds	d1b178254c	Merge tag 'locking-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 jump label fix from Thomas Gleixner: "A single fix for jump labels to prevent the compiler from aggressive un-inlining which results in a section mismatch" * tag 'locking-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: jump_labels: Mark __jump_label_transform() as __always_inlined to work around aggressive compiler un-inlining	2021-07-25 10:21:19 -07:00
Eric W. Biederman	50ae81305c	signal: Verify the alignment and size of siginfo_t Update the static assertions about siginfo_t to also describe its alignment and size. While investigating if it was possible to add a 64bit field into siginfo_t[1] it became apparent that the alignment of siginfo_t is as much a part of the ABI as the size of the structure. If the alignment changes siginfo_t when embedded in another structure can move to a different offset. Which is not acceptable from an ABI structure. So document that fact and add static assertions to notify developers if they change the alignment by accident. [1] https://lkml.kernel.org/r/YJEZdhe6JGFNYlum@elver.google.com Acked-by: Marco Elver <elver@google.com> v1: https://lkml.kernel.org/r/20210505141101.11519-4-ebiederm@xmission.co Link: https://lkml.kernel.org/r/875yxaxmyl.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2021-07-23 13:15:31 -05:00
Dave Airlie	8da49a33dd	Merge tag 'drm-misc-next-2021-07-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.15-rc1: UAPI Changes: - Remove sysfs stats for dma-buf attachments, as it causes a performance regression. Previous merge is not in a rc kernel yet, so no userspace regression possible. Cross-subsystem Changes: - Sanitize user input in kyro's viewport ioctl. - Use refcount_t in fb_info->count - Assorted fixes to dma-buf. - Extend x86 efifb handling to all archs. - Fix neofb divide by 0. - Document corpro,gm7123 bridge dt bindings. Core Changes: - Slightly rework drm master handling. - Cleanup vgaarb handling. - Assorted fixes. Driver Changes: - Add support for ws2401 panel. - Assorted fixes to stm, ast, bochs. - Demidlayer ingenic irq. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/2d0d2fe8-01fc-e216-c3fd-38db9e69944e@linux.intel.com	2021-07-23 11:32:43 +10:00
Wei Liu	f5a11c69b6	Revert "x86/hyperv: fix logical processor creation" This reverts commit `450605c28d`. Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-07-21 15:55:43 +00:00
Javier Martinez Canillas	d391c58271	drivers/firmware: move x86 Generic System Framebuffers support The x86 architecture has generic support to register a system framebuffer platform device. It either registers a "simple-framebuffer" if the config option CONFIG_X86_SYSFB is enabled, or a legacy VGA/VBE/EFI FB device. But the code is generic enough to be reused by other architectures and can be moved out of the arch/x86 directory. This will allow to also support the simple{fb,drm} drivers on non-x86 EFI platforms, such as aarch64 where these drivers are only supported with DT. Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20210625130947.1803678-2-javierm@redhat.com	2021-07-21 12:04:56 +02:00
Chris Down	3370155737	printk: Userspace format indexing support We have a number of systems industry-wide that have a subset of their functionality that works as follows: 1. Receive a message from local kmsg, serial console, or netconsole; 2. Apply a set of rules to classify the message; 3. Do something based on this classification (like scheduling a remediation for the machine), rinse, and repeat. As a couple of examples of places we have this implemented just inside Facebook, although this isn't a Facebook-specific problem, we have this inside our netconsole processing (for alarm classification), and as part of our machine health checking. We use these messages to determine fairly important metrics around production health, and it's important that we get them right. While for some kinds of issues we have counters, tracepoints, or metrics with a stable interface which can reliably indicate the issue, in order to react to production issues quickly we need to work with the interface which most kernel developers naturally use when developing: printk. Most production issues come from unexpected phenomena, and as such usually the code in question doesn't have easily usable tracepoints or other counters available for the specific problem being mitigated. We have a number of lines of monitoring defence against problems in production (host metrics, process metrics, service metrics, etc), and where it's not feasible to reliably monitor at another level, this kind of pragmatic netconsole monitoring is essential. As one would expect, monitoring using printk is rather brittle for a number of reasons -- most notably that the message might disappear entirely in a new version of the kernel, or that the message may change in some way that the regex or other classification methods start to silently fail. One factor that makes this even harder is that, under normal operation, many of these messages are never expected to be hit. 
For example, there may be a rare hardware bug which one wants to detect if it was to ever happen again, but its recurrence is not likely or anticipated. This precludes using something like checking whether the printk in question was printed somewhere fleetwide recently to determine whether the message in question is still present or not, since we don't anticipate that it should be printed anywhere, but still need to monitor for its future presence in the long-term. This class of issue has happened on a number of occasions, causing unhealthy machines with hardware issues to remain in production for longer than ideal. As a recent example, some monitoring around blk_update_request fell out of date and caused semi-broken machines to remain in production for longer than would be desirable. Searching through the codebase to find the message is also extremely fragile, because many of the messages are further constructed beyond their callsite (eg. btrfs_printk and other module-specific wrappers, each with their own functionality). Even if they aren't, guessing the format and formulation of the underlying message based on the aesthetics of the message emitted is not a recipe for success at scale, and our previous issues with fleetwide machine health checking demonstrate as much. This provides a solution to the issue of silently changed or deleted printks: we record pointers to all printk format strings known at compile time into a new .printk_index section, both in vmlinux and modules. 
At runtime, this can then be iterated by looking at <debugfs>/printk/index/<module>, which emits the following format, both readable by humans and able to be parsed by machines: $ head -1 vmlinux; shuf -n 5 vmlinux # <level[,flags]> filename:line function "format" <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n" <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n" <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n" <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n" <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n" This mitigates the majority of cases where we have a highly-specific printk which we want to match on, as we can now enumerate and check whether the format changed or the printk callsite disappeared entirely in userspace. This allows us to catch changes to printks we monitor earlier and decide what to do about it before it becomes problematic. There is no additional runtime cost for printk callers or printk itself, and the assembly generated is exactly the same. Signed-off-by: Chris Down <chris@chrisdown.name> Cc: Petr Mladek <pmladek@suse.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h} Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name	2021-07-19 11:57:48 +02:00
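The index format quoted in the commit above is simple enough to check from a monitoring script. The sketch below is one possible userspace consumer, not the tooling the author describes: `IndexEntry`, `parse_index` and `find_missing` are hypothetical names, the regular expression is derived only from the header line and sample entries shown in the commit message, and the sample text is copied from that message rather than read from a live `<debugfs>/printk/index/vmlinux` file.

```python
import re
from typing import List, NamedTuple, Optional

# Matches: <level[,flags]> filename:line function "format"
ENTRY_RE = re.compile(
    r'^<(?P<level>[^,>]+)(?:,(?P<flags>[^>]*))?> '
    r'(?P<filename>\S+):(?P<line>\d+) '
    r'(?P<function>\S+) '
    r'"(?P<format>.*)"$'
)

# Sample lines taken from the commit message above.
SAMPLE = r'''# <level[,flags]> filename:line function "format"
<5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n"
<4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n"
<6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n"
'''


class IndexEntry(NamedTuple):
    level: str
    flags: Optional[str]
    filename: str
    line: int
    function: str
    format: str  # kept exactly as emitted, escapes such as \n included


def parse_index(text: str) -> List[IndexEntry]:
    """Parse printk index text, skipping the '#' header and unparsable lines."""
    entries = []
    for raw in text.splitlines():
        if not raw or raw.startswith("#"):
            continue
        m = ENTRY_RE.match(raw)
        if m:
            entries.append(IndexEntry(
                m["level"], m["flags"], m["filename"],
                int(m["line"]), m["function"], m["format"],
            ))
    return entries


def find_missing(entries: List[IndexEntry], monitored: List[str]) -> List[str]:
    """Return the monitored format strings that no longer appear in the index."""
    present = {e.format for e in entries}
    return [fmt for fmt in monitored if fmt not in present]


if __name__ == "__main__":
    entries = parse_index(SAMPLE)
    monitored = [
        r"Waiting for root device %s...\n",
        r"some message that was removed\n",  # hypothetical, to show a miss
    ]
    for fmt in find_missing(entries, monitored):
        print("no longer in index:", fmt)
```

Comparing the monitored list against the index on every kernel upgrade would flag a changed or deleted printk before the matching rule silently stops firing, which is the use case the commit message describes.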
Ani Sinha	5f92b45c3b	x86/hyperv: add comment describing TSC_INVARIANT_CONTROL MSR setting bit 0 Commit `dce7cd6275` ("x86/hyperv: Allow guests to enable InvariantTSC") added the support for HV_X64_MSR_TSC_INVARIANT_CONTROL. Setting bit 0 of this synthetic MSR will allow hyper-v guests to report invariant TSC CPU feature through CPUID. This comment adds this explanation to the code and mentions where the Intel's generic platform init code reads this feature bit from CPUID. The comment will help developers understand how the two parts of the initialization (hyperV specific and non-hyperV specific generic hw init) are related. Signed-off-by: Ani Sinha <ani@anisinha.ca> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210716133245.3272672-1-ani@anisinha.ca Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-07-16 14:51:54 +00:00
Michael Kelley	6dc77fa5ac	Drivers: hv: Move Hyper-V misc functionality to arch-neutral code The check for whether hibernation is possible, and the enabling of Hyper-V panic notification during kexec, are both architecture neutral. Move the code from under arch/x86 and into drivers/hv/hv_common.c where it can also be used for ARM64. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1626287687-2045-4-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-07-15 12:59:45 +00:00
Michael Kelley	9d7cf2c967	Drivers: hv: Add arch independent default functions for some Hyper-V handlers Architecture independent Hyper-V code calls various arch-specific handlers when needed. To aid in supporting multiple architectures, provide weak defaults that can be overridden by arch-specific implementations where appropriate. But when arch-specific overrides aren't needed or haven't been implemented yet for a particular architecture, these stubs reduce the amount of clutter under arch/. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1626287687-2045-3-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-07-15 12:59:45 +00:00
Michael Kelley	afca4d95dd	Drivers: hv: Make portions of Hyper-V init code be arch neutral The code to allocate and initialize the hv_vp_index array is architecture neutral. Similarly, the code to allocate and populate the hypercall input and output arg pages is architecture neutral. Move both sets of code out from arch/x86 and into utility functions in drivers/hv/hv_common.c that can be shared by Hyper-V initialization on ARM64. No functional changes. However, the allocation of the hypercall input and output arg pages is done differently so that the size is always the Hyper-V page size, even if not the same as the guest page size (such as with ARM64's 64K page size). Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1626287687-2045-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-07-15 12:59:45 +00:00
Ani Sinha	c445535c3e	x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable Marking TSC as unstable has a side effect of marking sched_clock as unstable when TSC is still being used as the sched_clock. This is not desirable. Hyper-V ultimately uses a paravirtualized clock source that provides a stable scheduler clock even on systems without TscInvariant CPU capability. Hence, mark_tsc_unstable() call should be called _after_ scheduler clock has been changed to the paravirtualized clocksource. This will prevent any unwanted manipulation of the sched_clock. Only TSC will be correctly marked as unstable. Signed-off-by: Ani Sinha <ani@anisinha.ca> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210713030522.1714803-1-ani@anisinha.ca Signed-off-by: Wei Liu <wei.liu@kernel.org>	2021-07-13 17:40:23 +00:00
Ingo Molnar	e48a12e546	jump_labels: Mark __jump_label_transform() as __always_inlined to work around aggressive compiler un-inlining In randconfig testing, certain UBSAN and CC Kconfig combinations with GCC 10.3.0: CONFIG_X86_32=y CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_UBSAN=y # CONFIG_UBSAN_TRAP is not set # CONFIG_UBSAN_BOUNDS is not set CONFIG_UBSAN_SHIFT=y # CONFIG_UBSAN_DIV_ZERO is not set CONFIG_UBSAN_UNREACHABLE=y CONFIG_UBSAN_BOOL=y # CONFIG_UBSAN_ENUM is not set # CONFIG_UBSAN_ALIGNMENT is not set # CONFIG_UBSAN_SANITIZE_ALL is not set ... produce this build warning (and build error if CONFIG_SECTION_MISMATCH_WARN_ONLY=y is set): WARNING: modpost: vmlinux.o(.text+0x4c1cc): Section mismatch in reference from the function __jump_label_transform() to the function .init.text:text_poke_early() The function __jump_label_transform() references the function __init text_poke_early(). This is often because __jump_label_transform lacks a __init annotation or the annotation of text_poke_early is wrong. ERROR: modpost: Section mismatches detected. The problem is that __jump_label_transform() gets uninlined by GCC, despite there being only a single local scope user of the 'static inline' function. Mark the function __always_inline instead, to work around this compiler bug/artifact. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2021-07-13 06:32:05 +02:00
Stephen Boyd	9ef8af2a8f	x86/dumpstack: use %pSb/%pBb for backtrace printing Let's use the new printk formats to print the stacktrace entries when printing a backtrace to the kernel logs. This will include any module's build ID[1] in it so that offline/crash debugging can easily locate the debuginfo for a module via something like debuginfod[2]. Link: https://lkml.kernel.org/r/20210511003845.2429846-8-swboyd@chromium.org Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1] Link: https://sourceware.org/elfutils/Debuginfod.html [2] Signed-off-by: Stephen Boyd <swboyd@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Evan Green <evgreen@chromium.org> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Baoquan He <bhe@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sasha Levin <sashal@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-08 11:48:22 -07:00


17622 commits