1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

8955 commits

Author SHA1 Message Date
Dave Ertman
4d50fcdc24 ice: alter feature support check for SRIOV and LAG
Previously, the ice driver had support for using a handler for bonding
netdev events to ensure that conflicting features were not allowed to be
activated at the same time.  While this was still in place, additional
support was added to specifically support SRIOV and LAG together.  These
both utilized the netdev event handler, but the SRIOV and LAG feature was
behind a capabilities feature check to make sure the current NVM has
support.

The exclusion part of the event handler should be removed since there are
users who have custom made solutions that depend on the non-exclusion of
features.

Wrap the creation/registration and cleanup of the event handler and
associated structs in the probe flow with a feature check so that the
only systems that support the full implementation of LAG features will
initialize support.  This will leave other systems unhindered with
functionality as it existed before any LAG code was added.

Fixes: bb52f42ace ("ice: Add driver support for firmware changes for LAG")
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18 09:39:31 -08:00
Jacob Keller
7d88134612 ice: stop trashing VF VSI aggregator node ID information
When creating new VSIs, they are assigned into an aggregator node in the
scheduler tree. Information about which aggregator node a VSI is assigned
into is maintained by the vsi->agg_node structure. In ice_vsi_decfg(), this
information is being destroyed, by overwriting the valid flag and the
agg_id field to zero.

For VF VSIs, this breaks the aggregator node configuration replay, which
depends on this information. This results in VFs being inserted into the
default aggregator node. The resulting configuration will have unexpected
Tx bandwidth sharing behavior.

This was broken by commit 6624e780a5 ("ice: split ice_vsi_setup into
smaller functions"), which added the block to reset the agg_node data.

The vsi->agg_node structure is not managed by the scheduler code, but is
instead a wrapper around an aggregator node ID that is tracked at the VSI
layer. Its been around for a long time, and its primary purpose was for
handling VFs. The SR-IOV VF reset flow does not make use of the standard VSI
rebuild/replay logic, and uses vsi->agg_node as part of its handling to
rebuild the aggregator node configuration.

The logic for aggregator nodes stretches  back to early ice driver code from
commit b126bd6bcd ("ice: create scheduler aggregator node config and move
VSIs")

The logic in ice_vsi_decfg() which trashes the ice_agg_node data is clearly
wrong. It destroys information that is necessary for handling VF reset,. It
is also not the correct way to actually remove a VSI from an aggregator
node. For that, we need to implement logic in the scheduler code. Further,
non-VF VSIs properly replay their aggregator configuration using existing
scheduler replay logic.

To fix the VF replay logic, remove this broken aggregator node cleanup
logic. This is the simplest way to immediately fix this.

This ensures that VFs will have proper aggregate configuration after a
reset. This is especially important since VFs often perform resets as part
of their reconfiguration flows. Without fixing this, VFs will be placed in
the default aggregator node and Tx bandwidth will not be shared in the
expected and configured manner.

Fixes: 6624e780a5 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18 09:01:51 -08:00
David S. Miller
283f105bc8 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
add v2 FW logging for ice driver

Paul Stillwell says:

Firmware (FW) log support was added to the ice driver, but that version is
no longer supported. There is a newer version of FW logging (v2) that
adds more control knobs to get the exact data out of the FW
for debugging.

The interface for FW logging is debugfs. This was chosen based on
discussions here:
https://lore.kernel.org/netdev/20230214180712.53fc8ba2@kernel.org/ and
https://lore.kernel.org/netdev/20231012164033.1069fb4b@kernel.org/

We talked about using devlink in a variety of ways, but none of those
options made any sense for the way the FW reports data. We briefly talked
about using ethtool, but that seemed to go by the wayside. Ultimately it
seems like using debugfs is the way to go so re-implement the code to use
that.

FW logging is across all the PFs on the device so restrict the commands to
only PF0.

If the device supports FW logging then a directory named 'fwlog' will be
created under '/sys/kernel/debug/ice/<pci_dev>'. A variety of files will be
created to manage the behavior of logging. The following files will be
created:
- modules/<module>
- nr_messages
- enable
- log_size
- data

where
modules/<module> is used to read/write the log level for a specific module

nr_messages is used to determine how many events should be in each message
sent to the driver

enable is used to start/stop FW logging. This is a boolean value so only 1
or 0 are permissible values

log_size is used to configure the amount of memory the driver uses for log
data

data is used to read/clear the log data

Generally there is a lot of data and dumping that data to syslog will
result in a loss of data. This causes problems when decoding the data and
the user doesn't know that data is missing until later. Instead of dumping
the FW log output to syslog use debugfs. This ensures that all the data the
driver has gets retrieved correctly.

The FW log data is binary data that the FW team decodes to determine what
happened in firmware. The binary blob is sent to Intel for decoding.
---
v6:
- use seq_printf() for outputting module info when reading from 'module' file
- replace code that created argc and argv for handling command line input
- removed checks in all the _read() and _write() functions to see if FW logging
  is supported because the files will not exist if it is not supported
- removed warnings on allocation failures on debugfs file creation failures
- removed a newline between memory allocation and checking if the memory was
  allocated
- fixed cases where we could just return the value from a function call
  instead of saving the value in a variable
- moved the check for PFO in ice_fwlog_init() to an earlier patch
- reworked all of argument scanning in the _write() functions in ice_debugfs.c
  to remove adding characters past the end of the buffer

v5: https://lore.kernel.org/netdev/20231205211251.2122874-1-anthony.l.nguyen@intel.com/
- changed the log level configuration from a single file for all modules to a
  file per module.
- changed 'nr_buffs' to 'log_size' because users understand memory sizes
  better than a number of buffers
- changed 'resolution' to 'nr_messages' to better reflect what it represents
- updated documentation to reflect these changes
- updated documentation to indicate that FW logging must be disabled to
  clear the data. also clarified that any value written to the 'data' file will
  clear the data

v4: https://lore.kernel.org/netdev/20231005170110.3221306-1-anthony.l.nguyen@intel.com/
- removed CONFIG_DEBUG_FS wrapper around code because the debugfs calls handle
  this case already
- moved ice_debugfs_exit() call to remove unreachable code issue
- minor changes to documentation based on feedback

v3: https://lore.kernel.org/netdev/20230815165750.2789609-1-anthony.l.nguyen@intel.com/
- Adjust error path cleanup in ice_module_init() for unreachable code.

v2: https://lore.kernel.org/netdev/20230810170109.1963832-1-anthony.l.nguyen@intel.com/
- Rewrote code to use debugfs instead of devlink

v1: https://lore.kernel.org/netdev/20230209190702.3638688-1-anthony.l.nguyen@intel.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-15 12:03:20 +00:00
Jakub Kicinski
2c1a4185a1 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-12-13 (ice, i40e)

This series contains updates to ice and i40e drivers.

Michal Schmidt prevents possible out-of-bounds access for ice.

Ivan Vecera corrects value for MDIO clause 45 on i40e.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  i40e: Fix ST code value for Clause 45
  ice: fix theoretical out-of-bounds access in ethtool link modes
====================

Link: https://lore.kernel.org/r/20231213220827.1311772-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-14 18:57:40 -08:00
Jason Xing
8d182d5869 i40e: remove fake support of rx-frames-irq
Since we never support this feature for I40E driver, we don't have to
display the value when using 'ethtool -c eth0'.

Before this patch applied, the rx-frames-irq is 256 which is consistent
with tx-frames-irq. Apparently it could mislead users.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20231213184406.1306602-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-14 18:56:46 -08:00
Jakub Kicinski
8f674972d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/intel/iavf/iavf_ethtool.c
  3a0b5a2929 ("iavf: Introduce new state machines for flow director")
  95260816b4 ("iavf: use iavf_schedule_aq_request() helper")
https://lore.kernel.org/all/84e12519-04dc-bd80-bc34-8cf50d7898ce@intel.com/

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  c13e268c07 ("bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic")
  c2f8063309 ("bnxt_en: Refactor RX VLAN acceleration logic.")
  a7445d6980 ("bnxt_en: Add support for new RX and TPA_START completion types for P7")
  1c7fd6ee2f ("bnxt_en: Rename some macros for the P5 chips")
https://lore.kernel.org/all/20231211110022.27926ad9@canb.auug.org.au/

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
  bd6781c18c ("bnxt_en: Fix wrong return value check in bnxt_close_nic()")
  84793a4995 ("bnxt_en: Skip nic close/open when configuring tstamp filters")
https://lore.kernel.org/all/20231214113041.3a0c003c@canb.auug.org.au/

drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
  3d7a3f2612 ("net/mlx5: Nack sync reset request when HotPlug is enabled")
  cecf44ea1a ("net/mlx5: Allow sync reset flow when BF MGT interface device is present")
https://lore.kernel.org/all/20231211110328.76c925af@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-14 17:14:41 -08:00
Paul M Stillwell Jr
9d3535e719 ice: add ability to read and configure FW log data
Once logging is enabled the user should read the data from the 'data'
file. The data is in the form of a binary blob that can be sent to Intel
for decoding. To read the data use a command like:

  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data > log_data.bin

If the user wants to clear the FW log data that has been stored in the
driver then they can write any value to the 'data' file and that will clear
the data. An example is:

  # echo 34 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data

In addition to being able to read the data the user can configure how
much memory is used to store FW log data. This allows the user to
increase/decrease the amount of memory based on the users situation.
The data is stored such that if the memory fills up then the oldest data
will get overwritten in a circular manner. To change the amount of
memory the user can write to the 'log_size' file like this:

  # echo <value> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/log_size

Where <value> is one of 128K, 256K, 512K, 1M, and 2M. The default value
is 1M.

The user can see the current value of 'log_size' by reading the file:

  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/log_size

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-14 09:50:58 -08:00
Paul M Stillwell Jr
73671c3162 ice: enable FW logging
Once users have configured the FW logging then allow them to enable it
by writing to the 'fwlog/enable' file. The file accepts a boolean value
(0 or 1) where 1 means enable FW logging and 0 means disable FW logging.

  # echo <value> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/enable

Where <value> is 0 or 1.

The user can read the 'fwlog/enable' file to see whether logging is
enabled or not. Reading the actual data is a separate patch. To see the
current value then:

  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/enable

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-14 09:50:53 -08:00
Paul M Stillwell Jr
96a9a9341c ice: configure FW logging
Users want the ability to debug FW issues by retrieving the
FW logs from the E8xx devices. Use debugfs to allow the user to
configure the log level and number of messages for FW logging.

If FW logging is supported on the E8xx then the file 'fwlog' will be
created under the PCI device ID for the ice driver. If the file does not
exist then either the E8xx doesn't support FW logging or debugfs is not
enabled on the system.

One thing users want to do is control which events are reported. The
user can read and write the 'fwlog/modules/<module name>' to get/set
the log levels. Each module in the FW that supports logging ht as a file
under 'fwlog/modules' that supports reading (to see what the current log
level is) and writing (to change the log level).

The format to set the log levels for a module are:

  # echo <log level> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/<module>

The supported log levels are:

      *	none
      *	error
      *	warning
      *	normal
      *	verbose

Each level includes the messages from the previous/lower level

The modules that are supported are:

      *	general
      *	ctrl
      *	link
      *	link_topo
      *	dnl
      *	i2c
      *	sdp
      *	mdio
      *	adminq
      *	hdma
      *	lldp
      *	dcbx
      *	dcb
      *	xlr
      *	nvm
      *	auth
      *	vpd
      *	iosf
      *	parser
      *	sw
      *	scheduler
      *	txq
      *	rsvd
      *	post
      *	watchdog
      *	task_dispatch
      *	mng
      *	synce
      *	health
      *	tsdrv
      *	pfreg
      *	mdlver
      *	all

The module 'all' is a special module which allows the user to read or
write to all of the modules.

The following example command would set the DCB module to the 'normal'
log level:

  # echo normal > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/dcb

If the user wants to set the DCB, Link, and the AdminQ modules to
'verbose' then the commands are:

  # echo verbose > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/dcb
  # echo verbose > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/link
  # echo verbose > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/adminq

If the user wants to set all modules to the 'warning' level then the
command is:

  # echo warning > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/all

If the user wants to disable logging for a module then they can set the
level to 'none'. An example setting the 'watchdog' module is:

  # echo none > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/watchdog

If the user wants to see what the log level is for a specific module
then the command is:

  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/dcb

This will return the log level for the DCB module. If the user wants to
see the log level for all the modules then the command is:

  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/all

Writing to the module file will update the configuration, but NOT enable the
configuration (that is a separate command).

In addition to configuring the modules, the user can also configure the
number of log messages (nr_messages) to include in a single Admin Receive
Queue (ARQ) event.The range is 1-128 (1 means push every log message, 128
means push only when the max AQ command buffer is full). The suggested
value is 10.

To see/change the resolution the user can read/write the
'fwlog/nr_messages' file. An example changing the value to 50 is

  # echo 50 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_messages

To see the current value of 'nr_messages' then the command is:

  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_messages

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-14 09:50:48 -08:00
Paul M Stillwell Jr
1953fc720e ice: remove FW logging code
The FW logging code doesn't work because there is no way to set
cq_ena or uart_ena so remove the code. This code is the original
(v1) way of FW logging so it should be replaced with the v2 way.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-14 09:43:04 -08:00
Jakub Kicinski
00b1b2296b Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-12-12 (igb, e1000e)

This series contains updates to igb and e1000e drivers.

Ilpo Järvinen does some cleanups to both drivers: utilizing FIELD_GET()
helpers and using standard kernel defines over driver created ones.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  e1000e: Use pcie_capability_read_word() for reading LNKSTA
  e1000e: Use PCI_EXP_LNKSTA_NLW & FIELD_GET() instead of custom defines/code
  igb: Use FIELD_GET() to extract Link Width
====================

Link: https://lore.kernel.org/r/20231212204947.513563-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 22:08:58 -08:00
Ahmed Zaki
4a3de3fb0e iavf: enable symmetric-xor RSS for Toeplitz hash function
Allow the user to set the symmetric Toeplitz hash function via:

    # ethtool -X eth0 hfunc toeplitz symmetric-xor

The driver will reject any new RSS configuration if a field other than
(IP src/dst and L4 src/dst ports) is requested for hashing.

The symmetric RSS will not be supported on PFs not advertising the ADV RSS
Offload flag (ADV_RSS_SUPPORT()), for example the E700 series (i40e).

Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-9-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 22:07:17 -08:00
Jeff Guo
352e9bf238 ice: enable symmetric-xor RSS for Toeplitz hash function
Allow the user to set the symmetric Toeplitz hash function via:

    # ethtool -X eth0 hfunc toeplitz symmetric-xor

All existing RSS configurations will be converted to symmetric unless they
have a non-symmetric field (other than IP src/dst and L4 src/dst ports)
used for hashing. The driver will reject a new RSS configuration if such
a field is requested.

The hash function in the E800 NICs is set per-VSI and a specific AQ
command is needed to modify the hash function. Use the AQ command to
enable setting the symmetric Toeplitz RSS hash function for any VSI
in the new ice_set_rss_hfunc().

When the Symmetric Toeplitz hash function is used, the hardware sets the
input set of the RSS (Toeplitz) algorithm to be the XOR of the fields
index by HSYMM and the fields index by the INSET registers. We use this
to create a symmetric hash by setting the HSYMM registers to point to
their counterparts in the INSET registers:

 HSYMM [src_fv] = dst_fv;
 HSYMM [dst_fv] = src_fv;

where src_fv and dst_fv are the indexes of the protocol's src and dst
fields.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-8-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 22:07:17 -08:00
Ahmed Zaki
b1f5921a99 ice: refactor the FD and RSS flow ID generation
The flow director and RSS blocks use separate methods to generate a
unique 64 bit ID for the flow. This is not extendable, especially for
the RSS that already uses all 64 bit space.

Refactor the flow generation API so that the ID is generated within
ice_flow_add_prof(). The FD and RSS blocks caches the generated ID for
later use.

Suggested-by: Dan Nowlin <dan.nowlin@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-7-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 22:07:17 -08:00
Qi Zhang
dc6e44c9d6 ice: refactor RSS configuration
Refactor the driver to use a communication data structure for RSS
config. To do so we introduce the new ice_rss_hash_cfg struct, and then
pass it as an argument to several functions.

Also introduce enum ice_rss_cfg_hdr_type to specify a more granular and
flexible RSS configuration:

ICE_RSS_OUTER_HEADERS - take outer layer as RSS input set
ICE_RSS_INNER_HEADERS - take inner layer as RSS input set
ICE_RSS_INNER_HEADERS_W_OUTER_IPV4 - take inner layer as RSS input set for
                                     packet with outer IPV4
ICE_RSS_INNER_HEADERS_W_OUTER_IPV6 - take inner layer as RSS input set for
                                     packet with outer IPV6
ICE_RSS_ANY_HEADERS - try with outer first then inner (same as the
		      behaviour without this change)

Finally, move the virtchnl_rss_algorithm enum to be with the other RSS
related structures in the virtchnl.h file.

There should be no functional change due to this patch.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-6-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 22:07:17 -08:00
Ahmed Zaki
20f73b60bb ice: fix ICE_AQ_VSI_Q_OPT_RSS_* register values
Fix the values of the ICE_AQ_VSI_Q_OPT_RSS_* registers. Shifting is
already done when the values are used, no need to double shift. Bug was
not discovered earlier since only ICE_AQ_VSI_Q_OPT_RSS_TPLZ (Zero) is
currently used.

Also, rename ICE_AQ_VSI_Q_OPT_RSS_XXX to ICE_AQ_VSI_Q_OPT_RSS_HASH_XXX
for consistency.

Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-5-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 22:07:16 -08:00
Ahmed Zaki
dcd8dbf9e7 net: ethtool: get rid of get/set_rxfh_context functions
Add the RSS context parameters to struct ethtool_rxfh_param and use the
get/set_rxfh to handle the RSS contexts as well.

This is part 2/2 of the fix suggested in [1]:

 - Add a rss_context member to the argument struct and a capability
   like cap_link_lanes_supported to indicate whether driver supports
   rss contexts, then you can remove *et_rxfh_context functions,
   and instead call *et_rxfh() with a non-zero rss_context.

Link: https://lore.kernel.org/netdev/20231121152906.2dd5f487@kernel.org/ [1]
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Tony Nguyen <anthony.l.nguyen@intel.com>
CC: Marcin Wojtas <mw@semihalf.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Sunil Goutham <sgoutham@marvell.com>
CC: Geetha sowjanya <gakula@marvell.com>
CC: Subbaraya Sundeep <sbhatta@marvell.com>
CC: hariprasad <hkelam@marvell.com>
CC: Saeed Mahameed <saeedm@nvidia.com>
CC: Leon Romanovsky <leon@kernel.org>
CC: Edward Cree <ecree.xilinx@gmail.com>
CC: Martin Habets <habetsm.xilinx@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-3-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 22:07:16 -08:00
Ahmed Zaki
fb6e30a725 net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool ops
The get/set_rxfh ethtool ops currently takes the rxfh (RSS) parameters
as direct function arguments. This will force us to change the API (and
all drivers' functions) every time some new parameters are added.

This is part 1/2 of the fix, as suggested in [1]:

- First simplify the code by always providing a pointer to all params
   (indir, key and func); the fact that some of them may be NULL seems
   like a weird historic thing or a premature optimization.
   It will simplify the drivers if all pointers are always present.

 - Then make the functions take a dev pointer, and a pointer to a
   single struct wrapping all arguments. The set_* should also take
   an extack.

Link: https://lore.kernel.org/netdev/20231121152906.2dd5f487@kernel.org/ [1]
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-2-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 22:07:16 -08:00
Michal Kubiak
9b1aa3ef23 idpf: add get/set for Ethtool's header split ringparam
idpf supports the header split feature and that feature is always
enabled by default.
However, for flexibility reasons and to simplify some scenarios, it
would be useful to have the support for switching the header split
off (and on) from the userspace.

Address that need by adding the user config parameter, the functions
for disabling (or enabling) the header split feature, and calls to
them from the Ethtool ringparam callbacks.
It still is enabled by default if supported by the hardware.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Co-developed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231212142752.935000-3-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 18:22:06 -08:00
Larysa Zaremba
b591137c4e ice: use VLAN proto from ring packet context in skb path
VLAN proto, used in ice XDP hints implementation is stored in ring packet
context. Utilize this value in skb VLAN processing too instead of checking
netdev features.

At the same time, use vlan_tci instead of vlan_tag in touched code,
because VLAN tag often refers to VLAN proto and VLAN TCI combined,
while in the code we clearly store only VLAN TCI.

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-12-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 16:16:41 -08:00
Larysa Zaremba
714ed949c6 ice: Implement VLAN tag hint
Implement .xmo_rx_vlan_tag callback to allow XDP code to read
packet's VLAN tag.

At the same time, use vlan_tci instead of vlan_tag in touched code,
because VLAN tag often refers to VLAN proto and VLAN TCI combined,
while in the code we clearly store only VLAN TCI.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-11-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 16:16:41 -08:00
Larysa Zaremba
d68d707dcb ice: Support XDP hints in AF_XDP ZC mode
In AF_XDP ZC, xdp_buff is not stored on ring,
instead it is provided by xsk_buff_pool.
Space for metadata sources right after such buffers was already reserved
in commit 94ecc5ca4d ("xsk: Add cb area to struct xdp_buff_xsk").

Some things (such as pointer to packet context) do not change on a
per-packet basis, so they can be set at the same time as RX queue info.
On the other hand, RX descriptor is unique for each packet, but is already
known when setting DMA addresses. This minimizes performance impact of
hints on regular packet processing.

Update AF_XDP ZC packet processing to support XDP hints.

Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-9-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 16:16:40 -08:00
Larysa Zaremba
0e6a7b0959 ice: Support RX hash XDP hint
RX hash XDP hint requests both hash value and type.
Type is XDP-specific, so we need a separate way to map
these values to the hardware ptypes, so create a lookup table.

Instead of creating a new long list, reuse contents
of ice_decode_rx_desc_ptype[] through preprocessor.

Current hash type enum does not contain ICMP packet type,
but ice devices support it, so also add a new type into core code.

Then use previously refactored code and create a function
that allows XDP code to read RX hash.

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-7-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 16:16:40 -08:00
Larysa Zaremba
9031d5f491 ice: Support HW timestamp hint
Use previously refactored code and create a function
that allows XDP code to read HW timestamp.

Also, introduce packet context, where hints-related data will be stored.
ice_xdp_buff contains only a pointer to this structure, to avoid copying it
in ZC mode later in the series.

HW timestamp is the first supported hint in the driver,
so also add xdp_metadata_ops.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-6-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 16:16:40 -08:00
Larysa Zaremba
d951c14ad2 ice: Introduce ice_xdp_buff
In order to use XDP hints via kfuncs we need to put
RX descriptor and miscellaneous data next to xdp_buff.
Same as in hints implementations in other drivers, we achieve
this through putting xdp_buff into a child structure.

Currently, xdp_buff is stored in the ring structure,
so replace it with union that includes child structure.
This way enough memory is available while existing XDP code
remains isolated from hints.

Minimum size of the new child structure (ice_xdp_buff) is exactly
64 bytes (single cache line). To place it at the start of a cache line,
move 'next' field from CL1 to CL4, as it isn't used often. This still
leaves 192 bits available in CL3 for packet context extensions.

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-5-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 16:16:40 -08:00
Larysa Zaremba
6b62a42149 ice: Make ptype internal to descriptor info processing
Currently, rx_ptype variable is used only as an argument
to ice_process_skb_fields() and is computed
just before the function call.

Therefore, there is no reason to pass this value as an argument.
Instead, remove this argument and compute the value directly inside
ice_process_skb_fields() function.

Also, separate its calculation into a short function, so the code
can later be reused in .xmo_() callbacks.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-4-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 16:16:40 -08:00
Larysa Zaremba
3310aad20d ice: make RX HW timestamp reading code more reusable
Previously, we only needed RX HW timestamp in skb path,
hence all related code was written with skb in mind.
But with the addition of XDP hints via kfuncs to the ice driver,
the same logic will be needed in .xmo_() callbacks.

Put generic process of reading RX HW timestamp from a descriptor
into a separate function.
Move skb-related code into another source file.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-3-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 16:16:40 -08:00
Larysa Zaremba
9244384e81 ice: make RX hash reading code more reusable
Previously, we only needed RX hash in skb path,
hence all related code was written with skb in mind.
But with the addition of XDP hints via kfuncs to the ice driver,
the same logic will be needed in .xmo_() callbacks.

Separate generic process of reading RX hash from a descriptor
into a separate function.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-2-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-13 16:16:40 -08:00
Ivan Vecera
9b3daf2b04 i40e: Fix ST code value for Clause 45
ST code value for clause 45 that has been changed by
commit 8196b5fd6c ("i40e: Refactor I40E_MDIO_CLAUSE* macros")
is currently wrong.

The mentioned commit refactored ..MDIO_CLAUSE??_STCODE_MASK so
their value is the same for both clauses. The value is correct
for clause 22 but not for clause 45.

Fix the issue by adding a parameter to I40E_GLGEN_MSCA_STCODE_MASK
macro that specifies required value.

Fixes: 8196b5fd6c ("i40e: Refactor I40E_MDIO_CLAUSE* macros")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-13 10:44:31 -08:00
Michal Schmidt
91f9181c73 ice: fix theoretical out-of-bounds access in ethtool link modes
To map phy types reported by the hardware to ethtool link mode bits,
ice uses two lookup tables (phy_type_low_lkup, phy_type_high_lkup).
The "low" table has 64 elements to cover every possible bit the hardware
may report, but the "high" table has only 13. If the hardware reports a
higher bit in phy_types_high, the driver would access memory beyond the
lookup table's end.

Instead of iterating through all 64 bits of phy_types_{low,high}, use
the sizes of the respective lookup tables.

Fixes: 9136e1f1e5 ("ice: refactor PHY type to ethtool link mode")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-13 10:44:31 -08:00
Jiri Pirko
4f7aa122bc dpll: remove leftover mode_supported() op and use mode_get() instead
Mode supported is currently reported to the user exactly the same, as
the current mode. That's because mode changing is not implemented.
Remove the leftover mode_supported() op and use mode_get() to fill up
the supported mode exposed to user.

One, if even, mode changing is going to be introduced, this could be
very easily taken back. In the meantime, prevent drivers form
implementing this in wrong way (as for example recent netdevsim
implementation attempt intended to do).

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-13 10:31:19 +00:00
Ilpo Järvinen
bf88f7d920 e1000e: Use pcie_capability_read_word() for reading LNKSTA
Use pcie_capability_read_word() for reading LNKSTA and remove the
custom define that matches to PCI_EXP_LNKSTA.

As only single user for cap_offset remains, replace it with a call to
pci_pcie_cap(). Instead of e1000_adapter, make local variable out of
pci_dev because both users are interested in it.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12 12:21:29 -08:00
Slawomir Laba
7ae42ef308 iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close
Make the flow for pci shutdown be the same to the pci remove.

iavf_shutdown was implementing an incomplete version
of iavf_remove. It misses several calls to the kernel like
iavf_free_misc_irq, iavf_reset_interrupt_capability, iounmap
that might break the system on reboot or hibernation.

Implement the call of iavf_remove directly in iavf_shutdown to
close this gap.

Fixes below error messages (dmesg) during shutdown stress tests -
[685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for
 VF 0
[685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for
VF 0

Reproduction:

1. Create one VF interface:
echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs

2. Run live dmesg on the host:
dmesg -wH

3. On SUT, script below steps into vf_namespace_assignment.sh

<#!/bin/sh> // Remove <>. Git removes # line
if=<VF name> (edit this per VF name)
loop=0

while true; do

echo test round $loop
let loop++

ip netns add ns$loop
ip link set dev $if up
ip link set dev $if netns ns$loop
ip netns exec ns$loop ip link set dev $if up
ip netns exec ns$loop ip link set dev $if netns 1
ip netns delete ns$loop

done

4. Run the script for at least 1000 iterations on SUT:
./vf_namespace_assignment.sh

Expected result:
No errors in dmesg.

Fixes: 129cf89e58 ("iavf: rename functions and structs to new name")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Ranganatha Rao <ranganatha.rao@intel.com>
Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12 11:21:30 -08:00
Piotr Gardocki
09d23b8918 iavf: Handle ntuple on/off based on new state machines for flow director
ntuple-filter feature on/off:
Default is on. If turned off, the filters will be removed from both
PF and iavf list. The removal is irrespective of current filter state.

Steps to reproduce:
-------------------

1. Ensure ntuple is on.

ethtool -K enp8s0 ntuple-filters on

2. Create a filter to receive the traffic into non-default rx-queue like 15
and ensure traffic is flowing into queue into 15.
Now, turn off ntuple. Traffic should not flow to configured queue 15.
It should flow to default RX queue.

Fixes: 0dbfbabb84 ("iavf: Add framework to enable ethtool ntuple filters")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12 11:21:30 -08:00
Piotr Gardocki
3a0b5a2929 iavf: Introduce new state machines for flow director
New states introduced:

 IAVF_FDIR_FLTR_DIS_REQUEST
 IAVF_FDIR_FLTR_DIS_PENDING
 IAVF_FDIR_FLTR_INACTIVE

Current FDIR state machines (SM) are not adequate to handle a few
scenarios in the link DOWN/UP event, reset event and ntuple-feature.

For example, when VF link goes DOWN and comes back UP administratively,
the expectation is that previously installed filters should also be
restored. But with current SM, filters are not restored.
So with new SM, during link DOWN filters are marked as INACTIVE in
the iavf list but removed from PF. After link UP, SM will transition
from INACTIVE to ADD_REQUEST to restore the filter.

Similarly, with VF reset, filters will be removed from the PF, but
marked as INACTIVE in the iavf list. Filters will be restored after
reset completion.

Steps to reproduce:
-------------------

1. Create a VF. Here VF is enp8s0.

2. Assign IP addresses to VF and link partner and ping continuously
from remote. Here remote IP is 1.1.1.1.

3. Check default RX Queue of traffic.

ethtool -S enp8s0 | grep -E "rx-[[:digit:]]+\.packets"

4. Add filter - change default RX Queue (to 15 here)

ethtool -U ens8s0 flow-type ip4 src-ip 1.1.1.1 action 15 loc 5

5. Ensure filter gets added and traffic is received on RX queue 15 now.

Link event testing:
-------------------
6. Bring VF link down and up. If traffic flows to configured queue 15,
test is success, otherwise it is a failure.

Reset event testing:
--------------------
7. Reset the VF. If traffic flows to configured queue 15, test is success,
otherwise it is a failure.

Fixes: 0dbfbabb84 ("iavf: Add framework to enable ethtool ntuple filters")
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12 11:20:40 -08:00
Ilpo Järvinen
4c39e76846 e1000e: Use PCI_EXP_LNKSTA_NLW & FIELD_GET() instead of custom defines/code
e1000e has own copy of PCI Negotiated Link Width field defines. Use the
ones from include/uapi/linux/pci_regs.h instead of the custom ones and
remove the custom ones and convert to FIELD_GET().

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12 10:55:22 -08:00
Ilpo Järvinen
4f6011678d igb: Use FIELD_GET() to extract Link Width
Use FIELD_GET() to extract PCIe Negotiated Link Width field instead of
custom masking and shifting.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12 10:55:22 -08:00
justinstitt@google.com
e403cffff1 net: Convert some ethtool_sprintf() to ethtool_puts()
This patch converts some basic cases of ethtool_sprintf() to
ethtool_puts().

The conversions are used in cases where ethtool_sprintf() was being used
with just two arguments:
|       ethtool_sprintf(&data, buffer[i].name);
or when it's used with format string: "%s"
|       ethtool_sprintf(&data, "%s", buffer[i].name);
which both now become:
|       ethtool_puts(&data, buffer[i].name);

Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08 10:56:25 +00:00
Jakub Kicinski
2483e7f04c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/stmicro/stmmac/dwmac5.c
drivers/net/ethernet/stmicro/stmmac/dwmac5.h
drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
drivers/net/ethernet/stmicro/stmmac/hwif.h
  37e4b8df27 ("net: stmmac: fix FPE events losing")
  c3f3b97238 ("net: stmmac: Refactor EST implementation")
https://lore.kernel.org/all/20231206110306.01e91114@canb.auug.org.au/

Adjacent changes:

net/ipv4/tcp_ao.c
  9396c4ee93 ("net/tcp: Don't store TCP-AO maclen on reqsk")
  7b0f570f87 ("tcp: Move TCP-AO bits from cookie_v[46]_check() to tcp_ao_syncookie().")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07 17:53:17 -08:00
Jacob Keller
a206d9959f iavf: validate tx_coalesce_usecs even if rx_coalesce_usecs is zero
In __iavf_set_coalesce, the driver checks both ec->rx_coalesce_usecs and
ec->tx_coalesce_usecs for validity. It does this via a chain if if/else-if
blocks. If every single branch of the series of if statements exited, this
would be fine. However, the rx_coalesce_usecs is checked against zero to
print an informative message if use_adaptive_rx_coalesce is enabled. If
this check is true, it short circuits the entire chain of statements,
preventing validation of the tx_coalesce_usecs field.

Indeed, since commit e792779e6b ("iavf: Prevent changing static ITR
values if adaptive moderation is on") the iavf driver actually rejects any
change to the tx_coalesce_usecs or rx_coalesce_usecs when
use_adaptive_tx_coalesce or use_adaptive_rx_coalesce is enabled, making
this checking a bit redundant.

Fix this error by removing the unnecessary and redundant checks for
use_adaptive_rx_coalesce and use_adaptive_tx_coalesce. Since zero is a
valid value, and since the tx_coalesce_usecs and rx_coalesce_usecs fields
are already unsigned, remove the minimum value check. This allows assigning
an ITR value ranging from 0-8160 as described by the printed message.

Fixes: 65e87c0398 ("i40evf: support queue-specific settings for interrupt moderation")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-05 10:51:22 -08:00
Ivan Vecera
7d9f22b3d3 i40e: Fix unexpected MFS warning message
Commit 3a2c6ced90 ("i40e: Add a check to see if MFS is set") added
a warning message that reports unexpected size of port's MFS (max
frame size) value. This message use for the port number local
variable 'i' that is wrong.
In i40e_probe() this 'i' variable is used only to iterate VSIs
to find FDIR VSI:

<code>
...
/* if FDIR VSI was set up, start it now */
        for (i = 0; i < pf->num_alloc_vsi; i++) {
                if (pf->vsi[i] && pf->vsi[i]->type == I40E_VSI_FDIR) {
                        i40e_vsi_open(pf->vsi[i]);
                        break;
                }
        }
...
</code>

So the warning message use for the port number index of FDIR VSI
if this exists or pf->num_alloc_vsi if not.

Fix the message by using 'pf->hw.port' for the port number.

Fixes: 3a2c6ced90 ("i40e: Add a check to see if MFS is set")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-05 10:51:22 -08:00
Marcin Szycik
4e7f0087b0 ice: Restore fix disabling RX VLAN filtering
Fix setting dis_rx_filtering depending on whether port vlan is being
turned on or off. This was originally fixed in commit c793f8ea15 ("ice:
Fix disabling Rx VLAN filtering with port VLAN enabled"), but while
refactoring ice_vf_vsi_init_vlan_ops(), the fix has been lost. Restore the
fix along with the original comment from that change.

Also delete duplicate lines in ice_port_vlan_on().

Fixes: 2946204b3f ("ice: implement bridge port vlan")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-05 10:51:22 -08:00
Michal Swiatkowski
f8e9889f54 ice: change vfs.num_msix_per to vf->num_msix
vfs::num_msix_per should be only used as default value for
vf->num_msix. For other use cases vf->num_msix should be used, as VF can
have different MSI-X amount values.

Fix incorrect register index calculation. vfs::num_msix_per and
pf->sriov_base_vector shouldn't be used after implementation of changing
MSI-X amount on VFs. Instead vf->first_vector_idx should be used, as it
is storing value for first irq index.

Fixes: fe1c5ca2fe ("ice: implement num_msix field per VF")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-05 10:51:22 -08:00
Karol Kolacinski
a39dd252d5 ice: Rename E822 to E82X
When code is applicable for both E822 and E823 devices, rename it from
E822 to E82X.
ICE_PHY_PER_NAC_E822 was unused, so just remove it.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-05 11:40:12 +01:00
Jacob Keller
712e876371 ice: periodically kick Tx timestamp interrupt
The E822 hardware for Tx timestamping keeps track of how many
outstanding timestamps are still in the PHY memory block. It will not
generate a new interrupt to the MAC until all of the timestamps in the
region have been read.

If somehow all the available data is not read, but the driver has exited
its interrupt routine already, the PHY will not generate a new interrupt
even if new timestamp data is captured. Because no interrupt is
generated, the driver never processes the timestamp data. This state
results in a permanent failure for all future Tx timestamps.

It is not clear how the driver and hardware could enter this state.
However, if it does, there is currently no recovery mechanism.

Add a recovery mechanism via the periodic PTP work thread which invokes
ice_ptp_periodic_work(). Introduce a new check,
ice_ptp_maybe_trigger_tx_interrupt() which checks the PHY timestamp
ready bitmask. If any bits are set, trigger a software interrupt by
writing to PFINT_OICR.

Once triggered, the main timestamp processing thread will read through
the PHY data and clear the outstanding timestamp data. Once cleared, new
data should trigger interrupts as expected.

This should allow recovery from such a state rather than leaving the
device in a state where we cannot process Tx timestamps.

It is possible that this function checks for timestamp data
simultaneously with the interrupt, and it might trigger additional
unnecessary interrupts. This will cause a small amount of additional
processing.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Andrii Staikov <andrii.staikov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-05 11:40:12 +01:00
Karol Kolacinski
1cc5b6eaad ice: Re-enable timestamping correctly after reset
During reset, TX_TSYN interrupt should be processed as it may process
timestamps in brief moments before and after reset.
Timestamping should be enabled on VSIs at the end of reset procedure.
On ice_get_phy_tx_tstamp_ready error, interrupt should not be rearmed
because error only happens on resets.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-05 11:40:12 +01:00
Pawel Kaminski
e9fd08a9a7 ice: Improve logs for max ntuple errors
Supported number of ntuple filters affect also maximum location value that
can be provided to ethtool command. Update error message to provide info
about max supported value.

Fix double spaces in the error messages.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Pawel Kaminski <pawel.kaminski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-05 11:40:12 +01:00
Arkadiusz Kubalewski
b86455a1cb ice: add CGU info to devlink info callback
If Clock Generation Unit is present on NIC board user shall know its
details.
Provide the devlink info callback with a new:
- fixed type object (cgu.id) indicating hardware variant of onboard CGU,
- running type object (fw.cgu) consisting of CGU id, config and firmware
versions.
These information shall be known for debugging purposes.

Test (on NIC board with CGU)
$ devlink dev info <bus_name>/<dev_name> | grep cgu
        cgu.id 36
        fw.cgu 8032.16973825.6021

Test (on NIC board without CGU)
$ devlink dev info <bus_name>/<dev_name> | grep cgu -c
0

Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-05 11:40:12 +01:00
Konrad Knitter
4da71a77fc ice: read internal temperature sensor
Since 4.30 firmware exposes internal thermal sensor reading via admin
queue commands. Expose those readouts via hwmon API when supported.

Datasheet:

Get Sensor Reading Command (Opcode: 0x0632)

+--------------------+--------+--------------------+-------------------------+
| Name               | Bytes  | Value              |          Remarks        |
+--------------------+--------+--------------------+-------------------------+
| Flags              | 1-0    |                    |                         |
| Opcode             | 2-3    | 0x0632             | Command opcode          |
| Datalen            | 4-5    | 0                  | No external buffer.     |
| Return value       | 6-7    |                    | Return value.           |
| Cookie High        | 8-11   | Cookie             |                         |
| Cookie Low         | 12-15  | Cookie             |                         |
| Sensor             | 16     |                    | 0x00: Internal temp     |
|                    |        |                    | 0x01-0xFF: Reserved.    |
| Format             | 17     | Requested response | Only 0x00 is supported. |
|                    |        | format             | 0x01-0xFF: Reserved.    |
| Reserved           | 18-23  |                    |                         |
| Data Address high  | 24-27  | Response buffer    |                         |
|                    |        | address            |                         |
| Data Address low   | 28-31  | Response buffer    |                         |
|                    |        | address            |                         |
+--------------------+--------+--------------------+-------------------------+

Get Sensor Reading Response (Opcode: 0x0632)

+--------------------+--------+--------------------+-------------------------+
| Name               | Bytes  | Value              |          Remarks        |
+--------------------+--------+--------------------+-------------------------+
| Flags              | 1-0    |                    |                         |
| Opcode             | 2-3    | 0x0632             | Command opcode          |
| Datalen            | 4-5    | 0                  | No external buffer      |
| Return value       | 6-7    |                    | Return value.           |
|                    |        |                    | EINVAL: Invalid         |
|                    |        |                    | parameters              |
|                    |        |                    | ENOENT: Unsupported     |
|                    |        |                    | sensor                  |
|                    |        |                    | EIO: Sensor access      |
|                    |        |                    | error                   |
| Cookie High        | 8-11   | Cookie             |                         |
| Cookie Low         | 12-15  | Cookie             |                         |
| Sensor Reading     | 16-23  |                    | Format of the reading   |
|                    |        |                    | is dependent on request |
| Data Address high  | 24-27  | Response buffer    |                         |
|                    |        | address            |                         |
| Data Address low   | 28-31  | Response buffer    |                         |
|                    |        | address            |                         |
+--------------------+--------+--------------------+-------------------------+

Sensor Reading for Sensor 0x00 (Internal Chip Temperature):

+--------------------+--------+--------------------+-------------------------+
| Name               | Bytes  | Value              |          Remarks        |
+--------------------+--------+--------------------+-------------------------+
| Thermal Sensor     | 0      |                    | Reading in degrees      |
| reading            |        |                    | Celsius. Signed int8    |
| Warning High       | 1      |                    | Warning High threshold  |
| threshold          |        |                    | in degrees Celsius.     |
|                    |        |                    | Unsigned int8.          |
|                    |        |                    | 0xFF when unsupported   |
| Critical High      | 2      |                    | Critical High threshold |
| threshold          |        |                    | in degrees Celsius.     |
|                    |        |                    | Unsigned int8.          |
|                    |        |                    | 0xFF when unsupported   |
| Fatal High         | 3      |                    | Fatal High threshold    |
| threshold          |        |                    | in degrees Celsius.     |
|                    |        |                    | Unsigned int8.          |
|                    |        |                    | 0xFF when unsupported   |
| Reserved           | 4-7    |                    |                         |
+--------------------+--------+--------------------+-------------------------+

Driver provides current reading from HW as well as device specific
thresholds for thermal alarm (Warning, Critical, Fatal) events.

$ sensors

Output
=========================================================
ice-pci-b100
Adapter: PCI adapter
temp1:        +62.0°C  (high = +95.0°C, crit = +105.0°C)
                       (emerg = +115.0°C)

Tested on Intel Corporation Ethernet Controller E810-C for SFP

Co-developed-by: Marcin Domagala <marcinx.domagala@intel.com>
Signed-off-by: Marcin Domagala <marcinx.domagala@intel.com>
Co-developed-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Eric Joyner <eric.joyner@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Konrad Knitter <konrad.knitter@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-05 11:40:12 +01:00
Amritha Nambiar
26793bfb5d net: Add NAPI IRQ support
Add support to associate the interrupt vector number for a
NAPI instance.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/170147334728.5260.13221803396905901904.stgit@anambiarhost.jf.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04 18:04:06 -08:00