1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

46357 commits

Author SHA1 Message Date
Linus Torvalds
40f71e7cd3 Including fixes from wireless, and netfilter.
Selftests excluded - we have 58 patches and diff of +442/-199,
 which isn't really small but perhaps with the exception of
 the WiFi locking change it's old(ish) bugs.
 
 We have no known problems with v6.4.
 
 The selftest changes are rather large as MPTCP folks try to apply
 Greg's guidance that selftest from torvalds/linux should be able
 to run against stable kernels.
 
 Last thing I should call out is the DCCP/UDP-lite deprecation notices,
 we are fairly sure those are dead, but if we're wrong reverting them
 back in won't be fun.
 
 Current release - regressions:
 
  - wifi:
   - cfg80211: fix double lock bug in reg_wdev_chan_valid()
   - iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
 
 Current release - new code bugs:
 
  - handshake: remove fput() that causes use-after-free
 
 Previous releases - regressions:
 
  - sched: cls_u32: fix reference counter leak leading to overflow
 
  - sched: cls_api: fix lockup on flushing explicitly created chain
 
 Previous releases - always broken:
 
  - nf_tables: integrate pipapo into commit protocol
 
  - nf_tables: incorrect error path handling with NFT_MSG_NEWRULE,
    fix dangling pointer on failure
 
  - ping6: fix send to link-local addresses with VRF
 
  - sched: act_pedit: parse L3 header for L4 offset, the skb may
    not have the offset saved
 
  - sched: act_ct: fix promotion of offloaded unreplied tuple
 
  - sched: refuse to destroy an ingress and clsact Qdiscs if there
    are lockless change operations in flight
 
  - wifi: mac80211: fix handful of bugs in multi-link operation
 
  - ipvlan: fix bound dev checking for IPv6 l3s mode
 
  - eth: enetc: correct the indexes of highest and 2nd highest TCs
 
  - eth: ice: fix XDP memory leak when NIC is brought up and down
 
 Misc:
 
  - add deprecation notices for UDP-lite and DCCP
 
  - selftests: mptcp: skip tests not supported by old kernels
 
  - sctp: handle invalid error codes without calling BUG()
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSLllkACgkQMUZtbf5S
 IrubBg/+OeLG7Q3h80t8q1UV2uRXXp3zYcV1Hm2DEtP96RuBYXR4q06/n9Pqbt8P
 gkPWS8dHgt+hCKgsPP2XcWOKxWXK4knTDcV58MVVo4DiVfjCNa6KKb6Glo+G/fvY
 8RlLpQAaTLWBqm8BSQMLL5paWTe9q9LK0w1g280fwVnbPchtqM594zmpP2dm6z3o
 sSFMtYHN62h0isLnrlo1cnY/Qq6H/OWMZDdcJpMoRXIF0JHKMfbangotX/MjgCGj
 4EYrIwQj8+Ctyg+QgmgK5Pr53i2as/ErfrXQKfvjq/4FyLECPUd+KXu6uJW8TpIi
 2/wzO9ssx0iArAn5V+OPqAalbWpJoQ4ba1Ztdd2GKSaOtR8zNYL0QepYK3s+n3YT
 88ZJC0rDOKq9E3MdMuBVgV83NFtwkDe4JdKJwYW2F8+UsDs0jxXjcCEuH719GKSz
 Ag5RK7MQGm3N1Uom9RDGlMin+cvTjWH/owN39ibvJ5G90JTUpGU7IyVHi0Z8X1DG
 lb0C/fc/QF9xl0S7B+LgyRh53lBY0L+zLO8JYK51n+VzU1L9ur5sylqoS3P2XtwB
 4gHX1E+OAX1j4X/lvwF6nclISQs9nF9G41EYfnh38+YtcAKd70+Yo0/cnY5HUCvr
 KKELhdXfqx/Dx18aq8o9IhRuECM81Q7dHHoe6PhHxZaJFgn0nSE=
 =oNA0
 -----END PGP SIGNATURE-----

Merge tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless, and netfilter.

  Selftests excluded - we have 58 patches and diff of +442/-199, which
  isn't really small but perhaps with the exception of the WiFi locking
  change it's old(ish) bugs.

  We have no known problems with v6.4.

  The selftest changes are rather large as MPTCP folks try to apply
  Greg's guidance that selftest from torvalds/linux should be able to
  run against stable kernels.

  Last thing I should call out is the DCCP/UDP-lite deprecation notices.
  We are fairly sure those are dead, but if we're wrong reverting them
  back in won't be fun.

  Current release - regressions:

   - wifi:
      - cfg80211: fix double lock bug in reg_wdev_chan_valid()
      - iwlwifi: mvm: spin_lock_bh() to fix lockdep regression

  Current release - new code bugs:

   - handshake: remove fput() that causes use-after-free

  Previous releases - regressions:

   - sched: cls_u32: fix reference counter leak leading to overflow

   - sched: cls_api: fix lockup on flushing explicitly created chain

  Previous releases - always broken:

   - nf_tables: integrate pipapo into commit protocol

   - nf_tables: incorrect error path handling with NFT_MSG_NEWRULE, fix
     dangling pointer on failure

   - ping6: fix send to link-local addresses with VRF

   - sched: act_pedit: parse L3 header for L4 offset, the skb may not
     have the offset saved

   - sched: act_ct: fix promotion of offloaded unreplied tuple

   - sched: refuse to destroy an ingress and clsact Qdiscs if there are
     lockless change operations in flight

   - wifi: mac80211: fix handful of bugs in multi-link operation

   - ipvlan: fix bound dev checking for IPv6 l3s mode

   - eth: enetc: correct the indexes of highest and 2nd highest TCs

   - eth: ice: fix XDP memory leak when NIC is brought up and down

  Misc:

   - add deprecation notices for UDP-lite and DCCP

   - selftests: mptcp: skip tests not supported by old kernels

   - sctp: handle invalid error codes without calling BUG()"

* tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
  dccp: Print deprecation notice.
  udplite: Print deprecation notice.
  octeon_ep: Add missing check for ioremap
  selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET
  net: ethernet: stmicro: stmmac: fix possible memory leak in __stmmac_open
  net: tipc: resize nlattr array to correct size
  sfc: fix XDP queues mode with legacy IRQ
  net: macsec: fix double free of percpu stats
  net: lapbether: only support ethernet devices
  MAINTAINERS: add reviewers for SMC Sockets
  s390/ism: Fix trying to free already-freed IRQ by repeated ism_dev_exit()
  net: dsa: felix: fix taprio guard band overflow at 10Mbps with jumbo frames
  net/sched: cls_api: Fix lockup on flushing explicitly created chain
  ice: Fix ice module unload
  net/handshake: remove fput() that causes use-after-free
  selftests: forwarding: hw_stats_l3: Set addrgenmode in a separate step
  net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
  net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs
  net/sched: act_ct: Fix promotion of offloaded unreplied tuple
  wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
  ...
2023-06-15 21:11:17 -07:00
Linus Torvalds
93fd8eb053 v6.4 second rc RDMA pull request
Many rc bug fixes:
 
 - Two rtrs bug fixes for error unwind bugs
 
 - Several rxe bug fixes:
    * Incorrect Rx packet validation
    * Using memory without a refcount
    * Syzkaller found use before initialization
    * Regression fix for missing locking with the tasklet conversion from
      this merge window
 
 - Have bnxt report the correct link properties to userspace, this was
   a regression in v6.3
 
 - Several mlx5 bug fixes:
    * Kernel crash triggerable by userspace for the RAW ethernet profile
    * Defend against steering refcounting issues created by userspace
    * Incorrect change of QP port affinity parameters in some LAG configurations
 
 - Fix mlx5 Q counters:
    * Do not over allocate Q counters to allow userspace to use the full
      port capacity
    * Kernel crash triggered by eswitch due to mis-use of Q counters
    * Incorrect mlx5_device for Q counters in some LAG configurations
 
 - Properly implement the IBA spec restricting privileged qkeys to root
 
 - Always an error when reading from a disassociated device's event queue
 
 - isert bug fixes:
    * Avoid a deadlock with the CM handler and CM ID destruction
    * Correct list corruption due to incorrect locking
    * Fix a use after free around connection tear down
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZIoZ7gAKCRCFwuHvBreF
 YUSyAPsH+VdqzYd1z/bk0zMds9JR2oqMOR8l02e4gq8K9hjTJAD/ePEuT5PTpFZu
 FrT4SDhOfA30XNz+RofRNisJC92OOAo=
 =KyF7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "This is an unusually large bunch of bug fixes for the later rc cycle,
  rxe and mlx5 both dumped a lot of things at once. rxe continues to fix
  itself, and mlx5 is fixing a bunch of "queue counters" related bugs.

  There is one highly notable bug fix regarding the qkey. This small
  security check was missed in the original 2005 implementation and it
  allows some significant issues.

  Summary:

   - Two rtrs bug fixes for error unwind bugs

   - Several rxe bug fixes:
      * Incorrect Rx packet validation
      * Using memory without a refcount
      * Syzkaller found use before initialization
      * Regression fix for missing locking with the tasklet conversion
        from this merge window

   - Have bnxt report the correct link properties to userspace, this was
     a regression in v6.3

   - Several mlx5 bug fixes:
      * Kernel crash triggerable by userspace for the RAW ethernet
        profile
      * Defend against steering refcounting issues created by userspace
      * Incorrect change of QP port affinity parameters in some LAG
        configurations

   - Fix mlx5 Q counters:
      * Do not over allocate Q counters to allow userspace to use the
        full port capacity
      * Kernel crash triggered by eswitch due to mis-use of Q counters
      * Incorrect mlx5_device for Q counters in some LAG configurations

   - Properly implement the IBA spec restricting privileged qkeys to
     root

   - Always an error when reading from a disassociated device's event
     queue

   - isert bug fixes:
      * Avoid a deadlock with the CM handler and CM ID destruction
      * Correct list corruption due to incorrect locking
      * Fix a use after free around connection tear down"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/rxe: Fix rxe_cq_post
  IB/isert: Fix incorrect release of isert connection
  IB/isert: Fix possible list corruption in CMA handler
  IB/isert: Fix dead lock in ib_isert
  RDMA/mlx5: Fix affinity assignment
  IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
  RDMA/uverbs: Restrict usage of privileged QKEYs
  RDMA/cma: Always set static rate to 0 for RoCE
  RDMA/mlx5: Fix Q-counters query in LAG mode
  RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
  RDMA/mlx5: Fix Q-counters per vport allocation
  RDMA/mlx5: Create an indirect flow table for steering anchor
  RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
  RDMA/rxe: Fix the use-before-initialization error of resp_pkts
  RDMA/bnxt_re: Fix reporting active_{speed,width} attributes
  RDMA/rxe: Fix ref count error in check_rkey()
  RDMA/rxe: Fix packet length checks
  RDMA/rtrs: Fix rxe_dealloc_pd warning
  RDMA/rtrs: Fix the last iu->buf leak in err path
2023-06-15 20:13:56 -07:00
Jiasheng Jiang
9a36e2d44d octeon_ep: Add missing check for ioremap
Add check for ioremap() and return the error if it fails in order to
guarantee the success of ioremap().

Fixes: 862cd659a6 ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20230615033400.2971-1-jiasheng@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15 15:07:28 -07:00
Christian Marangi
30134b7c47 net: ethernet: stmicro: stmmac: fix possible memory leak in __stmmac_open
Fix a possible memory leak in __stmmac_open when stmmac_init_phy fails.
It's also needed to free everything allocated by stmmac_setup_dma_desc
and not just the dma_conf struct.

Drop free_dma_desc_resources from __stmmac_open and correctly call
free_dma_desc_resources on each user of __stmmac_open on error.

Reported-by: Jose Abreu <Jose.Abreu@synopsys.com>
Fixes: ba39b344e9 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Jose Abreu <Jose.Abreu@synopsys.com>
Link: https://lore.kernel.org/r/20230614091714.15912-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15 15:02:04 -07:00
Íñigo Huguet
e84a1e1e68 sfc: fix XDP queues mode with legacy IRQ
In systems without MSI-X capabilities, xdp_txq_queues_mode is calculated
in efx_allocate_msix_channels, but when enabling MSI-X fails, it was not
changed to a proper default value. This was leading to the driver
thinking that it has dedicated XDP queues, when it didn't.

Fix it by setting xdp_txq_queues_mode to the correct value if the driver
fallbacks to MSI or legacy IRQ mode. The correct value is
EFX_XDP_TX_QUEUES_BORROWED because there are no XDP dedicated queues.

The issue can be easily visible if the kernel is started with pci=nomsi,
then a call trace is shown. It is not shown only with sfc's modparam
interrupt_mode=2. Call trace example:
 WARNING: CPU: 2 PID: 663 at drivers/net/ethernet/sfc/efx_channels.c:828 efx_set_xdp_channels+0x124/0x260 [sfc]
 [...skip...]
 Call Trace:
  <TASK>
  efx_set_channels+0x5c/0xc0 [sfc]
  efx_probe_nic+0x9b/0x15a [sfc]
  efx_probe_all+0x10/0x1a2 [sfc]
  efx_pci_probe_main+0x12/0x156 [sfc]
  efx_pci_probe_post_io+0x18/0x103 [sfc]
  efx_pci_probe.cold+0x154/0x257 [sfc]
  local_pci_probe+0x42/0x80

Fixes: 6215b608a8 ("sfc: last resort fallback for lack of xdp tx queues")
Reported-by: Yanghang Liu <yanghliu@redhat.com>
Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-15 11:43:31 +01:00
Jakub Buchocki
24b454bc35 ice: Fix ice module unload
Clearing the interrupt scheme before PFR reset,
during the removal routine, could cause the hardware
errors and possibly lead to system reboot, as the PF
reset can cause the interrupt to be generated.

Place the call for PFR reset inside ice_deinit_dev(),
wait until reset and all pending transactions are done,
then call ice_clear_interrupt_scheme().

This introduces a PFR reset to multiple error paths.

Additionally, remove the call for the reset from
ice_load() - it will be a part of ice_unload() now.

Error example:
[   75.229328] ice 0000:ca:00.1: Failed to read Tx Scheduler Tree - User Selection data from flash
[   77.571315] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[   77.571418] {1}[Hardware Error]: event severity: recoverable
[   77.571459] {1}[Hardware Error]:  Error 0, type: recoverable
[   77.571500] {1}[Hardware Error]:   section_type: PCIe error
[   77.571540] {1}[Hardware Error]:   port_type: 4, root port
[   77.571580] {1}[Hardware Error]:   version: 3.0
[   77.571615] {1}[Hardware Error]:   command: 0x0547, status: 0x4010
[   77.571661] {1}[Hardware Error]:   device_id: 0000:c9:02.0
[   77.571703] {1}[Hardware Error]:   slot: 25
[   77.571736] {1}[Hardware Error]:   secondary_bus: 0xca
[   77.571773] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
[   77.571821] {1}[Hardware Error]:   class_code: 060400
[   77.571858] {1}[Hardware Error]:   bridge: secondary_status: 0x2800, control: 0x0013
[   77.572490] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
[   77.572870] pcieport 0000:c9:02.0:    [21] ACSViol                (First)
[   77.573222] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[   77.573554] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
[   77.691273] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[   77.691738] {2}[Hardware Error]: event severity: recoverable
[   77.691971] {2}[Hardware Error]:  Error 0, type: recoverable
[   77.692192] {2}[Hardware Error]:   section_type: PCIe error
[   77.692403] {2}[Hardware Error]:   port_type: 4, root port
[   77.692616] {2}[Hardware Error]:   version: 3.0
[   77.692825] {2}[Hardware Error]:   command: 0x0547, status: 0x4010
[   77.693032] {2}[Hardware Error]:   device_id: 0000:c9:02.0
[   77.693238] {2}[Hardware Error]:   slot: 25
[   77.693440] {2}[Hardware Error]:   secondary_bus: 0xca
[   77.693641] {2}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
[   77.693853] {2}[Hardware Error]:   class_code: 060400
[   77.694054] {2}[Hardware Error]:   bridge: secondary_status: 0x0800, control: 0x0013
[   77.719115] pci 0000:ca:00.1: AER: can't recover (no error_detected callback)
[   77.719140] pcieport 0000:c9:02.0: AER: device recovery failed
[   77.719216] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
[   77.719390] pcieport 0000:c9:02.0:    [21] ACSViol                (First)
[   77.719557] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[   77.719723] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010

Fixes: 5b246e533d ("ice: split probe into smaller functions")
Signed-off-by: Jakub Buchocki <jakubx.buchocki@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230612171421.21570-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14 22:43:36 -07:00
Jakub Kicinski
d6858e1904 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-06-12 (igc, igb)

This series contains updates to igc and igb drivers.

Husaini clears Tx rings when interface is brought down for igc.

Vinicius disables PTM and PCI busmaster when removing igc driver.

Alex adds error check and path for NVM read error on igb.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igb: fix nvm.ops.read() error handling
  igc: Fix possible system crash when loading module
  igc: Clean the TX buffer and TX descriptor ring
====================

Link: https://lore.kernel.org/r/20230612205208.115292-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-14 22:36:54 -07:00
Dan Carpenter
374283a100 net: ethernet: ti: am65-cpsw: Call of_node_put() on error path
This code returns directly but it should instead call of_node_put()
to drop some reference counts.

Fixes: dab2b265dd ("net: ethernet: ti: am65-cpsw: Add support for SERDES configuration")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/e3012f0c-1621-40e6-bf7d-03c276f6e07f@kili.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-13 20:47:03 -07:00
Aleksandr Loktionov
48a821fd58 igb: fix nvm.ops.read() error handling
Add error handling into igb_set_eeprom() function, in case
nvm.ops.read() fails just quit with error code asap.

Fixes: 9d5c824399 ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-12 13:33:22 -07:00
Vinicius Costa Gomes
c080fe262f igc: Fix possible system crash when loading module
Guarantee that when probe() is run again, PTM and PCI busmaster will be
in the same state as it was if the driver was never loaded.

Avoid an i225/i226 hardware issue that PTM requests can be made even
though PCI bus mastering is not enabled. These unexpected PTM requests
can crash some systems.

So, "force" disable PTM and busmastering before removing the driver,
so they can be re-enabled in the right order during probe(). This is
more like a workaround and should be applicable for i225 and i226, in
any platform.

Fixes: 1b5d73fb86 ("igc: Enable PCIe PTM")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-12 13:18:30 -07:00
Muhammad Husaini Zulkifli
e43516f597 igc: Clean the TX buffer and TX descriptor ring
There could be a race condition during link down where interrupt
being generated and igc_clean_tx_irq() been called to perform the
TX completion. Properly clear the TX buffer/descriptor ring and
disable the TX Queue ring in igc_free_tx_resources() to avoid that.

Kernel trace:
[  108.237177] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLIFUI1.R00.4204.A00.2105270302 05/27/2021
[  108.237178] RIP: 0010:refcount_warn_saturate+0x55/0x110
[  108.242143] RSP: 0018:ffff9e7980003db0 EFLAGS: 00010286
[  108.245555] Code: 84 bc 00 00 00 c3 cc cc cc cc 85 f6 74 46 80 3d 20 8c 4d 01 00 75 ee 48 c7 c7 88 f4 03 ab c6 05 10 8c 4d 01 01 e8 0b 10 96 ff <0f> 0b c3 cc cc cc cc 80 3d fc 8b 4d 01 00 75 cb 48 c7 c7 b0 f4 03
[  108.250434]
[  108.250434] RSP: 0018:ffff9e798125f910 EFLAGS: 00010286
[  108.254358] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  108.259325]
[  108.259325] RAX: 0000000000000000 RBX: ffff8ddb935b8000 RCX: 0000000000000027
[  108.261868] RDX: ffff8de250a28800 RSI: ffff8de250a1c580 RDI: ffff8de250a1c580
[  108.265538] RDX: 0000000000000027 RSI: 0000000000000002 RDI: ffff8de250a9c588
[  108.265539] RBP: ffff8ddb935b8000 R08: ffffffffab2655a0 R09: ffff9e798125f898
[  108.267914] RBP: ffff8ddb8a5b8d80 R08: 0000005648eba354 R09: 0000000000000000
[  108.270196] R10: 0000000000000001 R11: 000000002d2d2d2d R12: ffff9e798125f948
[  108.270197] R13: ffff9e798125fa1c R14: ffff8ddb8a5b8d80 R15: 7fffffffffffffff
[  108.273001] R10: 000000002d2d2d2d R11: 000000002d2d2d2d R12: ffff8ddb8a5b8ed4
[  108.276410] FS:  00007f605851b740(0000) GS:ffff8de250a80000(0000) knlGS:0000000000000000
[  108.280597] R13: 00000000000002ac R14: 00000000ffffff99 R15: ffff8ddb92561b80
[  108.282966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  108.282967] CR2: 00007f053c039248 CR3: 0000000185850003 CR4: 0000000000f70ee0
[  108.286206] FS:  0000000000000000(0000) GS:ffff8de250a00000(0000) knlGS:0000000000000000
[  108.289701] PKRU: 55555554
[  108.289702] Call Trace:
[  108.289704]  <TASK>
[  108.293977] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  108.297562]  sock_alloc_send_pskb+0x20c/0x240
[  108.301494] CR2: 00007f053c03a168 CR3: 0000000184394002 CR4: 0000000000f70ef0
[  108.301495] PKRU: 55555554
[  108.306464]  __ip_append_data.isra.0+0x96f/0x1040
[  108.309441] Call Trace:
[  108.309443]  ? __pfx_ip_generic_getfrag+0x10/0x10
[  108.314927]  <IRQ>
[  108.314928]  sock_wfree+0x1c7/0x1d0
[  108.318078]  ? __pfx_ip_generic_getfrag+0x10/0x10
[  108.320276]  skb_release_head_state+0x32/0x90
[  108.324812]  ip_make_skb+0xf6/0x130
[  108.327188]  skb_release_all+0x16/0x40
[  108.330775]  ? udp_sendmsg+0x9f3/0xcb0
[  108.332626]  napi_consume_skb+0x48/0xf0
[  108.334134]  ? xfrm_lookup_route+0x23/0xb0
[  108.344285]  igc_poll+0x787/0x1620 [igc]
[  108.346659]  udp_sendmsg+0x9f3/0xcb0
[  108.360010]  ? ttwu_do_activate+0x40/0x220
[  108.365237]  ? __pfx_ip_generic_getfrag+0x10/0x10
[  108.366744]  ? try_to_wake_up+0x289/0x5e0
[  108.376987]  ? sock_sendmsg+0x81/0x90
[  108.395698]  ? __pfx_process_timeout+0x10/0x10
[  108.395701]  sock_sendmsg+0x81/0x90
[  108.409052]  __napi_poll+0x29/0x1c0
[  108.414279]  ____sys_sendmsg+0x284/0x310
[  108.419507]  net_rx_action+0x257/0x2d0
[  108.438216]  ___sys_sendmsg+0x7c/0xc0
[  108.439723]  __do_softirq+0xc1/0x2a8
[  108.444950]  ? finish_task_switch+0xb4/0x2f0
[  108.452077]  irq_exit_rcu+0xa9/0xd0
[  108.453584]  ? __schedule+0x372/0xd00
[  108.460713]  common_interrupt+0x84/0xa0
[  108.467840]  ? clockevents_program_event+0x95/0x100
[  108.474968]  </IRQ>
[  108.482096]  ? do_nanosleep+0x88/0x130
[  108.489224]  <TASK>
[  108.489225]  asm_common_interrupt+0x26/0x40
[  108.496353]  ? __rseq_handle_notify_resume+0xa9/0x4f0
[  108.503478] RIP: 0010:cpu_idle_poll+0x2c/0x100
[  108.510607]  __sys_sendmsg+0x5d/0xb0
[  108.518687] Code: 05 e1 d9 c8 00 65 8b 15 de 64 85 55 85 c0 7f 57 e8 b9 ef ff ff fb 65 48 8b 1c 25 00 cc 02 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 05 77 63 cd 00 85 c0 75 ed e8 ce ec ff ff
[  108.525817]  do_syscall_64+0x44/0xa0
[  108.531563] RSP: 0018:ffffffffab203e70 EFLAGS: 00000202
[  108.538693]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  108.546775]
[  108.546777] RIP: 0033:0x7f605862b7f7
[  108.549495] RAX: 0000000000000001 RBX: ffffffffab20c940 RCX: 000000000000003b
[  108.551955] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[  108.554068] RDX: 4000000000000000 RSI: 000000002da97f6a RDI: 00000000002b8ff4
[  108.559816] RSP: 002b:00007ffc99264058 EFLAGS: 00000246
[  108.564178] RBP: 0000000000000000 R08: 00000000002b8ff4 R09: ffff8ddb01554c80
[  108.571302]  ORIG_RAX: 000000000000002e
[  108.571303] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f605862b7f7
[  108.574023] R10: 000000000000015b R11: 000000000000000f R12: ffffffffab20c940
[  108.574024] R13: 0000000000000000 R14: ffff8de26fbeef40 R15: ffffffffab20c940
[  108.578727] RDX: 0000000000000000 RSI: 00007ffc992640a0 RDI: 0000000000000003
[  108.578728] RBP: 00007ffc99264110 R08: 0000000000000000 R09: 175f48ad1c3a9c00
[  108.581187]  do_idle+0x62/0x230
[  108.585890] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc992642d8
[  108.585891] R13: 00005577814ab2ba R14: 00005577814addf0 R15: 00007f605876d000
[  108.587920]  cpu_startup_entry+0x1d/0x20
[  108.591422]  </TASK>
[  108.596127]  rest_init+0xc5/0xd0
[  108.600490] ---[ end trace 0000000000000000 ]---

Test Setup:

DUT:
- Change mac address on DUT Side. Ensure NIC not having same MAC Address
- Running udp_tai on DUT side. Let udp_tai running throughout the test

Example:
./udp_tai -i enp170s0 -P 100000 -p 90 -c 1 -t 0 -u 30004

Host:
- Perform link up/down every 5 second.

Result:
Kernel panic will happen on DUT Side.

Fixes: 13b5b7fd6a ("igc: Add support for Tx/Rx rings")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-12 13:18:30 -07:00
Mark Bloch
617f5db1a6 RDMA/mlx5: Fix affinity assignment
The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when
the LAG master creates a hardware LAG. If the affinity was set while
the hardware was not in LAG, the firmware would ignore the affinity value.

However, this commit unintentionally assigned an affinity to QPs on the LAG
master's VPORT even if the RDMA device was not marked as LAG-enabled.
In most cases, this was not an issue because when the hardware entered
hardware LAG configuration, the RDMA device of the LAG master would be
destroyed and a new one would be created, marked as LAG-enabled.

The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based on
the queue affinity, which is intended for use by VPORTS other than the
E-Switch manager. ECMP mode is supported only if both E-Switch managers are
in switchdev mode and the appropriate route is configured via IP. In this
configuration, the RDMA device is not destroyed, and we retain the RDMA
device that is not marked as LAG-enabled.

To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means they
will only be able to communicate through the native physical port
associated with the E-Switch manager. This will prevent the firmware from
assigning affinity and will not allow the SQs to be remapped in case of
failover.

Fixes: 802dcc7fc5 ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11 11:27:17 +03:00
David Christensen
7ebe4eda42 bnx2x: fix page fault following EEH recovery
In the last step of the EEH recovery process, the EEH driver calls into
bnx2x_io_resume() to re-initialize the NIC hardware via the function
bnx2x_nic_load().  If an error occurs during bnx2x_nic_load(), OS and
hardware resources are released and an error code is returned to the
caller.  When called from bnx2x_io_resume(), the return code is ignored
and the network interface is brought up unconditionally.  Later attempts
to send a packet via this interface result in a page fault due to a null
pointer reference.

This patch checks the return code of bnx2x_nic_load(), prints an error
message if necessary, and does not enable the interface.

Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-10 20:07:10 +01:00
Nithin Dabilpuram
87e12a17ee octeontx2-af: fix lbk link credits on cn10k
Fix LBK link credits on CN10K to be same as CN9K i.e
16 * MAX_LBK_DATA_RATE instead of current scheme of
calculation based on LBK buf length / FIFO size.

Fixes: 6e54e1c539 ("octeontx2-af: cn10K: Add MTU configuration")
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-10 19:53:02 +01:00
Satha Rao
4e635f9d86 octeontx2-af: fixed resource availability check
txschq_alloc response have two different arrays to store continuous
and non-continuous schedulers of each level. Requested count should
be checked for each array separately.

Fixes: 5d9b976d44 ("octeontx2-af: Support fixed transmit scheduler topology")
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-10 19:53:02 +01:00
Jakub Kicinski
1b8975f30a Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-06-08 (ice)

This series contains updates to ice driver only.

Simon Horman stops null pointer dereference for GNSS error path.

Kamil fixes memory leak when downing interface when XDP is enabled.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: Fix XDP memory leak when NIC is brought up and down
  ice: Don't dereference NULL in ice_gnss_read error path
====================

Link: https://lore.kernel.org/r/20230608200051.451752-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-10 00:11:18 -07:00
Ahmed Zaki
c37cf54c12 iavf: remove mask from iavf_irq_enable_queues()
Enable more than 32 IRQs by removing the u32 bit mask in
iavf_irq_enable_queues(). There is no need for the mask as there are no
callers that select individual IRQs through the bitmask. Also, if the PF
allocates more than 32 IRQs, this mask will prevent us from using all of
them.

Modify the comment in iavf_register.h to show that the maximum number
allowed for the IRQ index is 63 as per the iAVF standard 1.0 [1].

link: [1] https://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ethernet-adaptive-virtual-function-hardware-spec.pdf
Fixes: 5eae00c57f ("i40evf: main driver core")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230608200226.451861-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-10 00:09:54 -07:00
Ratheesh Kannoth
c0e489372a octeontx2-af: Fix promiscuous mode
CN10KB silicon introduced a new exact match feature,
which is used for DMAC filtering. The state of installed
DMAC filters in this exact match table is getting corrupted
when promiscuous mode is toggled. Fix this by not touching
Exact match related config when promiscuous mode is toggled.

Fixes: 2dba9459d2 ("octeontx2-af: Wrapper functions for MAC addr add/del/update/reset")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:47:59 +01:00
Yoshihiro Shimoda
0ad4982c52 net: renesas: rswitch: Fix timestamp feature after all descriptors are used
The timestamp descriptors were intended to act cyclically. Descriptors
from index 0 through gq->ring_size - 1 contain actual information, and
the last index (gq->ring_size) should have LINKFIX to indicate
the first index 0 descriptor. However, the LINKFIX value is missing,
causing the timestamp feature to stop after all descriptors are used.
To resolve this issue, set the LINKFIX to the timestamp descritors.

Reported-by: Phong Hoang <phong.hoang.wz@renesas.com>
Fixes: 33f5d733b5 ("net: renesas: rswitch: Improve TX timestamp accuracy")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-09 10:41:09 +01:00
Yuezhen Luan
6292d7436c igb: Fix extts capture value format for 82580/i354/i350
82580/i354/i350 features circle-counter-like timestamp registers
that are different with newer i210. The EXTTS capture value in
AUXTSMPx should be converted from raw circle counter value to
timestamp value in resolution of 1 nanosec by the driver.

This issue can be reproduced on i350 nics, connecting an 1PPS
signal to a SDP pin, and run 'ts2phc' command to read external
1PPS timestamp value. On i210 this works fine, but on i350 the
extts is not correctly converted.

The i350/i354/82580's SYSTIM and other timestamp registers are
40bit counters, presenting time range of 2^40 ns, that means these
registers overflows every about 1099s. This causes all these regs
can't be used directly in contrast to the newer i210/i211s.

The igb driver needs to convert these raw register values to
valid time stamp format by using kernel timecounter apis for i350s
families. Here the igb_extts() just forgot to do the convert.

Fixes: 38970eac41 ("igb: support EXTTS on 82580/i354/i350")
Signed-off-by: Yuezhen Luan <eggcar.luan@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230607164116.3768175-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08 19:11:12 -07:00
Wei Fang
21225873be net: enetc: correct the indexes of highest and 2nd highest TCs
For ENETC hardware, the TCs are numbered from 0 to N-1, where N
is the number of TCs. Numerically higher TC has higher priority.
It's obvious that the highest priority TC index should be N-1 and
the 2nd highest priority TC index should be N-2.

However, the previous logic uses netdev_get_prio_tc_map() to get
the indexes of highest priority and 2nd highest priority TCs, it
does not make sense and is incorrect to give a "tc" argument to
netdev_get_prio_tc_map(). So the driver may get the wrong indexes
of the two highest priotiry TCs which would lead to failed to set
the CBS for the two highest priotiry TCs.

e.g.
$ tc qdisc add dev eno0 parent root handle 100: mqprio num_tc 6 \
	map 0 0 1 1 2 3 4 5 queues 1@0 1@1 1@2 1@3 2@4 2@6 hw 1
$ tc qdisc replace dev eno0 parent 100:6 cbs idleslope 100000 \
	sendslope -900000 hicredit 12 locredit -113 offload 1
$ Error: Specified device failed to setup cbs hardware offload.
  ^^^^^

In this example, the previous logic deems the indexes of the two
highest priotiry TCs should be 3 and 2. Actually, the indexes are
5 and 4, because the number of TCs is 6. So it would be failed to
configure the CBS for the two highest priority TCs.

Fixes: c431047c4e ("enetc: add support Credit Based Shaper(CBS) for hardware offload")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-08 20:57:54 +01:00
Kamil Maziarz
78c50d6961 ice: Fix XDP memory leak when NIC is brought up and down
Fix the buffer leak that occurs while switching
the port up and down with traffic and XDP by
checking for an active XDP program and freeing all empty TX buffers.

Fixes: efc2214b60 ("ice: Add support for XDP")
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-08 11:41:40 -07:00
Simon Horman
05a1308a2e ice: Don't dereference NULL in ice_gnss_read error path
If pf is NULL in ice_gnss_read() then it will be dereferenced
in the error path by a call to dev_dbg(ice_pf_to_dev(pf), ...).

Avoid this by simply returning in this case.
If logging is desired an alternate approach might be to
use pr_err() before returning.

Flagged by Smatch as:

  .../ice_gnss.c:196 ice_gnss_read() error: we previously assumed 'pf' could be null (see line 131)

Fixes: 43113ff734 ("ice: add TTY for GNSS module for E810T device")
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-06-08 08:38:56 -07:00
Somnath Kotur
1eb4ef1259 bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks
As per the new udp tunnel framework, drivers which need to know the
details of a port entry (i.e. port type) when it gets deleted should
use the .set_port / .unset_port callbacks.

Implementing the current .udp_tunnel_sync callback would mean that the
deleted tunnel port entry would be all zeros.  This used to work on
older firmware because it would not check the input when deleting a
tunnel port.  With newer firmware, the delete will now fail and
subsequent tunnel port allocation will fail as a result.

Fixes: 442a35a5a7 ("bnxt: convert to new udp_tunnel_nic infra")
Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08 10:52:45 +02:00
Pavan Chebbi
319a7827df bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE event
The firmware can send PHC_RTC_UPDATE async event on a PF that may not
have PTP registered. In such a case, there will be a null pointer
deference for bp->ptp_cfg when we try to handle the event.

Fix it by not registering for this event with the firmware if !bp->ptp_cfg.
Also, check that bp->ptp_cfg is valid before proceeding when we receive
the event.

Fixes: 8bcf6f04d4 ("bnxt_en: Handle async event when the PHC is updated in RTC mode")
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08 10:52:45 +02:00
Vikas Gupta
83474a9b25 bnxt_en: Skip firmware fatal error recovery if chip is not accessible
Driver starts firmware fatal error recovery by detecting
heartbeat failure or fw reset count register changing.  But
these checks are not reliable if the device is not accessible.
This can happen while DPC (Downstream Port containment) is in
progress.  Skip firmware fatal recovery if pci_device_is_present()
returns false.

Fixes: acfb50e4e7 ("bnxt_en: Add FW fatal devlink_health_reporter.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08 10:52:45 +02:00
Somnath Kotur
1a9e4f501b bnxt_en: Query default VLAN before VNIC setup on a VF
We need to call bnxt_hwrm_func_qcfg() on a VF to query the default
VLAN that may be setup by the PF.  If a default VLAN is enabled,
the VF cannot support VLAN acceleration on the receive side and
the VNIC must be setup to strip out the default VLAN tag.  If a
default VLAN is not enabled, the VF can support VLAN acceleration
on the receive side.  The VNIC should be set up to strip or not
strip the VLAN based on the RX VLAN acceleration setting.

Without this call to determine the default VLAN before calling
bnxt_setup_vnic(), the VNIC may not be set up correctly.  For
example, bnxt_setup_vnic() may set up to strip the VLAN tag based
on stale default VLAN information.  If RX VLAN acceleration is
not enabled, the VLAN tag will be incorrectly stripped and the
RX data path will not work correctly.

Fixes: cf6645f8eb ("bnxt_en: Add function for VF driver to query default VLAN.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08 10:52:45 +02:00
Sreekanth Reddy
1d997801c7 bnxt_en: Don't issue AP reset during ethtool's reset operation
Only older NIC controller's firmware uses the PROC AP reset type.
Firmware on 5731X/5741X and newer chips does not support this reset
type.  When bnxt_reset() issues a series of resets, this PROC AP
reset may actually fail on these newer chips because the firmware
is not ready to accept this unsupported command yet.  Avoid this
unnecessary error by skipping this reset type on chips that don't
support it.

Fixes: 7a13240e37 ("bnxt_en: fix ethtool_reset_flags ABI violations")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08 10:52:45 +02:00
Pavan Chebbi
095d5dc0c1 bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()
We must specify the vnic id of the vnic in the input structure of this
firmware message.  Otherwise we will get an error from the firmware.

Fixes: 98a4322b70 ("bnxt_en: update RSS config using difference algorithm")
Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-08 10:52:45 +02:00
Florian Fainelli
a9f31047ba net: bcmgenet: Fix EEE implementation
We had a number of short comings:

- EEE must be re-evaluated whenever the state machine detects a link
  change as wight be switching from a link partner with EEE
  enabled/disabled

- tx_lpi_enabled controls whether EEE should be enabled/disabled for the
  transmit path, which applies to the TBUF block

- We do not need to forcibly enable EEE upon system resume, as the PHY
  state machine will trigger a link event that will do that, too

Fixes: 6ef398ea60 ("net: bcmgenet: add EEE support")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20230606214348.2408018-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 21:53:14 -07:00
Jakub Kicinski
f0d751973f eth: ixgbe: fix the wake condition
Flip the netif_carrier_ok() condition in queue wake logic.
When I moved it to inside __netif_txq_completed_wake()
I missed negating it.

This made the condition ineffective and could probably
lead to crashes.

Fixes: 301f227fc8 ("net: piggy back on the memory barrier in bql when waking queues")
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230607010826.960226-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 21:53:11 -07:00
Jakub Kicinski
649c3fed36 eth: bnxt: fix the wake condition
The down condition should be the negation of the wake condition,
IOW when I moved it from:

 if (cond && wake())
to
 if (__netif_txq_completed_wake(cond))

Cond should have been negated. Flip it now.

This bug leads to occasional crashes with netconsole.
It may also lead to queue never waking up in case BQL is not enabled.

Reported-by: David Wei <davidhwei@meta.com>
Fixes: 08a096780d ("bnxt: use new queue try_stop/try_wake macros")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230607010826.960226-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 21:53:11 -07:00
Michal Schmidt
bf15bb38ec ice: make writes to /dev/gnssX synchronous
The current ice driver's GNSS write implementation buffers writes and
works through them asynchronously in a kthread. That's bad because:
 - The GNSS write_raw operation is supposed to be synchronous[1][2].
 - There is no upper bound on the number of pending writes.
   Userspace can submit writes much faster than the driver can process,
   consuming unlimited amounts of kernel memory.

A patch that's currently on review[3] ("[v3,net] ice: Write all GNSS
buffers instead of first one") would add one more problem:
 - The possibility of waiting for a very long time to flush the write
   work when doing rmmod, softlockups.

To fix these issues, simplify the implementation: Drop the buffering,
the write_work, and make the writes synchronous.

I tested this with gpsd and ubxtool.

[1] https://events19.linuxfoundation.org/wp-content/uploads/2017/12/The-GNSS-Subsystem-Johan-Hovold-Hovold-Consulting-AB.pdf
    "User interface" slide.
[2] A comment in drivers/gnss/core.c:gnss_write():
        /* Ignoring O_NONBLOCK, write_raw() is synchronous. */
[3] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20230217120541.16745-1-karol.kolacinski@intel.com/

Fixes: d6b98c8d24 ("ice: add write functionality for GNSS TTY")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07 10:51:58 +01:00
Brett Creeley
4f48c30312 pds_core: Fix FW recovery detection
Commit 523847df1b ("pds_core: add devcmd device interfaces") included
initial support for FW recovery detection. Unfortunately, the ordering
in pdsc_is_fw_good() was incorrect, which was causing FW recovery to be
undetected by the driver. Fix this by making sure to update the cached
fw_status by calling pdsc_is_fw_running() before setting the local FW
gen.

Fixes: 523847df1b ("pds_core: add devcmd device interfaces")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230605195116.49653-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06 21:10:01 -07:00
Manish Chopra
42510dffd0 qed/qede: Fix scheduling while atomic
Statistics read through bond interface via sysfs causes
below bug and traces as it triggers the bonding module to
collect the slave device statistics while holding the spinlock,
beneath that qede->qed driver statistics flow gets scheduled out
due to usleep_range() used in PTT acquire logic

[ 3673.988874] Hardware name: HPE ProLiant DL365 Gen10 Plus/ProLiant DL365 Gen10 Plus, BIOS A42 10/29/2021
[ 3673.988878] Call Trace:
[ 3673.988891]  dump_stack_lvl+0x34/0x44
[ 3673.988908]  __schedule_bug.cold+0x47/0x53
[ 3673.988918]  __schedule+0x3fb/0x560
[ 3673.988929]  schedule+0x43/0xb0
[ 3673.988932]  schedule_hrtimeout_range_clock+0xbf/0x1b0
[ 3673.988937]  ? __hrtimer_init+0xc0/0xc0
[ 3673.988950]  usleep_range+0x5e/0x80
[ 3673.988955]  qed_ptt_acquire+0x2b/0xd0 [qed]
[ 3673.988981]  _qed_get_vport_stats+0x141/0x240 [qed]
[ 3673.989001]  qed_get_vport_stats+0x18/0x80 [qed]
[ 3673.989016]  qede_fill_by_demand_stats+0x37/0x400 [qede]
[ 3673.989028]  qede_get_stats64+0x19/0xe0 [qede]
[ 3673.989034]  dev_get_stats+0x5c/0xc0
[ 3673.989045]  netstat_show.constprop.0+0x52/0xb0
[ 3673.989055]  dev_attr_show+0x19/0x40
[ 3673.989065]  sysfs_kf_seq_show+0x9b/0xf0
[ 3673.989076]  seq_read_iter+0x120/0x4b0
[ 3673.989087]  new_sync_read+0x118/0x1a0
[ 3673.989095]  vfs_read+0xf3/0x180
[ 3673.989099]  ksys_read+0x5f/0xe0
[ 3673.989102]  do_syscall_64+0x3b/0x90
[ 3673.989109]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3673.989115] RIP: 0033:0x7f8467d0b082
[ 3673.989119] Code: c0 e9 b2 fe ff ff 50 48 8d 3d ca 05 08 00 e8 35 e7 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 3673.989121] RSP: 002b:00007ffffb21fd08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 3673.989127] RAX: ffffffffffffffda RBX: 000000000100eca0 RCX: 00007f8467d0b082
[ 3673.989128] RDX: 00000000000003ff RSI: 00007ffffb21fdc0 RDI: 0000000000000003
[ 3673.989130] RBP: 00007f8467b96028 R08: 0000000000000010 R09: 00007ffffb21ec00
[ 3673.989132] R10: 00007ffffb27b170 R11: 0000000000000246 R12: 00000000000000f0
[ 3673.989134] R13: 0000000000000003 R14: 00007f8467b92000 R15: 0000000000045a05
[ 3673.989139] CPU: 30 PID: 285188 Comm: read_all Kdump: loaded Tainted: G        W  OE

Fix this by collecting the statistics asynchronously from a periodic
delayed work scheduled at default stats coalescing interval and return
the recent copy of statisitcs from .ndo_get_stats64(), also add ability
to configure/retrieve stats coalescing interval using below commands -

ethtool -C ethx stats-block-usecs <val>
ethtool -c ethx

Fixes: 133fac0eed ("qede: Add basic ethtool support")
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Link: https://lore.kernel.org/r/20230605112600.48238-1-manishc@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-06 13:56:18 +02:00
Bartosz Golaszewski
9bc0097347 net: stmmac: dwmac-qcom-ethqos: fix a regression on EMAC < 3
We must not assign plat_dat->dwmac4_addrs unconditionally as for
structures which don't set them, this will result in the core driver
using zeroes everywhere and breaking the driver for older HW. On EMAC < 2
the address should remain NULL.

Fixes: b68376191c ("net: stmmac: dwmac-qcom-ethqos: Add EMAC3 support")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:38:49 +01:00
Wei Fang
fdebd850cc net: enetc: correct rx_bytes statistics of XDP
The rx_bytes statistics of XDP are always zero, because rx_byte_cnt
is not updated after it is initialized to 0. So fix it.

Fixes: d1b15102dd ("net: enetc: add support for XDP_DROP and XDP_PASS")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04 15:43:45 +01:00
Wei Fang
7190d0ff0e net: enetc: correct the statistics of rx bytes
The rx_bytes of struct net_device_stats should count the length of
ethernet frames excluding the FCS. However, there are two problems
with the rx_bytes statistics of the current enetc driver. one is
that the length of VLAN header is not counted if the VLAN extraction
feature is enabled. The other is that the length of L2 header is not
counted, because eth_type_trans() is invoked before updating rx_bytes
which will subtract the length of L2 header from skb->len.
BTW, the rx_bytes statistics of XDP path also have similar problem,
I will fix it in another patch.

Fixes: a800abd3ec ("net: enetc: move skb creation into enetc_build_skb")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04 15:43:45 +01:00
Jiasheng Jiang
f93b30e50a net: systemport: Replace platform_get_irq with platform_get_irq_optional
Replace platform_get_irq with platform_get_irq_optional because wol_irq
is optional.

Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-02 09:56:23 +01:00
Jakub Kicinski
a451b8eb96 mlx5-fixes-2023-05-31
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmR4C7UACgkQSD+KveBX
 +j43QQgAm2KnMLUmtSHBFgyFEh7udVTK2XKH8RFwcQ9rhGnpAaDMhZY40oNLuNZv
 i+yMl1dfopNr9SxfQQdyPAbS138rfCIxZi7F26PBAaaPJ6WS1Ds5qIpyS8Dt8FFW
 w2+AS5RXDCZwoa0OYZYqXYnsWtRCjAO7KEChbif26ruuN9alu3ll+b074xup8a8q
 +j3Guk5zFHsf9Qz+RZIXrUj//qoWpsmyKdcVlr/vsWra9Y31tpymtEp7mplUrZmu
 xAmYoRssFiYYGxl5oVIpTzyzGBLxbDrpQiBHZH+XjUmGVYeEUFsVMgCcvLs5VjeF
 wvcEFMTPcuAdkkXzNa2txEznYf53Bg==
 =vNQo
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2023-05-31

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Read embedded cpu after init bit cleared
  net/mlx5e: Fix error handling in mlx5e_refresh_tirs
  net/mlx5: Ensure af_desc.mask is properly initialized
  net/mlx5: Fix setting of irq->map.index for static IRQ case
  net/mlx5: Remove rmap also in case dynamic MSIX not supported
====================

Link: https://lore.kernel.org/r/20230601031051.131529-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-01 10:15:43 -07:00
Maciej Fijalkowski
abaf8d51b0 ice: recycle/free all of the fragments from multi-buffer frame
The ice driver caches next_to_clean value at the beginning of
ice_clean_rx_irq() in order to remember the first buffer that has to be
freed/recycled after main Rx processing loop. The end boundary is
indicated by first descriptor of frame that Rx processing loop has ended
its duties. Note that if mentioned loop ended in the middle of gathering
multi-buffer frame, next_to_clean would be pointing to the descriptor in
the middle of the frame BUT freeing/recycling stage will stop at the
first descriptor. This means that next iteration of ice_clean_rx_irq()
will miss the (first_desc, next_to_clean - 1) entries.

 When running various 9K MTU workloads, such splats were observed:

[  540.780716] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  540.787787] #PF: supervisor read access in kernel mode
[  540.793002] #PF: error_code(0x0000) - not-present page
[  540.798218] PGD 0 P4D 0
[  540.800801] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  540.805231] CPU: 18 PID: 3984 Comm: xskxceiver Tainted: G        W          6.3.0-rc7+ #96
[  540.813619] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[  540.824209] RIP: 0010:ice_clean_rx_irq+0x2b6/0xf00 [ice]
[  540.829678] Code: 74 24 10 e9 aa 00 00 00 8b 55 78 41 31 57 10 41 09 c4 4d 85 ff 0f 84 83 00 00 00 49 8b 57 08 41 8b 4f 1c 65 8b 35 1a fa 4b 3f <48> 8b 02 48 c1 e8 3a 39 c6 0f 85 a2 00 00 00 f6 42 08 02 0f 85 98
[  540.848717] RSP: 0018:ffffc9000f42fc50 EFLAGS: 00010282
[  540.854029] RAX: 0000000000000004 RBX: 0000000000000002 RCX: 000000000000fffe
[  540.861272] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff
[  540.868519] RBP: ffff88984a05ac00 R08: 0000000000000000 R09: dead000000000100
[  540.875760] R10: ffff88983fffcd00 R11: 000000000010f2b8 R12: 0000000000000004
[  540.883008] R13: 0000000000000003 R14: 0000000000000800 R15: ffff889847a10040
[  540.890253] FS:  00007f6ddf7fe640(0000) GS:ffff88afdf800000(0000) knlGS:0000000000000000
[  540.898465] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  540.904299] CR2: 0000000000000000 CR3: 000000010d3da001 CR4: 00000000007706e0
[  540.911542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  540.918789] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  540.926032] PKRU: 55555554
[  540.928790] Call Trace:
[  540.931276]  <TASK>
[  540.933418]  ice_napi_poll+0x4ca/0x6d0 [ice]
[  540.937804]  ? __pfx_ice_napi_poll+0x10/0x10 [ice]
[  540.942716]  napi_busy_loop+0xd7/0x320
[  540.946537]  xsk_recvmsg+0x143/0x170
[  540.950178]  sock_recvmsg+0x99/0xa0
[  540.953729]  __sys_recvfrom+0xa8/0x120
[  540.957543]  ? do_futex+0xbd/0x1d0
[  540.961008]  ? __x64_sys_futex+0x73/0x1d0
[  540.965083]  __x64_sys_recvfrom+0x20/0x30
[  540.969155]  do_syscall_64+0x38/0x90
[  540.972796]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  540.977934] RIP: 0033:0x7f6de5f27934

To fix this, set cached_ntc to first_desc so that at the end, when
freeing/recycling buffers, descriptors from first to ntc are not missed.

Fixes: 2fba7dc515 ("ice: Add support for XDP multi-buffer on Rx side")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230531154457.3216621-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-01 09:59:39 -07:00
Yoshihiro Shimoda
a60caf039e net: renesas: rswitch: Fix return value in error path of xmit
Fix return value in the error path of rswitch_start_xmit(). If TX
queues are full, this function should return NETDEV_TX_BUSY.

Fixes: 3590918b5d ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20230529073817.1145208-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-01 09:57:08 -07:00
Edward Cree
622ab65634 sfc: fix error unwinds in TC offload
Failure ladders weren't exactly unwinding what the function had done up
 to that point; most seriously, when we encountered an already offloaded
 rule, the failure path tried to remove the new rule from the hashtable,
 which would in fact remove the already-present 'old' rule (since it has
 the same key) from the table, and leak its resources.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202305200745.xmIlkqjH-lkp@intel.com/
Fixes: d902e1a737 ("sfc: bare bones TC offload on EF100")
Fixes: 17654d84b4 ("sfc: add offloading of 'foreign' TC (decap) rules")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230530202527.53115-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-31 22:30:27 -07:00
Moshe Shemesh
bbfa4b5899 net/mlx5: Read embedded cpu after init bit cleared
During driver load it reads embedded_cpu bit from initialization
segment, but the initialization segment is readable only after
initialization bit is cleared.

Move the call to mlx5_read_embedded_cpu() right after initialization bit
cleared.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Fixes: 591905ba96 ("net/mlx5: Introduce Mellanox SmartNIC and modify page management logic")
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-31 20:08:37 -07:00
Saeed Mahameed
b6193d7030 net/mlx5e: Fix error handling in mlx5e_refresh_tirs
Allocation failure is outside the critical lock section and should
return immediately rather than jumping to the unlock section.

Also unlock as soon as required and remove the now redundant jump label.

Fixes: 80a2a9026b ("net/mlx5e: Add a lock on tir list")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-31 20:08:37 -07:00
Chuck Lever
368591995d net/mlx5: Ensure af_desc.mask is properly initialized
[    9.837087] mlx5_core 0000:02:00.0: firmware version: 16.35.2000
[    9.843126] mlx5_core 0000:02:00.0: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
[   10.311515] mlx5_core 0000:02:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[   10.321948] mlx5_core 0000:02:00.0: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
[   10.344324] mlx5_core 0000:02:00.0: mlx5_pcie_event:301:(pid 88): PCIe slot advertised sufficient power (27W).
[   10.354339] BUG: unable to handle page fault for address: ffffffff8ff0ade0
[   10.361206] #PF: supervisor read access in kernel mode
[   10.366335] #PF: error_code(0x0000) - not-present page
[   10.371467] PGD 81ec39067 P4D 81ec39067 PUD 81ec3a063 PMD 114b07063 PTE 800ffff7e10f5062
[   10.379544] Oops: 0000 [#1] PREEMPT SMP PTI
[   10.383721] CPU: 0 PID: 117 Comm: kworker/0:6 Not tainted 6.3.0-13028-g7222f123c983 #1
[   10.391625] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0b 06/12/2017
[   10.398750] Workqueue: events work_for_cpu_fn
[   10.403108] RIP: 0010:__bitmap_or+0x10/0x26
[   10.407286] Code: 85 c0 0f 95 c0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 89 c9 31 c0 48 83 c1 3f 48 c1 e9 06 39 c>
[   10.426024] RSP: 0000:ffffb45a0078f7b0 EFLAGS: 00010097
[   10.431240] RAX: 0000000000000000 RBX: ffffffff8ff0adc0 RCX: 0000000000000004
[   10.438365] RDX: ffff9156801967d0 RSI: ffffffff8ff0ade0 RDI: ffff9156801967b0
[   10.445489] RBP: ffffb45a0078f7e8 R08: 0000000000000030 R09: 0000000000000000
[   10.452613] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000ec
[   10.459737] R13: ffffffff8ff0ade0 R14: 0000000000000001 R15: 0000000000000020
[   10.466862] FS:  0000000000000000(0000) GS:ffff9165bfc00000(0000) knlGS:0000000000000000
[   10.474936] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.480674] CR2: ffffffff8ff0ade0 CR3: 00000001011ae003 CR4: 00000000003706f0
[   10.487800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.494922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.502046] Call Trace:
[   10.504493]  <TASK>
[   10.506589]  ? matrix_alloc_area.constprop.0+0x43/0x9a
[   10.511729]  ? prepare_namespace+0x84/0x174
[   10.515914]  irq_matrix_reserve_managed+0x56/0x10c
[   10.520699]  x86_vector_alloc_irqs+0x1d2/0x31e
[   10.525146]  irq_domain_alloc_irqs_hierarchy+0x39/0x3f
[   10.530284]  irq_domain_alloc_irqs_parent+0x1a/0x2a
[   10.535155]  intel_irq_remapping_alloc+0x59/0x5e9
[   10.539859]  ? kmem_cache_debug_flags+0x11/0x26
[   10.544383]  ? __radix_tree_lookup+0x39/0xb9
[   10.548649]  irq_domain_alloc_irqs_hierarchy+0x39/0x3f
[   10.553779]  irq_domain_alloc_irqs_parent+0x1a/0x2a
[   10.558650]  msi_domain_alloc+0x8c/0x120
[   10.567697]  irq_domain_alloc_irqs_locked+0x11d/0x286
[   10.572741]  __irq_domain_alloc_irqs+0x72/0x93
[   10.577179]  __msi_domain_alloc_irqs+0x193/0x3f1
[   10.581789]  ? __xa_alloc+0xcf/0xe2
[   10.585273]  msi_domain_alloc_irq_at+0xa8/0xfe
[   10.589711]  pci_msix_alloc_irq_at+0x47/0x5c

The crash is due to matrix_alloc_area() attempting to access per-CPU
memory for CPUs that are not present on the system. The CPU mask
passed into reserve_managed_vector() via it's @irqd parameter is
corrupted because it contains uninitialized stack data.

Fixes: bbac70c741 ("net/mlx5: Use newer affinity descriptor")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-31 20:08:36 -07:00
Niklas Schnelle
8764bd0fa5 net/mlx5: Fix setting of irq->map.index for static IRQ case
When dynamic IRQ allocation is not supported all IRQs are allocated up
front in mlx5_irq_table_create() instead of dynamically as part of
mlx5_irq_alloc(). In the latter dynamic case irq->map.index is set
via the mapping returned by pci_msix_alloc_irq_at(). In the static case
and prior to commit 1da438c0ae ("net/mlx5: Fix indexing of mlx5_irq")
irq->map.index was set in mlx5_irq_alloc() twice once initially to 0 and
then to the requested index before storing in the xarray. After this
commit it is only set to 0 which breaks all other IRQ mappings.

Fix this by setting irq->map.index to the requested index together with
irq->map.virq and improve the related comment to make it clearer which
cases it deals with.

Cc: Chuck Lever III <chuck.lever@oracle.com>
Tested-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Fixes: 1da438c0ae ("net/mlx5: Fix indexing of mlx5_irq")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-31 20:08:36 -07:00
Shay Drory
1c4c769cdf net/mlx5: Remove rmap also in case dynamic MSIX not supported
mlx5 add IRQs to rmap upon MSIX request, and mlx5 remove rmap from
MSIX only if msi_map.index is populated. However, msi_map.index is
populated only when dynamic MSIX is supported. This results in freeing
IRQs without removing them from rmap, which triggers the bellow
WARN_ON[1].

rmap is a feature which have no relation to dynamic MSIX.
Hence, remove the check of msi_map.index when removing IRQ from rmap.

[1]
[  200.307160 ] WARNING: CPU: 20 PID: 1702 at kernel/irq/manage.c:2034 free_irq+0x2ac/0x358
[  200.316990 ] CPU: 20 PID: 1702 Comm: modprobe Not tainted 6.4.0-rc3_for_upstream_min_debug_2023_05_24_14_02 #1
[  200.318939 ] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  200.321659 ] pc : free_irq+0x2ac/0x358
[  200.322400 ] lr : free_irq+0x20/0x358
[  200.337865 ] Call trace:
[  200.338360 ]  free_irq+0x2ac/0x358
[  200.339029 ]  irq_release+0x58/0xd0 [mlx5_core]
[  200.340093 ]  mlx5_irqs_release_vectors+0x80/0xb0 [mlx5_core]
[  200.341344 ]  destroy_comp_eqs+0x120/0x170 [mlx5_core]
[  200.342469 ]  mlx5_eq_table_destroy+0x1c/0x38 [mlx5_core]
[  200.343645 ]  mlx5_unload+0x8c/0xc8 [mlx5_core]
[  200.344652 ]  mlx5_uninit_one+0x78/0x118 [mlx5_core]
[  200.345745 ]  remove_one+0x80/0x108 [mlx5_core]
[  200.346752 ]  pci_device_remove+0x40/0xd8
[  200.347554 ]  device_remove+0x50/0x88
[  200.348272 ]  device_release_driver_internal+0x1c4/0x228
[  200.349312 ]  driver_detach+0x54/0xa0
[  200.350030 ]  bus_remove_driver+0x74/0x100
[  200.350833 ]  driver_unregister+0x34/0x68
[  200.351619 ]  pci_unregister_driver+0x28/0xa0
[  200.352476 ]  mlx5_cleanup+0x14/0x2210 [mlx5_core]
[  200.353536 ]  __arm64_sys_delete_module+0x190/0x2e8
[  200.354495 ]  el0_svc_common.constprop.0+0x6c/0x1d0
[  200.355455 ]  do_el0_svc+0x38/0x98
[  200.356122 ]  el0_svc+0x1c/0x80
[  200.356739 ]  el0t_64_sync_handler+0xb4/0x130
[  200.357604 ]  el0t_64_sync+0x174/0x178
[  200.358345 ] ---[ end trace 0000000000000000  ]---

Fixes: 3354822cde ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-31 20:08:36 -07:00
Haiyang Zhang
1919b39fc6 net: mana: Fix perf regression: remove rx_cqes, tx_cqes counters
The apc->eth_stats.rx_cqes is one per NIC (vport), and it's on the
frequent and parallel code path of all queues. So, r/w into this
single shared variable by many threads on different CPUs creates a
lot caching and memory overhead, hence perf regression. And, it's
not accurate due to the high volume concurrent r/w.

For example, a workload is iperf with 128 threads, and with RPS
enabled. We saw perf regression of 25% with the previous patch
adding the counters. And this patch eliminates the regression.

Since the error path of mana_poll_rx_cq() already has warnings, so
keeping the counter and convert it to a per-queue variable is not
necessary. So, just remove this counter from this high frequency
code path.

Also, remove the tx_cqes counter for the same reason. We have
warnings & other counters for errors on that path, and don't need
to count every normal cqe processing.

Cc: stable@vger.kernel.org
Fixes: bd7fc6e195 ("net: mana: Add new MANA VF performance counters for easier troubleshooting")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/1685115537-31675-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30 12:05:22 +02:00
Raju Rangoju
dc362e20cd amd-xgbe: fix the false linkup in xgbe_phy_status
In the event of a change in XGBE mode, the current auto-negotiation
needs to be reset and the AN cycle needs to be re-triggerred. However,
the current code ignores the return value of xgbe_set_mode(), leading to
false information as the link is declared without checking the status
register.

Fix this by propagating the mode switch status information to
xgbe_phy_status().

Fixes: e57f7a3fea ("amd-xgbe: Prepare for working with more than one type of phy")
Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-26 10:42:06 +01:00