1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

896 commits

Author SHA1 Message Date
Sebastian Andrzej Siewior
7f04bd109d net: Tree wide: Replace xdp_do_flush_map() with xdp_do_flush().
xdp_do_flush_map() is deprecated and new code should use xdp_do_flush()
instead.

Replace xdp_do_flush_map() with xdp_do_flush().

Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Clark Wang <xiaoning.wang@nxp.com>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: David Arinzon <darinzon@amazon.com>
Cc: Edward Cree <ecree.xilinx@gmail.com>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Jassi Brar <jaswinder.singh@linaro.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: John Crispin <john@phrozen.org>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: Louis Peens <louis.peens@corigine.com>
Cc: Marcin Wojtas <mw@semihalf.com>
Cc: Mark Lee <Mark-MC.Lee@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Noam Dagan <ndagan@amazon.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Saeed Bishara <saeedb@amazon.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Shay Agroskin <shayagr@amazon.com>
Cc: Shenwei Wang <shenwei.wang@nxp.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230908143215.869913-2-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-03 07:34:51 -07:00
Lin Ma
f1d152eb66 rtnetlink: remove redundant checks for nlattr IFLA_BRIDGE_MODE
The commit d73ef2d69c ("rtnetlink: let rtnl_bridge_setlink checks
IFLA_BRIDGE_MODE length") added the nla_len check in rtnl_bridge_setlink,
which is the only caller for ndo_bridge_setlink handlers defined in
low-level driver codes. Hence, this patch cleanups the redundant checks in
each ndo_bridge_setlink handler function.

Suggested-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807091347.3804523-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-08 15:10:37 -07:00
Jiri Benc
94d166c531 vxlan: calculate correct header length for GPE
VXLAN-GPE does not add an extra inner Ethernet header. Take that into
account when calculating header length.

This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is
cached.

In the collect_md mode (which is the only mode that VXLAN-GPE
supports), there's no magic auto-setting of the tunnel interface MTU.
It can't be, since the destination and thus the underlying interface
may be different for each packet.

So, the administrator is responsible for setting the correct tunnel
interface MTU. Apparently, the administrators are capable enough to
calculate that the maximum MTU for VXLAN-GPE is (their_lower_MTU - 36).
They set the tunnel interface MTU to 1464. If you run a TCP stream over
such interface, it's then segmented according to the MTU 1464, i.e.
producing 1514 bytes frames. Which is okay, this still fits the lower
MTU.

However, skb_tunnel_check_pmtu (called from vxlan_xmit_one) uses 50 as
the header size and thus incorrectly calculates the frame size to be
1528. This leads to ICMP too big message being generated (locally),
PMTU of 1450 to be cached and the TCP stream to be resegmented.

The fix is to use the correct actual header size, especially for
skb_tunnel_check_pmtu calculation.

Fixes: e1e5314de0 ("vxlan: implement GPE")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24 09:37:32 +01:00
Jakub Kicinski
f0d751973f eth: ixgbe: fix the wake condition
Flip the netif_carrier_ok() condition in queue wake logic.
When I moved it to inside __netif_txq_completed_wake()
I missed negating it.

This made the condition ineffective and could probably
lead to crashes.

Fixes: 301f227fc8 ("net: piggy back on the memory barrier in bql when waking queues")
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230607010826.960226-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07 21:53:11 -07:00
John Hickey
c23ae5091a ixgbe: Fix panic during XDP_TX with > 64 CPUs
Commit 4fe815850b ("ixgbe: let the xdpdrv work with more than 64 cpus")
adds support to allow XDP programs to run on systems with more than
64 CPUs by locking the XDP TX rings and indexing them using cpu % 64
(IXGBE_MAX_XDP_QS).

Upon trying this out patch on a system with more than 64 cores,
the kernel paniced with an array-index-out-of-bounds at the return in
ixgbe_determine_xdp_ring in ixgbe.h, which means ixgbe_determine_xdp_q_idx
was just returning the cpu instead of cpu % IXGBE_MAX_XDP_QS.  An example
splat:

 ==========================================================================
 UBSAN: array-index-out-of-bounds in
 /var/lib/dkms/ixgbe/5.18.6+focal-1/build/src/ixgbe.h:1147:26
 index 65 is out of range for type 'ixgbe_ring *[64]'
 ==========================================================================
 BUG: kernel NULL pointer dereference, address: 0000000000000058
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP NOPTI
 CPU: 65 PID: 408 Comm: ksoftirqd/65
 Tainted: G          IOE     5.15.0-48-generic #54~20.04.1-Ubuntu
 Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.5.4 01/13/2020
 RIP: 0010:ixgbe_xmit_xdp_ring+0x1b/0x1c0 [ixgbe]
 Code: 3b 52 d4 cf e9 42 f2 ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 b9
 00 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 <44> 0f b7
 47 58 0f b7 47 5a 0f b7 57 54 44 0f b7 76 08 66 41 39 c0
 RSP: 0018:ffffbc3fcd88fcb0 EFLAGS: 00010282
 RAX: ffff92a253260980 RBX: ffffbc3fe68b00a0 RCX: 0000000000000000
 RDX: ffff928b5f659000 RSI: ffff928b5f659000 RDI: 0000000000000000
 RBP: ffffbc3fcd88fce0 R08: ffff92b9dfc20580 R09: 0000000000000001
 R10: 3d3d3d3d3d3d3d3d R11: 3d3d3d3d3d3d3d3d R12: 0000000000000000
 R13: ffff928b2f0fa8c0 R14: ffff928b9be20050 R15: 000000000000003c
 FS:  0000000000000000(0000) GS:ffff92b9dfc00000(0000)
 knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000058 CR3: 000000011dd6a002 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  ixgbe_poll+0x103e/0x1280 [ixgbe]
  ? sched_clock_cpu+0x12/0xe0
  __napi_poll+0x30/0x160
  net_rx_action+0x11c/0x270
  __do_softirq+0xda/0x2ee
  run_ksoftirqd+0x2f/0x50
  smpboot_thread_fn+0xb7/0x150
  ? sort_range+0x30/0x30
  kthread+0x127/0x150
  ? set_kthread_struct+0x50/0x50
  ret_from_fork+0x1f/0x30
  </TASK>

I think this is how it happens:

Upon loading the first XDP program on a system with more than 64 CPUs,
ixgbe_xdp_locking_key is incremented in ixgbe_xdp_setup.  However,
immediately after this, the rings are reconfigured by ixgbe_setup_tc.
ixgbe_setup_tc calls ixgbe_clear_interrupt_scheme which calls
ixgbe_free_q_vectors which calls ixgbe_free_q_vector in a loop.
ixgbe_free_q_vector decrements ixgbe_xdp_locking_key once per call if
it is non-zero.  Commenting out the decrement in ixgbe_free_q_vector
stopped my system from panicing.

I suspect to make the original patch work, I would need to load an XDP
program and then replace it in order to get ixgbe_xdp_locking_key back
above 0 since ixgbe_setup_tc is only called when transitioning between
XDP and non-XDP ring configurations, while ixgbe_xdp_locking_key is
incremented every time ixgbe_xdp_setup is called.

Also, ixgbe_setup_tc can be called via ethtool --set-channels, so this
becomes another path to decrement ixgbe_xdp_locking_key to 0 on systems
with more than 64 CPUs.

Since ixgbe_xdp_locking_key only protects the XDP_TX path and is tied
to the number of CPUs present, there is no reason to disable it upon
unloading an XDP program.  To avoid confusion, I have moved enabling
ixgbe_xdp_locking_key into ixgbe_sw_init, which is part of the probe path.

Fixes: 4fe815850b ("ixgbe: let the xdpdrv work with more than 64 cpus")
Signed-off-by: John Hickey <jjh@daedalian.us>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230425170308.2522429-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-27 11:54:33 +02:00
Vladimir Oltean
1f5020acb3 net: vlan: introduce skb_vlan_eth_hdr()
Similar to skb_eth_hdr() introduced in commit 96cc4b6958 ("macvlan: do
not assume mac_header is set in macvlan_broadcast()"), let's introduce a
skb_vlan_eth_hdr() helper which can be used in TX-only code paths to get
to the VLAN header based on skb->data rather than based on the
skb_mac_header(skb).

We also consolidate the drivers that dereference skb->data to go through
this helper.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23 14:16:44 +01:00
Jakub Kicinski
301f227fc8 net: piggy back on the memory barrier in bql when waking queues
Drivers call netdev_tx_completed_queue() right before
netif_txq_maybe_wake(). If BQL is enabled netdev_tx_completed_queue()
should issue a memory barrier, so we can depend on that separating
the stop check from the consumer index update, instead of adding
another barrier in netif_txq_maybe_wake().

This matters more than the barriers on the xmit path, because
the wake condition is almost always true. So we issue the
consumer side barrier often.

Wrap netdev_tx_completed_queue() in a local helper to issue
the barrier even if BQL is disabled. Keep the same semantics
as netdev_tx_completed_queue() (barrier only if bytes != 0)
to make it clear that the barrier is conditional.

Plus since macro gets pkt/byte counts as arguments now -
we can skip waking if there were no packets completed.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-10 17:56:18 -07:00
Jakub Kicinski
9ded5bc77f ixgbe: use new queue try_stop/try_wake macros
Convert ixgbe to use the new macros, I think a lot of people
copy the ixgbe code. The only functional change is that the
unlikely() in ixgbe_clean_tx_irq() turns into a likely()
inside the new macro and no longer includes

  total_packets && netif_carrier_ok(tx_ring->netdev)

which is probably for the best, anyway.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-10 17:56:18 -07:00
David S. Miller
675f176b4d Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Some of the devlink bits were tricky, but I think I got it right.

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-17 11:06:39 +00:00
Jason Xing
0967bf8377 ixgbe: add double of VLAN header when computing the max MTU
Include the second VLAN HLEN into account when computing the maximum
MTU size as other drivers do.

Fixes: fabf1bce10 ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-14 10:07:27 -08:00
Jason Xing
f9cd6a4418 ixgbe: allow to increase MTU to 3K with XDP enabled
Recently I encountered one case where I cannot increase the MTU size
directly from 1500 to a much bigger value with XDP enabled if the
server is equipped with IXGBE card, which happened on thousands of
servers in production environment. After applying the current patch,
we can set the maximum MTU size to 3K.

This patch follows the behavior of changing MTU as i40e/ice does.

References:
[1] commit 23b44513c3 ("ice: allow 3k MTU for XDP")
[2] commit 0c8493d90b ("i40e: add XDP support for pass and drop actions")

Fixes: fabf1bce10 ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-14 10:07:13 -08:00
Jakub Kicinski
de42873367 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+bZrwAKCRDbK58LschI
 gzi4AP4+TYo0jnSwwkrOoN9l4f5VO9X8osmj3CXfHBv7BGWVxAD/WnvA3TDZyaUd
 agIZTkRs6BHF9He8oROypARZxTeMLwM=
 =nO1C
 -----END PGP SIGNATURE-----

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-02-11

We've added 96 non-merge commits during the last 14 day(s) which contain
a total of 152 files changed, 4884 insertions(+), 962 deletions(-).

There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c
between commit 5b246e533d ("ice: split probe into smaller functions")
from the net-next tree and commit 66c0e13ad2 ("drivers: net: turn on
XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev()
is otherwise there a 2nd time, and add XDP features to the existing
ice_cfg_netdev() one:

        [...]
        ice_set_netdev_features(netdev);
        netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
                               NETDEV_XDP_ACT_XSK_ZEROCOPY;
        ice_set_ops(netdev);
        [...]

Stephen's merge conflict mail:
https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/

The main changes are:

1) Add support for BPF trampoline on s390x which finally allows to remove many
   test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich.

2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski.

3) Add capability to export the XDP features supported by the NIC.
   Along with that, add a XDP compliance test tool,
   from Lorenzo Bianconi & Marek Majtyka.

4) Add __bpf_kfunc tag for marking kernel functions as kfuncs,
   from David Vernet.

5) Add a deep dive documentation about the verifier's register
   liveness tracking algorithm, from Eduard Zingerman.

6) Fix and follow-up cleanups for resolve_btfids to be compiled
   as a host program to avoid cross compile issues,
   from Jiri Olsa & Ian Rogers.

7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted
   when testing on different NICs, from Jesper Dangaard Brouer.

8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang.

9) Extend libbpf to add an option for when the perf buffer should
   wake up, from Jon Doron.

10) Follow-up fix on xdp_metadata selftest to just consume on TX
    completion, from Stanislav Fomichev.

11) Extend the kfuncs.rst document with description on kfunc
    lifecycle & stability expectations, from David Vernet.

12) Fix bpftool prog profile to skip attaching to offline CPUs,
    from Tonghao Zhang.

====================

Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10 17:51:27 -08:00
Marek Majtyka
66c0e13ad2 drivers: net: turn on XDP features
A summary of the flags being set for various drivers is given below.
Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features
that can be turned off and on at runtime. This means that these flags
may be set and unset under RTNL lock protection by the driver. Hence,
READ_ONCE must be used by code loading the flag value.

Also, these flags are not used for synchronization against the availability
of XDP resources on a device. It is merely a hint, and hence the read
may race with the actual teardown of XDP resources on the device. This
may change in the future, e.g. operations taking a reference on the XDP
resources of the driver, and in turn inhibiting turning off this flag.
However, for now, it can only be used as a hint to check whether device
supports becoming a redirection target.

Turn 'hw-offload' feature flag on for:
 - netronome (nfp)
 - netdevsim.

Turn 'native' and 'zerocopy' features flags on for:
 - intel (i40e, ice, ixgbe, igc)
 - mellanox (mlx5).
 - stmmac
 - netronome (nfp)

Turn 'native' features flags on for:
 - amazon (ena)
 - broadcom (bnxt)
 - freescale (dpaa, dpaa2, enetc)
 - funeth
 - intel (igb)
 - marvell (mvneta, mvpp2, octeontx2)
 - mellanox (mlx4)
 - mtk_eth_soc
 - qlogic (qede)
 - sfc
 - socionext (netsec)
 - ti (cpsw)
 - tap
 - tsnep
 - veth
 - xen
 - virtio_net.

Turn 'basic' (tx, pass, aborted and drop) features flags on for:
 - netronome (nfp)
 - cavium (thunder)
 - hyperv.

Turn 'redirect_target' feature flag on for:
 - amanzon (ena)
 - broadcom (bnxt)
 - freescale (dpaa, dpaa2)
 - intel (i40e, ice, igb, ixgbe)
 - ti (cpsw)
 - marvell (mvneta, mvpp2)
 - sfc
 - socionext (netsec)
 - qlogic (qede)
 - mellanox (mlx5)
 - tap
 - veth
 - virtio_net
 - xen

Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Marek Majtyka <alardam@gmail.com>
Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-02 20:48:23 -08:00
Bjorn Helgaas
dec6b80164 ixgbe: Remove redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages.  Since f26e58bf6f ("PCI/AER: Enable error reporting when AER is
native"), the PCI core does this for all devices during enumeration.

Remove the redundant pci_enable_pcie_error_reporting() call from the
driver.  Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.

Note that this doesn't control interrupt generation by the Root Port; that
is controlled by the AER Root Error Command register, which is managed by
the AER service driver.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-30 09:35:05 -08:00
Andrew Lunn
ab2960f0fd ixgbe: Use C45 mdiobus accessors
When performing a C45 bus transaction, make use of the c45 variants of
the bus read/write helpers. The ability to pass a special register
value is being removed to clean up the mdio bus driver API.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-17 19:34:08 -08:00
Jesse Brandeburg
3fe1d0a48d ixgbe: XDP: fix checker warning from rcu pointer
The ixgbe driver uses an older style failure mode when initializing the
XDP program and the queues. It causes some warnings when running C=2
checking builds (and it's the last one in the ethernet/intel tree).

$ make W=1 C=2 M=`pwd`/drivers/net/ethernet/intel modules
.../ixgbe_main.c:10301:25: error: incompatible types in comparison expression (different address spaces):
.../ixgbe_main.c:10301:25:    struct bpf_prog [noderef] __rcu *
.../ixgbe_main.c:10301:25:    struct bpf_prog *

Fix the problem by removing the line that tried to re-xchg "the old_prog
pointer" if there was an error, to make this driver act like the other
drivers which return the error code without "pointer restoration."

Also, update the "copy the pointer" logic to use WRITE_ONCE as many/all
the other drivers do, which required making a change in two separate
functions that write the xdp_prog variable in the ring.

The code here was modeled after the code in i40e/i40e_xdp_setup().

NOTE: Compile-tested only.

CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
CC: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-13 09:42:15 -08:00
Thomas Gleixner
068c38ad88 net: Remove the obsolte u64_stats_fetch_*_irq() users (drivers).
Now that the 32bit UP oddity is gone and 32bit uses always a sequence
count, there is no need for the fetch_irq() variants anymore.

Convert to the regular interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28 20:13:54 -07:00
Wolfram Sang
f029c781dd net: ethernet: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # For drivers/net/ethernet/mellanox/mlxsw
Acked-by: Geoff Levand <geoff@infradead.org> # For ps3_gelic_net and spider_net_ethtool
Acked-by: Tom Lendacky <thomas.lendacky@amd.com> # For drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
Acked-by: Marcin Wojtas <mw@semihalf.com> # For drivers/net/ethernet/marvell/mvpp2
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> # For drivers/net/ethernet/mellanox/mlx{4|5}
Reviewed-by: Shay Agroskin <shayagr@amazon.com> # For drivers/net/ethernet/amazon/ena
Acked-by: Krzysztof Hałasa <khalasa@piap.pl> # For IXP4xx Ethernet
Link: https://lore.kernel.org/r/20220830201457.7984-3-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-31 14:11:26 -07:00
Jakub Kicinski
6e0e846ee2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-21 13:03:39 -07:00
Piotr Skajewski
1e53834ce5 ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero
It is possible to disable VFs while the PF driver is processing requests
from the VF driver.  This can result in a panic.

BUG: unable to handle kernel paging request at 000000000000106c
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 8 PID: 0 Comm: swapper/8 Kdump: loaded Tainted: G I      --------- -
Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020
RIP: 0010:ixgbe_msg_task+0x4c8/0x1690 [ixgbe]
Code: 00 00 48 8d 04 40 48 c1 e0 05 89 7c 24 24 89 fd 48 89 44 24 10 83 ff
01 0f 84 b8 04 00 00 4c 8b 64 24 10 4d 03 a5 48 22 00 00 <41> 80 7c 24 4c
00 0f 84 8a 03 00 00 0f b7 c7 83 f8 08 0f 84 8f 0a
RSP: 0018:ffffb337869f8df8 EFLAGS: 00010002
RAX: 0000000000001020 RBX: 0000000000000000 RCX: 000000000000002b
RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000006
RBP: 0000000000000006 R08: 0000000000000002 R09: 0000000000029780
R10: 00006957d8f42832 R11: 0000000000000000 R12: 0000000000001020
R13: ffff8a00e8978ac0 R14: 000000000000002b R15: ffff8a00e8979c80
FS:  0000000000000000(0000) GS:ffff8a07dfd00000(0000) knlGS:00000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000106c CR3: 0000000063e10004 CR4: 00000000007726e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 ? ttwu_do_wakeup+0x19/0x140
 ? try_to_wake_up+0x1cd/0x550
 ? ixgbevf_update_xcast_mode+0x71/0xc0 [ixgbevf]
 ixgbe_msix_other+0x17e/0x310 [ixgbe]
 __handle_irq_event_percpu+0x40/0x180
 handle_irq_event_percpu+0x30/0x80
 handle_irq_event+0x36/0x53
 handle_edge_irq+0x82/0x190
 handle_irq+0x1c/0x30
 do_IRQ+0x49/0xd0
 common_interrupt+0xf/0xf

This can be eventually be reproduced with the following script:

while :
do
    echo 63 > /sys/class/net/<devname>/device/sriov_numvfs
    sleep 1
    echo 0 > /sys/class/net/<devname>/device/sriov_numvfs
    sleep 1
done

Add lock when disabling SR-IOV to prevent process VF mailbox communication.

Fixes: d773d13106 ("ixgbe: Fix memory leak when SR-IOV VFs are direct assigned")
Signed-off-by: Piotr Skajewski <piotrx.skajewski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220715214456.2968711-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:13:40 -07:00
Jiang Jian
eb6683b622 ixgbe: drop unexpected word 'for' in comments
there is an unexpected word 'for' in the comments that need to be dropped

file - drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
line - 5164

 * ixgbe_lpbthresh - calculate low water mark for for flow control

changed to:

 * ixgbe_lpbthresh - calculate low water mark for flow control

Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-30 10:46:42 -07:00
Maximilian Heyne
37530030c0 drivers, ixgbe: export vf statistics
This change retrieves network metrics for virtual functions from the
device and exports them via the iproute2 interface.

The code for retrieving the statistics from the device is taken from the
out-of-tree driver.  The feature was introduced with version 2.0.75.7,
so the diff between this version and the previous version 2.0.72.4 was
used to identify required changes. The export via ethtool is omitted in
favor of using the standard ndo_get_vf_stats interface.

Per-VF statistics can now be printed, for instance, via

  ip --statistics link show dev eth0

Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-09 09:49:45 -07:00
Lorenzo Bianconi
470bcfd603 ixgbe: add xdp frags support to ndo_xdp_xmit
Add the capability to map non-linear xdp frames in XDP_TX and ndo_xdp_xmit
callback.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220512212621.3746140-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-13 17:05:41 -07:00
Jakub Kicinski
53e2cb3b2a Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
10GbE Intel Wired LAN Driver Updates 2022-05-05

This series contains updates to ixgbe and igb drivers.

Jeff Daly adjusts type for 'allow_unsupported_sfp' to match the
associated struct value for ixgbe.

Alaa Mohamed converts, deprecated, kmap() call to kmap_local_page() for
igb.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igb: Convert kmap() to kmap_local_page()
  ixgbe: Fix module_param allow_unsupported_sfp type
====================

Link: https://lore.kernel.org/r/20220505155651.2606195-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06 15:39:28 -07:00
Jakub Kicinski
ee8b7a1156 net: make drivers set the TSO limit not the GSO limit
Drivers should call the TSO setting helper, GSO is controllable
by user space.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06 12:07:56 +01:00
Jeff Daly
833fbbbbfc ixgbe: Fix module_param allow_unsupported_sfp type
The module_param allow_unsupported_sfp should be a boolean to match the
type in the ixgbe_hw struct.

Signed-off-by: Jeff Daly <jeffd@silicom-usa.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05 08:37:14 -07:00
Slawomir Mrozowicz
008ca35f6e ixgbe: add improvement for MDD response functionality
The 82599 PF driver disable VF driver after a special MDD event occurs.
Adds the option for administrators to control whether VFs are
automatically disabled after several MDD events.
The automatically disabling is now the default mode for 82599 PF driver,
as it is more reliable.

This addresses CVE-2021-33061.

Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-08 07:41:18 -08:00
Slawomir Mrozowicz
366fd10009 ixgbe: add the ability for the PF to disable VF link state
Add support for ndo_set_vf_link_state the Network Device Option that
allows the PF driver to control the virtual link state of the VF devices.
Without this change a VF cannot be disabled/enabled by the administrator.
In the implementation the auto state takes over PF link state to
VF link setting, the enable state is not supported, the disable state
shut off the VF link regardless of the PF setting.

Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-08 07:41:18 -08:00
Piotr Skajewski
93b067f154 ixgbe: Remove non-inclusive language
Remove non-inclusive language from the driver.

Additionally correct the duplication "from from"
reported by checkpatch after the changes above.

Signed-off-by: Piotr Skajewski <piotrx.skajewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-23 09:55:38 -08:00
Christophe JAILLET
4d361c6fd8 ixgbe: Remove useless DMA-32 fallback configuration
As stated in [1], dma_set_mask() with a 64-bit mask never fails if
dev->dma_mask is non-NULL.
So, if it fails, the 32 bits case will also fail for the same reason.

So, if dma_set_mask_and_coherent() succeeds, 'pci_using_dac' is known to be
1.

Simplify code and remove some dead code accordingly.

[1]: https://lkml.org/lkml/2021/6/7/398

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-27 08:58:23 -08:00
Linus Torvalds
147cc5838c Updates for the interrupt subsystem:
Core:
 
   - Provide a new interface for affinity hints to provide a separation
     between hint and actual affinity change which has become a hidden
     property of the current interface
 
   - Fix up the in tree usage of the affinity hint interfaces
 
  Drivers:
 
   - No new irqchip drivers!
 
   - Fix GICv3 redistributor table reservation with RT across kexec
 
   - Fix GICv4.1 redistributor view of the VPE table across kexec
 
   - Add support for extra interrupts on spear-shirq
 
   - Make obtaining some interrupts optional for the Renesas drivers
 
   - Various cleanups and bug fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmHf9v0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRK6D/9bQmyITmJ4KLn0HZ1DsvkuR/GB7I8v
 yTF99FxIi/F0jlJ7+87Hdm68cfYPMahpiHqSlsf/QE2kkuWYDJmMaPUao14XMdG6
 jxrJ1OZtZXeDXyAWkB/gjmiuqyW/e/Myndg0UNUrJ66GqKfxfxtz1/4GfLjgDpIu
 TfZQdojvo6T7NTVnU8aAkgKUhM2jL/HxPiR3VUJ+VneSfwKLHzr3+lTY9zkSvJ8s
 ATqqGn6+GugJmDWaCI13IJcmBhPU/Gvs+Eqnwz7Xez/6wJftYvJh7vGec3ixS9pw
 skjPDnwuHcPl+h0mYMv7ySN7WuqTr0iqYIepdvLUfq6D1WjnHvF5XNcV4W7EzPJN
 B/pBosJ97ZAiHgrWsb35/S3bJ0mnB3Ib4WOOIcnRM36JUdNZrnKJntCsyrrmUsYA
 s6J1og9Ut7it+F9OFvsuZ2pUv25U8BlzhgfJen8Z0fzV1/2f5LQN0gQGVxqVpwkg
 3Cmd5Rmy5h2vlcKKHklLxIP24+UMIb2WyhsDiZ/qYH3zSFFnQPUJ6fvmZIxN/fPx
 exU5O8kgsXSwauXWHJJBb+qhKNcUNvUwKGHNMAvM9mh1xytU6ZowjTqqOlCfBWlg
 dRXT2xI0ex7liXek6yXa4lN1tabIdnvmYTmueUoFiOCqbUPBO8LTutjdehsUMa4d
 xV0a8WEzuk9Q/A==
 =myJA
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Core:

   - Provide a new interface for affinity hints to provide a separation
     between hint and actual affinity change which has become a hidden
     property of the current interface

   - Fix up the in tree usage of the affinity hint interfaces

  Drivers:

   - No new irqchip drivers!

   - Fix GICv3 redistributor table reservation with RT across kexec

   - Fix GICv4.1 redistributor view of the VPE table across kexec

   - Add support for extra interrupts on spear-shirq

   - Make obtaining some interrupts optional for the Renesas drivers

   - Various cleanups and bug fixes"

* tag 'irq-core-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  irqchip/renesas-intc-irqpin: Use platform_get_irq_optional() to get the interrupt
  irqchip/renesas-irqc: Use platform_get_irq_optional() to get the interrupt
  irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time
  irqchip/ingenic-tcu: Use correctly sized arguments for bit field
  irqchip/gic-v2m: Add const to of_device_id
  irqchip/imx-gpcv2: Mark imx_gpcv2_instance with __ro_after_init
  irqchip/spear-shirq: Add support for IRQ 0..6
  irqchip/gic-v3-its: Limit memreserve cpuhp state lifetime
  irqchip/gic-v3-its: Postpone LPI pending table freeing and memreserve
  irqchip/gic-v3-its: Give the percpu rdist struct its own flags field
  net/mlx4: Use irq_update_affinity_hint()
  net/mlx5: Use irq_set_affinity_and_hint()
  hinic: Use irq_set_affinity_and_hint()
  scsi: lpfc: Use irq_set_affinity()
  mailbox: Use irq_update_affinity_hint()
  ixgbe: Use irq_update_affinity_hint()
  be2net: Use irq_update_affinity_hint()
  enic: Use irq_update_affinity_hint()
  RDMA/irdma: Use irq_update_affinity_hint()
  scsi: mpt3sas: Use irq_set_affinity_and_hint()
  ...
2022-01-13 08:53:45 -08:00
Ingo Molnar
0422fe2666 Merge branch 'linus' into irq/core, to fix conflict
Conflicts:
	drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-01-08 10:53:57 +01:00
David S. Miller
e63a023489 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31 14:35:40 +00:00
Alexander Lobakin
a39363367a ixgbe: switch to napi_build_skb()
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
ixgbe driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-28 09:42:33 -08:00
Robert Schlabbach
271225fd57 ixgbe: Document how to enable NBASE-T support
Commit a296d665ea ("ixgbe: Add ethtool support to enable 2.5 and 5.0
Gbps support") introduced suppression of the advertisement of NBASE-T
speeds by default, according to Todd Fujinaka to accommodate customers
with network switches which could not cope with advertised NBASE-T
speeds, as posted in the E1000-devel mailing list:

https://sourceforge.net/p/e1000/mailman/message/37106269/

However, the suppression was not documented at all, nor was how to
enable NBASE-T support.

Properly document the NBASE-T suppression and how to enable NBASE-T
support.

Fixes: a296d665ea ("ixgbe: Add ethtool support to enable 2.5 and 5.0 Gbps support")
Reported-by: Robert Schlabbach <robert_s@gmx.net>
Signed-off-by: Robert Schlabbach <robert_s@gmx.net>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 11:09:29 -08:00
Paolo Abeni
c8064e5b4a bpf: Let bpf_warn_invalid_xdp_action() report more info
In non trivial scenarios, the action id alone is not sufficient to
identify the program causing the warning. Before the previous patch,
the generated stack-trace pointed out at least the involved device
driver.

Let's additionally include the program name and id, and the relevant
device name.

If the user needs additional infos, he can fetch them via a kernel
probe, leveraging the arguments added here.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
2021-12-13 22:28:27 +01:00
Nitesh Narayan Lal
cc493264c0 ixgbe: Use irq_update_affinity_hint()
The driver uses irq_set_affinity_hint() to update the affinity_hint mask
that is consumed by the userspace to distribute the interrupts. However,
under the hood irq_set_affinity_hint() also applies the provided cpumask
(if not NULL) as the affinity for the given interrupt which is an
undocumented side effect.

To remove this side effect irq_set_affinity_hint() has been marked
as deprecated and new interfaces have been introduced. Hence, replace the
irq_set_affinity_hint() with the new interface irq_update_affinity_hint()
that only updates the affinity_hint pointer.

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20210903152430.244937-10-nitesh@redhat.com
2021-12-10 20:47:39 +01:00
Jakub Kicinski
a05e4c0af4 ethernet: use eth_hw_addr_set() for dev->addr_len cases
Convert all Ethernet drivers from memcpy(... dev->addr_len)
to eth_hw_addr_set():

  @@
  expression dev, np;
  @@
  - memcpy(dev->dev_addr, np, dev->addr_len)
  + eth_hw_addr_set(dev, np)

In theory addr_len may not be ETH_ALEN, but we don't expect
non-Ethernet devices to live under this directory, and only
the following cases of setting addr_len exist:
 - cxgb4 for mgmt device,
and the drivers which set it to ETH_ALEN: s2io, mlx4, vxge.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 13:16:48 +01:00
Jakub Kicinski
dd9a887b35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/phy/bcm7xxx.c
  d88fd1b546 ("net: phy: bcm7xxx: Fixed indirect MMD operations")
  f68d08c437 ("net: phy: bcm7xxx: Add EPHY entry for 72165")

net/sched/sch_api.c
  b193e15ac6 ("net: prevent user from passing illegal stab size")
  69508d4333 ("net_sched: Use struct_size() and flex_array_size() helpers")

Both cases trivial - adjacent code additions.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30 14:49:21 -07:00
Jason Xing
4fe815850b ixgbe: let the xdpdrv work with more than 64 cpus
Originally, ixgbe driver doesn't allow the mounting of xdpdrv if the
server is equipped with more than 64 cpus online. So it turns out that
the loading of xdpdrv causes the "NOMEM" failure.

Actually, we can adjust the algorithm and then make it work through
mapping the current cpu to some xdp ring with the protect of @tx_lock.

Here are some numbers before/after applying this patch with xdp-example
loaded on the eth0X:

As client (tx path):
                     Before    After
TCP_STREAM send-64   734.14    714.20
TCP_STREAM send-128  1401.91   1395.05
TCP_STREAM send-512  5311.67   5292.84
TCP_STREAM send-1k   9277.40   9356.22 (not stable)
TCP_RR     send-1    22559.75  21844.22
TCP_RR     send-128  23169.54  22725.13
TCP_RR     send-512  21670.91  21412.56

As server (rx path):
                     Before    After
TCP_STREAM send-64   1416.49   1383.12
TCP_STREAM send-128  3141.49   3055.50
TCP_STREAM send-512  9488.73   9487.44
TCP_STREAM send-1k   9491.17   9356.22 (not stable)
TCP_RR     send-1    23617.74  23601.60
...

Notice: the TCP_RR mode is unstable as the official document explains.

I tested many times with different parameters combined through netperf.
Though the result is not that accurate, I cannot see much influence on
this patch. The static key is places on the hot path, but it actually
shouldn't cause a huge regression theoretically.

Co-developed-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-30 13:38:08 +01:00
Feng Zhou
513e605d7a ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
The ixgbe driver currently generates a NULL pointer dereference with
some machine (online cpus < 63). This is due to the fact that the
maximum value of num_xdp_queues is nr_cpu_ids. Code is in
"ixgbe_set_rss_queues"".

Here's how the problem repeats itself:
Some machine (online cpus < 63), And user set num_queues to 63 through
ethtool. Code is in the "ixgbe_set_channels",
	adapter->ring_feature[RING_F_FDIR].limit = count;

It becomes 63.

When user use xdp, "ixgbe_set_rss_queues" will set queues num.
	adapter->num_rx_queues = rss_i;
	adapter->num_tx_queues = rss_i;
	adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);

And rss_i's value is from
	f = &adapter->ring_feature[RING_F_FDIR];
	rss_i = f->indices = f->limit;

So "num_rx_queues" > "num_xdp_queues", when run to "ixgbe_xdp_setup",
	for (i = 0; i < adapter->num_rx_queues; i++)
		if (adapter->xdp_ring[i]->xsk_umem)

It leads to panic.

Call trace:
[exception RIP: ixgbe_xdp+368]
RIP: ffffffffc02a76a0  RSP: ffff9fe16202f8d0  RFLAGS: 00010297
RAX: 0000000000000000  RBX: 0000000000000020  RCX: 0000000000000000
RDX: 0000000000000000  RSI: 000000000000001c  RDI: ffffffffa94ead90
RBP: ffff92f8f24c0c18   R8: 0000000000000000   R9: 0000000000000000
R10: ffff9fe16202f830  R11: 0000000000000000  R12: ffff92f8f24c0000
R13: ffff9fe16202fc01  R14: 000000000000000a  R15: ffffffffc02a7530
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 7 [ffff9fe16202f8f0] dev_xdp_install at ffffffffa89fbbcc
 8 [ffff9fe16202f920] dev_change_xdp_fd at ffffffffa8a08808
 9 [ffff9fe16202f960] do_setlink at ffffffffa8a20235
10 [ffff9fe16202fa88] rtnl_setlink at ffffffffa8a20384
11 [ffff9fe16202fc78] rtnetlink_rcv_msg at ffffffffa8a1a8dd
12 [ffff9fe16202fcf0] netlink_rcv_skb at ffffffffa8a717eb
13 [ffff9fe16202fd40] netlink_unicast at ffffffffa8a70f88
14 [ffff9fe16202fd80] netlink_sendmsg at ffffffffa8a71319
15 [ffff9fe16202fdf0] sock_sendmsg at ffffffffa89df290
16 [ffff9fe16202fe08] __sys_sendto at ffffffffa89e19c8
17 [ffff9fe16202ff30] __x64_sys_sendto at ffffffffa89e1a64
18 [ffff9fe16202ff38] do_syscall_64 at ffffffffa84042b9
19 [ffff9fe16202ff50] entry_SYSCALL_64_after_hwframe at ffffffffa8c0008c

So I fix ixgbe_max_channels so that it will not allow a setting of queues
to be higher than the num_online_cpus(). And when run to ixgbe_xdp_setup,
take the smaller value of num_rx_queues and num_xdp_queues.

Fixes: 4a9b32f30f ("ixgbe: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-29 10:51:51 +01:00
Arnd Bergmann
a76053707d dev_ioctl: split out ndo_eth_ioctl
Most users of ndo_do_ioctl are ethernet drivers that implement
the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.

Separate these from the few drivers that use ndo_do_ioctl to
implement SIOCBOND, SIOCBR and SIOCWANDEV commands.

This is a purely cosmetic change intended to help readers find
their way through the implementation.

Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Vladimir Oltean <olteanv@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27 20:11:45 +01:00
Markus Boehme
09cfae9f13 ixgbe: Fix packet corruption due to missing DMA sync
When receiving a packet with multiple fragments, hardware may still
touch the first fragment until the entire packet has been received. The
driver therefore keeps the first fragment mapped for DMA until end of
packet has been asserted, and delays its dma_sync call until then.

The driver tries to fit multiple receive buffers on one page. When using
3K receive buffers (e.g. using Jumbo frames and legacy-rx is turned
off/build_skb is being used) on an architecture with 4K pages, the
driver allocates an order 1 compound page and uses one page per receive
buffer. To determine the correct offset for a delayed DMA sync of the
first fragment of a multi-fragment packet, the driver then cannot just
use PAGE_MASK on the DMA address but has to construct a mask based on
the actual size of the backing page.

Using PAGE_MASK in the 3K RX buffer/4K page architecture configuration
will always sync the first page of a compound page. With the SWIOTLB
enabled this can lead to corrupted packets (zeroed out first fragment,
re-used garbage from another packet) and various consequences, such as
slow/stalling data transfers and connection resets. For example, testing
on a link with MTU exceeding 3058 bytes on a host with SWIOTLB enabled
(e.g. "iommu=soft swiotlb=262144,force") TCP transfers quickly fizzle
out without this patch.

Cc: stable@vger.kernel.org
Fixes: 0c5661ecc5 ("ixgbe: fix crash in build_skb Rx code path")
Signed-off-by: Markus Boehme <markubo@amazon.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20 16:58:41 -07:00
Christophe JAILLET
dd2aefcd5e ixgbe: Fix an error handling path in 'ixgbe_probe()'
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.

Fixes: 6fabd715e6 ("ixgbe: Implement PCIe AER support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-07-01 10:53:22 -07:00
Toke Høiland-Jørgensen
49589b23d5 intel: Remove rcu_read_lock() around XDP program invocation
The Intel drivers all have rcu_read_lock()/rcu_read_unlock() pairs around
XDP program invocations. However, the actual lifetime of the objects
referred by the XDP program invocation is longer, all the way through to
the call to xdp_do_flush(), making the scope of the rcu_read_lock() too
small. This turns out to be harmless because it all happens in a single
NAPI poll cycle (and thus under local_bh_disable()), but it makes the
rcu_read_lock() misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> # i40e
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20210624160609.292325-12-toke@redhat.com
2021-06-24 19:44:34 +02:00
Magnus Karlsson
8281356b1c ixgbe: add correct exception tracing for XDP
Add missing exception tracing to XDP when a number of different
errors can occur. The support was only partial. Several errors
where not logged which would confuse the user quite a lot not
knowing where and why the packets disappeared.

Fixes: 33fdc82f08 ("ixgbe: add support for XDP_TX action")
Fixes: d0bcacd0a1 ("ixgbe: add AF_XDP zero-copy Rx support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03 08:38:37 -07:00
Jakub Kicinski
8203c7ce4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
 - keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
 - fix build after move to net_generic

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-17 11:08:07 -07:00
Yongxin Liu
debb9df311 ixgbe: fix unbalanced device enable/disable in suspend/resume
pci_disable_device() called in __ixgbe_shutdown() decreases
dev->enable_cnt by 1. pci_enable_device_mem() which increases
dev->enable_cnt by 1, was removed from ixgbe_resume() in commit
6f82b25587 ("ixgbe: use generic power management"). This caused
unbalanced increase/decrease. So add pci_enable_device_mem() back.

Fix the following call trace.

  ixgbe 0000:17:00.1: disabling already-disabled device
  Call Trace:
   __ixgbe_shutdown+0x10a/0x1e0 [ixgbe]
   ixgbe_suspend+0x32/0x70 [ixgbe]
   pci_pm_suspend+0x87/0x160
   ? pci_pm_freeze+0xd0/0xd0
   dpm_run_callback+0x42/0x170
   __device_suspend+0x114/0x460
   async_suspend+0x1f/0xa0
   async_run_entry_fn+0x3c/0xf0
   process_one_work+0x1dd/0x410
   worker_thread+0x34/0x3f0
   ? cancel_delayed_work+0x90/0x90
   kthread+0x14c/0x170
   ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30

Fixes: 6f82b25587 ("ixgbe: use generic power management")
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-13 11:49:11 -07:00
Alexander Duyck
31166efb1c ixgbe: Fix NULL pointer dereference in ethtool loopback test
The ixgbe driver currently generates a NULL pointer dereference when
performing the ethtool loopback test. This is due to the fact that there
isn't a q_vector associated with the test ring when it is setup as
interrupts are not normally added to the test rings.

To address this I have added code that will check for a q_vector before
returning a napi_id value. If a q_vector is not present it will return a
value of 0.

Fixes: b02e5a0ebb ("xsk: Propagate napi_id to XDP socket Rx path")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-13 11:03:36 -07:00
David S. Miller
241949e488 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-03-24

The following pull-request contains BPF updates for your *net-next* tree.

We've added 37 non-merge commits during the last 15 day(s) which contain
a total of 65 files changed, 3200 insertions(+), 738 deletions(-).

The main changes are:

1) Static linking of multiple BPF ELF files, from Andrii.

2) Move drop error path to devmap for XDP_REDIRECT, from Lorenzo.

3) Spelling fixes from various folks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 16:30:46 -07:00