1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

479 commits

Author SHA1 Message Date
Paolo Abeni
9f8f1933dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/freescale/fec.h
  7d650df99d ("net: fec: add pm_qos support on imx6q platform")
  40c79ce13b ("net: fec: add stop mode support for imx8 platform")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-08 18:38:30 +02:00
Tony Nguyen
ce46261311 ice: Allow operation with reduced device MSI-X
The driver currently takes an all or nothing approach for device MSI-X
vectors. Meaning if it does not get its full allocation, it will fail and
not load. There is no reason it can't work with a reduced number of MSI-X
vectors. Take a similar approach as commit 741106f7bd ("ice: Improve
MSI-X fallback logic") and, instead, adjust the MSI-X request to make use
of what is available.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Petr Oros <poros@redhat.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
2022-09-06 13:55:22 -07:00
Michal Swiatkowski
59ac325557 ice: use bitmap_free instead of devm_kfree
pf->avail_txqs was allocated using bitmap_zalloc, bitmap_free should be
used to free this memory.

Fixes: 78b5713ac1 ("ice: Alloc queue management bitmaps and arrays dynamically")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-02 08:25:15 -07:00
Przemyslaw Patynowski
7e753eb675 ice: Fix DMA mappings leak
Fix leak, when user changes ring parameters.
During reallocation of RX buffers, new DMA mappings are created for
those buffers. New buffers with different RX ring count should
substitute older ones, but those buffers were freed in ice_vsi_cfg_rxq
and reallocated again with ice_alloc_rx_buf. kfree on rx_buf caused
leak of already mapped DMA.
Reallocate ZC with xdp_buf struct, when BPF program loads. Reallocate
back to rx_buf, when BPF program unloads.
If BPF program is loaded/unloaded and XSK pools are created, reallocate
RX queues accordingly in XDP_SETUP_XSK_POOL handler.

Steps for reproduction:
while :
do
	for ((i=0; i<=8160; i=i+32))
	do
		ethtool -G enp130s0f0 rx $i tx $i
		sleep 0.5
		ethtool -g enp130s0f0
	done
done

Fixes: 617f3e1b58 ("ice: xsk: allocate separate memory for XDP SW ring")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-09-02 08:25:15 -07:00
Jakub Kicinski
880b0dd94f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
  21234e3a84 ("net/mlx5e: Fix use after free in mlx5e_fs_init()")
  c7eafc5ed0 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer")
https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/
https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25 16:07:42 -07:00
Jakub Kicinski
0134fe8512 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-08-18 (ice)

This series contains updates to ice driver only.

Jesse and Anatolii add support for controlling FCS/CRC stripping via
ethtool.

Anirudh allows for 100M speeds on devices which support it.

Sylwester removes ucast_shared field and the associated dead code related
to it.

Mikael removes non-inclusive language from the driver.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: remove non-inclusive language
  ice: Remove ucast_shared
  ice: Allow 100M speeds for some devices
  ice: Implement FCS/CRC and VLAN stripping co-existence policy
  ice: Implement control of FCS/CRC stripping
====================

Link: https://lore.kernel.org/r/20220818155207.996297-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-22 20:10:50 -07:00
Maciej Fijalkowski
9ead7e74bf ice: xsk: use Rx ring's XDP ring when picking NAPI context
Ice driver allocates per cpu XDP queues so that redirect path can safely
use smp_processor_id() as an index to the array. At the same time
though, XDP rings are used to pick NAPI context to call napi_schedule()
or set NAPIF_STATE_MISSED. When user reduces queue count, say to 8, and
num_possible_cpus() of underlying platform is 44, then this means queue
vectors with correlated NAPI contexts will carry several XDP queues.

This in turn can result in a broken behavior where NAPI context of
interest will never be scheduled and AF_XDP socket will not process any
traffic.

To fix this, let us change the way how XDP rings are assigned to Rx
rings and use this information later on when setting
ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated
queue vector and walk through Tx ring's linked list. Once we stumble
upon XDP ring in it, assign this ring to ice_rx_ring::xdp_ring.

Previous [0] approach of fixing this issue was for txonly scenario
because of the described grouping of XDP rings across queue vectors. So,
relying on Rx ring meant that NAPI context could be scheduled with a
queue vector without XDP ring with associated XSK pool.

[0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/

Fixes: 2d4238f556 ("ice: Add support for AF_XDP")
Fixes: 22bf877e52 ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-22 08:47:26 -07:00
Sylwester Dziedziuch
e1e9db57c0 ice: Remove ucast_shared
Remove ucast_shared as it was always true. Remove the code depending on
ucast_shared from ice_add_mac and ice_remove_mac.
Remove ice_find_ucast_rule_entry function as it was only
used when ucast_shared was set to false.

Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-18 08:23:44 -07:00
Anatolii Gerasymenko
affa1029d6 ice: Implement FCS/CRC and VLAN stripping co-existence policy
Make sure that only the valid combinations of FCS/CRC stripping and
VLAN stripping offloads are allowed.

You cannot have FCS/CRC stripping disabled while VLAN stripping is
enabled - this breaks the correctness of the FCS/CRC.

If administrator tries to enable VLAN stripping when FCS/CRC stripping is
disabled, the request should be rejected.

If administrator tries to disable FCS/CRC stripping when VLAN stripping
is enabled, the request should be rejected if VLANs are configured. If
there is no VLAN configured, then both FCS/CRC and VLAN stripping should
be disabled.

Testing Hints:
The default settings after driver load are:
- VLAN C-Tag offloads are enabled
- VLAN S-Tag offloads are disabled
- FCS/CRC stripping is enabled

Restore the default settings before each test with the command:
ethtool -K eth0 rx-fcs off rxvlan on txvlan on rx-vlan-stag-hw-parse off
tx-vlan-stag-hw-insert off

Test 1:
Disable FCS/CRC and VLAN stripping:
ethtool -K eth0 rx-fcs on rxvlan off
Try to enable VLAN stripping:
ethtool -K eth0 rxvlan on

Expected: VLAN stripping request is rejected

Test 2:
Try to disable FCS/CRC stripping:
ethtool -K eth0 rx-fcs on

Expected: VLAN stripping is also disabled, as there are no VLAN
configured

Test 3:
Add a VLAN:
ip link add link eth0 eth0.42 type vlan id 42
ip link set eth0 up
Try to disable FCS/CRC stripping:
ethtool -K eth0 rx-fcs on

Expected: FCS/CRC stripping request is rejected

Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-18 08:08:23 -07:00
Jesse Brandeburg
dddd406d9d ice: Implement control of FCS/CRC stripping
The driver can allow the user to configure whether the CRC aka the FCS
(Frame Check Sequence) is DMA'd to the host as part of the receive
buffer.  The driver usually wants this feature disabled so that the
hardware checks the FCS and strips it in order to save PCI bandwidth.

Control the reception of FCS to the host using the command:
ethtool -K eth0 rx-fcs <on|off>

The default shown in ethtool -k eth0 | grep fcs; should be "off", as the
hardware will drop any frame with a bad checksum, and DMA of the
checksum is useless overhead especially for small packets.

Testing Hints:
test the FCS/CRC arrives with received packets using
tcpdump -nnpi eth0 -xxxx
and it should show crc data as the last 4 bytes of the packet. Can also
use wireshark to turn on CRC checking and check the data is correct.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Co-developed-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-18 08:08:23 -07:00
Grzegorz Siwik
abddafd458 ice: Fix clearing of promisc mode with bridge over bond
When at least two interfaces are bonded and a bridge is enabled on the
bond, an error can occur when the bridge is removed and re-added. The
reason for the error is because promiscuous mode was not fully cleared from
the VLAN VSI in the hardware. With this change, promiscuous mode is
properly removed when the bridge disconnects from bonding.

[ 1033.676359] bond1: link status definitely down for interface enp95s0f0, disabling it
[ 1033.676366] bond1: making interface enp175s0f0 the new active one
[ 1033.676369] device enp95s0f0 left promiscuous mode
[ 1033.676522] device enp175s0f0 entered promiscuous mode
[ 1033.676901] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6
[ 1041.795662] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6
[ 1041.944826] bond1: link status definitely down for interface enp175s0f0, disabling it
[ 1041.944874] device enp175s0f0 left promiscuous mode
[ 1041.944918] bond1: now running without any active interface!

Fixes: c31af68a1b ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations")
Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Link: https://lore.kernel.org/all/CAK8fFZ7m-KR57M_rYX6xZN39K89O=LGooYkKsu6HKt0Bs+x6xQ@mail.gmail.com/
Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Tested-by: Igor Raits <igor@gooddata.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-17 09:30:32 -07:00
Paolo Abeni
7c6327c77d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

net/ax25/af_ax25.c
  d7c4c9e075 ("ax25: fix incorrect dev_tracker usage")
  d62607c3fe ("net: rename reference+tracking helpers")

drivers/net/netdevsim/fib.c
  180a6a3ee6 ("netdevsim: fib: Fix reference count leak on route deletion failure")
  012ec02ae4 ("netdevsim: convert driver to use unlocked devlink API during init/fini")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-03 09:04:55 +02:00
Jian Shen
7dc839fe47 net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
vsi->current_netdev_flags is used store the current net device
flags, not the active netdevice features. So it should use
vsi->netdev->featurs, rather than vsi->current_netdev_flags
to check NETIF_F_HW_VLAN_CTAG_FILTER.

Fixes: 1babaf77f4 ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01 12:09:28 -07:00
Jakub Kicinski
84a8d931ab Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-07-28

This series contains updates to ice driver only.

Michal allows for VF true promiscuous mode to be set for multiple VFs
and adds clearing of promiscuous filters when VF trust is removed.

Maciej refactors ice_set_features() to track/check changed features
instead of constantly checking against netdev features and adds support for
NETIF_F_LOOPBACK.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: allow toggling loopback mode via ndo_set_features callback
  ice: compress branches in ice_set_features()
  ice: Fix promiscuous mode not turning off
  ice: Introduce enabling promiscuous mode on multiple VF's
====================

Link: https://lore.kernel.org/r/20220728195538.3391360-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29 21:26:10 -07:00
Jakub Kicinski
272ac32f56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 18:21:16 -07:00
Maciej Fijalkowski
44ece4e1a3 ice: allow toggling loopback mode via ndo_set_features callback
Add support for NETIF_F_LOOPBACK. This feature can be set via:
$ ethtool -K eth0 loopback <on|off>

Feature can be useful for local data path tests.

Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-28 11:44:40 -07:00
Maciej Fijalkowski
c67672fa26 ice: compress branches in ice_set_features()
Instead of rather verbose comparison of current netdev->features bits vs
the incoming ones from user, let us compress them by a helper features
set that will be the result of netdev->features XOR features. This way,
current, extensive branches:

	if (features & NETIF_F_BIT && !(netdev->features & NETIF_F_BIT))
		set_feature(true);
	else if (!(features & NETIF_F_BIT) && netdev->features & NETIF_F_BIT)
		set_feature(false);

can become:

	netdev_features_t changed = netdev->features ^ features;

	if (changed & NETIF_F_BIT)
		set_feature(!!(features & NETIF_F_BIT));

This is nothing new as currently several other drivers use this
approach, which I find much more convenient.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-28 11:44:40 -07:00
Michal Wilczynski
d7393425e7 ice: Introduce enabling promiscuous mode on multiple VF's
In current implementation default VSI switch filter is only able to
forward traffic to a single VSI. This limits promiscuous mode with
private flag 'vf-true-promisc-support' to a single VF. Enabling it on
the second VF won't work. Also allmulticast support doesn't seem to be
properly implemented when vf-true-promisc-support is true.

Use standard ice_add_rule_internal() function that already implements
forwarding to multiple VSI's instead of constructing AQ call manually.

Add switch filter for allmulticast mode when vf-true-promisc-support is
enabled. The same filter is added regardless of the flag - it doesn't
matter for this case.

Remove unnecessary fields in switch structure. From now on book keeping
will be done by ice_add_rule_internal().

Refactor unnecessarily passed function arguments.

To test:
1) Create 2 VM's, and two VF's. Attach VF's to VM's.
2) Enable promiscuous mode on both of them and check if
   traffic is seen on both of them.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-28 11:44:22 -07:00
Maciej Fijalkowski
cc019545a2 ice: do not setup vlan for loopback VSI
Currently loopback test is failiing due to the error returned from
ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI.

Fixes: 0e674aeb0b ("ice: Add handler for ethtool selftest")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26 13:15:36 -07:00
Anirudh Venkataramanan
5c8e3c7ff3 ice: Fix VSIs unable to share unicast MAC
The driver currently does not allow two VSIs in the same PF domain
to have the same unicast MAC address. This is incorrect in the sense
that a policy decision is being made in the driver when it must be
left to the user. This approach was causing issues when rebooting
the system with VFs spawned not being able to change their MAC addresses.
Such errors were present in dmesg:

[ 7921.068237] ice 0000:b6:00.2 ens2f2: Unicast MAC 6a:0d:e4:70:ca:d1 already
exists on this PF. Preventing setting VF 7 unicast MAC address to 6a:0d:e4:70:ca:d1

Fix that by removing this restriction. Doing this also allows
us to remove some additional code that's checking if a unicast MAC
filter already exists.

Fixes: 47ebc7b024 ("ice: Check if unicast MAC exists before setting VF MAC")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26 13:15:36 -07:00
Zhuo Chen
ca415ea1f0 ice: Remove pci_aer_clear_nonfatal_status() call
After commit 62b36c3ea6 ("PCI/AER: Remove
pci_cleanup_aer_uncorrect_error_status() calls"), calls to
pci_cleanup_aer_uncorrect_error_status() have already been removed. But in
commit 5995b6d0c6 ("ice: Implement pci_error_handler ops")
pci_cleanup_aer_uncorrect_error_status  was used again, so remove it in
this patch.

Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Sen Wang <wangsen.harry@bytedance.com>
Cc: Wenliang Wang <wangwenliang.1995@bytedance.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-15 10:41:39 -07:00
Paul M Stillwell Jr
f52d166819 ice: handle E822 generic device ID in PLDM header
The driver currently presumes that the record data in the PLDM header
of the firmware image will match the device ID of the running device.
This is true for E810 devices. It appears that for E822 devices that
this is not guaranteed to be true.

Fix this by adding a check for the generic E822 device.

Fixes: d69ea414c9 ("ice: implement device flash update via devlink")
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-12 09:25:34 -07:00
Roman Storozhenko
9542ef4fba ice: Sync VLAN filtering features for DVM
VLAN filtering features, that is C-Tag and S-Tag, in DVM mode must be
both enabled or disabled.
In case of turning off/on only one of the features, another feature must
be turned off/on automatically with issuing an appropriate message to
the kernel log.

Fixes: 1babaf77f4 ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
Signed-off-by: Roman Storozhenko <roman.storozhenko@intel.com>
Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-14 09:38:57 -07:00
Jakub Kicinski
d7e6f58360 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/main.c
  b33886971d ("net/mlx5: Initialize flow steering during driver probe")
  40379a0084 ("net/mlx5_fpga: Drop INNOVA TLS support")
  f2b41b32cd ("net/mlx5: Remove ipsec_ops function table")
https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/
  16d42d3133 ("net/mlx5: Drain fw_reset when removing device")
  8324a02c34 ("net/mlx5: Add exit route when waiting for FW")
https://lore.kernel.org/all/20220519114119.060ce014@canb.auug.org.au/

tools/testing/selftests/net/mptcp/mptcp_join.sh
  e274f71540 ("selftests: mptcp: add subflow limits test-cases")
  b6e074e171 ("selftests: mptcp: add infinite map testcase")
  5ac1d2d634 ("selftests: mptcp: Add tests for userspace PM type")
https://lore.kernel.org/all/20220516111918.366d747f@canb.auug.org.au/

net/mptcp/options.c
  ba2c89e0ea ("mptcp: fix checksum byte order")
  1e39e5a32a ("mptcp: infinite mapping sending")
  ea66758c17 ("tcp: allow MPTCP to update the announced window")
https://lore.kernel.org/all/20220519115146.751c3a37@canb.auug.org.au/

net/mptcp/pm.c
  95d6865178 ("mptcp: fix subflow accounting on close")
  4d25247d3a ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs")
https://lore.kernel.org/all/20220516111435.72f35dca@canb.auug.org.au/

net/mptcp/subflow.c
  ae66fb2ba6 ("mptcp: Do TCP fallback on early DSS checksum failure")
  0348c690ed ("mptcp: add the fallback check")
  f8d4bcacff ("mptcp: infinite mapping receiving")
https://lore.kernel.org/all/20220519115837.380bb8d4@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 11:23:59 -07:00
Paul Greenwalt
31b6298fd8 ice: fix possible under reporting of ethtool Tx and Rx statistics
The hardware statistics counters are not cleared during resets so the
drivers first access is to initialize the baseline and then subsequent
reads are for reporting the counters. The statistics counters are read
during the watchdog subtask when the interface is up. If the baseline
is not initialized before the interface is up, then there can be a brief
window in which some traffic can be transmitted/received before the
initial baseline reading takes place.

Directly initialize ethtool statistics in driver open so the baseline will
be initialized when the interface is up, and any dropped packets
incremented before the interface is up won't be reported.

Fixes: 28dc1b86f8 ("ice: ignore dropped packets during init")
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-17 10:37:09 -07:00
Jakub Kicinski
9b19e57a3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Build issue in drivers/net/ethernet/sfc/ptp.c
  54fccfdd7c ("sfc: efx_default_channel_type APIs can be static")
  49e6123c65 ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12 16:15:30 -07:00
Alaa Mohamed
ca4567f1e6 rtnetlink: add extack support in fdb del handlers
Add extack support to .ndo_fdb_del in netdevice.h and
all related methods.

Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-09 11:58:20 +01:00
Ivan Vecera
486b9eee57 ice: Fix race during aux device (un)plugging
Function ice_plug_aux_dev() assigns pf->adev field too early prior
aux device initialization and on other side ice_unplug_aux_dev()
starts aux device deinit and at the end assigns NULL to pf->adev.
This is wrong because pf->adev should always be non-NULL only when
aux device is fully initialized and ready. This wrong order causes
a crash when ice_send_event_to_aux() call occurs because that function
depends on non-NULL value of pf->adev and does not assume that
aux device is half-initialized or half-destroyed.
After order correction the race window is tiny but it is still there,
as Leon mentioned and manipulation with pf->adev needs to be protected
by mutex.

Fix (un-)plugging functions so pf->adev field is set after aux device
init and prior aux device destroy and protect pf->adev assignment by
new mutex. This mutex is also held during ice_send_event_to_aux()
call to ensure that aux device is valid during that call.
Note that device lock used ice_send_event_to_aux() needs to be kept
to avoid race with aux drv unload.

Reproducer:
cycle=1
while :;do
        echo "#### Cycle: $cycle"

        ip link set ens7f0 mtu 9000
        ip link add bond0 type bond mode 1 miimon 100
        ip link set bond0 up
        ifenslave bond0 ens7f0
        ip link set bond0 mtu 9000
        ethtool -L ens7f0 combined 1
        ip link del bond0
        ip link set ens7f0 mtu 1500
        sleep 1

        let cycle++
done

In short when the device is added/removed to/from bond the aux device
is unplugged/plugged. When MTU of the device is changed an event is
sent to aux device asynchronously. This can race with (un)plugging
operation and because pf->adev is set too early (plug) or too late
(unplug) the function ice_send_event_to_aux() can touch uninitialized
or destroyed fields. In the case of crash below pf->adev->dev.mutex.

Crash:
[   53.372066] bond0: (slave ens7f0): making interface the new active one
[   53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u
p link
[   53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[   53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up
 link
[   54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval
idating tc mappings. Priority traffic classification disabled!
[   54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval
idating tc mappings. Priority traffic classification disabled!
[   54.248204] bond0: (slave ens7f0): Releasing backup interface
[   54.253955] bond0: (slave ens7f1): making interface the new active one
[   54.274875] bond0: (slave ens7f1): Releasing backup interface
[   54.289153] bond0 (unregistering): Released all slaves
[   55.383179] MII link monitoring set to 100 ms
[   55.398696] bond0: (slave ens7f0): making interface the new active one
[   55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
[   55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u
p link
[   55.412198] #PF: supervisor write access in kernel mode
[   55.412200] #PF: error_code(0x0002) - not-present page
[   55.412201] PGD 25d2ad067 P4D 0
[   55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
[   55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S
           5.17.0-13579-g57f2d6540f03 #1
[   55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up
 link
[   55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/
2021
[   55.430226] Workqueue: ice ice_service_task [ice]
[   55.468169] RIP: 0010:mutex_unlock+0x10/0x20
[   55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
[   55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
[   55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
[   55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
[   55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
[   55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
[   55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
[   55.532076] FS:  0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
[   55.540163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
[   55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   55.567305] PKRU: 55555554
[   55.570018] Call Trace:
[   55.572474]  <TASK>
[   55.574579]  ice_service_task+0xaab/0xef0 [ice]
[   55.579130]  process_one_work+0x1c5/0x390
[   55.583141]  ? process_one_work+0x390/0x390
[   55.587326]  worker_thread+0x30/0x360
[   55.590994]  ? process_one_work+0x390/0x390
[   55.595180]  kthread+0xe6/0x110
[   55.598325]  ? kthread_complete_and_exit+0x20/0x20
[   55.603116]  ret_from_fork+0x1f/0x30
[   55.606698]  </TASK>

Fixes: f9f5301e7e ("ice: Register auxiliary device to provide RDMA")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06 10:09:24 -07:00
Michal Swiatkowski
4b889474ad ice: get switch id on switchdev devices
Switch id should be the same for each netdevice on a driver.
The id must be unique between devices on the same system, but
does not need to be unique between devices on different systems.

The switch id is used to locate ports on a switch and to know if
aggregated ports belong to the same switch.

To meet this requirements, use pci_get_dsn as switch id value, as
this is unique value for each devices on the same system.

Implementing switch id is needed by automatic tools for kubernetes.

Set switch id by setting devlink port attribiutes and calling
devlink_port_attrs_set while creating pf (for uplink) and vf
(for representator) devlink port.

To get switch id (in switchdev mode):
cat /sys/class/net/$PF0/phys_switch_id

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05 11:28:35 -07:00
Jakub Kicinski
0e55546b18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/linux/netdevice.h
net/core/dev.c
  6510ea973d ("net: Use this_cpu_inc() to increment net->core_stats")
  794c24e992 ("net-core: rx_otherhost_dropped to core_stats")
https://lore.kernel.org/all/20220428111903.5f4304e0@canb.auug.org.au/

drivers/net/wan/cosa.c
  d48fea8401 ("net: cosa: fix error check return value of register_chrdev()")
  89fbca3307 ("net: wan: remove support for COSA and SRP synchronous serial boards")
https://lore.kernel.org/all/20220428112130.1f689e5e@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-28 13:02:01 -07:00
Petr Oros
b537752e6c ice: wait 5 s for EMP reset after firmware flash
We need to wait 5 s for EMP reset after firmware flash. Code was extracted
from OOT driver (ice v1.8.3 downloaded from sourceforge). Without this
wait, fw_activate let card in inconsistent state and recoverable only
by second flash/activate. Flash was tested on these fw's:
From -> To
 3.00 -> 3.10/3.20
 3.10 -> 3.00/3.20
 3.20 -> 3.00/3.10

Reproducer:
[root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin
Preparing to flash
[fw.mgmt] Erasing
[fw.mgmt] Erasing done
[fw.mgmt] Flashing 100%
[fw.mgmt] Flashing done 100%
[fw.undi] Erasing
[fw.undi] Erasing done
[fw.undi] Flashing 100%
[fw.undi] Flashing done 100%
[fw.netlist] Erasing
[fw.netlist] Erasing done
[fw.netlist] Flashing 100%
[fw.netlist] Flashing done 100%
Activate new firmware by devlink reload
[root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate
reload_actions_performed:
    fw_activate
[root@host ~]# ip link show ens7f0
71: ens7f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff
    altname enp202s0f0

dmesg after flash:
[   55.120788] ice: Copyright (c) 2018, Intel Corporation.
[   55.274734] ice 0000:ca:00.0: Get PHY capabilities failed status = -5, continuing anyway
[   55.569797] ice 0000:ca:00.0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.28.0
[   55.603629] ice 0000:ca:00.0: Get PHY capability failed.
[   55.608951] ice 0000:ca:00.0: ice_init_nvm_phy_type failed: -5
[   55.647348] ice 0000:ca:00.0: PTP init successful
[   55.675536] ice 0000:ca:00.0: DCB is enabled in the hardware, max number of TCs supported on this port are 8
[   55.685365] ice 0000:ca:00.0: FW LLDP is disabled, DCBx/LLDP in SW mode.
[   55.692179] ice 0000:ca:00.0: Commit DCB Configuration to the hardware
[   55.701382] ice 0000:ca:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x8 link at 0000:c9:02.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
Reboot doesn’t help, only second flash/activate with OOT or patched
driver put card back in consistent state.

After patch:
[root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin
Preparing to flash
[fw.mgmt] Erasing
[fw.mgmt] Erasing done
[fw.mgmt] Flashing 100%
[fw.mgmt] Flashing done 100%
[fw.undi] Erasing
[fw.undi] Erasing done
[fw.undi] Flashing 100%
[fw.undi] Flashing done 100%
[fw.netlist] Erasing
[fw.netlist] Erasing done
[fw.netlist] Flashing 100%
[fw.netlist] Flashing done 100%
Activate new firmware by devlink reload
[root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate
reload_actions_performed:
    fw_activate
[root@host ~]# ip link show ens7f0
19: ens7f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff
    altname enp202s0f0

Fixes: 399e27dbbd ("ice: support immediate firmware activation via devlink reload")
Signed-off-by: Petr Oros <poros@redhat.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-26 09:26:48 -07:00
Paolo Abeni
edf45f007a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-04-15 09:26:00 +02:00
Joe Damato
69e66c04c6 ice: Add mpls+tso support
Attempt to add mpls+tso support.

I don't have ice hardware available to test myself, but I just implemented
this feature in i40e and thought it might be useful to implement for ice
while this is fresh in my brain.

Hoping some one at intel will be able to test this on my behalf.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12 11:05:09 -07:00
Alexander Lobakin
d7442f512b ice: arfs: fix use-after-free when freeing @rx_cpu_rmap
The CI testing bots triggered the following splat:

[  718.203054] BUG: KASAN: use-after-free in free_irq_cpu_rmap+0x53/0x80
[  718.206349] Read of size 4 at addr ffff8881bd127e00 by task sh/20834
[  718.212852] CPU: 28 PID: 20834 Comm: sh Kdump: loaded Tainted: G S      W IOE     5.17.0-rc8_nextqueue-devqueue-02643-g23f3121aca93 #1
[  718.219695] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[  718.223418] Call Trace:
[  718.227139]
[  718.230783]  dump_stack_lvl+0x33/0x42
[  718.234431]  print_address_description.constprop.9+0x21/0x170
[  718.238177]  ? free_irq_cpu_rmap+0x53/0x80
[  718.241885]  ? free_irq_cpu_rmap+0x53/0x80
[  718.245539]  kasan_report.cold.18+0x7f/0x11b
[  718.249197]  ? free_irq_cpu_rmap+0x53/0x80
[  718.252852]  free_irq_cpu_rmap+0x53/0x80
[  718.256471]  ice_free_cpu_rx_rmap.part.11+0x37/0x50 [ice]
[  718.260174]  ice_remove_arfs+0x5f/0x70 [ice]
[  718.263810]  ice_rebuild_arfs+0x3b/0x70 [ice]
[  718.267419]  ice_rebuild+0x39c/0xb60 [ice]
[  718.270974]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[  718.274472]  ? ice_init_phy_user_cfg+0x360/0x360 [ice]
[  718.278033]  ? delay_tsc+0x4a/0xb0
[  718.281513]  ? preempt_count_sub+0x14/0xc0
[  718.284984]  ? delay_tsc+0x8f/0xb0
[  718.288463]  ice_do_reset+0x92/0xf0 [ice]
[  718.292014]  ice_pci_err_resume+0x91/0xf0 [ice]
[  718.295561]  pci_reset_function+0x53/0x80
<...>
[  718.393035] Allocated by task 690:
[  718.433497] Freed by task 20834:
[  718.495688] Last potentially related work creation:
[  718.568966] The buggy address belongs to the object at ffff8881bd127e00
                which belongs to the cache kmalloc-96 of size 96
[  718.574085] The buggy address is located 0 bytes inside of
                96-byte region [ffff8881bd127e00, ffff8881bd127e60)
[  718.579265] The buggy address belongs to the page:
[  718.598905] Memory state around the buggy address:
[  718.601809]  ffff8881bd127d00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  718.604796]  ffff8881bd127d80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[  718.607794] >ffff8881bd127e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  718.610811]                    ^
[  718.613819]  ffff8881bd127e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[  718.617107]  ffff8881bd127f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc

This is due to that free_irq_cpu_rmap() is always being called
*after* (devm_)free_irq() and thus it tries to work with IRQ descs
already freed. For example, on device reset the driver frees the
rmap right before allocating a new one (the splat above).
Make rmap creation and freeing function symmetrical with
{request,free}_irq() calls i.e. do that on ifup/ifdown instead
of device probe/remove/resume. These operations can be performed
independently from the actual device aRFS configuration.
Also, make sure ice_vsi_free_irq() clears IRQ affinity notifiers
only when aRFS is disabled -- otherwise, CPU rmap sets and clears
its own and they must not be touched manually.

Fixes: 28bf26724f ("ice: Implement aRFS")
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-08 09:08:36 -07:00
Maciej Fijalkowski
e19778e6c9 ice: clear cmd_type_offset_bsz for TX rings
Currently when XDP rings are created, each descriptor gets its DD bit
set, which turns out to be the wrong approach as it can lead to a
situation where more descriptors get cleaned than it was supposed to,
e.g. when AF_XDP busy poll is run with a large batch size. In this
situation, the driver would request for more buffers than it is able to
handle.

Fix this by not setting the DD bits in ice_xdp_alloc_setup_rings(). They
should be initialized to zero instead.

Fixes: 9610bd988d ("ice: optimize XDP_TX workloads")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-05 09:09:06 -07:00
Maciej Fijalkowski
f9124c68f0 ice: synchronize_rcu() when terminating rings
Unfortunately, the ice driver doesn't respect the RCU critical section that
XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to
paths that destroy resources that might be in use.

This was addressed in other AF_XDP ZC enabled drivers, for reference see
for example commit b3873a5be7 ("net/i40e: Fix concurrency issues
between config flow and XSK")

Fixes: efc2214b60 ("ice: Add support for XDP")
Fixes: 2d4238f556 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-05 09:07:08 -07:00
Ivan Vecera
1273f89578 ice: Fix broken IFF_ALLMULTI handling
Handling of all-multicast flag and associated multicast promiscuous
mode is broken in ice driver. When an user switches allmulticast
flag on or off the driver checks whether any VLANs are configured
over the interface (except default VLAN 0).

If any extra VLANs are registered it enables multicast promiscuous
mode for all these VLANs (including default VLAN 0) using
ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all
multicast packets tagged with known VLAN ID or untagged are received
and multicast packets tagged with unknown VLAN ID ignored.

If no extra VLANs are registered (so only VLAN 0 exists) it enables
multicast promiscuous mode for VLAN 0 and uses ICE_SW_LKUP_PROMISC
look-up type. In this situation any multicast packets including
tagged ones are received.

The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way:

ice_vsi_sync_fltr() {
  ...
  if (changed_flags & IFF_ALLMULTI) {
    if (netdev->flags & IFF_ALLMULTI) {
      if (vsi->num_vlans > 1)
        ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
      else
        ice_set_promisc(..., ICE_MCAST_PROMISC_BITS);
    } else {
      if (vsi->num_vlans > 1)
        ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
      else
        ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS);
    }
  }
  ...
}

The code above depends on value vsi->num_vlan that specifies number
of VLANs configured over the interface (including VLAN 0) and
this is problem because that value is modified in NDO callbacks
ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid().

Scenario 1:
1. ip link set ens7f0 allmulticast on
2. ip link add vlan10 link ens7f0 type vlan id 10
3. ip link set ens7f0 allmulticast off
4. ip link set ens7f0 allmulticast on

[1] In this scenario IFF_ALLMULTI is enabled and the driver calls
    ice_set_promisc(..., ICE_MCAST_PROMISC_BITS) that installs
    multicast promisc rule with non-VLAN look-up type.
[2] Then VLAN with ID 10 is added and vsi->num_vlan incremented to 2
[3] Command switches IFF_ALLMULTI off and the driver calls
    ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS) but this
    call is effectively NOP because it looks for multicast promisc
    rules for VLAN 0 and VLAN 10 with VLAN look-up type but no such
    rules exist. So the all-multicast remains enabled silently
    in hardware.
[4] Command tries to switch IFF_ALLMULTI on and the driver calls
    ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS) but this call
    fails (-EEXIST) because non-VLAN multicast promisc rule already
    exists.

Scenario 2:
1. ip link add vlan10 link ens7f0 type vlan id 10
2. ip link set ens7f0 allmulticast on
3. ip link add vlan20 link ens7f0 type vlan id 20
4. ip link del vlan10 ; ip link del vlan20
5. ip link set ens7f0 allmulticast off

[1] VLAN with ID 10 is added and vsi->num_vlan==2
[2] Command switches IFF_ALLMULTI on and driver installs multicast
    promisc rules with VLAN look-up type for VLAN 0 and 10
[3] VLAN with ID 20 is added and vsi->num_vlan==3 but no multicast
    promisc rules is added for this new VLAN so the interface does
    not receive MC packets from VLAN 20
[4] Both VLANs are removed but multicast rule for VLAN 10 remains
    installed so interface receives multicast packets from VLAN 10
[5] Command switches IFF_ALLMULTI off and because vsi->num_vlan is 1
    the driver tries to remove multicast promisc rule for VLAN 0
    with non-VLAN look-up that does not exist.
    All-multicast looks disabled from user point of view but it
    is partially enabled in HW (interface receives all multicast
    packets either untagged or tagged with VLAN ID 10)

To resolve these issues the patch introduces these changes:
1. Adds handling for IFF_ALLMULTI to ice_vlan_rx_add_vid() and
   ice_vlan_rx_kill_vid() callbacks. So when VLAN is added/removed
   and IFF_ALLMULTI is enabled an appropriate multicast promisc
   rule for that VLAN ID is added/removed.
2. In ice_vlan_rx_add_vid() when first VLAN besides VLAN 0 is added
   so (vsi->num_vlan == 2) and IFF_ALLMULTI is enabled then look-up
   type for existing multicast promisc rule for VLAN 0 is updated
   to ICE_MCAST_VLAN_PROMISC_BITS.
3. In ice_vlan_rx_kill_vid() when last VLAN besides VLAN 0 is removed
   so (vsi->num_vlan == 1) and IFF_ALLMULTI is enabled then look-up
   type for existing multicast promisc rule for VLAN 0 is updated
   to ICE_MCAST_PROMISC_BITS.
4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY
   bit protection to avoid races with ice_vsi_sync_fltr() that runs
   in ice_service_task() context.
5. Bit ICE_VSI_VLAN_FLTR_CHANGED is use-less and can be removed.
6. Error messages added to ice_fltr_*_vsi_promisc() helper functions
   to avoid them in their callers
7. Small improvements to increase readability

Fixes: 5eda8afd6b ("ice: Add support for PF/VF promiscuous mode")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01 12:01:38 +01:00
Ivan Vecera
2c0069f3f9 ice: Fix MAC address setting
Commit 2ccc1c1ccc ("ice: Remove excess error variables") merged
the usage of 'status' and 'err' variables into single one in
function ice_set_mac_address(). Unfortunately this causes
a regression when call of ice_fltr_add_mac() returns -EEXIST because
this return value does not indicate an error in this case but
value of 'err' remains to be -EEXIST till the end of the function
and is returned to caller.

Prior mentioned commit this does not happen because return value of
ice_fltr_add_mac() was stored to 'status' variable first and
if it was -EEXIST then 'err' remains to be zero.

Fix the problem by reset 'err' to zero when ice_fltr_add_mac()
returns -EEXIST.

Fixes: 2ccc1c1ccc ("ice: Remove excess error variables")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01 12:01:38 +01:00
Jakub Kicinski
89695196f0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in overtime fixes, no conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23 10:53:49 -07:00
Alexander Lobakin
32d53c0aa3 ice: fix 'scheduling while atomic' on aux critical err interrupt
There's a kernel BUG splat on processing aux critical error
interrupts in ice_misc_intr():

[ 2100.917085] BUG: scheduling while atomic: swapper/15/0/0x00010000
...
[ 2101.060770] Call Trace:
[ 2101.063229]  <IRQ>
[ 2101.065252]  dump_stack+0x41/0x60
[ 2101.068587]  __schedule_bug.cold.100+0x4c/0x58
[ 2101.073060]  __schedule+0x6a4/0x830
[ 2101.076570]  schedule+0x35/0xa0
[ 2101.079727]  schedule_preempt_disabled+0xa/0x10
[ 2101.084284]  __mutex_lock.isra.7+0x310/0x420
[ 2101.088580]  ? ice_misc_intr+0x201/0x2e0 [ice]
[ 2101.093078]  ice_send_event_to_aux+0x25/0x70 [ice]
[ 2101.097921]  ice_misc_intr+0x220/0x2e0 [ice]
[ 2101.102232]  __handle_irq_event_percpu+0x40/0x180
[ 2101.106965]  handle_irq_event_percpu+0x30/0x80
[ 2101.111434]  handle_irq_event+0x36/0x53
[ 2101.115292]  handle_edge_irq+0x82/0x190
[ 2101.119148]  handle_irq+0x1c/0x30
[ 2101.122480]  do_IRQ+0x49/0xd0
[ 2101.125465]  common_interrupt+0xf/0xf
[ 2101.129146]  </IRQ>
...

As Andrew correctly mentioned previously[0], the following call
ladder happens:

ice_misc_intr() <- hardirq
  ice_send_event_to_aux()
    device_lock()
      mutex_lock()
        might_sleep()
          might_resched() <- oops

Add a new PF state bit which indicates that an aux critical error
occurred and serve it in ice_service_task() in process context.
The new ice_pf::oicr_err_reg is read-write in both hardirq and
process contexts, but only 3 bits of non-critical data probably
aren't worth explicit synchronizing (and they're even in the same
byte [31:24]).

[0] https://lore.kernel.org/all/YeSRUVmrdmlUXHDn@lunn.ch

Fixes: 348048e724 ("ice: Implement iidc operations")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23 10:40:40 -07:00
Jakub Kicinski
e243f39685 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17 13:56:58 -07:00
Sudheer Mogilappagari
1b4ae7d925 ice: destroy flow director filter mutex after releasing VSIs
Currently fdir_fltr_lock is accessed in ice_vsi_release_all() function
after it is destroyed. Instead destroy mutex after ice_vsi_release_all.

Fixes: 40319796b7 ("ice: Add flow director support for channel mode")
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15 13:36:13 -07:00
Maciej Fijalkowski
f153546913 ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats()
It is possible to do NULL pointer dereference in routine that updates
Tx ring stats. Currently only stats and bytes are updated when ring
pointer is valid, but later on ring is accessed to propagate gathered Tx
stats onto VSI stats.

Change the existing logic to move to next ring when ring is NULL.

Fixes: e72bba2135 ("ice: split ice_ring onto Tx/Rx separate structs")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15 13:36:13 -07:00
Jacob Keller
f5f085c01d ice: introduce ICE_VF_RESET_LOCK flag
The ice_reset_vf function performs actions which must be taken only
while holding the VF configuration lock. Some flows already acquired the
lock, while other flows must acquire it just for the reset function. Add
the ICE_VF_RESET_LOCK flag to the function so that it can handle taking
and releasing the lock instead at the appropriate scope.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15 13:23:02 -07:00
Jacob Keller
7eb517e434 ice: convert ice_reset_vf to take flags
The ice_reset_vf function takes a boolean parameter which indicates
whether or not the reset is due to a VFLR event.

This is somewhat confusing to read because readers must interpret what
"true" and "false" mean when seeing a line of code like
"ice_reset_vf(vf, false)".

We will want to add another toggle to the ice_reset_vf in a following
change. To avoid proliferating many arguments, convert this function to
take flags instead. ICE_VF_RESET_VFLR will indicate if this is a VFLR
reset. A value of 0 indicates no flags.

One could argue that "ice_reset_vf(vf, 0)" is no more readable than
"ice_reset_vf(vf, false)".. However, this type of flags interface is
somewhat common and using 0 to mean "no flags" makes sense in this
context. We could bother to add a define for "ICE_VF_RESET_PLAIN" or
something similar, but this can be confusing since its not an actual bit
flag.

This paves the way to add another flag to the function in a following
change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15 13:22:54 -07:00
Jacob Keller
dac5728875 ice: drop is_vflr parameter from ice_reset_all_vfs
The ice_reset_all_vfs function takes a parameter to handle whether its
operating after a VFLR event or not. This is not necessary as every
caller always passes true. Simplify the interface by removing the
parameter.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-15 13:22:39 -07:00
Jacob Keller
dc36796ead ice: rename ICE_MAX_VF_COUNT to avoid confusion
The ICE_MAX_VF_COUNT field is defined in ice_sriov.h. This count is true
for SR-IOV but will not be true for all VF implementations, such as when
the ice driver supports Scalable IOV.

Rename this definition to clearly indicate ICE_MAX_SRIOV_VFS.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14 17:22:59 -07:00
Jacob Keller
649c87c6ff ice: remove circular header dependencies on ice.h
Several headers in the ice driver include ice.h even though they are
themselves included by that header. The most notable of these is
ice_common.h, but several other headers also do this.

Such a recursive inclusion is problematic as it forces headers to be
included in a strict order, otherwise compilation errors can result. The
circular inclusions do not trigger an endless loop due to standard
header inclusion guards, however other errors can occur.

For example, ice_flow.h defines ice_rss_hash_cfg, which is used by
ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx.

ice_flow.h includes ice_acl.h, which includes ice_common.h, and which
finally includes ice.h. Since ice.h itself includes ice_sriov.h, this
creates a circular dependency.

The definition in ice_sriov.h requires things from ice_flow.h, but
ice_flow.h itself will lead to trying to load ice_sriov.h as part of its
process for expanding ice.h. The current code avoids this issue by
having an implicit dependency without the include of ice_flow.h.

If we were to fix that so that ice_sriov.h explicitly depends on
ice_flow.h the following pattern would occur:

  ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h

At this point, during the expansion of, the header guard for ice_flow.h
is already set, so when ice_sriov.h attempts to load the ice_flow.h
header it is skipped. Then, we go on to begin including the rest of
ice_sriov.h, including structure definitions which depend on
ice_rss_hash_cfg. This produces a compiler warning because
ice_rss_hash_cfg hasn't yet been included. Remember, we're just at the
start of ice_flow.h!

If the order of headers is incorrect (ice_flow.h is not implicitly
loaded first in all files which include ice_sriov.h) then we get the
same failure.

Removing this recursive inclusion requires fixing a few cases where some
headers depended on the header inclusions from ice.h. In addition, a few
other changes are also required.

Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h,
which is the likely reason that ice_common.h includes ice.h at all. This
macro implementation requires the full definition of ice_pf in order to
properly compile.

Fix this by moving it to a function declared in ice_main.c, so that we
do not require all files to depend on the layout of the ice_pf
structure.

Note that this change only fixes circular dependencies, but it does not
fully resolve all implicit dependencies where one header may depend on
the inclusion of another. I tried to fix as many of the implicit
dependencies as I noticed, but fixing them all requires a somewhat
tedious analysis of each header and attempting to compile it separately.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-14 17:22:58 -07:00
Jakub Kicinski
4c7d2e1795 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-03-09

This series contains updates to ice driver only.

Martyna implements switchdev filtering on inner EtherType field for
tunnels.

Marcin adds reporting of slowpath statistics for port representors.

Jonathan Toppins changes a non-fatal link error message from warning to
debug.

Maciej removes unnecessary checks in ice_clean_tx_irq().

Amritha adds support for ADQ to match outer destination MAC for tunnels.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Add support for outer dest MAC for ADQ tunnels
  ice: avoid XDP checks in ice_clean_tx_irq()
  ice: change "can't set link" message to dbg level
  ice: Add slow path offload stats on port representor in switchdev
  ice: Add support for inner etype in switchdev
====================

Link: https://lore.kernel.org/r/20220309190315.1380414-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 20:20:13 -08:00
Jakub Kicinski
1e8a3f0d2a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/dsa/dsa2.c
  commit afb3cc1a39 ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails")
  commit e83d565378 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master")
https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/

drivers/net/ethernet/intel/ice/ice.h
  commit 97b0129146 ("ice: Fix error with handling of bonding MTU")
  commit 43113ff734 ("ice: add TTY for GNSS module for E810T device")
https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/

drivers/staging/gdm724x/gdm_lte.c
  commit fc7f750dc9 ("staging: gdm724x: fix use after free in gdm_lte_rx()")
  commit 4bcc4249b4 ("staging: Use netif_rx().")
https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 17:16:56 -08:00