The i40e device has a different clock rate depending on the current link
speed. This requires using a different increment rate for the PTP clock
registers. For slower link speeds, the base increment value is larger.
Directly multiplying the larger increment value by the parts per billion
adjustment might overflow.
To avoid this, the i40e implementation defaults to using the lower
increment value and then multiplying the adjustment afterwards. This causes
a loss of precision for lower link speeds.
We can fix this by using mul_u64_u64_div_u64 instead of performing the
multiplications using standard C operations. On X86, this will use special
instructions that perform the multiplication and division with 128bit
intermediate values. For other architectures, the fallback implementation
will limit the loss of precision for large values. Small adjustments don't
overflow anyways and won't lose precision at all.
This allows first multiplying the base increment value and then performing
the adjustment calculation, since we no longer fear overflowing. It also
makes it easier to convert to the even more precise .adjfine implementation
in a following change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the inability to bring an interface up on a setup with
only MSI interrupts enabled (no MSI-X).
Solution is to add a default number of QPs = 1. This is enough,
since without MSI-X support driver enables only a basic feature set.
Fixes: bc6d33c8d9 ("i40e: Fix the number of queues available to be mapped for use")
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220722175401.112572-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Refactor bitwise checks for whether TC MQPRIO is enabled
into one single method for improved readability.
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix an issue when driver incorrectly detects state
of recovery process and erroneously reinitializes interrupts,
which results in a kernel error and call trace message.
The issue was caused by a combination of two factors:
1. Assuming the EMP reset issued after completing
firmware recovery means the whole recovery process is complete.
2. Erroneous reinitialization of interrupt vector after detecting
the above mentioned EMP reset.
Fixes (1) by changing how recovery state change is detected
and (2) by adjusting the conditional expression to ensure using proper
interrupt reinitialization method, depending on the situation.
Fixes: 4ff0ee1af0 ("i40e: Introduce recovery mode support")
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220715214542.2968762-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clear VF MAC from parent PF and remove VF filter from VSI when both
conditions are true:
-VIRTCHNL_VF_OFFLOAD_USO is not used
-VM MAC was not set from PF level
It affects older version of IAVF and it allow them to change MAC
Address on VM, newer IAVF won't change their behaviour.
Previously it wasn't possible to change VF's MAC Address on VM
because there is flag on IAVF driver that won't allow to
change MAC Address if this address is given from PF driver.
Fixes: 155f0ac2c9 ("iavf: allow permanent MAC address to change")
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dropped packets caused by too large frames were not included in
dropped RX packets statistics.
Issue was caused by not reading the GL_RXERR1 register. That register
stores count of packet which was have been dropped due to too large
size.
Fix it by reading GL_RXERR1 register for each interface.
Repro steps:
Send a packet larger than the set MTU to SUT
Observe rx statists: ethtool -S <interface> | grep rx | grep -v ": 0"
Fixes: 41a9e55c89 ("i40e: add missing VSI statistics")
Signed-off-by: Lukasz Cieplicki <lukaszx.cieplicki@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
As found by the compile option -Wunused-macros, remove these macros
that are never used by the code.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Similar to how it's done in the ice driver since 'eb087cd82864 ("ice:
propagate xdp_ring onto rx_ring")', read the XDP program once per NAPI
instead of once per descriptor cleaned. I measured an improvement in
throughput of 2% for the AF_XDP xdpsock l2fwd benchmark for zero copy mode
and 1% for copy mode.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Link: https://lore.kernel.org/r/20220623100852.7867-1-ciara.loftus@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dev_kfree_skb check if the input parameter NULL and do the right
thing, there is no need to check again.
This change is to cleanup the code a bit.
Signed-off-by: Bernard Zhao <zhaojunkui2008@126.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add ability to change speed through ethtool -s <interface> speed <speed in Mb>
Driver advertises all link modes that support requested speed.
Autoneg must be set off e.g.:
ethtool -s <interface> autoneg off speed <speed in Mb>.
Add helper function that translate speed in Mb to
enum i40e_aq_link_speed and compare it to supported speeds from
given ethtool_link_ksettings. Add in i40e_set_link_ksettings hold
for requested speed and set copy_ks.base.speed to safe_ks.base.speed
to be sure that user is not changing unsupported setting.
In i40e_speed_to_link_speed compare requested speed with supported
speeds. Set speed to requested speed.
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add the capability to map non-linear xdp frames in XDP_TX and ndo_xdp_xmit
callback.
Tested-by: Sarkar Tirthendu <tirthendu.sarkar@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After PF reset and ethtool -t there was call trace in dmesg
sometimes leading to panic. When there was some time, around 5
seconds, between reset and test there were no errors.
Problem was that pf reset calls i40e_vsi_close in prep_for_reset
and ethtool -t calls i40e_vsi_close in diag_test. If there was not
enough time between those commands the second i40e_vsi_close starts
before previous i40e_vsi_close was done which leads to crash.
Add check to diag_test if pf is in reset and don't start offline
tests if it is true.
Add netif_info("testing failed") into unhappy path of i40e_diag_test()
Fixes: e17bc411ae ("i40e: Disable offline diagnostics if VFs are enabled")
Fixes: 510efb2682 ("i40e: Fix ethtool offline diagnostic with netqueues")
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If ADQ is enabled for a VF, then actual number of queue pair
is a number of currently available traffic classes for this VF.
Without this change the configuration of the Rx/Tx queues
fails with error.
Fixes: d29e0d233e ("i40e: missing input validation on VF message handling by the PF")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Procedure of configure tc flower filters erroneously allows to create
filters on TC0 where unfiltered packets are also directed by default.
Issue was caused by insufficient checks of hw_tc parameter specifying
the hardware traffic class to pass matching packets to.
Fix checking hw_tc parameter which blocks creation of filters on TC0.
Fixes: 2f4b411a3d ("i40e: Enable cloud filters via tc-flower")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
VFs by default are able to see all tagged traffic regardless of trust
and VLAN filters configured.
Add new private flag vf-vlan-pruning that allows changing of default
VF behavior for tagged traffic. When the flag is turned on
untrusted VF will only be able to receive untagged traffic
or traffic with VLAN tags it has created interfaces for
The flag is off by default and can only be changed if
there are no VFs spawned on the PF. This flag will only be effective
when no PVID is set on VF and VF is not trusted.
Add new function that computes the correct VLAN ID for VF VLAN filters
based on trust, PVID, vf-vlan-prune-disable flag and current VLAN ID.
Testing Hints:
Test 1: vf-vlan-pruning == off
==============================
1. Set the private flag
> ethtool --set-priv-flag eth0 vf-vlan-pruning off (default setting)
2. Use scapy to send any VLAN tagged traffic and make sure the VF
receives all VLAN tagged traffic that matches its destination MAC
filters (unicast, multicast, and broadcast).
Test 2: vf-vlan-pruning == on
==============================
1. Set the private flag
> ethtool --set-priv-flag eth0 vf-vlan-pruning on
2. Use scapy to send any VLAN tagged traffic and make sure the VF does
not receive any VLAN tagged traffic that matches its destination MAC
filters (unicast, multicast, and broadcast).
3. Add a VLAN filter on the VF netdev
> ip link add link eth0v0 name vlan10 type vlan id 10
4. Bring the VLAN netdev up
> ip link set vlan10 up
4. Use scapy to send traffic with VLAN 10, VLAN 11 (anything not VLAN
10), and untagged traffic. Make sure the VF only receives VLAN 10
and untagged traffic when the link partner is sending.
Test 3: vf-vlan-pruning == off && VF is in a port VLAN
==============================
1. Set the private flag
> ethtool --set-priv-flag eth0 vf-vlan-pruning off (default setting)
2. Create a VF
> echo 1 > sriov_numvfs
3. Put the VF in a port VLAN
> ip link set eth0 vf 0 vlan 10
4. Use scapy to send traffic with VLAN 10 and VLAN 11 (anything not VLAN
10) and make sure the VF only receives untagged traffic when the link
partner is sending VLAN 10 tagged traffic as the VLAN tag is expected
to be stripped by HW for port VLANs and not visible to the VF.
Test 4: Change vf-vlan-pruning while VFs are created
==============================
echo 0 > sriov_numvfs
ethtool --set-priv-flag eth0 vf-vlan-pruning off
echo 1 > sriov_numvfs
ethtool --set-priv-flag eth0 vf-vlan-pruning on (expect failure)
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The bug is here:
ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err);
The list iterator 'ch' will point to a bogus position containing
HEAD if the list is empty or no element is found. This case must
be checked before any use of the iterator, otherwise it will
lead to a invalid memory access.
To fix this bug, use a new variable 'iter' as the list iterator,
while use the origin variable 'ch' as a dedicated pointer to
point to the found element.
Cc: stable@vger.kernel.org
Fixes: 1d8d80b4e4 ("i40e: Add macvlan support on i40e")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220510204846.2166999-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-04-27
We've added 85 non-merge commits during the last 18 day(s) which contain
a total of 163 files changed, 4499 insertions(+), 1521 deletions(-).
The main changes are:
1) Teach libbpf to enhance BPF verifier log with human-readable and relevant
information about failed CO-RE relocations, from Andrii Nakryiko.
2) Add typed pointer support in BPF maps and enable it for unreferenced pointers
(via probe read) and referenced ones that can be passed to in-kernel helpers,
from Kumar Kartikeya Dwivedi.
3) Improve xsk to break NAPI loop when rx queue gets full to allow for forward
progress to consume descriptors, from Maciej Fijalkowski & Björn Töpel.
4) Fix a small RCU read-side race in BPF_PROG_RUN routines which dereferenced
the effective prog array before the rcu_read_lock, from Stanislav Fomichev.
5) Implement BPF atomic operations for RV64 JIT, and add libbpf parsing logic
for USDT arguments under riscv{32,64}, from Pu Lehui.
6) Implement libbpf parsing of USDT arguments under aarch64, from Alan Maguire.
7) Enable bpftool build for musl and remove nftw with FTW_ACTIONRETVAL usage
so it can be shipped under Alpine which is musl-based, from Dominique Martinet.
8) Clean up {sk,task,inode} local storage trace RCU handling as they do not
need to use call_rcu_tasks_trace() barrier, from KP Singh.
9) Improve libbpf API documentation and fix error return handling of various
API functions, from Grant Seltzer.
10) Enlarge offset check for bpf_skb_{load,store}_bytes() helpers given data
length of frags + frag_list may surpass old offset limit, from Liu Jian.
11) Various improvements to prog_tests in area of logging, test execution
and by-name subtest selection, from Mykola Lysenko.
12) Simplify map_btf_id generation for all map types by moving this process
to build time with help of resolve_btfids infra, from Menglong Dong.
13) Fix a libbpf bug in probing when falling back to legacy bpf_probe_read*()
helpers; the probing caused always to use old helpers, from Runqing Yang.
14) Add support for ARCompact and ARCv2 platforms for libbpf's PT_REGS
tracing macros, from Vladimir Isaev.
15) Cleanup BPF selftests to remove old & unneeded rlimit code given kernel
switched to memcg-based memory accouting a while ago, from Yafang Shao.
16) Refactor of BPF sysctl handlers to move them to BPF core, from Yan Zhu.
17) Fix BPF selftests in two occasions to work around regressions caused by latest
LLVM to unblock CI until their fixes are worked out, from Yonghong Song.
18) Misc cleanups all over the place, from various others.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add libbpf's log fixup logic selftests
libbpf: Fix up verifier log for unguarded failed CO-RE relos
libbpf: Simplify bpf_core_parse_spec() signature
libbpf: Refactor CO-RE relo human description formatting routine
libbpf: Record subprog-resolved CO-RE relocations unconditionally
selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftests
libbpf: Avoid joining .BTF.ext data with BPF programs by section name
libbpf: Fix logic for finding matching program for CO-RE relocation
libbpf: Drop unhelpful "program too large" guess
libbpf: Fix anonymous type check in CO-RE logic
bpf: Compute map_btf_id during build time
selftests/bpf: Add test for strict BTF type check
selftests/bpf: Add verifier tests for kptr
selftests/bpf: Add C tests for kptr
libbpf: Add kptr type tag macros to bpf_helpers.h
bpf: Make BTF type match stricter for release arguments
bpf: Teach verifier about kptr_get kfunc helpers
bpf: Wire up freeing of referenced kptr
bpf: Populate pairs of btf_id and destructor kfunc in btf
bpf: Adapt copy_map_value for multiple offset case
...
====================
Link: https://lore.kernel.org/r/20220427224758.20976-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Intel drivers translate actions returned from XDP programs to their own
return codes that have the following mapping:
XDP_REDIRECT -> I40E_XDP_{REDIR,CONSUMED}
XDP_TX -> I40E_XDP_{TX,CONSUMED}
XDP_DROP -> I40E_XDP_CONSUMED
XDP_ABORTED -> I40E_XDP_CONSUMED
XDP_PASS -> I40E_XDP_PASS
Commit b8aef650e5 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx
queue gets full") introduced new translation
XDP_REDIRECT -> I40E_XDP_EXIT
which is set when XSK RQ gets full and to indicate that driver should
stop further Rx processing. This happens for unsuccessful
xdp_do_redirect() so it is valuable to call trace_xdp_exception() for
this case. In order to avoid I40E_XDP_EXIT -> IXGBE_XDP_CONSUMED
overwrite, XDP_DROP case was moved above which in turn made the
'fallthrough' that is in XDP_ABORTED useless as it became the last label
in the switch statement.
Simply drop this leftover.
Fixes: b8aef650e5 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220421132126.471515-3-maciej.fijalkowski@intel.com
Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO
return code as the case that XSK is not in the bound state. Returning
same code from ndo_xsk_wakeup can be misleading and simply makes it
harder to follow what is going on.
Change ENXIOs in i40e's ndo_xsk_wakeup() implementation to EINVALs, so
that when probing it is clear that something is wrong on the driver
side, not the xsk_{recv,send}msg.
There is a -ENETDOWN that can happen from both kernel/driver sides
though, but I don't have a correct replacement for this on one of the
sides, so let's keep it that way.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-10-maciej.fijalkowski@intel.com
When XSK pool uses need_wakeup feature, correlate -ENOBUFS that was
returned from xdp_do_redirect() with a XSK Rx queue being full. In such
case, terminate the Rx processing that is being done on the current HW
Rx ring and let the user space consume descriptors from XSK Rx queue so
that there is room that driver can use later on.
Introduce new internal return code I40E_XDP_EXIT that will indicate case
described above.
Note that it does not affect Tx processing that is bound to the same
NAPI context, nor the other Rx rings.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-7-maciej.fijalkowski@intel.com
Add support for Ethernet Connection X722 for 10GbE SFP+ cards.
Make possible for the driver to bind to the card.
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add vsi.tx_restart to the i40e driver ethtool statistics
Signed-off-by: Nabil S. Alramli <dev@nalramli.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Track TX queue stop events and export the new stat with ethtool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The calculation of the checksum can fail.
So move converting the checksum to little endian
to inside the return status check.
Signed-off-by: Tom Rix <trix@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The i40e_vc_send_msg_to_vf_ex (and its wrapper i40e_vc_send_msg_to_vf)
function has logic to detect "failure" responses sent to the VF. If a VF
is sent more than I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED, then the VF is
marked as disabled. In either case, a dev_info message is printed
stating that a VF opcode failed.
This logic originates from the early implementation of VF support in
commit 5c3c48ac6b ("i40e: implement virtual device interface").
That commit did not go far enough. The "logic" for this behavior seems
to be that error responses somehow indicate a malicious VF. This is not
really true. The PF might be sending an error for any number of reasons
such as lacking resources, an unsupported operation, etc. This does not
indicate a malicious VF. We already have a separate robust malicious VF
detection which relies on hardware logic to detect and prevent a variety
of behaviors.
There is no justification for this behavior in the original
implementation. In fact, a later commit 18b7af57d9 ("i40e: Lower some
message levels") reduced the opcode failure message from a dev_err to a
dev_info. In addition, recent commit 01cbf50877 ("i40e: Fix to not
show opcode msg on unsuccessful VF MAC change") changed the logic to
allow quieting it for expected failures.
That commit prevented this logic from kicking in for specific
circumstances. This change did not go far enough. The behavior is not
documented nor is it part of any requirement for our products. Other
operating systems such as the FreeBSD implementation of our driver do
not include this logic.
It is clear this check does not make sense, and causes problems which
led to ugly workarounds.
Fix this by just removing the entire logic and the need for the
i40e_vc_send_msg_to_vf_ex function.
Fixes: 01cbf50877 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change")
Fixes: 5c3c48ac6b ("i40e: implement virtual device interface")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Revert of a patch that instead of fixing a AQ error when trying
to reset BW limit introduced several regressions related to
creation and managing TC. Currently there are errors when creating
a TC on both PF and VF.
Error log:
[17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391
[17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783259] i40e 0000:3b:00.1: Unable to configure TC map 0 for VSI 391
This reverts commit 3d2504663c.
Fixes: 3d2504663c (i40e: Fix reset bw limit when DCB enabled with 1 TC)
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220223175347.1690692-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The 'if (ntu == rx_ring->count)' block in i40e_alloc_rx_buffers_zc()
was previously residing in the loop, but after introducing the
batched interface it is used only to wrap-around the NTU descriptor,
thus no more need to assign 'xdp'.
'cleaned_count' in i40e_clean_rx_irq_zc() was previously being
incremented in the loop, but after commit f12738b6ec
("i40e: remove unnecessary cleaned_count updates") it gets
assigned only once after it, so the initialization can be dropped.
Fixes: 6aab0bb0c5 ("i40e: Use the xsk batched rx allocation interface")
Fixes: f12738b6ec ("i40e: remove unnecessary cleaned_count updates")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-02-09
We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).
The main changes are:
1) Add custom BPF allocator for JITs that pack multiple programs into a huge
page to reduce iTLB pressure, from Song Liu.
2) Add __user tagging support in vmlinux BTF and utilize it from BPF
verifier when generating loads, from Yonghong Song.
3) Add per-socket fast path check guarding from cgroup/BPF overhead when
used by only some sockets, from Pavel Begunkov.
4) Continued libbpf deprecation work of APIs/features and removal of their
usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
and various others.
5) Improve BPF instruction set documentation by adding byte swap
instructions and cleaning up load/store section, from Christoph Hellwig.
6) Switch BPF preload infra to light skeleton and remove libbpf dependency
from it, from Alexei Starovoitov.
7) Fix architecture-agnostic macros in libbpf for accessing syscall
arguments from BPF progs for non-x86 architectures,
from Ilya Leoshkevich.
8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
of 16-bit field with anonymous zero padding, from Jakub Sitnicki.
9) Add new bpf_copy_from_user_task() helper to read memory from a different
task than current. Add ability to create sleepable BPF iterator progs,
from Kenny Yu.
10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.
11) Generate temporary netns names for BPF selftests to avoid naming
collisions, from Hangbin Liu.
12) Implement bpf_core_types_are_compat() with limited recursion for
in-kernel usage, from Matteo Croce.
13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.
14) Misc minor fixes to libbpf and selftests from various folks.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
libbpf: Fix compilation warning due to mismatched printf format
selftests/bpf: Test BPF_KPROBE_SYSCALL macro
libbpf: Add BPF_KPROBE_SYSCALL macro
libbpf: Fix accessing the first syscall argument on s390
libbpf: Fix accessing the first syscall argument on arm64
libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
libbpf: Fix accessing syscall arguments on riscv
libbpf: Fix riscv register names
libbpf: Fix accessing syscall arguments on powerpc
selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
libbpf: Add PT_REGS_SYSCALL_REGS macro
selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
bpf: Fix leftover header->pages in sparc and powerpc code.
libbpf: Fix signedness bug in btf_dump_array_data()
selftests/bpf: Do not export subtest as standalone test
bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
...
====================
Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In some cases, pages cannot be reused by i40e because the page is busy. Add
a counter for this event.
Busy page count is accessible via ethtool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In some cases, pages can not be reused because they are not associated with
the correct NUMA zone. Knowing how often pages are waived helps users to
understand the interaction between the driver's memory usage and their
system.
Pass rx_stats through to i40e_can_reuse_rx_page to allow tracking when
pages are waived.
The page waive count is accessible via ethtool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a counter for new page allocations in the i40e RX path. This stat is
accessible with ethtool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
rx page reuse was already being tracked by the i40e driver per RX ring.
Aggregate the counts and make them accessible via ethtool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Page reuse was being tracked from two locations:
- i40e_reuse_rx_page (via 40e_clean_rx_irq), and
- i40e_alloc_mapped_page
Remove the double count and only count reuse from i40e_alloc_mapped_page
when the page is about to be reused.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2022-02-03
This series contains updates to the i40e client header file and driver.
Mateusz disables HW TC offload by default.
Joe Damato removes a no longer used statistic.
Jakub Kicinski removes an unused enum from the client header file.
Jedrzej changes some admin queue commands to occur under atomic context
and adds new functions for admin queue MAC VLAN filters to avoid a
potential race that could occur due storing results in a structure that
could be overwritten by the next admin queue call.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There was a race condition in access to hw->aq.asq_last_status
while adding and deleting MAC/VLAN filters causing
incorrect error status to be printed as ERROR OK instead of
the correct error.
Change calls to i40e_aq_add_macvlan in i40e_aqc_add_filters
and i40e_aq_remove_macvlan in i40e_aqc_del_filters
to _v2 versions that return Admin Queue status on the stack
to avoid race conditions in access to hw->aq.asq_last_status.
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ASQ send command functions are returning only i40e status codes
yet some calling functions also need Admin Queue status
that is stored in hw->aq.asq_last_status. Since hw object
is stored on a heap it introduces a possibility for
a race condition in access to hw if calling function is not
fast enough to read hw->aq.asq_last_status before next
send ASQ command is executed.
Add new _v2 version of i40e_aq_add_macvlan that is using
new _v2 versions of ASQ send command functions and returns
the Admin Queue status on the stack.
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ASQ send command functions are returning only i40e status codes
yet some calling functions also need Admin Queue status
that is stored in hw->aq.asq_last_status. Since hw object
is stored on a heap it introduces a possibility for
a race condition in access to hw if calling function is not
fast enough to read hw->aq.asq_last_status before next
send ASQ command is executed.
Add new versions of send ASQ command functions that return
Admin Queue status on the stack to avoid race conditions
in access to hw->aq.asq_last_status.
Add new _v2 version of i40e_aq_remove_macvlan that is using
new _v2 versions of ASQ send command functions and returns
the Admin Queue status on the stack.
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Change functions:
- i40e_aq_add_macvlan
- i40e_aq_remove_macvlan
- i40e_aq_delete_element
- i40e_aq_add_vsi
- i40e_aq_update_vsi_params
to explicitly use i40e_asq_send_command_atomic(..., true)
instead of i40e_asq_send_command, as they use mutexes and do some
work in an atomic context.
Without this change setting vlan via netdev will fail with
call trace cased by bug "BUG: scheduling while atomic".
Signed-off-by: Witold Fijalkowski <witoldx.fijalkowski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After commit 1a557afc4d ("i40e: Refactor receive routine"),
rx_stats.realloc_count is no longer being incremented, so remove it.
The debugfs string was left, but hardcoded to 0. This is intended to
prevent breaking any existing code / scripts that are parsing debugfs
for i40e.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After loading driver hw-tc-offload is enabled by default.
Change the behaviour of driver to disable hw-tc-offload by default as
this is the expected state. Additionally since this impacts ntuple
feature state change the way of checking NETIF_F_HW_TC flag.
Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>