1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

376 commits

Author SHA1 Message Date
David Arinzon
1cc0a47daa net: ena: Change initial rx_usec interval
For the purpose of obtaining better CPU utilization,
minimum rx moderation interval is set to 20 usec.

Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-6-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:04 -07:00
David Arinzon
97776caf6c net: ena: Changes around strscpy calls
strscpy copies as much of the string as possible,
meaning that the destination string will be truncated
in case of no space. As this is a non-critical error in
our case, adding a debug level print for indication.

This patch also removes a -1 which was added to ensure
enough space for NUL, but strscpy destination string is
guaranteed to be NUL-terminted, therefore, the -1 is
not needed.

Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-5-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:04 -07:00
David Arinzon
b37b98a3a0 net: ena: Add validation for completion descriptors consistency
Validate that `first` flag is set only for the first
descriptor in multi-buffer packets.
In case of an invalid descriptor, a reset will occur.
A new reset reason for RX data corruption has been added.

Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-4-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:04 -07:00
David Arinzon
48673ef444 net: ena: Reduce holes in ena_com structures
This patch makes two changes in order to fill holes and
reduce ther overall size of the structures ena_com_dev
and ena_com_rx_ctx.

Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-3-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:04 -07:00
David Arinzon
62a261f6c1 net: ena: Add a counter for driver's reset failures
This patch adds a counter to the ena_adapter struct in
order to keep track of reset failures.
The counter is incremented every time either ena_restore_device()
or ena_destroy_device() fail.

Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13 14:42:04 -07:00
Eric Dumazet
1eb2cded45 net: annotate writes on dev->mtu from ndo_change_mtu()
Simon reported that ndo_change_mtu() methods were never
updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted
in commit 501a90c945 ("inet: protect against too small
mtu values.")

We read dev->mtu without holding RTNL in many places,
with READ_ONCE() annotations.

It is time to take care of ndo_change_mtu() methods
to use corresponding WRITE_ONCE()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 16:19:14 -07:00
David Arinzon
36a1ca01f0 net: ena: Set tx_info->xdpf value to NULL
The patch mentioned in the `Fixes` tag removed the explicit assignment
of tx_info->xdpf to NULL with the justification that there's no need
to set tx_info->xdpf to NULL and tx_info->num_of_bufs to 0 in case
of a mapping error. Both values won't be used once the mapping function
returns an error, and their values would be overridden by the next
transmitted packet.

While both values do indeed get overridden in the next transmission
call, the value of tx_info->xdpf is also used to check whether a TX
descriptor's transmission has been completed (i.e. a completion for it
was polled).

An example scenario:
1. Mapping failed, tx_info->xdpf wasn't set to NULL
2. A VF reset occurred leading to IO resource destruction and
   a call to ena_free_tx_bufs() function
3. Although the descriptor whose mapping failed was freed by the
   transmission function, it still passes the check
     if (!tx_info->skb)

   (skb and xdp_frame are in a union)
4. The xdp_frame associated with the descriptor is freed twice

This patch returns the assignment of NULL to tx_info->xdpf to make the
cleaning function knows that the descriptor is already freed.

Fixes: 504fd6a539 ("net: ena: fix DMA mapping function issues in XDP")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-11 11:21:02 +02:00
David Arinzon
bf02d9fe00 net: ena: Fix incorrect descriptor free behavior
ENA has two types of TX queues:
- queues which only process TX packets arriving from the network stack
- queues which only process TX packets forwarded to it by XDP_REDIRECT
  or XDP_TX instructions

The ena_free_tx_bufs() cycles through all descriptors in a TX queue
and unmaps + frees every descriptor that hasn't been acknowledged yet
by the device (uncompleted TX transactions).
The function assumes that the processed TX queue is necessarily from
the first category listed above and ends up using napi_consume_skb()
for descriptors belonging to an XDP specific queue.

This patch solves a bug in which, in case of a VF reset, the
descriptors aren't freed correctly, leading to crashes.

Fixes: 548c4940b9 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-11 11:21:02 +02:00
David Arinzon
f7e4171806 net: ena: Wrong missing IO completions check order
Missing IO completions check is called every second (HZ jiffies).
This commit fixes several issues with this check:

1. Duplicate queues check:
   Max of 4 queues are scanned on each check due to monitor budget.
   Once reaching the budget, this check exits under the assumption that
   the next check will continue to scan the remainder of the queues,
   but in practice, next check will first scan the last already scanned
   queue which is not necessary and may cause the full queue scan to
   last a couple of seconds longer.
   The fix is to start every check with the next queue to scan.
   For example, on 8 IO queues:
   Bug: [0,1,2,3], [3,4,5,6], [6,7]
   Fix: [0,1,2,3], [4,5,6,7]

2. Unbalanced queues check:
   In case the number of active IO queues is not a multiple of budget,
   there will be checks which don't utilize the full budget
   because the full scan exits when reaching the last queue id.
   The fix is to run every TX completion check with exact queue budget
   regardless of the queue id.
   For example, on 7 IO queues:
   Bug: [0,1,2,3], [4,5,6], [0,1,2,3]
   Fix: [0,1,2,3], [4,5,6,0], [1,2,3,4]
   The budget may be lowered in case the number of IO queues is less
   than the budget (4) to make sure there are no duplicate queues on
   the same check.
   For example, on 3 IO queues:
   Bug: [0,1,2,0], [1,2,0,1]
   Fix: [0,1,2], [0,1,2]

Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-11 11:21:01 +02:00
David Arinzon
713a85195a net: ena: Fix potential sign extension issue
Small unsigned types are promoted to larger signed types in
the case of multiplication, the result of which may overflow.
In case the result of such a multiplication has its MSB
turned on, it will be sign extended with '1's.
This changes the multiplication result.

Code example of the phenomenon:
-------------------------------
u16 x, y;
size_t z1, z2;

x = y = 0xffff;
printk("x=%x y=%x\n",x,y);

z1 = x*y;
z2 = (size_t)x*y;

printk("z1=%lx z2=%lx\n", z1, z2);

Output:
-------
x=ffff y=ffff
z1=fffffffffffe0001 z2=fffe0001

The expected result of ffff*ffff is fffe0001, and without the
explicit casting to avoid the unwanted sign extension we got
fffffffffffe0001.

This commit adds an explicit casting to avoid the sign extension
issue.

Fixes: 689b2bdaaa ("net: ena: add functions for handling Low Latency Queues in ena_com")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-11 11:21:01 +02:00
Kamal Heib
78e886ba2b net: ena: Remove ena_select_queue
Avoid the following warnings by removing the ena_select_queue() function
and rely on the net core to do the queue selection, The issue happen
when an skb received from an interface with more queues than ena is
forwarded to the ena interface.

[ 1176.159959] eth0 selects TX queue 11, but real number of TX queues is 8
[ 1176.863976] eth0 selects TX queue 14, but real number of TX queues is 8
[ 1180.767877] eth0 selects TX queue 14, but real number of TX queues is 8
[ 1188.703742] eth0 selects TX queue 14, but real number of TX queues is 8

Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 19:47:45 +00:00
Kamal Heib
e8d8acad5a net: ena: Remove unlikely() from IS_ERR() condition
IS_ERR() is already using unlikely internally.

Signed-off-by: Kamal Heib <kheib@redhat.com>
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Link: https://lore.kernel.org/r/20240213161502.2297048-1-kheib@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 13:24:21 +01:00
Kamal Heib
723615a14b net: ena: Remove redundant assignment
There is no point in initializing an ndo to NULL, therefore the
assignment is redundant and can be removed.

Signed-off-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 10:57:25 +00:00
David Arinzon
50613650c3 net: ena: Reduce lines with longer column width boundary
This patch reduces some of the lines by removing newlines
where more variables or print strings can be pushed back
to the previous line while still adhering to the styling
guidelines.

Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 13:22:12 +01:00
David Arinzon
4b4012da28 net: ena: handle ena_calc_io_queue_size() possible errors
Fail queue size calculation when the device returns maximum
TX/RX queue sizes that are smaller than the allowed minimum.

Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 13:22:12 +01:00
David Arinzon
716bdaecea net: ena: Change default print level for netif_ prints
The netif_* functions are used by the driver to log events into the
kernel ring (dmesg) similar to the netdev_* ones. Unlike the latter,
the netif_* function family allow the user to choose what events get
logged using ethtool:
	sudo ethtool -s [interface] msglvl [msg_type] on

By default the events which get logged are slow-path related and aren't
printed often (e.g. interface up related prints). This patch removes the
NETIF_MSG_TX_DONE type (called every TX completion polling) from the
defaults and adds NETIF_MSG_IFDOWN instead as it makes more sensible
defaults.

This patch also transforms ena_down() print from netif_info into
netif_dbg (same as the analogue print in ena_up()) as it suits it
better.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 13:22:12 +01:00
David Arinzon
70c9360390 net: ena: Relocate skb_tx_timestamp() to improve time stamping accuracy
Move skb_tx_timestamp() closer to the actual time the driver sends the
packets to the device.

Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 13:22:12 +01:00
David Arinzon
071271f39c net: ena: Add more information on TX timeouts
The function responsible for polling TX completions might not receive
the CPU resources it needs due to higher priority tasks running on the
requested core.

The driver might not be able to recognize such cases, but it can use its
state to suspect that they happened. If both conditions are met:

- napi hasn't been executed more than the TX completion timeout value
- napi is scheduled (meaning that we've received an interrupt)

Then it's more likely that the napi handler isn't scheduled because of
an overloaded CPU.
It was decided that for this case, the driver would wait twice as long
as the regular timeout before scheduling a reset.
The driver uses ENA_REGS_RESET_SUSPECTED_POLL_STARVATION reset reason to
indicate this case to the device.

This patch also adds more information to the ena_tx_timeout() callback.
This function is called by the kernel when it detects that a specific TX
queue has been closed for too long.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 13:22:12 +01:00
David Arinzon
ae82209293 net: ena: Change error print during ena_device_init()
The print was re-worded to a more informative one.

Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 13:22:12 +01:00
David Arinzon
06a96fe6f9 net: ena: Remove CQ tail pointer update
The functionality was added to allow the drivers to create an
SQ and CQ of different sizes.
When the RX/TX SQ and CQ have the same size, such update isn't
necessary as the device can safely assume it doesn't override
unprocessed completions. However, if the SQ is larger than the CQ,
the device might "have" more completions it wants to update about
than there's room in the CQ.

There's no support for different SQ and CQ sizes, therefore,
removing the API and its usage.

'____cacheline_aligned' compiler attribute was added to
'struct ena_com_io_cq' to ensure that the removal of the
'cq_head_db_reg' field doesn't change the cache-line layout
of this struct.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 13:22:12 +01:00
David Arinzon
50d7a26605 net: ena: Enable DIM by default
Dynamic Interrupt Moderation (DIM) is a technique
designed to balance the need for timely data processing
with the desire to minimize CPU overhead.
Instead of generating an interrupt for every received
packet, the system can dynamically adjust the rate at
which interrupts are generated based on the incoming
traffic patterns.

Enabling DIM by default to improve the user experience.

DIM can be turned on/off through ethtool:
`ethtool -C <interface> adaptive-rx <on/off>`

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 13:22:12 +01:00
David Arinzon
243f36eef5 net: ena: Minor cosmetic changes
A few changes for better readability and style

1. Adding / Removing newlines
2. Removing an unnecessary and confusing comment
3. Using an existing variable rather than re-checking a field

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 13:22:12 +01:00
David Arinzon
0def8a15da net: ena: Remove an unused field
Remove io_sq->header_addr field because it is no longer
in use.
LLQ was updated to support a bounce buffer so there is
no need in saving the header address of the sq.

Signed-off-by: Nati Koler <nkoler@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-01 13:22:11 +01:00
David Arinzon
782345d248 net: ena: Take xdp packets stats into account in ena_get_stats64()
Queue stats using ifconfig and ip are retrieved
via ena_get_stats64(). This function currently does not take
the xdp sent or dropped packets stats into account.

This commit adds the following xdp stats to ena_get_stats64():
tx bytes sent
tx packets sent
rx dropped packets

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-12-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:58 -08:00
David Arinzon
4f28e789be net: ena: Make queue stats code cleaner by removing the if block
Also shorten comment related to it.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-11-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:58 -08:00
David Arinzon
ea5c460023 net: ena: Always register RX queue info
The RX queue info contains information about the RX queue which might
be relevant to the kernel.

To avoid configuring this queue for different scenarios, this patch
moves the RX queue configuration to ena_up()/ena_down() function and
makes it configured every interface state toggle.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-10-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:58 -08:00
David Arinzon
2b02e332c1 net: ena: Add more debug prints to XDP related function
Used for better readability and debugging of XDP
flow.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-9-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:57 -08:00
David Arinzon
b626fd9627 net: ena: Refactor napi functions
This patch focuses on changes to the XDP part of the napi
polling routine.

1. Update the `napi_comp` stat only when napi is actually
   complete.
2. Simplify the code by using a function pointer to the right
   napi routine (XDP vs non-XDP path)
3. Remove unnecessary local variables.
4. Adjust a debug print to show the processed XDP frame index
   rather than the pointer.

Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-8-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:57 -08:00
David Arinzon
436c793585 net: ena: Don't check if XDP program is loaded in ena_xdp_execute()
This check is already done in ena_clean_rx_irq() which indirectly
calls it.
This function is called in napi context and the driver doesn't
allow to change the XDP program without performing destruction and
reinitialization of napi context (part of ena_down/ena_up sequence).

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-7-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:57 -08:00
David Arinzon
911a8c9601 net: ena: Use tx_ring instead of xdp_ring for XDP channel TX
When an XDP program is loaded the existing channels in the driver split
into two halves:
- The first half of the channels contain RX and TX rings, these queues
  are used for receiving traffic and sending packets originating from
  kernel.
- The second half of the channels contain only a TX ring. These queues
  are used for sending packets that were redirected using XDP_TX
  or XDP_REDIRECT.

Referring to the queues in the second half of the channels as "xdp_ring"
can be confusing and may give the impression that ENA has the capability
to generate an additional special queue.

This patch ensures that the xdp_ring field is exclusively used to
describe the XDP TX queue that a specific RX queue needs to utilize when
forwarding packets with XDP TX and XDP REDIRECT, preserving the
integrity of the xdp_ring field in ena_ring.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-6-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:57 -08:00
David Arinzon
23ec974980 net: ena: Introduce total_tx_size field in ena_tx_buffer struct
To avoid de-referencing skb or xdp_frame when we poll for TX completion
(where they might not be in the cache), save the total TX packet size in
the ena_tx_buffer object representing the packet.

Also the 'print_once' field's type was changed from u32 to u8 to allow
adding the 'total_tx_size' without changing the total size of the
struct.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-5-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:57 -08:00
David Arinzon
009b387659 net: ena: Put orthogonal fields in ena_tx_buffer in a union
The skb and xdpf pointers cannot be set together in the driver
(each TX descriptor can send either an SKB or an XDP frame), and so it
makes more sense to put them both in a union.

This decreases the overall size of the ena_tx_buffer struct which
improves cache locality.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-4-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:57 -08:00
David Arinzon
39a044f4dc net: ena: Pass ena_adapter instead of net_device to ena_xmit_common()
This change will enable the ability to use ena_xmit_common()
in functions that don't have a net_device pointer.
While it can be retrieved by dereferencing
ena_adapter (adapter->netdev), there's no reason to do it in
fast path code where this pointer is only needed for
debug prints.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-3-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:57 -08:00
David Arinzon
d000574d02 net: ena: Move XDP code to its new files
XDP system has a very large footprint in the driver's overall code.
makes the whole driver's code much harder to read.

Moving XDP code to dedicated files.

This patch doesn't make any changes to the code itself and only
cut-pastes the code into ena_xdp.c and ena_xdp.h files so the change
is purely cosmetic.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20240101190855.18739-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 18:00:56 -08:00
Jakub Kicinski
8f674972d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/intel/iavf/iavf_ethtool.c
  3a0b5a2929 ("iavf: Introduce new state machines for flow director")
  95260816b4 ("iavf: use iavf_schedule_aq_request() helper")
https://lore.kernel.org/all/84e12519-04dc-bd80-bc34-8cf50d7898ce@intel.com/

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  c13e268c07 ("bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic")
  c2f8063309 ("bnxt_en: Refactor RX VLAN acceleration logic.")
  a7445d6980 ("bnxt_en: Add support for new RX and TPA_START completion types for P7")
  1c7fd6ee2f ("bnxt_en: Rename some macros for the P5 chips")
https://lore.kernel.org/all/20231211110022.27926ad9@canb.auug.org.au/

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
  bd6781c18c ("bnxt_en: Fix wrong return value check in bnxt_close_nic()")
  84793a4995 ("bnxt_en: Skip nic close/open when configuring tstamp filters")
https://lore.kernel.org/all/20231214113041.3a0c003c@canb.auug.org.au/

drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
  3d7a3f2612 ("net/mlx5: Nack sync reset request when HotPlug is enabled")
  cecf44ea1a ("net/mlx5: Allow sync reset flow when BF MGT interface device is present")
https://lore.kernel.org/all/20231211110328.76c925af@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-14 17:14:41 -08:00
Ahmed Zaki
fb6e30a725 net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool ops
The get/set_rxfh ethtool ops currently takes the rxfh (RSS) parameters
as direct function arguments. This will force us to change the API (and
all drivers' functions) every time some new parameters are added.

This is part 1/2 of the fix, as suggested in [1]:

- First simplify the code by always providing a pointer to all params
   (indir, key and func); the fact that some of them may be NULL seems
   like a weird historic thing or a premature optimization.
   It will simplify the drivers if all pointers are always present.

 - Then make the functions take a dev pointer, and a pointer to a
   single struct wrapping all arguments. The set_* should also take
   an extack.

Link: https://lore.kernel.org/netdev/20231121152906.2dd5f487@kernel.org/ [1]
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-2-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13 22:07:16 -08:00
David Arinzon
4ab138ca0a net: ena: Fix XDP redirection error
When sending TX packets, the meta descriptor can be all zeroes
as no meta information is required (as in XDP).

This patch removes the validity check, as when
`disable_meta_caching` is enabled, such TX packets will be
dropped otherwise.

Fixes: 0e3a3f6dac ("net: ena: support new LLQ acceleration mode")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20231211062801.27891-5-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 16:07:31 -08:00
David Arinzon
d760117060 net: ena: Fix DMA syncing in XDP path when SWIOTLB is on
This patch fixes two issues:

Issue 1
-------
Description
```````````
Current code does not call dma_sync_single_for_cpu() to sync data from
the device side memory to the CPU side memory before the XDP code path
uses the CPU side data.
This causes the XDP code path to read the unset garbage data in the CPU
side memory, resulting in incorrect handling of the packet by XDP.

Solution
````````
1. Add a call to dma_sync_single_for_cpu() before the XDP code starts to
   use the data in the CPU side memory.
2. The XDP code verdict can be XDP_PASS, in which case there is a
   fallback to the non-XDP code, which also calls
   dma_sync_single_for_cpu().
   To avoid calling dma_sync_single_for_cpu() twice:
2.1. Put the dma_sync_single_for_cpu() in the code in such a place where
     it happens before XDP and non-XDP code.
2.2. Remove the calls to dma_sync_single_for_cpu() in the non-XDP code
     for the first buffer only (rx_copybreak and non-rx_copybreak
     cases), since the new call that was added covers these cases.
     The call to dma_sync_single_for_cpu() for the second buffer and on
     stays because only the first buffer is handled by the newly added
     dma_sync_single_for_cpu(). And there is no need for special
     handling of the second buffer and on for the XDP path since
     currently the driver supports only single buffer packets.

Issue 2
-------
Description
```````````
In case the XDP code forwarded the packet (ENA_XDP_FORWARDED),
ena_unmap_rx_buff_attrs() is called with attrs set to 0.
This means that before unmapping the buffer, the internal function
dma_unmap_page_attrs() will also call dma_sync_single_for_cpu() on
the whole buffer (not only on the data part of it).
This sync is both wasteful (since a sync was already explicitly
called before) and also causes a bug, which will be explained
using the below diagram.

The following diagram shows the flow of events causing the bug.
The order of events is (1)-(4) as shown in the diagram.

CPU side memory area

     (3)convert_to_xdp_frame() initializes the
        headroom with xdpf metadata
                      ||
                      \/
          ___________________________________
         |                                   |
 0       |                                   V                       4K
 ---------------------------------------------------------------------
 | xdpf->data      | other xdpf       |   < data >   | tailroom ||...|
 |                 | fields           |              | GARBAGE  ||   |
 ---------------------------------------------------------------------

                   /\                        /\
                   ||                        ||
   (4)ena_unmap_rx_buff_attrs() calls     (2)dma_sync_single_for_cpu()
      dma_sync_single_for_cpu() on the       copies data from device
      whole buffer page, overwriting         side to CPU side memory
      the xdpf->data with GARBAGE.           ||
 0                                                                   4K
 ---------------------------------------------------------------------
 | headroom                           |   < data >   | tailroom ||...|
 | GARBAGE                            |              | GARBAGE  ||   |
 ---------------------------------------------------------------------

Device side memory area                      /\
                                             ||
                               (1) device writes RX packet data

After the call to ena_unmap_rx_buff_attrs() in (4), the xdpf->data
becomes corrupted, and so when it is later accessed in
ena_clean_xdp_irq()->xdp_return_frame(), it causes a page fault,
crashing the kernel.

Solution
````````
Explicitly tell ena_unmap_rx_buff_attrs() not to call
dma_sync_single_for_cpu() by passing it the ENA_DMA_ATTR_SKIP_CPU_SYNC
flag.

Fixes: f7d625adeb ("net: ena: Add dynamic recycling mechanism for rx buffers")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20231211062801.27891-4-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 16:07:31 -08:00
David Arinzon
505b1a88d3 net: ena: Fix xdp drops handling due to multibuf packets
Current xdp code drops packets larger than ENA_XDP_MAX_MTU.
This is an incorrect condition since the problem is not the
size of the packet, rather the number of buffers it contains.

This commit:

1. Identifies and drops XDP multi-buffer packets at the
   beginning of the function.
2. Increases the xdp drop statistic when this drop occurs.
3. Adds a one-time print that such drops are happening to
   give better indication to the user.

Fixes: 838c93dc54 ("net: ena: implement XDP drop support")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20231211062801.27891-3-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 16:07:31 -08:00
David Arinzon
41db6f99b5 net: ena: Destroy correct number of xdp queues upon failure
The ena_setup_and_create_all_xdp_queues() function freed all the
resources upon failure, after creating only xdp_num_queues queues,
instead of freeing just the created ones.

In this patch, the only resources that are freed, are the ones
allocated right before the failure occurs.

Fixes: 548c4940b9 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20231211062801.27891-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12 16:07:31 -08:00
justinstitt@google.com
378bc9a40e net: ena: replace deprecated strncpy with strscpy
`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

A suitable replacement is `strscpy` [2] due to the fact that it
guarantees NUL-termination on the destination buffer without
unnecessarily NUL-padding.

host_info allocation is done in ena_com_allocate_host_info() via
dma_alloc_coherent() and is not zero initialized by alloc_etherdev_mq().

However zero initialization of the destination doesn't matter in this case,
because strscpy() guarantees a NULL termination.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-10 19:08:58 +00:00
justinstitt@google.com
e403cffff1 net: Convert some ethtool_sprintf() to ethtool_puts()
This patch converts some basic cases of ethtool_sprintf() to
ethtool_puts().

The conversions are used in cases where ethtool_sprintf() was being used
with just two arguments:
|       ethtool_sprintf(&data, buffer[i].name);
or when it's used with format string: "%s"
|       ethtool_sprintf(&data, "%s", buffer[i].name);
which both now become:
|       ethtool_puts(&data, buffer[i].name);

Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08 10:56:25 +00:00
Sebastian Andrzej Siewior
7f04bd109d net: Tree wide: Replace xdp_do_flush_map() with xdp_do_flush().
xdp_do_flush_map() is deprecated and new code should use xdp_do_flush()
instead.

Replace xdp_do_flush_map() with xdp_do_flush().

Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Clark Wang <xiaoning.wang@nxp.com>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: David Arinzon <darinzon@amazon.com>
Cc: Edward Cree <ecree.xilinx@gmail.com>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Jassi Brar <jaswinder.singh@linaro.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: John Crispin <john@phrozen.org>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: Louis Peens <louis.peens@corigine.com>
Cc: Marcin Wojtas <mw@semihalf.com>
Cc: Mark Lee <Mark-MC.Lee@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: NXP Linux Team <linux-imx@nxp.com>
Cc: Noam Dagan <ndagan@amazon.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Saeed Bishara <saeedb@amazon.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Shay Agroskin <shayagr@amazon.com>
Cc: Shenwei Wang <shenwei.wang@nxp.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230908143215.869913-2-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-03 07:34:51 -07:00
Sebastian Andrzej Siewior
6f411fb5ca net: ena: Flush XDP packets on error.
xdp_do_flush() should be invoked before leaving the NAPI poll function
after a XDP-redirect. This is not the case if the driver leaves via
the error path (after having a redirect in one of its previous
iterations).

Invoke xdp_do_flush() also in the error path.

Cc: Arthur Kiyanovski <akiyano@amazon.com>
Cc: David Arinzon <darinzon@amazon.com>
Cc: Noam Dagan <ndagan@amazon.com>
Cc: Saeed Bishara <saeedb@amazon.com>
Cc: Shay Agroskin <shayagr@amazon.com>
Fixes: a318c70ad1 ("net: ena: introduce XDP redirect implementation")
Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21 09:01:41 +02:00
Jialin Zhang
a5e5b2cd47 net: ena: Use pci_dev_id() to simplify the code
PCI core API pci_dev_id() can be used to get the BDF number for a pci
device. We don't need to compose it manually. Use pci_dev_id() to
simplify the code a little bit.

Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230815024248.3519068-1-zhangjialin11@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 19:13:09 -07:00
Jakub Kicinski
92272ec410 eth: add missing xdp.h includes in drivers
Handful of drivers currently expect to get xdp.h by virtue
of including netdevice.h. This will soon no longer be the case
so add explicit includes.

Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-03 08:38:07 -07:00
Krister Johansen
1e9cb763e9 net: ena: fix shift-out-of-bounds in exponential backoff
The ENA adapters on our instances occasionally reset.  Once recently
logged a UBSAN failure to console in the process:

  UBSAN: shift-out-of-bounds in build/linux/drivers/net/ethernet/amazon/ena/ena_com.c:540:13
  shift exponent 32 is too large for 32-bit type 'unsigned int'
  CPU: 28 PID: 70012 Comm: kworker/u72:2 Kdump: loaded not tainted 5.15.117
  Hardware name: Amazon EC2 c5d.9xlarge/, BIOS 1.0 10/16/2017
  Workqueue: ena ena_fw_reset_device [ena]
  Call Trace:
  <TASK>
  dump_stack_lvl+0x4a/0x63
  dump_stack+0x10/0x16
  ubsan_epilogue+0x9/0x36
  __ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e
  ? __const_udelay+0x43/0x50
  ena_delay_exponential_backoff_us.cold+0x16/0x1e [ena]
  wait_for_reset_state+0x54/0xa0 [ena]
  ena_com_dev_reset+0xc8/0x110 [ena]
  ena_down+0x3fe/0x480 [ena]
  ena_destroy_device+0xeb/0xf0 [ena]
  ena_fw_reset_device+0x30/0x50 [ena]
  process_one_work+0x22b/0x3d0
  worker_thread+0x4d/0x3f0
  ? process_one_work+0x3d0/0x3d0
  kthread+0x12a/0x150
  ? set_kthread_struct+0x50/0x50
  ret_from_fork+0x22/0x30
  </TASK>

Apparently, the reset delays are getting so large they can trigger a
UBSAN panic.

Looking at the code, the current timeout is capped at 5000us.  Using a
base value of 100us, the current code will overflow after (1<<29).  Even
at values before 32, this function wraps around, perhaps
unintentionally.

Cap the value of the exponent used for this backoff at (1<<16) which is
larger than currently necessary, but large enough to support bigger
values in the future.

Cc: stable@vger.kernel.org
Fixes: 4bb7f4cf60 ("net: ena: reduce driver load time")
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230711013621.GE1926@templeofstupid.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12 15:57:57 -07:00
David Arinzon
f7d625adeb net: ena: Add dynamic recycling mechanism for rx buffers
The current implementation allocates page-sized rx buffers.
As traffic may consist of different types and sizes of packets,
in various cases, buffers are not fully used.

This change (Dynamic RX Buffers - DRB) uses part of the allocated rx
page needed for the incoming packet, and returns the rest of the
unused page to be used again as an rx buffer for future packets.
A threshold of 2K for unused space has been set in order to declare
whether the remainder of the page can be reused again as an rx buffer.

As a page may be reused, dma_sync_single_for_cpu() is added in order
to sync the memory to the CPU side after it was owned by the HW.
In addition, when the rx page can no longer be reused, it is being
unmapped using dma_page_unmap(), which implicitly syncs and then
unmaps the entire page. In case the kernel still handles the skbs
pointing to the previous buffers from that rx page, it may access
garbage pointers, caused by the implicit sync overwriting them.
The implicit dma sync is removed by replacing dma_page_unmap() with
dma_unmap_page_attrs() with DMA_ATTR_SKIP_CPU_SYNC flag.

The functionality is disabled for XDP traffic to avoid handling
several descriptors per packet.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20230612121448.28829-1-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15 22:45:47 -07:00
Simon Horman
c5370374bb net: ena: removed unused tx_bytes variable
clang 16.0.0 with W=1 reports:

drivers/net/ethernet/amazon/ena/ena_netdev.c:1901:6: error: variable 'tx_bytes' set but not used [-Werror,-Wunused-but-set-variable]
        u32 tx_bytes = 0;

The variable is not used so remove it.

Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/20230328151958.410687-1-horms@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29 21:39:35 -07:00
Shay Agroskin
060cdac218 net: ena: Advertise TX push support
LLQ is auto enabled by the device and disabling it isn't supported on
new ENA generations while on old ones causes sub-optimal performance.

This patch adds advertisement of push-mode when LLQ mode is used, but
rejects an attempt to modify it.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-27 19:49:59 -07:00