Several new features here:
- virtio-net is finally supported in vduse.
- Virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmZN570PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp2JUH/1K3fZOHymop6Y5Z3USFS7YdlF+dniedY/vg
TKyWERkXOlxq1d9DVxC0mN7tk72DweuWI0YJjLXofrEW1VuW29ecSbyFXxpeWJls
b7ErffxDAFRas5jkMCngD8TuFnbEegU0mGP5kbiHpEndBydQ2hH99Gg0x7swW+cE
xsvU5zonCCLwLGIP2DrVrn9qGOHtV6o8eZfVKDVXfvicn3lFBkUSxlwEYsO9RMup
aKxV4FT2Pb1yBicwBK4TH1oeEXqEGy1YLEn+kAHRbgoC/5L0/LaiqrkzwzwwOIPj
uPGkacf8CIbX0qZo5EzD8kvfcYL1xhU3eT9WBmpp2ZwD+4bINd4=
=nax1
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- virtio-net is finally supported in vduse
- virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster
And fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
virtio-pci: Check if is_avq is NULL
virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
MAINTAINERS: add Eugenio Pérez as reviewer
vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
vp_vdpa: don't allocate unused msix vectors
sound: virtio: drop owner assignment
fuse: virtio: drop owner assignment
scsi: virtio: drop owner assignment
rpmsg: virtio: drop owner assignment
nvdimm: virtio_pmem: drop owner assignment
wifi: mac80211_hwsim: drop owner assignment
vsock/virtio: drop owner assignment
net: 9p: virtio: drop owner assignment
net: virtio: drop owner assignment
net: caif: virtio: drop owner assignment
misc: nsm: drop owner assignment
iommu: virtio: drop owner assignment
drm/virtio: drop owner assignment
gpio: virtio: drop owner assignment
firmware: arm_scmi: virtio: drop owner assignment
...
virtio core already sets the .owner, so driver does not need to.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <20240331-module-owner-virtio-v2-17-98f04bfaf46a@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We call the build_skb() actually without copying data.
The comment is misleading. So remove it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240511031404.30903-5-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now, the premapped mode can be enabled unconditionally.
So we can remove the failover code for merge and small mode.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20240511031404.30903-4-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The virtio-net big mode did not enable premapped mode,
so we did not need to check the unmap. And the subsequent
commit will remove the failover code for failing enable
premapped for merge and small mode. So we need to remove
the checking do_dma code in the big mode path.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240511031404.30903-3-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The rtnl lock is no longer needed to protect the control buffer and
command VQ.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Once the RTNL locking around the control buffer is removed there can be
contention on the per queue RX interrupt coalescing data. Use a mutex
per queue. A mutex is required because virtnet_send_command can sleep.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Since we no longer have to hold the RTNL lock here just do updates for
the specified queue.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The command VQ will no longer be protected by the RTNL lock. Use a
mutex to protect the control buffer header and the VQ.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Allocate memory for the data when it's used. Ideally the struct could
be on the stack, but we can't DMA stack memory. With this change only
the header and status memory are shared between commands, which will
allow using a tighter lock than RTNL.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stop storing RSS setting in the control buffer. This is prep work for
removing RTNL lock protection of the control buffer.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
To enhance functionality, we now support reporting statistics through
the netdev-generic netlink (netdev-genl) queue stats interface. However,
this does not extend to all statistics, so a new field, qstat_offset,
has been introduced. This field determines which statistics should be
reported via netdev-genl queue stats.
Given that queue stats are retrieved individually per queue, it's
necessary for the virtnet_get_hw_stats() function to be capable of
fetching statistics for a specific queue.
As the document https://docs.kernel.org/next/networking/statistics.html#notes-for-driver-authors
We should not duplicate the stats which get reported via the netlink API in
ethtool. If the stats are for queue stat, that will not be reported by
ethtool -S.
python3 ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml
--dump qstats-get --json '{"scope": "queue"}'
[{'ifindex': 2,
'queue-id': 0,
'queue-type': 'rx',
'rx-bytes': 157844011,
'rx-csum-bad': 0,
'rx-csum-none': 0,
'rx-csum-unnecessary': 2195386,
'rx-hw-drop-overruns': 0,
'rx-hw-drop-ratelimits': 0,
'rx-hw-drops': 12964,
'rx-packets': 598929},
{'ifindex': 2,
'queue-id': 0,
'queue-type': 'tx',
'tx-bytes': 1938511,
'tx-csum-none': 0,
'tx-hw-drop-errors': 0,
'tx-hw-drop-ratelimits': 0,
'tx-hw-drops': 0,
'tx-needs-csum': 61263,
'tx-packets': 15515}]
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In the last commit, we introduced some helpers for device stats.
And the drivers stats are realized by the open code.
This commit make the helpers to support driver stats.
Then we can have the unify helper for device and driver stats.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The key size of ethtool -S is controlled by this macro.
ETH_GSTRING_LEN 32
That includes the \0 at the end. So the max length of the key name must
is 31. But the length of the prefix "rx_queue_0_" is 11. If the queue
num is larger than 10, the length of the prefix is 12. So the
key name max is 19. That is too short. We will introduce some keys
such as "gso_packets_coalesced". So we should change the prefix
to "rx0_".
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
As the spec 42f3899898
Based on the description provided in the above specification, we have
enabled the virtio-net driver to support acquiring some response
information from the device via the CVQ (Control Virtqueue).
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The RSS hash report is a feature that's part of the virtio specification.
Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost
(still a work in progress as per [1]) support this feature. While the
capability to obtain the RSS hash has been enabled in the normal path,
it's currently missing in the XDP path. Therefore, we are introducing
XDP hints through kfuncs to allow XDP programs to access the RSS hash.
1.
https://lore.kernel.org/all/20231015141644.260646-1-akihiko.odaki@daynix.com/#r
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240417071822.27831-1-liangchen.linux@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
There is a bug when setting the RSS options in virtio_net that can break
the whole machine, getting the kernel into an infinite loop.
Running the following command in any QEMU virtual machine with virtionet
will reproduce this problem:
# ethtool -X eth0 hfunc toeplitz
This is how the problem happens:
1) ethtool_set_rxfh() calls virtnet_set_rxfh()
2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
3) virtnet_commit_rss_command() populates 4 entries for the rss
scatter-gather
4) Since the command above does not have a key, then the last
scatter-gatter entry will be zeroed, since rss_key_size == 0.
sg_buf_size = vi->rss_key_size;
5) This buffer is passed to qemu, but qemu is not happy with a buffer
with zero length, and do the following in virtqueue_map_desc() (QEMU
function):
if (!sz) {
virtio_error(vdev, "virtio: zero sized buffers are not allowed");
6) virtio_error() (also QEMU function) set the device as broken
vdev->broken = true;
7) Qemu bails out, and do not repond this crazy kernel.
8) The kernel is waiting for the response to come back (function
virtnet_send_command())
9) The kernel is waiting doing the following :
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
!virtqueue_is_broken(vi->cvq))
cpu_relax();
10) None of the following functions above is true, thus, the kernel
loops here forever. Keeping in mind that virtqueue_is_broken() does
not look at the qemu `vdev->broken`, so, it never realizes that the
vitio is broken at QEMU side.
Fix it by not sending RSS commands if the feature is not available in
the device.
Fixes: c7114b1249 ("drivers/net/virtio_net: Added basic RSS support.")
Cc: stable@vger.kernel.org
Cc: qemu-devel@nongnu.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since free_old_xmit_skbs not only deals with skb, but also xdp frame and
subsequent added xsk, so change the name of this function to
free_old_xmit.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240229072044.77388-19-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There are two completely similar and independent implementations. This
is inconvenient for the subsequent addition of new types. So extract a
function from this piece of code and call this function uniformly to
recover old xmit ptr.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240229072044.77388-18-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Adding cond_resched() to the command waiting loop for a better
co-operation with the scheduler. This allows to give CPU a breath to
run other task(workqueue) instead of busy looping when preemption is
not allowed on a device whose CVQ might be slow.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230720083839.481487-3-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
This patch convert rx mode setting to be done in a workqueue, this is
a must for allow to sleep when waiting for the cvq command to
response since current code is executed under addr spin lock.
Note that we need to disable and flush the workqueue during freeze,
this means the rx mode setting is lost after resuming. This is not the
bug of this patch as we never try to restore rx mode setting during
resume.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230720083839.481487-2-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Fix the warnings when building virtio_net driver.
"
drivers/net/virtio_net.c: In function ‘init_vqs’:
drivers/net/virtio_net.c:4551:48: warning: ‘%d’ directive writing between 1 and 11 bytes into a region of size 10 [-Wformat-overflow=]
4551 | sprintf(vi->rq[i].name, "input.%d", i);
| ^~
In function ‘virtnet_find_vqs’,
inlined from ‘init_vqs’ at drivers/net/virtio_net.c:4645:8:
drivers/net/virtio_net.c:4551:41: note: directive argument in the range [-2147483643, 65534]
4551 | sprintf(vi->rq[i].name, "input.%d", i);
| ^~~~~~~~~~
drivers/net/virtio_net.c:4551:17: note: ‘sprintf’ output between 8 and 18 bytes into a destination of size 16
4551 | sprintf(vi->rq[i].name, "input.%d", i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/virtio_net.c: In function ‘init_vqs’:
drivers/net/virtio_net.c:4552:49: warning: ‘%d’ directive writing between 1 and 11 bytes into a region of size 9 [-Wformat-overflow=]
4552 | sprintf(vi->sq[i].name, "output.%d", i);
| ^~
In function ‘virtnet_find_vqs’,
inlined from ‘init_vqs’ at drivers/net/virtio_net.c:4645:8:
drivers/net/virtio_net.c:4552:41: note: directive argument in the range [-2147483643, 65534]
4552 | sprintf(vi->sq[i].name, "output.%d", i);
| ^~~~~~~~~~~
drivers/net/virtio_net.c:4552:17: note: ‘sprintf’ output between 9 and 19 bytes into a destination of size 16
4552 | sprintf(vi->sq[i].name, "output.%d", i);
"
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20240104020902.2753599-1-yanjun.zhu@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For rq, we have three cases getting buffers from virtio core:
1. virtqueue_get_buf{,_ctx}
2. virtqueue_detach_unused_buf
3. callback for virtqueue_resize
But in commit 295525e29a5b("virtio_net: merge dma operations when
filling mergeable buffers"), I missed the dma unmap for the #3 case.
That will leak some memory, because I did not release the pages referred
by the unused buffers.
If we do such script, we will make the system OOM.
while true
do
ethtool -G ens4 rx 128
ethtool -G ens4 rx 256
free -m
done
Fixes: 295525e29a ("virtio_net: merge dma operations when filling mergeable buffers")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20231226094333.47740-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The get/set_rxfh ethtool ops currently takes the rxfh (RSS) parameters
as direct function arguments. This will force us to change the API (and
all drivers' functions) every time some new parameters are added.
This is part 1/2 of the fix, as suggested in [1]:
- First simplify the code by always providing a pointer to all params
(indir, key and func); the fact that some of them may be NULL seems
like a weird historic thing or a premature optimization.
It will simplify the drivers if all pointers are always present.
- Then make the functions take a dev pointer, and a pointer to a
single struct wrapping all arguments. The set_* should also take
an extack.
Link: https://lore.kernel.org/netdev/20231121152906.2dd5f487@kernel.org/ [1]
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-2-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By comparing the traffic information in the complete napi processes,
let the virtio-net driver automatically adjust the coalescing
moderation parameters of each receive queue.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract commands to set virtqueue coalescing parameters for reuse
by ethtool -Q, vq resize and netdim.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch separates the rx and tx global coalescing moderation
commands to support netdim switches in subsequent patches.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rx netdim needs to count the traffic during a complete napi process,
and start updating and comparing samples to make decisions after
the napi ends. Let virtqueue_napi_complete() return true if napi is done,
otherwise vice versa.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 295525e29a ("virtio_net: merge dma operations when filling
mergeable buffers") unmaps the buffer with DMA_ATTR_SKIP_CPU_SYNC when
the dma->ref is zero. We do that with DMA_ATTR_SKIP_CPU_SYNC, because we
do not want to do the sync for the entire page_frag. But that misses the
sync for the current area.
This patch does cpu sync regardless of whether the ref is zero or not.
Fixes: 295525e29a ("virtio_net: merge dma operations when filling mergeable buffers")
Reported-by: Michael Roth <michael.roth@amd.com>
Closes: http://lore.kernel.org/all/20230926130451.axgodaa6tvwqs3ut@amd.com
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update a comment because virtio-net now supports both
VIRTIO_NET_F_NOTF_COAL and VIRTIO_NET_F_VQ_NOTF_COAL.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the definition of virtqueue coalescing spec[1]:
Upon disabling and re-enabling a transmit virtqueue, the device MUST set
the coalescing parameters of the virtqueue to those configured through the
VIRTIO_NET_CTRL_NOTF_COAL_TX_SET command, or, if the driver did not set
any TX coalescing parameters, to 0.
Upon disabling and re-enabling a receive virtqueue, the device MUST set
the coalescing parameters of the virtqueue to those configured through the
VIRTIO_NET_CTRL_NOTF_COAL_RX_SET command, or, if the driver did not set
any RX coalescing parameters, to 0.
We need to add this setting for vq resize (ethtool -G) where vq_reset happens.
[1] https://lists.oasis-open.org/archives/virtio-dev/202303/msg00415.html
Fixes: 394bd87764 ("virtio_net: support per queue interrupt coalesce command")
Cc: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the user sets a non-zero coalescing parameter to 0 for a specific
virtqueue, it does not work as expected, so let's fix this.
Fixes: 394bd87764 ("virtio_net: support per queue interrupt coalesce command")
Reported-by: Xiaoming Zhao <zxm377917@alibaba-inc.com>
Cc: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using .set_coalesce interface to set all queue coalescing
parameters, we need to update both per-queue and global save values.
Fixes: 394bd87764 ("virtio_net: support per queue interrupt coalesce command")
Cc: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since virtio-net allows switching napi_tx for per txq, we have to
get the specific txq's result now.
Fixes: 394bd87764 ("virtio_net: support per queue interrupt coalesce command")
Cc: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Background:
1. Commit 0c465be183 ("virtio_net: ethtool tx napi configuration") uses
tx-frames to toggle napi_tx (0 off and 1 on) if notification coalescing
is not supported.
2. Commit 31c03aef9b ("virtio_net: enable napi_tx by default") enables
napi_tx for all txqs by default.
Status:
When virtio-net supports notification coalescing, after initialization,
tx-frames is 0 and napi_tx is true.
Problem:
When the user only wants to set rx coalescing params using
ethtool -C eth0 rx-usecs 10, or
ethtool -Q eth0 queue_mask 0x1 -C rx-usecs 10,
these cmds will carry tx-frames as 0, causing the napi_tx switching condition
is satisfied. Then the user gets:
netlink error: Device or resource busy.
The same happens when trying to set rx-frames, adaptive_rx, adaptive_tx...
How to fix:
When notification coalescing feature is negotiated, initially make the
value of tx-frames to be consistent with napi_tx.
For compatibility with the past, it is still supported to use tx-frames
to toggle napi_tx.
Reported-by: Xiaoming Zhao <zxm377917@alibaba-inc.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use DEV_STATS_INC() and DEV_STATS_READ() which provide
atomicity on paths that can be used concurrently.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a small pull request this time around, mostly because the
vduse network got postponed to next relase so we can be sure
we got the security store right.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmT1BMAPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpYJUH+QHNhfn0JC/yE1IySwDwpmdgr73aaGik1LgV
ObHi48ucRMtxB+QpXLjPWAlQhVVzZv1wBK+Up9QxW8e9USJrSeI/MWfoHtXOFnGe
1JdmNr+XQM/uDngZ+mjI4ZUwRkA61iOcTR7gEDdfBUOr+Yl6R7Na/+kKtTDiDMfy
O8bOCLYVyJNiny2eSMmXH0mb4oPplkne4PzW4i/+ssKNoHlBmUIcx0jqj/qUVpSR
ozr0SpyhlXKSEQGAtNxwR4PONeMDOOdkRBhxHW5N5QgnP9P7HQ57Ar39Vz7+Kc0i
6vO2g1gpYV1naQr9BCg8hIF9r68rjgi4IOSghmfpWWUL0yNURtU=
=z/Df
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"A small pull request this time around, mostly because the vduse
network got postponed to next relase so we can be sure we got the
security store right"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_ring: fix avail_wrap_counter in virtqueue_add_packed
virtio_vdpa: build affinity masks conditionally
virtio_net: merge dma operations when filling mergeable buffers
virtio_ring: introduce dma sync api for virtqueue
virtio_ring: introduce dma map api for virtqueue
virtio_ring: introduce virtqueue_reset()
virtio_ring: separate the logic of reset/enable from virtqueue_resize
virtio_ring: correct the expression of the description of virtqueue_resize()
virtio_ring: skip unmap for premapped
virtio_ring: introduce virtqueue_dma_dev()
virtio_ring: support add premapped buf
virtio_ring: introduce virtqueue_set_dma_premapped()
virtio_ring: put mapping error check in vring_map_one_sg
virtio_ring: check use_dma_api before unmap desc for indirect
vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
vdpa: add get_backend_features vdpa operation
vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature
vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag
vdpa/mlx5: Remove unused function declarations
Currently, the virtio core will perform a dma operation for each
buffer. Although, the same page may be operated multiple times.
This patch, the driver does the dma operation and manages the dma
address based the feature premapped of virtio core.
This way, we can perform only one dma operation for the pages of the
alloc frag. This is beneficial for the iommu device.
kernel command line: intel_iommu=on iommu.passthrough=0
| strict=0 | strict=1
Before | 775496pps | 428614pps
After | 1109316pps | 742853pps
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-13-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The virtio_net driver currently deals with different versions and types
of virtio net headers, such as virtio_net_hdr_mrg_rxbuf,
virtio_net_hdr_v1_hash, etc. Due to these variations, the code relies
on multiple type casts to convert memory between different structures,
potentially leading to bugs when there are changes in these structures.
Introduces the "struct skb_vnet_common_hdr" as a unifying header
structure using a union. With this approach, various virtio net header
structures can be converted by accessing different members of this
structure, thus eliminating the need for type casting and reducing the
risk of potential bugs.
For example following code:
static struct sk_buff *page_to_skb(struct virtnet_info *vi,
struct receive_queue *rq,
struct page *page, unsigned int offset,
unsigned int len, unsigned int truesize,
unsigned int headroom)
{
[...]
struct virtio_net_hdr_mrg_rxbuf *hdr;
[...]
hdr_len = vi->hdr_len;
[...]
ok:
hdr = skb_vnet_hdr(skb);
memcpy(hdr, hdr_p, hdr_len);
[...]
}
When VIRTIO_NET_F_HASH_REPORT feature is enabled, hdr_len = 20
But the sizeof(*hdr) is 12,
memcpy(hdr, hdr_p, hdr_len); will copy 20 bytes to the hdr,
which make a potential risk of bug. And this risk can be avoided by
introducing struct skb_vnet_common_hdr.
Change log
v1->v2
feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com>
feedback from Simon Horman <horms@kernel.org>
1. change to use net-next tree.
2. move skb_vnet_common_hdr inside kernel file instead of the UAPI header.
v2->v3
feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com>
1. fix typo in commit message.
2. add original struct virtio_net_hdr into union
3. remove virtio_net_hdr_mrg_rxbuf variable in receive_buf;
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/sfc/tc.c
fa165e1949 ("sfc: don't unregister flow_indr if it was never registered")
3bf969e88a ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
No known outstanding regressions.
Fixes to fixes:
- virtio-net: set queues after driver_ok, avoid a potential race
added by recent fix
- Revert "vlan: Fix VLAN 0 memory leak", it may lead to a warning
when VLAN 0 is registered explicitly
- nf_tables:
- fix false-positive lockdep splat in recent fixes
- don't fail inserts if duplicate has expired (fix test failures)
- fix races between garbage collection and netns dismantle
Current release - new code bugs:
- mlx5: Fix mlx5_cmd_update_root_ft() error flow
Previous releases - regressions:
- phy: fix IRQ-based wake-on-lan over hibernate / power off
Previous releases - always broken:
- sock: fix misuse of sk_under_memory_pressure() preventing system
from exiting global TCP memory pressure if a single cgroup is under
pressure
- fix the RTO timer retransmitting skb every 1ms if linear option
is enabled
- af_key: fix sadb_x_filter validation, amment netlink policy
- ipsec: fix slab-use-after-free in decode_session6()
- macb: in ZynqMP resume always configure PS GTR for non-wakeup source
Misc:
- netfilter: set default timeout to 3 secs for sctp shutdown send and
recv state (from 300ms), align with protocol timers
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmTemA4ACgkQMUZtbf5S
IrtCThAAj+t35QM5BgGZowmrx9U4yF+kacDkdPztxlT8a/b+famrTtnZJ8USW+PF
VCk3Eu8JXheuyAOMArHyM84/crS6wim6mzGcXaucusA3981PFzoqdgCLLf9emAJ2
j9vzKrnHBtdd5fj8Exwq70KN4CzXyrzRgqwr2EXBK9lH59HjX0+J7o+trbDxNmFK
RZJE2oDCqf939iRGG3PhJryKYBmrQaMtdonNpSU5PiiRT0HnVYcEtdWcOXK7d53D
onpoaPdawcsqsns5c5Qj01E1OdyM8X54BEGkl/S4FmSw5jF9Bp6btmTcxYYtdb7E
M3CeYROZ0Kt8KcKKje/o1AzdGqWq8Hnxfwy+2WulZAHMucshg0JPm6Ev74WRondw
NGYriKJSdORSO8idK9K/i7pnjZXYr9gU50lpPUFU+QzSdd+zv+U11arjAodwI9Wi
pW+dFi3UR7J01LidaxclvHmWnZ7d5sSzE2khpqb0xd0+PagRGesl8qnKyoDJNS1P
IHsOrRh9aXLzEZjud/rVG+sUobQvc1oiHW+hvbJ04GLKoli9U5poGT2fcaa4O67M
T7JcN5oGDF+PIHJKgTEN7pfX2epY33gmofKUhbt/OPOqnvZOVbTu7/ojjuJZ8Lc5
SF8AvTe+lECcX8Htjq30PoVfai+FT6AhnZzK0H9K4HMfUB9O32Q=
=Ze13
-----END PGP SIGNATURE-----
Merge tag 'net-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from ipsec and netfilter.
No known outstanding regressions.
Fixes to fixes:
- virtio-net: set queues after driver_ok, avoid a potential race
added by recent fix
- Revert "vlan: Fix VLAN 0 memory leak", it may lead to a warning
when VLAN 0 is registered explicitly
- nf_tables:
- fix false-positive lockdep splat in recent fixes
- don't fail inserts if duplicate has expired (fix test failures)
- fix races between garbage collection and netns dismantle
Current release - new code bugs:
- mlx5: Fix mlx5_cmd_update_root_ft() error flow
Previous releases - regressions:
- phy: fix IRQ-based wake-on-lan over hibernate / power off
Previous releases - always broken:
- sock: fix misuse of sk_under_memory_pressure() preventing system
from exiting global TCP memory pressure if a single cgroup is under
pressure
- fix the RTO timer retransmitting skb every 1ms if linear option is
enabled
- af_key: fix sadb_x_filter validation, amment netlink policy
- ipsec: fix slab-use-after-free in decode_session6()
- macb: in ZynqMP resume always configure PS GTR for non-wakeup
source
Misc:
- netfilter: set default timeout to 3 secs for sctp shutdown send and
recv state (from 300ms), align with protocol timers"
* tag 'net-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
ice: Block switchdev mode when ADQ is active and vice versa
qede: fix firmware halt over suspend and resume
net: do not allow gso_size to be set to GSO_BY_FRAGS
sock: Fix misuse of sk_under_memory_pressure()
sfc: don't fail probe if MAE/TC setup fails
sfc: don't unregister flow_indr if it was never registered
net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset
net/mlx5: Fix mlx5_cmd_update_root_ft() error flow
net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECT
i40e: fix misleading debug logs
iavf: fix FDIR rule fields masks validation
ipv6: fix indentation of a config attribute
mailmap: add entries for Simon Horman
broadcom: b44: Use b44_writephy() return value
net: openvswitch: reject negative ifindex
team: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
net: phy: broadcom: stub c45 read/write for 54810
netfilter: nft_dynset: disallow object maps
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
...
Commit 25266128fe ("virtio-net: fix race between set queues and
probe") tries to fix the race between set queues and probe by calling
_virtnet_set_queues() before DRIVER_OK is set. This violates virtio
spec. Fixing this by setting queues after virtio_device_ready().
Note that rtnl needs to be held for userspace requests to change the
number of queues. So we are serialized in this way.
Fixes: 25266128fe ("virtio-net: fix race between set queues and probe")
Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>