linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Jakub Kicinski	02839cc8d7	rtnl: move rtnl_newlink_create() Pure code move. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-02 15:14:20 +02:00
Jakub Kicinski	63105e8398	rtnl: split __rtnl_newlink() into two functions __rtnl_newlink() is 250LoC, but has a few clear sections. Move the part which creates a new netdev to a separate function. For ease of review code will be moved in the next change. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-02 15:14:20 +02:00
Jakub Kicinski	c92bf26cce	rtnl: allocate more attr tables on the heap Commit `a293974590` ("rtnetlink: avoid frame size warning in rtnl_newlink()") moved to allocating the largest attribute array of rtnl_newlink() on the heap. Kalle reports the stack has grown above 1k again: net/core/rtnetlink.c:3557:1: error: the frame size of 1104 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Move more attrs to the heap, wrap them in a struct. Don't bother with linkinfo, it's referenced a lot and we take its size so it's awkward to move, plus it's small (6 elements). Reported-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-05-02 15:14:20 +02:00
Florent Fourcot	6f37c9f9df	Revert "rtnetlink: return EINVAL when request cannot succeed" This reverts commit `b6177d3240` ip-link command is testing kernel capability by sending a RTM_NEWLINK request, without any argument. It accepts everything in reply, except EOPNOTSUPP and EINVAL (functions iplink_have_newlink / accept_msg) So we must keep compatiblity here, invalid empty message should not return EINVAL Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Tested-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-22 11:02:00 +01:00
Florent Fourcot	b6177d3240	rtnetlink: return EINVAL when request cannot succeed A request without interface name/interface index/interface group cannot work. We should return EINVAL Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 13:38:55 +02:00
Florent Fourcot	dee04163e9	rtnetlink: return ENODEV when IFLA_ALT_IFNAME is used in dellink If IFLA_ALT_IFNAME is set and given interface is not found, we should return ENODEV and be consistent with IFLA_IFNAME behaviour This commit extends feature of commit `76c9ac0ee8`, "net: rtnetlink: add possibility to use alternative names as message handle" CC: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 13:38:50 +02:00
Florent Fourcot	5ea08b5286	rtnetlink: enable alt_ifname for setlink/newlink buffer called "ifname" given in function rtnl_dev_get is always valid when called by setlink/newlink, but contains only empty string when IFLA_IFNAME is not given. So IFLA_ALT_IFNAME is always ignored This patch fixes rtnl_dev_get function with a remove of ifname argument, and move ifname copy in do_setlink when required. It extends feature of commit `76c9ac0ee8`, "net: rtnetlink: add possibility to use alternative names as message handle"" CC: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 13:38:43 +02:00
Florent Fourcot	ef2a7c9065	rtnetlink: return ENODEV when ifname does not exist and group is given When the interface does not exist, and a group is given, the given parameters are being set to all interfaces of the given group. The given IFNAME/ALT_IF_NAME are being ignored in that case. That can be dangerous since a typo (or a deleted interface) can produce weird side effects for caller: Case 1: IFLA_IFNAME=valid_interface IFLA_GROUP=1 MTU=1234 Case 1 will update MTU and group of the given interface "valid_interface". Case 2: IFLA_IFNAME=doesnotexist IFLA_GROUP=1 MTU=1234 Case 2 will update MTU of all interfaces in group 1. IFLA_IFNAME is ignored in this case This behaviour is not consistent and dangerous. In order to fix this issue, we now return ENODEV when the given IFNAME does not exist. Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 13:38:31 +02:00
Paolo Abeni	edf45f007a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2022-04-15 09:26:00 +02:00
Petr Machata	23cfe941b5	rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies When L3 stats are disabled, rtnl_offload_xstats_get_size_stats() returns size of 0, which is supposed to be an indication that the corresponding attribute should not be emitted. However, instead, the current code reserves a 0-byte attribute. The reason this does not show up as a citation on a kasan kernel is that netdev_offload_xstats_get(), which is supposed to fill in the data, never ends up getting called, because rtnl_offload_xstats_get_stats() notices that the stats are not actually used and skips the call. Thus a zero-length IFLA_OFFLOAD_XSTATS_L3_STATS attribute ends up in a response, confusing the userspace. Fix by skipping the L3-stats related block in rtnl_offload_xstats_fill(). Fixes: `0e7788fd76` ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/591b58e7623edc3eb66dd1fcfa8c8f133d090974.1649794741.git.petrm@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-14 09:01:26 +02:00
Nikolay Aleksandrov	ea2c0f9e3f	net: rtnetlink: add ndm flags and state mask attributes Add ndm flags/state masks which will be used for bulk delete filtering. All of these are used by the bridge and vxlan drivers. Also minimal attr policy validation is added, it is up to ndo_fdb_del_bulk implementers to further validate them. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-13 12:46:26 +01:00
Nikolay Aleksandrov	9e83425993	net: rtnetlink: add NLM_F_BULK support to rtnl_fdb_del When NLM_F_BULK is specified in a fdb del message we need to handle it differently. First since this is a new call we can strictly validate the passed attributes, at first only ifindex and vlan are allowed as these will be the initially supported filter attributes, any other attribute is rejected. The mac address is no longer mandatory, but we use it to error out in older kernels because it cannot be specified with bulk request (the attribute is not allowed) and then we have to dispatch the call to ndo_fdb_del_bulk if the device supports it. The del bulk callback can do further validation of the attributes if necessary. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-13 12:46:26 +01:00
Nikolay Aleksandrov	a6cec0bcd3	net: rtnetlink: add bulk delete support flag Add a new rtnl flag (RTNL_FLAG_BULK_DEL_SUPPORTED) which is used to verify that the delete operation allows bulk object deletion. Also emit a warning if anyone tries to set it for non-delete kind. Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-13 12:46:26 +01:00
Nikolay Aleksandrov	2e9ea3e30f	net: rtnetlink: add helper to extract msg type's kind Add a helper which extracts the msg type's kind using the kind mask (0x3). Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-13 12:46:25 +01:00
Nikolay Aleksandrov	12dc5c2cb7	net: rtnetlink: add msg kind names Add rtnl kind names instead of using raw values. We'll need to check for DEL kind later to validate bulk flag support. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-13 12:46:25 +01:00
Jakub Kicinski	6264f58ca0	net: extract a few internals from netdevice.h There's a number of functions and static variables used under net/core/ but not from the outside. We currently dump most of them into netdevice.h. That bad for many reasons: - netdevice.h is very cluttered, hard to figure out what the APIs are; - netdevice.h is very long; - we have to touch netdevice.h more which causes expensive incremental builds. Create a header under net/core/ and move some declarations. The new header is also a bit of a catch-all but that's fine, if we create more specific headers people will likely over-think where their declaration fit best. And end up putting them in netdevice.h, again. More work should be done on splitting netdevice.h into more targeted headers, but that'd be more time consuming so small steps. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-07 20:32:09 -07:00
Johannes Berg	0b5c21bbc0	net: ensure net_todo_list is processed quickly In [1], Will raised a potential issue that the cfg80211 code, which does (from a locking perspective) rtnl_lock() wiphy_lock() rtnl_unlock() might be suspectible to ABBA deadlocks, because rtnl_unlock() calls netdev_run_todo(), which might end up calling rtnl_lock() again, which could then deadlock (see the comment in the code added here for the scenario). Some back and forth and thinking ensued, but clearly this can't happen if the net_todo_list is empty at the rtnl_unlock() here. Clearly, the code here cannot actually put an entry on it, and all other users of rtnl_unlock() will empty it since that will always go through netdev_run_todo(), emptying the list. So the only other way to get there would be to add to the list and then unlock the RTNL without going through rtnl_unlock(), which is only possible through __rtnl_unlock(). However, this isn't exported and not used in many places, and none of them seem to be able to unregister before using it. Therefore, add a WARN_ON() in the code to ensure this invariant won't be broken, so that the cfg80211 (or any similar) code stays safe. [1] https://lore.kernel.org/r/Yjzpo3TfZxtKPMAG@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20220404113847.0ee02e4a70da.Ic73d206e217db20fd22dcec14fe5442ca732804b@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-05 14:28:16 -07:00
Jakub Kicinski	155fb43b70	net: limit altnames to 64k total Property list (altname is a link "property") is wrapped in a nlattr. nlattrs length is 16bit so practically speaking the list of properties can't be longer than that, otherwise user space would have to interpret broken netlink messages. Prevent the problem from occurring by checking the length of the property list before adding new entries. Reported-by: George Shuklin <george.shuklin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-10 20:15:23 -08:00
Jakub Kicinski	5d26cff5bd	net: account alternate interface name memory George reports that altnames can eat up kernel memory. We should charge that memory appropriately. Reported-by: George Shuklin <george.shuklin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-03-10 20:15:20 -08:00
Tom Rix	57d29a2935	net: rtnetlink: fix error handling in rtnl_fill_statsinfo() The clang static analyzer reports this issue rtnetlink.c:5481:2: warning: Undefined or garbage value returned to caller return err; ^~~~~~~~~~ There is a function level err variable, in the list_for_each_entry_rcu block there is a shadow err. Remove the shadow. In the same block, the call to nla_nest_start_noflag() can fail without setting an err. Set the err to -EMSGSIZE. Fixes: `216e690631` ("net: rtnetlink: rtnl_fill_statsinfo(): Permit non-EMSGSIZE error returns") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-07 12:26:53 +00:00
Petr Machata	5fd0b838ef	net: rtnetlink: Add UAPI toggle for IFLA_OFFLOAD_XSTATS_L3_STATS The offloaded HW stats are designed to allow per-netdevice enablement and disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS, which should be carried by the RTM_SETSTATS message, and expresses a desire to toggle L3 offload xstats on or off. As part of the above, add an exported function rtnl_offload_xstats_notify() that drivers can use when they have installed or deinstalled the counters backing the HW stats. At this point, it is possible to enable, disable and query L3 offload xstats on netdevices. (However there is no driver actually implementing these.) Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 10:37:23 +00:00
Petr Machata	03ba356670	net: rtnetlink: Add RTM_SETSTATS The offloaded HW stats are designed to allow per-netdevice enablement and disablement. These stats are only accessible through RTM_GETSTATS, and therefore should be toggled by a RTM_SETSTATS message. Add it, and the necessary skeleton handler. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 10:37:23 +00:00
Petr Machata	0e7788fd76	net: rtnetlink: Add UAPI for obtaining L3 offload xstats Add a new IFLA_STATS_LINK_OFFLOAD_XSTATS child attribute, IFLA_OFFLOAD_XSTATS_L3_STATS, to carry statistics for traffic that takes place in a HW router. The offloaded HW stats are designed to allow per-netdevice enablement and disablement. Additionally, as a netdevice is configured, it may become or cease being suitable for binding of a HW counter. Both of these aspects need to be communicated to the userspace. To that end, add another child attribute, IFLA_OFFLOAD_XSTATS_HW_S_INFO: - attr nest IFLA_OFFLOAD_XSTATS_HW_S_INFO - attr nest IFLA_OFFLOAD_XSTATS_L3_STATS - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST - {0,1} as u8 - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED - {0,1} as u8 Thus this one attribute is a nest that can be used to carry information about various types of HW statistics, and indexing is very simply done by wrapping the information for a given statistics suite into the attribute that carries the suite is the RTM_GETSTATS query. At the same time, because _HW_S_INFO is nested directly below IFLA_STATS_LINK_OFFLOAD_XSTATS, it is possible through filtering to request only the metadata about individual statistics suites, without having to hit the HW to get the actual counters. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 10:37:23 +00:00
Petr Machata	216e690631	net: rtnetlink: rtnl_fill_statsinfo(): Permit non-EMSGSIZE error returns Obtaining stats for the IFLA_STATS_LINK_OFFLOAD_XSTATS nest involves a HW access, and can fail for more reasons than just netlink message size exhaustion. Therefore do not always return -EMSGSIZE on the failure path, but respect the error code provided by the callee. Set the error explicitly where it is reasonable to assume -EMSGSIZE as the failure reason. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 10:37:22 +00:00
Petr Machata	05415bccbb	net: rtnetlink: Propagate extack to rtnl_offload_xstats_fill() Later patches add handlers for more HW-backed statistics. An extack will be useful when communicating HW / driver errors to the client. Add the arguments as appropriate. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 10:37:22 +00:00
Petr Machata	46efc97b73	net: rtnetlink: RTM_GETSTATS: Allow filtering inside nests The filter_mask field of RTM_GETSTATS header determines which top-level attributes should be included in the netlink response. This saves processing time by only including the bits that the user cares about instead of always dumping everything. This is doubly important for HW-backed statistics that would typically require a trip to the device to fetch the stats. So far there was only one HW-backed stat suite per attribute. However, IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest, and will gain a new stat suite in the following patches. It would therefore be advantageous to be able to filter within that nest, and select just one or the other HW-backed statistics suite. Extend rtnetlink so that RTM_GETSTATS permits attributes in the payload. The scheme is as follows: - RTM_GETSTATS - struct if_stats_msg - attr nest IFLA_STATS_GET_FILTERS - attr IFLA_STATS_LINK_OFFLOAD_XSTATS - u32 filter_mask This scheme reuses the existing enumerators by nesting them in a dedicated context attribute. This is covered by policies as usual, therefore a gradual opt-in is possible. Currently only IFLA_STATS_LINK_OFFLOAD_XSTATS nest has filtering enabled, because for the SW counters the issue does not seem to be that important. rtnl_offload_xstats_get_size() and _fill() are extended to observe the requested filters. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 10:37:22 +00:00
Petr Machata	f6e0fb8129	net: rtnetlink: Stop assuming that IFLA_OFFLOAD_XSTATS_* are dev-backed The IFLA_STATS_LINK_OFFLOAD_XSTATS attribute is a nest whose child attributes carry various special hardware statistics. The code that handles this nest was written with the idea that all these statistics would be exposed by the device driver of a physical netdevice. In the following patches, a new attribute is added to the abovementioned nest, which however can be defined for some soft netdevices. The NDO-based approach to querying these does not work, because it is not the soft netdevice driver that exposes these statistics, but an offloading NIC driver that does so. The current code does not scale well to this usage. Simply rewrite it back to the pattern seen in other fill-like and get_size-like functions elsewhere. Extract to helpers the code that is concerned with handling specifically NDO-backed statistics so that it can be easily reused should more such statistics be added. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 10:37:22 +00:00
Petr Machata	6b524a1d01	net: rtnetlink: Namespace functions related to IFLA_OFFLOAD_XSTATS_* The currently used names rtnl_get_offload_stats() and rtnl_get_offload_stats_size() do not clearly show the namespace. The former function additionally seems to have been named this way in accordance with the NDO name, as opposed to the naming used in the rtnetlink.c file (and indeed elsewhere in the netlink handling code). As more and differently-flavored attributes are introduced, a common clear prefix is needed for all related functions. Rename the functions to follow the rtnl_offload_xstats_* naming scheme. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-03-03 10:37:22 +00:00
Jakub Kicinski	6b5567b1b2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 11:44:20 -08:00
Petr Machata	22b67d1719	net: rtnetlink: rtnl_stats_get(): Emit an extack for unset filter_mask Both get and dump handlers for RTM_GETSTATS require that a filter_mask, a mask of which attributes should be emitted in the netlink response, is unset. rtnl_stats_dump() does include an extack in the bounce, rtnl_stats_get() however does not. Fix the omission. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/01feb1f4bbd22a19f6629503c4f366aed6424567.1645020876.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-16 20:56:21 -08:00
Eric Dumazet	5891cd5ec4	net_sched: add __rcu annotation to netdev->qdisc syzbot found a data-race [1] which lead me to add __rcu annotations to netdev->qdisc, and proper accessors to get LOCKDEP support. [1] BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1: attach_default_qdiscs net/sched/sch_generic.c:1167 [inline] dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221 __dev_open+0x2e9/0x3a0 net/core/dev.c:1416 __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139 rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150 __rtnl_newlink net/core/rtnetlink.c:3489 [inline] rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0: qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323 __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050 tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211 rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: `470502de5b` ("net: sched: unlock rules update API") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@mellanox.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-14 13:36:36 +00:00
Eric Dumazet	ede6c39c4f	net: make net->dev_unreg_count atomic Having to acquire rtnl from netdev_run_todo() for every dismantled device is not desirable when/if rtnl is under stress. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-02-10 15:30:26 +00:00
Eric Dumazet	c6f6f2444b	rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink() While looking at one unrelated syzbot bug, I found the replay logic in __rtnl_newlink() to potentially trigger use-after-free. It is better to clear master_dev and m_ops inside the loop, in case we have to replay it. Fixes: `ba7d49b1f0` ("rtnetlink: provide api for getting and setting slave info") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20220201012106.216495-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-01 20:19:00 -08:00
Coco Li	eac1b93c14	gro: add ability to control gro max packet size Eric Dumazet suggested to allow users to modify max GRO packet size. We have seen GRO being disabled by users of appliances (such as wifi access points) because of claimed bufferbloat issues, or some work arounds in sch_cake, to split GRO/GSO packets. Instead of disabling GRO completely, one can chose to limit the maximum packet size of GRO packets, depending on their latency constraints. This patch adds a per device gro_max_size attribute that can be changed with ip link command. ip link set dev eth0 gro_max_size 16000 Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Coco Li <lixiaoyan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-06 12:27:05 +00:00
Sebastian Andrzej Siewior	fd888e85fe	net: Write lock dev_base_lock without disabling bottom halves. The writer acquires dev_base_lock with disabled bottom halves. The reader can acquire dev_base_lock without disabling bottom halves because there is no writer in softirq context. On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts as a lock to ensure that resources, that are protected by disabling bottom halves, remain protected. This leads to a circular locking dependency if the lock acquired with disabled bottom halves (as in write_lock_bh()) and somewhere else with enabled bottom halves (as by read_lock() in netstat_show()) followed by disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout() -> spin_lock_bh()). This is the reverse locking order. All read_lock() invocation are from sysfs callback which are not invoked from softirq context. Therefore there is no need to disable bottom halves while acquiring a write lock. Acquire the write lock of dev_base_lock without disabling bottom halves. Reported-by: Pei Zhang <pezhang@redhat.com> Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-29 12:12:36 +00:00
Jakub Kicinski	2106efda78	net: remove .ndo_change_proto_down .ndo_change_proto_down was added seemingly to enable out-of-tree implementations. Over 2.5yrs later we still have no real users upstream. Hardwire the generic implementation for now, we can revert once real users materialize. (rocker is a test vehicle, not a user.) We need to drop the optimization on the sysfs side, because unlike ndos priv_flags will be changed at runtime, so we'd need READ_ONCE/WRITE_ONCE everywhere.. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-23 12:18:48 +00:00
Eric Dumazet	6d872df3e3	net: annotate accesses to dev->gso_max_segs dev->gso_max_segs is written under RTNL protection, or when the device is not yet visible, but is read locklessly. Add netif_set_gso_max_segs() helper. Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_segs() where we can to better document what is going on. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-22 12:49:42 +00:00
Jakub Kicinski	efd38f75bb	net: rtnetlink: use __dev_addr_set() Get it ready for constant netdev->dev_addr. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-24 13:59:44 +01:00
luo penghao	50af5969bb	net/core: Remove unused assignment operations and variable Although if_info_size is assigned, it has not been used. And the variable should also be deleted. The clang_analyzer complains as follows: net/core/rtnetlink.c:3806: warning: Although the value stored to 'if_info_size' is used in the enclosing expression, the value is never actually read from 'if_info_size'. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-21 12:48:59 +01:00
Kyungrok Chung	254ec036db	net: make use of helper netif_is_bridge_master() Make use of netdev helper functions to improve code readability. Replace 'dev->priv_flags & IFF_EBRIDGE' with netif_is_bridge_master(dev). Signed-off-by: Kyungrok Chung <acadx0@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-16 15:02:56 +01:00
Jakub Kicinski	9fe1155233	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-10-07 15:24:06 -07:00
Eric Dumazet	d343679919	rtnetlink: fix if_nlmsg_stats_size() under estimation rtnl_fill_statsinfo() is filling skb with one mandatory if_stats_msg structure. nlmsg_put(skb, pid, seq, type, sizeof(struct if_stats_msg), flags); But if_nlmsg_stats_size() never considered the needed storage. This bug did not show up because alloc_skb(X) allocates skb with extra tailroom, because of added alignments. This could very well be changed in the future to have deterministic behavior. Fixes: `10c9ead9f3` ("rtnetlink: add new RTM_GETSTATS message to dump link stats") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-10-06 15:09:46 +01:00
Yajun Deng	4fc2998983	net: rtnetlink: convert rcu_assign_pointer to RCU_INIT_POINTER It no need barrier when assigning a NULL value to an RCU protected pointer. So use RCU_INIT_POINTER() instead for more fast. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-19 12:56:02 +01:00
Jakub Kicinski	97c78d0af5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-26 17:57:57 -07:00
Andrey Ignatov	96a6b93b69	rtnetlink: Return correct error on changing device netns Currently when device is moved between network namespaces using RTM_NEWLINK message type and one of netns attributes (FLA_NET_NS_PID, IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but w/o specifying IFLA_IFNAME, and target namespace already has device with same name, userspace will get EINVAL what is confusing and makes debugging harder. Fix it so that userspace gets more appropriate EEXIST instead what makes debugging much easier. Before: # ./ifname.sh + ip netns add ns0 + ip netns exec ns0 ip link add l0 type dummy + ip netns exec ns0 ip link show l0 8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff + ip link add l0 type dummy + ip link show l0 10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff + ip link set l0 netns ns0 RTNETLINK answers: Invalid argument After: # ./ifname.sh + ip netns add ns0 + ip netns exec ns0 ip link add l0 type dummy + ip netns exec ns0 ip link show l0 8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff + ip link add l0 type dummy + ip link show l0 10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff + ip link set l0 netns ns0 RTNETLINK answers: File exists The problem is that do_setlink() passes its `char ifname` argument, that it gets from a caller, to __dev_change_net_namespace() as is (as `const char pat`), but semantics of ifname and pat can be different. For example, __rtnl_newlink() does this: net/core/rtnetlink.c 3270 char ifname[IFNAMSIZ]; ... 3286 if (tb[IFLA_IFNAME]) 3287 nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ); 3288 else 3289 ifname[0] = '\0'; ... 3364 if (dev) { ... 3394 return do_setlink(skb, dev, ifm, extack, tb, ifname, status); 3395 } , i.e. do_setlink() gets ifname pointer that is always valid no matter if user specified IFLA_IFNAME or not and then do_setlink() passes this ifname pointer as is to __dev_change_net_namespace() as pat argument. But the pat (pattern) in __dev_change_net_namespace() is used as: net/core/dev.c 11198 err = -EEXIST; 11199 if (__dev_get_by_name(net, dev->name)) { 11200 /* We get here if we can't use the current device name */ 11201 if (!pat) 11202 goto out; 11203 err = dev_get_valid_name(net, dev, pat); 11204 if (err < 0) 11205 goto out; 11206 } As the result the `goto out` path on line 11202 is neven taken and instead of returning EEXIST defined on line 11198, __dev_change_net_namespace() returns an error from dev_get_valid_name() and this, in turn, will be EINVAL for ifname[0] = '\0' set earlier. Fixes: `d8a5ec6727` ("[NET]: netlink support for moving devices between network namespaces.") Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-26 12:08:08 +01:00
Lahav Schlesinger	d3432bf10f	net: Support filtering interfaces on no master Currently there's support for filtering neighbours/links for interfaces which have a specific master device (using the IFLA_MASTER/NDA_MASTER attributes). This patch adds support for filtering interfaces/neighbours dump for interfaces that don't have a master. Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210810090658.2778960-1-lschlesinger@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-08-10 16:03:34 -07:00
Rocco Yue	8679c31e02	net: add extack arg for link ops Pass extack arg to validate_linkmsg and validate_link_af callbacks. If a netlink attribute has a reject_message, use the extended ack mechanism to carry the message back to user space. Signed-off-by: Rocco Yue <rocco.yue@mediatek.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-08-04 10:01:26 +01:00
Yajun Deng	f9b282b36d	net: netlink: add the case when nlh is NULL Add the case when nlh is NULL in nlmsg_report(), so that the caller doesn't need to deal with this case. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-27 11:43:50 +01:00
Yajun Deng	cfdf0d9ae7	rtnetlink: use nlmsg_notify() in rtnetlink_send() The netlink_{broadcast, unicast} don't deal with 'if (err > 0' statement but nlmsg_{multicast, unicast} do. The nlmsg_notify() contains them. so use nlmsg_notify() instead. so that the caller wouldn't deal with 'if (err > 0' statement. v2: use nlmsg_notify() will do well. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-07-16 10:46:35 -07:00
Vladimir Oltean	78ecc8903d	net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} "Static" is a loaded word, and probably not what the author meant when the code was written. In particular, this looks weird: $ bridge fdb add dev swp0 00:01:02:03:04:05 local # totally fine, but $ bridge fdb add dev swp0 00:01:02:03:04:05 static [ 2020.708298] swp0: FDB only supports static addresses # hmm what? By looking at the implementation which uses dev_uc_add/dev_uc_del it is absolutely clear that only local addresses are supported, and the proper Network Unreachability Detection state is being used for this purpose (user space indeed sets NUD_PERMANENT when local addresses are meant). So it is just the message that is wrong, fix it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-29 11:31:57 -07:00

1 2 3 4 5 ...

759 commits