1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

380 commits

Author SHA1 Message Date
Amit Cohen
b34f4de6d3 selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
'qos_pfc' test checks PFC behavior. The idea is to limit the traffic
using a shaper somewhere in the flow of the packets. In this area, the
buffer is smaller than the buffer at the beginning of the flow, so it fills
up until there is no more space left. The test configures there PFC
which is supposed to notice that the headroom is filling up and send PFC
Xoff to indicate the transmitter to stop sending traffic for the priorities
sharing this PG.

The Xon/Xoff threshold is auto-configured and always equal to
2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet,
traffic will keep arriving until the transmitter receives and processes
the PFC packet. This amount of traffic is known as the PFC delay allowance.

Currently the buffer for the delay traffic is configured as 100KB. The
MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB.
This allows 80KB extra to be stored in this buffer.

8-lane ports use two buffers among which the configured buffer is split,
the Xoff threshold then applies to each buffer in parallel.

The test does not take into account the behavior of 8-lane ports, when the
ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes,
packets are dropped and the test fails.

Check if the relevant ports use 8 lanes, in such case double the size of
the buffer, as the headroom is split half-half.

Cc: Shuah Khan <shuah@kernel.org>
Fixes: bfa804784e ("selftests: mlxsw: Add a PFC test")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/23ff11b7dff031eb04a41c0f5254a2b636cd8ebb.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18 09:48:09 -08:00
Amit Cohen
40cc674baf selftests: mlxsw: qos_pfc: Remove wrong description
In the diagram of the topology, $swp3 and $swp4 are described as 1Gbps
ports. This is wrong information, the test does not configure such speed.

Cc: Shuah Khan <shuah@kernel.org>
Fixes: bfa804784e ("selftests: mlxsw: Add a PFC test")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/0087e2d416aff7e444d15f7c2958fc1d438dc27e.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18 09:48:09 -08:00
Ido Schimmel
483ae90d8f mlxsw: spectrum_acl_tcam: Fix stack corruption
When tc filters are first added to a net device, the corresponding local
port gets bound to an ACL group in the device. The group contains a list
of ACLs. In turn, each ACL points to a different TCAM region where the
filters are stored. During forwarding, the ACLs are sequentially
evaluated until a match is found.

One reason to place filters in different regions is when they are added
with decreasing priorities and in an alternating order so that two
consecutive filters can never fit in the same region because of their
key usage.

In Spectrum-2 and newer ASICs the firmware started to report that the
maximum number of ACLs in a group is more than 16, but the layout of the
register that configures ACL groups (PAGT) was not updated to account
for that. It is therefore possible to hit stack corruption [1] in the
rare case where more than 16 ACLs in a group are required.

Fix by limiting the maximum ACL group size to the minimum between what
the firmware reports and the maximum ACLs that fit in the PAGT register.

Add a test case to make sure the machine does not crash when this
condition is hit.

[1]
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: mlxsw_sp_acl_tcam_group_update+0x116/0x120
[...]
 dump_stack_lvl+0x36/0x50
 panic+0x305/0x330
 __stack_chk_fail+0x15/0x20
 mlxsw_sp_acl_tcam_group_update+0x116/0x120
 mlxsw_sp_acl_tcam_group_region_attach+0x69/0x110
 mlxsw_sp_acl_tcam_vchunk_get+0x492/0xa20
 mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0
 mlxsw_sp_acl_rule_add+0x47/0x240
 mlxsw_sp_flower_replace+0x1a9/0x1d0
 tc_setup_cb_add+0xdc/0x1c0
 fl_hw_replace_filter+0x146/0x1f0
 fl_change+0xc17/0x1360
 tc_new_tfilter+0x472/0xb90
 rtnetlink_rcv_msg+0x313/0x3b0
 netlink_rcv_skb+0x58/0x100
 netlink_unicast+0x244/0x390
 netlink_sendmsg+0x1e4/0x440
 ____sys_sendmsg+0x164/0x260
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xc0
 do_syscall_64+0x40/0xe0
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Fixes: c3ab435466 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Reported-by: Orel Hagag <orelh@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/2d91c89afba59c22587b444994ae419dbea8d876.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18 09:48:08 -08:00
Amit Cohen
6d6eeabcfa mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
Lately, a bug was found when many TC filters are added - at some point,
several bugs are printed to dmesg [1] and the switch is crashed with
segmentation fault.

The issue starts when gen_pool_free() fails because of unexpected
behavior - a try to free memory which is already freed, this leads to BUG()
call which crashes the switch and makes many other bugs.

Trying to track down the unexpected behavior led to a bug in eRP code. The
function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated
index, sets the value and returns an error code. When gen_pool_alloc()
fails it returns address 0, we track it and return -ENOBUFS outside, BUT
the call for gen_pool_alloc() already override the index in erp_table
structure. This is a problem when such allocation is done as part of
table expansion. This is not a new table, which will not be used in case
of allocation failure. We try to expand eRP table and override the
current index (non-zero) with zero. Then, it leads to an unexpected
behavior when address 0 is freed twice. Note that address 0 is valid in
erp_table->base_index and indeed other tables use it.

gen_pool_alloc() fails in case that there is no space left in the
pre-allocated pool, in our case, the pool is limited to
ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than max
erp entries are required, we exceed the limit and return an error, this
error leads to "Failed to migrate vregion" print.

Fix this by changing erp_table->base_index only in case of a successful
allocation.

Add a test case for such a scenario. Without this fix it causes
segmentation fault:

$ TESTS="max_erp_entries_test" ./tc_flower.sh
./tc_flower.sh: line 988:  1560 Segmentation fault      tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null

[1]:
kernel BUG at lib/genalloc.c:508!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 3531 Comm: tc Not tainted 6.7.0-rc5-custom-ga6893f479f5e #1
Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
RIP: 0010:gen_pool_free_owner+0xc9/0xe0
...
Call Trace:
 <TASK>
 __mlxsw_sp_acl_erp_table_other_dec+0x70/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_erp_mask_destroy+0xf5/0x110 [mlxsw_spectrum]
 objagg_obj_root_destroy+0x18/0x80 [objagg]
 objagg_obj_destroy+0x12c/0x130 [objagg]
 mlxsw_sp_acl_erp_mask_put+0x37/0x50 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_region_entry_remove+0x74/0xa0 [mlxsw_spectrum]
 mlxsw_sp_acl_ctcam_entry_del+0x1e/0x40 [mlxsw_spectrum]
 mlxsw_sp_acl_tcam_ventry_del+0x78/0xd0 [mlxsw_spectrum]
 mlxsw_sp_flower_destroy+0x4d/0x70 [mlxsw_spectrum]
 mlxsw_sp_flow_block_cb+0x73/0xb0 [mlxsw_spectrum]
 tc_setup_cb_destroy+0xc1/0x180
 fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
 __fl_delete+0x1ac/0x1c0 [cls_flower]
 fl_destroy+0xc2/0x150 [cls_flower]
 tcf_proto_destroy+0x1a/0xa0
...
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion
mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion

Fixes: f465261aa1 ("mlxsw: spectrum_acl: Implement common eRP core")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/4cfca254dfc0e5d283974801a24371c7b6db5989.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18 09:48:08 -08:00
Benjamin Poirier
dd2d40acdb selftests: bonding: Add more missing config options
As a followup to commit 03fb8565c8 ("selftests: bonding: add missing
build configs"), add more networking-specific config options which are
needed for bonding tests.

For testing, I used the minimal config generated by virtme-ng and I added
the options in the config file. All bonding tests passed.

Fixes: bbb774d921 ("net: Add tests for bonding and team address list management") # for ipv6
Fixes: 6cbe791c0f ("kselftest: bonding: add num_grat_arp test") # for tc options
Fixes: 222c94ec0a ("selftests: bonding: add tests for ether type changes") # for nlmon
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/20240116154926.202164-1-bpoirier@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-18 11:59:26 +01:00
Jakub Kicinski
39369c9a6e selftests: netdevsim: add a config file
netdevsim tests aren't very well integrated with kselftest,
which has its advantages and disadvantages. But regardless
of the intended integration - a config file to know what kernel
to build is very useful, add one.

Fixes: fc4c93f145 ("selftests: add basic netdevsim devlink flash testing")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240116154311.1945801-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-18 11:51:02 +01:00
Jakub Kicinski
03fb8565c8 selftests: bonding: add missing build configs
bonding tests also try to create bridge, veth and dummy
interfaces. These are not currently listed in config.

Fixes: bbb774d921 ("net: Add tests for bonding and team address list management")
Fixes: c078290a2b ("selftests: include bonding tests into the kselftest infra")
Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240116020201.1883023-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-16 06:58:53 -08:00
Jakub Kicinski
4697381bd0 selftests: netdevsim: correct expected FEC strings
ethtool CLI has changed its output. Make the test compatible.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240114224748.1210578-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16 12:42:29 +01:00
Jakub Kicinski
2c4ca79772 selftests: netdevsim: sprinkle more udevadm settle
Number of tests are failing when netdev renaming is active
on the system. Add udevadm settle in logic determining
the names.

Fixes: 242aaf03dc ("selftests: add a test for ethtool pause stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240114224726.1210532-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16 12:16:04 +01:00
Benjamin Poirier
c2518da8e6 selftests: bonding: Change script interpreter
The tests changed by this patch, as well as the scripts they source, use
features which are not part of POSIX sh (ex. 'source' and 'local'). As a
result, these tests fail when /bin/sh is dash such as on Debian. Change the
interpreter to bash so that these tests can run successfully.

Fixes: d43eff0b85 ("selftests: bonding: up/down delay w/ slave link flapping")
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16 09:06:15 +01:00
Jakub Kicinski
e63c1822ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  e009b2efb7 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()")
  0f2b214779 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL")
https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 18:06:46 -08:00
Hangbin Liu
61fa2493ca selftests: bonding: do not set port down when adding to bond
Similar to commit be80942465 ("selftests: bonding: do not set port down
before adding to bond"). The bond-arp-interval-causes-panic test failed
after commit a4abfa627c ("net: rtnetlink: Enslave device before bringing
it up") as the kernel will set the port down _after_ adding to bond if setting
port down specifically.

Fix it by removing the link down operation when adding to bond.

Fixes: 2ffd57327f ("selftests: bonding: cause oops in bond_rr_gen_slave_id")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Benjamin Poirier <benjamin.poirier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:17:05 +00:00
Ido Schimmel
af51d6bd0b selftests: mlxsw: Add PCI reset test
Test that PCI reset works correctly by verifying that only the expected
reset methods are supported and that after issuing the reset the ifindex
of the port changes.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18 17:38:51 +00:00
Jiri Pirko
6151ff9c75 selftests: netdevsim: use suitable existing dummy file for flash test
The file name used in flash test was "dummy" because at the time test
was written, drivers were responsible for file request and as netdevsim
didn't do that, name was unused. However, the file load request is
now done in devlink code and therefore the file has to exist.
Use first random file from /lib/firmware for this purpose.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13 10:43:13 +01:00
Zhengchao Shao
bf68583624 selftests: bonding: create directly devices in the target namespaces
If failed to set link1_1 to netns client, we should delete link1_1 in the
cleanup path. But if set link1_1 to netns client successfully, delete
link1_1 will report warning. So it will be safer creating directly the
devices in the target namespaces.

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Closes: https://lore.kernel.org/all/ZNyJx1HtXaUzOkNA@Laptop-X1/
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28 10:24:08 +01:00
Jakub Kicinski
57ce6427e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

include/net/inet_sock.h
  f866fbc842 ("ipv4: fix data-races around inet->inet_id")
  c274af2242 ("inet: introduce inet->inet_flags")
https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/

Adjacent changes:

drivers/net/bonding/bond_alb.c
  e74216b8de ("bonding: fix macvlan over alb bond support")
  f11e5bd159 ("bonding: support balance-alb with openvswitch")

drivers/net/ethernet/broadcom/bgmac.c
  d6499f0b7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()")
  23a14488ea ("net: bgmac: Fix return value check for fixed_phy_register()")

drivers/net/ethernet/broadcom/genet/bcmmii.c
  32bbe64a13 ("net: bcmgenet: Fix return value check for fixed_phy_register()")
  acf50d1adb ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()")

net/sctp/socket.c
  f866fbc842 ("ipv4: fix data-races around inet->inet_id")
  b09bde5c35 ("inet: move inet->mc_loop to inet->inet_frags")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24 10:51:39 -07:00
Hangbin Liu
246af950b9 selftests: bonding: add macvlan over bond testing
Add a macvlan over bonding test with mode active-backup, balance-tlb
and balance-alb.

]# ./bond_macvlan.sh
TEST: active-backup: IPv4: client->server                           [ OK ]
TEST: active-backup: IPv6: client->server                           [ OK ]
TEST: active-backup: IPv4: client->macvlan_1                        [ OK ]
TEST: active-backup: IPv6: client->macvlan_1                        [ OK ]
TEST: active-backup: IPv4: client->macvlan_2                        [ OK ]
TEST: active-backup: IPv6: client->macvlan_2                        [ OK ]
TEST: active-backup: IPv4: macvlan_1->macvlan_2                     [ OK ]
TEST: active-backup: IPv6: macvlan_1->macvlan_2                     [ OK ]
TEST: active-backup: IPv4: server->client                           [ OK ]
TEST: active-backup: IPv6: server->client                           [ OK ]
TEST: active-backup: IPv4: macvlan_1->client                        [ OK ]
TEST: active-backup: IPv6: macvlan_1->client                        [ OK ]
TEST: active-backup: IPv4: macvlan_2->client                        [ OK ]
TEST: active-backup: IPv6: macvlan_2->client                        [ OK ]
TEST: active-backup: IPv4: macvlan_2->macvlan_2                     [ OK ]
TEST: active-backup: IPv6: macvlan_2->macvlan_2                     [ OK ]
[...]
TEST: balance-alb: IPv4: client->server                             [ OK ]
TEST: balance-alb: IPv6: client->server                             [ OK ]
TEST: balance-alb: IPv4: client->macvlan_1                          [ OK ]
TEST: balance-alb: IPv6: client->macvlan_1                          [ OK ]
TEST: balance-alb: IPv4: client->macvlan_2                          [ OK ]
TEST: balance-alb: IPv6: client->macvlan_2                          [ OK ]
TEST: balance-alb: IPv4: macvlan_1->macvlan_2                       [ OK ]
TEST: balance-alb: IPv6: macvlan_1->macvlan_2                       [ OK ]
TEST: balance-alb: IPv4: server->client                             [ OK ]
TEST: balance-alb: IPv6: server->client                             [ OK ]
TEST: balance-alb: IPv4: macvlan_1->client                          [ OK ]
TEST: balance-alb: IPv6: macvlan_1->client                          [ OK ]
TEST: balance-alb: IPv4: macvlan_2->client                          [ OK ]
TEST: balance-alb: IPv6: macvlan_2->client                          [ OK ]
TEST: balance-alb: IPv4: macvlan_2->macvlan_2                       [ OK ]
TEST: balance-alb: IPv6: macvlan_2->macvlan_2                       [ OK ]

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24 10:07:13 +02:00
Hangbin Liu
27aa43f83c selftest: bond: add new topo bond_topo_2d1c.sh
Add a new testing topo bond_topo_2d1c.sh which is used more commonly.
Make bond_topo_3d1c.sh just source bond_topo_2d1c.sh and add the
extra link.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24 10:07:13 +02:00
Hangbin Liu
be80942465 selftests: bonding: do not set port down before adding to bond
Before adding a port to bond, it need to be set down first. In the
lacpdu test the author set the port down specifically. But commit
a4abfa627c ("net: rtnetlink: Enslave device before bringing it up")
changed the operation order, the kernel will set the port down _after_
adding to bond. So all the ports will be down at last and the test failed.

In fact, the veth interfaces are already inactive when added. This
means there's no need to set them down again before adding to the bond.
Let's just remove the link down operation.

Fixes: a4abfa627c ("net: rtnetlink: Enslave device before bringing it up")
Reported-by: Zhengchao Shao <shaozhengchao@huawei.com>
Closes: https://lore.kernel.org/netdev/a0ef07c7-91b0-94bd-240d-944a330fcabd@huawei.com/
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230817082459.1685972-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21 19:05:42 -07:00
Ido Schimmel
f520489e99 selftests: mlxsw: Fix test failure on Spectrum-4
Remove assumptions about shared buffer cell size and instead query the
cell size from devlink. Adjust the test to send small packets that fit
inside a single cell.

Tested on Spectrum-{1,2,3,4}.

Fixes: 4735402173 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/f7dfbf3c4d1cb23838d9eb99bab09afaa320c4ca.1692268427.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18 19:41:06 -07:00
Zhengchao Shao
e56e220d73 selftests: bonding: remove redundant delete action of device link1_1
When run command "ip netns delete client", device link1_1 has been
deleted. So, it is no need to delete link1_1 again. Remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16 07:07:16 +01:00
Petr Machata
aae5bb8d18 selftests: mlxsw: router_bridge_lag: Add a new selftest
Add a selftest to verify enslavement to a LAG with upper after fresh
devlink reload.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/373a7754daa4dac32759a45095f47b08a2a869c8.1691498735.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-09 15:27:51 -07:00
Petr Machata
67d5ffb9ed selftests: mlxsw: rif_bridge: Add a new selftest
This test verifies driver behavior with regards to creation of RIFs for a
bridge as LAGs are added or removed to/from it, and ports added or removed
to/from the LAG.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-02 09:18:18 +01:00
Petr Machata
6b3f46837c selftests: mlxsw: rif_lag_vlan: Add a new selftest
This test verifies driver behavior with regards to creation of RIFs for LAG
VLAN uppers as ports are added or removed to/from the LAG.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-02 09:18:18 +01:00
Petr Machata
4308967d98 selftests: mlxsw: rif_lag: Add a new selftest
This test verifies driver behavior with regards to creation of RIFs for a
LAG as ports are added or removed to/from it.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-02 09:18:18 +01:00
Petr Machata
d7eb1f1751 selftests: mlxsw: rtnetlink: Drop obsolete tests
Support for enslaving ports to LAGs with uppers will be added in the
following patches. Selftests to make sure it actually does the right thing
are ready and will be sent as a follow-up.

Similarly, ordering of MACVLAN creation and RIF creation will be relaxed
and it will be permitted to create a MACVLAN first.

Thus these two tests are obsolete. Drop them.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21 08:54:04 +01:00
Ido Schimmel
0a1a818d8a selftests: mlxsw: Test port range registers' occupancy
Test that filters that match on the same port range, but with different
combination of IPv4/IPv6 and TCP/UDP all use the same port range
register by observing port range registers' occupancy via
devlink-resource.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/0a2eb63b234fb062ff011e80231868cc80000c81.1689092769.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12 16:57:18 -07:00
Ido Schimmel
45c5a38476 selftests: mlxsw: Add scale test for port ranges
Query the maximum number of supported port range registers using
devlink-resource and test that this number can be reached by configuring
tc filters with different port ranges. Test that an error is returned in
case the maximum number is exceeded.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/48eee181270d9f291e09d1858c7b26a3f7fcc164.1689092769.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12 16:57:18 -07:00
Petr Machata
664bc72dd2 selftests: mlxsw: one_armed_router: Use port MAC for bridge address
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.

At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.

The bridge eventually inherits MAC address from its first member, after the
enslavement is acked. A number of (mainly VXLAN) selftests already work
around the problem by setting the MAC address to whatever it will
eventually be anyway. Do the same for this selftest.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21 14:02:52 -07:00
Petr Machata
5541577521 selftests: mlxsw: vxlan: Disable IPv6 autogen on bridges
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.

At the time that the front panel port is enslaved to the bridge (this holds
for all bridges used here), the bridge MAC address does not have the same
prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all
the RIFs have to have the same 38-bit MAC address prefix. Since the bridge
does not obey this limitation, the RIF cannot be created, and the
enslavement attempt is vetoed on the grounds of the configuration not being
offloadable.

The selftest itself however checks various aspects of VXLAN offloading and
the bridges do not need to participate in routing traffic. The IP addresses
or the RIFs are irrelevant.

Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridges in this selftest, thus exempting them from mlxsw router attention.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21 14:02:52 -07:00
Petr Machata
08035d8e35 selftests: mlxsw: spectrum: q_in_vni_veto: Disable IPv6 autogen on a bridge
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.

At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.

The selftest itself however checks vetoing of a different aspect of the
configuration and the bridge does not need to participate in routing
traffic. The IP address or the RIF are irrelevant.

Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridge in this selftest, thus exempting it from mlxsw router attention.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21 14:02:52 -07:00
Petr Machata
ea2d5f757e selftests: mlxsw: qos_mc_aware: Disable IPv6 autogen on bridges
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.

At the time that the front panel port is enslaved to the bridge (this holds
for both bridges used here), the bridge MAC address does not have the same
prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all
the RIFs have to have the same 38-bit MAC address prefix. Since the bridge
does not obey this limitation, the RIF cannot be created, and the
enslavement attempt is vetoed on the grounds of the configuration not being
offloadable.

The selftest itself however checks traffic prioritization and scheduling,
and the bridges serve for their L2 forwarding capabilities, and do not need
to participate in routing traffic. The IP addresses or the RIFs are
irrelevant.

Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridges in this selftest, thus exempting them from mlxsw router attention.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21 14:02:52 -07:00
Petr Machata
ec7023e674 selftests: mlxsw: qos_ets_strict: Disable IPv6 autogen on bridges
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.

At the time that the front panel port is enslaved to the bridge (this holds
for both bridges used here), the bridge MAC address does not have the same
prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all
the RIFs have to have the same 38-bit MAC address prefix. Since the bridge
does not obey this limitation, the RIF cannot be created, and the
enslavement attempt is vetoed on the grounds of the configuration not being
offloadable.

The selftest itself however checks traffic prioritization and scheduling,
and the bridges serve for their L2 forwarding capabilities, and do not need
to participate in routing traffic. The IP addresses or the RIFs are
irrelevant.

Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridges in this selftest, thus exempting them from mlxsw router attention.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21 14:02:52 -07:00
Petr Machata
6349f9bbbf selftests: mlxsw: qos_dscp_bridge: Disable IPv6 autogen on a bridge
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.

At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.

The selftest itself however checks DCB DSCP-based prioritization, and the
bridge serves for its L2 forwarding capabilities, and does not need to
participate in routing traffic. The IP address or the RIF are irrelevant.

Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridge in this selftest, thus exempting it from mlxsw router attention.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21 14:02:52 -07:00
Petr Machata
32b3a7bf85 selftests: mlxsw: mirror_gre_scale: Disable IPv6 autogen on a bridge
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.

At the time that the front panel port is enslaved to the bridge, the bridge
MAC address does not have the same prefix as other interfaces in the
system. On Nvidia Spectrum-1 machines all the RIFs have to have the same
38-bit MAC address prefix. Since the bridge does not obey this limitation,
the RIF cannot be created, and the enslavement attempt is vetoed on the
grounds of the configuration not being offloadable.

The selftest itself however checks how many mirroring sessions a machine is
capable of offloading. The IP address or the RIF are irrelevant.

Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridge in this selftest, thus exempting it from mlxsw router attention.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21 14:02:52 -07:00
Petr Machata
a758dc469a selftests: mlxsw: extack: Disable IPv6 autogen on bridges
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.

At the time that the front panel port is enslaved to the bridge (this holds
for all bridges used here), the bridge MAC address does not have the same
prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all
the RIFs have to have the same 38-bit MAC address prefix. Since the bridge
does not obey this limitation, the RIF cannot be created, and the
enslavement attempt is vetoed on the grounds of the configuration not being
offloadable.

The selftest itself however checks whether a different vetoed aspect of the
configuration provides an extack. The IP address or the RIF are irrelevant.

Fix by disabling automatic IPv6 address generation for the HW-offloaded
bridges in this selftest, thus exempting them from mlxsw router attention.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21 14:02:52 -07:00
Petr Machata
8cfdd300a5 selftests: mlxsw: q_in_q_veto: Disable IPv6 autogen on bridges
In a future patch, mlxsw will start adding RIFs to uppers of front panel
port netdevices, if they have an IP address.

The swp enslavement to the 802.1ad bridge is not allowed, because RIFs are
not allowed to be created for 802.1ad bridges, but the address indicates
one needs to be created. Thus the veto selftests fail already during the
port enslavement. Then the attempt to create a VLAN on top of the same
bridge is not vetoed, because the bridge is not related to mlxsw, and the
selftest fails.

Fix by disabling automatic IPv6 address generation for the bridges in this
selftest, thus exempting them from the mlxsw router attention.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21 14:02:52 -07:00
Petr Machata
34ad708d1b selftests: mlxsw: egress_vid_classification: Fix the diagram
The topology diagram implies that $swp1 and $swp2 are members of the bridge
br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the
diagram.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Petr Machata
204cc3d04f selftests: mlxsw: ingress_rif_conf_1d: Fix the diagram
The topology diagram implies that $swp1 and $swp2 are members of the bridge
br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the
diagram.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05 11:29:49 +01:00
Jakub Kicinski
bc88ba0cad Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes. No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-11 09:06:55 -07:00
Linus Torvalds
6e27831b91 Networking fixes for 6.4-rc2, including fixes from netfilter
Current release - regressions:
 
   - mtk_eth_soc: fix NULL pointer dereference
 
 Previous releases - regressions:
 
   - core:
     - skb_partial_csum_set() fix against transport header magic value
     - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
     - annotate sk->sk_err write from do_recvmmsg()
     - add vlan_get_protocol_and_depth() helper
 
   - netlink: annotate accesses to nlk->cb_running
 
   - netfilter: always release netdev hooks from notifier
 
 Previous releases - always broken:
 
   - core: deal with most data-races in sk_wait_event()
 
   - netfilter: fix possible bug_on with enable_hooks=1
 
   - eth: bonding: fix send_peer_notif overflow
 
   - eth: xpcs: fix incorrect number of interfaces
 
   - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb
 
   - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmRcxawSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkv6wQAJgfOBlDAkZNKHzwtMuFiLxECeEMWY9h
 wJCyiq0qXnz9p5ZqjdmTmA8B+jUp9VkpgN5Z3lid5hXDfzDrvXL1KGZW4pc4ooz9
 GUzrp0EUzO5UsyrlZRS9vJ9mbCGN5M1ZWtWH93g8OzGJPRnLs0Q/Tr4IFTBVKzVb
 GmJPy/ZYWYDjnvx3BgewRDuYeH3Rt9lsIt4Pxq/E+D8W3ypvVM0m3GvrO5eEzMeu
 EfeilAdmJGJUufeoGguKt0hheqILS3kNCjQO25XS2Lq1OqetnR/wqTwXaaVxL2du
 Eb2ca7wKkihDpl2l8bQ3ss6vqM0HEpZ63Y2PJaNBS8ASdLsMq4n2L6j2JMfT8hWY
 RG3nJS7F2UFLyYmCJjNL1/I+Z9XeMyFKnHORzHK1dAkMlhd+8NauKWAxdjlxMbxX
 p1msyTl54bG0g6FrU/zAirCWNAAZYCPdZG/XvA/2Jj9mdy64OlGlv/QdJvfjcx+C
 L6nkwZfwXU7QUwKeeTfP8abte2SLrXIxkJrnNEAntPnFOSmd16+/yvQ8JVlbWTMd
 JugJrSAIxjOglIr/1fsnUuV+Ab+JDYQv/wkoyzvtcY2tjhTAHzgmTwwSfeYiCTJE
 rEbjyVvVgMcLTUIk/R9QC5/k6nX/7/KRDHxPOMBX4boOsuA0ARVjzt8uKRvv/7cS
 dRV98RwvCKvD
 =MoPD
 -----END PGP SIGNATURE-----

Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - mtk_eth_soc: fix NULL pointer dereference

  Previous releases - regressions:

   - core:
      - skb_partial_csum_set() fix against transport header magic value
      - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
      - annotate sk->sk_err write from do_recvmmsg()
      - add vlan_get_protocol_and_depth() helper

   - netlink: annotate accesses to nlk->cb_running

   - netfilter: always release netdev hooks from notifier

  Previous releases - always broken:

   - core: deal with most data-races in sk_wait_event()

   - netfilter: fix possible bug_on with enable_hooks=1

   - eth: bonding: fix send_peer_notif overflow

   - eth: xpcs: fix incorrect number of interfaces

   - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb

   - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"

* tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
  af_unix: Fix data races around sk->sk_shutdown.
  af_unix: Fix a data race of sk->sk_receive_queue->qlen.
  net: datagram: fix data-races in datagram_poll()
  net: mscc: ocelot: fix stat counter register values
  ipvlan:Fix out-of-bounds caused by unclear skb->cb
  docs: networking: fix x25-iface.rst heading & index order
  gve: Remove the code of clearing PBA bit
  tcp: add annotations around sk->sk_shutdown accesses
  net: add vlan_get_protocol_and_depth() helper
  net: pcs: xpcs: fix incorrect number of interfaces
  net: deal with most data-races in sk_wait_event()
  net: annotate sk->sk_err write from do_recvmmsg()
  netlink: annotate accesses to nlk->cb_running
  kselftest: bonding: add num_grat_arp test
  selftests: forwarding: lib: add netns support for tc rule handle stats get
  Documentation: bonding: fix the doc of peer_notif_delay
  bonding: fix send_peer_notif overflow
  net: ethernet: mtk_eth_soc: fix NULL pointer dereference
  selftests: nft_flowtable.sh: check ingress/egress chain too
  selftests: nft_flowtable.sh: monitor result file sizes
  ...
2023-05-11 08:42:47 -05:00
Liang Li
2f0f556713 selftests: bonding: delete unnecessary line
"ip link set dev "$devbond1" nomaster"
This line code in bond-eth-type-change.sh is unnecessary.
Because $devbond1 was not added to any master device.

Signed-off-by: Liang Li <liali@redhat.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-10 10:06:42 +01:00
Hangbin Liu
6cbe791c0f kselftest: bonding: add num_grat_arp test
TEST: num_grat_arp (active-backup miimon num_grat_arp 10)           [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 20)           [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 30)           [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 50)           [ OK ]

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-10 09:27:20 +01:00
Linus Torvalds
f085df1be6 Disable building BPF based features by default for v6.4.
We need to better polish building with BPF skels, so revert back to
 making it an experimental feature that has to be explicitely enabled
 using BUILD_BPF_SKEL=1.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZFbCXwAKCRCyPKLppCJ+
 J7cHAP97erKY4hBXArjpfzcvpFmboh/oqhbTLntyIpS6TEnOyQEAyervAPGIjQYC
 DCo4foyXmOWn3dhNtK9M+YiRl3o2SgQ=
 =7G78
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "Third version of perf tool updates, with the build problems with with
  using a 'vmlinux.h' generated from the main build fixed, and the bpf
  skeleton build disabled by default.

  Build:

   - Require libtraceevent to build, one can disable it using
     NO_LIBTRACEEVENT=1.

     It is required for tools like 'perf sched', 'perf kvm', 'perf
     trace', etc.

     libtraceevent is available in most distros so installing
     'libtraceevent-devel' should be a one-time event to continue
     building perf as usual.

     Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
     sufficient for lots of users not interested in those libtraceevent
     dependent features.

   - Allow Python support in 'perf script' when libtraceevent isn't
     linked, as not all features requires it, for instance Intel PT does
     not use tracepoints.

   - Error if the python interpreter needed for jevents to work isn't
     available and NO_JEVENTS=1 isn't set, preventing a build without
     support for JSON vendor events, which is a rare but possible
     condition. The two check error messages:

        $(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
        $(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)

   - Make libbpf 1.0 the minimum required when building with out of
     tree, distro provided libbpf.

   - Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
     demangler, add 'perf test' entry for it.

   - Make binutils libraries opt in, as distros disable building with it
     due to licensing, they were used for C++ demangling, for instance.

   - Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
     equivalent) isn't installed, we'll just have a build warning:

       Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev

   - Add a feature test for scandirat(), that is not implemented so far
     in musl and uclibc, disabling features that need it, such as
     scanning for tracepoints in /sys/kernel/tracing/events.

  perf BPF filters:

   - New feature where BPF can be used to filter samples, for instance:

      $ sudo ./perf record -e cycles --filter 'period > 1000' true
      $ sudo ./perf script
           perf-exec 2273949 546850.708501:       5029 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
           perf-exec 2273949 546850.708508:      32409 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
           perf-exec 2273949 546850.708526:     143369 cycles:  ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
           perf-exec 2273949 546850.708600:     372650 cycles:  ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
           perf-exec 2273949 546850.708791:     482953 cycles:  ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
                true 2273949 546850.709036:     501985 cycles:  ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
                true 2273949 546850.709292:     503065 cycles:      7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)

   - In addition to 'period' (PERF_SAMPLE_PERIOD), the other
     PERF_SAMPLE_ can be used for filtering, and also some other sample
     accessible values, from tools/perf/Documentation/perf-record.txt:

        Essentially the BPF filter expression is:

        <term> <operator> <value> (("," | "||") <term> <operator> <value>)*

     The <term> can be one of:
        ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
        code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
        p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
        mem_dtlb, mem_blk, mem_hops

     The <operator> can be one of:
        ==, !=, >, >=, <, <=, &

     The <value> can be one of:
        <number> (for any term)
        na, load, store, pfetch, exec (for mem_op)
        l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
        na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
        remote (for mem_remote)
        na, locked (for mem_locked)
        na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
        na, by_data, by_addr (for mem_blk)
        hops0, hops1, hops2, hops3 (for mem_hops)

  perf lock contention:

   - Show lock type with address.

   - Track and show mmap_lock, siglock and per-cpu rq_lock with address.
     This is done for mmap_lock by following the current->mm pointer:

      $ sudo ./perf lock con -abl -- sleep 10
       contended   total wait     max wait     avg wait            address   symbol
       ...
           16344    312.30 ms      2.22 ms     19.11 us   ffff8cc702595640
           17686    310.08 ms      1.49 ms     17.53 us   ffff8cc7025952c0
               3     84.14 ms     45.79 ms     28.05 ms   ffff8cc78114c478   mmap_lock
            3557     76.80 ms     68.75 us     21.59 us   ffff8cc77ca3af58
               1     68.27 ms     68.27 ms     68.27 ms   ffff8cda745dfd70
               9     54.53 ms      7.96 ms      6.06 ms   ffff8cc7642a48b8   mmap_lock
           14629     44.01 ms     60.00 us      3.01 us   ffff8cc7625f9ca0
            3481     42.63 ms    140.71 us     12.24 us   ffffffff937906ac   vmap_area_lock
           16194     38.73 ms     42.15 us      2.39 us   ffff8cd397cbc560
              11     38.44 ms     10.39 ms      3.49 ms   ffff8ccd6d12fbb8   mmap_lock
               1      5.43 ms      5.43 ms      5.43 ms   ffff8cd70018f0d8
            1674      5.38 ms    422.93 us      3.21 us   ffffffff92e06080   tasklist_lock
             581      4.51 ms    130.68 us      7.75 us   ffff8cc9b1259058
               5      3.52 ms      1.27 ms    703.23 us   ffff8cc754510070
             112      3.47 ms     56.47 us     31.02 us   ffff8ccee38b3120
             381      3.31 ms     73.44 us      8.69 us   ffffffff93790690   purge_vmap_area_lock
             255      3.19 ms     36.35 us     12.49 us   ffff8d053ce30c80

   - Update default map size to 16384.

   - Allocate single letter option -M for --map-nr-entries, as it is
     proving being frequently used.

   - Fix struct rq lock access for older kernels with BPF's CO-RE
     (Compile once, run everywhere).

   - Fix problems found with MSAn.

  perf report/top:

   - Add inline information when using --call-graph=fp or lbr, as was
     already done to the --call-graph=dwarf callchain mode.

   - Improve the 'srcfile' sort key performance by really using an
     optimization introduced in 6.2 for the 'srcline' sort key that
     avoids calling addr2line for comparision with each sample.

  perf sched:

   - Make 'perf sched latency/map/replay' to use "sched:sched_waking"
     instead of "sched:sched_waking", consistent with 'perf record'
     since d566a9c2d4 ("perf sched: Prefer sched_waking event when it
     exists").

  perf ftrace:

   - Make system wide the default target for latency subcommand, run the
     following command then generate some network traffic and press
     control+C:

       # perf ftrace latency -T __kfree_skb
     ^C
         DURATION     |      COUNT | GRAPH                                          |
          0 - 1    us |         27 | #############                                  |
          1 - 2    us |         22 | ###########                                    |
          2 - 4    us |          8 | ####                                           |
          4 - 8    us |          5 | ##                                             |
          8 - 16   us |         24 | ############                                   |
         16 - 32   us |          2 | #                                              |
         32 - 64   us |          1 |                                                |
         64 - 128  us |          0 |                                                |
        128 - 256  us |          0 |                                                |
        256 - 512  us |          0 |                                                |
        512 - 1024 us |          0 |                                                |
          1 - 2    ms |          0 |                                                |
          2 - 4    ms |          0 |                                                |
          4 - 8    ms |          0 |                                                |
          8 - 16   ms |          0 |                                                |
         16 - 32   ms |          0 |                                                |
         32 - 64   ms |          0 |                                                |
         64 - 128  ms |          0 |                                                |
        128 - 256  ms |          0 |                                                |
        256 - 512  ms |          0 |                                                |
        512 - 1024 ms |          0 |                                                |
          1 - ...   s |          0 |                                                |
       #

  perf top:

   - Add --branch-history (LBR: Last Branch Record) option, just like
     already available for 'perf record'.

   - Fix segfault in thread__comm_len() where thread->comm was being
     used outside thread->comm_lock.

  perf annotate:

   - Allow configuring objdump and addr2line in ~/.perfconfig., so that
     you can use alternative binaries, such as llvm's.

  perf kvm:

   - Add TUI mode for 'perf kvm stat report'.

  Reference counting:

   - Add reference count checking infrastructure to check for use after
     free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
     more to come.

     To build with it use -DREFCNT_CHECKING=1 in the make command line
     to build tools/perf. Documented at:

       https://perf.wiki.kernel.org/index.php/Reference_Count_Checking

   - The above caught, for instance, fix, present in this series:

        - Fix maps use after put in 'perf test "Share thread maps"':

          'maps' is copied from leader, but the leader is put on line 79
          and then 'maps' is used to read the reference count below - so
          a use after put, with the put of maps happening within
          thread__put.

     Fixed by reversing the order of puts so that the leader is put
     last.

   - Also several fixes were made to places where reference counts were
     not being held.

   - Make this one of the tests in 'make -C tools/perf build-test' to
     regularly build test it and to make sure no direct access to the
     reference counted structs are made, doing that via accessors to
     check the validity of the struct pointer.

  ARM64:

   - Fix 'perf report' segfault when filtering coresight traces by
     sparse lists of CPUs.

   - Add support for 'simd' as a sort field for 'perf report', to show
     ARM's NEON SIMD's predicate flags: "partial" and "empty".

  arm64 vendor events:

   - Add N1 metrics.

  Intel vendor events:

   - Add graniterapids, grandridge and sierraforrest events.

   - Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
     broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
     jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
     silvermont, skylake, tigerlake and westmereep-dp

   - Refresh metrics for alderlake-n, broadwell, broadwellde,
     broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
     skylakex.

  perf stat:

   - Implement --topdown using JSON metrics.

   - Add TopdownL1 JSON metric as a default if present, but disable it
     for now for some Intel hybrid architectures, a series of patches
     addressing this is being reviewed and will be submitted for v6.5.

   - Use metrics for --smi-cost.

   - Update topdown documentation.

  Vendor events (JSON) infrastructure:

   - Add support for computing and printing metric threshold values. For
     instance, here is one found in thesapphirerapids json file:

       {
           "BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
           "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
           "MetricGroup": "smi",
           "MetricName": "smi_cycles",
           "MetricThreshold": "smi_cycles > 0.1",
           "ScaleUnit": "100%"
       },

   - Test parsing metric thresholds with the fake PMU in 'perf test
     pmu-events'.

   - Support for printing metric thresholds in 'perf list'.

   - Add --metric-no-threshold option to 'perf stat'.

   - Add rand (reverse and) and has_pmem (optane memory) support to
     metrics.

   - Sort list of input files to avoid depending on the order from
     readdir() helping in obtaining reproducible builds.

  S/390:

   - Add common metrics: - CPI (cycles per instruction), prbstate (ratio
     of instructions executed in problem state compared to total number
     of instructions), l1mp (Level one instruction and data cache misses
     per 100 instructions).

   - Add cache metrics for z13, z14, z15 and z16.

   - Add metric for TLB and cache.

  ARM:

   - Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
     (Memory Tagging Extension) and MOPS (Memory Operations) load/store.

  Intel PT hardware tracing:

   - Add event type names UINTR (User interrupt delivered) and UIRET
     (Exiting from user interrupt routine), documented in table 32-50
     "CFE Packet Type and Vector Fields Details" in the Intel Processor
     Trace chapter of The Intel SDM Volume 3 version 078.

   - Add support for new branch instructions ERETS and ERETU.

   - Fix CYC timestamps after standalone CBR

  ARM CoreSight hardware tracing:

   - Allow user to override timestamp and contextid settings.

   - Fix segfault in dso lookup.

   - Fix timeless decode mode detection.

   - Add separate decode paths for timeless and per-thread modes.

  auxtrace:

   - Fix address filter entire kernel size.

  Miscellaneous:

   - Fix use-after-free and unaligned bugs in the PLT handling routines.

   - Use zfree() to reduce chances of use after free.

   - Add missing 0x prefix for addresses printed in hexadecimal in 'perf
     probe'.

   - Suppress massive unsupported target platform errors in the unwind
     code.

   - Fix return incorrect build_id size in elf_read_build_id().

   - Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .

   - Add missing new parameter in kfree_skb tracepoint to the python
     scripts using it.

   - Add 'perf bench syscall fork' benchmark.

   - Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
     'perf mem'.

   - Fix wrong size expectation for perf test 'Setup struct
     perf_event_attr' caused by the patch adding
     perf_event_attr::config3.

   - Fix some spelling mistakes"

* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
  Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
  Revert "perf build: Warn for BPF skeletons if endian mismatches"
  perf metrics: Fix SEGV with --for-each-cgroup
  perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
  perf stat: Separate bperf from bpf_profiler
  perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
  perf test record+probe_libc_inet_pton: Fix call chain match on s390
  perf tracepoint: Fix memory leak in is_valid_tracepoint()
  perf cs-etm: Add fix for coresight trace for any range of CPUs
  perf build: Fix unescaped # in perf build-test
  perf unwind: Suppress massive unsupported target platform errors
  perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
  perf script: Print raw ip instead of binary offset for callchain
  perf symbols: Fix return incorrect build_id size in elf_read_build_id()
  perf list: Modify the warning message about scandirat(3)
  perf list: Fix memory leaks in print_tracepoint_events()
  perf lock contention: Rework offset calculation with BPF CO-RE
  perf lock contention: Fix struct rq lock access
  perf stat: Disable TopdownL1 on hybrid
  perf stat: Avoid SEGV on counter->name
  ...
2023-05-07 11:32:18 -07:00
Petr Machata
8fcac79270 selftests: forwarding: generalize bail_on_lldpad from mlxsw
mlxsw selftests often invoke a bail_on_lldpad() helper to make sure LLDPAD
is not running, to prevent conflicts between the QoS configuration applied
through TC or DCB command line tool, and the DCB configuration that LLDPAD
might apply. This helper might be useful to others. Move the function to
lib.sh, and parameterize to make reusable in other contexts.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20 20:03:21 -07:00
Petr Machata
54e906f163 selftests: forwarding: sch_tbf_*: Add a pre-run hook
The driver-specific wrappers of these selftests invoke bail_on_lldpad to
make sure that LLDPAD doesn't trample the configuration. The function
bail_on_lldpad is going to move to lib.sh in the next patch. With that, it
won't be visible for the wrappers before sourcing the framework script. And
after sourcing it, it is too late: the selftest will have run by then.

One option might be to source NUM_NETIFS=0 lib.sh from the wrapper, but
even if that worked (it might, it might not), that seems cumbersome. lib.sh
is doing fair amount of stuff, and even if it works today, it does not look
particularly solid as a solution.

Instead, introduce a hook, sch_tbf_pre_hook(), that when available, gets
invoked. Move the bail to the hook.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20 20:03:21 -07:00
Hangbin Liu
2e825f8acc selftests: bonding: add arp validate test
This patch add bonding arp validate tests with mode active backup,
monitor arp_ip_target and ns_ip6_target. It also checks mii_status
to make sure all slaves are UP.

Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07 08:47:20 +01:00
Hangbin Liu
481b56e039 selftests: bonding: re-format bond option tests
To improve the testing process for bond options, A new bond topology lib
is added to our testing setup. The current option_prio.sh file will be
renamed to bond_options.sh so that all bonding options can be tested here.
Specifically, for priority testing, we will run all tests using modes
1, 5, and 6. These changes will help us streamline the testing process
and ensure that our bond options are rigorously evaluated.

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07 08:47:20 +01:00
Patrice Duroux
7f8d3fbe09 perf tests test_bridge_fdb_stress.sh: Fix redirection of stderr to stdin
It's not 2&>1, the correct is 2>&1.

Signed-off-by: Patrice Duroux <patrice.duroux@gmail.com>
Cc: linux-kselftest@vger.kernel.org
Link: https://lore.kernel.org/r/20230303193058.21274-1-patrice.duroux@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 09:39:55 -03:00
Nikolay Aleksandrov
222c94ec0a selftests: bonding: add tests for ether type changes
Add new network selftests for the bonding device which exercise the ether
type changing call paths. They also test for the recent syzbot bug[1] which
causes a warning and results in wrong device flags (IFF_SLAVE missing).
The test adds three bond devices and a nlmon device, enslaves one of the
bond devices to the other and then uses the nlmon device for successful
and unsuccesful enslaves both of which change the bond ether type. Thus
we can test for both MASTER and SLAVE flags at the same time.

If the flags are properly restored we get:
TEST: Change ether type of an enslaved bond device with unsuccessful enslave   [ OK ]
TEST: Change ether type of an enslaved bond device with successful enslave   [ OK ]

[1] https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17 07:56:41 +00:00