In order for the Felix DSA driver to be able to turn on/off flooding
towards its CPU port, we need to redirect calls on the NPI port to
actually act upon the index in the analyzer block that corresponds to
the CPU port module. This was never necessary until now because DSA
(or the bridge) never called ocelot_port_bridge_flags() for the NPI
port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As more police parameters are passed to flow_offload, driver can check
them to make sure hardware handles packets in the way indicated by tc.
The conform-exceed control should be drop/pipe or drop/ok. Besides,
for drop/ok, the police should be the last action. As hardware can't
configure peakrate/avrate/overhead, offload should not be supported if
any of them is configured.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently ocelot uses a pvid of 0 for standalone ports and ports under a
VLAN-unaware bridge, and the pvid of the bridge for ports under a
VLAN-aware bridge. Standalone ports do not perform learning, but packets
received on them are still subject to FDB lookups. So if the MAC DA that
a standalone port receives has been also learned on a VLAN-unaware
bridge port, ocelot will attempt to forward to that port, even though it
can't, so it will drop packets.
So there is a desire to avoid that, and isolate the FDBs of different
bridges from one another, and from standalone ports.
The ocelot switch library has two distinct entry points: the felix DSA
driver and the ocelot switchdev driver.
We need to code up a minimal bridge_num allocation in the ocelot
switchdev driver too, this is copied from DSA with the exception that
ocelot does not care about DSA trees, cross-chip bridging etc. So it
only looks at its own ports that are already in the same bridge.
The ocelot switchdev driver uses the bridge_num it has allocated itself,
while the felix driver uses the bridge_num allocated by DSA. They are
both stored inside ocelot_port->bridge_num by the common function
ocelot_port_bridge_join() which receives the bridge_num passed by value.
Once we have a bridge_num, we can only use it to enforce isolation
between VLAN-unaware bridges. As far as I can see, ocelot does not have
anything like a FID that further makes VLAN 100 from a port be different
to VLAN 100 from another port with regard to FDB lookup. So we simply
deny multiple VLAN-aware bridges.
For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we
allocate a VLAN for each bridge_num. This will be used as the pvid of
each port that is under that VLAN-unaware bridge, for as long as that
bridge is VLAN-unaware.
VID 0 remains only for standalone ports. It is okay if all standalone
ports use the same VID 0, since they perform no address learning, the
FDB will contain no entry in VLAN 0, so the packets will always be
flooded to the only possible destination, the CPU port.
The CPU port module doesn't need to be member of the VLANs to receive
packets, but if we use the DSA tag_8021q protocol, those packets are
part of the data plane as far as ocelot is concerned, so there it needs
to. Just ensure that the DSA tag_8021q CPU port is a member of all
reserved VLANs when it is created, and is removed when it is deleted.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the logic in the Felix DSA driver and Ocelot switch library.
For Ocelot switches, the DEST_IDX that is the output of the MAC table
lookup is a logical port (equal to physical port, if no LAG is used, or
a dynamically allocated number otherwise). The allocation we have in
place for LAG IDs is different from DSA's, so we can't use that:
- DSA allocates a continuous range of LAG IDs starting from 1
- Ocelot appears to require that physical ports and LAG IDs are in the
same space of [0, num_phys_ports), and additionally, ports that aren't
in a LAG must have physical port id == logical port id
The implication is that an FDB entry towards a LAG might need to be
deleted and reinstalled when the LAG ID changes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ocelot switch library does not need this information, but the felix
DSA driver does.
As a reminder, the VSC9959 switch in LS1028A doesn't have an IRQ line
for packet extraction, so to be notified that a PTP packet needs to be
dequeued, it receives that packet also over Ethernet, by setting up a
packet trap. The Felix driver needs to install special kinds of traps
for packets in need of RX timestamps, such that the packets are
replicated both over Ethernet and over the CPU port module.
But the Ocelot switch library sets up more than one trap for PTP event
messages; it also traps PTP general messages, MRP control messages etc.
Those packets don't need PTP timestamps, so there's no reason for the
Felix driver to send them to the CPU port module.
By knowing which traps need PTP timestamps, the Felix driver can
adjust the traps installed using ocelot_trap_add() such that only those
will actually get delivered to the CPU port module.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using the ocelot-8021q tagging protocol, the CPU port isn't
configured as an NPI port, but is a regular port. So a "trap to CPU"
operation is actually a "redirect" operation. So DSA needs to set up the
trapping action one way or another, depending on the tagging protocol in
use.
To ease DSA's work of modifying the action, keep all currently installed
traps in a list, so that DSA can live-patch them when the tagging
protocol changes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MRP assist code installs a VCAP IS2 trapping rule for each port, but
since the key and the action is the same, just the ingress port mask
differs, there isn't any need to do this. We can save some space in the
TCAM by using a single filter and adjusting the ingress port mask.
Reuse the ocelot_trap_add() and ocelot_trap_del() functions for this
purpose.
Now that the cookies are no longer per port, we need to change the
allocation scheme such that MRP traps use a fixed number.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MRP frames are configured to be trapped to the CPU queue 7, and this
number is reflected in the extraction header. However, the information
isn't used anywhere, so just leave MRP frames to go to CPU queue 0
unless needed otherwise.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every use case that needed VCAP filters (in order: DSA tag_8021q, MRP,
PTP traps) has hardcoded filter identifiers that worked well enough for
that use case alone. But when two or more of those use cases would be
used together, some of those identifiers would overlap, leading to
breakage.
Add definitions for each cookie and centralize them in ocelot_vcap.h,
such that the overlaps are more obvious.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver uses an identifier equal to (ocelot->num_phys_ports + port)
for MRP traps installed when the system is in the role of an MRC, and an
identifier equal to (port) otherwise.
Use the same identifier in both cases as a consolidation for the various
cookie values spread throughout the driver.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ocelot_vlan_member_del() will free the struct ocelot_bridge_vlan, so if
this is the same as the port's pvid_vlan which we access afterwards,
what we're accessing is freed memory.
Fix the bug by determining whether to clear ocelot_port->pvid_vlan prior
to calling ocelot_vlan_member_del().
Fixes: d4004422f6 ("net: mscc: ocelot: track the port pvid using a pointer")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create and utilize bulk regmap reads instead of single access for gathering
stats. The background reading of statistics happens frequently, and over
a few contiguous memory regions.
High speed PCIe buses and MMIO access will probably see negligible
performance increase. Lower speed buses like SPI and I2C could see
significant performance increase, since the bus configuration and register
access times account for a large percentage of data transfer time.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Regmap supports bulk register reads. Ocelot does not. This patch adds
support for Ocelot to invoke bulk regmap reads. That will allow any driver
that performs consecutive reads over memory regions to optimize that
access.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ocelot_update_stats function only needs to read from one port, yet it
was updating the stats for all ports. Update to only read the stats that
are necessary.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An ongoing workqueue populates the stats buffer. At the same time, a user
might query the statistics. While writing to the buffer is mutex-locked,
reading from the buffer wasn't. This could lead to buggy reads by ethtool.
This patch fixes the former blamed commit, but the bug was introduced in
the latter.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Fixes: 1e1caa9735 ("ocelot: Clean up stats update deferred work")
Fixes: a556c76adc ("net: mscc: Add initial Ocelot switch support")
Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/all/20220210150451.416845-2-colin.foster@in-advantage.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The filters for the PTP trap keys are incorrectly configured, in the
sense that is2_entry_set() only looks at trap->key.ipv4.dport or
trap->key.ipv6.dport if trap->key.ipv4.proto or trap->key.ipv6.proto is
set to IPPROTO_TCP or IPPROTO_UDP.
But we don't do that, so is2_entry_set() goes through the "else" branch
of the IP protocol check, and ends up installing a rule for "Any IP
protocol match" (because msk is also 0). The UDP port is ignored.
This means that when we run "ptp4l -i swp0 -4", all IP traffic is
trapped to the CPU, which hinders bridging.
Fix this by specifying the IP protocol in the VCAP IS2 filters for PTP
over UDP.
Fixes: 96ca08c058 ("net: mscc: ocelot: set up traps for PTP packets")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang static analysis reports this issue
ocelot_flower.c:563:8: warning: 1st function call argument
is an uninitialized value
!is_zero_ether_addr(match.mask->dst)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The variable match is used before it is set. So move the
block.
Fixes: 75944fda1d ("net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the blamed commit, the call to the function
switchdev_bridge_port_offload was passing the wrong argument for
atomic_nb. It was ocelot_netdevice_nb instead of ocelot_swtchdev_nb.
This patch fixes this issue.
Fixes: 4e51bf44a0 ("net: bridge: move the switchdev object replay helpers to "push" mode")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following command sequence:
tc qdisc del dev swp0 clsact
tc qdisc add dev swp0 ingress_block 1 clsact
tc qdisc add dev swp1 ingress_block 1 clsact
tc filter add block 1 flower action drop
tc qdisc del dev swp0 clsact
produces the following NPD:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000014
pc : vcap_entry_set+0x14/0x70
lr : ocelot_vcap_filter_del+0x198/0x234
Call trace:
vcap_entry_set+0x14/0x70
ocelot_vcap_filter_del+0x198/0x234
ocelot_cls_flower_destroy+0x94/0xe4
felix_cls_flower_del+0x70/0x84
dsa_slave_setup_tc_block_cb+0x13c/0x60c
dsa_slave_setup_tc_block_cb_ig+0x20/0x30
tc_setup_cb_reoffload+0x44/0x120
fl_reoffload+0x280/0x320
tcf_block_playback_offloads+0x6c/0x184
tcf_block_unbind+0x80/0xe0
tcf_block_setup+0x174/0x214
tcf_block_offload_cmd.isra.0+0x100/0x13c
tcf_block_offload_unbind+0x5c/0xa0
__tcf_block_put+0x54/0x174
tcf_block_put_ext+0x5c/0x74
clsact_destroy+0x40/0x60
qdisc_destroy+0x4c/0x150
qdisc_put+0x70/0x90
qdisc_graft+0x3f0/0x4c0
tc_get_qdisc+0x1cc/0x364
rtnetlink_rcv_msg+0x124/0x340
The reason is that the driver isn't prepared to receive two tc filters
with the same cookie. It unconditionally creates a new struct
ocelot_vcap_filter for each tc filter, and it adds all filters with the
same identifier (cookie) to the ocelot_vcap_block.
The problem is here, in ocelot_vcap_filter_del():
/* Gets index of the filter */
index = ocelot_vcap_block_get_filter_index(block, filter);
if (index < 0)
return index;
/* Delete filter */
ocelot_vcap_block_remove_filter(ocelot, block, filter);
/* Move up all the blocks over the deleted filter */
for (i = index; i < block->count; i++) {
struct ocelot_vcap_filter *tmp;
tmp = ocelot_vcap_block_find_filter_by_index(block, i);
vcap_entry_set(ocelot, i, tmp);
}
what will happen is ocelot_vcap_block_get_filter_index() will return the
index (@index) of the first filter found with that cookie. This is _not_
the index of _this_ filter, but the other one with the same cookie,
because ocelot_vcap_filter_equal() gets fooled.
Then later, ocelot_vcap_block_remove_filter() is coded to remove all
filters that are ocelot_vcap_filter_equal() with the passed @filter.
So unexpectedly, both filters get deleted from the list.
Then ocelot_vcap_filter_del() will attempt to move all the other filters
up, again finding them by index (@i). The block count is 2, @index was 0,
so it will attempt to move up filter @i=0 and @i=1. It assigns tmp =
ocelot_vcap_block_find_filter_by_index(block, i), which is now a NULL
pointer because ocelot_vcap_block_remove_filter() has removed more than
one filter.
As far as I can see, this problem has been there since the introduction
of tc offload support, however I cannot test beyond the blamed commit
due to hardware availability. In any case, any fix cannot be backported
that far, due to lots of changes to the code base.
Therefore, let's go for the correct solution, which is to not call
ocelot_vcap_filter_add() and ocelot_vcap_filter_del(), unless the filter
is actually unique and not shared. For the shared filters, we should
just modify the ingress port mask and call ocelot_vcap_filter_replace(),
a function introduced by commit 95706be13b ("net: mscc: ocelot: create
a function that replaces an existing VCAP filter"). This way,
block->rules will only contain filters with unique cookies, by design.
Fixes: 07d985eef0 ("net: dsa: felix: Wire up the ocelot cls_flower methods")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for flushing the MAC table on a given port in the ocelot
switch library, and use this functionality in the felix DSA driver.
This operation is needed when a port leaves a bridge to become
standalone, and when the learning is disabled, and when the STP state
changes to a state where no FDB entry should be present.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220107144229.244584-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Assuming the test setup described here:
https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/
(swp1 and swp2 are in bond0, and bond0 is in a bridge with swp0)
it can be seen that when swp1 goes down (on either board A or B), then
traffic that should go through that port isn't forwarded anywhere.
A dump of the PGID table shows the following:
PGID_DST[0] = ports 0
PGID_DST[1] = ports 1
PGID_DST[2] = ports 2
PGID_DST[3] = ports 3
PGID_DST[4] = ports 4
PGID_DST[5] = ports 5
PGID_DST[6] = no ports
PGID_AGGR[0] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[1] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[2] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[3] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[4] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[5] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[6] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[7] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[8] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[9] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[10] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[11] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[12] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[13] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[14] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[15] = ports 0, 1, 2, 3, 4, 5
PGID_SRC[0] = ports 1, 2
PGID_SRC[1] = ports 0
PGID_SRC[2] = ports 0
PGID_SRC[3] = no ports
PGID_SRC[4] = no ports
PGID_SRC[5] = no ports
PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
Whereas a "good" PGID configuration for that setup should have looked
like this:
PGID_DST[0] = ports 0
PGID_DST[1] = ports 1, 2
PGID_DST[2] = ports 1, 2
PGID_DST[3] = ports 3
PGID_DST[4] = ports 4
PGID_DST[5] = ports 5
PGID_DST[6] = no ports
PGID_AGGR[0] = ports 0, 2, 3, 4, 5
PGID_AGGR[1] = ports 0, 2, 3, 4, 5
PGID_AGGR[2] = ports 0, 2, 3, 4, 5
PGID_AGGR[3] = ports 0, 2, 3, 4, 5
PGID_AGGR[4] = ports 0, 2, 3, 4, 5
PGID_AGGR[5] = ports 0, 2, 3, 4, 5
PGID_AGGR[6] = ports 0, 2, 3, 4, 5
PGID_AGGR[7] = ports 0, 2, 3, 4, 5
PGID_AGGR[8] = ports 0, 2, 3, 4, 5
PGID_AGGR[9] = ports 0, 2, 3, 4, 5
PGID_AGGR[10] = ports 0, 2, 3, 4, 5
PGID_AGGR[11] = ports 0, 2, 3, 4, 5
PGID_AGGR[12] = ports 0, 2, 3, 4, 5
PGID_AGGR[13] = ports 0, 2, 3, 4, 5
PGID_AGGR[14] = ports 0, 2, 3, 4, 5
PGID_AGGR[15] = ports 0, 2, 3, 4, 5
PGID_SRC[0] = ports 1, 2
PGID_SRC[1] = ports 0
PGID_SRC[2] = ports 0
PGID_SRC[3] = no ports
PGID_SRC[4] = no ports
PGID_SRC[5] = no ports
PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
In other words, in the "bad" configuration, the attempt is to remove the
inactive swp1 from the destination ports via PGID_DST. But when a MAC
table entry is learned, it is learned towards PGID_DST 1, because that
is the logical port id of the LAG itself (it is equal to the lowest
numbered member port). So when swp1 becomes inactive, if we set
PGID_DST[1] to contain just swp1 and not swp2, the packet will not have
any chance to reach the destination via swp2.
The "correct" way to remove swp1 as a destination is via PGID_AGGR
(remove swp1 from the aggregation port groups for all aggregation
codes). This means that PGID_DST[1] and PGID_DST[2] must still contain
both swp1 and swp2. This makes the MAC table still treat packets
destined towards the single-port LAG as "multicast", and the inactive
ports are removed via the aggregation code tables.
The change presented here is a design one: the ocelot_get_bond_mask()
function used to take an "only_active_ports" argument. We don't need
that. The only call site that specifies only_active_ports=true,
ocelot_set_aggr_pgids(), must retrieve the entire bonding mask, because
it must program that into PGID_DST. Additionally, it must also clear the
inactive ports from the bond mask here, which it can't do if bond_mask
just contains the active ports:
ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i);
ac &= ~bond_mask; <---- here
/* Don't do division by zero if there was no active
* port. Just make all aggregation codes zero.
*/
if (num_active_ports)
ac |= BIT(aggr_idx[i % num_active_ports]);
ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i);
So it becomes the responsibility of ocelot_set_aggr_pgids() to take
ocelot_port->lag_tx_active into consideration when populating the
aggr_idx array.
Fixes: 23ca3b727e ("net: mscc: ocelot: rebalance LAGs on link up/down events")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220107164332.402133-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add index to flow_action_entry structure and delete index from police and
gate child structure.
We make this change to offload tc action for driver to identify a tc
action.
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to get mac from device-tree using of_get_ethdev_address.
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 94dd016ae5 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP
ioctl to active device") the user could get bond active interface's
PHC index directly. But when there is a failover, the bond active
interface will change, thus the PHC index is also changed. This may
break the user's program if they did not update the PHC timely.
This patch adds a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX.
When the user wants to get the bond active interface's PHC, they need to
add this flag and be aware the PHC index may be changed.
With the new flag. All flag checks in current drivers are removed. Only
the checking in net_hwtstamp_validate() is kept.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dma_addr was declared using DEFINE_DMA_UNMAP_ADDR() which requires to
use dma_unmap_addr() to access it.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 753a026cfe ("net: ocelot: add FDMA support")
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethernet frames can be extracted or injected autonomously to or from
the device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list
data structures in memory are used for injecting or extracting Ethernet
frames. The FDMA generates interrupts when frame extraction or
injection is done and when the linked lists need updating.
The FDMA is shared between all the ethernet ports of the switch and
uses a linked list of descriptors (DCB) to inject and extract packets.
Before adding descriptors, the FDMA channels must be stopped. It would
be inefficient to do that each time a descriptor would be added so the
channels are restarted only once they stopped.
Both channels uses ring-like structure to feed the DCBs to the FDMA.
head and tail are never touched by hardware and are completely handled
by the driver. On top of that, page recycling has been added and is
mostly taken from gianfar driver.
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Co-developed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit adds support for changing MTU for the ocelot register based
interface. For ocelot, JUMBO frame size can be set up to 25000 bytes
but has been set to 9000 which is a saner value and allows for maximum
gain of performance with FDMA.
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In order to support PTP in FDMA, PTP handling code is needed. Since
this is the same as for register-based extraction, export it with
a new ocelot_ptp_rx_timestamp() function.
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
FDMA will need this code to prepare the injection frame header when
sending SKBs. Move this code into ocelot_ifh_port_set() and add
conditional IFH setting for vlan and rew op if they are not set.
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move these to a separate file will allow them to be shared to other
drivers.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If err is true, the function will be returned, but mutex_lock isn't
released.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver doesn't support RX timestamping for non-PTP packets, but it
declares that it does. Restrict the reported RX filters to PTP v2 over
L2 and over L4.
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IEEE 1588 support was declared too soon for the Ocelot switch. Out of
reset, this switch does not apply any special treatment for PTP packets,
i.e. when an event message is received, the natural tendency is to
forward it by MAC DA/VLAN ID. This poses a problem when the ingress port
is under a bridge, since user space application stacks (written
primarily for endpoint ports, not switches) like ptp4l expect that PTP
messages are always received on AF_PACKET / AF_INET sockets (depending
on the PTP transport being used), and never being autonomously
forwarded. Any forwarding, if necessary (for example in Transparent
Clock mode) is handled in software by ptp4l. Having the hardware forward
these packets too will cause duplicates which will confuse endpoints
connected to these switches.
So PTP over L2 barely works, in the sense that PTP packets reach the CPU
port, but they reach it via flooding, and therefore reach lots of other
unwanted destinations too. But PTP over IPv4/IPv6 does not work at all.
This is because the Ocelot switch have a separate destination port mask
for unknown IP multicast (which PTP over IP is) flooding compared to
unknown non-IP multicast (which PTP over L2 is) flooding. Specifically,
the driver allows the CPU port to be in the PGID_MC port group, but not
in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from
Allan Nielsen which explain that the embedded MIPS CPU on Ocelot
switches is not very powerful at all, so every penny they could save by
not allowing flooding to the CPU port module matters. Unknown IP
multicast did not make it.
The de facto consensus is that when a switch is PTP-aware and an
application stack for PTP is running, switches should have some sort of
trapping mechanism for PTP packets, to extract them from the hardware
data path. This avoids both problems:
(a) PTP packets are no longer flooded to unwanted destinations
(b) PTP over IP packets are no longer denied from reaching the CPU since
they arrive there via a trap and not via flooding
It is not the first time when this change is attempted. Last time, the
feedback from Allan Nielsen and Andrew Lunn was that the traps should
not be installed by default, and that PTP-unaware switching may be
desired for some use cases:
https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/
To address that feedback, the present patch adds the necessary packet
traps according to the RX filter configuration transmitted by user space
through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we
keep 5 filters, which are amended each time RX timestamping is enabled
or disabled on a port:
- 1 for PTP over L2
- 2 for PTP over IPv4 (UDP ports 319 and 320)
- 2 for PTP over IPv6 (UDP ports 319 and 320)
The cookie by which these filters (invisible to tc) are identified is
strategically chosen such that it does not collide with the filters used
for the ocelot-8021q tagging protocol by the Felix driver, or with the
MRP traps set up by the Ocelot library.
Other alternatives were considered, like patching user space to do
something, but there are so many ways in which PTP packets could be made
to reach the CPU, generically speaking, that "do what?" is a very valid
question. The ptp4l program from the linuxptp stack already attempts to
do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and
PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into
a dev_mc_add() on the interface, in the kernel:
https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c
Reality shows that this is not sufficient in case the interface belongs
to a switchdev driver, as dev_mc_add() does not show the intention to
trap a packet to the CPU, but rather the intention to not drop it (it is
strictly for RX filtering, same as promiscuous does not mean to send all
traffic to the CPU, but to not drop traffic with unknown MAC DA). This
topic is a can of worms in itself, and it would be great if user space
could just stay out of it.
On the other hand, setting up PTP traps privately within the driver is
not new by any stretch of the imagination:
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21
So this is the approach taken here as well. The difference here being
that we prepare and destroy the traps per port, dynamically at runtime,
as opposed to driver init time, because apparently, PTP-unaware
forwarding is a use case.
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Reported-by: Po Liu <po.liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind
tc flower offload on ocelot, among other things. The ingress port mask
on which VCAP rules match is present as a bit field in the actual key of
the rule. This means that it is possible for a rule to be shared among
multiple source ports. When the rule is added one by one on each desired
port, that the ingress port mask of the key must be edited and rewritten
to hardware.
But the API in ocelot_vcap.c does not allow for this. For one thing,
ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric,
because ocelot_vcap_filter_add() works with a preallocated and
prepopulated filter and programs it to hardware, and
ocelot_vcap_filter_del() does both the job of removing the specified
filter from hardware, as well as kfreeing it. That is to say, the only
option of editing a filter in place, which is to delete it, modify the
structure and add it back, does not work because it results in
use-after-free.
This patch introduces ocelot_vcap_filter_replace, which trivially
reprograms a VCAP entry to hardware, at the exact same index at which it
existed before, without modifying any list or allocating any memory.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ocelot driver, when asked to timestamp all receiving packets, 1588
v1 or NTP, says "nah, here's 1588 v2 for you".
According to this discussion:
https://patchwork.kernel.org/project/netdevbpf/patch/20211104133204.19757-8-martin.kaistra@linutronix.de/#24577647
drivers that downgrade from a wider request to a narrower response (or
even a response where the intersection with the request is empty) are
buggy, and should return -ERANGE instead. This patch fixes that.
Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The VSC9959 switch embedded within NXP LS1028A (and that version of
Ocelot switches only) supports cut-through forwarding - meaning it can
start the process of looking up the destination ports for a packet, and
forward towards those ports, before the entire packet has been received
(as opposed to the store-and-forward mode).
The up side is having lower forwarding latency for large packets. The
down side is that frames with FCS errors are forwarded instead of being
dropped. However, erroneous frames do not result in incorrect updates of
the FDB or incorrect policer updates, since these processes are deferred
inside the switch to the end of frame. Since the switch starts the
cut-through forwarding process after all packet headers (including IP,
if any) have been processed, packets with large headers and small
payload do not see the benefit of lower forwarding latency.
There are two cases that need special attention.
The first is when a packet is multicast (or flooded) to multiple
destinations, one of which doesn't have cut-through forwarding enabled.
The switch deals with this automatically by disabling cut-through
forwarding for the frame towards all destination ports.
The second is when a packet is forwarded from a port of lower link speed
towards a port of higher link speed. This is not handled by the hardware
and needs software intervention.
Since we practically need to update the cut-through forwarding domain
from paths that aren't serialized by the rtnl_mutex (phylink
mac_link_down/mac_link_up ops), this means we need to serialize physical
link events with user space updates of bonding/bridging domains.
Enabling cut-through forwarding is done per {egress port, traffic class}.
I don't see any reason why this would be a configurable option as long
as it works without issues, and there doesn't appear to be any user
space configuration tool to toggle this on/off, so this patch enables
cut-through forwarding on all eligible ports and traffic classes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211125125808.2383984-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The only called takes ocelot_port->bridge and passes it as the "bridge"
argument to this function, which then compares it with
ocelot_port->bridge. This is not useful.
Instead, we would like this function to return 0 if ocelot_port->bridge
is not present, which is what this patch does.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20211125125808.2383984-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
PSFP rules take effect on the streams from any port of VSC9959 switch.
This patch use ingress port to limit the rule only active on this port.
Each stream can only match two ingress source ports in VSC9959. Streams
from lowest port gets the configuration of SFID pointed by MAC Table
lookup and streams from highest port gets the configuration of (SFID+1)
pointed by MAC Table lookup. This patch defines the PSFP rule on highest
port as dummy rule, which means that it does not modify the MAC table.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Policer was previously automatically assigned from the highest index to
the lowest index from policer pool. But police action of tc flower now
uses index to set an police entry. This patch uses the police index to
set vcap policers, so that one policer can be shared by multiple rules.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PSFP support gate and police action. This patch add the gate and police
action to flower parse action, check chain ID to determine which block
to offload. Adding psfp callback functions to add, delete and update gate
and police in PSFP table if hardware supports it.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some chips in the ocelot series such as VSC9959 support Per-Stream
Filtering and Policing(PSFP), which is processing after VCAP blocks.
We set this block on chain 30000 and set vcap IS2 chain to goto PSFP
chain if hardware support.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ocelot_mact_learn_streamdata() can be used in VSC9959 to overwrite an
FDB entry with stream data. The stream data includes SFID and SSID which
can be used for PSFP and FRER set.
ocelot_mact_lookup() can be used to check if the given {DMAC, VID} FDB
entry is exist, and also can retrieve the DEST_IDX and entry type for
the FDB entry.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ocelot_net has no special behaviour in its validation implementation, so
can be switched to phylink_generic_validate().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Populate the phy interface mode bitmap for the MSCC Ocelot driver with
the interface modes supported by the MAC.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA would like to remove the rtnl_lock from its
SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE handlers, and the felix driver uses
the same MAC table functions as ocelot.
This means that the MAC table functions will no longer be implicitly
serialized with respect to each other by the rtnl_mutex, we need to add
a dedicated lock in ocelot for the non-atomic operations of selecting a
MAC table row, reading/writing what we want and polling for completion.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>