Make sure we don't try to transition the fw_status_ready
while we're still in the FW_STOPPING state, else we can
get stuck in limbo waiting on a transition that already
happened.
While we're here we can remove a superfluous check on
the lif pointer.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some graceful updates that get initially triggered by the
RESET event, especially with older firmware, the fw_generation
bits don't change but the fw_status is seen to go to 0 then back
to 1. However, the driver didn't perform the restart, remained
waiting for fw_generation to change, and got left in limbo.
This is because the clearing of idev->fw_status_ready to 0
didn't happen correctly as it was buried in the transition
trigger: since the transition down was triggered not here
but in the RESET event handler, the clear to 0 didn't happen,
so the transition back to 1 wasn't detected.
Fix this particular case by bringing the setting of
idev->fw_status_ready back out to where it was before.
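As a rough illustration only, and not the driver's actual code, the idea is that the cached ready flag is updated on every status check, no matter who triggers the fw_down work. The names dev_sketch, fw_action and fw_status_check() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct dev_sketch {
	bool fw_status_ready;	/* last observed FW ready state */
};

enum fw_action { FW_NONE, FW_WENT_DOWN, FW_CAME_UP };

/*
 * Record the new ready state on every check, whether or not this
 * caller is the one that kicks off the down transition, so that a
 * later 0 -> 1 change is still seen as a transition.
 */
static enum fw_action fw_status_check(struct dev_sketch *d, bool ready_now)
{
	enum fw_action act = FW_NONE;

	if (d->fw_status_ready != ready_now) {
		act = ready_now ? FW_CAME_UP : FW_WENT_DOWN;
		d->fw_status_ready = ready_now;
	}
	return act;
}
```

With the flag always kept current, a 1 -> 0 -> 1 sequence reports both a down and an up transition even when fw_generation stays the same.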
Fixes: 398d1e37f9 ("ionic: add FW_STOPPING state")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This (ab)use of a data buffer made some static code checkers
rather itchy, so we replace the generic data buffer with
the union in the struct ionic_vf_setattr_cmd.
Fixes: fbb39807e9 ("ionic: support sr-iov operations")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver can be premature in detecting stalled firmware
when the heartbeat is not updated because the firmware can
occasionally take a long time (more than 2 seconds) to service
a request, and doesn't update the heartbeat during that time.
The firmware heartbeat is not necessarily a steady 1 second
periodic beat, but better described as something that should
progress at least once in every DEVCMD_TIMEOUT period.
The single-threaded design in the FW means that if a devcmd
or adminq request launches a large internal job, it is stuck
waiting for that job to finish before it can get back to
updating the heartbeat. Since all requests are "guaranteed"
to finish within the DEVCMD_TIMEOUT period, the driver needs
to be less aggressive in checking the heartbeat progress.
We change our current 2 second window to something bigger than
DEVCMD_TIMEOUT which should take care of most of the issue.
We stop checking for the heartbeat while waiting for a request,
as long as we're still watching for the FW status. Lastly,
we make sure our FW status is up to date before running a
devcmd request.
Once we do this, we need to not check the heartbeat on DEV
commands because it may be stalled while we're on the fw_down
path. Instead, we can rely on the is_fw_running check.
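A minimal sketch of the wider heartbeat window, under stated assumptions: the timeout value, the window size, and the names hb_state and hb_check() are illustrative and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEVCMD_TIMEOUT_S	5			/* assumed value, in seconds */
#define HB_WINDOW_S		(DEVCMD_TIMEOUT_S * 2)	/* assumed: anything bigger than DEVCMD_TIMEOUT */

struct hb_state {
	uint32_t last_hb;	/* last heartbeat value seen */
	uint64_t last_change_s;	/* time at which that value was first seen */
};

/* Returns true if the FW still looks alive, false if it looks stalled. */
static bool hb_check(struct hb_state *s, uint32_t hb, uint64_t now_s)
{
	if (hb != s->last_hb) {
		/* The heartbeat progressed: restart the window. */
		s->last_hb = hb;
		s->last_change_s = now_s;
		return true;
	}
	/* No progress yet: only a stall once the whole window has passed. */
	return (now_s - s->last_change_s) <= HB_WINDOW_S;
}
```

In this sketch a heartbeat that sits still for 3 seconds is tolerated, where a 2 second window would have flagged it as stalled.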
Fixes: b2b9a8d7ed ("ionic: avoid races in ionic_heartbeat_check")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when an administrator configures a VF via ndo_set_vf*,
the driver will send the set command to FW and then update the
cached value. The cached value is then used when reporting
VF info via ndo_get_vf_config.
A problem is that the VF info may have been updated between
the last ndo_set_vf* and ndo_get_vf_config commands via some
other method, e.g. a VF changes its MAC address (assuming it's
allowed to do so) and since this is all managed by the FW,
this new value won't be reflected in the PF's cache of values.
To fix this, update the driver to always get the latest VF
information by making use of the IONIC_CMD_VF_GETATTR dev
command. The FW may not support getting all the attributes for
IONIC_CMD_VF_GETATTR, so the driver will only update the cached
VF config members if their associated IONIC_CMD_VF_GETATTR
was successful. Otherwise the cached VF config members will
remain the same as what was set in ndo_set_vf*.
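The "only update on success" rule can be sketched as below. This is illustrative only: vf_cache, attr_reply and vf_cache_refresh() are hypothetical names, and the real IONIC_CMD_VF_GETATTR request and reply layout is not shown.

```c
#include <stdint.h>

/* Hypothetical cached VF config, as last set through ndo_set_vf*. */
struct vf_cache {
	uint16_t vlanid;
	uint32_t maxrate;
};

/* Hypothetical result of one attribute query: rc == 0 means the FW answered. */
struct attr_reply {
	int rc;
	uint32_t val;
};

/* Refresh each cached member only if its own query succeeded. */
static void vf_cache_refresh(struct vf_cache *c,
			     struct attr_reply vlan,
			     struct attr_reply rate)
{
	if (vlan.rc == 0)
		c->vlanid = (uint16_t)vlan.val;
	if (rate.rc == 0)
		c->maxrate = rate.val;
}
```

A failed query for one attribute leaves that member at its previously set value and does not block the others from being refreshed.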
Fixes: fbb39807e9 ("ionic: support sr-iov operations")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
When IONIC_EVENT_RESET is received, we only need to start the
fw_down process if we aren't already down, and we need to be
sure to set the FW_STOPPING state on the way.
If this is how we noticed that FW was stopped, it is most
likely from a FW update, and we'll see a new FW generation.
The update happens quickly enough that we might not see
fw_status==0, so we need to be sure things get restarted when
we see the fw_generation change.
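A small sketch of the RESET event handling described above, assuming a simplified state struct; lif_sketch and on_reset_event() are hypothetical names and the real driver keeps this state in different fields.

```c
#include <stdbool.h>

struct lif_sketch {
	bool fw_reset;		/* fw_down has completed */
	bool fw_stopping;	/* fw_down has been requested */
	int fw_down_queued;	/* how many times fw_down work was queued */
};

/* Returns true if this event started the fw_down process. */
static bool on_reset_event(struct lif_sketch *lif)
{
	if (lif->fw_reset || lif->fw_stopping)
		return false;		/* already down, or on the way down */

	lif->fw_stopping = true;	/* set the state on the way */
	lif->fw_down_queued++;
	return true;
}
```

A second RESET event arriving while the first is still being handled is ignored, so fw_down is queued only once.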
Fixes: d2662072c0 ("ionic: monitor fw status generation")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Between the FW running and the FW actually being stopped in reset,
we need a fw_stopping concept to catch and block some actions while
we're transitioning to the FW_RESET state. This will help to be
sure the fw_up task is not scheduled until after the fw_down
task has completed.
With some rare timing, it is possible for the fw_up task
to try to run before the fw_down task, then not get run after
the fw_down task has run, leaving the device in a down state.
This is possible if the watchdog goes off in between finding the
down transition and starting the fw_down task, where the later
watchdog sees the FW is back up and schedules a fw_up task.
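The ordering rule can be sketched as a simple gate in the watchdog path. The bit names and may_schedule_fw_up() are assumptions made for illustration, not the driver's own definitions.

```c
#include <stdbool.h>

#define ST_FW_RESET	(1u << 0)	/* fw_down task has completed */
#define ST_FW_STOPPING	(1u << 1)	/* down seen, fw_down task not finished */

/* Decide whether the watchdog may queue the fw_up task. */
static bool may_schedule_fw_up(unsigned int state, bool fw_running)
{
	if (state & ST_FW_STOPPING)
		return false;	/* fw_down has not finished yet */

	return fw_running && (state & ST_FW_RESET) != 0;
}
```

A later watchdog that sees the FW back up while the stopping bit is still set does nothing, and a following watchdog queues fw_up once fw_down has finished.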
Fixes: c672412f61 ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible the FW is already shutting down while the driver is being
removed and/or when the driver is going through reset. This can cause
unexpected/unnecessary errors to be printed:
eth0: DEV_CMD IONIC_CMD_PORT_RESET (12) error, IONIC_RC_ERROR (29) failed
eth1: DEV_CMD IONIC_CMD_RESET (3) error, IONIC_RC_ERROR (29) failed
Fix this by checking the FW status register before issuing the reset
commands.
Also, since err may not be assigned in ionic_port_reset(), assign it a
default value of 0, and remove an unnecessary log message.
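A sketch of the guarded reset, for illustration only: port_reset_sketch() and send_reset_stub() are hypothetical, and the stub simply returns an arbitrary error to stand in for a command sent to stopped FW.

```c
#include <stdbool.h>

/* Stand-in for sending the real reset command; fails as stopped FW would. */
static int send_reset_stub(void)
{
	return -5;
}

/* Only issue the reset when the FW status says it is running. */
static int port_reset_sketch(bool fw_running, int (*send_reset)(void))
{
	int err = 0;	/* default: nothing sent, nothing failed */

	if (fw_running)
		err = send_reset();

	return err;
}
```

When the FW is already down the command is never sent, so there is no error to log and the caller sees 0.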
Fixes: fbfb803153 ("ionic: Add hardware init and device commands")
Signed-off-by: Brett Creeley <brett@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull the watchdog init code out to a separate bite-sized
function. This is only code cleanup for now, but it will be
useful in the near future.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The watchdog expects the lif to fully exist when it goes off,
so let's not start the watchdog until all is ready in case there
is some quirky time dilation that makes probe take multiple
seconds.
Fixes: 089406bc5a ("ionic: add a watchdog timer to monitor heartbeat")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
These debug stats are not really useful, their collection is
likely detrimental to performance, and they suck up a lot
of memory which never gets used if no one ever enables the
priv-flag to print them, so just remove these bits.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to separate the atomic needs of __dev_uc_sync()
and __dev_mc_sync() from the safe rx_mode handling, we need
to have the ndo handler manipulate the driver's filter list,
and later have the driver sync the filters to the firmware,
outside of the atomic context.
Here we put __dev_mc_sync() and __dev_uc_sync() back into the
ndo callback to give them their netif_addr_lock context and
have them update the driver's filter list, flagging changes
that should be made to the device filter list. Later, in the
rx_mode handler, we read those hints and sync up the device's
list as needed.
It is possible for multiple add/delete requests to come from
the stack before the rx_mode task processes the list, but the
handling of the sync status flag should keep everything sorted
correctly. For example, if a delete of an existing filter is
followed by another add before the rx_mode task is run, as can
happen when going in and out of a bond, the add will cancel
the delete and no actual changes will be sent to the device.
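One way to picture the sync status flag is a small per-filter state machine. This is a sketch under assumptions: the state names and the helpers mark_add(), mark_del() and sync_filter() are hypothetical and the driver's real list handling is more involved.

```c
enum f_state { F_NONE, F_NEW, F_SYNCED, F_OLD };

struct filter {
	enum f_state state;
};

/* ndo callback side: only flag the change, never talk to the device. */
static void mark_add(struct filter *f)
{
	if (f->state == F_NONE)
		f->state = F_NEW;	/* must be pushed to the device */
	else if (f->state == F_OLD)
		f->state = F_SYNCED;	/* the add cancels a pending delete */
}

static void mark_del(struct filter *f)
{
	if (f->state == F_NEW)
		f->state = F_NONE;	/* never reached the device, just drop it */
	else if (f->state == F_SYNCED)
		f->state = F_OLD;	/* must be removed from the device */
}

/* rx_mode task side: returns the number of device commands issued. */
static int sync_filter(struct filter *f)
{
	switch (f->state) {
	case F_NEW:
		f->state = F_SYNCED;
		return 1;
	case F_OLD:
		f->state = F_NONE;
		return 1;
	default:
		return 0;
	}
}
```

In the bond example, mark_del() followed by mark_add() on a synced filter leaves it synced, and sync_filter() then issues no device command.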
We also add a check in the watchdog to see if there are any
stray unsync'd filters, possibly left over from a filter
overflow and waiting to get sync'd after some other filter
gets removed to make room.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The top 4 bits of the fw_status in dev_info_regs are reserved
for the status generation. This generation number is an
arbitrary value defined when firmware starts up. If the FW
is killed/crashed/stopped and then restarted, it will create
a different generation number. With this mechanism, the host
driver can detect that the FW has crashed and restarted, and
the driver can then take steps to re-initialize its connection.
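As a sketch of the generation check, assuming only what is stated above (the top 4 bits of the 8-bit fw_status carry the generation); the macro and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define FW_STS_GENERATION	0xF0u	/* top 4 bits of fw_status */

/* Extract the generation bits, leaving them in place in the byte. */
static uint8_t fw_generation(uint8_t fw_status)
{
	return (uint8_t)(fw_status & FW_STS_GENERATION);
}

/* A changed generation means the FW went away and came back. */
static bool fw_restarted(uint8_t old_status, uint8_t new_status)
{
	return fw_generation(old_status) != fw_generation(new_status);
}
```

Only the generation bits are compared, so a change in the low status bits alone is not treated as a restart.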
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are changes to compile and link the new code, but no
new feature support is available or advertised yet.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rework the heartbeat checks to be sure that we're getting an
atomic operation. Through testing we found occasions where a
separate thread could clash with this check and cause erroneous
heartbeat check results.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Block some actions while the FW is in a reset activity
and the queues are not configured.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up a couple of struct uses to make for better fast path
access.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove memory allocation fail messages where the OOM stack
trace will make it obvious which allocation request failed.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With a few more uses of true and false in function calls, we
need to give them some useful names so we can tell from the
calling point what we're doing.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The sparse complaints around the static_asserts were obscuring
more useful complaints. So, don't check the static_asserts,
and fix the remaining sparse complaints.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In one corner case scenario, the driver device lif setup can
get delayed such that the ionic_watchdog_cb() timer goes off
before the ionic->lif is set, thus causing a NULL pointer panic.
We catch the problem by checking for a NULL lif just a little
earlier in the callback.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to be better at making sure we don't have a link check
watchdog go off while we're shutting things down, so let's stop
the timer as soon as we start the remove.
Meanwhile, since that was the only thing in
ionic_dev_teardown(), simplify and remove that function.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The in_interrupt() usage in this driver tries to figure out which context
may sleep and which context may not sleep. in_interrupt() is not really
suitable as it misses both preemption disabled and interrupt disabled
invocations from task context.
Conditionals like that in driver code are frowned upon in general because
invocations of functions from invalid contexts might not be detected
as the conditional papers over it.
ionic_lif_addr() and _ionic_lif_rx_mode() can be called from:
1) ->ndo_set_rx_mode() which is under netif_addr_lock_bh() so it must not
sleep.
2) Init and setup functions which are in fully preemptible task context.
ionic_link_status_check_request() has two call paths:
1) NAPI which obviously cannot sleep
2) Setup which is again fully preemptible task context
Add arguments which convey the execution context to the affected functions
and let the callers provide the context instead of letting the functions
deduce it.
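The shape of the change can be sketched as follows. The struct rx_mode_log and lif_rx_mode_sketch() are hypothetical and only count which path was taken; the real functions do the actual filter work or defer it.

```c
#include <stdbool.h>

/* Counts which path was taken, standing in for the real work. */
struct rx_mode_log {
	int ran_inline;
	int deferred;
};

/* The caller states its context; the function no longer guesses. */
static void lif_rx_mode_sketch(struct rx_mode_log *log, bool can_sleep)
{
	if (can_sleep)
		log->ran_inline++;	/* task context: safe to wait on the FW */
	else
		log->deferred++;	/* atomic context: hand off to deferred work */
}
```

The ndo_set_rx_mode() path would pass false and the init and setup paths would pass true, which makes a call from a wrong context visible at the call site.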
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some unnecessary struct fields and related code.
Co-developed-by: Neel Patel <neel@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use index counters rather than pointers for tracking head
and tail in the queues to save a little memory and perhaps
to speed up queue processing slightly.
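A minimal index-based ring, as an illustration of the technique rather than the driver's queue code. The ring size, the power-of-two masking, and all names here are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u	/* assumed power of two so that masking works */

struct ring {
	uint16_t head_idx;	/* next slot to post into */
	uint16_t tail_idx;	/* next slot to complete */
};

static bool ring_empty(const struct ring *r)
{
	return r->head_idx == r->tail_idx;
}

/* Free slots; one slot is kept unused to tell full from empty. */
static unsigned int ring_space(const struct ring *r)
{
	return (unsigned int)(r->tail_idx - r->head_idx - 1) & (RING_SIZE - 1);
}

static void ring_post(struct ring *r)
{
	r->head_idx = (uint16_t)((r->head_idx + 1) & (RING_SIZE - 1));
}

static void ring_complete(struct ring *r)
{
	r->tail_idx = (uint16_t)((r->tail_idx + 1) & (RING_SIZE - 1));
}
```

Two 16-bit indexes take less room than two pointers, and advancing is an add and a mask instead of a pointer compare against the end of the ring.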
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we aren't yet supporting multiple lifs, we can remove
complexity by removing the list concept and related code,
to be re-engineered later when actually needed.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The version 1 Tx queues can use longer SG lists than the
original version 0 queues, but we need to check to see if the
firmware supports the v1 Tx queues. This implements the queue
type query for all queue types, and uses the information to
set up for using the longer Tx SG lists.
Because the Tx SG list can be longer, we need to limit the
max ring length to be sure we stay inside the boundaries of a
DMA allocation max size, so we lower the max Tx ring size.
The driver sets its highest known version in the Q_IDENTITY
command, and the FW returns the highest version that it knows,
bounded by the driver's version. The negotiated version number
is later used in the Q_INIT commands.
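The negotiation amounts to taking the lower of the two highest known versions. A sketch, with negotiate_qver() as a hypothetical helper name:

```c
#include <stdint.h>

/*
 * The driver offers its highest known version, the FW answers with
 * its own highest version bounded by the driver's offer.
 */
static uint8_t negotiate_qver(uint8_t drv_max, uint8_t fw_max)
{
	return fw_max < drv_max ? fw_max : drv_max;
}
```

For example, a driver that knows v1 talking to FW that only knows v0 ends up using v0 and the shorter Tx SG lists.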
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the FW RESET event comes to the driver from the firmware,
or the fw_status goes to 0 (stopped) or to 0xff (no PCI
connection), then shut down the driver activity. This event
signals a FW upgrade where we need to quiesce all operations and
wait for the FW to restart. The FW will continue the update
process once it sees all the LIFs are reset. When the update
process is done it will set the fw_status back to RUNNING.
Meanwhile, the heartbeat check continues and when the fw_status
is seen as set to running we can restart the driver operations.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a link_status_check to the heartbeat watchdog.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fw_status field is only 8 bits, so fix the read. Also,
we only want to look at the one status bit, to allow for future
use of the other bits, and watch for a bad PCI read.
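A sketch of the narrower status check. The running bit being bit 0 is an assumption made for illustration, as are the names; the 0xff value for a bad PCI read follows from an all-ones read of an 8-bit register.

```c
#include <stdbool.h>
#include <stdint.h>

#define FW_STS_RUNNING	0x01u	/* assumed position of the one status bit */

/* fw_status is read as a single byte; only one bit decides "running". */
static bool fw_is_running(uint8_t fw_status)
{
	if (fw_status == 0xff)
		return false;	/* all ones: the PCI read itself failed */

	return (fw_status & FW_STS_RUNNING) != 0;
}
```

Other bits may be set or clear without changing the answer, which leaves them free for future use.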
Fixes: 97ca486592 ("ionic: add heartbeat check")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the netdev ops for managing VFs. Since most of the
management work happens in the NIC firmware, the driver becomes
mostly a pass-through for the network stack commands that want
to control and configure the VFs.
We also tweak ionic_station_set() a little to allow for
the VFs that start off with a zero'd mac address.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a watchdog to periodically monitor the NIC heartbeat.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of our firmware has a heartbeat feature that the driver
can watch for to see if the FW is still alive and likely to
answer a dev_cmd or AdminQ request.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the NIC configuration happens through the AdminQ message
queue. NAPI is used for basic interrupt handling and message
queue management. These routines are set up to be shared among
different types of queues when used in slow-path handling.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ionic interrupt model is based on interrupt control blocks
accessed through the PCI BAR. Doorbell registers are used by
the driver to signal to the NIC that requests are waiting on
the message queues. Interrupts are used by the NIC to signal
to the driver that answers are waiting on the completion queues.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LIF is the Logical Interface, which represents the external
connections. The NIC can multiplex many LIFs to a single port,
but in most setups, LIF0 is the primary control for the port.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port management commands apply to the physical port
associated with the PCI device, which might be shared among
several logical interfaces.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ionic device has a small set of PCI registers, including a
device control and data space, and a large set of message
commands.
Also adds new DEVLINK_INFO_VERSION_GENERIC tags for
ASIC_ID, ASIC_REV, and FW.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>