1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

459 commits

Author SHA1 Message Date
Lijun Pan
42557dab78 ibmvnic: add memory barrier to protect long term buffer
dma_rmb() barrier is added to load the long term buffer before copying
it to socket buffer; and dma_wmb() barrier is added to update the
long term buffer before it being accessed by VIOS (virtual i/o server).

Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Acked-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15 15:12:01 -08:00
Lijun Pan
1a42156f52 ibmvnic: substitute mb() with dma_wmb() for send_*crq* functions
The CRQ and subCRQ descriptors are DMA mapped, so dma_wmb(),
though weaker, is good enough to protect the data structures.

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Acked-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15 15:11:04 -08:00
Lijun Pan
1c7d45e7b2 ibmvnic: simplify reset_long_term_buff function
The only thing reset_long_term_buff() should do is set
buffer to zero. After doing that, it is not necessary to
send_request_map again to VIOS since it actually does not
change the mapping. So, keep memset function and remove all
others.

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-15 15:10:38 -08:00
Sukadev Bhattiprolu
d4083d3c00 ibmvnic: Set to CLOSED state even on error
If set_link_state() fails for any reason, we still cleanup the adapter
state and cannot recover from a partial close anyway. So set the adapter
to CLOSED state. That way if a new soft/hard reset is processed, the
adapter will remain in the CLOSED state until the next ibmvnic_open().

Fixes: 01d9bd792d ("ibmvnic: Reorganize device close")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 14:33:14 -08:00
Lijun Pan
8a96c80e27 ibmvnic: prefer strscpy over strlcpy
Fix this warning:
WARNING: Prefer strscpy over strlcpy - see: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 13:03:00 -08:00
Lijun Pan
4bb9f2e482 ibmvnic: remove unused spinlock_t stats_lock definition
stats_lock is no longer used. So remove it.

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 13:03:00 -08:00
Lijun Pan
91dc5d2553 ibmvnic: fix miscellaneous checks
Fix the following checkpatch checks:
CHECK: Macro argument 'off' may be better as '(off)' to
avoid precedence issues
CHECK: Alignment should match open parenthesis
CHECK: multiple assignments should be avoided
CHECK: Blank lines aren't necessary before a close brace '}'
CHECK: Please use a blank line after function/struct/union/enum
declarations
CHECK: Unnecessary parentheses around 'rc != H_FUNCTION'

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 13:03:00 -08:00
Lijun Pan
914789acaa ibmvnic: avoid multiple line dereference
Fix the following checkpatch warning:
WARNING: Avoid multiple line dereference

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 13:03:00 -08:00
Lijun Pan
f78afaace6 ibmvnic: fix braces
Fix the following checkpatch warning:
WARNING: braces {} are not necessary for single statement blocks

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 13:03:00 -08:00
Lijun Pan
bab08bedcd ibmvnic: fix block comments
Fix the following checkpatch warning:
WARNING: networking block comments don't use an empty /* line, use /* Comment...

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 13:03:00 -08:00
Lijun Pan
429aa36469 ibmvnic: prefer 'unsigned long' over 'unsigned long int'
Fix the following checkpatch warnings:
WARNING: Prefer 'unsigned long' over 'unsigned long int' as the int is unnecessary
WARNING: Prefer 'long' over 'long int' as the int is unnecessary

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-11 13:03:00 -08:00
David S. Miller
dc9d87581d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-02-10 13:30:12 -08:00
Sukadev Bhattiprolu
ef66a1eace ibmvnic: Clear failover_pending if unable to schedule
Normally we clear the failover_pending flag when processing the reset.
But if we are unable to schedule a failover reset we must clear the
flag ourselves. We could fail to schedule the reset if we are in PROBING
state (eg: when booting via kexec) or because we could not allocate memory.

Thanks to Cris Forno for helping isolate the problem and for testing.

Fixes: 1d85049374 ("powerpc/vnic: Extend "failover pending" window")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Tested-by: Cristobal Forno <cforno12@linux.ibm.com>
Link: https://lore.kernel.org/r/20210203050802.680772-1-sukadev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-06 10:36:22 -08:00
Jakub Kicinski
d1e1355aef Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 14:21:31 -08:00
Lijun Pan
2719cb445d ibmvnic: remove unnecessary rmb() inside ibmvnic_poll
rmb() can be removed since:
1. pending_scrq() has dma_rmb() at the function end;
2. dma_rmb(), though weaker, is enough here.

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Acked-by: Dwip Banerjee <dnbanerg@us.ibm.com>
Acked-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01 20:21:12 -08:00
Lijun Pan
665ab1eb18 ibmvnic: rework to ensure SCRQ entry reads are properly ordered
Move the dma_rmb() between pending_scrq() and ibmvnic_next_scrq()
into the end of pending_scrq() to save the duplicated code since
this dma_rmb will be used 3 times.

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Acked-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01 20:21:12 -08:00
Lijun Pan
5e9eff5dfa ibmvnic: device remove has higher precedence over reset
Returning -EBUSY in ibmvnic_remove() does not actually hold the
removal procedure since driver core doesn't care for the return
value (see __device_release_driver() in drivers/base/dd.c
calling dev->bus->remove()) though vio_bus_remove
(in arch/powerpc/platforms/pseries/vio.c) records the
return value and passes it on. [1]

During the device removal precedure, checking for resetting
bit is dropped so that we can continue executing all the
cleanup calls in the rest of the remove function. Otherwise,
it can cause latent memory leaks and kernel crashes.

[1] https://lore.kernel.org/linuxppc-dev/20210117101242.dpwayq6wdgfdzirl@pengutronix.de/T/#m48f5befd96bc9842ece2a3ad14f4c27747206a53
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 7d7195a026 ("ibmvnic: Do not process device remove during device reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20210129043402.95744-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01 18:32:29 -08:00
Jakub Kicinski
c358f95205 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/dev.c
  b552766c87 ("can: dev: prevent potential information leak in can_fill_info()")
  3e77f70e73 ("can: dev: move driver related infrastructure into separate subdir")
  0a042c6ec9 ("can: dev: move netlink related code into seperate file")

  Code move.

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
  57ac4a31c4 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
  214baf2287 ("net/mlx5e: Support HTB offload")

  Adjacent code changes

net/switchdev/switchdev.c
  20776b465c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
  ffb68fc58e ("net: switchdev: remove the transaction structure from port object notifiers")
  bae33f2b5a ("net: switchdev: remove the transaction structure from port attributes")

  Transaction parameter gets dropped otherwise keep the fix.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-28 17:09:31 -08:00
Lijun Pan
e41aec79e6 ibmvnic: Ensure that CRQ entry read are correctly ordered
Ensure that received Command-Response Queue (CRQ) entries are
properly read in order by the driver. dma_rmb barrier has
been added before accessing the CRQ descriptor to ensure
the entire descriptor is read before processing.

Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20210128013442.88319-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-27 19:12:02 -08:00
Lee Jones
807086021b net: ethernet: ibm: ibmvnic: Fix some kernel-doc misdemeanours
Fixes the following W=1 kernel build warning(s):

 from drivers/net/ethernet/ibm/ibmvnic.c:35:
 inlined from ‘handle_vpd_rsp’ at drivers/net/ethernet/ibm/ibmvnic.c:4124:3:
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'skb' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_len' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_data' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_field' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_data' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'len' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_len' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'scrq_arr' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'txbuff' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'num_entries' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'adapter' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'rwi' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'reset_state' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'adapter' not described in 'do_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'rwi' not described in 'do_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'reset_state' not described in 'do_reset'

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16 18:13:17 -08:00
Lijun Pan
3f5ec374ae ibmvnic: merge do_change_param_reset into do_reset
Commit b27507bb59 ("net/ibmvnic: unlock rtnl_lock in reset so
linkwatch_event can run") introduced do_change_param_reset function to
solve the rtnl lock issue. Majority of the code in do_change_param_reset
duplicates do_reset. Also, we can handle the rtnl lock issue in do_reset
itself. Hence merge do_change_param_reset back into do_reset to clean up
the code.

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20210106213514.76027-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 12:34:45 -08:00
YANG LI
862aecbd95 ibmvnic: fix: NULL pointer dereference.
The error is due to dereference a null pointer in function
reset_one_sub_crq_queue():

if (!scrq) {
    netdev_dbg(adapter->netdev,
               "Invalid scrq reset. irq (%d) or msgs(%p).\n",
		scrq->irq, scrq->msgs);
		return -EINVAL;
}

If the expression is true, scrq must be a null pointer and cannot
dereference.

Fixes: 9281cf2d58 ("ibmvnic: avoid memset null scrq msgs")
Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com>
Reported-by: Abaci <abaci@linux.alibaba.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/1609312994-121032-1-git-send-email-abaci-bugfix@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-04 13:23:33 -08:00
Lijun Pan
1f45dc2206 ibmvnic: continue fatal error reset after passive init
Commit f9c6cea0b3 ("ibmvnic: Skip fatal error reset after passive init")
says "If the passive
CRQ initialization occurs before the FATAL reset task is processed,
the FATAL error reset task would try to access a CRQ message queue
that was freed, causing an oops. The problem may be most likely to
occur during DLPAR add vNIC with a non-default MTU, because the DLPAR
process will automatically issue a change MTU request.
Fix this by not processing fatal error reset if CRQ is passively
initialized after client-driven CRQ initialization fails."

The original commit skips a specific reset condition, but that does
not fix the problem it claims to fix, and misses a reset condition.
The effective fix is commit 0e435befae ("ibmvnic: fix NULL pointer
dereference in ibmvic_reset_crq") and commit a0faaa27c7 ("ibmvnic:
fix NULL pointer dereference in reset_sub_crq_queues"). With above
two fixes, there are no more crashes seen as described even without
the original commit, so I would like to revert the original commit.

Fixes: f9c6cea0b3 ("ibmvnic: Skip fatal error reset after passive init")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201223204904.12677-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-23 12:56:10 -08:00
Lijun Pan
a0c8be56af ibmvnic: fix login buffer memory leak
Commit 34f0f4e3f4 ("ibmvnic: Fix login buffer memory leaks") frees
login_rsp_buffer in release_resources() and send_login()
because handle_login_rsp() does not free it.
Commit f3ae59c0c0 ("ibmvnic: store RX and TX subCRQ handle array in
ibmvnic_adapter struct") frees login_rsp_buffer in handle_login_rsp().
It seems unnecessary to free it in release_resources() and send_login().
There are chances that handle_login_rsp returns earlier without freeing
buffers. Double-checking the buffer is harmless since
release_login_buffer and release_login_rsp_buffer will
do nothing if buffer is already freed.

Fixes: f3ae59c0c0 ("ibmvnic: store RX and TX subCRQ handle array in ibmvnic_adapter struct")
Fixes: 34f0f4e3f4 ("ibmvnic: Fix login buffer memory leaks")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201219213919.21045-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-22 18:40:46 -08:00
Lijun Pan
6be4666221 use __netdev_notify_peers in ibmvnic
Start to use the lockless version of netdev_notify_peers.

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-16 11:43:26 -08:00
Dwip N. Banerjee
c2af62256e ibmvnic: fix rx buffer tracking and index management in replenish_rx_pool partial success
We observed that in the error case for batched send_subcrq_indirect() the
driver does not account for the partial success case. This caused Linux to
crash when free_map and pool index are inconsistent.

Driver needs to update the rx pools "available" count when some batched
sends worked but an error was encountered as part of the whole operation.
Also track replenish_add_buff_failure for statistic purposes.

Fixes: 4f0b6812e9 ("ibmvnic: Introduce batched RX buffer descriptor transmission")
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-09 19:06:10 -08:00
Sukadev Bhattiprolu
38bd5cec76 ibmvnic: add some debugs
We sometimes run into situations where a soft/hard reset of the adapter
takes a long time or fails to complete. Having additional messages that
include important adapter state info will hopefully help understand what
is happening, reduce the guess work and minimize requests to reproduce
problems with debug patches.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Link: https://lore.kernel.org/r/20201205022235.2414110-1-sukadev@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-07 17:13:08 -08:00
Jakub Kicinski
55fd59b003 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:
	drivers/net/ethernet/ibm/ibmvnic.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03 15:44:09 -08:00
Thomas Falcon
ba246c1751 ibmvnic: Fix TX completion error handling
TX completions received with an error return code are not
being processed properly. When an error code is seen, do not
proceed to the next completion before cleaning up the existing
entry's data structures.

Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-01 10:09:04 -08:00
Thomas Falcon
b71ec95223 ibmvnic: Ensure that SCRQ entry reads are correctly ordered
Ensure that received Subordinate Command-Response Queue (SCRQ)
entries are properly read in order by the driver. These queues
are used in the ibmvnic device to process RX buffer and TX completion
descriptors. dma_rmb barriers have been added after checking for a
pending descriptor to ensure the correct descriptor entry is checked
and after reading the SCRQ descriptor to ensure the entire
descriptor is read before processing.

Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-01 10:09:04 -08:00
Dany Madden
98c41f04a6 ibmvnic: reduce wait for completion time
Reduce the wait time for Command Response Queue response from 30 seconds
to 20 seconds, as recommended by VIOS and Power Hypervisor teams.

Fixes: bd0b672313 ("ibmvnic: Move login and queue negotiation into ibmvnic_open")
Fixes: 53da09e929 ("ibmvnic: Add set_link_state routine for setting adapter link state")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 13:26:49 -08:00
Dany Madden
a86d5c682b ibmvnic: no reset timeout for 5 seconds after reset
Reset timeout is going off right after adapter reset. This patch ensures
that timeout is scheduled if it has been 5 seconds since the last reset.
5 seconds is the default watchdog timeout.

Fixes: ed651a1087 ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 13:26:48 -08:00
Dany Madden
c98d9cc417 ibmvnic: send_login should check for crq errors
send_login() does not check for the result of ibmvnic_send_crq() of the
login request. This results in the driver needlessly retrying the login
10 times even when CRQ is no longer active. Check the return code and
give up in case of errors in sending the CRQ.

The only time we want to retry is if we get a PARITALSUCCESS response
from the partner.

Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 13:26:48 -08:00
Sukadev Bhattiprolu
76cdc5c5d9 ibmvnic: track pending login
If after ibmvnic sends a LOGIN it gets a FAILOVER, it is possible that
the worker thread will start reset process and free the login response
buffer before it gets a (now stale) LOGIN_RSP. The ibmvnic tasklet will
then try to access the login response buffer and crash.

Have ibmvnic track pending logins and discard any stale login responses.

Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 13:26:48 -08:00
Sukadev Bhattiprolu
f15fde9d47 ibmvnic: delay next reset if hard reset fails
If auto-priority failover is enabled, the backing device needs time
to settle if hard resetting fails for any reason. Add a delay of 60
seconds before retrying the hard-reset.

Fixes: 2770a7984d ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 13:26:48 -08:00
Dany Madden
0cb4bc66ba ibmvnic: restore adapter state on failed reset
In a failed reset, driver could end up in VNIC_PROBED or VNIC_CLOSED
state and cannot recover in subsequent resets, leaving it offline.
This patch restores the adapter state to reset_state, the original
state when reset was called.

Fixes: b27507bb59 ("net/ibmvnic: unlock rtnl_lock in reset so linkwatch_event can run")
Fixes: 2770a7984d ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 13:26:48 -08:00
Dany Madden
9281cf2d58 ibmvnic: avoid memset null scrq msgs
scrq->msgs could be NULL during device reset, causing Linux to crash.
So, check before memset scrq->msgs.

Fixes: c8b2ad0a4a ("ibmvnic: Sanitize entire SCRQ buffer on reset")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 13:26:48 -08:00
Dany Madden
18f141bf97 ibmvnic: stop free_all_rwi on failed reset
When ibmvnic fails to reset, it breaks out of the reset loop and frees
all of the remaining resets from the workqueue. Doing so prevents the
adapter from recovering if no reset is scheduled after that. Instead,
have the driver continue to process resets on the workqueue.

Remove the no longer need free_all_rwi().

Fixes: ed651a1087 ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 13:26:48 -08:00
Dany Madden
31d6b40360 ibmvnic: handle inconsistent login with reset
Inconsistent login with the vnicserver is causing the device to be
removed. This does not give the device a chance to recover from error
state. This patch schedules a FATAL reset instead to bring the adapter
up.

Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-28 13:26:48 -08:00
Jakub Kicinski
5c39f26e67 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflict in CAN, keep the net-next + the byteswap wrapper.

Conflicts:
	drivers/net/can/usb/gs_usb.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-27 18:25:27 -08:00
Lijun Pan
3ada288150 ibmvnic: enhance resetting status check during module exit
Based on the discussion with Sukadev Bhattiprolu and Dany Madden,
we believe that checking adapter->resetting bit is preferred
since RESETTING state flag is not as strict as resetting bit.
RESETTING state flag is removed since it is verbose now.

Fixes: 7d7195a026 ("ibmvnic: Do not process device remove during device reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-24 16:28:34 -08:00
Lijun Pan
0e435befae ibmvnic: fix NULL pointer dereference in ibmvic_reset_crq
crq->msgs could be NULL if the previous reset did not complete after
freeing crq->msgs. Check for NULL before dereferencing them.

Snippet of call trace:
...
ibmvnic 30000003 env3 (unregistering): Releasing sub-CRQ
ibmvnic 30000003 env3 (unregistering): Releasing CRQ
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc0000000000c1a30
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: ibmvnic(E-) rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag tun raw_diag inet_diag unix_diag bridge af_packet_diag netlink_diag stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ibmvnic]
CPU: 20 PID: 8426 Comm: kworker/20:0 Tainted: G            E     5.10.0-rc1+ #12
Workqueue: events __ibmvnic_reset [ibmvnic]
NIP:  c0000000000c1a30 LR: c008000001b00c18 CTR: 0000000000000400
REGS: c00000000d05b7a0 TRAP: 0380   Tainted: G            E      (5.10.0-rc1+)
MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 44002480  XER: 20040000
CFAR: c0000000000c19ec IRQMASK: 0
GPR00: 0000000000000400 c00000000d05ba30 c008000001b17c00 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000001e2
GPR08: 000000000001f400 ffffffffffffd950 0000000000000000 c008000001b0b280
GPR12: c0000000000c19c8 c00000001ec72e00 c00000000019a778 c00000002647b440
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000006 0000000000000001 0000000000000003 0000000000000002
GPR24: 0000000000001000 c008000001b0d570 0000000000000005 c00000007ab5d550
GPR28: c00000007ab5c000 c000000032fcf848 c00000007ab5cc00 c000000032fcf800
NIP [c0000000000c1a30] memset+0x68/0x104
LR [c008000001b00c18] ibmvnic_reset_crq+0x70/0x110 [ibmvnic]
Call Trace:
[c00000000d05ba30] [0000000000000800] 0x800 (unreliable)
[c00000000d05bab0] [c008000001b0a930] do_reset.isra.40+0x224/0x634 [ibmvnic]
[c00000000d05bb80] [c008000001b08574] __ibmvnic_reset+0x17c/0x3c0 [ibmvnic]
[c00000000d05bc50] [c00000000018d9ac] process_one_work+0x2cc/0x800
[c00000000d05bd20] [c00000000018df58] worker_thread+0x78/0x520
[c00000000d05bdb0] [c00000000019a934] kthread+0x1c4/0x1d0
[c00000000d05be20] [c00000000000d5d0] ret_from_kernel_thread+0x5c/0x6c

Fixes: 032c5e8284 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-24 16:28:34 -08:00
Lijun Pan
a0faaa27c7 ibmvnic: fix NULL pointer dereference in reset_sub_crq_queues
adapter->tx_scrq and adapter->rx_scrq could be NULL if the previous reset
did not complete after freeing sub crqs. Check for NULL before
dereferencing them.

Snippet of call trace:
ibmvnic 30000006 env6: Releasing sub-CRQ
ibmvnic 30000006 env6: Releasing CRQ
...
ibmvnic 30000006 env6: Got Control IP offload Response
ibmvnic 30000006 env6: Re-setting tx_scrq[0]
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc008000003dea7cc
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag tun bridge stp llc rfkill sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmvnic ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
CPU: 80 PID: 1856 Comm: kworker/80:2 Tainted: G        W         5.8.0+ #4
Workqueue: events __ibmvnic_reset [ibmvnic]
NIP:  c008000003dea7cc LR: c008000003dea7bc CTR: 0000000000000000
REGS: c0000007ef7db860 TRAP: 0380   Tainted: G        W          (5.8.0+)
MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28002422  XER: 0000000d
CFAR: c000000000bd9520 IRQMASK: 0
GPR00: c008000003dea7bc c0000007ef7dbaf0 c008000003df7400 c0000007fa26ec00
GPR04: c0000007fcd0d008 c0000007fcd96350 0000000000000027 c0000007fcd0d010
GPR08: 0000000000000023 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000002000 c00000001ec18e00 c0000000001982f8 c0000007bad6e840
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 fffffffffffffef7
GPR24: 0000000000000402 c0000007fa26f3a8 0000000000000003 c00000016f8ec048
GPR28: 0000000000000000 0000000000000000 0000000000000000 c0000007fa26ec00
NIP [c008000003dea7cc] ibmvnic_reset_init+0x15c/0x258 [ibmvnic]
LR [c008000003dea7bc] ibmvnic_reset_init+0x14c/0x258 [ibmvnic]
Call Trace:
[c0000007ef7dbaf0] [c008000003dea7bc] ibmvnic_reset_init+0x14c/0x258 [ibmvnic] (unreliable)
[c0000007ef7dbb80] [c008000003de8860] __ibmvnic_reset+0x408/0x970 [ibmvnic]
[c0000007ef7dbc50] [c00000000018b7cc] process_one_work+0x2cc/0x800
[c0000007ef7dbd20] [c00000000018bd78] worker_thread+0x78/0x520
[c0000007ef7dbdb0] [c0000000001984c4] kthread+0x1d4/0x1e0
[c0000007ef7dbe20] [c00000000000cea8] ret_from_kernel_thread+0x5c/0x74

Fixes: 57a49436f4 ("ibmvnic: Reset sub-crqs during driver reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-24 16:28:34 -08:00
Lijun Pan
855a631a4c ibmvnic: skip tx timeout reset while in resetting
Sometimes it takes longer than 5 seconds (watchdog timeout) to complete
failover, migration, and other resets. In stead of scheduling another
timeout reset, we wait for the current one to complete.

Suggested-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21 15:30:47 -08:00
Lijun Pan
98025bce3a ibmvnic: notify peers when failover and migration happen
Commit 61d3e1d9bc ("ibmvnic: Remove netdev notify for failover resets")
excluded the failover case for notify call because it said
netdev_notify_peers() can cause network traffic to stall or halt.
Current testing does not show network traffic stall
or halt because of the notify call for failover event.
netdev_notify_peers may be used when a device wants to inform the
rest of the network about some sort of a reconfiguration
such as failover or migration.

It is unnecessary to call that in other events like
FATAL, NON_FATAL, CHANGE_PARAM, and TIMEOUT resets
since in those scenarios the hardware does not change.
If the driver must do a hard reset, it is necessary to notify peers.

Fixes: 61d3e1d9bc ("ibmvnic: Remove netdev notify for failover resets")
Suggested-by: Brian King <brking@linux.vnet.ibm.com>
Suggested-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21 15:30:47 -08:00
Lijun Pan
8393597579 ibmvnic: fix call_netdevice_notifiers in do_reset
When netdev_notify_peers was substituted in
commit 986103e792 ("net/ibmvnic: Fix RTNL deadlock during device reset"),
call_netdevice_notifiers(NETDEV_RESEND_IGMP, dev) was missed.
Fix it now.

Fixes: 986103e792 ("net/ibmvnic: Fix RTNL deadlock during device reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-21 15:30:47 -08:00
Dwip N. Banerjee
41ed0a00ff ibmvnic: Do not replenish RX buffers after every polling loop
Reduce the amount of time spent replenishing RX buffers by only doing
so once available buffers has fallen under a certain threshold, in this
case half of the total number of buffers, or if the polling loop exits
before the packets processed is less than its budget. Non-exhaustion of
NAPI budget implies lower incoming packet pressure, allowing the leeway
to refill the buffers in preparation for any impending burst.

Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 19:50:41 -08:00
Dwip N. Banerjee
e552aa313b ibmvnic: Use netdev_alloc_skb instead of alloc_skb to replenish RX buffers
Take advantage of the additional optimizations in netdev_alloc_skb when
allocating socket buffers to be used for packet reception.

Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 19:50:34 -08:00
Dwip N. Banerjee
ec20f36bb4 ibmvnic: Correctly re-enable interrupts in NAPI polling routine
If the current NAPI polling loop exits without completing it's
budget, only re-enable interrupts if there are no entries remaining
in the queue and napi_complete_done is successful. If there are entries
remaining on the queue that were missed, restart the polling loop.

Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 19:50:34 -08:00
Dwip N. Banerjee
9a87c3fca2 ibmvnic: Ensure that device queue memory is cache-line aligned
PCI bus slowdowns were observed on IBM VNIC devices as a result
of partial cache line writes and non-cache aligned full cache line writes.
Ensure that packet data buffers are cache-line aligned to avoid these
slowdowns.

Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 19:50:34 -08:00