linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Michael Chan	c54bc3ced5	bnxt_en: Release PCI regions when DMA mask setup fails during probe. Jump to init_err_release to cleanup. bnxt_unmap_bars() will also be called but it will do nothing if the BARs are not mapped yet. Fixes: `c0c050c58d` ("bnxt_en: New Broadcom ethernet driver.") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1605858271-8209-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-20 10:09:09 -08:00
Zhang Changzhong	3383176efc	bnxt_en: fix error return code in bnxt_init_board() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `c0c050c58d` ("bnxt_en: New Broadcom ethernet driver.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Link: https://lore.kernel.org/r/1605792621-6268-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 21:49:01 -08:00
Zhang Changzhong	b5f796b62c	bnxt_en: fix error return code in bnxt_init_one() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `c213eae8d3` ("bnxt_en: Improve VF/PF link change logic.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Link: https://lore.kernel.org/r/1605701851-20270-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 21:46:30 -08:00
Michael Chan	fa97f303fa	bnxt_en: Fix counter overflow logic. bnxt_add_one_ctr() adds a hardware counter to a software counter and adjusts for the hardware counter wraparound against the mask. The logic assumes that the hardware counter is always smaller than or equal to the mask. This assumption is mostly correct. But in some cases if the firmware is older and does not provide the accurate mask, the driver can use a mask that is smaller than the actual hardware mask. This can cause some extra carry bits to be added to the software counter, resulting in counters that far exceed the actual value. Fix it by masking the hardware counter with the mask passed into bnxt_add_one_ctr(). Fixes: `fea6b33355` ("bnxt_en: Accumulate all counters.") Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-16 17:39:46 -08:00
Michael Chan	eba93de6d3	bnxt_en: Free port stats during firmware reset. Firmware is unable to retain the port counters during any kind of fatal or non-fatal resets, so we must clear the port counters to avoid false detection of port counter overflow. Fixes: `fea6b33355` ("bnxt_en: Accumulate all counters.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-16 17:39:46 -08:00
Vasundhara Volam	825741b071	bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally. In the AER or firmware reset flow, if we are in fatal error state or if pci_channel_offline() is true, we don't send any commands to the firmware because the commands will likely not reach the firmware and most commands don't matter much because the firmware is likely to be reset imminently. However, the HWRM_FUNC_RESET command is different and we should always attempt to send it. In the AER flow for example, the .slot_reset() call will trigger this fw command and we need to try to send it to effect the proper reset. Fixes: `b340dc680e` ("bnxt_en: Avoid sending firmware messages when AER error is detected.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:35 -07:00
Michael Chan	a1301f08c5	bnxt_en: Check abort error state in bnxt_open_nic(). bnxt_open_nic() is called during configuration changes that require the NIC to be closed and then opened. This call is protected by rtnl_lock. Firmware reset can be happening at the same time. Only critical portions of the entire firmware reset sequence are protected by the rtnl_lock. It is possible that bnxt_open_nic() can be called when the firmware reset sequence is aborting. In that case, bnxt_open_nic() needs to check if the ABORT_ERR flag is set and abort if it is. The configuration change that resulted in the bnxt_open_nic() call will fail but the NIC will be brought to a consistent IF_DOWN state. Without this patch, if bnxt_open_nic() were to continue in this error state, it may crash like this: [ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0 [ 1648.659813] Oops: 0000 [#1] SMP [ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common [ 1648.660063] tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse [ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G OE ------------ 3.10.0-1152.el7.x86_64 #1 [ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020 [ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000 [ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>] [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.663171] RSP: 0018:ffff94f55df1fba8 EFLAGS: 00010202 [ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000 [ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0 [ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008 [ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0 [ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40 [ 1648.667695] FS: 00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000 [ 1648.668447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0 [ 1648.669966] Call Trace: [ 1648.670730] [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en] [ 1648.671496] [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en] [ 1648.672263] [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en] [ 1648.673031] [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en] [ 1648.673793] [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en] [ 1648.674550] [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0 [ 1648.675306] [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0 [ 1648.676061] [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60 [ 1648.676810] [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0 [ 1648.677548] [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0 [ 1648.678282] [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500 [ 1648.679016] [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0 [ 1648.679745] [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a [ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7 [ 1648.681986] RIP [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.682724] RSP <ffff94f55df1fba8> [ 1648.683451] CR2: 0000000000000000 Fixes: `ec5d31e3c1` ("bnxt_en: Handle firmware reset status during IF_UP.") Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:35 -07:00
Vasundhara Volam	f75d9a0aa9	bnxt_en: Re-write PCI BARs after PCI fatal error. When a PCIe fatal error occurs, the internal latched BAR addresses in the chip get reset even though the BAR register values in config space are retained. pci_restore_state() will not rewrite the BAR addresses if the BAR address values are valid, causing the chip's internal BAR addresses to stay invalid. So we need to zero the BAR registers during PCIe fatal error to force pci_restore_state() to restore the BAR addresses. These write cycles to the BAR registers will cause the proper BAR addresses to latch internally. Fixes: `6316ea6db9` ("bnxt_en: Enable AER support.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:35 -07:00
Vasundhara Volam	631ce27a30	bnxt_en: Invoke cancel_delayed_work_sync() for PFs also. As part of the commit `b148bb238c` ("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."), cancel_delayed_work_sync() is called only for VFs to fix a possible crash by cancelling any pending delayed work items. It was assumed by mistake that the flush_workqueue() call on the PF would flush delayed work items as well. As flush_workqueue() does not cancel the delayed workqueue, extend the fix for PFs. This fix will avoid the system crash, if there are any pending delayed work items in fw_reset_task() during driver's .remove() call. Unify the workqueue cleanup logic for both PF and VF by calling cancel_work_sync() and cancel_delayed_work_sync() directly in bnxt_remove_one(). Fixes: `b148bb238c` ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:35 -07:00
Vasundhara Volam	21d6a11e2c	bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one(). A recent patch has moved the workqueue cleanup logic before calling unregister_netdev() in bnxt_remove_one(). This caused a regression because the workqueue can be restarted if the device is still open. Workqueue cleanup must be done after unregister_netdev(). The workqueue will not restart itself after the device is closed. Call bnxt_cancel_sp_work() after unregister_netdev() and call bnxt_dl_fw_reporters_destroy() after that. This fixes the regession and the original NULL ptr dereference issue. Fixes: `b16939b59c` ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-26 18:26:35 -07:00
Vasundhara Volam	4933f6753b	bnxt_en: Add bnxt_hwrm_nvm_get_dev_info() to query NVM info. Add a new bnxt_hwrm_nvm_get_dev_info() to query firmware version information via NVM_GET_DEV_INFO firmware command. Use it to get the running version of the NVM configuration information. This new function will also be used in subsequent patches to get the stored firmware versions. Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-8-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-12 14:27:03 -07:00
Michael Chan	8eddb3e7ce	bnxt_en: Log unknown link speed appropriately. If the VF virtual link is set to always enabled, the speed may be unknown when the physical link is down. The driver currently logs the link speed as 4294967295 Mbps which is SPEED_UNKNOWN. Modify the link up log message as "speed unknown" which makes more sense. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-7-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-12 14:27:03 -07:00
Michael Chan	c966c67c09	bnxt_en: Log event_data1 and event_data2 when handling RESET_NOTIFY event. Log these values that contain useful firmware state information. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-6-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-12 14:27:03 -07:00
Michael Chan	03ab8ca1e9	bnxt_en: Simplify bnxt_async_event_process(). event_data1 and event_data2 are used when processing most events. Store these in local variables at the beginning of the function to simplify many of the case statements. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-5-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-12 14:27:03 -07:00
Michael Chan	8fb35cd302	bnxt_en: Set driver default message level. Currently, bp->msg_enable has default value of 0. It is more useful to have the commonly used NETIF_MSG_DRV and NETIF_MSG_HW enabled by default. v2: Change the fall back bnxt_reset_task() inside bnxt_rx_ring_reset() to silent mode. With older fw, we would take the fall back path and it would be very noisy. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-4-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-12 14:27:03 -07:00
Vasundhara Volam	cf223bfaf7	bnxt_en: Return -EROFS to user space, if NVM writes are not permitted. If NVRAM resources are locked, NVM writes are not permitted. In such scenarios, firmware returns HWRM_ERR_CODE_RESOURCE_LOCKED error to firmware commands. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-2-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-12 14:27:02 -07:00
Michael Chan	8d4bd96b54	bnxt_en: Eliminate unnecessary RX resets. Currently, the driver will schedule RX ring reset when we get a buffer error in the RX completion record. These RX buffer errors can be due to normal out-of-buffer conditions or a permanent error in the RX ring. Because the driver cannot distinguish between these 2 conditions, we assume all these buffer errors require reset. This is very disruptive when it is just a normal out-of-buffer condition. Newer firmware will now monitor the rings for the permanent failure and will send a notification to the driver when it happens. This allows the driver to reset only when such a notification is received. In environments where we have predominently out-of-buffer conditions, we now can avoid these unnecessary resets. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:41:05 -07:00
Michael Chan	1b5c8b63d6	bnxt_en: Reduce unnecessary message log during RX errors. There is logic in the RX path to detect unexpected handles in the RX completion. We'll print a warning and schedule a reset. The next expected handle is then set to 0xffff which is guaranteed to not match any valid handle. This will force all remaining packets in the ring to be discarded before the reset. There can be hundreds of these packets remaining in the ring and there is no need to print the warnings for these forced errors. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:41:05 -07:00
Michael Chan	8a27d4b9e5	bnxt_en: Add a software counter for RX ring reset. Add a per ring rx_resets counter to count these RX resets. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:41:05 -07:00
Michael Chan	8fbf58e17d	bnxt_en: Implement RX ring reset in response to buffer errors. On some older chips, it is necessary to do a reset when we get buffer errors associated with an RX ring. These buffer errors may become frequent if the RX ring underruns under heavy traffic. The current code does a global reset of all reasources when this happens. This works but creates a big disruption of all rings when one RX ring is having problem. This patch implements a localized RX ring reset of just the RX ring having the issue. All other rings including all TX rings will not be affected by this single RX ring reset. Only the older chips prior to the P5 class supports this reset. Because it is not a global reset, packets may still be arriving while we are calling firmware to reset that ring. We need to be sure that we don't post any buffers during this time while the ring is undergoing reset. After firmware completes successfully, the ring will be in the reset state with no buffers and we can start filling it with new buffers and posting them. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:41:05 -07:00
Michael Chan	7737d325f8	bnxt_en: Refactor bnxt_init_one_rx_ring(). bnxt_init_one_rx_ring() includes logic to initialize the BDs for one RX ring and to allocate the buffers. Separate the allocation logic into a new bnxt_alloc_one_rx_ring() function. The allocation function will be used later to allocate new buffers for one specified RX ring when we reset that RX ring. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:41:05 -07:00
Michael Chan	975bc99a4a	bnxt_en: Refactor bnxt_free_rx_skbs(). bnxt_free_rx_skbs() frees all the allocated buffers and SKBs for every RX ring. Refactor this function by calling a new function bnxt_free_one_rx_ring_skbs() to free these buffers on one specified RX ring at a time. This is preparation work for resetting one RX ring during run-time. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:41:05 -07:00
Michael Chan	fc8864e0b6	bnxt_en: Log FW health status info, if reset is aborted. If firmware does not come out of reset, log FW health status info to provide more information on firmware status. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:41:05 -07:00
Edwin Peer	87f7ab8d6f	bnxt_en: perform no master recovery during startup The NS3 SoC platforms require assistance from the OP-TEE to recover firmware if a crash occurs while no driver is bound. The CRASHED_NO_MASTER condition is recorded in the firmware status register during the crash to indicate when driver intervension is needed to coordinate a firmware reload. This condition is detected during early driver initialization in order to effect a firmware fastboot on supported platforms when necessary. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:41:05 -07:00
Edwin Peer	ba02629ff6	bnxt_en: log firmware status on firmware init failure Firmware now supports device independent discovery of the status register location. This status register can provide more detailed information about firmware errors, especially if problems occur before the HWRM interface is functioning. Attempt to map this register if it is present and report the firmware status on firmware init failures. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:41:05 -07:00
Edwin Peer	3e9ec2bb93	bnxt_en: refactor bnxt_alloc_fw_health() The allocator for the firmware health structure conflates allocation and capability checks, limiting the reusability of the code. This patch separates out the capability check and disablement and improves the warning message to better describe the consequences of an allocation failure. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-04 14:41:05 -07:00
Michael Chan	ccd6a9dcab	bnxt_en: Implement ethtool set_fec_param() method. This feature allows the user to set the different FEC modes on the NIC port. Any new setting will take effect immediately after a link toggle. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:35:46 -07:00
Michael Chan	2046e3c356	bnxt_en: Report Active FEC encoding during link up. The current code is reporting the FEC configured settings during link up. Change it to report the more useful active FEC encoding that may be negotiated or auto detected. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:35:46 -07:00
Michael Chan	8b2775890a	bnxt_en: Report FEC settings to ethtool. Implement .get_fecparam() method to report the configured and active FEC settings. Also report the supported and advertised FEC settings to the .get_link_ksettings() method. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:35:46 -07:00
Michael Chan	3128e811b1	bnxt_en: Handle ethernet link being disabled by firmware. On some 200G dual port NICs, if one port is configured to 200G, firmware will disable the ethernet link on the other port. Firmware will send notification to the driver for the disabled port when this happens. Define a new field in the link_info structure to keep track of this state. The new phy_state field replaces the unused loop_back field. Log a message when the phy_state changes state. In the disabled state, disallow any PHY configurations on the disabled port as the firmware will fail all calls to configure the PHY in this state. Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:35:46 -07:00
Edwin Peer	d058426ea8	bnxt_en: add basic infrastructure to support PAM4 link speeds The firmware interface has added support for new link speeds using PAM4 modulation. Expand the bnxt_link_info structure to closely mirror the new firmware structures. Add logic to copy the PAM4 capabilities and settings from the firmware. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:35:46 -07:00
Edwin Peer	c916062a89	bnxt_en: refactor code to limit speed advertising Extract the code for determining an advertised speed is no longer supported into a separate function. This will avoid some code duplication in a later patch when supporting PAM4 speeds, since these speeds are specified in a separate field. Reviewed-by: Scott Branden <scott.branden@broadcom.com> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:35:45 -07:00
Michael Chan	9d6b648c31	bnxt_en: Update firmware interface spec to 1.10.1.65. The main changes include FEC, ECN statistics, HWRM_PORT_PHY_QCFG response size reduction, and a new counter added to ctx_hw_stats_ext struct to support the new 58818 chip. The ctx_hw_stats_ext structure is now the superset supporting the new 58818 chips and the prior P5 chips. Add a new flag to identify the new chip and use constants for the chip specific ring statistics sizes instead of the size of the structure. Because the HWRM_PORT_PHY_QCFG response structure size has shrunk back to 96 bytes, the workaround added earlier to limit the size of this message for forwarding to the VF can be removed. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-27 13:35:45 -07:00
David S. Miller	3ab0a7a0c3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-22 16:45:34 -07:00
Michael Chan	c07fa08f02	bnxt_en: Fix wrong flag value passed to HWRM_PORT_QSTATS_EXT fw call. The wrong flag value caused the firmware call to return actual port counters instead of the counter masks. This messed up the counter overflow logic and caused erratic extended port counters to be displayed under ethtool -S. Fixes: `531d1d269c` ("bnxt_en: Retrieve hardware masks for port counters.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:04:45 -07:00
Michael Chan	d2b42d010f	bnxt_en: Fix HWRM_FUNC_QSTATS_EXT firmware call. Fix it to set the required fid input parameter. The firmware call fails without this patch. Fixes: `d752d0536c` ("bnxt_en: Retrieve hardware counter masks from firmware if available.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:04:44 -07:00
Edwin Peer	d69753fa1e	bnxt_en: return proper error codes in bnxt_show_temp Returning "unknown" as a temperature value violates the hwmon interface rules. Appropriate error codes should be returned via device_attribute show instead. These will ultimately be propagated to the user via the file system interface. In addition to the corrected error handling, it is an even better idea to not present the sensor in sysfs at all if it is known that the read will definitely fail. Given that temp1_input is currently the only sensor reported, ensure no hwmon registration if TEMP_MONITOR_QUERY is not supported or if it will fail due to access permissions. Something smarter may be needed if and when other sensors are added. Fixes: `12cce90b93` ("bnxt_en: fix HWRM error when querying VF temperature") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:04:44 -07:00
Vasundhara Volam	492adcf481	bnxt_en: Use memcpy to copy VPD field info. Using strlcpy() to copy from VPD is not correct because VPD strings are not necessarily NULL terminated. Use memcpy() to copy the VPD length up to the destination buffer size - 1. The destination is zeroed memory so it will always be NULL terminated. Fixes: `a0d0fd70fe` ("bnxt_en: Read partno and serialno of the board from VPD") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-20 19:04:44 -07:00
Jakub Kicinski	5198d545db	net: remove napi_hash_del() from driver-facing API We allow drivers to call napi_hash_del() before calling netif_napi_del() to batch RCU grace periods. This makes the API asymmetric and leaks internal implementation details. Soon we will want the grace period to protect more than just the NAPI hash table. Restructure the API and have drivers call a new function - __netif_napi_del() if they want to take care of RCU waits. Note that only core was checking the return status from napi_hash_del() so the new helper does not report if the NAPI was actually deleted. Some notes on driver oddness: - veth observed the grace period before calling netif_napi_del() but that should not matter - myri10ge observed normal RCU flavor - bnx2x and enic did not actually observe the grace period (unless they did so implicitly) - virtio_net and enic only unhashed Rx NAPIs The last two points seem to indicate that the calls to napi_hash_del() were a left over rather than an optimization. Regardless, it's easy enough to correct them. This patch may introduce extra synchronize_net() calls for interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on free_netdev() to call netif_napi_del(). This seems inevitable since we want to use RCU for netpoll dev->napi_list traversal, and almost no drivers set IFF_DISABLE_NETPOLL. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-10 13:08:46 -07:00
Vasundhara Volam	b16939b59c	bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task() bnxt_fw_reset_task() which runs from a workqueue can race with bnxt_remove_one(). For example, if firmware reset and VF FLR are happening at about the same time. bnxt_remove_one() already cancels the workqueue and waits for it to finish, but we need to do this earlier before the devlink reporters are destroyed. This will guarantee that the devlink reporters will always be valid when bnxt_fw_reset_task() is still running. Fixes: `b148bb238c` ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 10:08:18 -07:00
Vasundhara Volam	b340dc680e	bnxt_en: Avoid sending firmware messages when AER error is detected. When the driver goes through PCIe AER reset in error state, all firmware messages will timeout because the PCIe bus is no longer accessible. This can lead to AER reset taking many minutes to complete as each firmware command takes time to timeout. Define a new macro BNXT_NO_FW_ACCESS() to skip these firmware messages when either firmware is in fatal error state or when pci_channel_offline() is true. It now takes a more reasonable 20 to 30 seconds to complete AER recovery. Fixes: `b4fff2079d` ("bnxt_en: Do not send firmware messages if firmware is in error state.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 10:08:18 -07:00
Linus Torvalds	3e8d3bdc2a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi Kivilinna. 2) Fix loss of RTT samples in rxrpc, from David Howells. 3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu. 4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka. 5) We disable BH for too lokng in sctp_get_port_local(), add a cond_resched() here as well, from Xin Long. 6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu. 7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from Yonghong Song. 8) Missing of_node_put() in mt7530 DSA driver, from Sumera Priyadarsini. 9) Fix crash in bnxt_fw_reset_task(), from Michael Chan. 10) Fix geneve tunnel checksumming bug in hns3, from Yi Li. 11) Memory leak in rxkad_verify_response, from Dinghao Liu. 12) In tipc, don't use smp_processor_id() in preemptible context. From Tuong Lien. 13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu. 14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter. 15) Fix ABI mismatch between driver and firmware in nfp, from Louis Peens. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits) net/smc: fix sock refcounting in case of termination net/smc: reset sndbuf_desc if freed net/smc: set rx_off for SMCR explicitly net/smc: fix toleration of fake add_link messages tg3: Fix soft lockup when tg3_reset_task() fails. doc: net: dsa: Fix typo in config code sample net: dp83867: Fix WoL SecureOn password nfp: flower: fix ABI mismatch between driver and firmware tipc: fix shutdown() of connectionless socket ipv6: Fix sysctl max for fib_multipath_hash_policy drivers/net/wan/hdlc: Change the default of hard_header_len to 0 net: gemini: Fix another missing clk_disable_unprepare() in probe net: bcmgenet: fix mask check in bcmgenet_validate_flow() amd-xgbe: Add support for new port mode net: usb: dm9601: Add USB ID of Keenetic Plus DSL vhost: fix typo in error message net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init() pktgen: fix error message with wrong function name net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode cxgb4: fix thermal zone device registration ...	2020-09-03 18:50:48 -07:00
Jakub Kicinski	96ecdcc992	bnxt: don't enable NAPI until rings are ready Netpoll can try to poll napi as soon as napi_enable() is called. It crashes trying to access a doorbell which is still NULL: BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 59 PID: 6039 Comm: ethtool Kdump: loaded Tainted: G S 5.9.0-rc1-00469-g5fd99b5d9950-dirty #26 RIP: 0010:bnxt_poll+0x121/0x1c0 Code: c4 20 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 8b 86 a0 01 00 00 41 23 85 18 01 00 00 49 8b 96 a8 01 00 00 0d 00 00 00 24 <89> 02 41 f6 45 77 02 74 cb 49 8b ae d8 01 00 00 31 c0 c7 44 24 1a netpoll_poll_dev+0xbd/0x1a0 __netpoll_send_skb+0x1b2/0x210 netpoll_send_udp+0x2c9/0x406 write_ext_msg+0x1d7/0x1f0 console_unlock+0x23c/0x520 vprintk_emit+0xe0/0x1d0 printk+0x58/0x6f x86_vector_activate.cold+0xf/0x46 __irq_domain_activate_irq+0x50/0x80 __irq_domain_activate_irq+0x32/0x80 __irq_domain_activate_irq+0x32/0x80 irq_domain_activate_irq+0x25/0x40 __setup_irq+0x2d2/0x700 request_threaded_irq+0xfb/0x160 __bnxt_open_nic+0x3b1/0x750 bnxt_open_nic+0x19/0x30 ethtool_set_channels+0x1ac/0x220 dev_ethtool+0x11ba/0x2240 dev_ioctl+0x1cf/0x390 sock_do_ioctl+0x95/0x130 Reported-by: Rob Sherwood <rsher@fb.com> Fixes: `c0c050c58d` ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 16:16:39 -07:00
Michael Chan	b43b9f53fb	bnxt_en: Setup default RSS map in all scenarios. The recent changes to support user-defined RSS map assume that RX rings are always reserved and the default RSS map is set after the RX rings are successfully reserved. If the firmware spec is older than 1.6.1, no ring reservations are required and the default RSS map is not setup at all. In another scenario where the fw Resource Manager is older, RX rings are not reserved and we also end up with no valid RSS map. Fix both issues in bnxt_need_reserve_rings(). In both scenarios described above, we don't need to reserve RX rings so we need to call this new function bnxt_check_rss_map_no_rmgr() to setup the default RSS map when needed. Without valid RSS map, the NIC won't receive packets properly. Fixes: `1667cbf6a4` ("bnxt_en: Add logical RSS indirection table structure.") Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 07:19:03 -07:00
Edwin Peer	5fa65524f6	bnxt_en: init RSS table for Minimal-Static VF reservation There are no VF rings available during probe when the device is configured using the Minimal-Static reservation strategy. In this case, the RSS indirection table can only be initialized later, during bnxt_open_nic(). However, this was not happening because the rings will already have been reserved via bnxt_init_dflt_ring_mode(), causing bnxt_need_reserve_rings() to return false in bnxt_reserve_rings() and bypass the RSS table init. Solve this by pushing the call to bnxt_set_dflt_rss_indir_tbl() into __bnxt_reserve_rings(), which is common to both paths and is called whenever ring configuration is changed. After doing this, the RSS table init that must be called from bnxt_init_one() happens implicitly via bnxt_set_default_rings(), necessitating doing the allocation earlier in order to avoid a null pointer dereference. Fixes: `bd3191b5d8` ("bnxt_en: Implement ethtool -X to set indirection table.") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 07:19:03 -07:00
Edwin Peer	12cce90b93	bnxt_en: fix HWRM error when querying VF temperature Firmware returns RESOURCE_ACCESS_DENIED for HWRM_TEMP_MONITORY_QUERY for VFs. This produces unpleasing error messages in the log when temp1_input is queried via the hwmon sysfs interface from a VF. The error is harmless and expected, so silence it and return unknown as the value. Since the device temperature is not particularly sensitive information, provide flexibility to change this policy in future by silencing the error rather than avoiding the HWRM call entirely for VFs. Fixes: `cde49a42a9` ("bnxt_en: Add hwmon sysfs support to read temperature") Cc: Marc Smith <msmith626@gmail.com> Reported-by: Marc Smith <msmith626@gmail.com> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 07:19:03 -07:00
Michael Chan	b148bb238c	bnxt_en: Fix possible crash in bnxt_fw_reset_task(). bnxt_fw_reset_task() is run from a delayed workqueue. The current code is not cancelling the workqueue in the driver's .remove() method and it can potentially crash if the device is removed with the workqueue still pending. The fix is to clear the BNXT_STATE_IN_FW_RESET flag and then cancel the delayed workqueue in bnxt_remove_one(). bnxt_queue_fw_reset_work() also needs to check that this flag is set before scheduling. This will guarantee that no rescheduling will be done after it is cancelled. Fixes: `230d1f0de7` ("bnxt_en: Handle firmware reset.") Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 07:19:03 -07:00
Vasundhara Volam	df3875ec55	bnxt_en: Fix PCI AER error recovery flow When a PCI error is detected the PCI state could be corrupt, save the PCI state after initialization and restore it after the slot reset. Fixes: `6316ea6db9` ("bnxt_en: Enable AER support.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-08-26 07:19:03 -07:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
Li Heng	bbf1b94a73	bnxt_en: Remove superfluous memset() Fixes coccicheck warning: ./drivers/net/ethernet/broadcom/bnxt/bnxt.c:3730:19-37: WARNING: dma_alloc_coherent use in stats -> hw_stats already zeroes out memory, so memset is not needed dma_alloc_coherent use in status already zeroes out memory, so memset is not needed Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Li Heng <liheng40@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-30 17:41:05 -07:00

1 2 3 4 5 ...

706 commits