Commit 25733c4e67 ("ath10k: pci: use mutex for diagnostic window CE
polling") introduced a regression where we try to sleep (grab a mutex)
in an atomic context:
[ 233.602619] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:254
[ 233.602626] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 233.602636] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.1.0-rc2 #4
[ 233.602642] Hardware name: Google Scarlet (DT)
[ 233.602647] Call trace:
[ 233.602663] dump_backtrace+0x0/0x11c
[ 233.602672] show_stack+0x20/0x28
[ 233.602681] dump_stack+0x98/0xbc
[ 233.602690] ___might_sleep+0x154/0x16c
[ 233.602696] __might_sleep+0x78/0x88
[ 233.602704] mutex_lock+0x2c/0x5c
[ 233.602717] ath10k_pci_diag_read_mem+0x68/0x21c [ath10k_pci]
[ 233.602725] ath10k_pci_diag_read32+0x48/0x74 [ath10k_pci]
[ 233.602733] ath10k_pci_dump_registers+0x5c/0x16c [ath10k_pci]
[ 233.602741] ath10k_pci_fw_crashed_dump+0xb8/0x548 [ath10k_pci]
[ 233.602749] ath10k_pci_napi_poll+0x60/0x128 [ath10k_pci]
[ 233.602757] net_rx_action+0x140/0x388
[ 233.602766] __do_softirq+0x1b0/0x35c
[...]
ath10k_pci_fw_crashed_dump() is called from NAPI contexts, and firmware
memory dumps are retrieved using the diag memory interface.
A simple reproduction case is to run this on QCA6174A /
WLAN.RM.4.4.1-00132-QCARMSWP-1, which happens to be a way to b0rk the
firmware:
dd if=/sys/kernel/debug/ieee80211/phy0/ath10k/mem_value bs=4K count=1
of=/dev/null
(NB: simulated firmware crashes, via debugfs, don't trigger firmware
dumps.)
The fix is to move the crash-dump into a workqueue context, and avoid
relying on 'data_lock' for most mutual exclusion. We only keep using it
here for protecting 'fw_crash_counter', while the rest of the coredump
buffers are protected by a new 'dump_mutex'.
I've tested the above with simulated firmware crashes (debugfs 'reset'
file), real firmware crashes (the 'dd' command above), and a variety of
reboot and suspend/resume configurations on QCA6174A.
Reported here:
http://lkml.kernel.org/linux-wireless/20190325202706.GA68720@google.com
Fixes: 25733c4e67 ("ath10k: pci: use mutex for diagnostic window CE polling")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
When the driver receives the tx completion of the
descriptor over ce, it clears the nbytes configured
for that particular descriptor. WCN3990 uses ce
descriptors with 64-bit address.
Currently during handling the tx completion of the
descriptors, the nbytes are accessed from the descriptors
using ce_desc for 32-bit targets. This will lead to clearing
of memory at incorrect offset if DMA MASK is set to greater
than 32 bits.
Attach different ce tx copy completed handler for targets
using address above 32-bit address.
Tested HW: WCN3990
Tested FW: WLAN.HL.2.0-01387-QCAHLSWMTPLZ-1
Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Use SPDX identifiers everywhere in ath10k.
Makefile was incorrectly marked in commit b24413180f ("License cleanup: add
SPDX GPL-2.0 license identifier to files with no license"), fix that as well.
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Commit 750afb08ca ("cross-tree: phase out dma_zalloc_coherent()") introduced
a new checkpatch warning:
drivers/net/wireless/ath/ath10k/ce.c:1602: line over 90 characters
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
WCN3990 supports shadow registers write operation support
for copy engine for regular operation in powersave mode.
Since WCN3990 is a 64-bit target, the shadow register
implementation needs to be done in the copy engine handlers
for 64-bit target. Currently the shadow register implementation
is present in the 32-bit target handlers of copy engine.
Fix the shadow register copy engine write operation
implementation for 64-bit target(WCN3990).
Tested HW: WCN3990
Tested FW: WLAN.HL.2.0-01188-QCAHLSWMTPLZ-1
Fixes: b7ba83f7c4 ("ath10k: add support for shadow register for WNC3990")
Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
ath.git patches for 5.1. Major changes:
ath10k
* change QMI interface to support the new (and backwards incompatible)
interface from HL3.1 and used in recent HL2.0 branch firmware releases
ath
* add new country codes for US
WCN3990 is a 37-bit target but can address memory range
only upto 35 bits. The 36th bit is used to control the
smmu/iommu translation and the 37th bit is used by the
internal bus masters to access the wifi subsystem internal
SRAM. With the DMA mask set to 37i-bit, the host driver
can get 37-bit dma address, which leads to incorrect
address access in the target.
Hence the host driver can used addresses upto 35-bit
for WCN3990. Fix the dma mask for wcn3990 to 35-bit,
instead of 37-bit.
Tested HW: WCN3990
Tested FW: WLAN.HL.2.0-01188-QCAHLSWMTPLZ-1
Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.
This change was generated with the following Coccinelle SmPL patch:
@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@
-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Existing copy engine interrupt enable logic assumes that last
CE is using polling mode and due to this interrupt for last copy engine
are always disabled. WCN3990 uses last CE for pktlog and
interrupt remains disabled with existing logic.
To mitigate this issue, introduce CE_ATTR_POLL flag and control
the interrupt based on the flag which can be set in ce_attr.
Testing:
Tested on WCN3990 and QCA6174 HW.
Tested FW: WLAN.HL.2.0-01192-QCAHLSWMTPLZ-1,
WLAN.RM.4.4.1-00109-QCARMSWPZ-1
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
void *entry[];
};
instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
sizeof(struct ce_desc) should be a copy-paste mistake
just use sizeof(struct ce_desc_64) to avoid mem leak
Fixes: b7ba83f7c4 ("ath10k: add support for shadow register for WNC3990")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
SRRI/DRRI are not mapped in the HW Shadow block and can lead
to un-clocked access if common subsystem in the target is
powered down due to idle mode.
To mitigate this problem SRRI/DRRI can be read from
DDR instead of doing an actual hardware read.
Host allocates non cached memory on ddr and configures
the physical address of this memory to the CE hardware.
The hardware updates the RRI on this particular location.
Read SRRI/DRRI from DDR location instead of
direct target read.
Enable retention restore on ddr using hw params to enable
in specific targets.
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
WCN3990 needs shadow register write operation support
for copy engine for regular operation in powersave mode.
Add support for copy engine shadow register write in
datapath tx for WCN3990
Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Correct a minor bug in the commit 0628467f97 ("ath10k: fix
copy engine 5 destination ring stuck") which introduced a change to fix
firmware assert that happens when ring indices of copy engine 5 are stuck
for a specific duration, problem with this fix is that it did not use
ring arithmatic. As a result,firmware asserts did not go away entirely
athough the frequency of occurrence has reduced. Using ring arithmatic
to fix the issue.
Tested on QCA9984(fw version-10.4-3.4-00082).
Fixes: 0628467f97 ("ath10k: fix copy engine 5 destination ring stuck)
Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
CE layer is shared between pci and snoc target and results
in duplicate object inclusion if both modules are compiled
together statically and undefined KBUILD_MODNAME if
compiled as module.
Fix this by building ce layer in ath10k core module by
adding ce object inclusion with ATH10K_CE boolean CONFIG.
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
WCN3990 CE descriptor uses 64bit address for
src/dst ring buffer. It has extended field for toeplitz
hash result, which is being used for HW assisted
hash results.
To accommodate WCN3990 descriptor, define new CE
descriptor for extended addressing mode and related
methods to handle the descriptor data.
Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
CE send and receive API's are using u32 ring address, which
truncates the address for target with 64bit addressing range.
Use dma_addr_t for ce buffers to support target with extended
addressing range.
Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Remove bus specific dependencies from CE layer
to have common CE layer across multiple targets.
This is required for adding support for WCN3990
chipset support as WCN3990 chipset uses SNOC
bus interface with Copy Engine endpoint.
Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Define structures for the copy engine ctrl/misc registers,
that includes CE CMD halt, watermark source, watermark destination,
host IE ring, source, destination and dmax ring.
This adds support to avoid the conditional compilation,
code optimization and dynamic configuration of the copy engine
register map for respective hardware bus interface.
Signed-off-by: Sarada Prasanna Garnayak <c_sgarna@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k firmware checks nbytes == 0 as part of determining if DMA
has completed successfully. To help make this work more often,
have the driver initialize nbytes to zero when freeing the descriptor
slot.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
In 'ath10k_ce_alloc_pipe' the compile time sanity check to
ensure that there is sufficient buffers in CE4 for HTT Tx
MSDU descriptors, but this did not take into account of the
case with 'peer flow control' enabled, fix this.
Cc: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Remove obselete Copy Engine comments referring to the function
ath10k_ce_sendlist_send as this function was removed long time back
by the commit 2e761b5a52 ("ath10k: remove ce_sendlist_send").
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Fixes checkpatch warnings:
drivers/net/wireless/ath/ath10k/pci.c:1593: Statements should start on a tabstop
drivers/net/wireless/ath/ath10k/ce.c:962: Alignment should match open parenthesis
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
use dma_zalloc_coherent() instead of dma_alloc_coherent and memset().
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Firmware is running watchdog timer for tracking copy engine ring index
and write index. Whenever both indices are stuck at same location for
given duration, watchdog will be trigger to assert target. While
updating copy engine destination ring write index, driver ensures that
write index will not be same as read index by finding delta between these
two indices (CE_RING_DELTA).
HTT target to host copy engine (CE5) is special case where ring buffers
will be reused and delta check is not applied while updating write index.
In rare scenario, whenever CE5 ring is full, both indices will be referring
same location and this is causing CE ring stuck issue as explained
above. This issue is originally reported on IPQ4019 during long hour stress
testing and during veriwave max clients testsuites. The same issue is
also observed in other chips as well. Fix this by ensuring that write
index is one less than read index which means that full ring is
available for receiving data.
Cc: stable@vger.kernel.org
Tested-by: Tamizh chelvam <c_traja@qti.qualcomm.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Correct some trivial comment typos.
Remove unnecessary parentheses in a long line.
Signed-off-by: Joe Perches <joe@perches.com>
[kvalo@qca.qualcomm.com: drop the change for return]
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
With the %pK format specifier we hide the kernel addresses
with the help of kptr_restrict sysctl.
In this patch, %p is changed to %pK in the driver code.
The sysctl is documented in Documentation/sysctl/kernel.txt.
Signed-off-by: Maharaja Kennadyrajan <c_mkenna@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Fix checkpatch warnings about use of spaces with operators:
spaces preferred around that '*' (ctx:VxV)
This has been recently added to checkpatch.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Whenever htt rx indication i.e target to host messages are received
on rx copy engine (CE5), the message will be freed after processing
the response. Then CE 5 will be refilled with new descriptors at
post rx processing. This memory alloc and free operations can be avoided
by reusing the same descriptors.
During CE pipe allocation, full ring is not initialized i.e n-1 entries
are filled up. So for CE 5 full ring should be filled up to reuse
descriptors. Moreover CE 5 write index will be updated in single shot
instead of incremental access. This could avoid multiple pci_write and
ce_ring access. From experiments, It improves CPU usage by ~3% in IPQ4019
platform.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The physical address necessary to unmap DMA ('bufferp') is stored
in ath10k_skb_cb as 'paddr'. For diag register read and write
operations, 'paddr' is stored in transfer context. ath10k doesn't rely
on the meta/transfer_id. So the unused output arguments {bufferp, nbytesp
and transfer_idp} are removed from CE recv_next completion.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
For the messages from host to target, shadow copy of CE descriptors
are maintained in source ring. Before writing actual CE descriptor,
first shadow copy is filled and then it is copied to CE address space.
To optimize in download path and to reduce d-cache pressure, removing
shadow copy of CE descriptors. This will also reduce driver memory
consumption by 33KB during on device probing.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The physical address necessary to unmap DMA ('bufferp') is stored
in ath10k_skb_cb as 'paddr'. ath10k doesn't rely on the meta/transfer_id
when handling send completion (htc ep id is stored in sk_buff control
buffer). So the unused output arguments {bufferp, nbytesp and transfer_idp}
are removed from CE send completion. This change is needed before removing
the shadow copy of copy engine (CE) descriptors in follow up patch.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Currently to avoid uncached memory access while filling up copy engine
descriptors, shadow descriptors are used. This can be optimized further
by removing shadow descriptors. To achieve that first shadow ring
dependency in ce_send is removed by creating local copy of the
descriptor on stack and make a one-shot copy into the "uncached"
descriptor.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Register receive callbacks for every copy engines (CE) separately
instead of having common receive handler. Some of the copy engines
receives different type of messages (i.e HTT/HTC/pktlog) from target.
Hence to service them accordingly, register per copy engine receive
callbacks.
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Register send completion callbacks for every copy engines (CE) separately
instead of having common completion handler. Since some of the copy
engines delivers different type of messages, per-CE callbacks help to
service them differently.
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
After processing received packets from copy engine, host will allocate
new buffer and queue them back to copy engine ring for further
packet reception. On post rx processing path, skb allocation and
dma mapping are unnecessarily handled within ce_lock. This is affecting
peak throughput and also causing more CPU consumption. Optimize this
by acquiring ce_lock only when accessing copy engine ring and moving
skb allocation out of ce_lock.
In AP148 platform with QCA99x0 in conducted environment, UDP uplink peak
throughput is improved from ~1320 Mbps to ~1450 Mbps and TCP uplink peak
throughput is increased from ~1240 Mbps (70% host CPU load) to ~1300 Mbps
(71% CPU load). Similarly ~40Mbps improvement is observed in downlink
path.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
QCA99X0 uses two new copy engine src desc flags for interrupt
indication. Bit_2 is to mark if host interrupt is disabled after
processing the current desc and bit_3 is to mark if target interrupt
is diabled after the processing of current descriptor.
CE_DESC_FLAGS_META_DATA_MASK and CE_DESC_FLAGS_META_DATA_LSB are based
on the target type.
Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The QCA6174 in combination with new wmi-tlv firmware is capable of
multi-channel, beamforming, tdls and other features.
This patch just makes it possible to boot these devices and do some basic stuff
like connect to an AP without encryption. Some things may not work or may be
unreliable. New features will be implemented later. This will be addressed
eventually with future patches.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Latest main firmware branch introduced a new WMI
ABI called wmi-tlv. It is not a tlv strictly
speaking but something that resembles it because
it is ordered and may have duplicate id entries.
This prepares ath10k to support new hw.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
While testing other things I've found that CE
items aren't cleared properly. This could lead to
null dereferences in BMI.
To prevent that make sure CE revoking clears the
nbytes value (which is used as a buffer completion
indication) and memset the entire CE ring data
shared between host and target when
(re)initializing.
Also make sure to check BMI xfer pointer and print
a splat instead of crashing the kernel.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Make ath10k_pci_init_pipes() effectively only
alter shared target-host data.
The per_transfer_context is a host-only thing.
It is necessary to preserve it's contents for a
more robust ring cleanup.
This is required for future warm reset fixes.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Calling init to reinit ce pipe state would also
re-set all static structure links and setting
(which don't change over driver lifecycle).
Make it so alloc links structures and initializes
static data and init part to setup state
variables and clear stuff.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
diag_read() is used for reading from firmware memory via the diagnose window.
First user will be cal_data debugfs file.
To serialise diagnostic window access and make it safe to use while firmware is
running take ce_lock both in ath10k_pci_diag_write_mem() and
ath10k_pci_diag_read_mem(). Because of that all the CE calls had to be changed
to _nolock variants.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The diagnostic window (CE7) uses polling and is not initiliased to retrieve
interrupts so disable interrupts altogether for CE7. Otherwise ath10k crashes
when using the diagnostic window while the firmware is running due to NULL
dereference and polling reads timeout.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This makes it a lot easier to log and debug
messages if there's more than 1 ath10k device on a
system.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It was possible on a host system running low on
memory to end up with no rx buffers on pci pipes.
This makes the driver more robust as it won't fail
to start if it can't allocate all rx buffers right
away. If it is fatal then upper layers will notice
trouble anyway.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It doesn't make much sense to overwrite send_cb
and recv_cb callbacks over and over again whenever
transport starts. Just make sure to unmask copy
engine interrupts when starting.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>