1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

111 commits

Author SHA1 Message Date
Herbert Xu
d3b17c6d9d crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak
Using completion_done to determine whether the caller has gone
away only works after a complete call.  Furthermore it's still
possible that the caller has not yet called wait_for_completion,
resulting in another potential UAF.

Fix this by making the caller use cancel_work_sync and then freeing
the memory safely.

Fixes: 7d42e09760 ("crypto: qat - resolve race condition during AER recovery")
Cc: <stable@vger.kernel.org> #6.8+
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-17 18:55:07 +08:00
Lucas Segarra Fernandez
483fd65ce2 crypto: qat - validate slices count returned by FW
The function adf_send_admin_tl_start() enables the telemetry (TL)
feature on a QAT device by sending the ICP_QAT_FW_TL_START message to
the firmware. This triggers the FW to start writing TL data to a DMA
buffer in memory and returns an array containing the number of
accelerators of each type (slices) supported by this HW.
The pointer to this array is stored in the adf_tl_hw_data data
structure called slice_cnt.

The array slice_cnt is then used in the function tl_print_dev_data()
to report in debugfs only statistics about the supported accelerators.
An incorrect value of the elements in slice_cnt might lead to an out
of bounds memory read.
At the moment, there isn't an implementation of FW that returns a wrong
value, but for robustness validate the slice count array returned by FW.

Fixes: 69e7649f7c ("crypto: qat - add support for device telemetry")
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26 17:26:09 +08:00
Adam Guerin
d281a28bd2 crypto: qat - improve error logging to be consistent across features
Improve error logging in rate limiting feature. Staying consistent with
the error logging found in the telemetry feature.

Fixes: d9fb840837 ("crypto: qat - add rate limiting feature to qat_4xxx")
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:19 +08:00
Damian Muszynski
5d5bd24f41 crypto: qat - implement dh fallback for primes > 4K
The Intel QAT driver provides support for the Diffie-Hellman (DH)
algorithm, limited to prime numbers up to 4K. This driver is used
by default on platforms with integrated QAT hardware for all DH requests.
This has led to failures with algorithms requiring larger prime sizes,
such as ffdhe6144.

  alg: ffdhe6144(dh): test failed on vector 1, err=-22
  alg: self-tests for ffdhe6144(qat-dh) (ffdhe6144(dh)) failed (rc=-22)

Implement a fallback mechanism when an unsupported request is received.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19 18:54:18 +08:00
Colin Ian King
f5c2cf9d14 crypto: qat - Fix spelling mistake "Invalide" -> "Invalid"
There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12 15:07:51 +08:00
Gustavo A. R. Silva
140e4c85d5 crypto: qat - Avoid -Wflex-array-member-not-at-end warnings
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
ready to enable it globally.

Use the `__struct_group()` helper to separate the flexible array
from the rest of the members in flexible `struct qat_alg_buf_list`,
through tagged `struct qat_alg_buf_list_hdr`, and avoid embedding the
flexible-array member in the middle of `struct qat_alg_fixed_buf_list`.

Also, use `container_of()` whenever we need to retrieve a pointer to
the flexible structure.

So, with these changes, fix the following warnings:
drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Link: https://github.com/KSPP/linux/issues/202
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05 15:45:59 +08:00
Xin Zeng
f0bbfc391a crypto: qat - implement interface for live migration
Add logic to implement the interface for live migration defined in
qat/qat_mig_dev.h. This is specific for QAT GEN4 Virtual Functions
(VFs).

This introduces a migration data manager which is used to handle the
device state during migration. The manager ensures that the device state
is stored in a format that can be restored in the destination node.

The VF state is organized into a hierarchical structure that includes a
preamble, a general state section, a MISC bar section and an ETR bar
section. The latter contains the state of the 4 ring pairs contained on
a VF. Here is a graphical representation of the state:

    preamble | general state section | leaf state
             | MISC bar state section| leaf state
             | ETR bar state section | bank0 state section | leaf state
                                     | bank1 state section | leaf state
                                     | bank2 state section | leaf state
                                     | bank3 state section | leaf state

In addition to the implementation of the qat_migdev_ops interface and
the state manager framework, add a mutex in pfvf to avoid pf2vf messages
during migration.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Xin Zeng
0fce55e533 crypto: qat - add interface for live migration
Extend the driver with a new interface to be used for VF live migration.
This allows to create and destroy a qat_mig_dev object that contains
a set of methods to allow to save and restore the state of QAT VF.
This interface will be used by the qat-vfio-pci module.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Siming Wan
bbfdde7d19 crypto: qat - add bank save and restore flows
Add logic to save, restore, quiesce and drain a ring bank for QAT GEN4
devices.
This allows to save and restore the state of a Virtual Function (VF) and
will be used to implement VM live migration.

Signed-off-by: Siming Wan <siming.wan@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Siming Wan
3fa1057e35 crypto: qat - expand CSR operations for QAT GEN4 devices
Extend the CSR operations for QAT GEN4 devices to allow saving and
restoring the rings state.

The new operations will be used as a building block for implementing the
state save and restore of Virtual Functions necessary for VM live
migration.

This adds the following operations:
 - read ring status register
 - read ring underflow/overflow status register
 - read ring nearly empty status register
 - read ring nearly full status register
 - read ring full status register
 - read ring complete status register
 - read ring exception status register
 - read/write ring exception interrupt mask register
 - read ring configuration register
 - read ring base register
 - read/write ring interrupt enable register
 - read ring interrupt flag register
 - read/write ring interrupt source select register
 - read ring coalesced interrupt enable register
 - read ring coalesced interrupt control register
 - read ring flag and coalesced interrupt enable register
 - read ring service arbiter enable register
 - get ring coalesced interrupt control enable mask

Signed-off-by: Siming Wan <siming.wan@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Siming Wan
84058ffb91 crypto: qat - rename get_sla_arr_of_type()
The function get_sla_arr_of_type() returns a pointer to an SLA type
specific array.
Rename it and expose it as it will be used externally to this module.

This does not introduce any functional change.

Signed-off-by: Siming Wan <siming.wan@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Giovanni Cabiddu
680302d191 crypto: qat - relocate CSR access code
As the common hw_data files are growing and the adf_hw_csr_ops is going
to be extended with new operations, move all logic related to ring CSRs
to the newly created adf_gen[2|4]_hw_csr_data.[c|h] files.

This does not introduce any functional change.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Xin Zeng
867e801005 crypto: qat - move PFVF compat checker to a function
Move the code that implements VF version compatibility on the PF side to
a separate function so that it can be reused when doing VM live
migration.

This does not introduce any functional change.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Xin Zeng
1f8d6a163c crypto: qat - relocate and rename 4xxx PF2VM definitions
Move and rename ADF_4XXX_PF2VM_OFFSET and ADF_4XXX_VM2PF_OFFSET to
ADF_GEN4_PF2VM_OFFSET and ADF_GEN4_VM2PF_OFFSET respectively.
These definitions are moved from adf_gen4_pfvf.c to adf_gen4_hw_data.h
as they are specific to GEN4 and not just to qat_4xxx.

This change is made in anticipation of their use in live migration.

This does not introduce any functional change.

Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Giovanni Cabiddu
1894cb1de6 crypto: qat - adf_get_etr_base() helper
Add and use the new helper function adf_get_etr_base() which retrieves
the virtual address of the ring bar.

This will be used extensively when adding support for Live Migration.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02 10:47:43 +08:00
Damian Muszynski
ed3d95fe78 crypto: qat - make ring to service map common for QAT GEN4
The function get_ring_to_svc_map() is present in both 420xx and
4xxx drivers. Rework the logic to make it generic to GEN4 devices
and move it to qat_common/adf_gen4_hw_data.c.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:41:20 +08:00
Adam Guerin
bca79b9f56 crypto: qat - fix comment structure
Move comment description to the same line as the function name.

This is to fix the following warning when compiling the QAT driver
using the clang compiler with CC=clang W=2:
    drivers/crypto/intel/qat/qat_common/qat_crypto.c:108: warning: missing initial short description on line:
     * qat_crypto_vf_dev_config()

Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:41:20 +08:00
Adam Guerin
ff39134514 crypto: qat - remove unnecessary description from comment
Remove extra description from comments as it is not required.

This is to fix the following warning when compiling the QAT driver
using the clang compiler with CC=clang W=2:
    drivers/crypto/intel/qat/qat_common/adf_dev_mgr.c:65: warning: contents before sections
    drivers/crypto/intel/qat/qat_common/adf_isr.c:380: warning: contents before sections
    drivers/crypto/intel/qat/qat_common/adf_vf_isr.c:298: warning: contents before sections

Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:41:20 +08:00
Adam Guerin
a66cf93ab3 crypto: qat - remove double initialization of value
Remove double initialization of the reg variable.

This is to fix the following warning when compiling the QAT driver
using clang scan-build:
    drivers/crypto/intel/qat/qat_common/adf_gen4_ras.c:1010:6: warning: Value stored to 'reg' during its initialization is never read [deadcode.DeadStores]
     1010 |         u32 reg = ADF_CSR_RD(csr, ADF_GEN4_SSMCPPERR);
          |             ^~~   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    drivers/crypto/intel/qat/qat_common/adf_gen4_ras.c:1109:6: warning: Value stored to 'reg' during its initialization is never read [deadcode.DeadStores]
     1109 |         u32 reg = ADF_CSR_RD(csr, ADF_GEN4_SER_ERR_SSMSH);
          |             ^~~   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 99b1c9826e ("crypto: qat - count QAT GEN4 errors")
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:41:20 +08:00
Adam Guerin
f99fb7d660 crypto: qat - avoid division by zero
Check if delta_us is not zero and return -EINVAL if it is.
delta_us is unlikely to be zero as there is a sleep between the reads of
the two timestamps.

This is to fix the following warning when compiling the QAT driver
using clang scan-build:
    drivers/crypto/intel/qat/qat_common/adf_clock.c:87:9: warning: Division by zero [core.DivideZero]
       87 |         temp = DIV_ROUND_CLOSEST_ULL(temp, delta_us);
          |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: e2980ba57e ("crypto: qat - add measure clock frequency")
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:41:20 +08:00
Adam Guerin
9a5dcada14 crypto: qat - removed unused macro in adf_cnv_dbgfs.c
This macro was added but never used, remove it.

This is to fix the following warning when compiling the QAT driver
using the clang compiler with CC=clang W=2:
    drivers/crypto/intel/qat/qat_common/adf_cnv_dbgfs.c:19:9: warning: macro is not used [-Wunused-macros]
       19 | #define CNV_SLICE_ERR_MASK              GENMASK(7, 0)
          |         ^

Fixes: d807f0240c ("crypto: qat - add cnv_errors debugfs file")
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:41:20 +08:00
Adam Guerin
dfff0e35fa crypto: qat - remove unused macros in qat_comp_alg.c
As a result of the removal of qat_zlib_deflate, some defines where not
removed. Remove them.

This is to fix the following warning when compiling the QAT driver
using the clang compiler with CC=clang W=2:
    drivers/crypto/intel/qat/qat_common/qat_comp_algs.c:21:9: warning: macro is not used [-Wunused-macros]
       21 | #define QAT_RFC_1950_CM_OFFSET 4
          |         ^
    drivers/crypto/intel/qat/qat_common/qat_comp_algs.c:16:9: warning: macro is not used [-Wunused-macros]
       16 | #define QAT_RFC_1950_HDR_SIZE 2
          |         ^
    drivers/crypto/intel/qat/qat_common/qat_comp_algs.c:17:9: warning: macro is not used [-Wunused-macros]
       17 | #define QAT_RFC_1950_FOOTER_SIZE 4
          |         ^
    drivers/crypto/intel/qat/qat_common/qat_comp_algs.c:22:9: warning: macro is not used [-Wunused-macros]
       22 | #define QAT_RFC_1950_DICT_MASK 0x20
          |         ^
    drivers/crypto/intel/qat/qat_common/qat_comp_algs.c:18:9: warning: macro is not used [-Wunused-macros]
       18 | #define QAT_RFC_1950_CM_DEFLATE 8
          |         ^
    drivers/crypto/intel/qat/qat_common/qat_comp_algs.c:20:9: warning: macro is not used [-Wunused-macros]
       20 | #define QAT_RFC_1950_CM_MASK 0x0f
          |         ^
    drivers/crypto/intel/qat/qat_common/qat_comp_algs.c:23:9: warning: macro is not used [-Wunused-macros]
       23 | #define QAT_RFC_1950_COMP_HDR 0x785e
          |         ^
    drivers/crypto/intel/qat/qat_common/qat_comp_algs.c:19:9: warning: macro is not used [-Wunused-macros]
       19 | #define QAT_RFC_1950_CM_DEFLATE_CINFO_32K 7
          |         ^

Fixes: e9dd20e0e5 ("crypto: qat - Remove zlib-deflate")
Signed-off-by: Adam Guerin <adam.guerin@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:41:20 +08:00
Dan Carpenter
bcc06e1b3d crypto: qat - uninitialized variable in adf_hb_error_inject_write()
There are a few issues in this code.  If *ppos is non-zero then the
first part of the buffer is not initialized.  We never initialize the
last character of the buffer.  The return is not checked so it's
possible that none of the buffer is initialized.

This is debugfs code which is root only and the impact of these bugs is
very small.  However, it's still worth fixing.  To fix this:
1) Check that *ppos is zero.
2) Use copy_from_user() instead of simple_write_to_buffer().
3) Explicitly add a NUL terminator.

Fixes: e2b67859ab ("crypto: qat - add heartbeat error simulator")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24 08:41:20 +08:00
Damian Muszynski
7d42e09760 crypto: qat - resolve race condition during AER recovery
During the PCI AER system's error recovery process, the kernel driver
may encounter a race condition with freeing the reset_data structure's
memory. If the device restart will take more than 10 seconds the function
scheduling that restart will exit due to a timeout, and the reset_data
structure will be freed. However, this data structure is used for
completion notification after the restart is completed, which leads
to a UAF bug.

This results in a KFENCE bug notice.

  BUG: KFENCE: use-after-free read in adf_device_reset_worker+0x38/0xa0 [intel_qat]
  Use-after-free read at 0x00000000bc56fddf (in kfence-#142):
  adf_device_reset_worker+0x38/0xa0 [intel_qat]
  process_one_work+0x173/0x340

To resolve this race condition, the memory associated to the container
of the work_struct is freed on the worker if the timeout expired,
otherwise on the function that schedules the worker.
The timeout detection can be done by checking if the caller is
still waiting for completion or not by using completion_done() function.

Fixes: d8cba25d2c ("crypto: qat - Intel(R) QAT driver framework")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-17 09:09:16 +08:00
Damian Muszynski
c2304e1a0b crypto: qat - change SLAs cleanup flow at shutdown
The implementation of the Rate Limiting (RL) feature includes the cleanup
of all SLAs during device shutdown. For each SLA, the firmware is notified
of the removal through an admin message, the data structures that take
into account the budgets are updated and the memory is freed.
However, this explicit cleanup is not necessary as (1) the device is
reset, and the firmware state is lost and (2) all RL data structures
are freed anyway.

In addition, if the device is unresponsive, for example after a PCI
AER error is detected, the admin interface might not be available.
This might slow down the shutdown sequence and cause a timeout in
the recovery flows which in turn makes the driver believe that the
device is not recoverable.

Fix by replacing the explicit SLAs removal with just a free of the
SLA data structures.

Fixes: d9fb840837 ("crypto: qat - add rate limiting feature to qat_4xxx")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-17 09:09:16 +08:00
Mun Chun Yep
9567d3dc76 crypto: qat - improve aer error reset handling
Rework the AER reset and recovery flow to take into account root port
integrated devices that gets reset between the error detected and the
slot reset callbacks.

In adf_error_detected() the devices is gracefully shut down. The worker
threads are disabled, the error conditions are notified to listeners and
through PFVF comms and finally the device is reset as part of
adf_dev_down().

In adf_slot_reset(), the device is brought up again. If SRIOV VFs were
enabled before reset, these are re-enabled and VFs are notified of
restarting through PFVF comms.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09 12:57:18 +08:00
Furong Zhou
750fa7c20e crypto: qat - limit heartbeat notifications
When the driver detects an heartbeat failure, it starts the recovery
flow. Set a limit so that the number of events is limited in case the
heartbeat status is read too frequently.

Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09 12:57:18 +08:00
Damian Muszynski
f5419a4239 crypto: qat - add auto reset on error
Expose the `auto_reset` sysfs attribute to configure the driver to reset
the device when a fatal error is detected.

When auto reset is enabled, the driver resets the device when it detects
either an heartbeat failure or a fatal error through an interrupt.

This patch is based on earlier work done by Shashank Gupta.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09 12:57:18 +08:00
Mun Chun Yep
2aaa1995a9 crypto: qat - add fatal error notification
Notify a fatal error condition and optionally reset the device in
the following cases:
  * if the device reports an uncorrectable fatal error through an
    interrupt
  * if the heartbeat feature detects that the device is not
    responding

This patch is based on earlier work done by Shashank Gupta.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09 12:57:18 +08:00
Mun Chun Yep
4469f9b234 crypto: qat - re-enable sriov after pf reset
When a Physical Function (PF) is reset, SR-IOV gets disabled, making the
associated Virtual Functions (VFs) unavailable. Even after reset and
using pci_restore_state, VFs remain uncreated because the numvfs still
at 0. Therefore, it's necessary to reconfigure SR-IOV to re-enable VFs.

This commit introduces the ADF_SRIOV_ENABLED configuration flag to cache
the SR-IOV enablement state. SR-IOV is only re-enabled if it was
previously configured.

This commit also introduces a dedicated workqueue without
`WQ_MEM_RECLAIM` flag for enabling SR-IOV during Heartbeat and CPM error
resets, preventing workqueue flushing warning.

This patch is based on earlier work done by Shashank Gupta.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09 12:57:18 +08:00
Mun Chun Yep
ec26f8e6c7 crypto: qat - update PFVF protocol for recovery
Update the PFVF logic to handle restart and recovery. This adds the
following functions:

  * adf_pf2vf_notify_fatal_error(): allows the PF to notify VFs that the
    device detected a fatal error and requires a reset. This sends to
    VF the event `ADF_PF2VF_MSGTYPE_FATAL_ERROR`.
  * adf_pf2vf_wait_for_restarting_complete(): allows the PF to wait for
    `ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE` events from active VFs
    before proceeding with a reset.
  * adf_pf2vf_notify_restarted(): enables the PF to notify VFs with
    an `ADF_PF2VF_MSGTYPE_RESTARTED` event after recovery, indicating that
    the device is back to normal. This prompts VF drivers switch back to
    use the accelerator for workload processing.

These changes improve the communication and synchronization between PF
and VF drivers during system restart and recovery processes.

Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09 12:57:18 +08:00
Furong Zhou
758a0087db crypto: qat - disable arbitration before reset
Disable arbitration to avoid new requests to be processed before
resetting a device.

This is needed so that new requests are not fetched when an error is
detected.

Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09 12:57:18 +08:00
Furong Zhou
ae508d7afb crypto: qat - add fatal error notify method
Add error notify method to report a fatal error event to all the
subsystems registered. In addition expose an API,
adf_notify_fatal_error(), that allows to trigger a fatal error
notification asynchronously in the context of a workqueue.

This will be invoked when a fatal error is detected by the ISR or
through Heartbeat.

Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09 12:57:18 +08:00
Damian Muszynski
e2b67859ab crypto: qat - add heartbeat error simulator
Add a mechanism that allows to inject a heartbeat error for testing
purposes.
A new attribute `inject_error` is added to debugfs for each QAT device.
Upon a write on this attribute, the driver will inject an error on the
device which can then be detected by the heartbeat feature.
Errors are breaking the device functionality thus they require a
device reset in order to be recovered.

This functionality is not compiled by default, to enable it
CRYPTO_DEV_QAT_ERROR_INJECTION must be set.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-09 12:57:18 +08:00
Erick Archer
4da3bc65d2 crypto: qat - use kcalloc_node() instead of kzalloc_node()
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.

So, use the purpose specific kcalloc_node() function instead of the
argument count * size in the kzalloc_node() function.

Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:39:33 +08:00
Arnd Bergmann
23a22e831e crypto: qat - avoid memcpy() overflow warning
The use of array_size() leads gcc to assume the memcpy() can have a larger
limit than actually possible, which triggers a string fortification warning:

In file included from include/linux/string.h:296,
                 from include/linux/bitmap.h:12,
                 from include/linux/cpumask.h:12,
                 from include/linux/sched.h:16,
                 from include/linux/delay.h:23,
                 from include/linux/iopoll.h:12,
                 from drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c:3:
In function 'fortify_memcpy_chk',
    inlined from 'adf_gen4_init_thd2arb_map' at drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c:401:3:
include/linux/fortify-string.h:579:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
  579 |    __write_overflow_field(p_size_field, size);
      |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/fortify-string.h:588:4: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
  588 |    __read_overflow2_field(q_size_field, size);
      |    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Add an explicit range check to avoid this.

Fixes: 5da6a2d535 ("crypto: qat - generate dynamically arbiter mappings")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-26 16:36:57 +08:00
Damian Muszynski
5da6a2d535 crypto: qat - generate dynamically arbiter mappings
The thread-to-arbiter mapping describes which arbiter can assign jobs
to an acceleration engine thread.
The existing mappings are functionally correct, but hardcoded and not
optimized.

Replace the static mappings with an algorithm that generates optimal
mappings, based on the loaded configuration.

The logic has been made common so that it can be shared between all
QAT GEN4 devices.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-29 11:25:56 +08:00
Lucas Segarra Fernandez
eb52707716 crypto: qat - add support for ring pair level telemetry
Expose through debugfs ring pair telemetry data for QAT GEN4 devices.

This allows to gather metrics about the PCIe channel and device TLB for
a selected ring pair. It is possible to monitor maximum 4 ring pairs at
the time per device.

For details, refer to debugfs-driver-qat_telemetry in Documentation/ABI.

This patch is based on earlier work done by Wojciech Ziemba.

Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-29 11:25:56 +08:00
Lucas Segarra Fernandez
69e7649f7c crypto: qat - add support for device telemetry
Expose through debugfs device telemetry data for QAT GEN4 devices.

This allows to gather metrics about the performance and the utilization
of a device. In particular, statistics on (1) the utilization of the
PCIe channel, (2) address translation, when SVA is enabled and (3) the
internal engines for crypto and data compression.

If telemetry is supported by the firmware, the driver allocates a DMA
region and a circular buffer. When telemetry is enabled, through the
`control` attribute in debugfs, the driver sends to the firmware, via
the admin interface, the `TL_START` command. This triggers the device to
periodically gather telemetry data from hardware registers and write it
into the DMA memory region. The device writes into the shared region
every second.

The driver, every 500ms, snapshots the DMA shared region into the
circular buffer. This is then used to compute basic metric
(min/max/average) on each counter, every time the `device_data` attribute
is queried.

Telemetry counters are exposed through debugfs in the folder
/sys/kernel/debug/qat_<device>_<BDF>/telemetry.

For details, refer to debugfs-driver-qat_telemetry in Documentation/ABI.

This patch is based on earlier work done by Wojciech Ziemba.

Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-29 11:25:55 +08:00
Lucas Segarra Fernandez
7f06679dd5 crypto: qat - add admin msgs for telemetry
Extend the admin interface with two new public APIs to enable
and disable the telemetry feature: adf_send_admin_tl_start() and
adf_send_admin_tl_stop().

The first, sends to the firmware, through the ICP_QAT_FW_TL_START
message, the IO address where the firmware will write telemetry
metrics and a list of ring pairs (maximum 4) to be monitored.
It returns the number of accelerators of each type supported by
this hardware. After this message is sent, the firmware starts
periodically reporting telemetry data using by writing into the
dma buffer specified as input.

The second, sends the admin message ICP_QAT_FW_TL_STOP
which stops the reporting of telemetry data.

This patch is based on earlier work done by Wojciech Ziemba.

Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-29 11:25:55 +08:00
Lucas Segarra Fernandez
b6e4b6eb1e crypto: qat - include pci.h for GET_DEV()
GET_DEV() macro expansion relies on struct pci_dev being defined.

Include <linux/pci.h> at adf_accel_devices.h.

Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-29 11:25:55 +08:00
Jie Wang
fcf60f4bcf crypto: qat - add support for 420xx devices
Add support for 420xx devices by including a new device driver that
supports such devices, updates to the firmware loader and capabilities.

Compared to 4xxx devices, 420xx devices have more acceleration engines
(16 service engines and 1 admin) and support the wireless cipher
algorithms ZUC and Snow 3G.

Signed-off-by: Jie Wang <jie.wang@intel.com>
Co-developed-by: Dong Xie <dong.xie@intel.com>
Signed-off-by: Dong Xie <dong.xie@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-22 12:30:19 +08:00
Jie Wang
98a4f29fba crypto: qat - move fw config related structures
Relocate the structures adf_fw_objs and adf_fw_config from the file
adf_4xxx_hw_data.c to the newly created adf_fw_config.h.

These structures will be used by new device drivers.

This does not introduce any functional change.

Signed-off-by: Jie Wang <jie.wang@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-22 12:30:19 +08:00
Jie Wang
de51d22364 crypto: qat - relocate portions of qat_4xxx code
Move logic that is common between QAT GEN4 accelerators to the
qat_common folder. This includes addresses of CSRs, setters and
configuration logic.
When moved, functions and defines have been renamed from 4XXX to GEN4.

Code specific to the device is moved to the file adf_gen4_hw_data.c.
Code related to configuration is moved to the newly created
adf_gen4_config.c.

This does not introduce any functional change.

Signed-off-by: Jie Wang <jie.wang@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-22 12:30:19 +08:00
Jie Wang
b34bd0fd56 crypto: qat - change signature of uof_get_num_objs()
Add accel_dev as parameter of the function uof_get_num_objs().
This is in preparation for the introduction of the QAT 420xx driver as
it will allow to reconfigure the ae_mask when a configuration that does
not require all AEs is loaded on the device.

This does not introduce any functional change.

Signed-off-by: Jie Wang <jie.wang@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-22 12:30:19 +08:00
Jie Wang
4db87a5f9e crypto: qat - relocate and rename get_service_enabled()
Move the function get_service_enabled() from adf_4xxx_hw_data.c to
adf_cfg_services.c and rename it as adf_get_service_enabled().
This function is not specific to the 4xxx and will be used by
other QAT drivers.

This does not introduce any functional change.

Signed-off-by: Jie Wang <jie.wang@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-22 12:30:19 +08:00
Giovanni Cabiddu
a643212c9f crypto: qat - add NULL pointer check
There is a possibility that the function adf_devmgr_pci_to_accel_dev()
might return a NULL pointer.
Add a NULL pointer check in the function rp2srv_show().

Fixes: dbc8876dd8 ("crypto: qat - add rp2svc sysfs attribute")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: David Guckian <david.guckian@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-08 11:59:44 +08:00
Damian Muszynski
487caa8d5e crypto: qat - fix mutex ordering in adf_rl
If the function validate_user_input() returns an error, the error path
attempts to unlock an unacquired mutex.
Acquire the mutex before calling validate_user_input(). This is not
strictly necessary but simplifies the code.

Fixes: d9fb840837 ("crypto: qat - add rate limiting feature to qat_4xxx")
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-08 11:59:44 +08:00
Damian Muszynski
6627f03c21 crypto: qat - fix error path in add_update_sla()
The input argument `sla_in` is a pointer to a structure that contains
the parameters of the SLA which is being added or updated.
If this pointer is NULL, the function should return an error as
the data required for the algorithm is not available.
By mistake, the logic jumps to the error path which dereferences
the pointer.

This results in a warnings reported by the static analyzer Smatch when
executed without a database:

    drivers/crypto/intel/qat/qat_common/adf_rl.c:871 add_update_sla()
    error: we previously assumed 'sla_in' could be null (see line 812)

This issue was not found in internal testing as the pointer cannot be
NULL. The function add_update_sla() is only called (indirectly) by
the rate limiting sysfs interface implementation in adf_sysfs_rl.c
which ensures that the data structure is allocated and valid. This is
also proven by the fact that Smatch executed with a database does not
report such error.

Fix it by returning with error if the pointer `sla_in` is NULL.

Fixes: d9fb840837 ("crypto: qat - add rate limiting feature to qat_4xxx")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-08 11:59:44 +08:00
Damian Muszynski
d71fdd0f3c crypto: qat - add sysfs_added flag for rate limiting
The qat_rl sysfs attribute group is registered within the adf_dev_start()
function, alongside other driver components.
If any of the functions preceding the group registration fails,
the adf_dev_start() function returns, and the caller, to undo the
operation, invokes adf_dev_stop() followed by adf_dev_shutdown().
However, the current flow lacks information about whether the
registration of the qat_rl attribute group was successful or not.

In cases where this condition is encountered, an error similar to
the following might be reported:

    4xxx 0000:6b:00.0: Starting device qat_dev0
    4xxx 0000:6b:00.0: qat_dev0 started 9 acceleration engines
    4xxx 0000:6b:00.0: Failed to send init message
    4xxx 0000:6b:00.0: Failed to start device qat_dev0
    sysfs group 'qat_rl' not found for kobject '0000:6b:00.0'
    ...
    sysfs_remove_groups+0x2d/0x50
    adf_sysfs_rl_rm+0x44/0x70 [intel_qat]
    adf_rl_stop+0x2d/0xb0 [intel_qat]
    adf_dev_stop+0x33/0x1d0 [intel_qat]
    adf_dev_down+0xf1/0x150 [intel_qat]
    ...
    4xxx 0000:6b:00.0: qat_dev0 stopped 9 acceleration engines
    4xxx 0000:6b:00.0: Resetting device qat_dev0

To prevent attempting to remove attributes from a group that has not
been added yet, a flag named 'sysfs_added' is introduced. This flag
is set to true upon the successful registration of the attribute group.

Fixes: d9fb840837 ("crypto: qat - add rate limiting feature to qat_4xxx")
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-01 18:03:26 +08:00