linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Holger Dengler	b3840c8bfc	s390/ap: rename ap debug configuration option The configuration option ZCRYPT_DEBUG is used only in ap queue code, so rename it to AP_DEBUG. It also no longer depends on ZCRYPT but on AP. While at it, also update the help text. Signed-off-by: Holger Dengler <dengler@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-04-09 17:29:56 +02:00
Harald Freudenberger	6a2892d09d	s390/ap: add debug possibility for AP messages This patch introduces two dynamic debug hexdump invocation possibilities to be able to a) dump an AP message immediately before it goes into the firmware queue and b) dump a fresh from the firmware queue received AP message. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-02-16 14:30:13 +01:00
Harald Freudenberger	08b2c3706d	s390/zcrypt: introduce dynamic debugging for AP and zcrypt code This patch replaces all the s390 debug feature calls with debug level by dynamic debug calls pr_debug. These calls are much more flexible and each single invocation can get enabled/disabled at runtime wheres the s390 debug feature debug calls have only one knob - enable or disable all in one bunch. The benefit is especially significant with high frequency called functions like the AP bus scan. In most debugging scenarios you don't want and need them, but sometimes it is crucial to know exactly when and how long the AP bus scan took. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-02-16 14:30:12 +01:00
Harald Freudenberger	207022d39d	s390/ap: handle outband SE bind state change This patch addresses some weird scenarios where an outband manipulation of the SE bind state of a queue assigned and maybe in use by an SE guest with AP pass-through support took place. So for example when the guest has bound and associated a queue and then this domain has been zeroed on the service element. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2023-11-30 16:24:23 +01:00
Harald Freudenberger	d4c53ae8e4	s390/ap: store TAPQ hwinfo in struct ap_card As of now the AP card struct held only part of the queue's hwinfo (that is the GR2 register content returned with an TAPQ invocation). This patch reworks struct ap_card to hold the whole hwinfo now. As there is a nice bit field union on top of this ap_tapq_hwinfo struct, all the ugly bit checkings can now get replaced by simple evaluations of the required bit field. Suggested-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2023-11-30 16:24:23 +01:00
Harald Freudenberger	c40284b364	s390/ap: re-enable interrupt for AP queues This patch introduces some code lines which check for interrupt support enabled on an AP queue after a reply has been received. This invocation has been chosen as there is a good chance to have the queue empty at that time. As the enablement of the irq imples a state machine change the queue should not have any pending requests or unreceived replies. Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2023-11-05 22:34:57 +01:00
Harald Freudenberger	01c89ab7f8	s390/ap: rework to use irq info from ap queue status This patch reworks the irq handling and reporting code for the AP queue interrupt handling to always use the irq info from the queue status. Until now the interrupt status of an AP queue was stored into a bool variable within the ap_queue struct. This variable was set on a successful interrupt enablement and cleared with kicking a reset. However, it may be that the interrupt state is manipulated outband for example by a hypervisor. This patch removes this variable and instead the irq bit from the AP queue status which is always reflecting the current irq state is used. Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2023-11-05 22:34:57 +01:00
Harald Freudenberger	a19a161482	s390/zcrypt: introduce new internal AP queue se_bound attribute This patch introduces a new AP queue internal attribute se_bound which reflects the bound state of an APQN within a Secure Execution environment. With introduction of Secure Execution guests now an AP firmware queue needs to be bound to the guest before usage. This patch introduces a new internal attribute reflecting this bound state and some glue code to handle this new field during lifetime of an AP queue device. Together with that now the zcrypt scheduler considers the state of the AP queues when a message is about to be distributed among the existing queues. There is a new function ap_queue_usable() which returns true only when all conditions for using this AP queue device are fulfilled. In details this means: the AP queue needs to be configured, not checkstopped and within an SE environment it needs to be bound. So the new function gives and indication if the AP queue device is ready to serve requests or not. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2023-10-16 13:04:09 +02:00
Harald Freudenberger	32d1d9204f	s390/ap: re-init AP queues on config on On a state toggle from config off to config on and on the state toggle from checkstop to not checkstop the queue's internal states was set but the state machine was not nudged. This did not care as on the first enqueue of a request the state machine kick ran. However, within an Secure Execution guest a queue is only chosen by the scheduler when it has been bound. But to bind a queue, it needs to run through the initial states (reset, enable interrupts, ...). So this is like a chicken-and-egg problem and the result was in fact that a queue was unusable after a config off/on toggle. With some slight rework of the handling of these states now the new function _ap_queue_init_state() is called which is the core of the ap_queue_init_state() function but without locking handling. This has the benefit that it can be called on all the places where a (re-)init of the AP queue's state machine is needed. Fixes: `2d72eaf036` ("s390/ap: implement SE AP bind, unbind and associate") Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2023-10-16 13:04:09 +02:00
Harald Freudenberger	5ac8c72462	s390/zcrypt: remove CEX2 and CEX3 device drivers Remove the legacy device driver code for CEX2 and CEX3 cards. The last machines which are able to handle CEX2 crypto cards are z10 EC first available 2008 and z10 BC first available 2009. The last machines able to handle a CEX3 crypto card are z196 first available 2010 and z114 first available 2011. Please note that this does not imply to drop CEX2 and CEX3 support in general. With older kernels on hardware up to the aforementioned machine models these crypto cards will get support by IBM. The removal of the CEX2 and CEX3 device drivers code opens up some simplifications, for example support for crypto cards without rng support can be removed also. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2023-07-24 12:12:22 +02:00
Harald Freudenberger	0fdcc88bb9	s390/zcrypt: cleanup some debug code This patch removes most of the debug code which is build in when CONFIG_ZCRYPT_DEBUG is enabled. There is no real exploiter for this code any more and at least one ioctl fails with this code enabled. The CONFIG_ZCRYPT_DEBUG kernel config option still makes sense as some debug sysfs entries can get enabled with this and maybe long term a new better designed debug and error injection way will get introduced. This patch only removes code surrounded by the named kernel config option. This option should by default always be off anyway. The structs and defines removed by the patch have been used only by code surrounded by a CONFIG_ZCRYPT_DEBUG ifdef and thus can be removed also. In the end this patch removes all the failure-injection possibilities which had been available when the kernel had been build with CONFIG_ZCRYPT_DEBUG. It has never been used that much and was too unflexible anyway. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2023-07-03 11:19:41 +02:00
Harald Freudenberger	038c5bedbc	s390/ap: add ap status asynch error support Review and extend the low level AP code to be able to deal with asynchronous reported errors on APQNs. The hypervisor and the SE guest may be confronted with an asynchronously reported error at return of an AP instruction. So all places where AP instructions are called need review and may eventually need extensions. However, not all places need rework. As together with the AP status and the enabled asynch bit there is always a response code set. The asynch error reporting comes with new response codes which may be simple handled in the default case of a switch statement. The idea behind this patch is to report asynch errors as -EPERM (read this as "Operation not permitted") which reflects the fact that only a rapq (with F bit enabled) is a valid AP instruction when an asynch error is flagged. The AP queue state machine functions return AP_SM_WAIT_NONE when a asynch error is detected to reflect the fact, that the state machine can't do anything with such an error as long as the queue is reset. Unfortunately the ap bus scan function needed some update as the ap_queue_info() now needs to return 3 states: 1 if an APQN exists and info is available, -1 if it is assumed an APQN does not exist and the new return value 0 without any info values filled. This 0 returncode is handled as "there is an APQN but we currently don't know any more hw info about this, so please use your previous info and try again later". Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2023-03-20 11:12:49 +01:00
Harald Freudenberger	2d72eaf036	s390/ap: implement SE AP bind, unbind and associate Implementation of the new functions for SE AP support: bind, unbind and associate. There are two new sysfs attributes for this: /sys/devices/ap/cardxx/xx.yyyy/se_bind /sys/devices/ap/cardxx/xx.yyyy/se_associate Writing a 1 into the se_bind attribute triggers the SE AP bind for this AP queue, writing a 0 into does an unbind - that's a reset (RAPQ) with the F bit enabled. The se_associate attribute needs an integer value in range 0...2^16-1 written in. This is the index into a secrets table feed into the ultravisor. For more details please see the Architecture documents. These both new ap queue attributes are only visible inside a SE guest with SB (Secure Binding) available. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2023-03-20 11:12:49 +01:00
Harald Freudenberger	263c8454db	s390/ap: introduce low frequency polling possibility For some events the ap bus needs to poll. For example when an AP queue is reset until the reset is through. Also when no interrupt support is available (e.g. zVM) there is a need to poll until all requests have been processed and all replies have been delivered. Polling is done with a high resolution timer by default run with a rate of 4kHz (LPAR) or 666Hz (zVM guest). For some events (wait for reset complete, wait for irq enabled complete) this is a much too high poll rate which triggers a lot of TAPQ invocations. This patch introduces the possibility for the state machine functions to return a new wait enum AP_SM_WAIT_LOW_TIMEOUT which gives a hint to the ap_wait() function to eventually set up the timer with a more relaxed timeout value of 25Hz. This patch also includes a slight rework of the sysfs functions parsing the timer related stuff: Use of kstrtobool and kstrtoul instead of sscanf. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2023-03-20 11:12:49 +01:00
Harald Freudenberger	4bdf3c3956	s390/ap: provide F bit parameter for ap_rapq() and ap_zapq() Extent the ap inline functions ap_rapq() (calls PQAP(RAPQ)) and ap_zapq() (calls PQAP(ZAPQ)) with a new parameter to enable the new architectured F bit which forces an unassociate and/or unbind on a secure execution associated and/or bound queue. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2023-03-20 11:12:49 +01:00
Harald Freudenberger	088174960e	s390/ap: filter ap card functions, new queue functions attribute With SE SB (Secure Binding) some currently unused and thus always zero bits in the TAPQ GR2 result are now used to show the binding state of a queue. So to check if a card has changed the comparing base is exactly this GR2 value shown as 'ap_function' in sysfs (/sys/devices/ap/cardxx/ap_functions). Now there is some queue specific info in this info and so a new mask TAPQ_CARD_FUNC_CMP_MASK is used to filter out only the relevant bits for card compare. For the same reason now the function bits (including exactly this bind/associate information) need to be exposed to user space now. So tools like lszcrypt can evaluate binding/association state on a queue base. So here comes a new sysfs attribute /sys/devices/ap/cardxx/xx.yyyy/ap_functions This sysfs attribute is similar to the already existing ap_functions attribute at ap card level. It shows the upper 32 bits of GR2 from an invocation of TAPQ for this AP queue. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2023-03-20 11:12:48 +01:00
Harald Freudenberger	964d581daf	s390/zcrypt: replace scnprintf with sysfs_emit Replace scnprintf() with sysfs_emit() and friends where possible. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2023-03-20 11:12:48 +01:00
Harald Freudenberger	8794c59613	s390/zcrypt: rework length information for dqap The inline ap_dqap function does not return the number of bytes actually written into the message buffer. The calling code inspects the AP message header to figure out what kind of AP message has been received and pulls the length information from this header. This processing may not work correctly in cases where only a fragment of the reply is received. With this patch the ap_dqap inline function now returns the number of actually written bytes in the *length parameter. So the calling function has a chance to compare the number of received bytes against what the AP message header length field states. This is especially useful in cases where a message could only get partially received. The low level reply processing functions needed some rework to be able to catch this new length information and compare it the right way. The rework also deals with some situations where until now the reply length was not correctly calculated and/or set. All this has been heavily tested as the modifications on the reply length information may affect crypto load. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2023-03-20 11:12:47 +01:00
Harald Freudenberger	003d248fee	s390/zcrypt: make psmid unsigned long instead of long long Since s390 kernel build does not support 32 bit build any more there is no difference between long and long long. So this patch reworks all occurrences of psmid (a 64 bit value) to use unsigned long now. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2023-03-20 11:12:47 +01:00
Harald Freudenberger	ebf95e8846	s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg union Introduce a new ap queue status register wrapper union to access register wide values. So the inline assembler only sees register wide values but the surrounding code may use a more structured view of the same value and a reader of the code (and the compiler) gets a clear understanding about the mapping between fields and register values. All the changes to access the ap queue status are local to the inline functions within ap.h. However, the struct ap_qirq_ctrl has been replaces by a union for same reason and this needed slight adaptions in the calling code. Suggested-by: Halil Pasic <pasic@linux.ibm.com> Suggested-by: Andreas Arnez <arnez@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2023-02-27 15:29:36 +01:00
Nicolin Chen	10e19d492a	vfio/ap: Pass in physical address of ind to ap_aqic() The ap_aqic() is called by vfio_ap_irq_enable() where it passes in a virt value that's casted from a physical address "h_nib". Inside the ap_aqic(), it does virt_to_phys() again. Since ap_aqic() needs a physical address, let's just pass in a pa of ind directly. So change the "ind" to "pa_ind". Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-4-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-07-23 07:29:10 -06:00
Harald Freudenberger	2004b57cde	s390/zcrypt: code cleanup This patch tries to fix as much as possible of the checkpatch.pl --strict findings: CHECK: Logical continuations should be on the previous line CHECK: No space is necessary after a cast CHECK: Alignment should match open parenthesis CHECK: 'useable' may be misspelled - perhaps 'usable'? WARNING: Possible repeated word: 'is' CHECK: spaces preferred around that '' (ctx:VxV) CHECK: Comparison to NULL could be written "!msg" CHECK: Prefer kzalloc(sizeof(zc)...) over kzalloc(sizeof(struct...)...) CHECK: Unnecessary parentheses around resp_type->work CHECK: Avoid CamelCase: <xcRB> There is no functional change comming with this patch, only code cleanup, renaming, whitespaces, indenting, ... but no semantic change in any way. Also the API (zcrypt and pkey header file) is semantically unchanged. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Jürgen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2022-04-25 13:54:14 +02:00
Harald Freudenberger	a7e701dba1	s390/zcrypt: handle checkstopped cards with new state A crypto card may be in checkstopped state. With this patch this is handled as a new state in the ap card and ap queue structs. There is also a new card sysfs attribute /sys/devices/ap/cardxx/chkstop and a new queue sysfs attribute /sys/devices/ap/cardxx/xx.yyyy/chkstop displaying the checkstop state of the card or queue. Please note that the queue's checkstop state is only a copy of the card's checkstop state but makes maintenance much easier. The checkstop state expressed here is the result of an RC 0x04 (CHECKSTOP) during an AP command, mostly the PQAP(TAPQ) command which is 'testing' the queue. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Jürgen Christ <jchrist@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2022-03-08 00:33:00 +01:00
Harald Freudenberger	d64e5e9120	s390/ap/zcrypt: debug feature improvements This patch adds some debug feature improvements related to some failures happened in the past. With CEX8 the max request and response sizes have been extended but the user space applications did not rework their code and thus ran into receive buffer issues. This ffdc patch here helps with additional checks and debug feature messages in debugging and pointing to the root cause of some failures related to wrong buffer sizes. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Jürgen Christ <jchrist@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2022-03-08 00:33:00 +01:00
Harald Freudenberger	3f74eb5f78	s390/zcrypt: rework of debug feature messages This patch reworks all the debug feature invocations to be more uniform. All invocations now use the macro with the level already part of the macro name. All messages now start with %s filled with __func__ (well there are still some exceptions), and some message text has been shortened or reworked. There is no functional code touched with this patch. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-10-26 15:21:27 +02:00
Harald Freudenberger	3826350e6d	s390/ap: Fix hanging ioctl caused by orphaned replies When a queue is switched to soft offline during heavy load and later switched to soft online again and now used, it may be that the caller is blocked forever in the ioctl call. The failure occurs because there is a pending reply after the queue(s) have been switched to offline. This orphaned reply is received when the queue is switched to online and is accidentally counted for the outstanding replies. So when there was a valid outstanding reply and this orphaned reply is received it counts as the outstanding one thus dropping the outstanding counter to 0. Voila, with this counter the receive function is not called any more and the real outstanding reply is never received (until another request comes in...) and the ioctl blocks. The fix is simple. However, instead of readjusting the counter when an orphaned reply is detected, I check the queue status for not empty and compare this to the outstanding counter. So if the queue is not empty then the counter must not drop to 0 but at least have a value of 1. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-10-26 15:21:27 +02:00
Heiko Carstens	948e50551b	s390/ap: fix kernel doc comments Get rid of warnings like: drivers/s390/crypto/ap_bus.c:216: warning: bad line: drivers/s390/crypto/ap_bus.c:444: warning: Function parameter or member 'floating' not described in 'ap_interrupt_handler' Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-09-15 14:29:21 +02:00
Harald Freudenberger	cabebb697c	s390/ap: fix state machine hang after failure to enable irq If for any reason the interrupt enable for an ap queue fails the state machine run for the queue returned wrong return codes to the caller. So the caller assumed interrupt support for this queue in enabled and thus did not re-establish the high resolution timer used for polling. In the end this let to a hang for the user space process waiting "forever" for the reply. This patch reworks these return codes to return correct indications for the caller to re-establish the timer when a queue runs without interrupt support. Please note that this is fixing a wrong behavior after a first failure (enable interrupt support for the queue) failed. However, looks like this occasionally happens on KVM systems. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2021-08-26 20:22:12 +02:00
Harald Freudenberger	1f0d22defd	s390/ap: Rework ap_dqap to deal with messages greater than recv buffer Rework of the ap_dqap() inline function with the dqap inline assembler invocation and the caller code in ap_queue.c to be able to handle replies which exceed the receive buffer size. ap_dqap() now provides two additional parameters to handle together with the caller the case where a reply in the firmware queue entry exceeds the given message buffer size. It depends on the caller how to exactly handle this. The behavior implemented now by ap_sm_recv() in ap_queue.c is to simple purge this entry from the firmware queue and let the caller 'receive' a -EMSGSIZE for the request without delivering any reply data - not even a truncated reply message. However, the reworked ap_dqap() could now get invoked in a way that the message is received in multiple parts and the caller assembles the parts into one reply message. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Suggested-by: Juergen Christ <jchrist@linux.ibm.com> Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-07-08 15:37:27 +02:00
Harald Freudenberger	bd39654a22	s390/AP: support new dynamic AP bus size limit This patch provides support for new dynamic AP bus message limit with the existing zcrypt device driver and AP bus core code. There is support for a new field 'ml' from TAPQ query. The field gives if != 0 the AP bus limit for this card in 4k chunk units. The actual message size limit per card is shown as a new read-only sysfs attribute. The sysfs attribute /sys/devices/ap/cardxx/max_msg_size shows the upper limit in bytes used by the AP bus and zcrypt device driver for requests and replies send to and received from this card. Currently up to CEX7 support only max 12kB msg size and thus the field shows 12288 meaning the upper limit of a valid msg for this card is 12kB. Please note that the usable payload is somewhat lower and depends on the msg type and thus the header struct which is to be prepended by the zcrypt dd. The dispatcher responsible for choosing the right card and queue is aware of the individual card AP bus message limit. So a request is only assigned to a queue of a card which is able to handle the size of the request (e.g. a 14kB request will never go to a max 12kB card). If no such card is found the ioctl will fail with ENODEV. The reply buffer held by the device driver is determined by the ml field of the TAPQ for this card. If a response from the card exceeds this limit however, the response is not truncated but the ioctl for this request will fail with errno EMSGSIZE to indicate that the device driver has dropped the response because it would overflow the buffer limit. If the request size does not indicate to the dispatcher that an adapter with extended limit is to be used, a random card will be chosen when no specific card is addressed (ANY addressing). This may result in an ioctl failure when the reply size needs an adapter with extended limit but the randomly chosen one is not capable of handling the broader reply size. The user space application needs to use dedicated addressing to forward such a request only to suitable cards to get requests like this processed properly. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Tuchscherer <ingo.tuchscherer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-07-05 12:44:23 +02:00
Harald Freudenberger	e73a99f328	s390/ap: Fix hanging ioctl caused by wrong msg counter When a AP queue is switched to soft offline, all pending requests are purged out of the pending requests list and 'received' by the upper layer like zcrypt device drivers. This is also done for requests which are already enqueued into the firmware queue. A request in a firmware queue may eventually produce an response message, but there is no waiting process any more. However, the response was counted with the queue_counter and as this counter was reset to 0 with the offline switch, the pending response caused the queue_counter to get negative. The next request increased this counter to 0 (instead of 1) which caused the ap code to assume there is nothing to receive and so the response for this valid request was never tried to fetch from the firmware queue. This all caused a queue to not work properly after a switch offline/online and in the end processes to hang forever when trying to send a crypto request after an queue offline/online switch cicle. Fixed by a) making sure the counter does not drop below 0 and b) on a successful enqueue of a message has at least a value of 1. Additionally a warning is emitted, when a reply can't get assigned to a waiting process. This may be normal operation (process had timeout or has been killed) but may give a hint that something unexpected happened (like this odd behavior described above). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2021-06-16 23:32:02 +02:00
Harald Freudenberger	4366dd7251	s390/zcrypt: fix wrong format specifications Fixes 5 wrong format specification findings found by the kernel test robot in ap_queue.c: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] __func__, status.response_code, Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reported-by: kernel test robot <lkp@intel.com> Fixes: `2ea2a6099a` ("s390/ap: add error response code field for ap queue devices") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-10-09 23:45:30 +02:00
Harald Freudenberger	27c4f6738b	s390/zcrypt: Introduce Failure Injection feature Introduce a way to specify additional debug flags with an crpyto request to be able to trigger certain failures within the zcrypt device drivers and/or ap core code. This failure injection possibility is only enabled with a kernel debug build CONFIG_ZCRYPT_DEBUG) and should never be available on a regular kernel running in production environment. Details: * The ioctl(ICARSAMODEXPO) get's a struct ica_rsa_modexpo. If the leftmost bit of the 32 bit unsigned int inputdatalength field is set, the uppermost 16 bits are separated and used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ICARSACRT) get's a struct ica_rsa_modexpo_crt. If the leftmost bit of the 32 bit unsigned int inputdatalength field is set, the uppermost 16 bits are separated and used als debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ZSECSENDCPRB) used to send CCA CPRBs get's a struct ica_xcRB. If the leftmost bit of the 32 bit unsigned int status field is set, the uppermost 16 bits of this field are used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ZSENDEP11CPRB) used to send EP11 CPRBs get's a struct ep11_urb. If the leftmost bit of the 64 bit unsigned int req_len field is set, the uppermost 16 bits of this field are used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. So it is possible to send an additional 16 bit value to the zcrypt API to be used to carry a failure injection command which may trigger special behavior within the zcrypt API and layers below. This 16 bit value is for the rest of the test referred as 'fi command' for Failure Injection. The lower 8 bits of the fi command construct a numerical argument in the range of 1-255 and is the 'fi action' to be performed with the request or the resulting reply: * 0x00 (all requests): No failure injection action but flags may be provided which may affect the processing of the request or reply. * 0x01 (only CCA CPRBs): The CPRB's agent_ID field is set to 'FF'. This results in an reply code 0x90 (Transport-Protocol Failure). * 0x02 (only CCA CPRBs): After the APQN to send to has been chosen, the domain field within the CPRB is overwritten with value 99 to enforce an reply with RY 0x8A. * 0x03 (all requests): At NQAP invocation the invalid qid value 0xFF00 is used causing an response code of 0x01 (AP queue not valid). The upper 8 bits of the fi command may carry bit flags which may influence the processing of an request or response: * 0x01: No retry. If this bit is set, the usual loop in the zcrypt API which retries an CPRB up to 10 times when the lower layers return with EAGAIN is abandoned after the first attempt to send the CPRB. * 0x02: Toggle special. Toggles the special bit on this request. This should result in an reply code RY~0x41 and result in an ioctl failure with errno EINVAL. This failure injection possibilities may get some further extensions in the future. As of now this is a starting point for Continuous Test and Integration to trigger some failures and watch for the reaction of the ap bus and zcrypt device driver code. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-10-07 21:50:01 +02:00
Harald Freudenberger	e0332629e3	s390/ap/zcrypt: revisit ap and zcrypt error handling Revisit the ap queue error handling: Based on discussions and evaluatios with the firmware folk here is now a rework of the response code handling for all the AP instructions. The idea is to distinguish between failures because of some kind of invalid request where a retry does not make any sense and a failure where another attempt to send the very same request may succeed. The first case is handled by returning EINVAL to the userspace application. The second case results in retries within the zcrypt API controlled by a per message retry counter. Revisit the zcrpyt error handling: Similar here, based on discussions with the firmware people here comes a rework of the handling of all the reply codes. Main point here is that there are only very few cases left, where a zcrypt device queue is switched to offline. It should never be the case that an AP reply message is 'unknown' to the device driver as it indicates a total mismatch between device driver and crypto card firmware. In all other cases, the code distinguishes between failure because of invalid message (see above - EINVAL) or failures of the infrastructure (see above - EAGAIN). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-10-07 21:50:01 +02:00
Harald Freudenberger	4f2fcccdb5	s390/ap: add card/queue deconfig state This patch adds a new config state to the ap card and queue devices. This state reflects the response code 0x03 "AP deconfigured" on TQAP invocation and is tracked with every ap bus scan. Together with this new state now a card/queue device which is 'deconfigured' is not disposed any more. However, for backward compatibility the online state now needs to take this state into account. So a card/queue is offline when the device is not configured. Furthermore a device can't get switched from offline to online state when not configured. The config state is shown in sysfs at /sys/devices/ap/cardxx/config for the card and /sys/devices/ap/cardxx/xx.yyyy/config for each queue within each card. It is a read-only attribute reflecting the negation of the 'AP deconfig' state as it is noted in the AP documents. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-10-07 21:50:00 +02:00
Harald Freudenberger	2ea2a6099a	s390/ap: add error response code field for ap queue devices On AP instruction failures the last response code is now kept in the struct ap_queue. There is also a new sysfs attribute showing this field (enabled only on debug kernels). Also slight rework of the AP_DBF macros to get some more content into one debug feature message line. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-10-07 21:50:00 +02:00
Harald Freudenberger	0b641cbd24	s390/ap: split ap queue state machine state from device state The state machine for each ap queue covered a mixture of device states and state machine (firmware queue state) states. This patch splits the device states and the state machine states into two different enums and variables. The major state is the device state with currently these values: AP_DEV_STATE_UNINITIATED - fresh and virgin, not touched AP_DEV_STATE_OPERATING - queue dev is working normal AP_DEV_STATE_SHUTDOWN - remove/unbind/shutdown in progress AP_DEV_STATE_ERROR - device is in error state only when the device state is > UNINITIATED the state machine is run. The state machine represents the states of the firmware queue: AP_SM_STATE_RESET_START - starting point, reset (RAPQ) ap queue AP_SM_STATE_RESET_WAIT - reset triggered, waiting to be finished if irqs enabled, set up irq (AQIC) AP_SM_STATE_SETIRQ_WAIT - enable irq triggered, waiting to be finished, then go to IDLE AP_SM_STATE_IDLE - queue is operational but empty AP_SM_STATE_WORKING - queue is operational, requests are stored and replies may wait for getting fetched AP_SM_STATE_QUEUE_FULL - firmware queue is full, so only replies can get fetched For debugging each ap queue shows a sysfs attribute 'states' which displays the device and state machine state and is only available when the kernel is build with CONFIG_ZCRYPT_DEBUG enabled. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-10-07 21:49:59 +02:00
Harald Freudenberger	dc4b6ded3c	s390/ap: rename and clarify ap state machine related stuff There is a state machine held for each ap queue device. The states and functions related to this where somethimes noted with _sm_ somethimes without. This patch clarifies and renames all the ap queue state machine related functions, enums and defines to have a _sm_ in the name. There is no functional change coming with this patch - it's only beautifying code. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2020-07-03 10:49:49 +02:00
Harald Freudenberger	74ecbef7b9	s390/zcrypt: code beautification and struct field renames Some beautifications related to the internal only used struct ap_message and related code. Instead of one int carrying only the special flag now a u32 flags field is used. At struct CPRBX the pointers to additional data are now marked with __user. This caused some changes needed on code, where these structs are also used within the zcrypt misc functions. The ica_rsa_* structs now use the generic types __u8, __u32, ... instead of char, unsigned int. zcrypt_msg6 and zcrypt_msg50 use min_t() instead of min(). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2020-07-03 10:49:34 +02:00
Harald Freudenberger	bc4b295e87	s390/ap: introduce new ap function ap_get_qdev() Provide a new interface function to be used by the ap drivers: struct ap_queue *ap_get_qdev(ap_qid_t qid); Returns ptr to the struct ap_queue device or NULL if there was no ap_queue device with this qid found. When something is found, the reference count of the embedded device is increased. So the caller has to decrease the reference count after use with a call to put_device(&aq->ap_dev.device). With this patch also the ap_card_list is removed from the ap core code and a new hashtable is introduced which stores hnodes of all the ap queues known to the ap bus. The hashtable approach and a first implementation of this interface comes from a previous patch from Anthony Krowiak and an idea from Halil Pasic. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Suggested-by: Tony Krowiak <akrowiak@linux.ibm.com> Suggested-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-05-20 10:22:51 +02:00
Harald Freudenberger	41677b1d94	s390/ap: remove power management code from ap bus and drivers The s390 power management support has been removed. So the api registration and the suspend and resume callbacks and all the code related to this for the ap bus and the ap drivers is removed with this patch. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-03-27 10:22:47 +01:00
Joe Perches	fcf0220abc	s390/zcrypt: use fallthrough; Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/ Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-03-25 12:39:37 +01:00
Harald Freudenberger	40501c70e3	s390/zcrypt: replace snprintf/sprintf with scnprintf snprintf() may not always return the correct size of used bytes but instead the length the resulting string would be if it would fit into the buffer. So scnprintf() is the function to use when the real length of the resulting string is needed. Replace all occurrences of snprintf() with scnprintf() where the return code is further processed. Also find and fix some occurrences where sprintf() was used. Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-03-23 13:41:54 +01:00
Harald Freudenberger	fcd98d4002	s390/zcrypt: fix card and queue total counter wrap The internal statistic counters for the total number of requests processed per card and per queue used integers. So they do wrap after a rather huge amount of crypto requests processed. This patch introduces uint64 counters which should hold much longer but still may wrap. The sysfs attributes request_count for card and queue also used only %ld and now display the counter value with %llu. This is not a security relevant fix. The int overflow which happened is not in any way exploitable as a security breach. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-02-10 12:49:35 +01:00
Harald Freudenberger	0c874cd042	s390/zcrypt: move ap device reset from bus to driver code This patch moves the reset invocation of an ap device when fresh detected from the ap bus to the probe() function of the driver responsible for this device. The virtualisation of ap devices makes it necessary to remove unconditioned resets on fresh appearing apqn devices. It may be that such a device is already enabled for guest usage. So there may be a race condition between host ap bus and guest ap bus doing the reset. This patch moves the reset from the ap bus to the zcrypt drivers. So if there is no zcrypt driver bound to an ap device - for example the ap device is bound to the vfio device driver - the ap device is untouched passed to the vfio device driver. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2020-01-09 16:59:18 +01:00
Vasily Gorbik	3cdd986067	s390/zcrypt: adjust switch fall through comments for -Wimplicit-fallthrough Silence the following warnings when built with -Wimplicit-fallthrough=3 enabled by default since 5.3-rc2: In file included from ./include/linux/preempt.h:11, from ./include/linux/spinlock.h:51, from ./include/linux/mmzone.h:8, from ./include/linux/gfp.h:6, from ./include/linux/slab.h:15, from drivers/s390/crypto/ap_queue.c:13: drivers/s390/crypto/ap_queue.c: In function 'ap_sm_recv': ./include/linux/list.h:577:2: warning: this statement may fall through [-Wimplicit-fallthrough=] 577 \| for (pos = list_first_entry(head, typeof(pos), member); \ \| ^~~ drivers/s390/crypto/ap_queue.c:147:3: note: in expansion of macro 'list_for_each_entry' 147 \| list_for_each_entry(ap_msg, &aq->pendingq, list) { \| ^~~~~~~~~~~~~~~~~~~ drivers/s390/crypto/ap_queue.c:155:2: note: here 155 \| case AP_RESPONSE_NO_PENDING_REPLY: \| ^~~~ drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_ep11_xcrb': drivers/s390/crypto/zcrypt_msgtype6.c:871:6: warning: this statement may fall through [-Wimplicit-fallthrough=] 871 \| if (msg->cprbx.cprb_ver_id == 0x04) \| ^ drivers/s390/crypto/zcrypt_msgtype6.c:874:2: note: here 874 \| default: / Unknown response type, this should NEVER EVER happen / \| ^~~~~~~ drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_rng': drivers/s390/crypto/zcrypt_msgtype6.c:901:6: warning: this statement may fall through [-Wimplicit-fallthrough=] 901 \| if (msg->cprbx.cprb_ver_id == 0x02) \| ^ drivers/s390/crypto/zcrypt_msgtype6.c:907:2: note: here 907 \| default: / Unknown response type, this should NEVER EVER happen / \| ^~~~~~~ drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_xcrb': drivers/s390/crypto/zcrypt_msgtype6.c:838:6: warning: this statement may fall through [-Wimplicit-fallthrough=] 838 \| if (msg->cprbx.cprb_ver_id == 0x02) \| ^ drivers/s390/crypto/zcrypt_msgtype6.c:844:2: note: here 844 \| default: / Unknown response type, this should NEVER EVER happen / \| ^~~~~~~ drivers/s390/crypto/zcrypt_msgtype6.c: In function 'convert_response_ica': drivers/s390/crypto/zcrypt_msgtype6.c:801:6: warning: this statement may fall through [-Wimplicit-fallthrough=] 801 \| if (msg->cprbx.cprb_ver_id == 0x02) \| ^ drivers/s390/crypto/zcrypt_msgtype6.c:808:2: note: here 808 \| default: / Unknown response type, this should NEVER EVER happen */ \| ^~~~~~~ Acked-by: Patrick Steuer <patrick.steuer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2019-08-02 13:58:23 +02:00
Harald Freudenberger	16222cfb96	s390/zcrypt: fix possible deadlock situation on ap queue remove With commit `01396a374c` ("s390/zcrypt: revisit ap device remove procedure") the ap queue remove is now a two stage process. However, a del_timer_sync() call may trigger the timer function which may try to lock the very same spinlock as is held by the function just initiating the del_timer_sync() call. This could end up in a deadlock situation. Very unlikely but possible as you need to remove an ap queue at the exact sime time when a timeout of a request occurs. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reported-by: Pierre Morel <pmorel@linux.ibm.com> Fixes: commit `01396a374c` ("s390/zcrypt: revisit ap device remove procedure") Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2019-04-10 17:46:24 +02:00
Harald Freudenberger	01396a374c	s390/zcrypt: revisit ap device remove procedure Working with the vfio-ap driver let to some revisit of the way how an ap (queue) device is removed from the driver. With the current implementation all the cleanup was done before the driver even got notified about the removal. Now the ap queue removal is done in 3 steps: 1) A preparation step, all ap messages within the queue are flushed and so the driver does 'receive' them. Also a new state AP_STATE_REMOVE assigned to the queue makes sure there are no new messages queued in. 2) Now the driver's remove function is invoked and the driver should do the job of cleaning up it's internal administration lists or whatever. After 2) is done it is guaranteed, that the driver is not invoked any more. On the other hand the driver has to make sure that the APQN is not accessed any more after step 2 is complete. 3) Now the ap bus code does the job of total cleanup of the APQN. A reset with zero is triggered and the state of the queue goes to AP_STATE_UNBOUND. After step 3) is complete, the ap queue has no pending messages and the APQN is cleared and so there are no requests and replies lingering around in the firmware queue for this APQN. Also the interrupts are disabled. After these remove steps the ap queue device may be assigned to another driver. Stress testing this remove/probe procedure showed a problem with the correct module reference counting. The actual receive of an reply in the driver is done asynchronous with completions. So with a driver change on an ap queue the message flush triggers completions but the threads waiting for the completions may run at a time where the queue already has the new driver assigned. So the module_put() at receive time needs to be done on the driver module which queued the ap message. This change is also part of this patch. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2019-03-06 09:34:03 +01:00
Harald Freudenberger	b1af7528d2	s390/zcrypt: use new state UNBOUND during queue driver rebind When an alternate driver (vfio-ap) has bound an ap queue and this binding is revised the ap queue device is in an intermittent state not bound to any driver. The internal state variable covered this with the state AP_STATE_BORKED which is also used to reflect broken devices. When now an ap bus scan runs such a device is destroyed and on the next scan reconstructed. So a stress test with high frequency switching the queue driver between the default and the vfio-ap driver hit this gap and the queue was removed until the next ap bus scan. This fix now introduces another state for the in-between condition for a queue momentary not bound to a driver and so the ap bus scan function skips this device instead of removing it. Also some very slight but maybe helpful debug feature messages come with this patch - in particular a message showing that a broken card/queue device will get removed. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2019-02-13 08:27:44 +01:00
Harald Freudenberger	42a87d4103	s390/zcrypt: make sysfs reset attribute trigger queue reset Until now there is no way to reset a AP queue or card. Driving a card or queue offline and online again does only toggle the 'software' online state. The only way to trigger a (hardware) reset is by running hot-unplug/hot-plug for example on the HMC. This patch makes the queue reset attribute in sysfs writable. Writing into this attribute triggers a reset on the AP queue's state machine. So the AP queue is flushed and state machine runs through the initial states which cause a reset (PQAP(RAPQ)) and a re-registration to interrupts (PQAP(AQIC)) if available. The reset sysfs attribute is writable by root only. So only an administrator is allowed to initiate a reset of AP queues. Please note that the queue's counter values are left untouched by the reset. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2018-12-13 10:42:26 +01:00

1 2

65 commits