1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

2377 commits

Author SHA1 Message Date
Nicholas Piggin
6c12c4376b KVM: PPC: Book3S HV: remove unused kvmppc_h_protect argument
The va argument is not used in the function or set by its asm caller,
so remove it to be safe.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-8-npiggin@gmail.com
2021-04-12 13:36:23 +10:00
Nicholas Piggin
4b5f0a0d49 KVM: PPC: Book3S HV: Remove redundant mtspr PSPB
This SPR is set to 0 twice when exiting the guest.

Suggested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-7-npiggin@gmail.com
2021-04-12 13:36:23 +10:00
Nicholas Piggin
72c1528721 KVM: PPC: Book3S HV: Prevent radix guests setting LPCR[TC]
Prevent radix guests setting LPCR[TC]. This bit only applies to hash
partitions.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-6-npiggin@gmail.com
2021-04-12 13:36:23 +10:00
Nicholas Piggin
bcc92a0d6d KVM: PPC: Book3S HV: Disallow LPCR[AIL] to be set to 1 or 2
These are already disallowed by H_SET_MODE from the guest, also disallow
these by updating LPCR directly.

AIL modes can affect the host interrupt behaviour while the guest LPCR
value is set, so filter it here too.

Suggested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-5-npiggin@gmail.com
2021-04-12 13:36:23 +10:00
Nicholas Piggin
67145ef496 KVM: PPC: Book3S HV: Add a function to filter guest LPCR bits
Guest LPCR depends on hardware type, and future changes will add
restrictions based on errata and guest MMU mode. Move this logic
to a common function and use it for the cases where the guest
wants to update its LPCR (or the LPCR of a nested guest).

This also adds a warning in other places that set or update LPCR
if we try to set something that would have been disallowed by
the filter, as a sanity check.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-4-npiggin@gmail.com
2021-04-12 13:36:23 +10:00
Nicholas Piggin
a19b70abc6 KVM: PPC: Book3S HV: Nested move LPCR sanitising to sanitise_hv_regs
This will get a bit more complicated in future patches. Move it
into the helper function.

This change allows the L1 hypervisor to determine some of the LPCR
bits that the L0 is using to run it, which could be a privilege
violation (LPCR is HV-privileged), although the same problem exists
now for HFSCR for example. Discussion of the HV privilege issue is
ongoing and can be resolved with a later change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-3-npiggin@gmail.com
2021-04-12 13:36:23 +10:00
Nicholas Piggin
5088eb4092 KVM: PPC: Book3S HV P9: Restore host CTRL SPR after guest exit
The host CTRL (runlatch) value is not restored after guest exit. The
host CTRL should always be 1 except in CPU idle code, so this can result
in the host running with runlatch clear, and potentially switching to
a different vCPU which then runs with runlatch clear as well.

This has little effect on P9 machines, CTRL is only responsible for some
PMU counter logic in the host and so other than corner cases of software
relying on that, or explicitly reading the runlatch value (Linux does
not appear to be affected but it's possible non-Linux guests could be),
there should be no execution correctness problem, though it could be
used as a covert channel between guests.

There may be microcontrollers, firmware or monitoring tools that sample
the runlatch value out-of-band, however since the register is writable
by guests, these values would (should) not be relied upon for correct
operation of the host, so suboptimal performance or incorrect reporting
should be the worst problem.

Fixes: 95a6432ce9 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-2-npiggin@gmail.com
2021-04-12 13:36:22 +10:00
Linus Torvalds
d94d14008e x86:
- take into account HVA before retrying on MMU notifier race
 - fixes for nested AMD guests without NPT
 - allow INVPCID in guest without PCID
 - disable PML in hardware when not in use
 - MMU code cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmA3eMQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP6TQf5ARpUyq3oo+13albwg+zNca6hzR8i
 Vl7dpoR3bSJCN3sTYFnlL9eXw5TxgeUL2nqKqma6ddZDNDEBLT2Bq8rcFkbi4pUf
 n7av76EEq74HW/jlUhKVug7Q5Dm5DiKC6BOH3RVuKHbr6iZseyF3jXZSX0Ppf0yF
 gvoy6cGyMW60NVLN5tuGeOjVQ1fxziE0SqB90fXuiWgZ5rzIBfbqJV7EOOZsGO67
 /LHSaEpvKutsc2a+Hx76yQNJjAbb2/O+4Bo5/RqfdqS5tRLGBzYggdJjLvAPvd6P
 pTNtDCnErvBZQfMedEQyHYuBL2Ca59fOp6i/ekOM2I+m7816+kSkdTMt2g==
 =iMHY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "x86:

   - take into account HVA before retrying on MMU notifier race

   - fixes for nested AMD guests without NPT

   - allow INVPCID in guest without PCID

   - disable PML in hardware when not in use

   - MMU code cleanups:

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  KVM: SVM: Fix nested VM-Exit on #GP interception handling
  KVM: vmx/pmu: Fix dummy check if lbr_desc->event is created
  KVM: x86/mmu: Consider the hva in mmu_notifier retry
  KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault
  KVM: Documentation: rectify rst markup in KVM_GET_SUPPORTED_HV_CPUID
  KVM: nSVM: prepare guest save area while is_guest_mode is true
  KVM: x86/mmu: Remove a variety of unnecessary exports
  KVM: x86: Fold "write-protect large" use case into generic write-protect
  KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML
  KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging
  KVM: x86: Further clarify the logic and comments for toggling log dirty
  KVM: x86: Move MMU's PML logic to common code
  KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function
  KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect()
  KVM: nVMX: Disable PML in hardware when running L2
  KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
  KVM: x86/mmu: Pass the memslot to the rmap callbacks
  KVM: x86/mmu: Split out max mapping level calculation to helper
  KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages
  KVM: nVMX: no need to undo inject_page_fault change on nested vmexit
  ...
2021-02-26 10:00:12 -08:00
Linus Torvalds
b12b472496 powerpc updates for 5.12
A large series adding wrappers for our interrupt handlers, so that irq/nmi/user
 tracking can be isolated in the wrappers rather than spread in each handler.
 
 Conversion of the 32-bit syscall handling into C.
 
 A series from Nick to streamline our TLB flushing when using the Radix MMU.
 
 Switch to using queued spinlocks by default for 64-bit server CPUs.
 
 A rework of our PCI probing so that it happens later in boot, when more generic
 infrastructure is available.
 
 Two small fixes to allow 32-bit little-endian processes to run on 64-bit
 kernels.
 
 Other smaller features, fixes & cleanups.
 
 Thanks to:
   Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira
   Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy,
   Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh
   Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus
   Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
   O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan
   Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, Zheng
   Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmAzMagTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgAbBD/wMS2g1Q9oAGZPsx2NGd2RoeAauGxUs
 Yj6cZVmR+oa6sJyFYgEG7dT7tcwJITQxLBD3HpsHSnJ/rLrMloE33+cZNA9c4STz
 0mlzm3R7M5pOgcEqZglsgLP0RQeUuHSSF01g0kf1N3r+HYtmbmPjuUIl8CnAjlbT
 iMD2ZN2p8/r3kDDht0iBO534HUpsqhc00duSZgQhsV/PR7ZWVxoPk7PEJeo4vXlJ
 77986F7J5NLUTjMiLv5lTx49FcPbRd7a1jubsBtahJrwXj2GVvuy2i86G7HY+a+B
 eSxN7zJQgaFeLo0YPo7fZLBI0MAsIQt3nnZhKX0TMglbv/K8Aq64xiJqsVQdJ883
 CeEt0HvSJhsSC0C4O595NEINfDhDd+5IeSF9MvsujYXiUKRXtRkm1EPuAzTcZIzW
 NwkCLRo33NMXa+khMKaiqF/g7INayPUXoWESx75NXFsuNfcORvstkeUuEoi5GwJo
 TSlmosFqwRjghQ8eTLZuWBzmh3EpPGdtC4gm6D+lbzhzjah5c/1whyuLqra275kK
 E3Qt0/V0ixKyvlG7MI5yYh3L7+R/hrsflH7xIJJxZp2DW6mwBJzQYmkxDbSS8PzK
 nWien2XgpIQhSFat3QqreEFSfNkzdN2MClVi2Y1hpAgi+2Zm9rPdPNGcQI+DSOsB
 kpJkjOjWNJU/PQ==
 =dB2S
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - A large series adding wrappers for our interrupt handlers, so that
   irq/nmi/user tracking can be isolated in the wrappers rather than
   spread in each handler.

 - Conversion of the 32-bit syscall handling into C.

 - A series from Nick to streamline our TLB flushing when using the
   Radix MMU.

 - Switch to using queued spinlocks by default for 64-bit server CPUs.

 - A rework of our PCI probing so that it happens later in boot, when
   more generic infrastructure is available.

 - Two small fixes to allow 32-bit little-endian processes to run on
   64-bit kernels.

 - Other smaller features, fixes & cleanups.

Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh
Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang
Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian
Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong,
Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan
Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu,
Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen
Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun.

* tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits)
  powerpc/perf: Adds support for programming of Thresholding in P10
  powerpc/pci: Remove unimplemented prototypes
  powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user()
  powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size()
  powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user()
  powerpc/64: Fix stack trace not displaying final frame
  powerpc/time: Remove get_tbl()
  powerpc/time: Avoid using get_tbl()
  spi: mpc52xx: Avoid using get_tbl()
  powerpc/syscall: Avoid storing 'current' in another pointer
  powerpc/32: Handle bookE debugging in C in syscall entry/exit
  powerpc/syscall: Do not check unsupported scv vector on PPC32
  powerpc/32: Remove the counter in global_dbcr0
  powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry
  powerpc/syscall: implement system call entry/exit logic in C for PPC32
  powerpc/32: Always save non volatile GPRs at syscall entry
  powerpc/syscall: Change condition to check MSR_RI
  powerpc/syscall: Save r3 in regs->orig_r3
  powerpc/syscall: Use is_compat_task()
  powerpc/syscall: Make interrupt.c buildable on PPC32
  ...
2021-02-22 14:34:00 -08:00
David Stevens
4a42d848db KVM: x86/mmu: Consider the hva in mmu_notifier retry
Track the range being invalidated by mmu_notifier and skip page fault
retries if the fault address is not affected by the in-progress
invalidation. Handle concurrent invalidations by finding the minimal
range which includes all ranges being invalidated. Although the combined
range may include unrelated addresses and cannot be shrunk as individual
invalidation operations complete, it is unlikely the marginal gains of
proper range tracking are worth the additional complexity.

The primary benefit of this change is the reduction in the likelihood of
extreme latency when handing a page fault due to another thread having
been preempted while modifying host virtual addresses.

Signed-off-by: David Stevens <stevensd@chromium.org>
Message-Id: <20210222024522.1751719-3-stevensd@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-22 13:16:53 -05:00
Jiapeng Chong
c9df3f809c powerpc/xive: Assign boolean values to a bool variable
Fix the following coccicheck warnings:

./arch/powerpc/kvm/book3s_xive.c:1856:2-17: WARNING: Assignment of 0/1
to bool variable.

./arch/powerpc/kvm/book3s_xive.c:1854:2-17: WARNING: Assignment of 0/1
to bool variable.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1612680192-43116-1-git-send-email-jiapeng.chong@linux.alibaba.com
2021-02-11 23:35:06 +11:00
Nicholas Piggin
72476aaa46 KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
Commit 68ad28a4cd ("KVM: PPC: Book3S HV: Fix radix guest SLB side
channel") incorrectly removed the radix host instruction patch to skip
re-loading the host SLB entries when exiting from a hash
guest. Restore it.

Fixes: 68ad28a4cd ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-11 17:28:15 +11:00
Paul Mackerras
ab950e1acd KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries
Commit 68ad28a4cd ("KVM: PPC: Book3S HV: Fix radix guest SLB side
channel") changed the older guest entry path, with the side effect
that vcpu->arch.slb_max no longer gets cleared for a radix guest.
This means that a HPT guest which loads some SLB entries, switches to
radix mode, runs the guest using the old guest entry path (e.g.,
because the indep_threads_mode module parameter has been set to
false), and then switches back to HPT mode would now see the old SLB
entries being present, whereas previously it would have seen no SLB
entries.

To avoid changing guest-visible behaviour, this adds a store
instruction to clear vcpu->arch.slb_max for a radix guest using the
old guest entry path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-11 17:19:42 +11:00
Fabiano Rosas
a722076e94 KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
These machines don't support running both MMU types at the same time,
so remove the KVM_CAP_PPC_MMU_HASH_V3 capability when the host is
using Radix MMU.

[paulus@ozlabs.org - added defensive check on
 kvmppc_hv_ops->hash_v3_possible]

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 16:19:36 +11:00
Fabiano Rosas
25edcc50d7 KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
The Facility Status and Control Register is a privileged SPR that
defines the availability of some features in problem state. Since it
can be written by the guest, we must restore it to the previous host
value after guest exit.

This restoration is currently done by taking the value from
current->thread.fscr, which in the P9 path is not enough anymore
because the guest could context switch the QEMU thread, causing the
guest-current value to be saved into the thread struct.

The above situation manifested when running a QEMU linked against a
libc with System Call Vectored support, which causes scv
instructions to be run by QEMU early during the guest boot (during
SLOF), at which point the FSCR is 0 due to guest entry. After a few
scv calls (1 to a couple hundred), the context switching happens and
the QEMU thread runs with the guest value, resulting in a Facility
Unavailable interrupt.

This patch saves and restores the host value of FSCR in the inner
guest entry loop in a way independent of current->thread.fscr. The old
way of doing it is still kept in place because it works for the old
entry path.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Yang Li
63e9f23573 KVM: PPC: remove unneeded semicolon
Eliminate the following coccicheck warning:
./arch/powerpc/kvm/booke.c:701:2-3: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Nicholas Piggin
7a7f94a3a9 KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
IH=6 may preserve hypervisor real-mode ERAT entries and is the
recommended SLBIA hint for switching partitions.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Nicholas Piggin
078ebe35fc KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Nicholas Piggin
68ad28a4cd KVM: PPC: Book3S HV: Fix radix guest SLB side channel
The slbmte instruction is legal in radix mode, including radix guest
mode. This means radix guests can load the SLB with arbitrary data.

KVM host does not clear the SLB when exiting a guest if it was a
radix guest, which would allow a rogue radix guest to use the SLB as
a side channel to communicate with other guests.

Fix this by ensuring the SLB is cleared when coming out of a radix
guest. Only the first 4 entries are a concern, because radix guests
always run with LPCR[UPRT]=1, which limits the reach of slbmte. slbia
is not used (except in a non-performance-critical path) because it
can clear cached translations.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Nicholas Piggin
b1b1697ae0 KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
This reverts much of commit c01015091a ("KVM: PPC: Book3S HV: Run HPT
guests on POWER9 radix hosts"), which was required to run HPT guests on
RPT hosts on early POWER9 CPUs without support for "mixed mode", which
meant the host could not run with MMU on while guests were running.

This code has some corner case bugs, e.g., when the guest hits a machine
check or HMI the primary locks up waiting for secondaries to switch LPCR
to host, which they never do. This could all be fixed in software, but
most CPUs in production have mixed mode support, and those that don't
are believed to be all in installations that don't use this capability.
So simplify things and remove support.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Ravi Bangoria
d9a47edabc KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR
Introduce KVM_CAP_PPC_DAWR1 which can be used by QEMU to query whether
KVM supports 2nd DAWR or not. The capability is by default disabled
even when the underlying CPU supports 2nd DAWR. QEMU needs to check
and enable it manually to use the feature.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Ravi Bangoria
bd1de1a0e6 KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR
KVM code assumes single DAWR everywhere. Add code to support 2nd DAWR.
DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/
unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR.
Also, KVM will support 2nd DAWR only if CPU_FTR_DAWR1 is set.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Ravi Bangoria
122954ed7d KVM: PPC: Book3S HV: Rename current DAWR macros and variables
Power10 is introducing a second DAWR (Data Address Watchpoint
Register). Use real register names (with suffix 0) from ISA for
current macros and variables used by kvm.  One exception is
KVM_REG_PPC_DAWR.  Keep it as it is because it's uapi so changing it
will break userspace.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Ravi Bangoria
afe7504930 KVM: PPC: Book3S HV: Allow nested guest creation when L0 hv_guest_state > L1
On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED
hcall to load L2 guest state in cpu. L1 hypervisor prepares the
L2 state in struct hv_guest_state and passes a pointer to it via
hcall. Using that pointer, L0 reads/writes that state directly
from/to L1 memory. Thus L0 must be aware of hv_guest_state layout
of L1. Currently it uses version field to achieve this. i.e. If
L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't
allow nested kvm guest.

This restriction can be loosened up a bit. L0 can be taught to
understand older layout of hv_guest_state, if we restrict the
new members to be added only at the end, i.e. we can allow
nested guest even when L0 hv_guest_state.version > L1
hv_guest_state.version. Though, the other way around is not
possible.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 14:31:08 +11:00
Christophe Leroy
fd659e8f2c powerpc/32s: Change mfsrin() into a static inline function
mfsrin() is a macro.

Change in into an inline function to avoid conflicts in KVM
and make it more evolutive.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/72c7b9879e2e2e6f5c27dadda6486386c2b50f23.1612612022.git.christophe.leroy@csgroup.eu
2021-02-09 01:10:15 +11:00
Nicholas Piggin
3a96570ffc powerpc: convert interrupt handlers to use wrappers
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com
2021-02-09 00:02:12 +11:00
Nicholas Piggin
112665286d KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs
Interrupts that occur in kernel mode expect that context tracking
is set to kernel. Enabling local irqs before context tracking
switches from guest to host means interrupts can come in and trigger
warnings about wrong context, and possibly worse.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-3-npiggin@gmail.com
2021-02-09 00:02:08 +11:00
Christophe Leroy
27f699579b powerpc/kvm: Force selection of CONFIG_PPC_FPU
book3s/32 kvm is designed with the assumption that
an FPU is always present.

Force selection of FPU support in the kernel when
build KVM.

Fixes: 7d68c89169 ("powerpc/32s: Allow deselecting CONFIG_PPC_FPU on mpc832x")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/74461a99fa1466f361532ca794ca0753be3d9f86.1611038044.git.christophe.leroy@csgroup.eu
2021-01-30 11:39:32 +11:00
Cédric Le Goater
d834915e8e KVM: PPC: Book3S HV: Include prototypes
It fixes these W=1 compile errors :

 CC [M]  arch/powerpc/kvm/book3s_64_mmu_hv.o
../arch/powerpc/kvm/book3s_64_mmu_hv.c:879:5: error: no previous prototype for ‘kvm_unmap_hva_range_hv’ [-Werror=missing-prototypes]
  879 | int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned long end)
      |     ^~~~~~~~~~~~~~~~~~~~~~
../arch/powerpc/kvm/book3s_64_mmu_hv.c:888:6: error: no previous prototype for ‘kvmppc_core_flush_memslot_hv’ [-Werror=missing-prototypes]
  888 | void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
../arch/powerpc/kvm/book3s_64_mmu_hv.c:970:5: error: no previous prototype for ‘kvm_age_hva_hv’ [-Werror=missing-prototypes]
  970 | int kvm_age_hva_hv(struct kvm *kvm, unsigned long start, unsigned long end)
      |     ^~~~~~~~~~~~~~
../arch/powerpc/kvm/book3s_64_mmu_hv.c:1011:5: error: no previous prototype for ‘kvm_test_age_hva_hv’ [-Werror=missing-prototypes]
 1011 | int kvm_test_age_hva_hv(struct kvm *kvm, unsigned long hva)
      |     ^~~~~~~~~~~~~~~~~~~
../arch/powerpc/kvm/book3s_64_mmu_hv.c:1019:6: error: no previous prototype for ‘kvm_set_spte_hva_hv’ [-Werror=missing-prototypes]
 1019 | void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte)
      |      ^~~~~~~~~~~~~~~~~~~

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210104143206.695198-20-clg@kaod.org
2021-01-30 11:39:31 +11:00
Cédric Le Goater
9236f57a9e KVM: PPC: Make the VMX instruction emulation routines static
These are only used locally. It fixes these W=1 compile errors :

../arch/powerpc/kvm/powerpc.c:1521:5: error: no previous prototype for ‘kvmppc_get_vmx_dword’ [-Werror=missing-prototypes]
 1521 | int kvmppc_get_vmx_dword(struct kvm_vcpu *vcpu, int index, u64 *val)
      |     ^~~~~~~~~~~~~~~~~~~~
../arch/powerpc/kvm/powerpc.c:1539:5: error: no previous prototype for ‘kvmppc_get_vmx_word’ [-Werror=missing-prototypes]
 1539 | int kvmppc_get_vmx_word(struct kvm_vcpu *vcpu, int index, u64 *val)
      |     ^~~~~~~~~~~~~~~~~~~
../arch/powerpc/kvm/powerpc.c:1557:5: error: no previous prototype for ‘kvmppc_get_vmx_hword’ [-Werror=missing-prototypes]
 1557 | int kvmppc_get_vmx_hword(struct kvm_vcpu *vcpu, int index, u64 *val)
      |     ^~~~~~~~~~~~~~~~~~~~
../arch/powerpc/kvm/powerpc.c:1575:5: error: no previous prototype for ‘kvmppc_get_vmx_byte’ [-Werror=missing-prototypes]
 1575 | int kvmppc_get_vmx_byte(struct kvm_vcpu *vcpu, int index, u64 *val)
      |     ^~~~~~~~~~~~~~~~~~~

Fixes: acc9eb9305 ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction mmio emulation with analyse_instr() input")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210104143206.695198-19-clg@kaod.org
2021-01-30 11:39:31 +11:00
Linus Torvalds
8a5be36b93 powerpc updates for 5.11
- Switch to the generic C VDSO, as well as some cleanups of our VDSO
    setup/handling code.
 
  - Support for KUAP (Kernel User Access Prevention) on systems using the hashed
    page table MMU, using memory protection keys.
 
  - Better handling of PowerVM SMT8 systems where all threads of a core do not
    share an L2, allowing the scheduler to make better scheduling decisions.
 
  - Further improvements to our machine check handling.
 
  - Show registers when unwinding interrupt frames during stack traces.
 
  - Improvements to our pseries (PowerVM) partition migration code.
 
  - Several series from Christophe refactoring and cleaning up various parts of
    the 32-bit code.
 
  - Other smaller features, fixes & cleanups.
 
 Thanks to:
   Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard
   Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater,
   Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David
   Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert
   Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan
   Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh
   Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov,
   Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
   Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior ,
   Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König,
   Vincent Stehlé, Youling Tang, Zhang Xiaoxu.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl/bURITHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEzBEAC1Vwibcog2P9rkJPb0q3UGWSYSx25V
 h/LwwxtM9Tm14j/LZsSgkOgIsfMaWEBIw/8D4efQ7AX9aFo+R0c2DdQMx1MG5MXz
 gZk58+l3LwId6h9+OrwurpEW+ZmURLAtGMSyFdkeiZ3/XTnkbf1XnewC0QWQe56a
 EGLmjx1MFl45jspoy7UIUXsXoNJIfflEKhrgUzSUh8X2eLmvB9ws6A4BXxbVzyZl
 lZv3+uWimU2pFgdkB9jOCxoG4zFEr2o5ovLHG7zCCVo5JoXmTPQ5cMVBraH206ms
 +5vCmu4qI8uP5UlZW/mZfhrtDiMdHdQqqFOaQwVlOmoUbU6L6E6rxm3iVnov2Bbi
 iUgxoeJDxAb2cM2EWFK6oWVgr7+NkwvXM1IG8xtprhHrCdnC9r+psQr/dswb3LSg
 MJ7u/RCq3uixy2kWP8E0NEHw7ngQZ/ZKPqzfnmIWOC7tYUxgaL02I8Ff9/ZXAI2J
 CnmqFYOjrimHkcwXGOtKkXNvfU0DiL97qpK2AQNWElE8+bUUmpw+ltUrsdSycYmv
 Afc4WIcVrTA+a9laSLgjdZbolbNSa3p+cMIYdPrVx9g+xqygbxIKv+EDGNv1WHfD
 GU1gmohMY+ZkMOvFRMi8LAsEm0DH/etWE0py/8uyxDYKnGyD1Ur6452DStkmgGb2
 azmcaOyLdb+HXA==
 =Ga3K
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Switch to the generic C VDSO, as well as some cleanups of our VDSO
   setup/handling code.

 - Support for KUAP (Kernel User Access Prevention) on systems using the
   hashed page table MMU, using memory protection keys.

 - Better handling of PowerVM SMT8 systems where all threads of a core
   do not share an L2, allowing the scheduler to make better scheduling
   decisions.

 - Further improvements to our machine check handling.

 - Show registers when unwinding interrupt frames during stack traces.

 - Improvements to our pseries (PowerVM) partition migration code.

 - Several series from Christophe refactoring and cleaning up various
   parts of the 32-bit code.

 - Other smaller features, fixes & cleanups.

Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh
Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling,
Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King,
Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz,
Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour,
Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu
Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver
O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej
Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe
Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu.

* tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits)
  powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
  powerpc: Add config fragment for disabling -Werror
  powerpc/configs: Add ppc64le_allnoconfig target
  powerpc/powernv: Rate limit opal-elog read failure message
  powerpc/pseries/memhotplug: Quieten some DLPAR operations
  powerpc/ps3: use dma_mapping_error()
  powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
  powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
  powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
  KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp
  KVM: PPC: fix comparison to bool warning
  KVM: PPC: Book3S: Assign boolean values to a bool variable
  powerpc: Inline setup_kup()
  powerpc/64s: Mark the kuap/kuep functions non __init
  KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering
  powerpc/xive: Improve error reporting of OPAL calls
  powerpc/xive: Simplify xive_do_source_eoi()
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW
  powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG
  ...
2020-12-17 13:34:25 -08:00
Leonardo Bras
87fb4978ef KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp
According to ISAv3.1 and ISAv3.0b, the msgsndp is described to split
RB in:
  msgtype <- (RB) 32:36
  payload <- (RB) 37:63
  t       <- (RB) 57:63

The current way of getting 'msgtype', and 't' is missing their MSB:
  msgtype: ((arg >> 27) & 0xf) : Gets (RB) 33:36, missing bit 32
  t:       (arg &= 0x3f)       : Gets (RB) 58:63, missing bit 57

Fixes this by applying the correct mask.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201208215707.31149-1-leobras.c@gmail.com
2020-12-15 22:22:06 +11:00
Kaixu Xia
a300bf8c5f KVM: PPC: fix comparison to bool warning
Fix the following coccicheck warning:

./arch/powerpc/kvm/booke.c:503:6-16: WARNING: Comparison to bool
./arch/powerpc/kvm/booke.c:505:6-17: WARNING: Comparison to bool
./arch/powerpc/kvm/booke.c:507:6-16: WARNING: Comparison to bool

Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1604764178-8087-1-git-send-email-kaixuxia@tencent.com
2020-12-15 22:22:06 +11:00
Kaixu Xia
13751f8747 KVM: PPC: Book3S: Assign boolean values to a bool variable
Fix the following coccinelle warnings:

./arch/powerpc/kvm/book3s_xics.c:476:3-15: WARNING: Assignment of 0/1 to bool variable
./arch/powerpc/kvm/book3s_xics.c:504:3-15: WARNING: Assignment of 0/1 to bool variable

Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1604730382-5810-1-git-send-email-kaixuxia@tencent.com
2020-12-15 22:22:06 +11:00
Cédric Le Goater
dddc4ef92d KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering
When the XIVE resources are allocated at the HW level, the VP
structures describing the vCPUs of a guest are distributed among
the chips to optimize the PowerBUS usage. For best performance, the
guest vCPUs can be pinned to match the VP structure distribution.

Currently, the VP identifiers are deduced from the vCPU id using
the kvmppc_pack_vcpu_id() routine which is not incorrect but not
optimal either. It VSMT is used, the result is not continuous and
the constraints on HW resources described above can not be met.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-14-clg@kaod.org
2020-12-11 09:53:11 +11:00
Cédric Le Goater
cf58b74666 powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW
This flag was used to support the P9 DD1 and we have stopped
supporting this CPU when DD2 came out. See skiboot commit:

  https://github.com/open-power/skiboot/commit/0b0d15e3c170

Also, remove eoi handler which is now unused.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-11-clg@kaod.org
2020-12-11 09:53:10 +11:00
Cédric Le Goater
b5277d18c6 powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW
This flag was used to support the PHB4 LSIs on P9 DD1 and we have
stopped supporting this CPU when DD2 came out. See skiboot commit:

  https://github.com/open-power/skiboot/commit/0b0d15e3c170

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-10-clg@kaod.org
2020-12-11 09:53:10 +11:00
Cédric Le Goater
4cc0e36df2 powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG
This flag was used to support the PHB4 LSIs on P9 DD1 and we have
stopped supporting this CPU when DD2 came out. See skiboot commit:

  https://github.com/open-power/skiboot/commit/0b0d15e3c170

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-9-clg@kaod.org
2020-12-11 09:53:10 +11:00
Cédric Le Goater
4f1c3f7b08 powerpc/xive: Rename XIVE_IRQ_NO_EOI to show its a flag
This is a simple cleanup to identify easily all flags of the XIVE
interrupt structure. The interrupts flagged with XIVE_IRQ_FLAG_NO_EOI
are the escalations used to wake up vCPUs in KVM. They are handled
very differently from the rest.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-3-clg@kaod.org
2020-12-11 09:34:07 +11:00
Cédric Le Goater
9898367500 KVM: PPC: Book3S HV: XIVE: Show detailed configuration in debug output
This is useful to track allocation of the HW resources on per guest
basis. Making sure IPIs are local to the chip of the vCPUs reduces
rerouting between interrupt controllers and gives better performance
in case of pinning. Checking the distribution of VP structures on the
chips also helps in reducing PowerBUS traffic.

[ clg: resurrected show_sources and reworked ouput ]

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-2-clg@kaod.org
2020-12-11 09:34:07 +11:00
Nicholas Piggin
e89a8ca94b powerpc/64s: Remove MSR[ISF] bit
No supported processor implements this mode. Setting the bit in
MSR values can be a bit confusing (and would prevent the bit from
ever being reused). Remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201106045340.1935841-1-npiggin@gmail.com
2020-12-09 23:48:14 +11:00
Christophe Leroy
ff57698a96 powerpc: Fix update form addressing in inline assembly
In several places, inline assembly uses the "%Un" modifier
to enable the use of instruction with update form addressing,
but the associated "<>" constraint is missing.

As mentioned in previous patch, this fails with gcc 4.9, so
"<>" can't be used directly.

Use UPD_CONSTR macro everywhere %Un modifier is used.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/62eab5ca595485c192de1765bdac099f633a21d0.1603358942.git.christophe.leroy@csgroup.eu
2020-12-04 22:13:19 +11:00
Aneesh Kumar K.V
9f378b9f00 KVM: PPC: BOOK3S: PR: Ignore UAMOR SPR
With power7 and above we expect the cpu to support keys. The
number of keys are firmware controlled based on device tree.
PR KVM do not expose key details via device tree. Hence when running with PR KVM
we do run with MMU_FTR_KEY support disabled. But we can still
get updates on UAMOR. Hence ignore access to them and for mfstpr return
0 indicating no AMR/IAMR update is no allowed.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-3-aneesh.kumar@linux.ibm.com
2020-12-04 01:01:24 +11:00
Nicholas Piggin
1d15ffdfc9 KVM: PPC: Book3S HV: Ratelimit machine check messages coming from guests
A number of machine check exceptions are triggerable by the guest.
Ratelimit these to avoid a guest flooding the host console and logs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use dedicated ratelimit state, not printk_ratelimit()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201128070728.825934-5-npiggin@gmail.com
2020-12-04 01:01:23 +11:00
Nicholas Piggin
067c9f9c98 KVM: PPC: Book3S HV: Don't attempt to recover machine checks for FWNMI enabled guests
Guests that can deal with machine checks would actually prefer the
hypervisor not to try recover for them. For example if SLB multi-hits
are recovered by the hypervisor by clearing the SLB then the guest
will not be able to log the contents and debug its programming error.

If guests don't register for FWNMI, they may not be so capable and so
the hypervisor will continue to recover for those.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201128070728.825934-4-npiggin@gmail.com
2020-12-04 01:01:23 +11:00
Greg Kurz
f54db39fbe KVM: PPC: Book3S HV: XIVE: Fix vCPU id sanity check
Commit 062cfab706 ("KVM: PPC: Book3S HV: XIVE: Make VP block size
configurable") updated kvmppc_xive_vcpu_id_valid() in a way that
allows userspace to trigger an assertion in skiboot and crash the host:

[  696.186248988,3] XIVE[ IC 08  ] eq_blk != vp_blk (0 vs. 1) for target 0x4300008c/0
[  696.186314757,0] Assert fail: hw/xive.c:2370:0
[  696.186342458,0] Aborting!
xive-kvCPU 0043 Backtrace:
 S: 0000000031e2b8f0 R: 0000000030013840   .backtrace+0x48
 S: 0000000031e2b990 R: 000000003001b2d0   ._abort+0x4c
 S: 0000000031e2ba10 R: 000000003001b34c   .assert_fail+0x34
 S: 0000000031e2ba90 R: 0000000030058984   .xive_eq_for_target.part.20+0xb0
 S: 0000000031e2bb40 R: 0000000030059fdc   .xive_setup_silent_gather+0x2c
 S: 0000000031e2bc20 R: 000000003005a334   .opal_xive_set_vp_info+0x124
 S: 0000000031e2bd20 R: 00000000300051a4   opal_entry+0x134
 --- OPAL call token: 0x8a caller R1: 0xc000001f28563850 ---

XIVE maintains the interrupt context state of non-dispatched vCPUs in
an internal VP structure. We allocate a bunch of those on startup to
accommodate all possible vCPUs. Each VP has an id, that we derive from
the vCPU id for efficiency:

static inline u32 kvmppc_xive_vp(struct kvmppc_xive *xive, u32 server)
{
	return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server);
}

The KVM XIVE device used to allocate KVM_MAX_VCPUS VPs. This was
limitting the number of concurrent VMs because the VP space is
limited on the HW. Since most of the time, VMs run with a lot less
vCPUs, commit 062cfab706 ("KVM: PPC: Book3S HV: XIVE: Make VP
block size configurable") gave the possibility for userspace to
tune the size of the VP block through the KVM_DEV_XIVE_NR_SERVERS
attribute.

The check in kvmppc_pack_vcpu_id() was changed from

	cpu < KVM_MAX_VCPUS * xive->kvm->arch.emul_smt_mode

to

	cpu < xive->nr_servers * xive->kvm->arch.emul_smt_mode

The previous check was based on the fact that the VP block had
KVM_MAX_VCPUS entries and that kvmppc_pack_vcpu_id() guarantees
that packed vCPU ids are below KVM_MAX_VCPUS. We've changed the
size of the VP block, but kvmppc_pack_vcpu_id() has nothing to
do with it and it certainly doesn't ensure that the packed vCPU
ids are below xive->nr_servers. kvmppc_xive_vcpu_id_valid() might
thus return true when the VM was configured with a non-standard
VSMT mode, even if the packed vCPU id is higher than what we
expect. We end up using an unallocated VP id, which confuses
OPAL. The assert in OPAL is probably abusive and should be
converted to a regular error that the kernel can handle, but
we shouldn't really use broken VP ids in the first place.

Fix kvmppc_xive_vcpu_id_valid() so that it checks the packed
vCPU id is below xive->nr_servers, which is explicitly what we
want.

Fixes: 062cfab706 ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/160673876747.695514.1809676603724514920.stgit@bahia.lan
2020-12-01 21:45:51 +11:00
Michael Ellerman
20fa40b147 Merge branch 'fixes' into next
Merge our fixes branch, in particular to bring in the changes for the
entry/uaccess flush.
2020-11-25 23:17:31 +11:00
Christophe Leroy
120c0518ec powerpc: Replace RFI by rfi on book3s/32 and booke
For book3s/32 and for booke, RFI is just an rfi.
Only 40x has a non trivial RFI.
CONFIG_PPC_RTAS is never selected by 40x platforms.

Make it more explicit by replacing RFI by rfi wherever possible.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b901ddfdeb8a0a3b7cb59999599cdfde1bbfe834.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19 16:56:54 +11:00
Aneesh Kumar K.V
e80639405c powerpc/mm: Update tlbiel loop on POWER10
With POWER10, single tlbiel instruction invalidates all the congruence
class of the TLB and hence we need to issue only one tlbiel with SET=0.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201007053305.232879-1-aneesh.kumar@linux.ibm.com
2020-11-19 14:50:15 +11:00
Cédric Le Goater
75b4962026 KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page
When accessing the ESB page of a source interrupt, the fault handler
will retrieve the page address from the XIVE interrupt 'xive_irq_data'
structure. If the associated KVM XIVE interrupt is not valid, that is
not allocated at the HW level for some reason, the fault handler will
dereference a NULL pointer leading to the oops below :

  WARNING: CPU: 40 PID: 59101 at arch/powerpc/kvm/book3s_xive_native.c:259 xive_native_esb_fault+0xe4/0x240 [kvm]
  CPU: 40 PID: 59101 Comm: qemu-system-ppc Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-240.el8.ppc64le #1
  NIP:  c00800000e949fac LR: c00000000044b164 CTR: c00800000e949ec8
  REGS: c000001f69617840 TRAP: 0700   Tainted: G        W        --------- -  -  (4.18.0-240.el8.ppc64le)
  MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44044282  XER: 00000000
  CFAR: c00000000044b160 IRQMASK: 0
  GPR00: c00000000044b164 c000001f69617ac0 c00800000e96e000 c000001f69617c10
  GPR04: 05faa2b21e000080 0000000000000000 0000000000000005 ffffffffffffffff
  GPR08: 0000000000000000 0000000000000001 0000000000000000 0000000000000001
  GPR12: c00800000e949ec8 c000001ffffd3400 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c000001f5c065160 c000000001c76f90
  GPR24: c000001f06f20000 c000001f5c065100 0000000000000008 c000001f0eb98c78
  GPR28: c000001dcab40000 c000001dcab403d8 c000001f69617c10 0000000000000011
  NIP [c00800000e949fac] xive_native_esb_fault+0xe4/0x240 [kvm]
  LR [c00000000044b164] __do_fault+0x64/0x220
  Call Trace:
  [c000001f69617ac0] [0000000137a5dc20] 0x137a5dc20 (unreliable)
  [c000001f69617b50] [c00000000044b164] __do_fault+0x64/0x220
  [c000001f69617b90] [c000000000453838] do_fault+0x218/0x930
  [c000001f69617bf0] [c000000000456f50] __handle_mm_fault+0x350/0xdf0
  [c000001f69617cd0] [c000000000457b1c] handle_mm_fault+0x12c/0x310
  [c000001f69617d10] [c00000000007ef44] __do_page_fault+0x264/0xbb0
  [c000001f69617df0] [c00000000007f8c8] do_page_fault+0x38/0xd0
  [c000001f69617e30] [c00000000000a714] handle_page_fault+0x18/0x38
  Instruction dump:
  40c2fff0 7c2004ac 2fa90000 409e0118 73e90001 41820080 e8bd0008 7c2004ac
  7ca90074 39400000 915c0000 7929d182 <0b090000> 2fa50000 419e0080 e89e0018
  ---[ end trace 66c6ff034c53f64f ]---
  xive-kvm: xive_native_esb_fault: accessing invalid ESB page for source 8 !

Fix that by checking the validity of the KVM XIVE interrupt structure.

Fixes: 6520ca64cd ("KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pages")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201105134713.656160-1-clg@kaod.org
2020-11-16 23:28:30 +11:00