1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

1038 commits

Author SHA1 Message Date
Sean Christopherson
80d0f521d5 KVM: SVM: Require nrips support for SEV guests (and beyond)
Disallow SEV (and beyond) if nrips is disabled via module param, as KVM
can't read guest memory to partially emulate and skip an instruction.  All
CPUs that support SEV support NRIPS, i.e. this is purely stopping the user
from shooting themselves in the foot.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230825013621.2845700-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-25 09:00:40 -07:00
Sean Christopherson
cb49631ad1 KVM: SVM: Don't inject #UD if KVM attempts to skip SEV guest insn
Don't inject a #UD if KVM attempts to "emulate" to skip an instruction
for an SEV guest, and instead resume the guest and hope that it can make
forward progress.  When commit 04c40f344d ("KVM: SVM: Inject #UD on
attempted emulation for SEV guest w/o insn buffer") added the completely
arbitrary #UD behavior, there were no known scenarios where a well-behaved
guest would induce a VM-Exit that triggered emulation, i.e. it was thought
that injecting #UD would be helpful.

However, now that KVM (correctly) attempts to re-inject INT3/INTO, e.g. if
a #NPF is encountered when attempting to deliver the INT3/INTO, an SEV
guest can trigger emulation without a buffer, through no fault of its own.
Resuming the guest and retrying the INT3/INTO is architecturally wrong,
e.g. the vCPU will incorrectly re-hit code #DBs, but for SEV guests there
is literally no other option that has a chance of making forward progress.

Drop the #UD injection for all "skip" emulation, not just those related to
INT3/INTO, even though that means that the guest will likely end up in an
infinite loop instead of getting a #UD (the vCPU may also crash, e.g. if
KVM emulated everything about an instruction except for advancing RIP).
There's no evidence that suggests that an unexpected #UD is actually
better than hanging the vCPU, e.g. a soft-hung vCPU can still respond to
IRQs and NMIs to generate a backtrace.

Reported-by: Wu Zongyo <wuzongyo@mail.ustc.edu.cn>
Closes: https://lore.kernel.org/all/8eb933fd-2cf3-d7a9-32fe-2a1d82eac42a@mail.ustc.edu.cn
Fixes: 6ef88d6e36 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
Cc: stable@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230825013621.2845700-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-25 09:00:40 -07:00
Sean Christopherson
1952e74da9 KVM: SVM: Skip VMSA init in sev_es_init_vmcb() if pointer is NULL
Skip initializing the VMSA physical address in the VMCB if the VMSA is
NULL, which occurs during intrahost migration as KVM initializes the VMCB
before copying over state from the source to the destination (including
the VMSA and its physical address).

In normal builds, __pa() is just math, so the bug isn't fatal, but with
CONFIG_DEBUG_VIRTUAL=y, the validity of the virtual address is verified
and passing in NULL will make the kernel unhappy.

Fixes: 6defa24d3b ("KVM: SEV: Init target VMCBs in sev_migrate_from")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/r/20230825022357.2852133-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-25 09:00:16 -07:00
Sean Christopherson
f1187ef24e KVM: SVM: Get source vCPUs from source VM for SEV-ES intrahost migration
Fix a goof where KVM tries to grab source vCPUs from the destination VM
when doing intrahost migration.  Grabbing the wrong vCPU not only hoses
the guest, it also crashes the host due to the VMSA pointer being left
NULL.

  BUG: unable to handle page fault for address: ffffe38687000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP NOPTI
  CPU: 39 PID: 17143 Comm: sev_migrate_tes Tainted: GO       6.5.0-smp--fff2e47e6c3b-next #151
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.28.0 07/10/2023
  RIP: 0010:__free_pages+0x15/0xd0
  RSP: 0018:ffff923fcf6e3c78 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffffe38687000000 RCX: 0000000000000100
  RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffe38687000000
  RBP: ffff923fcf6e3c88 R08: ffff923fcafb0000 R09: 0000000000000000
  R10: 0000000000000000 R11: ffffffff83619b90 R12: ffff923fa9540000
  R13: 0000000000080007 R14: ffff923f6d35d000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff929d0d7c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffe38687000000 CR3: 0000005224c34005 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   sev_free_vcpu+0xcb/0x110 [kvm_amd]
   svm_vcpu_free+0x75/0xf0 [kvm_amd]
   kvm_arch_vcpu_destroy+0x36/0x140 [kvm]
   kvm_destroy_vcpus+0x67/0x100 [kvm]
   kvm_arch_destroy_vm+0x161/0x1d0 [kvm]
   kvm_put_kvm+0x276/0x560 [kvm]
   kvm_vm_release+0x25/0x30 [kvm]
   __fput+0x106/0x280
   ____fput+0x12/0x20
   task_work_run+0x86/0xb0
   do_exit+0x2e3/0x9c0
   do_group_exit+0xb1/0xc0
   __x64_sys_exit_group+0x1b/0x20
   do_syscall_64+0x41/0x90
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
   </TASK>
  CR2: ffffe38687000000

Fixes: 6defa24d3b ("KVM: SEV: Init target VMCBs in sev_migrate_from")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/r/20230825022357.2852133-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-25 09:00:16 -07:00
Sean Christopherson
ee785c870d KVM: nSVM: Use KVM-governed feature framework to track "vNMI enabled"
Track "virtual NMI exposed to L1" via a governed feature flag instead of
using a dedicated bit/flag in vcpu_svm.

Note, checking KVM's capabilities instead of the "vnmi" param means that
the code isn't strictly equivalent, as vnmi_enabled could have been set
if nested=false where as that the governed feature cannot.  But that's a
glorified nop as the feature/flag is consumed only by paths that are
gated by nSVM being enabled.

Link: https://lore.kernel.org/r/20230815203653.519297-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:31 -07:00
Sean Christopherson
b89456aee7 KVM: nSVM: Use KVM-governed feature framework to track "vGIF enabled"
Track "virtual GIF exposed to L1" via a governed feature flag instead of
using a dedicated bit/flag in vcpu_svm.

Note, checking KVM's capabilities instead of the "vgif" param means that
the code isn't strictly equivalent, as vgif_enabled could have been set
if nested=false where as that the governed feature cannot.  But that's a
glorified nop as the feature/flag is consumed only by paths that are

Link: https://lore.kernel.org/r/20230815203653.519297-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:31 -07:00
Sean Christopherson
59d67fc1f0 KVM: nSVM: Use KVM-governed feature framework to track "Pause Filter enabled"
Track "Pause Filtering is exposed to L1" via governed feature flags
instead of using dedicated bits/flags in vcpu_svm.

No functional change intended.

Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20230815203653.519297-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:30 -07:00
Sean Christopherson
e183d17ac3 KVM: nSVM: Use KVM-governed feature framework to track "LBRv enabled"
Track "LBR virtualization exposed to L1" via a governed feature flag
instead of using a dedicated bit/flag in vcpu_svm.

Note, checking KVM's capabilities instead of the "lbrv" param means that
the code isn't strictly equivalent, as lbrv_enabled could have been set
if nested=false where as that the governed feature cannot.  But that's a
glorified nop as the feature/flag is consumed only by paths that are
gated by nSVM being enabled.

Link: https://lore.kernel.org/r/20230815203653.519297-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:30 -07:00
Sean Christopherson
4d2a1560ff KVM: nSVM: Use KVM-governed feature framework to track "vVM{SAVE,LOAD} enabled"
Track "virtual VMSAVE/VMLOAD exposed to L1" via a governed feature flag
instead of using a dedicated bit/flag in vcpu_svm.

Opportunistically add a comment explaining why KVM disallows virtual
VMLOAD/VMSAVE when the vCPU model is Intel.

No functional change intended.

Link: https://lore.kernel.org/r/20230815203653.519297-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:29 -07:00
Sean Christopherson
4365a45571 KVM: nSVM: Use KVM-governed feature framework to track "TSC scaling enabled"
Track "TSC scaling exposed to L1" via a governed feature flag instead of
using a dedicated bit/flag in vcpu_svm.

Note, this fixes a benign bug where KVM would mark TSC scaling as exposed
to L1 even if overall nested SVM supported is disabled, i.e. KVM would let
L1 write MSR_AMD64_TSC_RATIO even when KVM didn't advertise TSCRATEMSR
support to userspace.

Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20230815203653.519297-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:28 -07:00
Sean Christopherson
7a6a6a3bf5 KVM: nSVM: Use KVM-governed feature framework to track "NRIPS enabled"
Track "NRIPS exposed to L1" via a governed feature flag instead of using
a dedicated bit/flag in vcpu_svm.

No functional change intended.

Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20230815203653.519297-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:43:28 -07:00
Sean Christopherson
fe60e8f65f KVM: x86: Use KVM-governed feature framework to track "XSAVES enabled"
Use the governed feature framework to track if XSAVES is "enabled", i.e.
if XSAVES can be used by the guest.  Add a comment in the SVM code to
explain the very unintuitive logic of deliberately NOT checking if XSAVES
is enumerated in the guest CPUID model.

No functional change intended.

Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20230815203653.519297-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:38:28 -07:00
Sean Christopherson
f3cebc75e7 KVM: SVM: Set target pCPU during IRTE update if target vCPU is running
Update the target pCPU for IOMMU doorbells when updating IRTE routing if
KVM is actively running the associated vCPU.  KVM currently only updates
the pCPU when loading the vCPU (via avic_vcpu_load()), and so doorbell
events will be delayed until the vCPU goes through a put+load cycle (which
might very well "never" happen for the lifetime of the VM).

To avoid inserting a stale pCPU, e.g. due to racing between updating IRTE
routing and vCPU load/put, get the pCPU information from the vCPU's
Physical APIC ID table entry (a.k.a. avic_physical_id_cache in KVM) and
update the IRTE while holding ir_list_lock.  Add comments with --verbose
enabled to explain exactly what is and isn't protected by ir_list_lock.

Fixes: 411b44ba80 ("svm: Implements update_pi_irte hook to setup posted interrupt")
Reported-by: dengqiao.joey <dengqiao.joey@bytedance.com>
Cc: stable@vger.kernel.org
Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20230808233132.2499764-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:32:07 -07:00
Sean Christopherson
4c08e737f0 KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry
Hoist the acquisition of ir_list_lock from avic_update_iommu_vcpu_affinity()
to its two callers, avic_vcpu_load() and avic_vcpu_put(), specifically to
encapsulate the write to the vCPU's entry in the AVIC Physical ID table.
This will allow a future fix to pull information from the Physical ID entry
when updating the IRTE, without potentially consuming stale information,
i.e. without racing with the vCPU being (un)loaded.

Add a comment to call out that ir_list_lock does NOT protect against
multiple writers, specifically that reading the Physical ID entry in
avic_vcpu_put() outside of the lock is safe.

To preserve some semblance of independence from ir_list_lock, keep the
READ_ONCE() in avic_vcpu_load() even though acuiring the spinlock
effectively ensures the load(s) will be generated after acquiring the
lock.

Cc: stable@vger.kernel.org
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20230808233132.2499764-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-17 11:31:37 -07:00
Peter Zijlstra
864bcaa38e x86/cpu/kvm: Provide UNTRAIN_RET_VM
Similar to how it doesn't make sense to have UNTRAIN_RET have two
untrain calls, it also doesn't make sense for VMEXIT to have an extra
IBPB call.

This cures VMEXIT doing potentially unret+IBPB or double IBPB.
Also, the (SEV) VMEXIT case seems to have been overlooked.

Redefine the meaning of the synthetic IBPB flags to:

 - ENTRY_IBPB     -- issue IBPB on entry  (was: entry + VMEXIT)
 - IBPB_ON_VMEXIT -- issue IBPB on VMEXIT

And have 'retbleed=ibpb' set *BOTH* feature flags to ensure it retains
the previous behaviour and issues IBPB on entry+VMEXIT.

The new 'srso=ibpb_vmexit' option only sets IBPB_ON_VMEXIT.

Create UNTRAIN_RET_VM specifically for the VMEXIT case, and have that
check IBPB_ON_VMEXIT.

All this avoids having the VMEXIT case having to check both ENTRY_IBPB
and IBPB_ON_VMEXIT and simplifies the alternatives.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121149.109557833@infradead.org
2023-08-16 21:58:59 +02:00
Borislav Petkov (AMD)
f58d6fbcb7 x86/CPU/AMD: Fix the DIV(0) initial fix attempt
Initially, it was thought that doing an innocuous division in the #DE
handler would take care to prevent any leaking of old data from the
divider but by the time the fault is raised, the speculation has already
advanced too far and such data could already have been used by younger
operations.

Therefore, do the innocuous division on every exit to userspace so that
userspace doesn't see any potentially old data from integer divisions in
kernel space.

Do the same before VMRUN too, to protect host data from leaking into the
guest too.

Fixes: 77245f1c3c ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230811213824.10025-1-bp@alien8.de
2023-08-14 11:02:50 +02:00
Linus Torvalds
138bcddb86 Add a mitigation for the speculative RAS (Return Address Stack) overflow
vulnerability on AMD processors. In short, this is yet another issue
 where userspace poisons a microarchitectural structure which can then be
 used to leak privileged information through a side channel.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmTQs1gACgkQEsHwGGHe
 VUo1UA/8C34PwJveZDcerdkaxSF+WKx7AjOI/L2ws1qn9YVFA3ItFMgVuFTrlY6c
 1eYKYB3FS9fVN3KzGOXGyhho6seHqfY0+8cyYupR+PVLn9rSy7GqHaIMr37FdQ2z
 yb9xu26v+gsvuPEApazS6MxijYS98u71rHhmg97qsHCnUiMJ01+TaGucntukNJv8
 FfwjZJvgeUiBPQ/6IeA/O0413tPPJ9weawPyW+sV1w7NlXjaUVkNXwiq/Xxbt9uI
 sWwMBjFHpSnhBRaDK8W5Blee/ZfsS6qhJ4jyEKUlGtsElMnZLPHbnrbpxxqA9gyE
 K+3ZhoHf/W1hhvcZcALNoUHLx0CvVekn0o41urAhPfUutLIiwLQWVbApmuW80fgC
 DhPedEFu7Wp6Okj5+Bqi/XOsOOWN2WRDSzdAq10o1C+e+fzmkr6y4E6gskfz1zXU
 ssD9S4+uAJ5bccS5lck4zLffsaA03nAYTlvl1KRP4pOz5G9ln6eyO20ar1WwfGAV
 o5ZsTJVGQMyVA49QFkksj+kOI3chkmDswPYyGn2y8OfqYXU4Ip4eN+VkjorIAo10
 zIec3Z0bCGZ9UUMylUmdtH3KAm8q0wVNoFrUkMEmO8j6nn7ew2BhwLMn4uu+nOnw
 lX2AG6PNhRLVDVaNgDsWMwejaDsitQPoWRuCIAZ0kQhbeYuwfpM=
 =73JY
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/srso fixes from Borislav Petkov:
 "Add a mitigation for the speculative RAS (Return Address Stack)
  overflow vulnerability on AMD processors.

  In short, this is yet another issue where userspace poisons a
  microarchitectural structure which can then be used to leak privileged
  information through a side channel"

* tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/srso: Tie SBPB bit setting to microcode patch detection
  x86/srso: Add a forgotten NOENDBR annotation
  x86/srso: Fix return thunks in generated code
  x86/srso: Add IBPB on VMEXIT
  x86/srso: Add IBPB
  x86/srso: Add SRSO_NO support
  x86/srso: Add IBPB_BRTYPE support
  x86/srso: Add a Speculative RAS Overflow mitigation
  x86/bugs: Increase the x86 bugs vector size to two u32s
2023-08-07 16:35:44 -07:00
Paolo Bonzini
63dbc67cf4 KVM: SEV: remove ghcb variable declarations
To avoid possible time-of-check/time-of-use issues, the GHCB should
almost never be accessed outside dump_ghcb, sev_es_sync_to_ghcb
and sev_es_sync_from_ghcb.  The only legitimate uses are to set the
exitinfo fields and to find the address of the scratch area embedded
in the ghcb.  Accessing ghcb_usage also goes through svm->sev_es.ghcb
in sev_es_validate_vmgexit(), but that is because anyway the value is
not used.

Removing a shortcut variable that contains the value of svm->sev_es.ghcb
makes these cases a bit more verbose, but it limits the chance of someone
reading the ghcb by mistake.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-04 13:33:07 -04:00
Paolo Bonzini
7588dbcebc KVM: SEV: only access GHCB fields once
A KVM guest using SEV-ES or SEV-SNP with multiple vCPUs can trigger
a double fetch race condition vulnerability and invoke the VMGEXIT
handler recursively.

sev_handle_vmgexit() maps the GHCB page using kvm_vcpu_map() and then
fetches the exit code using ghcb_get_sw_exit_code().  Soon after,
sev_es_validate_vmgexit() fetches the exit code again. Since the GHCB
page is shared with the guest, the guest is able to quickly swap the
values with another vCPU and hence bypass the validation. One vmexit code
that can be rejected by sev_es_validate_vmgexit() is SVM_EXIT_VMGEXIT;
if sev_handle_vmgexit() observes it in the second fetch, the call
to svm_invoke_exit_handler() will invoke sev_handle_vmgexit() again
recursively.

To avoid the race, always fetch the GHCB data from the places where
sev_es_sync_from_ghcb stores it.

Exploiting recursions on linux kernel has been proven feasible
in the past, but the impact is mitigated by stack guard pages
(CONFIG_VMAP_STACK).  Still, if an attacker manages to call the handler
multiple times, they can theoretically trigger a stack overflow and
cause a denial-of-service, or potentially guest-to-host escape in kernel
configurations without stack guard pages.

Note that winning the race reliably in every iteration is very tricky
due to the very tight window of the fetches; depending on the compiler
settings, they are often consecutive because of optimization and inlining.

Tested by booting an SEV-ES RHEL9 guest.

Fixes: CVE-2023-4155
Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable@vger.kernel.org
Reported-by: Andy Nguyen <theflow@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-04 13:33:06 -04:00
Paolo Bonzini
4e15a0ddc3 KVM: SEV: snapshot the GHCB before accessing it
Validation of the GHCB is susceptible to time-of-check/time-of-use vulnerabilities.
To avoid them, we would like to always snapshot the fields that are read in
sev_es_validate_vmgexit(), and not use the GHCB anymore after it returns.

This means:

- invoking sev_es_sync_from_ghcb() before any GHCB access, including before
  sev_es_validate_vmgexit()

- snapshotting all fields including the valid bitmap and the sw_scratch field,
  which are currently not caching anywhere.

The valid bitmap is the first thing to be copied out of the GHCB; then,
further accesses will use the copy in svm->sev_es.

Fixes: 291bd20d5d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-04 13:33:06 -04:00
Sean Christopherson
223f93d4d8 KVM: nSVM: Skip writes to MSR_AMD64_TSC_RATIO if guest state isn't loaded
Skip writes to MSR_AMD64_TSC_RATIO that are done in the context of a vCPU
if guest state isn't loaded, i.e. if KVM will update MSR_AMD64_TSC_RATIO
during svm_prepare_switch_to_guest() before entering the guest.  Checking
guest_state_loaded may or may not be a net positive for performance as
the current_tsc_ratio cache will optimize away duplicate WRMSRs in the
vast majority of scenarios.  However, the cost of the check is negligible,
and the real motivation is to document that KVM needs to load the vCPU's
value only when running the vCPU.

Link: https://lore.kernel.org/r/20230729011608.1065019-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 17:16:30 -07:00
Sean Christopherson
2d63699099 KVM: x86: Always write vCPU's current TSC offset/ratio in vendor hooks
Drop the @offset and @multiplier params from the kvm_x86_ops hooks for
propagating TSC offsets/multipliers into hardware, and instead have the
vendor implementations pull the information directly from the vCPU
structure.  The respective vCPU fields _must_ be written at the same
time in order to maintain consistent state, i.e. it's not random luck
that the value passed in by all callers is grabbed from the vCPU.

Explicitly grabbing the value from the vCPU field in SVM's implementation
in particular will allow for additional cleanup without introducing even
more subtle dependencies.  Specifically, SVM can skip the WRMSR if guest
state isn't loaded, i.e. svm_prepare_switch_to_guest() will load the
correct value for the vCPU prior to entering the guest.

This also reconciles KVM's handling of related values that are stored in
the vCPU, as svm_write_tsc_offset() already assumes/requires the caller
to have updated l1_tsc_offset.

Link: https://lore.kernel.org/r/20230729011608.1065019-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 17:16:29 -07:00
Sean Christopherson
229725acfa KVM: SVM: Clean up preemption toggling related to MSR_AMD64_TSC_RATIO
Explicitly disable preemption when writing MSR_AMD64_TSC_RATIO only in the
"outer" helper, as all direct callers of the "inner" helper now run with
preemption already disabled.  And that isn't a coincidence, as the outer
helper requires a vCPU and is intended to be used when modifying guest
state and/or emulating guest instructions, which are typically done with
preemption enabled.

Direct use of the inner helper should be extremely limited, as the only
time KVM should modify MSR_AMD64_TSC_RATIO without a vCPU is when
sanitizing the MSR for a specific pCPU (currently done when {en,dis}abling
disabling SVM). The other direct caller is svm_prepare_switch_to_guest(),
which does have a vCPU, but is a one-off special case: KVM is about to
enter the guest on a specific pCPU and thus must have preemption disabled.

Link: https://lore.kernel.org/r/20230729011608.1065019-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 17:16:29 -07:00
Sean Christopherson
c0dc39bd2c KVM: nSVM: Use the "outer" helper for writing multiplier to MSR_AMD64_TSC_RATIO
When emulating nested SVM transitions, use the outer helper for writing
the TSC multiplier for L2.  Using the inner helper only for one-off cases,
i.e. for paths where KVM is NOT emulating or modifying vCPU state, will
allow for multiple cleanups:

 - Explicitly disabling preemption only in the outer helper
 - Getting the multiplier from the vCPU field in the outer helper
 - Skipping the WRMSR in the outer helper if guest state isn't loaded

Opportunistically delete an extra newline.

No functional change intended.

Link: https://lore.kernel.org/r/20230729011608.1065019-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 17:16:28 -07:00
Sean Christopherson
0c94e24684 KVM: nSVM: Load L1's TSC multiplier based on L1 state, not L2 state
When emulating nested VM-Exit, load L1's TSC multiplier if L1's desired
ratio doesn't match the current ratio, not if the ratio L1 is using for
L2 diverges from the default.  Functionally, the end result is the same
as KVM will run L2 with L1's multiplier if L2's multiplier is the default,
i.e. checking that L1's multiplier is loaded is equivalent to checking if
L2 has a non-default multiplier.

However, the assertion that TSC scaling is exposed to L1 is flawed, as
userspace can trigger the WARN at will by writing the MSR and then
updating guest CPUID to hide the feature (modifying guest CPUID is
allowed anytime before KVM_RUN).  E.g. hacking KVM's state_test
selftest to do

                vcpu_set_msr(vcpu, MSR_AMD64_TSC_RATIO, 0);
                vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_TSCRATEMSR);

after restoring state in a new VM+vCPU yields an endless supply of:

  ------------[ cut here ]------------
  WARNING: CPU: 10 PID: 206939 at arch/x86/kvm/svm/nested.c:1105
           nested_svm_vmexit+0x6af/0x720 [kvm_amd]
  Call Trace:
   nested_svm_exit_handled+0x102/0x1f0 [kvm_amd]
   svm_handle_exit+0xb9/0x180 [kvm_amd]
   kvm_arch_vcpu_ioctl_run+0x1eab/0x2570 [kvm]
   kvm_vcpu_ioctl+0x4c9/0x5b0 [kvm]
   ? trace_hardirqs_off+0x4d/0xa0
   __se_sys_ioctl+0x7a/0xc0
   __x64_sys_ioctl+0x21/0x30
   do_syscall_64+0x41/0x90
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

Unlike the nested VMRUN path, hoisting the svm->tsc_scaling_enabled check
into the if-statement is wrong as KVM needs to ensure L1's multiplier is
loaded in the above scenario.   Alternatively, the WARN_ON() could simply
be deleted, but that would make KVM's behavior even more subtle, e.g. it's
not immediately obvious why it's safe to write MSR_AMD64_TSC_RATIO when
checking only tsc_ratio_msr.

Fixes: 5228eb96a4 ("KVM: x86: nSVM: implement nested TSC scaling")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230729011608.1065019-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 17:16:27 -07:00
Sean Christopherson
7cafe9b8e2 KVM: nSVM: Check instead of asserting on nested TSC scaling support
Check for nested TSC scaling support on nested SVM VMRUN instead of
asserting that TSC scaling is exposed to L1 if L1's MSR_AMD64_TSC_RATIO
has diverged from KVM's default.  Userspace can trigger the WARN at will
by writing the MSR and then updating guest CPUID to hide the feature
(modifying guest CPUID is allowed anytime before KVM_RUN).  E.g. hacking
KVM's state_test selftest to do

		vcpu_set_msr(vcpu, MSR_AMD64_TSC_RATIO, 0);
		vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_TSCRATEMSR);

after restoring state in a new VM+vCPU yields an endless supply of:

  ------------[ cut here ]------------
  WARNING: CPU: 164 PID: 62565 at arch/x86/kvm/svm/nested.c:699
           nested_vmcb02_prepare_control+0x3d6/0x3f0 [kvm_amd]
  Call Trace:
   <TASK>
   enter_svm_guest_mode+0x114/0x560 [kvm_amd]
   nested_svm_vmrun+0x260/0x330 [kvm_amd]
   vmrun_interception+0x29/0x30 [kvm_amd]
   svm_invoke_exit_handler+0x35/0x100 [kvm_amd]
   svm_handle_exit+0xe7/0x180 [kvm_amd]
   kvm_arch_vcpu_ioctl_run+0x1eab/0x2570 [kvm]
   kvm_vcpu_ioctl+0x4c9/0x5b0 [kvm]
   __se_sys_ioctl+0x7a/0xc0
   __x64_sys_ioctl+0x21/0x30
   do_syscall_64+0x41/0x90
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
  RIP: 0033:0x45ca1b

Note, the nested #VMEXIT path has the same flaw, but needs a different
fix and will be handled separately.

Fixes: 5228eb96a4 ("KVM: x86: nSVM: implement nested TSC scaling")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230729011608.1065019-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 17:16:27 -07:00
Sean Christopherson
2e6b9bd49b KVM: SVM: Use "standard" stgi() helper when disabling SVM
Now that kvm_rebooting is guaranteed to be true prior to disabling SVM
in an emergency, use the existing stgi() helper instead of open coding
STGI.  In effect, eat faults on STGI if and only if kvm_rebooting==true.

Link: https://lore.kernel.org/r/20230721201859.2307736-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:15 -07:00
Sean Christopherson
6ae44e012f KVM: x86: Force kvm_rebooting=true during emergency reboot/crash
Set kvm_rebooting when virtualization is disabled in an emergency so that
KVM eats faults on virtualization instructions even if kvm_reboot() isn't
reached.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230721201859.2307736-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:15 -07:00
Sean Christopherson
76ab816108 x86/virt: KVM: Move "disable SVM" helper into KVM SVM
Move cpu_svm_disable() into KVM proper now that all hardware
virtualization management is routed through KVM.  Remove the now-empty
virtext.h.

No functional change intended.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230721201859.2307736-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:15 -07:00
Sean Christopherson
c4db4f20f3 KVM: SVM: Check that the current CPU supports SVM in kvm_is_svm_supported()
Check "this" CPU instead of the boot CPU when querying SVM support so that
the per-CPU checks done during hardware enabling actually function as
intended, i.e. will detect issues where SVM isn't support on all CPUs.

Disable migration for the use from svm_init() mostly so that the standard
accessors for the per-CPU data can be used without getting yelled at by
CONFIG_DEBUG_PREEMPT=y sanity checks.  Preventing the "disabled by BIOS"
error message from reporting the wrong CPU is largely a bonus, as ensuring
a stable CPU during module load is a non-goal for KVM.

Link: https://lore.kernel.org/all/ZAdxNgv0M6P63odE@google.com
Cc: Kai Huang <kai.huang@intel.com>
Cc: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230721201859.2307736-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:15 -07:00
Sean Christopherson
85fd29dd5f x86/virt: KVM: Open code cpu_has_svm() into kvm_is_svm_supported()
Fold the guts of cpu_has_svm() into kvm_is_svm_supported(), its sole
remaining user.

No functional change intended.

Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230721201859.2307736-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:15 -07:00
Sean Christopherson
baeb4de7ad x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callback
Use the virt callback to disable SVM (and set GIF=1) during an emergency
instead of blindly attempting to disable SVM.  Like the VMX case, if a
hypervisor, i.e. KVM, isn't loaded/active, SVM can't be in use.

Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20230721201859.2307736-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03 15:37:14 -07:00
Sean Christopherson
a85cd52d72 KVM: SVM: Use svm_get_lbr_vmcb() helper to handle writes to DEBUGCTL
Use the recently introduced svm_get_lbr_vmcb() instead an open coded
equivalent to retrieve the target VMCB when emulating writes to
MSR_IA32_DEBUGCTLMSR.

No functional change intended.

Link: https://lore.kernel.org/r/20230607203519.1570167-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02 16:41:55 -07:00
Sean Christopherson
41dfb5f13e KVM: SVM: Clean up handling of LBR virtualization enabled
Clean up the enable_lbrv computation in svm_update_lbrv() to consolidate
the logic for computing enable_lbrv into a single statement, and to remove
the coding style violations (lack of curly braces on nested if).

No functional change intended.

Link: https://lore.kernel.org/r/20230607203519.1570167-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02 16:41:54 -07:00
Sean Christopherson
d518f8cc10 KVM: SVM: Fix dead KVM_BUG() code in LBR MSR virtualization
Refactor KVM's handling of LBR MSRs on SVM to avoid a second layer of
case statements, and thus eliminate a dead KVM_BUG() call, which (a) will
never be hit in the current code base and (b) if a future commit breaks
things, will never fire as KVM passes "false" instead "true" or '1' for
the KVM_BUG() condition.

Reported-by: Michal Luczaj <mhal@rbox.co>
Cc: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20230607203519.1570167-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-02 16:41:54 -07:00
Sean Christopherson
26a0652cb4 KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid,
e.g. due to setting bits 63:32, illegal combinations, or to a value that
isn't allowed in VMX (non-)root mode.  The VMX checks in particular are
"fun" as failure to disallow Real Mode for an L2 that is configured with
unrestricted guest disabled, when KVM itself has unrestricted guest
enabled, will result in KVM forcing VM86 mode to virtual Real Mode for
L2, but then fail to unwind the related metadata when synthesizing a
nested VM-Exit back to L1 (which has unrestricted guest enabled).

Opportunistically fix a benign typo in the prototype for is_valid_cr4().

Cc: stable@vger.kernel.org
Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29 11:05:31 -04:00
Sean Christopherson
b439eb8ab5 Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"
Now that handle_fastpath_set_msr_irqoff() acquires kvm->srcu, i.e. allows
dereferencing memslots during WRMSR emulation, drop the requirement that
"next RIP" is valid.  In hindsight, acquiring kvm->srcu would have been a
better fix than avoiding the pastpath, but at the time it was thought that
accessing SRCU-protected data in the fastpath was a one-off edge case.

This reverts commit 5c30e8101e.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721224337.2335137-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29 11:05:27 -04:00
Sean Christopherson
a6bb570902 KVM: SVM: Don't try to pointlessly single-step SEV-ES guests for NMI window
Bail early from svm_enable_nmi_window() for SEV-ES guests without trying
to enable single-step of the guest, as single-stepping an SEV-ES guest is
impossible and the guest is responsible for *telling* KVM when it is ready
for an new NMI to be injected.

Functionally, setting TF and RF in svm->vmcb->save.rflags is benign as the
field is ignored by hardware, but it's all kinds of confusing.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-10-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 16:13:43 -07:00
Sean Christopherson
389fbbec26 KVM: SVM: Don't defer NMI unblocking until next exit for SEV-ES guests
Immediately mark NMIs as unmasked in response to #VMGEXIT(NMI complete)
instead of setting awaiting_iret_completion and waiting until the *next*
VM-Exit to unmask NMIs.  The whole point of "NMI complete" is that the
guest is responsible for telling the hypervisor when it's safe to inject
an NMI, i.e. there's no need to wait.  And because there's no IRET to
single-step, the next VM-Exit could be a long time coming, i.e. KVM could
incorrectly hold an NMI pending for far longer than what is required and
expected.

Opportunistically fix a stale reference to HF_IRET_MASK.

Fixes: 916b54a768 ("KVM: x86: Move HF_NMI_MASK and HF_IRET_MASK into "struct vcpu_svm"")
Fixes: 4444dfe405 ("KVM: SVM: Add NMI support for an SEV-ES guest")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-9-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 16:13:18 -07:00
Alexey Kardashevskiy
90cbf6d914 KVM: SEV-ES: Eliminate #DB intercept when DebugSwap enabled
Disable #DB for SEV-ES guests when DebugSwap is enabled. There is no point
in such intercept as KVM does not allow guest debug for SEV-ES guests.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-8-aik@amd.com
[sean: add comment as to why KVM disables #DB intercept iff DebugSwap=1]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 16:13:13 -07:00
Alexey Kardashevskiy
d1f85fbe83 KVM: SEV: Enable data breakpoints in SEV-ES
Add support for "DebugSwap for SEV-ES guests", which provides support
for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
allows KVM to expose debug capabilities to SEV-ES guests. Without
DebugSwap support, the CPU doesn't save/load most _guest_ debug
registers (except DR6/7), and KVM cannot manually context switch guest
DRs due the VMSA being encrypted.

Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
vectoring a #DB.  Without NoNestedDataBp, a malicious guest can DoS
the host by putting the CPU into an infinite loop of vectoring #DBs
(see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)

Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialised but also on
intrahost migration when VMSA is encrypted.

Eliminate DR7 intercepts as KVM can't modify guest DR7, and intercepting
DR7 would completely defeat the purpose of enabling DebugSwap.

Make X86_FEATURE_DEBUG_SWAP appear in /proc/cpuinfo (by not adding "") to
let the operator know if the VM can debug.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-7-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 16:12:56 -07:00
Alexey Kardashevskiy
c2690b5f01 KVM: SVM/SEV/SEV-ES: Rework intercepts
Currently SVM setup is done sequentially in
init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb()
and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
is DR intercepts which is for SEV-ES before sev_es_init_vmcb() runs.

Move the SEV-ES intercept setup to sev_es_init_vmcb(). From now on
set_dr_intercepts()/clr_dr_intercepts() handle SVM/SEV only.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-6-aik@amd.com
[sean: drop comment about intercepting DR7]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 16:12:06 -07:00
Alexey Kardashevskiy
2837dd00f8 KVM: SEV-ES: explicitly disable debug
SVM/SEV enable debug registers intercepts to skip swapping DRs
on entering/exiting the guest. When the guest is in control of
debug registers (vcpu->guest_debug == 0), there is an optimisation to
reduce the number of context switches: intercepts are cleared and
the KVM_DEBUGREG_WONT_EXIT flag is set to tell KVM to do swapping
on guest enter/exit.

The same code also executes for SEV-ES, however it has no effect as
- it always takes (vcpu->guest_debug == 0) branch;
- KVM_DEBUGREG_WONT_EXIT is set but DR7 intercept is not cleared;
- vcpu_enter_guest() writes DRs but VMRUN for SEV-ES swaps them
with the values from _encrypted_ VMSA.

Be explicit about SEV-ES not supporting debug:
- return right away from dr_interception() and skip unnecessary processing;
- return an error right away from the KVM_SEV_LAUNCH_UPDATE_VMSA handler
if debugging was already enabled.
KVM_SET_GUEST_DEBUG are failing already after KVM_SEV_LAUNCH_UPDATE_VMSA
is finished due to vcpu->arch.guest_state_protected set to true.

Add WARN_ON to kvm_x86::sync_dirty_debug_regs() (saves guest DRs on
guest exit) to signify that SEV-ES won't hit that path.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-5-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 15:58:53 -07:00
Sean Christopherson
f8d808ed1b KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment about swap types
Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
needs to save a seemingly random subset of host state, and to provide a
decoder for the APM's Type-A/B/C terminology.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-4-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 15:58:53 -07:00
Alexey Kardashevskiy
29de732cc9 KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV
Currently SVM setup is done sequentially in
init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb() and tries
keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
is #GP intercept which init_vmcb() skips setting for SEV guests and
then sev_es_init_vmcb() needlessly clears it.

Remove the SEV check from init_vmcb(). Clear the #GP intercept in
sev_init_vmcb(). SEV-ES will use the SEV setting.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-3-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 15:58:53 -07:00
Alexey Kardashevskiy
b265ee7bae KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header
Static functions set_dr_intercepts() and clr_dr_intercepts() are only
called from SVM so move them to .c.

No functional change intended.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20230615063757.3039121-2-aik@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-28 15:58:53 -07:00
Borislav Petkov (AMD)
d893832d0e x86/srso: Add IBPB on VMEXIT
Add the option to flush IBPB only on VMEXIT in order to protect from
malicious guests but one otherwise trusts the software that runs on the
hypervisor.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-27 11:07:19 +02:00
Paolo Bonzini
24975ce8b2 KVM SVM changes for 6.5:
- Drop manual TR/TSS load after VM-Exit now that KVM uses VMLOAD for host state
 
  - Fix a not-yet-problematic missing call to trace_kvm_exit() for VM-Exits that
    are handled in the fastpath
 
  - Print more descriptive information about the status of SEV and SEV-ES during
    module load
 
  - Assert that misc_cg_set_capacity() doesn't fail to avoid should-be-impossible
    memory leaks
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmSaK5ESHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5ApAQAJSlzFh6Cg6OzlKqsrXDTTrNuP7Yu5Pe
 tCQm9uppab++TyBz00GCoaUjqXJY1j1riStbl0j1yGJ69Ocjrqj58IGGeoj4NKgt
 Dsiwgs0IWCshe7noVcYQeC4FInrNiFOog7Zog7uDyJmtHprZHorcJ9rmsBXMmedw
 OSrzoxyhVwbtbPmgMfEP+xw4wccXVioci4EOySqI0GI9QrQ+cfdafs8irxxeLG7v
 IY1qG3fwNmGp2uHdb3lG48TUbggWzKG5o1RC+fwwN132Y2fepxjcAeZ25gNms3lz
 Q1fm7vPNkGRqelqg7x+z9B10D6uJc0hngZPe6Hs8C7y1+hvTjXwmx81WXsQxM7RM
 rhhbp1o1C0xKSLzFciaZyW4lQW4cw5wxGRNoIenpHUe48bK9wjTYxez2MiQwfbNJ
 Dt9RAaBVF/UdNBZu2wtA3czgHwOHKSqUOwO2N2iBW62KgRzITQe9r9VtVikslbQD
 /nAq7PJOGz8JuJXkDWI0nLYEW6pInzsiXB21CPQrYR8XOQnnWglzmMTL/KxPeVYg
 pBHJUf6U7AdhjHMkPp2Yc1eQTNspDzRfZBGFZz1YS103JpmUIs97W0phHru/ONKh
 1cBv2N9ZrOJhuL1LAxaLp9OSvR+UQP/mMdCAEjvUTpEbmqbtEAqWMkTTwIEdIi74
 PcaJfv7GsYJV
 =rcFn
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-svm-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 6.5:

 - Drop manual TR/TSS load after VM-Exit now that KVM uses VMLOAD for host state

 - Fix a not-yet-problematic missing call to trace_kvm_exit() for VM-Exits that
   are handled in the fastpath

 - Print more descriptive information about the status of SEV and SEV-ES during
   module load

 - Assert that misc_cg_set_capacity() doesn't fail to avoid should-be-impossible
   memory leaks
2023-07-01 07:19:42 -04:00
Paolo Bonzini
751d77fefa KVM x86/pmu changes for 6.5:
- Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
    included along the way
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmSaHFgSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5twMP/15ZJFqZVigVQoATJeeR9tWUuyJe95xM
 lyfnTel91Sg8XOamdwBGi7jLpaDgj34Jm0cfM7/4LbJk2/taeaCLYmJd5w9FXvaw
 EkytQGO85hVNe2XuY+h+XxSIxpflKxgFuUnOwcDk2QbKgASzNSG/mJ9ZBx8PNVXD
 FnyOqpbbYDFspWWvUOAI/RkHnr/dALjXJsSUMvuh3nz5e1NTyubjCAZg+/bse2nR
 s8FrcSh4B0Lg0h4r2fdJ4sAiM/qWhcCIhq5svyTAcUG0T4rMS40LrosJOw3wkBRM
 dyZYXy6GEENeCFJPhenF1mTE1embFyZp89PV/FCNRZXODbnM4kheJFT9gucAjlKi
 ZafRcutrkYIVf4lZCMofDfQGLX/GCEJnwUPKyGygIsPoDRrdR7OLrFycON5bxocr
 9NBNG+2teQFbnt5irB/bBGojtIZtu3OEylkuRjQUQ3lJYQ5r6LddarI9acIu1SHt
 4rRfh8QN5qmMvVblaQzggOr6BPtmPr8QqMEMFncaUMCsV/82hRAEfvj2rifGFJNo
 Axz1ajMfirxyM45WzredUkzzsbphiiegPBELCLRZfHmaEhJ8P7t7wvri0bXt9YdI
 vjSfX+6ulOgDC+xAazE0gEJO4Uh5+g3Y+1e0fr43ltWzUOWdCQskzD3LE9DkqIXj
 KAaCuHYbYpIZ
 =MwqV
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-pmu-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM x86/pmu changes for 6.5:

 - Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
   included along the way
2023-07-01 07:18:51 -04:00
Paolo Bonzini
36b68d360a KVM x86 changes for 6.5:
- Move handling of PAT out of MTRR code and dedup SVM+VMX code
 
  - Fix output of PIC poll command emulation when there's an interrupt
 
  - Fix a longstanding bug in the reporting of the number of entries returned by
    KVM_GET_CPUID2
 
  - Add a maintainer's handbook to document KVM x86 processes, preferred coding
    style, testing expectations, etc.
 
  - Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmSaGMMSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5iDIP/0PwY3J5odTEUTnAyuDFPimd5PBt9k/O
 B414wdpSKVgzq+0An4qM9mKRnklVIh2p8QqQTvDhcBUg3xb6CX9xZ4ery7hp/T5O
 tr5bAXs2AYX6jpxvsopt+w+E9j6fvkJhcJCRU9im3QbrqwUE+ecyU5OHvmv2n/GO
 syVZJbPOYuoLPKDjlSMrScE6fWEl9UOvHc5BK/vafTeyisMG3vv1BSmJj6GuiNNk
 TS1RRIg//cOZghQyDfdXt0azTmakNZyNn35xnoX9x8SRmdRykyUjQeHmeqWxPDso
 kiGO+CGancfS57S6ZtCkJjqEWZ1o/zKdOxr8MMf/3nJhv4kY7/5XtlVoACv5soW9
 bZEmNiXIaSbvKNMwAlLJxHFbLa1sMdSCb345CIuMdt5QiWJ53ZiTyIAJX6+eL+Zf
 8nkeekgPf5VUs6Zt0RdRPyvo+W7Vp9BtI87yDXm1nQKpbys2pt6CD3YB/oF4QViG
 a5cyGoFuqRQbS3nmbshIlR7EanTuxbhLZKrNrFnolZ5e624h3Cnk2hVsfTznVGiX
 vNHWM80phk1CWB9McErrZVkGfjlyVyBL13CBB2XF7Dl6PfF6/N22a9bOuTJD3tvk
 PlNx4hvZm3esvvyGpjfbSajTKYE8O7rxiE1KrF0BpZ5IUl5WSiTr6XCy/yI/mIeM
 hay2IWhPOF2z
 =D0BH
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-misc-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM x86 changes for 6.5:

* Move handling of PAT out of MTRR code and dedup SVM+VMX code

* Fix output of PIC poll command emulation when there's an interrupt

* Add a maintainer's handbook to document KVM x86 processes, preferred coding
  style, testing expectations, etc.

* Misc cleanups
2023-07-01 07:08:59 -04:00