linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Paolo Bonzini	9352e7470a	Merge remote-tracking branch 'kvm/queue' into HEAD x86 Xen-for-KVM: * Allow the Xen runstate information to cross a page boundary * Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured * add support for 32-bit guests in SCHEDOP_poll x86 fixes: * One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). * Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. * Clean up the MSR filter docs. * Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. * Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. * Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. * Advertise (on AMD) that the SMM_CTL MSR is not supported * Remove unnecessary exports Selftests: * Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. * Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). * Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. * Introduce actual atomics for clear/set_bit() in selftests Documentation: * Remove deleted ioctls from documentation * Various fixes	2022-12-12 15:54:07 -05:00
Paolo Bonzini	eb5618911a	KVM/arm64 updates for 6.2 - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on. - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Add/Enable/Fix a bunch of selftests covering memslots, breakpoints, stage-2 faults and access tracking. You name it, we got it, we probably broke it. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. As a side effect, this tag also drags: - The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring series - A shared branch with the arm64 tree that repaints all the system registers to match the ARM ARM's naming, and resulting in interesting conflicts -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmOODb0PHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDztsQAInRnsgLl57/SpqhZzExNCllN6AT/bdeB3uz rnw3ScJOV174uNKp8lnPWoTvu2YUGiVtBp6tFHhDI8le7zHX438ZT8KE5mcs8p5i KfFKnb8SHV2DDpqkcy24c0Xl/6vsg1qkKrdfJb49yl5ZakRITDpynW/7tn6dXsxX wASeGFdCYeW4g2xMQzsCbtx6LgeQ8uomBmzRfPrOtZHYYxAn6+4Mj4595EC1sWxM AQnbp8tW3Vw46saEZAQvUEOGOW9q0Nls7G21YqQ52IA+ZVDK1LmAF2b1XY3edjkk pX8EsXOURfqdasBxfSfF3SgnUazoz9GHpSzp1cTVTktrPp40rrT7Ldtml0ktq69d 1malPj47KVMDsIq0kNJGnMxciXFgAHw+VaCQX+k4zhIatNwviMbSop2fEoxj22jc 4YGgGOxaGrnvmAJhreCIbr4CkZk5CJ8Zvmtfg+QM6npIp8BY8896nvORx/d4i6tT H4caadd8AAR56ANUyd3+KqF3x0WrkaU0PLHJLy1tKwOXJUUTjcpvIfahBAAeUlSR qEFrtb+EEMPgAwLfNOICcNkPZR/yyuYvM+FiUQNVy5cNiwFkpztpIctfOFaHySGF K07O2/a1F6xKL0OKRUg7hGKknF9ecmux4vHhiUMuIk9VOgNTWobHozBDorLKXMzC aWa6oGVC =iIPT -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.2 - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on. - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Add/Enable/Fix a bunch of selftests covering memslots, breakpoints, stage-2 faults and access tracking. You name it, we got it, we probably broke it. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. As a side effect, this tag also drags: - The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring series - A shared branch with the arm64 tree that repaints all the system registers to match the ARM ARM's naming, and resulting in interesting conflicts	2022-12-09 09:12:12 +01:00
Sean Christopherson	8fcee04213	KVM: selftests: Restore assert for non-nested VMs in access tracking test Restore the assert (on x86-64) that <10% of pages are still idle when NOT running as a nested VM in the access tracking test. The original assert was converted to a "warning" to avoid false failures when running the test in a VM, but the non-nested case does not suffer from the same "infinite TLB size" issue. Using the HYPERVISOR flag isn't infallible as VMMs aren't strictly required to enumerate the "feature" in CPUID, but practically speaking anyone that is running KVM selftests in VMs is going to be using a VMM and hypervisor that sets the HYPERVISOR flag. Cc: David Matlack <dmatlack@google.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221129175300.4052283-3-seanjc@google.com	2022-12-01 15:31:39 -08:00
Sean Christopherson	a33004e844	KVM: selftests: Fix inverted "warning" in access tracking perf test Warn if the number of idle pages is greater than or equal to 10% of the total number of pages, not if the percentage of idle pages is less than 10%. The original code asserted that less than 10% of pages were still idle, but the check got inverted when the assert was converted to a warning. Opportunistically clean up the warning; selftests are 64-bit only, there is no need to use "%PRIu64" instead of "%lu". Fixes: `6336a810db` ("KVM: selftests: replace assertion with warning in access_tracking_perf_test") Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221129175300.4052283-2-seanjc@google.com	2022-12-01 15:31:32 -08:00
Oliver Upton	9ec1eb1bcc	KVM: selftests: Have perf_test_util signal when to stop vCPUs Signal that a test run is complete through perf_test_args instead of having tests open code a similar solution. Ensure that the field resets to false at the beginning of a test run as the structure is reused between test runs, eliminating a couple of bugs: access_tracking_perf_test hangs indefinitely on a subsequent test run, as 'done' remains true. The bug doesn't amount to much right now, as x86 supports a single guest mode. However, this is a precondition of enabling the test for other architectures with >1 guest mode, like arm64. memslot_modification_stress_test has the exact opposite problem, where subsequent test runs complete immediately as 'run_vcpus' remains false. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> [oliver: added commit message, preserve spin_wait_for_next_iteration()] Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221118211503.4049023-2-oliver.upton@linux.dev	2022-11-29 17:29:42 +00:00
David Matlack	7812d80c0f	KVM: selftests: Rename perf_test_util symbols to memstress Replace the perf_test_ prefix on symbol names with memstress_ to match the new file name. "memstress" better describes the functionality proveded by this library, which is to provide functionality for creating and running a VM that stresses VM memory by reading and writing to guest memory on all vCPUs in parallel. "memstress" also contains the same number of chracters as "perf_test", making it a drop-in replacement in symbols, e.g. function names, without impacting line lengths. Also the lack of underscore between "mem" and "stress" makes it clear "memstress" is a noun. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221012165729.3505266-4-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-11-16 10:58:32 -08:00
David Matlack	9fda6753c9	KVM: selftests: Rename perf_test_util.[ch] to memstress.[ch] Rename the perf_test_util.[ch] files to memstress.[ch]. Symbols are renamed in the following commit to reduce the amount of churn here in hopes of playiing nice with git's file rename detection. The name "memstress" was chosen to better describe the functionality proveded by this library, which is to create and run a VM that reads/writes to guest memory on all vCPUs in parallel. "memstress" also contains the same number of chracters as "perf_test", making it a drop-in replacement in symbols, e.g. function names, without impacting line lengths. Also the lack of underscore between "mem" and "stress" makes it clear "memstress" is a noun. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221012165729.3505266-2-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-11-16 10:57:34 -08:00
Colton Lewis	6864c6442f	KVM: selftests: randomize which pages are written vs read Randomize which pages are written vs read using the random number generator. Change the variable wr_fract and associated function calls to write_percent that now operates as a percentage from 0 to 100 where X means each page has an X% chance of being written. Change the -f argument to -w to reflect the new variable semantics. Keep the same default of 100% writes. Population always uses 100% writes to ensure all memory is actually populated and not just mapped to the zero page. The prevents expensive copy-on-write faults from occurring during the dirty memory iterations below, which would pollute the performance results. Each vCPU calculates its own random seed by adding its index to the seed provided. Signed-off-by: Colton Lewis <coltonlewis@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20221107182208.479157-4-coltonlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-11-16 10:57:19 -08:00
Vipin Sharma	0001725d0f	KVM: selftests: Add atoi_positive() and atoi_non_negative() for input validation Many KVM selftests take command line arguments which are supposed to be positive (>0) or non-negative (>=0). Some tests do these validation and some missed adding the check. Add atoi_positive() and atoi_non_negative() to validate inputs in selftests before proceeding to use those values. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221103191719.1559407-7-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-11-16 10:03:24 -08:00
Vipin Sharma	018ea2d71a	KVM: selftests: Add atoi_paranoid() to catch errors missed by atoi() atoi() doesn't detect errors. There is no way to know that a 0 return is correct conversion or due to an error. Introduce atoi_paranoid() to detect errors and provide correct conversion. Replace all atoi() calls with atoi_paranoid(). Signed-off-by: Vipin Sharma <vipinsh@google.com> Suggested-by: David Matlack <dmatlack@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221103191719.1559407-4-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2022-11-16 10:03:24 -08:00
Emanuele Giuseppe Esposito	6336a810db	KVM: selftests: replace assertion with warning in access_tracking_perf_test Page_idle uses {ptep/pmdp}_clear_young_notify which in turn calls the mmu notifier callback ->clear_young(), which purposefully does not flush the TLB. When running the test in a nested guest, point 1. of the test doc header is violated, because KVM TLB is unbounded by size and since no flush is forced, KVM does not update the sptes accessed/idle bits resulting in guest assertion failure. More precisely, only the first ACCESS_WRITE in run_test() actually makes visible changes, because sptes are created and the accessed bit is set to 1 (or idle bit is 0). Then the first mark_memory_idle() passes since access bit is still one, and sets all pages as idle (or not accessed). When the next write is performed, the update is not flushed therefore idle is still 1 and next mark_memory_idle() fails. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220926082923.299554-1-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-30 06:38:00 -04:00
Sean Christopherson	7ed397d107	KVM: selftests: Add TEST_REQUIRE macros to reduce skipping copy+paste Add TEST_REQUIRE() and __TEST_REQUIRE() to replace the myriad open coded instances of selftests exiting with KSFT_SKIP after printing an informational message. In addition to reducing the amount of boilerplate code in selftests, the UPPERCASE macro names make it easier to visually identify a test's requirements. Convert usage that erroneously uses something other than print_skip() and/or "exits" with '0' or some other non-KSFT_SKIP value. Intentionally drop a kvm_vm_free() in aarch64/debug-exceptions.c as part of the conversion. All memory and file descriptors are freed on process exit, so the explicit free is superfluous. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-11 11:47:29 -04:00
Sean Christopherson	768e9a6185	KVM: selftests: Purge vm+vcpu_id == vcpu silliness Take a vCPU directly instead of a VM+vcpu pair in all vCPU-scoped helpers and ioctls. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-11 11:47:22 -04:00
Sean Christopherson	df84cef531	KVM: selftests: Stop conflating vCPU index and ID in perf tests Track vCPUs by their 'struct kvm_vcpu' object, and stop assuming that a vCPU's ID is the same as its index when referencing a vCPU's metadata. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-06-11 11:47:20 -04:00
David Matlack	81bcb26172	KVM: selftests: Move vCPU thread creation and joining to common helpers Move vCPU thread creation and joining to common helper functions. This is in preparation for the next commit which ensures that all vCPU threads are fully created before entering guest mode on any one vCPU. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Message-Id: <20211111001257.1446428-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-11-16 07:43:28 -05:00
David Matlack	36c5ad73d7	KVM: selftests: Start at iteration 0 instead of -1 Start at iteration 0 instead of -1 to avoid having to initialize vcpu_last_completed_iteration when setting up vCPU threads. This simplifies the next commit where we move vCPU thread initialization out to a common helper. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20211111001257.1446428-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-11-16 07:43:27 -05:00
Sean Christopherson	13bbc70329	KVM: selftests: Sync perf_test_args to guest during VM creation Copy perf_test_args to the guest during VM creation instead of relying on the caller to do so at their leisure. Ideally, tests wouldn't even be able to modify perf_test_args, i.e. they would have no motivation to do the sync, but enforcing that is arguably a net negative for readability. No functional change intended. [Set wr_fract=1 by default and add helper to override it since the new access_tracking_perf_test needs to set it dynamically.] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Message-Id: <20211111000310.1435032-13-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-11-16 07:43:27 -05:00
Sean Christopherson	cf1d59300a	KVM: selftests: Fill per-vCPU struct during "perf_test" VM creation Fill the per-vCPU args when creating the perf_test VM instead of having the caller do so. This helps ensure that any adjustments to the number of pages (and thus vcpu_memory_bytes) are reflected in the per-VM args. Automatically filling the per-vCPU args will also allow a future patch to do the sync to the guest during creation. Signed-off-by: Sean Christopherson <seanjc@google.com> [Updated access_tracking_perf_test as well.] Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Message-Id: <20211111000310.1435032-12-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-11-16 07:43:27 -05:00
David Matlack	9f2fc5554a	KVM: selftests: Refactor help message for -s backing_src All selftests that support the backing_src option were printing their own description of the flag and then calling backing_src_help() to dump the list of available backing sources. Consolidate the flag printing in backing_src_help() to align indentation, reduce duplicated strings, and improve consistency across tests. Note: Passing "-s" to backing_src_help is unnecessary since every test uses the same flag. However I decided to keep it for code readability at the call sites. While here this opportunistically fixes the incorrectly interleaved printing -x help message and list of backing source types in dirty_log_perf_test. Fixes: `609e6202ea` ("KVM: selftests: Support multiple slots in dirty_log_perf_test") Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210917173657.44011-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-09-22 10:33:14 -04:00
David Matlack	32bdc01988	KVM: selftests: Move vcpu_args_set into perf_test_util perf_test_util is used to set up KVM selftests where vCPUs touch a region of memory. The guest code is implemented in perf_test_util.c (not the calling selftests). The guest code requires a 1 parameter, the vcpuid, which has to be set by calling vcpu_args_set(vm, vcpu_id, 1, vcpu_id). Today all of the selftests that use perf_test_util are making this call. Instead, perf_test_util should just do it. This will save some code but more importantly prevents mistakes since totally non-obvious that this needs to be called and failing to do so results in vCPUs not accessing the right regions of memory. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210805172821.2622793-1-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-06 10:01:40 -04:00
David Matlack	609e6202ea	KVM: selftests: Support multiple slots in dirty_log_perf_test Introduce a new option to dirty_log_perf_test: -x number_of_slots. This causes the test to attempt to split the region of memory into the given number of slots. If the region cannot be evenly divided, the test will fail. This allows testing with more than one slot and therefore measure how performance scales with the number of memslots. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210804222844.1419481-8-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-06 07:53:01 -04:00
David Matlack	c33e05d9b0	KVM: selftests: Introduce access_tracking_perf_test This test measures the performance effects of KVM's access tracking. Access tracking is driven by the MMU notifiers test_young, clear_young, and clear_flush_young. These notifiers do not have a direct userspace API, however the clear_young notifier can be triggered by marking a pages as idle in /sys/kernel/mm/page_idle/bitmap. This test leverages that mechanism to enable access tracking on guest memory. To measure performance this test runs a VM with a configurable number of vCPUs that each touch every page in disjoint regions of memory. Performance is measured in the time it takes all vCPUs to finish touching their predefined region. Example invocation: $ ./access_tracking_perf_test -v 8 Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages guest physical test memory offset: 0xffdfffff000 Populating memory : 1.337752570s Writing to populated memory : 0.010177640s Reading from populated memory : 0.009548239s Mark memory idle : 23.973131748s Writing to idle memory : 0.063584496s Mark memory idle : 24.924652964s Reading from idle memory : 0.062042814s Breaking down the results: * "Populating memory": The time it takes for all vCPUs to perform the first write to every page in their region. * "Writing to populated memory" / "Reading from populated memory": The time it takes for all vCPUs to write and read to every page in their region after it has been populated. This serves as a control for the later results. * "Mark memory idle": The time it takes for every vCPU to mark every page in their region as idle through page_idle. * "Writing to idle memory" / "Reading from idle memory": The time it takes for all vCPUs to write and read to every page in their region after it has been marked idle. This test should be portable across architectures but it is only enabled for x86_64 since that's all I have tested. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210713220957.3493520-7-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-07-27 16:59:00 -04:00

22 commits