linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Paolo Bonzini	b502e6ecdc	KVM: VMX: update PFEC_MASK/PFEC_MATCH together with PF intercept The PFEC_MASK and PFEC_MATCH fields in the VMCS reverse the meaning of the #PF intercept bit in the exception bitmap when they do not match. This means that, if PFEC_MASK and/or PFEC_MATCH are set, the hypervisor can get a vmexit for #PF exceptions even when the corresponding bit is clear in the exception bitmap. This is unexpected and is promptly detected by a WARN_ON_ONCE. To fix it, reset PFEC_MASK and PFEC_MATCH when the #PF intercept is disabled (as is common with enable_ept && !allow_smaller_maxphyaddr). Reported-by: Qian Cai <cai@redhat.com>> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-03 05:07:40 -04:00
Christoph Hellwig	c3973b401e	mm: remove compat_process_vm_{readv,writev} Now that import_iovec handles compat iovecs, the native syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-10-03 00:02:15 -04:00
Christoph Hellwig	598b3cec83	fs: remove compat_sys_vmsplice Now that import_iovec handles compat iovecs, the native vmsplice syscall can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-10-03 00:02:15 -04:00
Christoph Hellwig	5f764d624a	fs: remove the compat readv/writev syscalls Now that import_iovec handles compat iovecs, the native readv and writev syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-10-03 00:02:14 -04:00
Jonathan Cameron	73bf7382de	x86: Support Generic Initiator only proximity domains In common with memoryless domains only register GI domains if the proximity node is not online. If a domain is already a memory containing domain, or a memoryless domain there is nothing to do just because it also contains a Generic Initiator. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-10-02 18:51:57 +02:00
Will Deacon	baab853229	Merge branch 'for-next/mte' into for-next/core Add userspace support for the Memory Tagging Extension introduced by Armv8.5. (Catalin Marinas and others) * for-next/mte: (30 commits) arm64: mte: Fix typo in memory tagging ABI documentation arm64: mte: Add Memory Tagging Extension documentation arm64: mte: Kconfig entry arm64: mte: Save tags when hibernating arm64: mte: Enable swap of tagged pages mm: Add arch hooks for saving/restoring tags fs: Handle intra-page faults in copy_mount_options() arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks arm64: mte: Restore the GCR_EL1 register after a suspend arm64: mte: Allow user control of the generated random tags via prctl() arm64: mte: Allow user control of the tag check mode via prctl() mm: Allow arm64 mmap(PROT_MTE) on RAM-based files arm64: mte: Validate the PROT_MTE request via arch_validate_flags() mm: Introduce arch_validate_flags() arm64: mte: Add PROT_MTE support to mmap() and mprotect() mm: Introduce arch_calc_vm_flag_bits() arm64: mte: Tags-aware aware memcmp_pages() implementation arm64: Avoid unnecessary clear_user_page() indirection ...	2020-10-02 12:16:11 +01:00
Mark Mossberg	238c91115c	x86/dumpstack: Fix misleading instruction pointer error message Printing "Bad RIP value" if copy_code() fails can be misleading for userspace pointers, since copy_code() can fail if the instruction pointer is valid but the code is paged out. This is because copy_code() calls copy_from_user_nmi() for userspace pointers, which disables page fault handling. This is reproducible in OOM situations, where it's plausible that the code may be reclaimed in the time between entry into the kernel and when this message is printed. This leaves a misleading log in dmesg that suggests instruction pointer corruption has occurred, which may alarm users. Change the message to state the error condition more precisely. [ bp: Massage a bit. ] Signed-off-by: Mark Mossberg <mark.mossberg@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201002042915.403558-1-mark.mossberg@gmail.com	2020-10-02 11:33:55 +02:00
Herbert Xu	4a0c1de64b	crypto: x86/poly1305 - Remove assignments with no effect This patch removes a few ineffectual assignments from the function crypto_poly1305_setdctxkey. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2020-10-02 18:02:13 +10:00
Gustavo A. R. Silva	a0947081af	x86/uv/time: Use a flexible array in struct uv_rtc_timer_head There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. struct uv_rtc_timer_head contains a one-element array cpu[1]. Switch it to a flexible array and use the struct_size() helper to calculate the allocation size. Also, save some heap space in the process[3]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays [3] https://lore.kernel.org/lkml/20200518190114.GA7757@embeddedor/ [ bp: Massage a bit. ] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/20201001145608.GA10204@embeddedor	2020-10-01 18:47:39 +02:00
Libing Zhou	f94c91f7ba	x86/nmi: Fix nmi_handle() duration miscalculation When nmi_check_duration() is checking the time an NMI handler took to execute, the whole_msecs value used should be read from the @duration argument, not from the ->max_duration, the latter being used to store the current maximal duration. [ bp: Rewrite commit message. ] Fixes: `248ed51048` ("x86/nmi: Remove irq_work from the long duration NMI handler") Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Libing Zhou <libing.zhou@nokia-sbell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Changbin Du <changbin.du@gmail.com> Link: https://lkml.kernel.org/r/20200820025641.44075-1-libing.zhou@nokia-sbell.com	2020-10-01 14:42:08 +02:00
Arvind Sankar	aa5cacdc29	x86/asm: Replace __force_order with a memory clobber The CRn accessor functions use __force_order as a dummy operand to prevent the compiler from reordering CRn reads/writes with respect to each other. The fact that the asm is volatile should be enough to prevent this: volatile asm statements should be executed in program order. However GCC 4.9.x and 5.x have a bug that might result in reordering. This was fixed in 8.1, 7.3 and 6.5. Versions prior to these, including 5.x and 4.9.x, may reorder volatile asm statements with respect to each other. There are some issues with __force_order as implemented: - It is used only as an input operand for the write functions, and hence doesn't do anything additional to prevent reordering writes. - It allows memory accesses to be cached/reordered across write functions, but CRn writes affect the semantics of memory accesses, so this could be dangerous. - __force_order is not actually defined in the kernel proper, but the LLVM toolchain can in some cases require a definition: LLVM (as well as GCC 4.9) requires it for PIE code, which is why the compressed kernel has a definition, but also the clang integrated assembler may consider the address of __force_order to be significant, resulting in a reference that requires a definition. Fix this by: - Using a memory clobber for the write functions to additionally prevent caching/reordering memory accesses across CRn writes. - Using a dummy input operand with an arbitrary constant address for the read functions, instead of a global variable. This will prevent reads from being reordered across writes, while allowing memory loads to be cached/reordered across CRn reads, which should be safe. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602 Link: https://lore.kernel.org/lkml/20200527135329.1172644-1-arnd@arndb.de/ Link: https://lkml.kernel.org/r/20200902232152.3709896-1-nivedita@alum.mit.edu	2020-10-01 10:31:48 +02:00
Thomas Gleixner	bc21a291fc	x86/mce: Use idtentry_nmi_enter/exit() The recent fix for NMI vs. IRQ state tracking missed to apply the cure to the MCE handler. Fixes: `ba1f2b2eaa` ("x86/entry: Fix NMI vs IRQ state tracking") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/87mu17ism2.fsf@nanos.tec.linutronix.de	2020-09-30 10:41:56 +02:00
Tony Luck	ed9705e4ad	x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list Way back in v3.19 Intel and AMD shared the same machine check severity grading code. So it made sense to add a case for AMD DEFERRED errors in commit `e3480271f5` ("x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error") But later in v4.2 AMD switched to a separate grading function in commit `bf80bbd7dc` ("x86/mce: Add an AMD severities-grading function") Belatedly drop the DEFERRED case from the Intel rule list. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200930021313.31810-3-tony.luck@intel.com	2020-09-30 07:49:58 +02:00
Borislav Petkov	fd258dc444	x86/mce: Add Skylake quirk for patrol scrub reported errors The patrol scrubber in Skylake and Cascade Lake systems can be configured to report uncorrected errors using a special signature in the machine check bank and to signal using CMCI instead of machine check. Update the severity calculation mechanism to allow specifying the model, minimum stepping and range of machine check bank numbers. Add a new rule to detect the special signature (on model 0x55, stepping >=4 in any of the memory controller banks). [ bp: Rewrite it. aegl: Productize it. ] Suggested-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200930021313.31810-2-tony.luck@intel.com	2020-09-30 07:43:56 +02:00
Maciej Fijalkowski	4d0b8c0b46	bpf: x64: Do not emit sub/add 0, %rsp when !stack_depth There is no particular reason for keeping the "sub 0, %rsp" insn within the BPF's x64 JIT prologue. When tail call code was skipping the whole prologue section these 7 bytes that represent the rsp subtraction could not be simply discarded as the jump target address would be broken. An option to address that would be to substitute it with nop7. Right now tail call is skipping only first 11 bytes of target program's prologue and "sub X, %rsp" is the first insn that is processed, so if stack depth is zero then this insn could be omitted without the need for nop7 swap. Therefore, do not emit the "sub 0, %rsp" in prologue when program is not making use of R10 register. Also, make the emission of "add X, %rsp" conditional in tail call code logic and take into account the presence of mentioned insn when calculating the jump offsets. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200929204653.4325-3-maciej.fijalkowski@intel.com	2020-09-29 16:47:39 -07:00
Maciej Fijalkowski	d207929d97	bpf, x64: Drop "pop %rcx" instruction on BPF JIT epilogue Back when all of the callee-saved registers where always pushed to stack in x64 JIT prologue, tail call counter was placed at the bottom of the BPF program's stack frame that had a following layout: +-------------+ \| ret addr \| +-------------+ \| rbp \| <- rbp +-------------+ \| \| \| free space \| \| from: \| \| sub $x,%rsp \| \| \| +-------------+ \| rbx \| +-------------+ \| r13 \| +-------------+ \| r14 \| +-------------+ \| r15 \| +-------------+ \| tail call \| <- rsp \| counter \| +-------------+ In order to restore the callee saved registers, epilogue needed to explicitly toss away the tail call counter via "pop %rbx" insn, so that %rsp would be back at the place where %r15 was stored. Currently, the tail call counter is placed on stack before the callee saved registers (brackets on rbx through r15 mean that they are now pushed to stack only if they are used): +-------------+ \| ret addr \| +-------------+ \| rbp \| <- rbp +-------------+ \| \| \| free space \| \| from: \| \| sub $x,%rsp \| \| \| +-------------+ \| tail call \| \| counter \| +-------------+ ( rbx ) +-------------+ ( r13 ) +-------------+ ( r14 ) +-------------+ ( r15 ) <- rsp +-------------+ For the record, the epilogue insns consist of (assuming all of the callee saved registers are used by program): pop %r15 pop %r14 pop %r13 pop %rbx pop %rcx leaveq retq "pop %rbx" for getting rid of tail call counter was not an option anymore as it would overwrite the restored value of %rbx register, so it was changed to use the %rcx register. Since epilogue can start popping the callee saved registers right away without any additional work, the "pop %rcx" could be dropped altogether as "leave" insn will simply move the %rbp to %rsp. IOW, tail call counter does not need the explicit handling. Having in mind the explanation above and the actual reason for that, let's piggy back on "leave" insn for discarding the tail call counter from stack and remove the "pop %rcx" from epilogue. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200929204653.4325-2-maciej.fijalkowski@intel.com	2020-09-29 16:47:39 -07:00
Krzysztof Wilczyński	3789af9a13	PCI/PM: Rename pci_dev.d3_delay to d3hot_delay PCI devices support two variants of the D3 power state: D3hot (main power present) D3cold (main power removed). Previously struct pci_dev contained: unsigned int d3_delay; /* D3->D0 transition time in ms / unsigned int d3cold_delay; / D3cold->D0 transition time in ms */ "d3_delay" refers specifically to the D3hot state. Rename it to "d3hot_delay" to avoid ambiguity and align with the ACPI "_DSM for Specifying Device Readiness Durations" in the PCI Firmware spec r3.2, sec 4.6.9. There is no change to the functionality. Link: https://lore.kernel.org/r/20200730210848.1578826-1-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2020-09-29 14:21:50 -05:00
kernel test robot	6a2e0923b2	KVM: VMX: vmx_uret_msrs_list[] can be static Fixes: `14a61b642d` ("KVM: VMX: Rename "vmx_msr_index" to "vmx_uret_msrs_list"") Signed-off-by: kernel test robot <lkp@intel.com> Message-Id: <20200928153714.GA6285@a3a878002045> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-29 05:44:37 -04:00
Kan Liang	010cb00265	perf/x86/intel: Fix Ice Lake event constraint table An error occues when sampling non-PEBS INST_RETIRED.PREC_DIST(0x01c0) event. perf record -e cpu/event=0xc0,umask=0x01/ -- sleep 1 Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu/event=0xc0,umask=0x01/). /bin/dmesg \| grep -i perf may provide additional information. The idxmsk64 of the event is set to 0. The event never be successfully scheduled. The event should be limit to the fixed counter 0. Fixes: `6017608936` ("perf/x86/intel: Add Icelake support") Reported-by: Yi, Ammy <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200928134726.13090-1-kan.liang@linux.intel.com	2020-09-29 09:57:02 +02:00
Kan Liang	8191016a02	perf/x86/intel/uncore: Fix the scale of the IMC free-running events The "MiB" result of the IMC free-running bandwidth events, uncore_imc_free_running/read/ and uncore_imc_free_running/write/ are 16 times too small. The "MiB" value equals the raw IMC free-running bandwidth counter value times a "scale" which is inaccurate. The IMC free-running bandwidth events should be incremented per 64B cache line, not DWs (4 bytes). The "scale" should be 6.103515625e-5. Fix the "scale" for both Snow Ridge and Ice Lake. Fixes: `2b3b76b5ec` ("perf/x86/intel/uncore: Add Ice Lake server uncore support") Fixes: `ee49532b38` ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200928133240.12977-1-kan.liang@linux.intel.com	2020-09-29 09:57:02 +02:00
Alexander Antonov	f797f05d91	perf/x86/intel/uncore: Fix for iio mapping on Skylake Server Introduced early attributes /sys/devices/uncore_iio_<pmu_idx>/die* are initialized by skx_iio_set_mapping(), however, for example, for multiple segment platforms skx_iio_get_topology() returns -EPERM before a list of attributes in skx_iio_mapping_group will have been initialized. As a result the list is being NULL. Thus the warning "sysfs: (bin_)attrs not set by subsystem for group: uncore_iio_*/" appears and uncore_iio pmus are not available in sysfs. Clear IIO attr_update to properly handle the cases when topology information cannot be retrieved. Fixes: `bb42b3d397` ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping") Reported-by: Kyle Meyer <kyle.meyer@hpe.com> Suggested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Alexei Budankov <alexey.budankov@linux.intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20200928102133.61041-1-alexander.antonov@linux.intel.com	2020-09-29 09:57:02 +02:00
Kan Liang	c3bb8a9fa3	perf/x86/msr: Add Jasper Lake support The Jasper Lake processor is also a Tremont microarchitecture. From the perspective of perf MSR, there is nothing changed compared with Elkhart Lake. Share the code path with Elkhart Lake. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1601296242-32763-2-git-send-email-kan.liang@linux.intel.com	2020-09-29 09:57:02 +02:00
Kan Liang	dbfd638889	perf/x86/intel: Add Jasper Lake support The Jasper Lake processor is also a Tremont microarchitecture. From the perspective of Intel PMU, there is nothing changed compared with Elkhart Lake. Share the perf code with Elkhart Lake. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1601296242-32763-1-git-send-email-kan.liang@linux.intel.com	2020-09-29 09:57:01 +02:00
Kan Liang	ee13938543	perf/x86/intel/uncore: Reduce the number of CBOX counters An oops is triggered by the fuzzy test. [ 327.853081] unchecked MSR access error: RDMSR from 0x70c at rIP: 0xffffffffc082c820 (uncore_msr_read_counter+0x10/0x50 [intel_uncore]) [ 327.853083] Call Trace: [ 327.853085] <IRQ> [ 327.853089] uncore_pmu_event_start+0x85/0x170 [intel_uncore] [ 327.853093] uncore_pmu_event_add+0x1a4/0x410 [intel_uncore] [ 327.853097] ? event_sched_in.isra.118+0xca/0x240 There are 2 GP counters for each CBOX, but the current code claims 4 counters. Accessing the invalid registers triggers the oops. Fixes: `6e394376ee` ("perf/x86/intel/uncore: Add Intel Icelake uncore support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200925134905.8839-3-kan.liang@linux.intel.com	2020-09-29 09:57:01 +02:00
Kan Liang	8f5d41f3a0	perf/x86/intel/uncore: Update Ice Lake uncore units There are some updates for the Icelake model specific uncore performance monitors. (The update can be found at 10th generation intel core processors families specification update Revision 004, ICL068) 1) Counter 0 of ARB uncore unit is not available for software use 2) The global 'enable bit' (bit 29) and 'freeze bit' (bit 31) of MSR_UNC_PERF_GLOBAL_CTRL cannot be used to control counter behavior. Needs to use local enable in event select MSR. Accessing the modified bit/registers will be ignored by HW. Users may observe inaccurate results with the current code. The changes of the MSR_UNC_PERF_GLOBAL_CTRL imply that groups cannot be read atomically anymore. Although the error of the result for a group becomes a bit bigger, it still far lower than not using a group. The group support is still kept. Only Remove the *_box() related implementation. Since the counter 0 of ARB uncore unit is not available, update the MSR address for the ARB uncore unit. There is no change for IMC uncore unit, which only include free-running counters. Fixes: `6e394376ee` ("perf/x86/intel/uncore: Add Intel Icelake uncore support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200925134905.8839-2-kan.liang@linux.intel.com	2020-09-29 09:57:01 +02:00
Kan Liang	8abbcfefb5	perf/x86/intel/uncore: Split the Ice Lake and Tiger Lake MSR uncore support Previously, the MSR uncore for the Ice Lake and Tiger Lake are identical. The code path is shared. However, with recent update, the global MSR_UNC_PERF_GLOBAL_CTRL register and ARB uncore unit are changed for the Ice Lake. Split the Ice Lake and Tiger Lake MSR uncore support. The changes only impact the MSR ops() and the ARB uncore unit. Other codes can still be shared between the Ice Lake and the Tiger Lake. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200925134905.8839-1-kan.liang@linux.intel.com	2020-09-29 09:57:00 +02:00
Steven Rostedt (VMware)	fdb46faeab	x86: Use tracepoint_enabled() for msr tracepoints instead of open coding it `7f47d8cc03` ("x86, tracing, perf: Add trace point for MSR accesses") added tracing of msr read and write, but because of complexity in having tracepoints in headers, and even more so for a core header like msr.h, not to mention the bloat a tracepoint adds to inline functions, a helper function is needed to be called from the header. Use the new tracepoint_enabled() macro in tracepoint-defs.h to test if the tracepoint is active before calling the helper function, instead of open coding the same logic, which requires knowing the internals of a tracepoint. Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2020-09-28 10:36:02 -04:00
Paolo Bonzini	0c899c25d7	KVM: x86: do not attempt TSC synchronization on guest writes KVM special-cases writes to MSR_IA32_TSC so that all CPUs have the same base for the TSC. This logic is complicated, and we do not want it to have any effect once the VM is started. In particular, if any guest started to synchronize its TSCs with writes to MSR_IA32_TSC rather than MSR_IA32_TSC_ADJUST, the additional effect of kvm_write_tsc code would be uncharted territory. Therefore, this patch makes writes to MSR_IA32_TSC behave essentially the same as writes to MSR_IA32_TSC_ADJUST when they come from the guest. A new selftest (which passes both before and after the patch) checks the current semantics of writes to MSR_IA32_TSC and MSR_IA32_TSC_ADJUST originating from both the host and the guest. Upcoming work to remove the special side effects of host-initiated writes to MSR_IA32_TSC and MSR_IA32_TSC_ADJUST will be able to build onto this test, adjusting the host side to use the new APIs and achieve the same effect. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:59:52 -04:00
Paolo Bonzini	a7d5c7ce41	KVM: nSVM: delay MSR permission processing to first nested VM run Allow userspace to set up the memory map after KVM_SET_NESTED_STATE; to do so, move the call to nested_svm_vmrun_msrpm inside the KVM_REQ_GET_NESTED_STATE_PAGES handler (which is currently not used by nSVM). This is similar to what VMX does already. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:59:30 -04:00
Paolo Bonzini	729c15c20f	KVM: x86: rename KVM_REQ_GET_VMCS12_PAGES We are going to use it for SVM too, so use a more generic name. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:49 -04:00
Alexander Graf	1a155254ff	KVM: x86: Introduce MSR filtering It's not desireable to have all MSRs always handled by KVM kernel space. Some MSRs would be useful to handle in user space to either emulate behavior (like uCode updates) or differentiate whether they are valid based on the CPU model. To allow user space to specify which MSRs it wants to see handled by KVM, this patch introduces a new ioctl to push filter rules with bitmaps into KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access. With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the denied MSR events to user space to operate on. If no filter is populated, MSR handling stays identical to before. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-8-graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:08 -04:00
Alexander Graf	3eb900173c	KVM: x86: VMX: Prevent MSR passthrough when MSR access is denied We will introduce the concept of MSRs that may not be handled in kernel space soon. Some MSRs are directly passed through to the guest, effectively making them handled by KVM from user space's point of view. This patch introduces all logic required to ensure that MSRs that user space wants trapped are not marked as direct access for guests. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-7-graf@amazon.com> [Replace "_idx" with "_slot". - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:07 -04:00
Alexander Graf	fd6fa73d13	KVM: x86: SVM: Prevent MSR passthrough when MSR access is denied We will introduce the concept of MSRs that may not be handled in kernel space soon. Some MSRs are directly passed through to the guest, effectively making them handled by KVM from user space's point of view. This patch introduces all logic required to ensure that MSRs that user space wants trapped are not marked as direct access for guests. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-6-graf@amazon.com> [Make terminology a bit more similar to VMX. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:06 -04:00
Aaron Lewis	476c9bd8e9	KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs Prepare vmx and svm for a subsequent change that ensures the MSR permission bitmap is set to allow an MSR that userspace is tracking to force a vmx_vmexit in the guest. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Oliver Upton <oupton@google.com> [agraf: rebase, adapt SVM scheme to nested changes that came in between] Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-5-graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:05 -04:00
Alexander Graf	51de8151bd	KVM: x86: Add infrastructure for MSR filtering In the following commits we will add pieces of MSR filtering. To ensure that code compiles even with the feature half-merged, let's add a few stubs and struct definitions before the real patches start. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-4-graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:05 -04:00
Alexander Graf	1ae099540e	KVM: x86: Allow deflecting unknown MSR accesses to user space MSRs are weird. Some of them are normal control registers, such as EFER. Some however are registers that really are model specific, not very interesting to virtualization workloads, and not performance critical. Others again are really just windows into package configuration. Out of these MSRs, only the first category is necessary to implement in kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against certain CPU models and MSRs that contain information on the package level are much better suited for user space to process. However, over time we have accumulated a lot of MSRs that are not the first category, but still handled by in-kernel KVM code. This patch adds a generic interface to handle WRMSR and RDMSR from user space. With this, any future MSR that is part of the latter categories can be handled in user space. Furthermore, it allows us to replace the existing "ignore_msrs" logic with something that applies per-VM rather than on the full system. That way you can run productive VMs in parallel to experimental ones where you don't care about proper MSR handling. Signed-off-by: Alexander Graf <graf@amazon.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200925143422.21718-3-graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:04 -04:00
Alexander Graf	90218e434c	KVM: x86: Return -ENOENT on unimplemented MSRs When we find an MSR that we can not handle, bubble up that error code as MSR error return code. Follow up patches will use that to expose the fact that an MSR is not handled by KVM to user space. Suggested-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-2-graf@amazon.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:02 -04:00
Sean Christopherson	802145c56a	KVM: VMX: Rename vmx_uret_msr's "index" to "slot" Rename "index" to "slot" in struct vmx_uret_msr to align with the terminology used by common x86's kvm_user_return_msrs, and to avoid conflating "MSR's ECX index" with "MSR's index into an array". No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-16-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:02 -04:00
Sean Christopherson	14a61b642d	KVM: VMX: Rename "vmx_msr_index" to "vmx_uret_msrs_list" Rename "vmx_msr_index" to "vmx_uret_msrs_list" to associate it with the uret MSRs array, and to avoid conflating "MSR's ECX index" with "MSR's index into an array". Similarly, don't use "slot" in the name as that terminology is claimed by the common x86 "user_return_msrs" mechanism. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-15-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:01 -04:00
Sean Christopherson	7bf662bb5e	KVM: VMX: Rename "vmx_set_guest_msr" to "vmx_set_guest_uret_msr" Add "uret" to vmx_set_guest_msr() to explicitly associate it with the guest_uret_msrs array, and to differentiate it from vmx_set_msr() as well as VMX's load/store MSRs. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-14-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:01 -04:00
Sean Christopherson	d85a8034c0	KVM: VMX: Rename "find_msr_entry" to "vmx_find_uret_msr" Rename "find_msr_entry" to scope it to VMX and to associate it with guest_uret_msrs. Drop the "entry" so that the function name pairs with the existing __vmx_find_uret_msr(), which intentionally uses a double underscore prefix instead of appending "index" or "slot" as those names are already claimed by other pieces of the user return MSR stack. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-13-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:00 -04:00
Sean Christopherson	bd65ba82b3	KVM: VMX: Add vmx_setup_uret_msr() to handle lookup and swap Add vmx_setup_uret_msr() to wrap the lookup and manipulation of the uret MSRs array during setup_msrs(). In addition to consolidating code, this eliminates move_msr_up(), which while being a very literally description of the function, isn't exacly helpful in understanding the net effect of the code. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-12-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:58:00 -04:00
Sean Christopherson	86e3e494fe	KVM: VMX: Move uret MSR lookup into update_transition_efer() Move checking for the existence of MSR_EFER in the uret MSR array into update_transition_efer() so that the lookup and manipulation of the array in setup_msrs() occur back-to-back. This paves the way toward adding a helper to wrap the lookup and manipulation. To avoid unnecessary overhead, defer the lookup until the uret array would actually be modified in update_transition_efer(). EFER obviously exists on CPUs that support the dedicated VMCS fields for switching EFER, and EFER must exist for the guest and host EFER.NX value to diverge, i.e. there is no danger of attempting to read/write EFER when it doesn't exist. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-11-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:59 -04:00
Sean Christopherson	ef1d2ee12e	KVM: VMX: Check guest support for RDTSCP before processing MSR_TSC_AUX Check for RDTSCP support prior to checking if MSR_TSC_AUX is in the uret MSRs array so that the array lookup and manipulation are back-to-back. This paves the way toward adding a helper to wrap the lookup and manipulation. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-10-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:59 -04:00
Sean Christopherson	1e7a483037	KVM: VMX: Rename "__find_msr_index" to "__vmx_find_uret_msr" Rename "__find_msr_index" to scope it to VMX, associate it with guest_uret_msrs, and to avoid conflating "MSR's ECX index" with "MSR's array index". Similarly, don't use "slot" in the name so as to avoid colliding the common x86's half of "user_return_msrs" (the slot in kvm_user_return_msrs is not the same slot in guest_uret_msrs). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-9-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:58 -04:00
Sean Christopherson	658ece84f5	KVM: VMX: Rename vcpu_vmx's "guest_msrs_ready" to "guest_uret_msrs_loaded" Add "uret" to "guest_msrs_ready" to explicitly associate it with the "guest_uret_msrs" array, and replace "ready" with "loaded" to more precisely reflect what it tracks, e.g. "ready" could be interpreted as meaning ready for processing (setup_msrs() has run), which is wrong. "loaded" also aligns with the similar "guest_state_loaded" field. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-8-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:58 -04:00
Sean Christopherson	e9bb1ae92d	KVM: VMX: Rename vcpu_vmx's "save_nmsrs" to "nr_active_uret_msrs" Add "uret" into the name of "save_nmsrs" to explicitly associate it with the guest_uret_msrs array, and replace "save" with "active" (for lack of a better word) to better describe what is being tracked. While "save" is more or less accurate when viewed as a literal description of the field, e.g. it holds the number of MSRs that were saved into the array the last time setup_msrs() was invoked, it can easily be misinterpreted by the reader, e.g. as meaning the number of MSRs that were saved from hardware at some point in the past, or as the number of MSRs that need to be saved at some point in the future, both of which are wrong. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-7-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:57 -04:00
Sean Christopherson	fbc1800738	KVM: VMX: Rename vcpu_vmx's "nmsrs" to "nr_uret_msrs" Rename vcpu_vmx.nsmrs to vcpu_vmx.nr_uret_msrs to explicitly associate it with the guest_uret_msrs array. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:57 -04:00
Sean Christopherson	eb3db1b137	KVM: VMX: Rename the "shared_msr_entry" struct to "vmx_uret_msr" Rename struct "shared_msr_entry" to "vmx_uret_msr" to align with x86's rename of "shared_msrs" to "user_return_msrs", and to call out that the struct is specific to VMX, i.e. not part of the generic "shared_msrs" framework. Abbreviate "user_return" as "uret" to keep line lengths marginally sane and code more or less readable. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:56 -04:00
Sean Christopherson	a128a934f2	KVM: VMX: Rename "vmx_find_msr_index" to "vmx_find_loadstore_msr_slot" Add "loadstore" to vmx_find_msr_index() to differentiate it from the so called shared MSRs helpers (which will soon be renamed), and replace "index" with "slot" to better convey that the helper returns slot in the array, not the MSR index (the value that gets stuffed into ECX). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-09-28 07:57:55 -04:00

... 3 4 5 6 7 ...

36886 commits