1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

16741 commits

Author SHA1 Message Date
Thomas Gleixner
860aaabac8 x86/dumpstack: Do not try to access user space code of other tasks
sysrq-t ends up invoking show_opcodes() for each task which tries to access
the user space code of other processes, which is obviously bogus.

It either manages to dump where the foreign task's regs->ip points to in a
valid mapping of the current task or triggers a pagefault and prints "Code:
Bad RIP value.". Both is just wrong.

Add a safeguard in copy_code() and check whether the @regs pointer matches
currents pt_regs. If not, do not even try to access it.

While at it, add commentary why using copy_from_user_nmi() is safe in
copy_code() even if the function name suggests otherwise.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201117202753.667274723@linutronix.de
2020-11-18 12:56:29 +01:00
Hui Su
09a217c105 x86/dumpstack: Make show_trace_log_lvl() static
show_trace_log_lvl() is not used by other compilation units so make it
static and remove the declaration from the header file.

Signed-off-by: Hui Su <sh_def@163.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201113133943.GA136221@rlk
2020-11-17 19:05:32 +01:00
Jarkko Sakkinen
d2285493be x86/sgx: Add SGX page allocator functions
Add functions for runtime allocation and free.

This allocator and its algorithms are as simple as it gets.  They do a
linear search across all EPC sections and find the first free page.  They
are not NUMA-aware and only hand out individual pages.  The SGX hardware
does not support large pages, so something more complicated like a buddy
allocator is unwarranted.

The free function (sgx_free_epc_page()) implicitly calls ENCLS[EREMOVE],
which returns the page to the uninitialized state.  This ensures that the
page is ready for use at the next allocation.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-10-jarkko@kernel.org
2020-11-17 14:36:13 +01:00
Jarkko Sakkinen
38853a3039 x86/cpu/intel: Add a nosgx kernel parameter
Add a kernel parameter to disable SGX kernel support and document it.

 [ bp: Massage. ]

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Tested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-9-jarkko@kernel.org
2020-11-17 14:36:13 +01:00
Sean Christopherson
224ab3527f x86/cpu/intel: Detect SGX support
Kernel support for SGX is ultimately decided by the state of the launch
control bits in the feature control MSR (MSR_IA32_FEAT_CTL).  If the
hardware supports SGX, but neglects to support flexible launch control, the
kernel will not enable SGX.

Enable SGX at feature control MSR initialization and update the associated
X86_FEATURE flags accordingly.  Disable X86_FEATURE_SGX (and all
derivatives) if the kernel is not able to establish itself as the authority
over SGX Launch Control.

All checks are performed for each logical CPU (not just boot CPU) in order
to verify that MSR_IA32_FEATURE_CONTROL is correctly configured on all
CPUs. All SGX code in this series expects the same configuration from all
CPUs.

This differs from VMX where X86_FEATURE_VMX is intentionally cleared only
for the current CPU so that KVM can provide additional information if KVM
fails to load like which CPU doesn't support VMX.  There’s not much the
kernel or an administrator can do to fix the situation, so SGX neglects to
convey additional details about these kinds of failures if they occur.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-8-jarkko@kernel.org
2020-11-17 14:36:13 +01:00
Sean Christopherson
e7e0545299 x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections
Although carved out of normal DRAM, enclave memory is marked in the
system memory map as reserved and is not managed by the core mm.  There
may be several regions spread across the system.  Each contiguous region
is called an Enclave Page Cache (EPC) section.  EPC sections are
enumerated via CPUID

Enclave pages can only be accessed when they are mapped as part of an
enclave, by a hardware thread running inside the enclave.

Parse CPUID data, create metadata for EPC pages and populate a simple
EPC page allocator.  Although much smaller, ‘struct sgx_epc_page’
metadata is the SGX analog of the core mm ‘struct page’.

Similar to how the core mm’s page->flags encode zone and NUMA
information, embed the EPC section index to the first eight bits of
sgx_epc_page->desc.  This allows a quick reverse lookup from EPC page to
EPC section.  Existing client hardware supports only a single section,
while upcoming server hardware will support at most eight sections.
Thus, eight bits should be enough for long term needs.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Serge Ayoun <serge.ayoun@intel.com>
Signed-off-by: Serge Ayoun <serge.ayoun@intel.com>
Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-6-jarkko@kernel.org
2020-11-17 14:36:13 +01:00
Jarkko Sakkinen
2c273671d0 x86/sgx: Add wrappers for ENCLS functions
ENCLS is the userspace instruction which wraps virtually all
unprivileged SGX functionality for managing enclaves.  It is essentially
the ioctl() of instructions with each function implementing different
SGX-related functionality.

Add macros to wrap the ENCLS functionality. There are two main groups,
one for functions which do not return error codes and a “ret_” set for
those that do.

ENCLS functions are documented in Intel SDM section 36.6.

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-3-jarkko@kernel.org
2020-11-17 14:36:12 +01:00
Jarkko Sakkinen
70d3b8ddcd x86/sgx: Add SGX architectural data structures
Define the SGX architectural data structures used by various SGX
functions. This is not an exhaustive representation of all SGX data
structures but only those needed by the kernel.

The goal is to sequester hardware structures in "sgx/arch.h" and keep
them separate from kernel-internal or uapi structures.

The data structures are described in Intel SDM section 37.6.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-2-jarkko@kernel.org
2020-11-17 14:36:12 +01:00
Chen Yu
1a371e67dc x86/microcode/intel: Check patch signature before saving microcode for early loading
Currently, scan_microcode() leverages microcode_matches() to check
if the microcode matches the CPU by comparing the family and model.
However, the processor stepping and flags of the microcode signature
should also be considered when saving a microcode patch for early
update.

Use find_matching_signature() in scan_microcode() and get rid of the
now-unused microcode_matches() which is a good cleanup in itself.

Complete the verification of the patch being saved for early loading in
save_microcode_patch() directly. This needs to be done there too because
save_mc_for_early() will call save_microcode_patch() too.

The second reason why this needs to be done is because the loader still
tries to support, at least hypothetically, mixed-steppings systems and
thus adds all patches to the cache that belong to the same CPU model
albeit with different steppings.

For example:

  microcode: CPU: sig=0x906ec, pf=0x2, rev=0xd6
  microcode: mc_saved[0]: sig=0x906e9, pf=0x2a, rev=0xd6, total size=0x19400, date = 2020-04-23
  microcode: mc_saved[1]: sig=0x906ea, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27
  microcode: mc_saved[2]: sig=0x906eb, pf=0x2, rev=0xd6, total size=0x19400, date = 2020-04-23
  microcode: mc_saved[3]: sig=0x906ec, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27
  microcode: mc_saved[4]: sig=0x906ed, pf=0x22, rev=0xd6, total size=0x19400, date = 2020-04-23

The patch which is being saved for early loading, however, can only be
the one which fits the CPU this runs on so do the signature verification
before saving.

 [ bp: Do signature verification in save_microcode_patch()
       and rewrite commit message. ]

Fixes: ec400ddeff ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208535
Link: https://lkml.kernel.org/r/20201113015923.13960-1-yu.c.chen@intel.com
2020-11-17 10:33:18 +01:00
Thomas Gleixner
4cffe21d4a Merge branch 'x86/entry' into core/entry
Prepare for the merging of the syscall_work series which conflicts with the
TIF bits overhaul in X86.
2020-11-16 20:51:59 +01:00
Alex Shi
789f52c071 x86/kvm: remove unused macro HV_CLOCK_SIZE
This macro is useless, and could cause gcc warning:
arch/x86/kernel/kvmclock.c:47:0: warning: macro "HV_CLOCK_SIZE" is not
used [-Wunused-macros]
Let's remove it.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <1604651963-10067-1-git-send-email-alex.shi@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-16 13:14:21 -05:00
Borislav Petkov
18741a5251 x86/msr: Do not allow writes to MSR_IA32_ENERGY_PERF_BIAS
Now that all in-kernel-tree users are converted to using the sysfs file,
remove the MSR from the "allowlist".

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20201029190259.3476-5-bp@alien8.de
2020-11-16 17:44:04 +01:00
Tony Luck
098416e698 x86/mce: Use "safe" MSR functions when enabling additional error logging
Booting as a guest under KVM results in error messages about
unchecked MSR access:

  unchecked MSR access error: RDMSR from 0x17f at rIP: 0xffffffff84483f16 (mce_intel_feature_init+0x156/0x270)

because KVM doesn't provide emulation for random model specific
registers.

Switch to using rdmsrl_safe()/wrmsrl_safe() to avoid the message.

Fixes: 68299a42f8 ("x86/mce: Enable additional error logging on certain Intel CPUs")
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201111003954.GA11878@agluck-desk2.amr.corp.intel.com
2020-11-16 17:34:08 +01:00
Linus Torvalds
326fd6db61 A small set of fixes for x86:
- Cure the fallout from the MSI irqdomain overhaul which missed that the
    Intel IOMMU does not register virtual function devices and therefore
    never reaches the point where the MSI interrupt domain is assigned. This
    makes the VF devices use the non-remapped MSI domain which is trapped by
    the IOMMU/remap unit.
 
  - Remove an extra space in the SGI_UV architecture type procfs output for
    UV5.
 
  - Remove a unused function which was missed when removing the UV BAU TLB
    shootdown handler.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+xJi0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVWxD/9Tq4W6Kniln7mtoEWHRvHRceiiGcS3
 MocvqurhoJwirH4F2gkvCegTBy0r3FdUORy3OMmChVs6nb8XpPpso84SANCRePWp
 JZezpVwLSNC4O1/ZCg1Kjj4eUpzLB/UjUUQV9RsjL5wyQEhfCZgb1D40yLM/2dj5
 SkVm/EAqWuQNtYe/jqAOwTX/7mV+k2QEmKCNOigM13R9EWgu6a4J8ta1gtNSbwvN
 jWMW+M1KjZ76pfRK+y4OpbuFixteSzhSWYPITSGwQz4IpQ+Ty2Rv0zzjidmDnAR+
 Q73cup0dretdVnVDRpMwDc06dBCmt/rbN50w4yGU0YFRFDgjGc8sIbQzuIP81nEQ
 XY4l4rcBgyVufFsLrRpQxu1iYPFrcgU38W1kRkkJ3Kl/rY1a2ZU7sLE4kt4Oh55W
 A9KCmsfqP1PCYppjAQ0QT4NOp4YtecPvAU4UcBOb722DDBd8TfhLWWGw2yG57Q/d
 Wnu8xCJGy7BaLHLGGGseAft+D4aNnCjKC3jgMyvNtRDXaV2cK2Kdd6ehMlWVUapD
 xfLlKXE+igXMyoWJIWjTXQJs4dpKu6QpJCPiorwEZ8rmNaRfxsWEJVbeYwEkmUke
 bMoBBSCbZT86WVOYhI8WtrIemraY0mMYrrcE03M96HU3eYB8BV92KrIzZWThupcQ
 ZqkZbqCZm3vfHA==
 =X/P+
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A small set of fixes for x86:

   - Cure the fallout from the MSI irqdomain overhaul which missed that
     the Intel IOMMU does not register virtual function devices and
     therefore never reaches the point where the MSI interrupt domain is
     assigned. This made the VF devices use the non-remapped MSI domain
     which is trapped by the IOMMU/remap unit

   - Remove an extra space in the SGI_UV architecture type procfs output
     for UV5

   - Remove a unused function which was missed when removing the UV BAU
     TLB shootdown handler"

* tag 'x86-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  iommu/vt-d: Cure VF irqdomain hickup
  x86/platform/uv: Fix copied UV5 output archtype
  x86/platform/uv: Drop last traces of uv_flush_tlb_others
2020-11-15 09:49:56 -08:00
Linus Torvalds
64b609d6a6 A set of fixes for perf:
- A set of commits which reduce the stack usage of various perf event
    handling functions which allocated large data structs on stack causing
    stack overflows in the worst case.
 
  - Use the proper mechanism for detecting soft interrupts in the recursion
    protection.
 
  - Make the resursion protection simpler and more robust.
 
  - Simplify the scheduling of event groups to make the code more robust and
    prepare for fixing the issues vs. scheduling of exclusive event groups.
 
  - Prevent event multiplexing and rotation for exclusive event groups
 
  - Correct the perf event attribute exclusive semantics to take pinned
    events, e.g. the PMU watchdog, into account
 
  - Make the anythread filtering conditional for Intel's generic PMU
    counters as it is not longer guaranteed to be supported on newer
    CPUs. Check the corresponding CPUID leaf to make sure.
 
  - Fixup a duplicate initialization in an array which was probably cause by
    the usual copy & paste - forgot to edit mishap.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+xIi0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofixD/4+4gc8DhOmAkMrN0Z9tiW8ebgMKmb9
 wZRkMr5Osi0GzLJOPZ6SdY6jd0A3rMN/sW6P1DT6pDtcty4bKFoW5VZBuUDIAhel
 BC4C93L3y1En/GEZu1GTy3LvsBwLBQTOoY4goDjbdAbk60S/0RTHOGyQsRsOQFe6
 fVs3iXozAFuaR6I6N3dlxuJAE51zvr8MyBWaUoByNDB//1+lLNW+JfClaAOG1oXx
 qZIg/niatBVGzSGgKNRUyh3g8G1HJtabsA/NZ4PH8ZHuYABfmj4lmmUPR77ICLfV
 wMITEBG7eaktB8EqM9hvaoOZLA5kpXHO2JbCFSs4c4x11mlC8g7QMV3poCw33YoN
 a5TmT1A3muri1riy1/Ee9lXACOq7/tf2+Xfn9o6dvDdBwd6s5pzlhLGR8gILp2lF
 2bcg3IwYvHT/Kiurb/WGNpbCqQIPJpcUcfs3tNBCCtKegahUQNnGjxN3NVo9RCit
 zfL6xIJ8eZiYnsxXx4NKm744AukWiql3aRNgRkOdBP5WC68xt6VLcxG1YZKUoDhy
 jRSOCD/DuPSMSvAAgN7S8OWlPsKWBxVxxWYV+K8FpwhgzbQ3WbS3UDiYkhgjeOxu
 OlM692oWpllKvQWlvYthr2Be6oPCRRi1vvADNNbTKzgHk5i61bwympsGl1EZx3Pz
 2ROp7NJFRESnqw==
 =FzCf
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A set of fixes for perf:

    - A set of commits which reduce the stack usage of various perf
      event handling functions which allocated large data structs on
      stack causing stack overflows in the worst case

    - Use the proper mechanism for detecting soft interrupts in the
      recursion protection

    - Make the resursion protection simpler and more robust

    - Simplify the scheduling of event groups to make the code more
      robust and prepare for fixing the issues vs. scheduling of
      exclusive event groups

    - Prevent event multiplexing and rotation for exclusive event groups

    - Correct the perf event attribute exclusive semantics to take
      pinned events, e.g. the PMU watchdog, into account

    - Make the anythread filtering conditional for Intel's generic PMU
      counters as it is not longer guaranteed to be supported on newer
      CPUs. Check the corresponding CPUID leaf to make sure

    - Fixup a duplicate initialization in an array which was probably
      caused by the usual 'copy & paste - forgot to edit' mishap"

* tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix Add BW copypasta
  perf/x86/intel: Make anythread filter support conditional
  perf: Tweak perf_event_attr::exclusive semantics
  perf: Fix event multiplexing for exclusive groups
  perf: Simplify group_sched_in()
  perf: Simplify group_sched_out()
  perf/x86: Make dummy_iregs static
  perf/arch: Remove perf_sample_data::regs_user_copy
  perf: Optimize get_recursion_context()
  perf: Fix get_recursion_context()
  perf/x86: Reduce stack usage for x86_pmu::drain_pebs()
  perf: Reduce stack usage of perf_output_begin()
2020-11-15 09:46:36 -08:00
Steven Rostedt (VMware)
2860cd8a23 livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS is available, the ftrace call
will be able to set the ip of the calling function. This will improve the
performance of live kernel patching where it does not need all the regs to
be stored just to change the instruction pointer.

If all archs that support live kernel patching also support
HAVE_DYNAMIC_FTRACE_WITH_ARGS, then the architecture specific function
klp_arch_set_pc() could be made generic.

It is possible that an arch can support HAVE_DYNAMIC_FTRACE_WITH_ARGS but
not HAVE_DYNAMIC_FTRACE_WITH_REGS and then have access to live patching.

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: live-patching@vger.kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-13 12:15:28 -05:00
Steven Rostedt (VMware)
02a474ca26 ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
Currently, the only way to get access to the registers of a function via a
ftrace callback is to set the "FL_SAVE_REGS" bit in the ftrace_ops. But as this
saves all regs as if a breakpoint were to trigger (for use with kprobes), it
is expensive.

The regs are already saved on the stack for the default ftrace callbacks, as
that is required otherwise a function being traced will get the wrong
arguments and possibly crash. And on x86, the arguments are already stored
where they would be on a pt_regs structure to use that code for both the
regs version of a callback, it makes sense to pass that information always
to all functions.

If an architecture does this (as x86_64 now does), it is to set
HAVE_DYNAMIC_FTRACE_WITH_ARGS, and this will let the generic code that it
could have access to arguments without having to set the flags.

This also includes having the stack pointer being saved, which could be used
for accessing arguments on the stack, as well as having the function graph
tracer not require its own trampoline!

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-13 12:14:59 -05:00
Steven Rostedt (VMware)
d19ad0775d ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
In preparation to have arguments of a function passed to callbacks attached
to functions as default, change the default callback prototype to receive a
struct ftrace_regs as the forth parameter instead of a pt_regs.

For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they
will now need to get the pt_regs via a ftrace_get_regs() helper call. If
this is called by a callback that their ftrace_ops did not have a
FL_SAVE_REGS flag set, it that helper function will return NULL.

This will allow the ftrace_regs to hold enough just to get the parameters
and stack pointer, but without the worry that callbacks may have a pt_regs
that is not completely filled.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-13 12:14:55 -05:00
Mike Travis
77c7e1bc06 x86/platform/uv: Fix copied UV5 output archtype
A test shows that the output contains a space:
    # cat /proc/sgi_uv/archtype
    NSGI4 U/UVX

Remove that embedded space by copying the "trimmed" buffer instead of the
untrimmed input character list.  Use sizeof to remove size dependency on
copy out length.  Increase output buffer size by one character just in case
BIOS sends an 8 character string for archtype.

Fixes: 1e61f5a95f ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20201111010418.82133-1-mike.travis@hpe.com
2020-11-13 00:00:31 +01:00
Thomas Gleixner
aec8da04e4 x86/ioapic: Correct the PCI/ISA trigger type selection
PCI's default trigger type is level and ISA's is edge. The recent
refactoring made it the other way round, which went unnoticed as it seems
only to cause havoc on some AMD systems.

Make the comment and code do the right thing again.

Fixes: a27dca645d ("x86/io_apic: Cleanup trigger/polarity helpers")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/87d00lgu13.fsf@nanos.tec.linutronix.de
2020-11-10 18:43:22 +01:00
Peter Zijlstra
76a4efa809 perf/arch: Remove perf_sample_data::regs_user_copy
struct perf_sample_data lives on-stack, we should be careful about it's
size. Furthermore, the pt_regs copy in there is only because x86_64 is a
trainwreck, solve it differently.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
2020-11-09 18:12:34 +01:00
Linus Torvalds
40be821d62 A set of x86 fixes:
- Use SYM_FUNC_START_WEAK in the mem* ASM functions instead of a
    combination of .weak and SYM_FUNC_START_LOCAL which makes LLVMs
    integrated assembler upset.
 
  - Correct the mitigation selection logic which prevented the related prctl
    to work correctly.
 
  - Make the UV5 hubless system work correctly by fixing up the malformed
    table entries and adding the missing ones.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+oDNYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaN0EACPWY15k1YuAEIjiQxRBhq22J8Y6wNX
 Ui/rF2AZcAnNEJDTIyvjP6COnT9mjX/tuuluMaI6i/XY/9Xp5LpKvivkL2PXNN3X
 onW01ouIc1iYxXwQEVZvhYHsOyhkR9Z8yNG/q9I7xYAXNSZcAHwXVar4VlPBT7Ay
 iP75i8pGmb/NCc4oHNXuBp/dV/0/dCoLTndb5p5pX8oS60AAt9ZuK3IRc3ucayhI
 M4rTTEya1oY+ZNbtP4A4Jp7Qc/NGYDo6q04za+jcxZ5Gqacs+fk/PNuWgL1fZZtW
 sn1D+SMWEb55Xcsdy976b29FFU/DcOcf7TRASzyKgyPW5jg1dP6BZ6U0wpVV3KZw
 S2h5/pt48JZI7olrDsLQ0tzjALlk2CcFNrnRtOMDduHdw9wyz+Sg58lZYuvH3sXK
 5ZblWRJ3JiBNsNO0sA3kd4sp7xWQB3ey6mkYD8Vqb7zRIt8aXT9jqBxhDrP+Vqs/
 /UKv+BJfD6WxC0nQ4x6MS3g4sDvI+1SLfHSZ/UjWJ6NfYJW5/w429pFCaF73xCTd
 cqxja1dZYixn7ioFZjolMUdvuDiC5B2+5+RzEV87kaDzO9QZQyvsl7G74MSfwx6G
 DAydvuyJoxP2qVASobOBcVOzLQO7DsLzFZzJTttZcnkK2iprcz4qrsFLMxF9SxTD
 Amb8qck60dLfqA==
 =JdPk
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of x86 fixes:

   - Use SYM_FUNC_START_WEAK in the mem* ASM functions instead of a
     combination of .weak and SYM_FUNC_START_LOCAL which makes LLVMs
     integrated assembler upset

   - Correct the mitigation selection logic which prevented the related
     prctl to work correctly

   - Make the UV5 hubless system work correctly by fixing up the
     malformed table entries and adding the missing ones"

* tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/uv: Recognize UV5 hubless system identifier
  x86/platform/uv: Remove spaces from OEM IDs
  x86/platform/uv: Fix missing OEM_TABLE_ID
  x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
  x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S
2020-11-08 10:09:36 -08:00
Mike Travis
801284f973 x86/platform/uv: Recognize UV5 hubless system identifier
Testing shows a problem in that UV5 hubless systems were not being
recognized.  Add them to the list of OEM IDs checked.

Fixes: 6c7794423a ("Add UV5 direct references")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201105222741.157029-4-mike.travis@hpe.com
2020-11-07 11:17:39 +01:00
Mike Travis
1aee505e01 x86/platform/uv: Remove spaces from OEM IDs
Testing shows that trailing spaces caused problems with the OEM_ID and
the OEM_TABLE_ID.  One being that the OEM_ID would not string compare
correctly.  Another the OEM_ID and OEM_TABLE_ID would be concatenated
in the printout.  Remove any trailing spaces.

Fixes: 1e61f5a95f ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201105222741.157029-3-mike.travis@hpe.com
2020-11-07 11:17:39 +01:00
Mike Travis
1aec69ae56 x86/platform/uv: Fix missing OEM_TABLE_ID
Testing shows a problem in that the OEM_TABLE_ID was missing for
hubless systems.  This is used to determine the APIC type (legacy or
extended).  Add the OEM_TABLE_ID to the early hubless processing.

Fixes: 1e61f5a95f ("Add and decode Arch Type in UVsystab")
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201105222741.157029-2-mike.travis@hpe.com
2020-11-07 11:17:39 +01:00
Paul E. McKenney
3fcd6a230f x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUs
Currently, accessing /proc/cpuinfo sends IPIs to idle CPUs in order to
learn their clock frequency.  Which is a bit strange, given that waking
them from idle likely significantly changes their clock frequency.
This commit therefore avoids sending /proc/cpuinfo-induced IPIs to
idle CPUs.

[ paulmck: Also check for idle in arch_freq_prepare_all(). ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
2020-11-06 16:59:11 -08:00
Paul E. McKenney
f4deaf9021 x86/cpu: Avoid cpuinfo-induced IPI pileups
The aperfmperf_snapshot_cpu() function is invoked upon access to
/proc/cpuinfo, and it does do an early exit if the specified CPU has
recently done a snapshot.  Unfortunately, the indication that a snapshot
has been completed is set in an IPI handler, and the execution of this
handler can be delayed by any number of unfortunate events.  This means
that a system that starts a number of applications, each of which
parses /proc/cpuinfo, can suffer from an smp_call_function_single()
storm, especially given that each access to /proc/cpuinfo invokes
smp_call_function_single() for all CPUs.  Please note that this is not
theoretical speculation.  Note also that one CPU's pending IPI serves
all requests, so there is no point in ever having more than one IPI
pending to a given CPU.

This commit therefore suppresses duplicate IPIs to a given CPU via a
new ->scfpending field in the aperfmperf_sample structure.  This field
is set to the value one if an IPI is pending to the corresponding CPU
and to zero otherwise.

The aperfmperf_snapshot_cpu() function uses atomic_xchg() to set this
field to the value one and sample the old value.  If this function's
"wait" parameter is zero, smp_call_function_single() is called only if
the old value of the ->scfpending field was zero.  The IPI handler uses
atomic_set_release() to set this new field to zero just before returning,
so that the prior stores into the aperfmperf_sample structure are seen
by future requests that get to the atomic_xchg().  Future requests that
pass the elapsed-time check are ordered by the fact that on x86 loads act
as acquire loads, just as was the case prior to this change.  The return
value is based off of the age of the prior snapshot, just as before.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
[ paulmck: Allow /proc/cpuinfo to take advantage of arch_freq_get_on_cpu(). ]
[ paulmck: Add comment on memory barrier. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
2020-11-06 16:58:40 -08:00
Zhen Lei
15af36596a x86/mce: Correct the detection of invalid notifier priorities
Commit

  c9c6d216ed ("x86/mce: Rename "first" function as "early"")

changed the enumeration of MCE notifier priorities. Correct the check
for notifier priorities to cover the new range.

 [ bp: Rewrite commit message, remove superfluous brackets in
   conditional. ]

Fixes: c9c6d216ed ("x86/mce: Rename "first" function as "early"")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201106141216.2062-2-thunder.leizhen@huawei.com
2020-11-06 19:02:48 +01:00
Steven Rostedt (VMware)
773c167050 ftrace: Add recording of functions that caused recursion
This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
"recursed_functions" all the functions that caused recursion while a
callback to the function tracer was running.

Link: https://lkml.kernel.org/r/20201106023548.102375687@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Guo Ren <guoren@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-csky@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: live-patching@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06 08:42:26 -05:00
Steven Rostedt (VMware)
c536aa1c5b kprobes/ftrace: Add recursion protection to the ftrace callback
If a ftrace callback does not supply its own recursion protection and
does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will
make a helper trampoline to do so before calling the callback instead of
just calling the callback directly.

The default for ftrace_ops is going to change. It will expect that handlers
provide their own recursion protection, unless its ftrace_ops states
otherwise.

Link: https://lkml.kernel.org/r/20201028115613.140212174@goodmis.org
Link: https://lkml.kernel.org/r/20201106023546.944907560@goodmis.org

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh  Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-csky@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06 08:35:44 -05:00
Kaixu Xia
77080929d5 x86/mce: Assign boolean values to a bool variable
Fix the following coccinelle warnings:

  ./arch/x86/kernel/cpu/mce/core.c:1765:3-20: WARNING: Assignment of 0/1 to bool variable
  ./arch/x86/kernel/cpu/mce/core.c:1584:2-9: WARNING: Assignment of 0/1 to bool variable

 [ bp: Massage commit message. ]

Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1604654363-1463-1-git-send-email-kaixuxia@tencent.com
2020-11-06 11:51:04 +01:00
Chester Lin
25519d6834 ima: generalize x86/EFI arch glue for other EFI architectures
Move the x86 IMA arch code into security/integrity/ima/ima_efi.c,
so that we will be able to wire it up for arm64 in a future patch.

Co-developed-by: Chester Lin <clin@suse.com>
Signed-off-by: Chester Lin <clin@suse.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-11-06 07:40:42 +01:00
Anand K Mistry
1978b3a53a x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
STIBP is set to on and

  spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED

At the same time, IBPB can be set to conditional.

However, this leads to the case where it's impossible to turn on IBPB
for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the

  spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED

condition leads to a return before the task flag is set. Similarly,
ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to
conditional.

More generally, the following cases are possible:

1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
2. STIBP = on && IBPB = conditional for AMD CPUs with
   X86_FEATURE_AMD_STIBP_ALWAYS_ON

The first case functions correctly today, but only because
spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.

At a high level, this change does one thing. If either STIBP or IBPB
is set to conditional, allow the prctl to change the task flag.
Also, reflect that capability when querying the state. This isn't
perfect since it doesn't take into account if only STIBP or IBPB is
unconditionally on. But it allows the conditional feature to work as
expected, without affecting the unconditional one.

 [ bp: Massage commit message and comment; space out statements for
   better readability. ]

Fixes: 21998a3515 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Signed-off-by: Anand K Mistry <amistry@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
2020-11-05 21:43:34 +01:00
Thomas Gleixner
b6be002bcd x86/entry: Move nmi entry/exit into common code
Lockdep state handling on NMI enter and exit is nothing specific to X86. It's
not any different on other architectures. Also the extra state type is not
necessary, irqentry_state_t can carry the necessary information as well.

Move it to common code and extend irqentry_state_t to carry lockdep state.

[ Ira: Make exit_rcu and lockdep a union as they are mutually exclusive
  between the IRQ and NMI exceptions, and add kernel documentation for
  struct irqentry_state_t ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201102205320.1458656-7-ira.weiny@intel.com
2020-11-04 22:55:36 +01:00
Thomas Gleixner
01be83eea0 Merge branch 'core/urgent' into core/entry
Pick up the entry fix before further modifications.
2020-11-04 18:14:52 +01:00
David Woodhouse
f36a74b934 x86/ioapic: Use I/O-APIC ID for finding irqdomain, not index
In commit b643128b91 ("x86/ioapic: Use irq_find_matching_fwspec() to
find remapping irqdomain") the I/O-APIC code was changed to find its
parent irqdomain using irq_find_matching_fwspec(), but the key used
for the lookup was wrong. It shouldn't use 'ioapic' which is the index
into its own ioapics[] array. It should use the actual arbitration
ID of the I/O-APIC in question, which is mpc_ioapic_id(ioapic).

Fixes: b643128b91 ("x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain")
Reported-by: lkp <oliver.sang@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/57adf2c305cd0c5e9d860b2f3007a7e676fd0f9f.camel@infradead.org
2020-11-04 11:11:35 +01:00
Dexuan Cui
d981059e13 x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it
When a Linux VM runs on Hyper-V, if the VM has CPUs with >255 APIC IDs,
the CPUs can't be the destination of IOAPIC interrupts, because the
IOAPIC RTE's Dest Field has only 8 bits. Currently the hackery driver
drivers/iommu/hyperv-iommu.c is used to ensure IOAPIC interrupts are
only routed to CPUs that don't have >255 APIC IDs. However, there is
an issue with kdump, because the kdump kernel can run on any CPU, and
hence IOAPIC interrupts can't work if the kdump kernel run on a CPU
with a >255 APIC ID.

The kdump issue can be fixed by the Extended Dest ID, which is introduced
recently by David Woodhouse (for IOAPIC, see the field virt_destid_8_14 in
struct IO_APIC_route_entry). Of course, the Extended Dest ID needs the
support of the underlying hypervisor. The latest Hyper-V has added the
support recently: with this commit, on such a Hyper-V host, Linux VM
does not use hyperv-iommu.c because hyperv_prepare_irq_remapping()
returns -ENODEV; instead, Linux kernel's generic support of Extended Dest
ID from David is used, meaning that Linux VM is able to support up to
32K CPUs, and IOAPIC interrupts can be routed to all the CPUs.

On an old Hyper-V host that doesn't support the Extended Dest ID, nothing
changes with this commit: Linux VM is still able to bring up the CPUs with
> 255 APIC IDs with the help of hyperv-iommu.c, but IOAPIC interrupts still
can not go to such CPUs, and the kdump kernel still can not work properly
on such CPUs.

[ tglx: Updated comment as suggested by David ]

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20201103011136.59108-1-decui@microsoft.com
2020-11-04 11:10:52 +01:00
Linus Torvalds
43c834186c A couple of changes to the SEV-ES code to perform more stringent
hypervisor checks before enabling encryption. (Joerg Roedel)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+hKN8ACgkQEsHwGGHe
 VUrkZQ/+LWjbDrbkLCQpWuzLagAocZMKKvr4+2ujU+krj0QU5FFJbfuzhkktQD+H
 cbfOW7+E8lqTDoj/dwoJPj2Xs8HvW4Ua6sbxF5lCPhlEr3NIetRfQ7SPj3qFvQG+
 FKP/55RSnjKIx7aZXKN9YAw2FF3EC1BisjszCBKid5S8HbGqjLMb2Ue0i/nssksY
 CvLwaxtDOGuSzJ8FwL+vmI70NkeLZ0ulTxbuxXAqfMTvJX3e1QA9dgeZMgfU1hng
 eA1Pjlm0X7FOsnwihYP2EZ6NzRrTkYeGl1Iagz1apqlDlQ+bcaxvs2btIyb7MKt5
 6PPDGg0P0WVMNfOEUYTZob31QcLnakA/p8kG8sYE6h2PlqO9Tf5cpmOJ6pv+DYFz
 hfcjAZfamStUbWdWpr33RVCXN5pwZRu+UytD3JYykzgwmKXQxLHqrbjHXLO3zJ7k
 +L0JE+N2vmi/7M5Ghsv3yKwy5fR5rMT5V6qEHSd1qrr9VpKBceNMJgPA8wh4882F
 SD5sD2b6L/Cf9L4FAFqICHb/p4rxPRf5VnUoybo70U7EiwfbZQik5g3X5cO4KO2N
 0z8nMk7dIZncQF0LYJNElIvKonrU8sIa+TbHjYyBWdQlOPgK4IlCvZeyjVUvUG24
 kYx2WbANhCxGFd86rsl5P7xNXvBiSALf1afbQPvU0VTbZ43vSnQ=
 =Pvgr
 -----END PGP SIGNATURE-----

Merge tag 'x86_seves_for_v5.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV-ES fixes from Borislav Petkov:
 "A couple of changes to the SEV-ES code to perform more stringent
  hypervisor checks before enabling encryption (Joerg Roedel)"

* tag 'x86_seves_for_v5.10_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev-es: Do not support MMIO to/from encrypted memory
  x86/head/64: Check SEV encryption before switching to kernel page-table
  x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path
  x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler
  x86/boot/compressed/64: Introduce sev_status
2020-11-03 09:55:09 -08:00
Mauro Carvalho Chehab
4a2d2ed9ba x86/mtrr: Fix a kernel-doc markup
Kernel-doc markup should use this format:
	identifier - description

Fix it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/2217cd4ae9e561da2825485eb97de77c65741489.1603469755.git.mchehab+huawei@kernel.org
2020-11-02 19:58:53 +01:00
Tony Luck
68299a42f8 x86/mce: Enable additional error logging on certain Intel CPUs
The Xeon versions of Sandy Bridge, Ivy Bridge and Haswell support an
optional additional error logging mode which is enabled by an MSR.

Previously, this mode was enabled from the mcelog(8) tool via /dev/cpu,
but userspace should not be poking at MSRs. So move the enabling into
the kernel.

 [ bp: Correct the explanation why this is done. ]

Suggested-by: Boris Petkov <bp@alien8.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201030190807.GA13884@agluck-desk2.amr.corp.intel.com
2020-11-02 11:15:59 +01:00
Arvind Sankar
ea3186b957 x86/build: Fix vmlinux size check on 64-bit
Commit

  b4e0409a36 ("x86: check vmlinux limits, 64-bit")

added a check that the size of the 64-bit kernel is less than
KERNEL_IMAGE_SIZE.

The check uses (_end - _text), but this is not enough. The initial
PMD used in startup_64() (level2_kernel_pgt) can only map upto
KERNEL_IMAGE_SIZE from __START_KERNEL_map, not from _text, and the
modules area (MODULES_VADDR) starts at KERNEL_IMAGE_SIZE.

The correct check is what is currently done for 32-bit, since
LOAD_OFFSET is defined appropriately for the two architectures. Just
check (_end - LOAD_OFFSET) against KERNEL_IMAGE_SIZE unconditionally.

Note that on 32-bit, the limit is not strict: KERNEL_IMAGE_SIZE is not
really used by the main kernel. The higher the kernel is located, the
less the space available for the vmalloc area. However, it is used by
KASLR in the compressed stub to limit the maximum address of the kernel
to a safe value.

Clean up various comments to clarify that despite the name,
KERNEL_IMAGE_SIZE is not a limit on the size of the kernel image, but a
limit on the maximum virtual address that the image can occupy.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201029161903.2553528-1-nivedita@alum.mit.edu
2020-10-29 21:54:35 +01:00
Joerg Roedel
2411cd8211 x86/sev-es: Do not support MMIO to/from encrypted memory
MMIO memory is usually not mapped encrypted, so there is no reason to
support emulated MMIO when it is mapped encrypted.

Prevent a possible hypervisor attack where a RAM page is mapped as
an MMIO page in the nested page-table, so that any guest access to it
will trigger a #VC exception and leak the data on that page to the
hypervisor via the GHCB (like with valid MMIO). On the read side this
attack would allow the HV to inject data into the guest.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201028164659.27002-6-joro@8bytes.org
2020-10-29 19:27:42 +01:00
Joerg Roedel
c9f09539e1 x86/head/64: Check SEV encryption before switching to kernel page-table
When SEV is enabled, the kernel requests the C-bit position again from
the hypervisor to build its own page-table. Since the hypervisor is an
untrusted source, the C-bit position needs to be verified before the
kernel page-table is used.

Call sev_verify_cbit() before writing the CR3.

 [ bp: Massage. ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201028164659.27002-5-joro@8bytes.org
2020-10-29 18:09:59 +01:00
Joerg Roedel
86ce43f7dd x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path
Check whether the hypervisor reported the correct C-bit when running as
an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.

The check function is in a separate file:

  arch/x86/kernel/sev_verify_cbit.S

so that it can be re-used in the running kernel image.

 [ bp: Massage. ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201028164659.27002-4-joro@8bytes.org
2020-10-29 18:06:52 +01:00
Joerg Roedel
ed7b895f3e x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler
The early #VC handler which doesn't have a GHCB can only handle CPUID
exit codes. It is needed by the early boot code to handle #VC exceptions
raised in verify_cpu() and to get the position of the C-bit.

But the CPUID information comes from the hypervisor which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C-bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.

Add sanity checks to the early #VC handler to make sure the hypervisor
can not pretend that SEV is disabled.

 [ bp: Massage a bit. ]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201028164659.27002-3-joro@8bytes.org
2020-10-29 13:48:49 +01:00
Jens Axboe
12db8b6900 entry: Add support for TIF_NOTIFY_SIGNAL
Add TIF_NOTIFY_SIGNAL handling in the generic entry code, which if set,
will return true if signal_pending() is used in a wait loop. That causes an
exit of the loop so that notify_signal tracehooks can be run. If the wait
loop is currently inside a system call, the system call is restarted once
task_work has been processed.

In preparation for only having arch_do_signal() handle syscall restarts if
_TIF_SIGPENDING isn't set, rename it to arch_do_signal_or_restart().  Pass
in a boolean that tells the architecture specific signal handler if it
should attempt to get a signal, or just process a potential syscall
restart.

For !CONFIG_GENERIC_ENTRY archs, add the TIF_NOTIFY_SIGNAL handling to
get_signal(). This is done to minimize the needed architecture changes to
support this feature.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20201026203230.386348-3-axboe@kernel.dk
2020-10-29 09:37:36 +01:00
David Woodhouse
2e008ffe42 x86/kvm: Enable 15-bit extension when KVM_FEATURE_MSI_EXT_DEST_ID detected
This allows the host to indicate that MSI emulation supports 15-bit
destination IDs, allowing up to 32768 CPUs without interrupt remapping.

cf. https://patchwork.kernel.org/patch/11816693/ for qemu

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20201024213535.443185-36-dwmw2@infradead.org
2020-10-28 20:26:33 +01:00
David Woodhouse
ab0f59c6f1 x86/apic: Support 15 bits of APIC ID in MSI where available
Some hypervisors can allow the guest to use the Extended Destination ID
field in the MSI address to address up to 32768 CPUs.

This applies to all downstream devices which generate MSI cycles,
including HPET, I/O-APIC and PCI MSI.

HPET and PCI MSI use the same __irq_msi_compose_msg() function, while
I/O-APIC generates its own and had support for the extended bits added in
a previous commit.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-33-dwmw2@infradead.org
2020-10-28 20:26:29 +01:00
David Woodhouse
51130d2188 x86/ioapic: Handle Extended Destination ID field in RTE
Bits 63-48 of the I/OAPIC Redirection Table Entry map directly to bits 19-4
of the address used in the resulting MSI cycle.

Historically, the x86 MSI format only used the top 8 of those 16 bits as
the destination APIC ID, and the "Extended Destination ID" in the lower 8
bits was unused.

With interrupt remapping, the lowest bit of the Extended Destination ID
(bit 48 of RTE, bit 4 of MSI address) is now used to indicate a remappable
format MSI.

A hypervisor can use the other 7 bits of the Extended Destination ID to
permit guests to address up to 15 bits of APIC IDs, thus allowing 32768
vCPUs before having to expose a vIOMMU and interrupt remapping to the
guest.

No behavioural change in this patch, since nothing yet permits APIC IDs
above 255 to be used with the non-IR I/OAPIC domain.

[ tglx: Converted it to the cleaned up entry/msi_msg format and added
  	commentry ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-32-dwmw2@infradead.org
2020-10-28 20:26:28 +01:00
David Woodhouse
b643128b91 x86/ioapic: Use irq_find_matching_fwspec() to find remapping irqdomain
All possible parent domains have a select method now. Make use of it.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201024213535.443185-29-dwmw2@infradead.org
2020-10-28 20:26:28 +01:00