1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

13511 commits

Author SHA1 Message Date
Linus Torvalds
7a69f9c60b Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Continued work to add support for 5-level paging provided by future
     Intel CPUs. In particular we switch the x86 GUP code to the generic
     implementation. (Kirill A. Shutemov)

   - Continued work to add PCID CPU support to native kernels as well.
     In this round most of the focus is on reworking/refreshing the TLB
     flush infrastructure for the upcoming PCID changes. (Andy
     Lutomirski)"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  x86/mm: Delete a big outdated comment about TLB flushing
  x86/mm: Don't reenter flush_tlb_func_common()
  x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging
  x86/ftrace: Exclude functions in head64.c from function-tracing
  x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap
  x86/mm: Remove reset_lazy_tlbstate()
  x86/ldt: Simplify the LDT switching logic
  x86/boot/64: Put __startup_64() into .head.text
  x86/mm: Add support for 5-level paging for KASLR
  x86/mm: Make kernel_physical_mapping_init() support 5-level paging
  x86/mm: Add sync_global_pgds() for configuration with 5-level paging
  x86/boot/64: Add support of additional page table level during early boot
  x86/boot/64: Rename init_level4_pgt and early_level4_pgt
  x86/boot/64: Rewrite startup_64() in C
  x86/boot/compressed: Enable 5-level paging during decompression stage
  x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations
  x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations
  x86/boot/efi: Cleanup initialization of GDT entries
  x86/asm: Fix comment in return_from_SYSCALL_64()
  x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
  ...
2017-07-03 14:45:09 -07:00
Linus Torvalds
9bc088ab66 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Ingo Molnar:
 "The main changes are a fix early microcode application for
  resume-from-RAM, plus a 32-bit initrd placement fix - by Borislav
  Petkov"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Make a couple of symbols static
  x86/microcode/intel: Save pointer to ucode patch for early AP loading
  x86/microcode: Look for the initrd at the correct address on 32-bit
2017-07-03 14:35:18 -07:00
Linus Torvalds
e1449007e8 Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 hyperv updates from Ingo Molnar:
 "Avoid boot time TSC calibration on Hyper-V hosts, to improve
  calibration robustness. (Vitaly Kuznetsov)"

* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hyperv: Read TSC frequency from a synthetic MSR
  x86/hyperv: Check frequency MSRs presence according to the specification
2017-07-03 14:09:32 -07:00
Linus Torvalds
e6529f6f58 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug update from Ingo Molnar:
 "A single fix for an off-by one bug in test_nmi_ipi() that probably
  doesn't matter in practice"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Fix timeout test in test_nmi_ipi()
2017-07-03 14:07:45 -07:00
Linus Torvalds
25e09ca524 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The main changes in this cycle were KASLR improvements for rare
  environments with special boot options, by Baoquan He. Also misc
  smaller changes/cleanups"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/debug: Extend the lower bound of crash kernel low reservations
  x86/boot: Remove unused copy_*_gs() functions
  x86/KASLR: Use the right memcpy() implementation
  Documentation/kernel-parameters.txt: Update 'memmap=' boot option description
  x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' boot options
  x86/KASLR: Parse all 'memmap=' boot option entries
2017-07-03 13:40:38 -07:00
Linus Torvalds
2a275382a4 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Ingo Molnar:
 "Janitorial changes: removal of an unused function plus __init
  annotations"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Make arch_init_msi/htirq_domain __init
  x86/apic: Make init_legacy_irqs() __init
  x86/ioapic: Remove unused IO_APIC_irq_trigger() function
2017-07-03 13:36:23 -07:00
Linus Torvalds
9bd42183b9 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Add the SYSTEM_SCHEDULING bootup state to move various scheduler
     debug checks earlier into the bootup. This turns silent and
     sporadically deadly bugs into nice, deterministic splats. Fix some
     of the splats that triggered. (Thomas Gleixner)

   - A round of restructuring and refactoring of the load-balancing and
     topology code (Peter Zijlstra)

   - Another round of consolidating ~20 of incremental scheduler code
     history: this time in terms of wait-queue nomenclature. (I didn't
     get much feedback on these renaming patches, and we can still
     easily change any names I might have misplaced, so if anyone hates
     a new name, please holler and I'll fix it.) (Ingo Molnar)

   - sched/numa improvements, fixes and updates (Rik van Riel)

   - Another round of x86/tsc scheduler clock code improvements, in hope
     of making it more robust (Peter Zijlstra)

   - Improve NOHZ behavior (Frederic Weisbecker)

   - Deadline scheduler improvements and fixes (Luca Abeni, Daniel
     Bristot de Oliveira)

   - Simplify and optimize the topology setup code (Lauro Ramos
     Venancio)

   - Debloat and decouple scheduler code some more (Nicolas Pitre)

   - Simplify code by making better use of llist primitives (Byungchul
     Park)

   - ... plus other fixes and improvements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
  sched/cputime: Refactor the cputime_adjust() code
  sched/debug: Expose the number of RT/DL tasks that can migrate
  sched/numa: Hide numa_wake_affine() from UP build
  sched/fair: Remove effective_load()
  sched/numa: Implement NUMA node level wake_affine()
  sched/fair: Simplify wake_affine() for the single socket case
  sched/numa: Override part of migrate_degrades_locality() when idle balancing
  sched/rt: Move RT related code from sched/core.c to sched/rt.c
  sched/deadline: Move DL related code from sched/core.c to sched/deadline.c
  sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled
  sched/fair: Spare idle load balancing on nohz_full CPUs
  nohz: Move idle balancer registration to the idle path
  sched/loadavg: Generalize "_idle" naming to "_nohz"
  sched/core: Drop the unused try_get_task_struct() helper function
  sched/fair: WARN() and refuse to set buddy when !se->on_rq
  sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well
  sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
  sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c
  sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>
  sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h>
  ...
2017-07-03 13:08:04 -07:00
Linus Torvalds
e94693f797 Merge branch 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
 "This is an extensive rewrite of the objdump tool to track all stack
  pointer modifications through the machine instructions of disassembled
  functions found in kernel .o files.

  This re-design removes the prior dependency on CONFIG_FRAME_POINTERS,
  with the goal to prepare the tool to generate kernel debuginfo data in
  the future. There's also an increase in checking/tracking robustness
  as a side effect as well.

  No (intended) changes to existing functionality"

* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Silence warnings for functions which use IRET
  objtool: Implement stack validation 2.0
  objtool, x86: Add several functions and files to the objtool whitelist
  objtool: Move checking code to check.c
2017-07-03 11:12:04 -07:00
Rafael J. Wysocki
f1c7842e5f Merge branches 'pm-cpufreq', 'intel_pstate' and 'pm-cpuidle'
* pm-cpufreq:
  cpufreq / CPPC: Initialize policy->min to lowest nonlinear performance
  cpufreq: sfi: make freq_table static
  cpufreq: exynos5440: Fix inconsistent indenting
  cpufreq: imx6q: imx6ull should use the same flow as imx6ul
  cpufreq: dt: Add support for hi3660

* intel_pstate:
  cpufreq: Update scaling_cur_freq documentation
  cpufreq: intel_pstate: Clean up after performance governor changes
  intel_pstate: skip scheduler hook when in "performance" mode
  intel_pstate: delete scheduler hook in HWP mode
  x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
  cpufreq: intel_pstate: Remove max/min fractions to limit performance
  x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"

* pm-cpuidle:
  cpuidle: menu: allow state 0 to be disabled
  intel_idle: Use more common logging style
  x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems
  ARM: cpuidle: Support asymmetric idle definition
2017-07-03 14:21:18 +02:00
Linus Torvalds
e18aca0236 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Fixlets for x86:

   - Prevent kexec crash when KASLR is enabled, which was caused by an
     address calculation bug

   - Restore the freeing of PUDs on memory hot remove

   - Correct a negated pointer check in the intel uncore performance
     monitoring driver

   - Plug a memory leak in an error exit path in the RDT code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel_rdt: Fix memory leak on mount failure
  x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug
  x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization
  perf/x86/intel/uncore: Fix wrong box pointer check
  x86/mm/hotplug: Fix BUG_ON() after hot-remove by not freeing PUD
2017-07-01 09:10:17 -07:00
Vikas Shivappa
79298acc4b x86/intel_rdt: Fix memory leak on mount failure
If mount fails, the kn_info directory is not freed causing memory leak.

Add the missing error handling path.

Fixes: 4e978d06de ("x86/intel_rdt: Add "info" files to resctrl file system")
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: peterz@infradead.org
Cc: vikas.shivappa@intel.com
Cc: andi.kleen@intel.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1498503368-20173-3-git-send-email-vikas.shivappa@linux.intel.com
2017-06-30 21:20:00 +02:00
Linus Torvalds
27ab862a3a IOMMU Fixes for Linux 4.12-rc7
Two fixes:
 
 		* A fix for AMD IOMMU interrupt remapping code when
 		  IRQs are forwarded directly to KVM guests
 
 		* Fixed check in the recently merged code to allow
 		  tboot with Intel VT-d disabled
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZVoIAAAoJECvwRC2XARrjpYoP/0JGpnrtGsw2GnXbe4jCXwWg
 hWpJze0QyHi7oJrNHRK1W72rwQxp53NUXS942bosXj18U+KU3x3UnLQfU5u1RfEi
 Z44riGpurZ2+Ied+X01eS5ALeno9/jj7DOnSDsrZS2vI56aJTGISNO/8xvE5M5Mj
 xwvOQW+K1Wgqac2Sdi+ckt8tUhik1MKbkL8k04/27Yzq6FxvkEyiHXDQcXA+eGId
 ewW034Z2ueriZ6fzuanQW3j57albeIN6XsTaOHbwD2edQwiyZ6yoMakKgMuHnpgx
 F4BtPtcbgSHSQ48ZZb9bdQpvEO3lY0jmSlMI4S2Fu5DmSIKC/KOy/4yYUhlQtrHT
 UUbIvq/pC+SMPxZiZAJhyIFcV6YkTelArPFx+QxsWvMMiXeGnezgeFsAOzwZuptF
 FFm9ItgfPkGxkiECFwJwSAHsTiiFocRfYHHv/ace/6X4ZB+nZrl3mSfX7+EtT2LG
 Kje2XtUzoGR/8LSBTMlQQeurhBZwbnoaFEtiVMrLODhvFT3IU7B00wgQHWNpjyRj
 Rqe/ScHRdRF1NQtW6QDTpNU4rZGB4lt1WxpMlONVVpO4LI/fXZq8Nq4lT+FzUHV1
 xNQEh9xV2DK7bjT3HDd/OKSusaLzImNFYmi6+t/gROX8z/PqISb5IOmjaiO9pYYM
 coSJHmYL1hc0yFsNp6eB
 =bw0V
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
 "Two fixes:

   - A fix for AMD IOMMU interrupt remapping code when IRQs are
     forwarded directly to KVM guests

   - Fixed check in the recently merged code to allow tboot with
     Intel VT-d disabled"

* tag 'iommu-fixes-v4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix interrupt remapping when disable guest_mode
  iommu/vt-d: Correctly disable Intel IOMMU force on
2017-06-30 10:37:48 -07:00
Josh Poimboeuf
c207aee480 objtool, x86: Add several functions and files to the objtool whitelist
In preparation for an objtool rewrite which will have broader checks,
whitelist functions and files which cause problems because they do
unusual things with the stack.

These whitelists serve as a TODO list for which functions and files
don't yet have undwarf unwinder coverage.  Eventually most of the
whitelists can be removed in favor of manual CFI hint annotations or
objtool improvements.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30 10:19:19 +02:00
Kirill A. Shutemov
bb43dbc5e0 x86/ftrace: Exclude functions in head64.c from function-tracing
A recent commit moved most logic of early boot up from startup_64() written
in assembly to __startup_64() written in C.

Fengguang reported breakage due to the change. It was tracked down to
CONFIG_FUNCTION_TRACER being enabled.

Tracing this function is not possible because it's invoked from the
earliest boot stage before the relocation fixups have been done. It is the
function doing the relocation.

Exclude it from being built with tracer stubs.

Fixes: c88d71508e ("x86/boot/64: Rewrite startup_64() in C")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: lkp@01.org
Link: http://lkml.kernel.org/r/20170627115948.17938-1-kirill.shutemov@linux.intel.com
2017-06-29 22:33:27 +02:00
Tobias Klauser
6474924e2b arch: remove unused macro/function thread_saved_pc()
The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d55977 ("sched/core: Remove pointless printout in
sched_show_task()").  Remove the implementations as well.

Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-28 16:13:57 -07:00
Christoph Hellwig
5860acc1a9 x86: remove arch specific dma_supported implementation
And instead wire it up as method for all the dma_map_ops instances.

Note that this also means the arch specific check will be fully instead
of partially applied in the AMD iommu driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-28 06:54:46 -07:00
Christoph Hellwig
8bd17c6670 x86/calgary: implement ->mapping_error
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-28 06:54:35 -07:00
Christoph Hellwig
14a9aad7f0 x86/pci-nommu: implement ->mapping_error
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-28 06:54:34 -07:00
Yazen Ghannam
5209654a46 x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems
AMD systems support the Monitor/Mwait instructions and these can be used
for ACPI C1 in the same way as on Intel systems.

Three things are needed:
 1) This patch.
 2) BIOS that declares a C1 state in _CST to use FFH, with correct values.
 3) CPUID_Fn00000005_EDX is non-zero on the system.

The BIOS on AMD systems have historically not defined a C1 state in _CST,
so the acpi_idle driver uses HALT for ACPI C1.

Currently released systems have CPUID_Fn00000005_EDX as reserved/RAZ. If a
BIOS is released for these systems that requests a C1 state with FFH, the
FFH implementation in Linux will fail since CPUID_Fn00000005_EDX is 0. The
acpi_idle driver will then fallback to using HALT for ACPI C1.

Future systems are expected to have non-zero CPUID_Fn00000005_EDX and BIOS
support for using FFH for ACPI C1.

Allow ffh_cstate_init() to succeed on AMD systems.

Tested on Fam15h and Fam17h systems.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-27 02:00:52 +02:00
Len Brown
f8475cef90 x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.

Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.

Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver driver-specific cpufreq_driver.get() routine.

MHz is computed like so:

MHz = base_MHz * delta_APERF / delta_MPERF

MHz is the average frequency of the busy processor
over a measurement interval.  The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to to
happen on-demand when users read sysfs attribute
cpufreq/scaling_cur_freq.

As with previous methods of calculating MHz,
idle time is excluded.

base_MHz above is from TSC calibration global "cpu_khz".

This x86 native method to calculate MHz returns a meaningful result
no matter if P-states are controlled by hardware or firmware
and/or if the Linux cpufreq sub-system is or is-not installed.

When this routine is invoked more frequently, the measurement
interval becomes shorter.  However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.

Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-27 01:47:32 +02:00
Yazen Ghannam
e2de64ec52 x86/mce: Always save severity in machine_check_poll()
The MCE severity gives a hint as to how to handle the error. The
notifier blocks can then use the severity to decide on an action.
It's not necessary for machine_check_poll() to filter errors for
the notifier chain, since each block will check its own set of
conditions before handling an error.

Also, there isn't any urgency for machine_check_poll() to make decisions
based on severity like in do_machine_check().

If we can assume that a severity is set then we can use it in more
notifier blocks. For example, the CEC block could check for a "KEEP"
severity rather than checking bits in the status. This isn't possible
now since the severity is not set except for "DEFFRRED/UCNA" errors with
a valid address.

Save the severity since we have it, and let the notifier blocks decide
if they want to do anything.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498074402-98633-1-git-send-email-Yazen.Ghannam@amd.com
2017-06-26 15:58:56 +02:00
Colin Ian King
d7f7dc7b88 x86/microcode: Make a couple of symbols static
The helper function __load_ucode_amd() and pointer intel_ucode_patch do
not need to be in global scope, so make them static.

Fixes those sparse warnings:
"symbol '__load_ucode_amd' was not declared. Should it be static?"
"symbol 'intel_ucode_patch' was not declared. Should it be static?"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170622095736.11937-1-colin.king@canonical.com
2017-06-26 15:57:37 +02:00
Len Brown
51204e0639 x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"
cpufreq_quick_get() allows cpufreq drivers to over-ride cpu_khz
that is otherwise reported in x86 /proc/cpuinfo "cpu MHz".

There are four problems with this scheme,
any of them is sufficient justification to delete it.

 1. Depending on which cpufreq driver is loaded, the behavior
    of this field is different.

 2. Distros complain that they have to explain to users
    why and how this field changes.  Distros have requested a constant.

 3. The two major providers of this information, acpi_cpufreq
    and intel_pstate, both "get it wrong" in different ways.

    acpi_cpufreq lies to the user by telling them that
    they are running at whatever frequency was last
    requested by software.

    intel_pstate lies to the user by telling them that
    they are running at the average frequency computed
    over an undefined measurement.  But an average computed
    over an undefined interval, is itself, undefined...

 4. On modern processors, user space utilities, such as
    turbostat(1), are more accurate and more precise, while
    supporing concurrent measurement over arbitrary intervals.

Users who have been consulting /proc/cpuinfo to
track changing CPU frequency will be dissapointed that
it no longer wiggles -- perhaps being unaware of the
limitations of the information they have been consuming.

Yes, they can change their scripts to look in sysfs
cpufreq/scaling_cur_frequency.  Here they will find the same
data of dubious quality here removed from /proc/cpuinfo.
The value in sysfs will be addressed in a subsequent patch
to address issues 1-3, above.

Issue 4 will remain -- users that really care about
accurate frequency information should not be using either
proc or sysfs kernel interfaces.
They should be using using turbostat(8), or a similar
purpose-built analysis tool.

Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-24 01:45:47 +02:00
Thomas Gleixner
3ca57222c3 x86/apic: Mark single target interrupts
If the interrupt destination mode of the APIC is physical then the
effective affinity is restricted to a single CPU.

Mark the interrupt accordingly in the domain allocation code, so the core
code can avoid pointless affinity setting attempts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235447.508846202@linutronix.de
2017-06-22 18:21:26 +02:00
Thomas Gleixner
c7d6c9dd87 x86/apic: Implement effective irq mask update
Add the effective irq mask update to the apic implementations and enable
effective irq masks for x86.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.878370703@linutronix.de
2017-06-22 18:21:23 +02:00
Thomas Gleixner
0e24f7c9f6 x86/apic: Add irq_data argument to apic->cpu_mask_to_apicid()
The decision to which CPUs an interrupt is effectively routed happens in
the various apic->cpu_mask_to_apicid() implementations

To support effective affinity masks this information needs to be updated in
irq_data. Add a pointer to irq_data to the callbacks and feed it through
the call chain.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.720739075@linutronix.de
2017-06-22 18:21:22 +02:00
Thomas Gleixner
91cd9cb7ee x86/apic: Move cpumask and to core code
All implementations of apic->cpu_mask_to_apicid_and() and the two incoming
cpumasks to search for the target.

Move that operation to the call site and rename it to cpu_mask_to_apicid()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.641575516@linutronix.de
2017-06-22 18:21:22 +02:00
Thomas Gleixner
52b166af40 x86/apic: Move online masking to core code
All implementations of apic->cpu_mask_to_apicid_and() mask out the offline
cpus. The callsite already has a mask available, which has the offline CPUs
removed. Use that and remove the extra bits.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.560868224@linutronix.de
2017-06-22 18:21:21 +02:00
Thomas Gleixner
bbcf9574bc x86/uv: Use default_cpu_mask_to_apicid_and()
Same functionality except the extra bits ored on the apicid.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.482841015@linutronix.de
2017-06-22 18:21:21 +02:00
Thomas Gleixner
ad95212ee6 x86/apic: Move flat_cpu_mask_to_apicid_and() into C source
No point in having inlines assigned to function pointers at multiple
places. Just bloats the text.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.405975721@linutronix.de
2017-06-22 18:21:21 +02:00
Thomas Gleixner
ad7a929fa4 x86/irq: Use irq_migrate_all_off_this_cpu()
The generic migration code supports all the required features
already. Remove the x86 specific implementation and use the generic one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235445.851311033@linutronix.de
2017-06-22 18:21:18 +02:00
Thomas Gleixner
654abd0a7b x86/irq: Restructure fixup_irqs()
Reorder fixup_irqs() so it matches the flow in the generic migration code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235445.774272454@linutronix.de
2017-06-22 18:21:18 +02:00
Thomas Gleixner
8e7b632237 x86/irq: Cleanup pending irq move in fixup_irqs()
If an CPU goes offline, the interrupts are migrated away, but a eventually
pending interrupt move, which has not yet been made effective is kept
pending even if the outgoing CPU is the sole target of the pending affinity
mask. What's worse is, that the pending affinity mask is discarded even if
it would contain a valid subset of the online CPUs.

Use the newly introduced helper to:

 - Discard a pending move when the outgoing CPU is the only target in the
   pending mask.

 - Use the pending mask instead of the affinity mask to find a valid target
   for the CPU if the pending mask intersects with the online CPUs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235444.774068557@linutronix.de
2017-06-22 18:21:13 +02:00
Thomas Gleixner
f8f37ca789 x86/msi: Create named irq domains
Use the fwnode to create named irq domains so diagnosis works.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235444.299024560@linutronix.de
2017-06-22 18:21:11 +02:00
Thomas Gleixner
0323b96904 x86/msi: Remove unused remap irq domain interface
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235444.221049665@linutronix.de
2017-06-22 18:21:11 +02:00
Thomas Gleixner
667724c5a3 x86/msi: Provide new iommu irqdomain interface
Provide a new interface for creating the iommu remapping domains, so that
the caller can supply a name and a id in order to create named irqdomains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.986661206@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-22 18:21:10 +02:00
Thomas Gleixner
5f432711ba x86/htirq: Create named domain
Use the fwnode to create a named domain so diagnosis works.

Mark the init function __init while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.829047007@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-22 18:21:09 +02:00
Thomas Gleixner
1b604745c8 x86/ioapic: Create named irq domain
Use the fwnode to create a named domain so diagnosis works, but only when
the the ioapic is not device tree based.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.752782603@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-22 18:21:09 +02:00
Thomas Gleixner
9d35f85959 x86/vector: Create named irq domain
Use the fwnode to create a named domain so diagnosis works.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.673635238@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-22 18:21:08 +02:00
Thomas Gleixner
8947dfb257 x86/apic: Add name to irq chip
Add the missing name, so debugging will work proper.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.266561988@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-22 18:21:06 +02:00
Zhenzhong Duan
a1272dd553 x86/tsc: Call check_system_tsc_reliable() before unsynchronized_tsc()
tsc_clocksource_reliable is initialized in check_system_tsc_reliable(), but
it is checked in unsynchronized_tsc() which is called before the
initialization.

In practice that's not an issue because systems which mark the TSC
reliable have X86_FEATURE_CONSTANT_TSC set as well, which is evaluated
in unsynchronized_tsc() before tsc_clocksource_reliable.

Reorder the calls so initialization happens before usage.

[ tglx: Massaged changelog ]

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b1532ef7-cd9f-45f7-9f49-48dd2a5c2495@default
2017-06-22 16:00:03 +02:00
Vitaly Kuznetsov
71c2a2d0a8 x86/hyperv: Read TSC frequency from a synthetic MSR
It was found that SMI_TRESHOLD of 50000 is not enough for Hyper-V
guests in nested environment and falling back to counting jiffies
is not an option for Gen2 guests as they don't have PIT. As Hyper-V
provides TSC frequency in a synthetic MSR we can just use this information
instead of doing a error prone calibration.

Reported-and-tested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <jloeser@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Link: http://lkml.kernel.org/r/20170622100730.18112-3-vkuznets@redhat.com
2017-06-22 15:35:12 +02:00
Vitaly Kuznetsov
2cf0284223 x86/hyperv: Check frequency MSRs presence according to the specification
Hyper-V TLFS specifies two bits which should be checked before accessing
frequency MSRs:

- AccessFrequencyMsrs (BIT(11) in EAX) which indicates if we have access to
  frequency MSRs.
- FrequencyMsrsAvailable (BIT(8) in EDX) which indicates is these MSRs are
  present.
  
Rename and specify these bits accordingly.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Cc: Jork Loeser <jloeser@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Link: http://lkml.kernel.org/r/20170622100730.18112-2-vkuznets@redhat.com
2017-06-22 15:35:11 +02:00
Jiri Bohac
fe2d48b805 x86/debug: Extend the lower bound of crash kernel low reservations
The following change in 2013:

  0212f91596 ("x86: Add Crash kernel low reservation")

... introduced reserve_crashkernel_low(). This function is used to
reserve crash kernel memory either if crashkernel=size,low is given
on the command line or if the region reserved by reserve_crashkernel
is entirely above 4G.

reserve_crashkernel_low() tries to find a block of 'low_size' bytes.
But there seems to be no good reason to restrict the lower bound
of the range to 'low_size'.

Make memblock_find_in_range() search from the start of memory.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20170616161602.2r7birrf2y3ylv6v@dwarf.suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 11:10:23 +02:00
Andy Lutomirski
d54368127a x86/mm: Remove reset_lazy_tlbstate()
The only call site also calls idle_task_exit(), and idle_task_exit()
puts us into a clean state by explicitly switching to init_mm.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/3acc7ad02a2ec060d2321a1e0f6de1cb90069517.1498022414.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:57:50 +02:00
Ingo Molnar
a4eb8b9935 Merge branch 'linus' into x86/mm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:57:28 +02:00
Dou Liyang
538ac46c64 x86/apic: Make arch_init_msi/htirq_domain __init
These two functions are only called by arch_early_irq_init(), which
is an __init function, so mark them __init as well.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498101341-10182-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:34:42 +02:00
Dou Liyang
a884d25f38 x86/apic: Make init_legacy_irqs() __init
This function is only called by arch_early_irq_init(), which is an
__init function, so mark the child function __init as well.

In addition mark it inline for the !CONFIG_X86_IO_APIC case.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498040061-5332-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22 10:34:41 +02:00
Juergen Gross
b867059018 x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise
When running under Xen as dom0, /dev/mcelog is being provided by Xen
instead of the normal mcelog character device of the MCE core. Convert
an error message being issued by the MCE core in this case to an
informative message that Xen has registered the device.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170614084059.19294-1-jgross@suse.com
2017-06-20 23:25:19 +02:00
Kirill A. Shutemov
26179670a6 x86/boot/64: Put __startup_64() into .head.text
Put __startup_64() and fixup_pointer() into .head.text section to make
sure it's always near startup_64() and always callable.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel test robot <fengguang.wu@intel.com>
Cc: wfg@linux.intel.com
Link: http://lkml.kernel.org/r/20170616113024.ajmif63cmcszry5a@black.fi.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:56:27 +02:00