Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- Continued work to add support for 5-level paging provided by future
Intel CPUs. In particular we switch the x86 GUP code to the generic
implementation. (Kirill A. Shutemov)
- Continued work to add PCID CPU support to native kernels as well.
In this round most of the focus is on reworking/refreshing the TLB
flush infrastructure for the upcoming PCID changes. (Andy
Lutomirski)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
x86/mm: Delete a big outdated comment about TLB flushing
x86/mm: Don't reenter flush_tlb_func_common()
x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging
x86/ftrace: Exclude functions in head64.c from function-tracing
x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap
x86/mm: Remove reset_lazy_tlbstate()
x86/ldt: Simplify the LDT switching logic
x86/boot/64: Put __startup_64() into .head.text
x86/mm: Add support for 5-level paging for KASLR
x86/mm: Make kernel_physical_mapping_init() support 5-level paging
x86/mm: Add sync_global_pgds() for configuration with 5-level paging
x86/boot/64: Add support of additional page table level during early boot
x86/boot/64: Rename init_level4_pgt and early_level4_pgt
x86/boot/64: Rewrite startup_64() in C
x86/boot/compressed: Enable 5-level paging during decompression stage
x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations
x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations
x86/boot/efi: Cleanup initialization of GDT entries
x86/asm: Fix comment in return_from_SYSCALL_64()
x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
...
Pull x86 microcode updates from Ingo Molnar:
"The main changes are a fix early microcode application for
resume-from-RAM, plus a 32-bit initrd placement fix - by Borislav
Petkov"
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Make a couple of symbols static
x86/microcode/intel: Save pointer to ucode patch for early AP loading
x86/microcode: Look for the initrd at the correct address on 32-bit
Pull x86 hyperv updates from Ingo Molnar:
"Avoid boot time TSC calibration on Hyper-V hosts, to improve
calibration robustness. (Vitaly Kuznetsov)"
* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Read TSC frequency from a synthetic MSR
x86/hyperv: Check frequency MSRs presence according to the specification
Pull x86 debug update from Ingo Molnar:
"A single fix for an off-by one bug in test_nmi_ipi() that probably
doesn't matter in practice"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/nmi: Fix timeout test in test_nmi_ipi()
Pull x86 boot updates from Ingo Molnar:
"The main changes in this cycle were KASLR improvements for rare
environments with special boot options, by Baoquan He. Also misc
smaller changes/cleanups"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/debug: Extend the lower bound of crash kernel low reservations
x86/boot: Remove unused copy_*_gs() functions
x86/KASLR: Use the right memcpy() implementation
Documentation/kernel-parameters.txt: Update 'memmap=' boot option description
x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' boot options
x86/KASLR: Parse all 'memmap=' boot option entries
Pull x86 apic updates from Ingo Molnar:
"Janitorial changes: removal of an unused function plus __init
annotations"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Make arch_init_msi/htirq_domain __init
x86/apic: Make init_legacy_irqs() __init
x86/ioapic: Remove unused IO_APIC_irq_trigger() function
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- Add the SYSTEM_SCHEDULING bootup state to move various scheduler
debug checks earlier into the bootup. This turns silent and
sporadically deadly bugs into nice, deterministic splats. Fix some
of the splats that triggered. (Thomas Gleixner)
- A round of restructuring and refactoring of the load-balancing and
topology code (Peter Zijlstra)
- Another round of consolidating ~20 of incremental scheduler code
history: this time in terms of wait-queue nomenclature. (I didn't
get much feedback on these renaming patches, and we can still
easily change any names I might have misplaced, so if anyone hates
a new name, please holler and I'll fix it.) (Ingo Molnar)
- sched/numa improvements, fixes and updates (Rik van Riel)
- Another round of x86/tsc scheduler clock code improvements, in hope
of making it more robust (Peter Zijlstra)
- Improve NOHZ behavior (Frederic Weisbecker)
- Deadline scheduler improvements and fixes (Luca Abeni, Daniel
Bristot de Oliveira)
- Simplify and optimize the topology setup code (Lauro Ramos
Venancio)
- Debloat and decouple scheduler code some more (Nicolas Pitre)
- Simplify code by making better use of llist primitives (Byungchul
Park)
- ... plus other fixes and improvements"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
sched/cputime: Refactor the cputime_adjust() code
sched/debug: Expose the number of RT/DL tasks that can migrate
sched/numa: Hide numa_wake_affine() from UP build
sched/fair: Remove effective_load()
sched/numa: Implement NUMA node level wake_affine()
sched/fair: Simplify wake_affine() for the single socket case
sched/numa: Override part of migrate_degrades_locality() when idle balancing
sched/rt: Move RT related code from sched/core.c to sched/rt.c
sched/deadline: Move DL related code from sched/core.c to sched/deadline.c
sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled
sched/fair: Spare idle load balancing on nohz_full CPUs
nohz: Move idle balancer registration to the idle path
sched/loadavg: Generalize "_idle" naming to "_nohz"
sched/core: Drop the unused try_get_task_struct() helper function
sched/fair: WARN() and refuse to set buddy when !se->on_rq
sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well
sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c
sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>
sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h>
...
Pull objtool updates from Ingo Molnar:
"This is an extensive rewrite of the objdump tool to track all stack
pointer modifications through the machine instructions of disassembled
functions found in kernel .o files.
This re-design removes the prior dependency on CONFIG_FRAME_POINTERS,
with the goal to prepare the tool to generate kernel debuginfo data in
the future. There's also an increase in checking/tracking robustness
as a side effect as well.
No (intended) changes to existing functionality"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Silence warnings for functions which use IRET
objtool: Implement stack validation 2.0
objtool, x86: Add several functions and files to the objtool whitelist
objtool: Move checking code to check.c
* pm-cpufreq:
cpufreq / CPPC: Initialize policy->min to lowest nonlinear performance
cpufreq: sfi: make freq_table static
cpufreq: exynos5440: Fix inconsistent indenting
cpufreq: imx6q: imx6ull should use the same flow as imx6ul
cpufreq: dt: Add support for hi3660
* intel_pstate:
cpufreq: Update scaling_cur_freq documentation
cpufreq: intel_pstate: Clean up after performance governor changes
intel_pstate: skip scheduler hook when in "performance" mode
intel_pstate: delete scheduler hook in HWP mode
x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
cpufreq: intel_pstate: Remove max/min fractions to limit performance
x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"
* pm-cpuidle:
cpuidle: menu: allow state 0 to be disabled
intel_idle: Use more common logging style
x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems
ARM: cpuidle: Support asymmetric idle definition
Pull x86 fixes from Thomas Gleixner:
"Fixlets for x86:
- Prevent kexec crash when KASLR is enabled, which was caused by an
address calculation bug
- Restore the freeing of PUDs on memory hot remove
- Correct a negated pointer check in the intel uncore performance
monitoring driver
- Plug a memory leak in an error exit path in the RDT code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Fix memory leak on mount failure
x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug
x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization
perf/x86/intel/uncore: Fix wrong box pointer check
x86/mm/hotplug: Fix BUG_ON() after hot-remove by not freeing PUD
Two fixes:
* A fix for AMD IOMMU interrupt remapping code when
IRQs are forwarded directly to KVM guests
* Fixed check in the recently merged code to allow
tboot with Intel VT-d disabled
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZVoIAAAoJECvwRC2XARrjpYoP/0JGpnrtGsw2GnXbe4jCXwWg
hWpJze0QyHi7oJrNHRK1W72rwQxp53NUXS942bosXj18U+KU3x3UnLQfU5u1RfEi
Z44riGpurZ2+Ied+X01eS5ALeno9/jj7DOnSDsrZS2vI56aJTGISNO/8xvE5M5Mj
xwvOQW+K1Wgqac2Sdi+ckt8tUhik1MKbkL8k04/27Yzq6FxvkEyiHXDQcXA+eGId
ewW034Z2ueriZ6fzuanQW3j57albeIN6XsTaOHbwD2edQwiyZ6yoMakKgMuHnpgx
F4BtPtcbgSHSQ48ZZb9bdQpvEO3lY0jmSlMI4S2Fu5DmSIKC/KOy/4yYUhlQtrHT
UUbIvq/pC+SMPxZiZAJhyIFcV6YkTelArPFx+QxsWvMMiXeGnezgeFsAOzwZuptF
FFm9ItgfPkGxkiECFwJwSAHsTiiFocRfYHHv/ace/6X4ZB+nZrl3mSfX7+EtT2LG
Kje2XtUzoGR/8LSBTMlQQeurhBZwbnoaFEtiVMrLODhvFT3IU7B00wgQHWNpjyRj
Rqe/ScHRdRF1NQtW6QDTpNU4rZGB4lt1WxpMlONVVpO4LI/fXZq8Nq4lT+FzUHV1
xNQEh9xV2DK7bjT3HDd/OKSusaLzImNFYmi6+t/gROX8z/PqISb5IOmjaiO9pYYM
coSJHmYL1hc0yFsNp6eB
=bw0V
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Two fixes:
- A fix for AMD IOMMU interrupt remapping code when IRQs are
forwarded directly to KVM guests
- Fixed check in the recently merged code to allow tboot with
Intel VT-d disabled"
* tag 'iommu-fixes-v4.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix interrupt remapping when disable guest_mode
iommu/vt-d: Correctly disable Intel IOMMU force on
In preparation for an objtool rewrite which will have broader checks,
whitelist functions and files which cause problems because they do
unusual things with the stack.
These whitelists serve as a TODO list for which functions and files
don't yet have undwarf unwinder coverage. Eventually most of the
whitelists can be removed in favor of manual CFI hint annotations or
objtool improvements.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A recent commit moved most logic of early boot up from startup_64() written
in assembly to __startup_64() written in C.
Fengguang reported breakage due to the change. It was tracked down to
CONFIG_FUNCTION_TRACER being enabled.
Tracing this function is not possible because it's invoked from the
earliest boot stage before the relocation fixups have been done. It is the
function doing the relocation.
Exclude it from being built with tracer stubs.
Fixes: c88d71508e ("x86/boot/64: Rewrite startup_64() in C")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: lkp@01.org
Link: http://lkml.kernel.org/r/20170627115948.17938-1-kirill.shutemov@linux.intel.com
The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d55977 ("sched/core: Remove pointless printout in
sched_show_task()"). Remove the implementations as well.
Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
And instead wire it up as method for all the dma_map_ops instances.
Note that this also means the arch specific check will be fully instead
of partially applied in the AMD iommu driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
AMD systems support the Monitor/Mwait instructions and these can be used
for ACPI C1 in the same way as on Intel systems.
Three things are needed:
1) This patch.
2) BIOS that declares a C1 state in _CST to use FFH, with correct values.
3) CPUID_Fn00000005_EDX is non-zero on the system.
The BIOS on AMD systems have historically not defined a C1 state in _CST,
so the acpi_idle driver uses HALT for ACPI C1.
Currently released systems have CPUID_Fn00000005_EDX as reserved/RAZ. If a
BIOS is released for these systems that requests a C1 state with FFH, the
FFH implementation in Linux will fail since CPUID_Fn00000005_EDX is 0. The
acpi_idle driver will then fallback to using HALT for ACPI C1.
Future systems are expected to have non-zero CPUID_Fn00000005_EDX and BIOS
support for using FFH for ACPI C1.
Allow ffh_cstate_init() to succeed on AMD systems.
Tested on Fam15h and Fam17h systems.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.
Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.
Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver driver-specific cpufreq_driver.get() routine.
MHz is computed like so:
MHz = base_MHz * delta_APERF / delta_MPERF
MHz is the average frequency of the busy processor
over a measurement interval. The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to to
happen on-demand when users read sysfs attribute
cpufreq/scaling_cur_freq.
As with previous methods of calculating MHz,
idle time is excluded.
base_MHz above is from TSC calibration global "cpu_khz".
This x86 native method to calculate MHz returns a meaningful result
no matter if P-states are controlled by hardware or firmware
and/or if the Linux cpufreq sub-system is or is-not installed.
When this routine is invoked more frequently, the measurement
interval becomes shorter. However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.
Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The MCE severity gives a hint as to how to handle the error. The
notifier blocks can then use the severity to decide on an action.
It's not necessary for machine_check_poll() to filter errors for
the notifier chain, since each block will check its own set of
conditions before handling an error.
Also, there isn't any urgency for machine_check_poll() to make decisions
based on severity like in do_machine_check().
If we can assume that a severity is set then we can use it in more
notifier blocks. For example, the CEC block could check for a "KEEP"
severity rather than checking bits in the status. This isn't possible
now since the severity is not set except for "DEFFRRED/UCNA" errors with
a valid address.
Save the severity since we have it, and let the notifier blocks decide
if they want to do anything.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498074402-98633-1-git-send-email-Yazen.Ghannam@amd.com
The helper function __load_ucode_amd() and pointer intel_ucode_patch do
not need to be in global scope, so make them static.
Fixes those sparse warnings:
"symbol '__load_ucode_amd' was not declared. Should it be static?"
"symbol 'intel_ucode_patch' was not declared. Should it be static?"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170622095736.11937-1-colin.king@canonical.com
cpufreq_quick_get() allows cpufreq drivers to over-ride cpu_khz
that is otherwise reported in x86 /proc/cpuinfo "cpu MHz".
There are four problems with this scheme,
any of them is sufficient justification to delete it.
1. Depending on which cpufreq driver is loaded, the behavior
of this field is different.
2. Distros complain that they have to explain to users
why and how this field changes. Distros have requested a constant.
3. The two major providers of this information, acpi_cpufreq
and intel_pstate, both "get it wrong" in different ways.
acpi_cpufreq lies to the user by telling them that
they are running at whatever frequency was last
requested by software.
intel_pstate lies to the user by telling them that
they are running at the average frequency computed
over an undefined measurement. But an average computed
over an undefined interval, is itself, undefined...
4. On modern processors, user space utilities, such as
turbostat(1), are more accurate and more precise, while
supporing concurrent measurement over arbitrary intervals.
Users who have been consulting /proc/cpuinfo to
track changing CPU frequency will be dissapointed that
it no longer wiggles -- perhaps being unaware of the
limitations of the information they have been consuming.
Yes, they can change their scripts to look in sysfs
cpufreq/scaling_cur_frequency. Here they will find the same
data of dubious quality here removed from /proc/cpuinfo.
The value in sysfs will be addressed in a subsequent patch
to address issues 1-3, above.
Issue 4 will remain -- users that really care about
accurate frequency information should not be using either
proc or sysfs kernel interfaces.
They should be using using turbostat(8), or a similar
purpose-built analysis tool.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the interrupt destination mode of the APIC is physical then the
effective affinity is restricted to a single CPU.
Mark the interrupt accordingly in the domain allocation code, so the core
code can avoid pointless affinity setting attempts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235447.508846202@linutronix.de
Add the effective irq mask update to the apic implementations and enable
effective irq masks for x86.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.878370703@linutronix.de
The decision to which CPUs an interrupt is effectively routed happens in
the various apic->cpu_mask_to_apicid() implementations
To support effective affinity masks this information needs to be updated in
irq_data. Add a pointer to irq_data to the callbacks and feed it through
the call chain.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.720739075@linutronix.de
All implementations of apic->cpu_mask_to_apicid_and() and the two incoming
cpumasks to search for the target.
Move that operation to the call site and rename it to cpu_mask_to_apicid()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.641575516@linutronix.de
All implementations of apic->cpu_mask_to_apicid_and() mask out the offline
cpus. The callsite already has a mask available, which has the offline CPUs
removed. Use that and remove the extra bits.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.560868224@linutronix.de
Same functionality except the extra bits ored on the apicid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.482841015@linutronix.de
No point in having inlines assigned to function pointers at multiple
places. Just bloats the text.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.405975721@linutronix.de
The generic migration code supports all the required features
already. Remove the x86 specific implementation and use the generic one.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235445.851311033@linutronix.de
Reorder fixup_irqs() so it matches the flow in the generic migration code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235445.774272454@linutronix.de
If an CPU goes offline, the interrupts are migrated away, but a eventually
pending interrupt move, which has not yet been made effective is kept
pending even if the outgoing CPU is the sole target of the pending affinity
mask. What's worse is, that the pending affinity mask is discarded even if
it would contain a valid subset of the online CPUs.
Use the newly introduced helper to:
- Discard a pending move when the outgoing CPU is the only target in the
pending mask.
- Use the pending mask instead of the affinity mask to find a valid target
for the CPU if the pending mask intersects with the online CPUs.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235444.774068557@linutronix.de
Use the fwnode to create named irq domains so diagnosis works.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235444.299024560@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235444.221049665@linutronix.de
Provide a new interface for creating the iommu remapping domains, so that
the caller can supply a name and a id in order to create named irqdomains.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.986661206@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the fwnode to create a named domain so diagnosis works.
Mark the init function __init while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.829047007@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the fwnode to create a named domain so diagnosis works, but only when
the the ioapic is not device tree based.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.752782603@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the fwnode to create a named domain so diagnosis works.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.673635238@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add the missing name, so debugging will work proper.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.266561988@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
tsc_clocksource_reliable is initialized in check_system_tsc_reliable(), but
it is checked in unsynchronized_tsc() which is called before the
initialization.
In practice that's not an issue because systems which mark the TSC
reliable have X86_FEATURE_CONSTANT_TSC set as well, which is evaluated
in unsynchronized_tsc() before tsc_clocksource_reliable.
Reorder the calls so initialization happens before usage.
[ tglx: Massaged changelog ]
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b1532ef7-cd9f-45f7-9f49-48dd2a5c2495@default
It was found that SMI_TRESHOLD of 50000 is not enough for Hyper-V
guests in nested environment and falling back to counting jiffies
is not an option for Gen2 guests as they don't have PIT. As Hyper-V
provides TSC frequency in a synthetic MSR we can just use this information
instead of doing a error prone calibration.
Reported-and-tested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <jloeser@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Link: http://lkml.kernel.org/r/20170622100730.18112-3-vkuznets@redhat.com
Hyper-V TLFS specifies two bits which should be checked before accessing
frequency MSRs:
- AccessFrequencyMsrs (BIT(11) in EAX) which indicates if we have access to
frequency MSRs.
- FrequencyMsrsAvailable (BIT(8) in EDX) which indicates is these MSRs are
present.
Rename and specify these bits accordingly.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Cc: Jork Loeser <jloeser@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Link: http://lkml.kernel.org/r/20170622100730.18112-2-vkuznets@redhat.com
The following change in 2013:
0212f91596 ("x86: Add Crash kernel low reservation")
... introduced reserve_crashkernel_low(). This function is used to
reserve crash kernel memory either if crashkernel=size,low is given
on the command line or if the region reserved by reserve_crashkernel
is entirely above 4G.
reserve_crashkernel_low() tries to find a block of 'low_size' bytes.
But there seems to be no good reason to restrict the lower bound
of the range to 'low_size'.
Make memblock_find_in_range() search from the start of memory.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20170616161602.2r7birrf2y3ylv6v@dwarf.suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The only call site also calls idle_task_exit(), and idle_task_exit()
puts us into a clean state by explicitly switching to init_mm.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/3acc7ad02a2ec060d2321a1e0f6de1cb90069517.1498022414.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
These two functions are only called by arch_early_irq_init(), which
is an __init function, so mark them __init as well.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498101341-10182-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This function is only called by arch_early_irq_init(), which is an
__init function, so mark the child function __init as well.
In addition mark it inline for the !CONFIG_X86_IO_APIC case.
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1498040061-5332-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When running under Xen as dom0, /dev/mcelog is being provided by Xen
instead of the normal mcelog character device of the MCE core. Convert
an error message being issued by the MCE core in this case to an
informative message that Xen has registered the device.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170614084059.19294-1-jgross@suse.com
Put __startup_64() and fixup_pointer() into .head.text section to make
sure it's always near startup_64() and always callable.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel test robot <fengguang.wu@intel.com>
Cc: wfg@linux.intel.com
Link: http://lkml.kernel.org/r/20170616113024.ajmif63cmcszry5a@black.fi.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>