Use static_cpu_has() in the timing-sensitive paths in fpstate_init() and
fpu__copy().
While at it, simplify the use in init_cyrix() and get rid of the ternary
operator.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459801503-15600-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
which uses the same fast path as __down_write() except it falls back to
call_rwsem_down_write_failed_killable() slow path and return -EINTR if
killed. To prevent from code duplication extract the skeleton of
__down_write() into a helper macro which just takes the semaphore
and the slow path function to be called.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-11-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is no longer used anywhere and all callers (__down_write()) use
0 as a subclass. Ditch __down_write_nested() to make the code easier
to follow.
This shouldn't introduce any functional change.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Allowing user code to map the HPET is problematic. HPET
implementations are notoriously buggy, and there are probably many
machines on which even MMIO reads from bogus HPET addresses are
problematic.
We have a report that the Dell Precision M2800 with:
ACPI: HPET 0x00000000C8FE6238 000038 (v01 DELL CBX3 01072009 AMI. 00000005)
is either so slow when accessing the HPET or actually hangs in some
regard, causing soft lockups to be reported if users do unexpected
things to the HPET.
The vclock HPET code has also always been a questionable speedup.
Accessing an HPET is exceedingly slow (on the order of several
microseconds), so the added overhead in requiring a syscall to read
the HPET is a small fraction of the total code of accessing it.
To avoid future problems, let's just delete the code entirely.
In the long run, this could actually be a speedup. Waiman Long as a
patch to optimize the case where multiple CPUs contend for the HPET,
but that won't help unless all the accesses are mediated by the
kernel.
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: Waiman Long <waiman.long@hpe.com>
Link: http://lkml.kernel.org/r/d2f90bba98db9905041cff294646d290d378f67a.1460074438.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Erratum 88 affects old AMD K8s, where a SWAPGS fails to cause an input
dependency on GS. Therefore, we need to MFENCE before it.
But that MFENCE is expensive and unnecessary on the remaining x86 CPUs
out there so patch it out on the CPUs which don't require it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/aec6b2df1bfc56101d4e9e2e5d5d570bf41663c6.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
AMD and Intel do different things when writing zero to a segment
selector. Since neither vendor documents the behavior well and it's
easy to test the behavior, try nulling fs to see what happens.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/61588ba0e0df35beafd363dc8b68a4c5878ef095.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- intel_pstate fixes for two issues exposed by the recent switch
over from using timers and for one issue introduced during the
4.4 cycle plus new comments describing data structures used by
the driver (Rafael Wysocki, Srinivas Pandruvada).
- intel_idle fixes related to CPU offline/online (Richard Cochran).
- intel_idle support (new CPU IDs and state definitions mostly) for
Skylake-X and Kabylake processors (Len Brown).
- PCC mailbox driver fix for an out-of-bounds memory access that
may cause the kernel to panic() (Shanker Donthineni).
- New (missing) CPU ID for one apparently overlooked Haswell model
in the Intel RAPL power capping driver (Srinivas Pandruvada).
- Fix for the PM core's wakeup IRQs framework to make it work after
wakeup settings reconfiguration from sysfs (Grygorii Strashko).
- Runtime PM documentation update to make it describe what needs
to be done during device removal more precisely (Krzysztof
Kozlowski).
- Stale comment removal cleanup in the cpufreq-dt driver (Viresh
Kumar).
- turbostat utility fixes and support for Broxton, Skylake-X
and Kabylake processors (Len Brown).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXCBaNAAoJEILEb/54YlRxDrkP/jdCfB3wOcRGp6LN6GHksC3u
k2yUQl+XJZhggLda+yUK2ovV5tCJwGmP1N1B/MacRr/5MeicGXBvt6ZrHGqOfTKo
9xAGKOt2xsoAbHkleB733GIRD9TYZpqCTx9n0dka7kRArQA/uhY+TlXYY2Kn+E0B
1c4lEN3d9j4G2qW55YxnfRUOhwUc03syokOsS2z9ChEAVPSrn7Zv8V+8rPwxIz5w
uUWphlnEQgaRwzwxACHwE0bht8sl1cJ1UUSVbzf30mzRlGHiteYyI/vuE2hF9SiV
DEv9mhSB+4U6WuBHr4Cwu+nf9YTlaY1ZaS3r5EDzJaNJb8tMqPZcGitfn+fTGtUz
9GlCQZIBcP1vWBDybuuPOzAH++QUFrikozuNfUu1d+WFEypRyadMIvUhtmZPG9mh
9+Vem+ta32eXos07dx4Dvth+yNvmG5bxPnteDp3GPsnCCDlXutcKaaJaaj+1NzHi
oqHyEynMhJuN/D2h+oIVIUppKBVO55M+lJMJrX891zICs/98K2LwOtFDuNRWQml0
3yR7Ieoj5gxVfbT6zNeH5z7CxP/MymNAMjbxfiwqtGHhkNoAv92ymMxnptPiLCHn
Zyycelsve3sneV+iyk1fAOZFvDXbzTUyigxGP7Kc87iia/Yd2TjLlvwJkMNK6x/K
2OSkjJFqqrqAqcXNIz3G
=f4MM
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"Fixes for some issues discovered after recent changes and for some
that have just been found lately regardless of those changes
(intel_pstate, intel_idle, PM core, mailbox/pcc, turbostat) plus
support for some new CPU models (intel_idle, Intel RAPL driver,
turbostat) and documentation updates (intel_pstate, PM core).
Specifics:
- intel_pstate fixes for two issues exposed by the recent switch over
from using timers and for one issue introduced during the 4.4 cycle
plus new comments describing data structures used by the driver
(Rafael Wysocki, Srinivas Pandruvada).
- intel_idle fixes related to CPU offline/online (Richard Cochran).
- intel_idle support (new CPU IDs and state definitions mostly) for
Skylake-X and Kabylake processors (Len Brown).
- PCC mailbox driver fix for an out-of-bounds memory access that may
cause the kernel to panic() (Shanker Donthineni).
- New (missing) CPU ID for one apparently overlooked Haswell model in
the Intel RAPL power capping driver (Srinivas Pandruvada).
- Fix for the PM core's wakeup IRQs framework to make it work after
wakeup settings reconfiguration from sysfs (Grygorii Strashko).
- Runtime PM documentation update to make it describe what needs to
be done during device removal more precisely (Krzysztof Kozlowski).
- Stale comment removal cleanup in the cpufreq-dt driver (Viresh
Kumar).
- turbostat utility fixes and support for Broxton, Skylake-X and
Kabylake processors (Len Brown)"
* tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits)
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
intel_idle: Add KBL support
intel_idle: Add SKX support
intel_idle: Clean up all registered devices on exit.
intel_idle: Propagate hot plug errors.
intel_idle: Don't overreact to a cpuidle registration failure.
intel_idle: Setup the timer broadcast only on successful driver load.
intel_idle: Avoid a double free of the per-CPU data.
intel_idle: Fix dangling registration on error path.
intel_idle: Fix deallocation order on the driver exit path.
intel_idle: Remove redundant initialization calls.
intel_idle: Fix a helper function's return value.
intel_idle: remove useless return from void function.
...
* pm-core:
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
PM / runtime: Document steps for device removal
* powercap:
powercap: intel_rapl: Add missing Haswell model
* pm-tools:
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.
IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
others are usual stable material.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXA8x6AAoJEL/70l94x66D0x8H/RcBnc75994RQ++WmHSvD9GF
yruGB8soLDdjX+Oceol0aEPHokrBu3JtcdoTBe0GwbCKV/F5NkQZ4EfLxDtR3tte
7ILkPULLy5GElFpJNQuT4pmXzTEspFvXpqHhFik7WVBga3W9wMFQcjbrgmGBUzLE
p2aJVhZyErpKxGFkUYWhDnlqWsguTTIzv/pqNhLY4VVc0UrXN9AA0fq9RkvgU3KS
Hxk4/A6SV/b7dyzvttzITww0f1iu8FmlLj2TXapIEoOz7AnInD6KIN0RYpxbDjxN
bEzEfpahUtuDeM87/t2kHEj0Gn09iHK7/BbCC1Hrwo1CQhbAQ/D0GIvqYAQixf4=
=NugZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Miscellaneous bugfixes.
The ARM and s390 fixes are for new regressions from the merge window,
others are usual stable material"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
compiler-gcc: disable -ftracer for __noclone functions
kvm: x86: make lapic hrtimer pinned
s390/mm/kvm: fix mis-merge in gmap handling
kvm: set page dirty only if page has been writable
KVM: x86: reduce default value of halt_poll_ns parameter
KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled
KVM: x86: Inject pending interrupt even if pending nmi exist
arm64: KVM: Register CPU notifiers when the kernel runs at HYP
arm64: kvm: 4.6-rc1: Fix VTCR_EL2 VS setting
Pull x86 fixes from Thomas Gleixner:
"This lot contains:
- Some fixups for the fallout of the topology consolidation which
unearthed AMD/Intel inconsistencies
- Documentation for the x86 topology management
- Support for AMD advanced power management bits
- Two simple cleanups removing duplicated code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add advanced power management bits
x86/thread_info: Merge two !__ASSEMBLY__ sections
x86/cpufreq: Remove duplicated TDP MSR macro definitions
x86/Documentation: Start documenting x86 topology
x86/cpu: Get rid of compute_unit_id
perf/x86/amd: Cleanup Fam10h NB event constraints
x86/topology: Fix AMD core count
The recently introduced batched invalidations mechanism uses its own
mechanism for shootdown. However, it does wrong accounting of
interrupts (e.g., inc_irq_stat is called for local invalidations),
trace-points (e.g., TLB_REMOTE_SHOOTDOWN for local invalidations) and
may break some platforms as it bypasses the invalidation mechanisms of
Xen and SGI UV.
This patch reuses the existing TLB flushing mechnaisms instead. We use
NULL as mm to indicate a global invalidation is required.
Fixes 72b252aed5 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reuse module loader code to write relocations, thereby eliminating the need
for architecture specific relocation code in livepatch. Specifically, reuse
the apply_relocate_add() function in the module loader to write relocations
instead of duplicating functionality in livepatch's arch-dependent
klp_write_module_reloc() function.
In order to accomplish this, livepatch modules manage their own relocation
sections (marked with the SHF_RELA_LIVEPATCH section flag) and
livepatch-specific symbols (marked with SHN_LIVEPATCH symbol section
index). To apply livepatch relocation sections, livepatch symbols
referenced by relocs are resolved and then apply_relocate_add() is called
to apply those relocations.
In addition, remove x86 livepatch relocation code and the s390
klp_write_module_reloc() function stub. They are no longer needed since
relocation work has been offloaded to module loader.
Lastly, mark the module as a livepatch module so that the module loader
canappropriately identify and initialize it.
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # for s390 changes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Windows lets applications choose the frequency of the timer tick,
and in Windows 10 the maximum rate was changed from 1024 Hz to
2048 Hz. Unfortunately, because of the way the Windows API
works, most applications who need a higher rate than the default
64 Hz will just do
timeGetDevCaps(&tc, sizeof(tc));
timeBeginPeriod(tc.wPeriodMin);
and pick the maximum rate. This causes very high CPU usage when
playing media or games on Windows 10, even if the guest does not
actually use the CPU very much, because the frequent timer tick
causes halt_poll_ns to kick in.
There is no really good solution, especially because Microsoft
could sooner or later bump the limit to 4096 Hz, but for now
the best we can do is lower a bit the upper limit for
halt_poll_ns. :-(
Reported-by: Jon Panozzo <jonp@lime-technology.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For several years, the common practice has been to boot UVs with the
"nobau" parameter on the command line, to disable the BAU. We've
decided that it makes more sense to just disable the BAU by default in
the kernel, and provide the option to turn it on, if desired.
For now, having the on/off switch doesn't buy us any more than just
reversing the logic would, but we're working towards having the BAU
enabled by default on UV4. When those changes are in place, having the
on/off switch will make more sense than an enable flag, since the
default behavior will be different depending on the system version.
I've also added a bit of documentation for the new parameter to
Documentation/kernel-parameters.txt.
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459451909-121845-1-git-send-email-athorlton@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use static_cpu_has() in __flush_tlb_all() due to the time-sensitivity of
this one.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
AMD Carrizo (Family 15h, Model 60h) introduces a time-stamp counter
which is indicated by CPUID.8000_0001H:ECX[27]. It increments at a 100
MHz rate in all P-states, and C states, S0, or S1. The frequency is
about 100MHz. This counter will be used to calculate processor power
and other parts. So add an interface into the MSR PMU to get the PTSC
counter value.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1454056197-5893-2-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Xen supports PAT without MTRRs for its guests. In order to
enable WC attribute, it was necessary for xen_start_kernel()
to call pat_init_cache_modes() to update PAT table before
starting guest kernel.
Now that the kernel initializes PAT table to the BIOS handoff
state when MTRR is disabled, this Xen-specific PAT init code
is no longer necessary. Delete it from xen_start_kernel().
Also change __init_cache_modes() to a static function since
PAT table should not be tweaked by other modules.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-7-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A Xorg failure on qemu32 was reported as a regression [1] caused by
commit 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled").
This patch fixes the Xorg crash.
Negative effects of this regression were the following two failures [2]
in Xorg on QEMU with QEMU CPU model "qemu32" (-cpu qemu32), which were
triggered by the fact that its virtual CPU does not support MTRRs.
#1. copy_process() failed in the check in reserve_pfn_range()
copy_process
copy_mm
dup_mm
dup_mmap
copy_page_range
track_pfn_copy
reserve_pfn_range
A WC map request was tracked as WC in memtype, which set a PTE as
UC (pgprot) per __cachemode2pte_tbl[]. This led to this error in
reserve_pfn_range() called from track_pfn_copy(), which obtained
a pgprot from a PTE. It converts pgprot to page_cache_mode, which
does not necessarily result in the original page_cache_mode since
__cachemode2pte_tbl[] redirects multiple types to UC.
#2. error path in copy_process() then hit WARN_ON_ONCE in
untrack_pfn().
x86/PAT: Xorg:509 map pfn expected mapping type uncached-
minus for [mem 0xfd000000-0xfdffffff], got write-combining
Call Trace:
dump_stack
warn_slowpath_common
? untrack_pfn
? untrack_pfn
warn_slowpath_null
untrack_pfn
? __kunmap_atomic
unmap_single_vma
? pagevec_move_tail_fn
unmap_vmas
exit_mmap
mmput
copy_process.part.47
_do_fork
SyS_clone
do_syscall_32_irqs_on
entry_INT80_32
These negative effects are caused by two separate bugs, but they
can be addressed in separate patches. Fixing the pat_init() issue
described below addresses the root cause, and avoids Xorg to hit
these cases.
When the CPU does not support MTRRs, MTRR does not call pat_init(),
which leaves PAT enabled without initializing PAT. This pat_init()
issue is a long-standing issue, but manifested as issue #1 (and then
hit issue #2) with the above-mentioned commit because the memtype
now tracks cache attribute with 'page_cache_mode'.
This pat_init() issue existed before the commit, but we used pgprot
in memtype. Hence, we did not have issue #1 before. But WC request
resulted in WT in effect because WC pgrot is actually WT when PAT
is not initialized. This is not how it was designed to work. When
PAT is set to disable properly, WC is converted to UC. The use of
WT can result in a system crash if the target range does not support
WT. Fortunately, nobody ran into such issue before.
To fix this pat_init() issue, PAT code has been enhanced to provide
pat_disable() interface. Call this interface when MTRRs are disabled.
By setting PAT to disable properly, PAT bypasses the memtype check,
and avoids issue #1.
[1]: https://lkml.org/lkml/2016/3/3/828
[2]: https://lkml.org/lkml/2016/3/4/775
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-5-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for fixing a regression caused by:
9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")
... PAT needs to provide an interface that prevents the OS from
initializing the PAT MSR.
PAT MSR initialization must be done on all CPUs using the specific
sequence of operations defined in the Intel SDM. This requires MTRRs
to be enabled since pat_init() is called as part of MTRR init
from mtrr_rendezvous_handler().
Make pat_disable() as the interface that prevents the OS from
initializing the PAT MSR. MTRR will call this interface when it
cannot provide the SDM-defined sequence to initialize PAT.
This also assures that pat_disable() called from pat_bsp_init()
will set the PAT table properly when CPU does not support PAT.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-3-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for fixing a regression caused by:
9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")'
... PAT needs to support a case that PAT MSR is initialized with a
non-default value.
When pat_init() is called and PAT is disabled, it initializes the
PAT table with the BIOS default value. Xen, however, sets PAT MSR
with a non-default value to enable WC. This causes inconsistency
between the PAT table and PAT MSR when PAT is set to disable on Xen.
Change pat_init() to handle the PAT disable cases properly. Add
init_cache_modes() to handle two cases when PAT is set to disable.
1. CPU supports PAT: Set PAT table to be consistent with PAT MSR.
2. CPU does not support PAT: Set PAT table to be consistent with
PWT and PCD bits in a PTE.
Note, __init_cache_modes(), renamed from pat_init_cache_modes(),
will be changed to a static function in a later patch.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The list of CPU model specific registers contains two copies of TDP
registers, remove the one, which is out of numerical order in the
list.
Fixes: 6a35fc2d6c ("cpufreq: intel_pstate: get P1 from TAR when available")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Kristen Carlson
Accardi <kristen@linux.intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: http://lkml.kernel.org/r/1459018020-24577-1-git-send-email-vladimir_zapolskiy@mentor.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It turns out AMD gets x86_max_cores wrong when there are compute
units.
The issue is that Linux assumes:
nr_logical_cpus = nr_cores * nr_siblings
But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings
to 2 as well.
Boris: fixup ras/mce_amd_inj.c too, to compute the Node Base Core
properly, according to the new nomenclature.
Fixes: 1f12e32f4c ("x86/topology: Create logical package id")
Reported-by: Xiong Zhou <jencce.kernel@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Herrmann <aherrmann@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/20160317095220.GO6344@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Update the definition of memcpy_from_pmem() to return 0 or a negative
error code. Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe().
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull perf fixes from Ingo Molnar:
"This tree contains various perf fixes on the kernel side, plus three
hw/event-enablement late additions:
- Intel Memory Bandwidth Monitoring events and handling
- the AMD Accumulated Power Mechanism reporting facility
- more IOMMU events
... and a final round of perf tooling updates/fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
perf llvm: Use strerror_r instead of the thread unsafe strerror one
perf llvm: Use realpath to canonicalize paths
perf tools: Unexport some methods unused outside strbuf.c
perf probe: No need to use formatting strbuf method
perf help: Use asprintf instead of adhoc equivalents
perf tools: Remove unused perf_pathdup, xstrdup functions
perf tools: Do not include stringify.h from the kernel sources
tools include: Copy linux/stringify.h from the kernel
tools lib traceevent: Remove redundant CPU output
perf tools: Remove needless 'extern' from function prototypes
perf tools: Simplify die() mechanism
perf tools: Remove unused DIE_IF macro
perf script: Remove lots of unused arguments
perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
perf machine: Rename perf_event__preprocess_sample to machine__resolve
perf tools: Add cpumode to struct perf_sample
perf tests: Forward the perf_sample in the dwarf unwind test
perf tools: Remove misplaced __maybe_unused
perf list: Fix documentation of :ppp
perf bench numa: Fix assertion for nodes bitfield
...
After e76b027 ("x86,vdso: Use LSL unconditionally for vgetcpu")
native_read_tscp() is unused in the kernel. The function can be removed like
native_read_tsc() was.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1458687968-9106-1-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge third patch-bomb from Andrew Morton:
- more ocfs2 changes
- a few hotfixes
- Andy's compat cleanups
- misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
panic, ipmi, kgdb, profile, kfifo, ubsan, etc.
- many rapidio updates: fixes, new drivers.
- kcov: kernel code coverage feature. Like gcov, but not
"prohibitively expensive".
- extable code consolidation for various archs
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
ia64/extable: use generic search and sort routines
x86/extable: use generic search and sort routines
s390/extable: use generic search and sort routines
alpha/extable: use generic search and sort routines
kernel/...: convert pr_warning to pr_warn
drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
memremap: add MEMREMAP_WC flag
memremap: don't modify flags
kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
ipc/sem: make semctl setting sempid consistent
ubsan: fix tree-wide -Wmaybe-uninitialized false positives
kfifo: fix sparse complaints
scripts/gdb: account for changes in module data structure
scripts/gdb: add cmdline reader command
scripts/gdb: add version command
kernel: add kcov code coverage
profile: hide unused functions when !CONFIG_PROC_FS
hpwdt: use nmi_panic() when kernel panics in NMI handler
...
Replace the arch specific versions of search_extable() and
sort_extable() with calls to the generic ones, which now support
relative exception tables as well.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
x86's is_compat_task always checked the current syscall type, not the
task type. It has no non-arch users any more, so just remove it to
avoid confusion.
On x86, nothing should really be checking the task ABI. There are
legitimate users for the syscall ABI and for the mm ABI.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>