Make a copy of the IDT (as seen via the "sidt" instruction) read-only.
This primarily removes the IDT from being a target for arbitrary memory
write attacks, and has the added benefit of also not leaking the kernel
base offset, if it has been relocated.
We already did this on vendor == Intel and family == 5 because of the
F0 0F bug -- regardless of if a particular CPU had the F0 0F bug or
not. Since the workaround was so cheap, there simply was no reason to
be very specific. This patch extends the readonly alias to all CPUs,
but does not activate the #PF to #UD conversion code needed to deliver
the proper exception in the F0 0F case except on Intel family 5
processors.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130410192422.GA17344@www.outflux.net
Cc: Eric Northup <digitaleric@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have performance impact.
Since lazy MMU is not used on bare metal we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such
environment.
[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
updates" may cause a minor performance regression on
bare metal. This patch resolves that performance regression. It is
somewhat unclear to me if this is a good -stable candidate. ]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> SEE NOTE ABOVE
Add CYCLE_ACTIVITY.CYCLES_NO_DISPATCH/CYCLES_L1D_PENDING constraints.
These recently documented events have restrictions to counter
0-3 and counter 2 respectively. The perf scheduler needs to know
that to schedule them correctly.
IvyBridge already has the necessary constraints.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1362784968-12542-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Future AMD processors, starting with Family 16h, can provide software
with feedback on how the workload may respond to frequency change --
memory-bound workloads will not benefit from higher frequency, where
as compute-bound workloads will. This patch enables this "frequency
sensitivity feedback" to aid the ondemand governor to make better
frequency change decisions by hooking into the powersave bias.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Thomas Renninger <trenn@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRXI4YAAoJEKurIx+X31iBY1UP/Rq7DkmmffFh2faMVPMdIC/T
Z3xwbqmqNpqJOjBojocOkwpG44Da51h4dZNIdm6xtEFojqf9UdaTPobe00jOEotX
MPcc4VsuTS2WLd3dCkRHCfHu2nW6z/F4ysEtIirrHVj5JVqqH9atZPkGBB3Wcsic
00QTV/JeOP1QjvGEMxnaW2rNIXSE19NNbiHE0tx9jNYZuecOhll/GH2AOW17+Kvn
ocn4dkisbHRZ/YixuwkqXzDABj2+qiwjHZbP4f4jZbdDuuDR2KWeW0Vkoau1vRHg
TU3FX2BCBy4foA8St/AQ7WRbmSA/+5obUWvSWevXXZu7A6CxAmkv4s46D3rRuIaB
MLDqov0NobqM4vjJn4Y4bcbvYMIv8zJijok6sg2MNFIHUy9iBFTlJHOai9guSdiB
kHKTfLsCUpdkiwHMq3rcFCAMZi/DSxjjPbae8IzP7+IJAG1ff1SP6QwM6UOZq9h1
wVYpvavIOwBhKIsjghkqF0Ov/8a+4lyXMksQ3nKleKYD5908hpDYMD/2i6MTCkpy
50k3OVvnpWbGdQmYwbTiGaOW/jQ2tC72v24o/Y9tKYc2Fea0BlWn+yzIGahCn6ok
j7oQHAW1OJ2oc3h/RWtDWSejXUpYsyclCZhlFB+TQj1J/2d7VytFxo+g156xD1Rj
VJ0Dseml7vA4Orr6jAMh
=wipk
-----END PGP SIGNATURE-----
Merge tag 'please-pull-cmci_rediscover' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras
Pull clean up of the cmci_rediscover code to fix problems found by Dave Jones,
from Tony Luck.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Return an error from __copy_instruction() and use printk() to
give us a more productive message, since this is just an error
case which we can handle and also the BUG_ON() never tells us
why and what happened.
This is related to the following bug-report:
https://bugzilla.redhat.com/show_bug.cgi?id=910649
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20130404104230.22862.85242.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So gcc nags about those since forever in randconfig builds.
arch/x86/kernel/quirks.c: In function ‘ati_ixp4x0_rev’:
arch/x86/kernel/quirks.c:361:4: warning: ‘b’ is used uninitialized in this function [-Wuninitialized]
arch/x86/kernel/quirks.c: In function ‘ati_force_enable_hpet’:
arch/x86/kernel/quirks.c:367:4: warning: ‘d’ may be used uninitialized in this function [-Wuninitialized]
arch/x86/kernel/quirks.c:357:6: note: ‘d’ was declared here
arch/x86/kernel/quirks.c:407:21: warning: ‘val’ may be used uninitialized in this function [-Wuninitialized]
This function quirk is called on a SB400 chipset only anyway so the
distant possibility of a PCI access failing becomes almost impossible
there. Even if it did fail, then something else more serious is the
problem.
So zero-out the variables so that gcc shuts up but do a coarse check
on the PCI accesses at the end and signal whether any of them had an
error. They shouldn't but in case they do, we'll at least know and we
can address it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1362428180-8865-6-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
We have KERNEL_IMAGE_START and __START_KERNEL_map which both contain the
start of the kernel text mapping's virtual address. Remove the prior one
which has been replicated a lot less times around the tree.
No functionality change.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1362428180-8865-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Dave Jones reports that offlining a CPU leads to this trace:
numa_remove_cpu cpu 1 node 0: mask now 0,2-3
smpboot: CPU 1 is now offline
BUG: using smp_processor_id() in preemptible [00000000] code:
cpu-offline.sh/10591
caller is cmci_rediscover+0x6a/0xe0
Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
Call Trace:
[<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
[<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
[<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
[<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
[<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff8104c2e3>] cpu_notify+0x23/0x50
[<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
[<ffffffff815ef082>] _cpu_down+0x302/0x350
[<ffffffff815ef106>] cpu_down+0x36/0x50
[<ffffffff815f1c9d>] store_online+0x8d/0xd0
[<ffffffff813edc48>] dev_attr_store+0x18/0x30
[<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
[<ffffffff811adfb2>] vfs_write+0xa2/0x170
[<ffffffff811ae16c>] sys_write+0x4c/0xa0
[<ffffffff81613019>] system_call_fastpath+0x16/0x1b
However, a look at cmci_rediscover shows that it can be simplified quite
a bit, apart from solving the above issue. It invokes functions that
take spin locks with interrupts disabled, and hence it can run in atomic
context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
is already dead and out of the cpu_online_mask. So take these points into
account and simplify the code, and thereby also fix the above issue.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Convert AMD erratum 400 to the bug infrastructure. Then, retract all
exports for modules since they're not needed now and make the AMD
erratum checking machinery local to amd.c. Use forward declarations to
avoid shuffling too much code around needlessly.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-7-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Convert the AMD erratum 383 testing code to the bug infrastructure. This
allows keeping the AMD-specific erratum testing machinery private to
amd.c and not export symbols to modules needlessly.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-6-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
We add another 32-bit vector at the end of the ->x86_capability
bitvector which collects bugs present in CPUs. After all, a CPU bug is a
kind of a capability, albeit a strange one.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch adds support for memory profiling using the
PEBS Load Latency facility.
Load accesses are sampled by HW and the instruction
address, data address, load latency, data source, tlb,
locked information can be saved in the sampling buffer
if using the PERF_SAMPLE_COST (for latency),
PERF_SAMPLE_ADDR, PERF_SAMPLE_DATA_SRC types.
To enable PEBS Load Latency, users have to use the
model specific event:
- on NHM/WSM: MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD
- on SNB/IVB: MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD
To make things easier, this patch also exports a generic
alias via sysfs: mem-loads. It export the right event
encoding based on the host CPU and can be used directly
by the perf tool.
Loosely based on Intel's Lin Ming patch posted on LKML
in July 2011.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: acme@redhat.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-9-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a flags field to each event constraint.
It can be used to store event specific features which can
then later be used by scheduling code or low-level x86 code.
The flags are propagated into event->hw.flags during the
get_event_constraint() call. They are cleared during the
put_event_constraint() call.
This mechanism is going to be used by the PEBS-LL patches.
It avoids defining yet another table to hold event specific
information.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Cc: namhyung.kim@lge.com
Link: http://lkml.kernel.org/r/1359040242-8269-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull x86 fixes from Peter Anvin:
"A collection of minor fixes, more EFI variables paranoia
(anti-bricking) plus the ability to disable the pstore either as a
runtime default or completely, due to bricking concerns."
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efivars: Fix check for CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE
x86, microcode_intel_early: Mark apply_microcode_early() as cpuinit
efivars: Handle duplicate names from get_next_variable()
efivars: explicitly calculate length of VariableName
efivars: Add module parameter to disable use as a pstore backend
efivars: Allow disabling use as a pstore backend
x86-32, microcode_intel_early: Fix crash with CONFIG_DEBUG_VIRTUAL
x86-64: Fix the failure case in copy_user_handle_tail()
Currently number of error reporting register banks is hardcoded to
6 on AMD processors. This may break in virtualized scenarios when
a hypervisor prefers to report fewer banks than what the physical
HW provides.
Since number of supported banks is reported in MSR_IA32_MCG_CAP[7:0]
that's what we should use.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1363295441-1859-3-git-send-email-boris.ostrovsky@oracle.com
[ reverse NULL ptr test logic ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Pull perf fixes from Ingo Molnar:
"A fair chunk of the linecount comes from a fix for a tracing bug that
corrupts latency tracing buffers when the overwrite mode is changed on
the fly - the rest is mostly assorted fewliner fixlets."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Add SNB/SNB-EP scheduling constraints for cycle_activity event
kprobes/x86: Check Interrupt Flag modifier when registering probe
kprobes: Make hash_64() as always inlined
perf: Generate EXIT event only once per task context
perf: Reset hwc->last_period on sw clock events
tracing: Prevent buffer overwrite disabled for latency tracers
tracing: Keep overwrite in sync between regular and snapshot buffers
tracing: Protect tracer flags with trace_types_lock
perf tools: Fix LIBNUMA build with glibc 2.12 and older.
tracing: Fix free of probe entry by calling call_rcu_sched()
perf/POWER7: Create a sysfs format entry for Power7 events
perf probe: Fix segfault
libtraceevent: Remove hard coded include to /usr/local/include in Makefile
perf record: Fix -C option
perf tools: check if -DFORTIFY_SOURCE=2 is allowed
perf report: Fix build with NO_NEWT=1
perf annotate: Fix build with NO_NEWT=1
tracing: Fix race in snapshot swapping
Merge reason:
From: Alexander Graf <agraf@suse.de>
"Just recently this really important patch got pulled into Linus' tree for 3.9:
commit 1674400aae
Author: Anton Blanchard <anton <at> samba.org>
Date: Tue Mar 12 01:51:51 2013 +0000
Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue.
Could you please merge kvm/next against linus/master, so that I can base my trees against that?"
* upstream/master: (653 commits)
PCI: Use ROM images from firmware only if no other ROM source available
sparc: remove unused "config BITS"
sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
inet: limit length of fragment queue hash table bucket lists
qeth: Fix scatter-gather regression
qeth: Fix invalid router settings handling
qeth: delay feature trace
sgy-cts1000: Remove __dev* attributes
KVM: x86: fix deadlock in clock-in-progress request handling
KVM: allow host header to be included even for !CONFIG_KVM
hwmon: (lm75) Fix tcn75 prefix
hwmon: (lm75.h) Update header inclusion
MAINTAINERS: Remove Mark M. Hoffman
xfs: ensure we capture IO errors correctly
xfs: fix xfs_iomap_eof_prealloc_initial_size type
...
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch fixes an uninitialized pt_regs struct in drain BTS
function. The pt_regs struct is propagated all the way to the
code_get_segment() function from perf_instruction_pointer()
and may get garbage.
We cannot simply inherit the actual pt_regs from the interrupt
because BTS must be flushed on context-switch or when the
associated event is disabled. And there we do not have a pt_regs
handy.
Setting pt_regs to all zeroes may not be the best option but it
is not clear what else to do given where the drain_bts_buffer()
is called from.
In V2, we move the memset() later in the code to avoid doing it
when we end up returning early without doing the actual BTS
processing. Also dropped the reg.val initialization because it
is redundant with the memset() as suggested by PeterZ.
Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: peterz@infradead.org
Cc: sqazi@google.com
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/20130319151038.GA25439@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In 32-bit, __pa_symbol() in CONFIG_DEBUG_VIRTUAL accesses kernel data
(e.g. max_low_pfn) that not only hasn't been setup yet in such early
boot phase, but since we are in linear mode, cannot even be detected
as uninitialized.
Thus, use __pa_nodebug() rather than __pa_symbol() to get a global
symbol's physical address.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1363705484-27645-1-git-send-email-fenghua.yu@intel.com
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Currently kprobes check whether the copied instruction modifies
IF (interrupt flag) on each probe hit. This results not only in
introducing overhead but also involving
inat_get_opcode_attribute into the kprobes hot path, and it can
cause an infinite recursive call (and kernel panic in the end).
Actually, since the copied instruction itself can never be modified
on the buffer, it is needless to analyze the instruction on every
probe hit.
To fix this issue, we check it only once when registering probe
and store the result on ainsn->if_modifier.
Reported-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130314115242.19690.33573.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 1d9d8639c0 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") fixed a crash when doing PEBS performance profiling
after resuming, but in using init_debug_store_on_cpu() to restore the
DS_AREA mtrr it also resulted in a new WARN_ON() triggering.
init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
cross-calls to do the MSR update. Which is not really valid at the
early resume stage, and the warning is quite reasonable. Now, it all
happens to _work_, for the simple reason that smp_call_function_single()
ends up just doing the call directly on the CPU when the CPU number
matches, but we really should just do the wrmsr() directly instead.
This duplicates the wrmsr() logic, but hopefully we can just remove the
wrmsr_on_cpu() version eventually.
Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some new Intel Atom processors (Penwell and Cloverview), there is
a feature that the TSC won't stop in S3 state, say the TSC value
won't be reset to 0 after resume. This feature makes TSC a more reliable
clocksource and could benefit the timekeeping code during system
suspend/resume cycle, so add a flag for it.
Signed-off-by: Feng Tang <feng.tang@intel.com>
[jstultz: Fix checkpatch warning]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Every 11 minutes ntp attempts to update the x86 rtc with the current
system time. Currently, the x86 code only updates the rtc if the system
time is within +/-15 minutes of the current value of the rtc. This
was done originally to avoid setting the RTC if the RTC was in localtime
mode (common with Windows dualbooting). Other architectures do a full
synchronization and now that we have better infrastructure to detect
when the RTC is in localtime, there is no reason that x86 should be
software limited to a 30 minute window.
This patch changes the behavior of the kernel to do a full synchronization
(year, month, day, hour, minute, and second) of the rtc when ntp requests
a synchronization between the system time and the rtc.
I've used the RTC library functions in this patchset as they do all the
required bounds checking.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: x86@kernel.org
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[jstultz: Tweak commit message, fold in build fix found by fengguang
Also add select RTC_LIB to X86, per new dependency, as found by prarit]
Signed-off-by: John Stultz <john.stultz@linaro.org>
This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps it power-on/resume value of 0 causing any PEBS
measurement to crash when running on CPU0.
The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit e44b7b7 ("x86: move suspend wakeup code to C") didn't
care to also eliminate the side effects that the earlier 4c49156
("x86: make arch/x86/kernel/acpi/wakeup_32.S use a separate")
had, thus leaving a now pointless, almost page size gap at the
beginning of .text.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pavel Machek <pavel@ucw.cz>
Link: http://lkml.kernel.org/r/513DBAA402000078000C4896@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On exception exit, we restore the previous context tracking state based on
the regs of the interrupted frame. Iff that frame is in user mode as
stated by user_mode() helper, we restore the context tracking user mode.
However there is a tiny chunck of low level arch code after we pass through
user_enter() and until the CPU eventually resumes userspace.
If an exception happens in this tiny area, exception_enter() correctly
exits the context tracking user mode but exception_exit() won't restore
it because of the value returned by user_mode(regs).
As a result we may return to userspace with the wrong context tracking
state.
To fix this, change exception_enter() to return the context tracking state
prior to its call and pass this saved state to exception_exit(). This restores
the real context tracking state of the interrupted frame.
(May be this patch was suggested to me, I don't recall exactly. If so,
sorry for the missing credit).
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mats Liljegren <mats.liljegren@enea.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Exceptions handling on context tracking should share common
treatment: on entry we exit user mode if the exception triggered
in that context. Then on exception exit we return to that previous
context.
Generalize this to avoid duplication across archs.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mats Liljegren <mats.liljegren@enea.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The commit 27be457000
('x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok
flag') removed the hlt_works_ok flag from struct cpuinfo_x86, but
boot_cpu_data and new_cpu_data initializers were not changed
causing setting f00f_bug flag, instead of fdiv_bug.
If CONFIG_X86_F00F_BUG is not set the f00f_bug flag is never
cleared.
To avoid such problems in future C99-style initialization is now
used.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1362266082-2227-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The cpuinfo_x86 ptr is unused now. Drop it. Got obsolete by 69fb3676df
("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
removing its only user.
[ hpa: fixes gcc warning ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1362428180-8865-2-git-send-email-bp@alien8.de
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* master: (15791 commits)
Linux 3.9-rc1
btrfs/raid56: Add missing #include <linux/vmalloc.h>
fix compat_sys_rt_sigprocmask()
SUNRPC: One line comment fix
ext4: enable quotas before orphan cleanup
ext4: don't allow quota mount options when quota feature enabled
ext4: fix a warning from sparse check for ext4_dir_llseek
ext4: convert number of blocks to clusters properly
ext4: fix possible memory leak in ext4_remount()
jbd2: fix ERR_PTR dereference in jbd2__journal_start
metag: Provide dma_get_sgtable()
metag: prom.h: remove declaration of metag_dt_memblock_reserve()
metag: copy devicetree to non-init memory
metag: cleanup metag_ksyms.c includes
metag: move mm/init.c exports out of metag_ksyms.c
metag: move usercopy.c exports out of metag_ksyms.c
metag: move setup.c exports out of metag_ksyms.c
metag: move kick.c exports out of metag_ksyms.c
metag: move traps.c exports out of metag_ksyms.c
metag: move irq enable out of irqflags.h on SMP
...
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Conflicts:
arch/x86/kernel/kvmclock.c
Put all config options needed to run Linux as a guest behind a
CONFIG_HYPERVISOR_GUEST menu so that they don't get built-in by default
but be selectable by the user. Also, make all units which depend on
x86_hyper, depend on this new symbol so that compilation doesn't fail
when CONFIG_HYPERVISOR_GUEST is disabled but those units assume its
presence.
Sort options in the new HYPERVISOR_GUEST menu, adapt config text and
drop redundant select.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1362428421-9244-3-git-send-email-bp@alien8.de
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
... and switch i386 to HAVE_SYSCALL_WRAPPERS, killing open-coded
uses of asmlinkage_protect() in a bunch of syscalls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull more VFS bits from Al Viro:
"Unfortunately, it looks like xattr series will have to wait until the
next cycle ;-/
This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
more file_inode() work"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
constify path_get/path_put and fs_struct.c stuff
fix nommu breakage in shmem.c
cache the value of file_inode() in struct file
9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
9p: make sure ->lookup() adds fid to the right dentry
9p: untangle ->lookup() a bit
9p: double iput() in ->lookup() if d_materialise_unique() fails
9p: v9fs_fid_add() can't fail now
v9fs: get rid of v9fs_dentry
9p: turn fid->dlist into hlist
9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
more file_inode() open-coded instances
selinux: opened file can't have NULL or negative ->f_path.dentry
(In the meantime, the hlist traversal macros have changed, so this
required a semantic conflict fixup for the newly hlistified fid->dlist)