When calling into eeh_gather_pci_data() on pSeries platform, we
possiblly don't have pci_dev instance yet, but eeh_dev is always
ready. So we use cached capability from eeh_dev instead of pci_dev
for log dump there. In order to keep things unified, we also cache
PCI capability positions to eeh_dev for PowerNV as well.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch replaces printk(KERN_WARNING ...) with pr_warn() in the
function eeh_gather_pci_data().
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have suffered recrusive frozen PE a lot, which was caused
by IO accesses during the PE reset. Ben came up with the good
idea to keep frozen PE until recovery (BAR restore) gets done.
With that, IO accesses during PE reset are dropped by hardware
and wouldn't incur the recrusive frozen PE any more.
The patch implements the idea. We don't clear the frozen state
until PE reset is done completely. During the period, the EEH
core expects unfrozen state from backend to keep going. So we
have to reuse EEH_PE_RESET flag, which has been set during PE
reset, to return normal state from backend. The side effect is
we have to clear frozen state for towice (PE reset and clear it
explicitly), but that's harmless.
We have some limitations on pHyp. pHyp doesn't allow to enable
IO or DMA for unfrozen PE. So we don't enable them on unfrozen PE
in eeh_pci_enable(). We have to enable IO before grabbing logs on
pHyp. Otherwise, 0xFF's is always returned from PCI config space.
Also, we had wrong return value from eeh_pci_enable() for
EEH_OPT_THAW_DMA case. The patch fixes it too.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We've observed multiple PE reset failures because of PCI-CFG
access during that period. Potentially, some device drivers
can't support EEH very well and they can't put the device to
motionless state before PE reset. So those device drivers might
produce PCI-CFG accesses during PE reset. Also, we could have
PCI-CFG access from user space (e.g. "lspci"). Since access to
frozen PE should return 0xFF's, we can block PCI-CFG access
during the period of PE reset so that we won't get recrusive EEH
errors.
The patch adds flag EEH_PE_RESET, which is kept during PE reset.
The PowerNV/pSeries PCI-CFG accessors reuse the flag to block
PCI-CFG accordingly.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When doing PE reset, EEH_PE_ISOLATED is cleared unconditionally.
However, We should remove that if the PE reset has cleared the
frozen state successfully. Otherwise, the flag should be kept.
The patch fixes the issue.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The PE state (for eeh_pe instance) EEH_PE_PHB_DEAD is duplicate to
EEH_PE_ISOLATED. Originally, those PHBs (PHB PE) with EEH_PE_PHB_DEAD
would be removed from the system. However, it's safe to replace
that with EEH_PE_ISOLATED.
The patch also clear EEH_PE_RECOVERING after fenced PHB has been handled,
either failure or success. It makes the PHB PE state consistent with:
PHB functions normally NONE
PHB has been removed EEH_PE_ISOLATED
PHB fenced, recovery in progress EEH_PE_ISOLATED | RECOVERING
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
module_init should return 0 or a negative errno.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit aac416fc38 (lkdtm: flush icache and report actions) calls
flush_icache_range from a module. It's exported on most architectures
that implement it, but not on powerpc. This patch exports it to fix
the module link failure.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
__ftrace_make_call assumed ABIv1 TOC stack offsets, so it
broke on ABIv2.
While we are here, we can simplify the instruction modification
code. Since we always update one instruction there is no need to
probe_kernel_write and flush_icache_range, just use patch_branch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Now we have is_module_trampoline() and module_trampoline_target()
we can remove a bunch of intimate kernel module trampoline
knowledge from ftrace.
Signed-off-by: Anton Blanchard <anton@samba.org>
ftrace has way too much knowledge of our kernel module trampoline
layout hidden inside it. Create module_trampoline_target() that gives
the target address of a kernel module trampoline.
Signed-off-by: Anton Blanchard <anton@samba.org>
ftrace has way too much knowledge of our kernel module trampoline
layout hidden inside it. Create is_module_trampoline() that can
abstract this away inside the module loader code.
Signed-off-by: Anton Blanchard <anton@samba.org>
When testing the ftrace function tracer, I realised that ftrace_caller
and mcount are called from modules and they both call into C, therefore
they need the ABIv2 global entry point to establish r2.
Signed-off-by: Anton Blanchard <anton@samba.org>
ELFv2 doesn't use function descriptors, because it doesn't need to
load a new r2 when calling into a function. On the other hand, you're
supposed to use a local entry point for R_PPC_REL24 branches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In ELFv2, r12 is supposed to equal to PC on entry to a function.
Our stubs use r11, so change swap that with r12.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
ELFv2 doesn't use function descriptors, so we don't expect symbols to
start with ".". But because depmod and modpost strip ".", and we have
the special symbol ".TOC.", we still need to do it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The new ELF ABI tends to use R_PPC64_REL16_LO and R_PPC64_REL16_HA
relocations (PC-relative), so implement them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The kernel resolved the '.TOC.' to a fake symbol, so we need to fix it up
to point to our .toc section plus 0x8000.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For the ELFv2 ABI, powerpc introduces a magic symbol ".TOC.". If we
don't create a CRC for it (minus the leading ".", since we strip that)
we get a modpost warning about missing CRC and the CRC array seems to
be displaced by 1 so other CRCs mismatch too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
For the ELFv2 ABI, powerpc introduces a magic symbol ".TOC.". depmod
then complains that this doesn't resolve (so does modpost, but we could
easily fix that). To export this, we need to use asm.
modpost and depmod both strip "." from symbols for the old PPC64 ELFv1
ABI, so we actually export a "TOC.".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There is no need to put a function descriptor in
__secondary_hold_spinloop. Use ppc_function_entry to get the
instruction address and put it in __secondary_hold_spinloop instead.
Also fix an issue where we assumed cur_cpu_spec held a function
descriptor.
Signed-off-by: Anton Blanchard <anton@samba.org>
Change how we setup registers for ret_from_kernel_thread. In
ABIv1, instead of passing a function descriptor in, dereference
it and pass the target in directly.
Use ppc_global_function_entry to get it right on both ABIv1 and ABIv2.
Signed-off-by: Anton Blanchard <anton@samba.org>
To establish addressability quickly, ABIv2 requires the target
address of the function being called to be in r12. Fix a number of
places in assembly code that we do indirect function calls.
We need to avoid function descriptors on ABIv2 too.
Signed-off-by: Anton Blanchard <anton@samba.org>
There are a few places we have to use dot symbols with the
current ABI - the syscall table and the kvm hcall table.
Wrap both of these with a new macro called DOTSYM so it will
be easy to transition away from dot symbols in a future ABI.
Signed-off-by: Anton Blanchard <anton@samba.org>
STD_EXCEPTION_COMMON, STD_EXCEPTION_COMMON_ASYNC and
MASKABLE_EXCEPTION branch to the handler, so we can remove
the explicit dot symbol and binutils will do the right thing.
Signed-off-by: Anton Blanchard <anton@samba.org>
There is no need to create a function descriptor for the system call
table. By using one we force the system call table into the text
section and it really belongs in the rodata section.
This also removes another use of dot symbols.
Signed-off-by: Anton Blanchard <anton@samba.org>
We have a number of places where we load the text address of a local
function and indirectly branch to it in assembly. Since it is an
indirect branch binutils will not know to use the function text
address, so that trick wont work.
There is no need for these functions to have a function descriptor
so we can replace it with a label and remove the dot symbol.
Signed-off-by: Anton Blanchard <anton@samba.org>
binutils is smart enough to know that a branch to a function
descriptor is actually a branch to the functions text address.
Alan tells me that binutils has been doing this for 9 years.
Signed-off-by: Anton Blanchard <anton@samba.org>
Powerpc allows reordering over its ll/sc implementation. Implement the
two new barriers as appropriate.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-gg2ffgq32sjgy9b8lj6m3hsc@git.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
3bc955987f ("powerpc/PCI: Use list_for_each_entry() for bus traversal")
caused a NULL pointer dereference because the loop body set the iterator to
NULL:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000041d78
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c000000000041d78] .sys_pciconfig_iobase+0x68/0x1f0
LR [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0
Call Trace:
[c0000003b4787db0] [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0 (unreliable)
[c0000003b4787e30] [c000000000009ed8] syscall_exit+0x0/0x98
Fix it by using a temporary variable for the iterator.
[bhelgaas: changelog, drop tmp_bus initialization]
Fixes: 3bc955987f powerpc/PCI: Use list_for_each_entry() for bus traversal
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
3bc955987f ("powerpc/PCI: Use list_for_each_entry() for bus traversal")
caused a NULL pointer dereference because the loop body set the iterator to
NULL:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000041d78
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c000000000041d78] .sys_pciconfig_iobase+0x68/0x1f0
LR [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0
Call Trace:
[c0000003b4787db0] [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0 (unreliable)
[c0000003b4787e30] [c000000000009ed8] syscall_exit+0x0/0x98
Fix it by using a temporary variable for the iterator.
[bhelgaas: changelog, drop tmp_bus initialization]
Fixes: 3bc955987f powerpc/PCI: Use list_for_each_entry() for bus traversal
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Commit 8f619b5429 ("powerpc/ppc64: Do not turn AIL (reloc-on
interrupts) too early") added code to set the AIL bit in the LPCR
without checking whether the kernel is running in hypervisor mode. The
result is that when the kernel is running as a guest (i.e., under
PowerKVM or PowerVM), the processor takes a privileged instruction
interrupt at that point, causing a panic. The visible result is that
the kernel hangs after printing "returning from prom_init".
This fixes it by checking for hypervisor mode being available before
setting LPCR. If we are not in hypervisor mode, we enable relocation-on
interrupts later in pSeries_setup_arch using the H_SET_MODE hcall.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more powerpc updates from Ben Herrenschmidt:
"Here are a few more powerpc things for you.
So you'll find here the conversion of the two new firmware sysfs
interfaces to the new API for self-removing files that Greg and Tejun
introduced, so they can finally remove the old one.
I'm also reverting the hwmon driver for powernv. I shouldn't have
merged it, I got a bit carried away here. I hadn't realized it was
never CCed to the relevant maintainer(s) and list(s), and happens to
have some issues so I'm taking it out and it will come back via the
proper channels.
The rest is a bunch of LE fixes (argh, some of the new stuff was
broken on LE, I really need to start testing LE myself !) and various
random fixes here and there.
Finally one bit that's not strictly a fix, which is the HVC OPAL
change to "kick" the HVC thread when the firmware tells us there is
new incoming data. I don't feel like waiting for this one, it's
simple enough, and it makes a big difference in console responsiveness
which is good for my nerves"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (26 commits)
powerpc/powernv Adapt opal-elog and opal-dump to new sysfs_remove_file_self
Revert "powerpc/powernv: hwmon driver for power values, fan rpm and temperature"
power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
powerpc/le: Avoid creatng R_PPC64_TOCSAVE relocations for modules.
arch/powerpc: Use RCU_INIT_POINTER(x, NULL) in platforms/cell/spu_syscalls.c
powerpc/opal: Add missing include
powerpc: Convert last uses of __FUNCTION__ to __func__
powerpc: Add lq/stq emulation
powerpc/powernv: Add invalid OPAL call
powerpc/powernv: Add OPAL message log interface
powerpc/book3s: Fix mc_recoverable_range buffer overrun issue.
powerpc: Remove dead code in sycall entry
powerpc: Use of_node_init() for the fakenode in msi_bitmap.c
powerpc/mm: NUMA pte should be handled via slow path in get_user_pages_fast()
powerpc/powernv: Fix endian issues with sensor code
powerpc/powernv: Fix endian issues with OPAL async code
tty/hvc_opal: Kick the HVC thread on OPAL console events
powerpc/powernv: Add opal_notifier_unregister() and export to modules
powerpc/ppc64: Do not turn AIL (reloc-on interrupts) too early
powerpc/ppc64: Gracefully handle early interrupts
...
Recent CPUs support quad word load and store instructions. Add
support to the alignment handler for them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In:
commit 742415d6b6
Author: Michael Neuling <mikey@neuling.org>
powerpc: Turn syscall handler into macros
We converted the syscall entry code onto macros, but in doing this we
introduced some cruft that's never run and should never have been added.
This removes that code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The purpose of this single series of commits from Srivatsa S Bhat (with
a small piece from Gautham R Shenoy) touching multiple subsystems that use
CPU hotplug notifiers is to provide a way to register them that will not
lead to deadlocks with CPU online/offline operations as described in the
changelog of commit 93ae4f978c (CPU hotplug: Provide lockless versions
of callback registration functions).
The first three commits in the series introduce the API and document it
and the rest simply goes through the users of CPU hotplug notifiers and
converts them to using the new method.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJTQow2AAoJEILEb/54YlRxW4QQAJlYRDUzwFJzJzYhltQYuVR+
4D74XMtvXgoJfg3cwdSWvMKKpJZnA9BVN0f7Hcx9wYmgdexYUuHeZJmMNyc3S2+g
KjKBIsugvgmZhHbbLd6TJ6GBbhGT5JLt9VmSfL9zIkveInU1YHFUUqL/mxdHm4J0
BSGKjk2rN3waRJgmY+xfliFLtQjDKFwJpMuvrgtoUyfas3f4sIV43UNbqdvA/weJ
rzedxXOlKH/id4b56lj/4iIzcoL3mwvJJ7r6n0CEMsKv87z09kqR0O+69Tsq/cgs
j17CsvoJOmZGk3QTeKVMQWBsvk6aPoDu3zK83gLbQMt+qjOpSTbJLz/3HZw4/TrW
ss4nuZne1DLMGS+6hoxYbTP+6Ni//Kn+l/LrHc5jb7m1X3lMO4W2aV3IROtIE1rv
lEP1IG01NU4u9YwkVj1dyhrkSp8tLPul4SrUK8W+oNweOC5crjJV7vJbIPJgmYiM
IZN55wln0yVRtR4TX+rmvN0PixsInE8MeaVCmReApyF9pdzul/StxlBze5BKLSJD
cqo1kNPpsmdxoDucqUpQ/gSvy+IOl2qnlisB5PpV93sk7De6TFDYrGHxjYIW7jMf
StXwdCDDQhzd2Q8Kfpp895A1dbIl8rKtwA6bTU2eX+BfMVFzuMdT44cvosx1+UdQ
sWl//rg76nb13dFjvF+q
=SW7Q
-----END PGP SIGNATURE-----
Merge tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull CPU hotplug notifiers registration fixes from Rafael Wysocki:
"The purpose of this single series of commits from Srivatsa S Bhat
(with a small piece from Gautham R Shenoy) touching multiple
subsystems that use CPU hotplug notifiers is to provide a way to
register them that will not lead to deadlocks with CPU online/offline
operations as described in the changelog of commit 93ae4f978c ("CPU
hotplug: Provide lockless versions of callback registration
functions").
The first three commits in the series introduce the API and document
it and the rest simply goes through the users of CPU hotplug notifiers
and converts them to using the new method"
* tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
net/iucv/iucv.c: Fix CPU hotplug callback registration
net/core/flow.c: Fix CPU hotplug callback registration
mm, zswap: Fix CPU hotplug callback registration
mm, vmstat: Fix CPU hotplug callback registration
profile: Fix CPU hotplug callback registration
trace, ring-buffer: Fix CPU hotplug callback registration
xen, balloon: Fix CPU hotplug callback registration
hwmon, via-cputemp: Fix CPU hotplug callback registration
hwmon, coretemp: Fix CPU hotplug callback registration
thermal, x86-pkg-temp: Fix CPU hotplug callback registration
octeon, watchdog: Fix CPU hotplug callback registration
oprofile, nmi-timer: Fix CPU hotplug callback registration
intel-idle: Fix CPU hotplug callback registration
clocksource, dummy-timer: Fix CPU hotplug callback registration
drivers/base/topology.c: Fix CPU hotplug callback registration
acpi-cpufreq: Fix CPU hotplug callback registration
zsmalloc: Fix CPU hotplug callback registration
scsi, fcoe: Fix CPU hotplug callback registration
scsi, bnx2fc: Fix CPU hotplug callback registration
scsi, bnx2i: Fix CPU hotplug callback registration
...
Turn them on at the same time as we allow MSR_IR/DR in the paca
kernel MSR, ie, after the MMU has been setup enough to be able
to handle relocated access to the linear mapping.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we take an interrupt such as a trap caused by a BUG_ON before the
MMU has been setup, the interrupt handlers try to enable virutal mode
and cause a recursive crash, making the original problem very hard
to debug.
This fixes it by adjusting the "kernel_msr" value in the PACA so that
it only has MSR_IR and MSR_DR (translation for instruction and data)
set after the MMU has been initialized for the processor.
We may still not have a console yet but at least we don't get into
a recursive fault (and early debug console or memory dump via JTAG
of the kernel buffer *will* give us the proper error).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
All our cpu feature updates were done for every CPU in the device-tree,
thus overwriting the cputable bits over and over again. Instead do them
only for the boot CPU.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move the definition to setup-common.c and set the init value
to -1 on both 32 and 64-bit (it was 0 on 64-bit).
Additionally add a check to prom.c to garantee that the init
value has been udpated after the DT scan.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For historical reasons that code was under #ifdef CONFIG_PPC_PSERIES
but it applies equally to all 64-bit platforms.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We can't take an IRQ when we're about to do a trechkpt as our GPR state is set
to user GPR values.
We've hit this when running some IBM Java stress tests in the lab resulting in
the following dump:
cpu 0x3f: Vector: 700 (Program Check) at [c000000007eb3d40]
pc: c000000000050074: restore_gprs+0xc0/0x148
lr: 00000000b52a8184
sp: ac57d360
msr: 8000000100201030
current = 0xc00000002c500000
paca = 0xc000000007dbfc00 softe: 0 irq_happened: 0x00
pid = 34535, comm = Pooled Thread #
R00 = 00000000b52a8184 R16 = 00000000b3e48fda
R01 = 00000000ac57d360 R17 = 00000000ade79bd8
R02 = 00000000ac586930 R18 = 000000000fac9bcc
R03 = 00000000ade60000 R19 = 00000000ac57f930
R04 = 00000000f6624918 R20 = 00000000ade79be8
R05 = 00000000f663f238 R21 = 00000000ac218a54
R06 = 0000000000000002 R22 = 000000000f956280
R07 = 0000000000000008 R23 = 000000000000007e
R08 = 000000000000000a R24 = 000000000000000c
R09 = 00000000b6e69160 R25 = 00000000b424cf00
R10 = 0000000000000181 R26 = 00000000f66256d4
R11 = 000000000f365ec0 R27 = 00000000b6fdcdd0
R12 = 00000000f66400f0 R28 = 0000000000000001
R13 = 00000000ada71900 R29 = 00000000ade5a300
R14 = 00000000ac2185a8 R30 = 00000000f663f238
R15 = 0000000000000004 R31 = 00000000f6624918
pc = c000000000050074 restore_gprs+0xc0/0x148
cfar= c00000000004fe28 dont_restore_vec+0x1c/0x1a4
lr = 00000000b52a8184
msr = 8000000100201030 cr = 24804888
ctr = 0000000000000000 xer = 0000000000000000 trap = 700
This moves tm_recheckpoint to a C function and moves the tm_restore_sprs into
that function. It then adds IRQ disabling over the trechkpt critical section.
It also sets the TEXASR FS in the signals code to ensure this is never set now
that we explictly write the TM sprs in tm_recheckpoint.
Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>