Originally, EEH core probes on device_node or pci_dev to populate
EEH devices and PEs, which conflicts with the fact: SRIOV VFs are
usually enabled and created by PF's driver and they don't have the
corresponding device_nodes. Instead, SRIOV VFs have dynamically
created pci_dn, which can be used for EEH probe.
The patch reworks EEH probe for PowerNV and pSeries platforms to
do probing based on pci_dn, instead of pci_dev or device_node any
more.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch adds function traverse_pci_dn(), which is similar to
traverse_pci_devices() except it takes pci_dn, not device_node
as parameter. The pci_dev.c has been reworked to create eeh_dev
from pci_dn, instead of device_node.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Originally, EEH probes on device_node or pci_dev and populates the
corresponding eeh_dev. In the subsequent patches, EEH will probes
on pci_dn and populates the corresponding eeh_dev. So we have to
cache some information in pci_dn, either from device_node or SRIOV
PF's enablement platform hook, to populate the eeh_dev properly.
The motivation to probe pci_dn, instead of device node or pci_dev,
to populate eeh_dev is SRIOV VFs are dynamically created and we
don't have the corresponding device nodes for them.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, the PCI config accessors are implemented based on device node.
Unfortunately, SRIOV VFs won't have the corresponding device nodes. pci_dn
will be used in replacement with device node for SRIOV VFs. So we have to
use pci_dn in PCI config accessors.
The patch refactors pci_dn in following aspects to make it ready to be used
in PCI config accessors as we do in subsequent patch:
* pci_dn is organized as a hierarchy tree. PCI device's pci_dn is
put to the child list of pci_dn of its upstream bridge or PHB. VF's
pci_dn will be put to the child list of pci_dn of PF's bridge.
* For one particular PCI device (VF or not), its pci_dn can be
found from pdev->dev.archdata.pci_data, PCI_DN(devnode), or
parent's list. The fast path (fetching pci_dn through PCI device
instance) is populated during early fixup time.
[bhelgaas: changelog]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
commit id 2ba9f0d has changed CONFIG_KVM_BOOK3S_64_HV to tristate to allow
HV/PR bits to be built as modules. But the MCE code still depends on
CONFIG_KVM_BOOK3S_64_HV which is wrong. When user selects
CONFIG_KVM_BOOK3S_64_HV=m to build HV/PR bits as a separate module the
relevant MCE code gets excluded.
This patch fixes the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER. This
makes sure that the relevant MCE code is included when HV/PR bits
are built as a separate modules.
Fixes: 2ba9f0d887 ("kvm: powerpc: book3s: Support building HV and PR KVM as module")
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The power7_nap(), power7_sleep() and power7_winkle() functions are
called from pnv_smp_cpu_kill_self(), which expects them to return the
SRR1 value set by the hardware on wakeup, or 0 if no nap/sleep/winkle
occurred. However, in the case where an interrupt needs to be
replayed, the logic in power7_powersave_common (the common code for
power7_nap et al.) doesn't set r3 to 0 in this case. Instead what we
get as the return value is the selector for the type of power-saving
mode requested (1, 2 or 3). In fact this should not affect the
operation of pnv_smp_cpu_kill_self(), but it is better to get this
correct, so this adds an instruction to set r3 to 0 in this case.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
While we are here, let us make timestamp related code y2038-safe.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch extends pstore, a generic interface to platform dependent
persistent storage, support for powernv platform to capture certain
useful information, during dying moments. Such support is already in
place for pseries platform. This patch re-uses most of that code.
It is a common practice to compile kernels with both CONFIG_PPC_PSERIES=y
and CONFIG_PPC_POWERNV=y. The code in nvram_init_oops_partition() routine
still works as intended, as the caller is platform specific code which
passes the appropriate value for "rtas_partition_exists" parameter.
In all other places, where CONFIG_PPC_PSERIES or CONFIG_PPC_POWERNV
flag is used in this patchset, it is to reduce the kernel size in cases
where this flag is not set and doesn't have any impact logic wise.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With minor checks, we can move most of the code for nvram
under pseries to a common place to be re-used by other
powerpc platforms like powernv. This patch moves such
common code to arch/powerpc/kernel/nvram_64.c file.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Move select of ZLIB_DEFLATE to PPC64 to fix the build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There's a new variant of POWER8 coming called "POWER8 with NVLink". The
core is identical to POWER8 but unfortunately they strapped it with a
different PVR, so we need to add an explicit entry for it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Since we can now use hypervisor doorbells for host IPIs, this makes
sure we clear the host IPI flag when taking a doorbell interrupt, and
clears any pending doorbell IPI in pnv_smp_cpu_kill_self() (as we
already do for IPIs sent via the XICS interrupt controller). Otherwise
if there did happen to be a leftover pending doorbell interrupt for
an offline CPU thread for any reason, it would prevent that thread from
going into a power-saving mode; it would instead keep waking up because
of the interrupt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The 'arg' argument to copy_thread() is only ever used when forking a new
kernel thread. Hence, rename it to 'kthread_arg' for clarity.
Signed-off-by: Alex Dowad <alexinbeijing@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have set CONFIG_PPC_OF to always 'y' in commit 0a498d96a3
("powerpc: set CONFIG_PPC_OF=y always for ARCH=powerpc") nine years
ago. And the arch/ppc also has gone away for many years. The OF
functionality was also moved to a common place and be used by many
archs. So it does make no sense to keep such a option in the current
kernel. Just kill it.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Function pcibios_set_pcie_reset_state() is possibly called by
pci_reset_function(), on which VFIO infrastructure depends to
issue reset. pcibios_set_pcie_reset_state() is issuing reset
on the parent PE of the indicated PCI device. The reset causes
state lost on all PCI devices except the indicated one as the
argument to pcibios_set_pcie_reset_state(). Also, sideband
MMIO access from guest when issuing reset would cause unexpected
EEH error.
For above two issues, the patch applies following enhancements
to pcibios_set_pcie_reset_state():
* For all PCI devices except the indicated one, save their
state prior to reset and restore state after that.
* Explicitly freeze PE prior to reset and unfreeze it after
that, in order to avoid unexpected EEH error.
Tested-by: Priya M. A <priyama2@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The flush_tlb hook in cpu_spec was introduced as a generic function hook
to invalidate TLBs. But the current implementation of flush_tlb hook
takes IS (invalidation selector) as an argument which is architecture
dependent. Hence, It is not right to have a generic routine where caller
has to pass non-generic argument.
This patch fixes this and makes flush_tlb hook as high level API.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.
One gratuitous difference between userspace and the kernel is the
VMX register definitions - the kernel uses vrX whereas both gcc and
glibc use vX.
Change the kernel to match userspace.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
After d905c5df9a ("PPC: POWERNV: move iommu_add_device earlier"), the
refcnt on the kobject backing the IOMMU group for a PCI device is
elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
set_iommu_table_base_and_group). When we go to dlpar a multi-function
PCI device out:
iommu_reconfig_notifier ->
iommu_free_table ->
iommu_group_put
BUG_ON(tbl->it_group)
We trip this BUG_ON, because there are still references on the table, so
it is not freed. Fix this by moving the powernv bus notifier to common
code and calling it for both powernv and pseries.
Fixes: d905c5df9a ("PPC: POWERNV: move iommu_add_device earlier")
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton has a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:
BUG_ON(td->cpu != smp_processor_id());
Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:
CPU: 0
Comm: watchdog/130
The problem is that we aren't ensuring the CPU active bit is set for the
secondary before allowing the master to continue on. The master unparks
the secondary CPU's kthreads and the scheduler looks for a CPU to run
on. It calls select_task_rq() and realises the suggested CPU is not in
the cpus_allowed mask. It then ends up in select_fallback_rq(), and
since the active bit isnt't set we choose some other CPU to run on.
This seems to have been introduced by 6acbfb9697 "sched: Fix hotplug
vs. set_cpus_allowed_ptr()", which changed from setting active before
online to setting active after online. However that was in turn fixing a
bug where other code assumed an active CPU was also online, so we can't
just revert that fix.
The simplest fix is just to spin waiting for both active & online to be
set. We already have a barrier prior to set_cpu_online() (which also
sets active), to ensure all other setup is completed before online &
active are set.
Fixes: 6acbfb9697 ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
enhancements and fixes mostly for ARM32, ARM64, MIPS and Power-based
devices. Additionaly the framework core underwent a bit of surgery with
two major changes. The boundary between the clock core and clock
providers (e.g clock drivers) is now more well defined with dedicated
provider helper functions. struct clk no longer maps 1:1 with the
hardware clock but is a true per-user cookie which helps us tracker
users of hardware clocks and debug bad behavior. The second major change
is the addition of rate constraints for clocks. Rate ranges are now
supported which are analogous to the voltage ranges in the regulator
framework. Unfortunately these changes to the core created some
breakeage. We think we fixed it all up but for this reason there are
lots of last minute commits trying to undo the damage.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU54D5AAoJEDqPOy9afJhJs6AQAK5YuUwjDchdpNZx9p7OnT1q
+poehuUwE/gYjmdACqYFyaPrI/9f43iNCfFAgKGLQqmB5ZK4sm4ktzfBEhjWINR2
iiCx9QYMQVGiKwC8KU0ddeBciglE2b/DwxB45m9TsJEjowucUeBzwLEIj5DsGxf7
teXRoOWgXdz1MkQJ4pnA09Q3qEPQgmu8prhMfka/v75/yn7nb9VWiJ6seR2GqTKY
sIKL9WbKjN4AzctggdqHnMSIqZoq6vew850bv2C1fPn7GiYFQfWW+jvMlVY40dp8
nNa2ixSQSIXVw4fCtZhTIZcIvZ8puc7WVLcl8fz3mUe3VJn1VaGs0E+Yd3GexpIV
7bwkTOIdS8gSRlsUaIPiMnUob5TUMmMqjF4KIh/AhP4dYrmVbU7Ie8ccvSxe31Ku
lK7ww6BFv3KweTnW/58856ZXDlXLC6x3KT+Fw58L23VhPToFgYOdTxn8AVtE/LKP
YR3UnY9BqFx6WHXVoNvg3Piyej7RH8fYmE9om8tyWc/Ab8Eo501SHs9l3b2J8snf
w/5STd2CYxyKf1/9JLGnBvGo754O9NvdzBttRlygB14gCCtS/SDk/ELG2Ae+/a9P
YgRk2+257h8PMD3qlp94dLidEZN4kYxP/J6oj0t1/TIkERWfZjzkg5tKn3/hEcU9
qM97ZBTplTm6FM+Dt/Vk
=zCVK
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus-3.20' of git://git.linaro.org/people/mike.turquette/linux
Pull clock framework updates from Mike Turquette:
"The clock framework changes contain the usual driver additions,
enhancements and fixes mostly for ARM32, ARM64, MIPS and Power-based
devices.
Additionally the framework core underwent a bit of surgery with two
major changes:
- The boundary between the clock core and clock providers (e.g clock
drivers) is now more well defined with dedicated provider helper
functions. struct clk no longer maps 1:1 with the hardware clock
but is a true per-user cookie which helps us tracker users of
hardware clocks and debug bad behavior.
- The addition of rate constraints for clocks. Rate ranges are now
supported which are analogous to the voltage ranges in the
regulator framework.
Unfortunately these changes to the core created some breakeage. We
think we fixed it all up but for this reason there are lots of last
minute commits trying to undo the damage"
* tag 'clk-for-linus-3.20' of git://git.linaro.org/people/mike.turquette/linux: (113 commits)
clk: Only recalculate the rate if needed
Revert "clk: mxs: Fix invalid 32-bit access to frac registers"
clk: qoriq: Add support for the platform PLL
powerpc/corenet: Enable CLK_QORIQ
clk: Replace explicit clk assignment with __clk_hw_set_clk
clk: Add __clk_hw_set_clk helper function
clk: Don't dereference parent clock if is NULL
MIPS: Alchemy: Remove bogus args from alchemy_clk_fgcs_detr
clkdev: Always allocate a struct clk and call __clk_get() w/ CCF
clk: shmobile: div6: Avoid division by zero in .round_rate()
clk: mxs: Fix invalid 32-bit access to frac registers
clk: omap: compile legacy omap3 clocks conditionally
clkdev: Export clk_register_clkdev
clk: Add rate constraints to clocks
clk: remove clk-private.h
pci: xgene: do not use clk-private.h
arm: omap2+ remove dead clock code
clk: Make clk API return per-user struct clk instances
clk: tegra: Define PLLD_DSI and remove dsia(b)_mux
clk: tegra: Add support for the Tegra132 CAR IP block
...
Add a new kexec preprocessor macro IND_FLAGS, which is the bitwise OR of
all the possible kexec IND_ kimage_entry indirection flags. Having this
macro allows for simplified code in the prosessing of the kexec
kimage_entry items. Also, remove the local powerpc definition and use the
generic one.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Maximilian Attems <max@stro.at>
Cc: Michal Marek <mmarek@suse.cz>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
printk and friends can now format bitmaps using '%*pb[l]'. cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.
* Spurious if (len > 1) test dropped from shared_cpu_map_show().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On POWER8 virtualised kernels the VTB register can be read to have a view
of time that only increases while the guest is running. This will prevent
guests from seeing time jump if a guest is paused for significant amounts
of time.
On POWER7 and below virtualised kernels stolen time is subtracted from
local_clock as a best effort approximation. This will not eliminate
spurious warnings in the case of a suspended guest but may reduce the
occurance in the case of softlockups due to host over commit.
Bare metal kernels should avoid reading the VTB as KVM does not restore
sane values when not executing, the approxmation is fine as host kernels
won't observe any stolen time.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Jones <drjones@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: chai wen <chaiw.fnst@cn.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Ben Zhang <benzh@chromium.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target. This is because the
restart_block is held in the same memory allocation as the kernel stack.
Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.
Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.
It's also a decent simplification, since the restart code is more or less
identical on all architectures.
[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Including:
- Update of all defconfigs
- Addition of a bunch of config options to modernise our defconfigs
- Some PS3 updates from Geoff
- Optimised memcmp for 64 bit from Anton
- Fix for kprobes that allows 'perf probe' to work from Naveen
- Several cxl updates from Ian & Ryan
- Expanded support for the '24x7' PMU from Cody & Sukadev
- Freescale updates from Scott:
"Highlights include 8xx optimizations, some more work on datapath device
tree content, e300 machine check support, t1040 corenet error reporting,
and various cleanups and fixes."
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU2/LSAAoJEFHr6jzI4aWATDAQAKPU6v2Mq0sLnGst69waHU/Q
vvpIq9hqVeSr6znHhrnazc3iQTLk0acqIdxUl/dT+5ADhi9+FxGD5Ckk+BH1DDve
g6mQelSMlVZF9hKonHsbr4iUuTUyZyx2vj2qjdgOaRiv9Xubq6vUFNeolq3AeHxv
J33vqRTmowj3VJ52u+V1dmzXQGfUye7DG2jHpjXoBieZsroTvyuYm5GoIPblWFO6
zbYRh6IitALnQRtXfwIManPyWMkJti9JX8PwDkmvacr+V+MXbrksHpIOITMhNlo1
WsVnFMpxuk80XuUfhaKZgISgBSfCqBckvKDn2QwztF2/kBnV6Su5xiOKVgouzM6B
myy+maiMZlNJlNjqdMK5v2bqMXICP048zgfMbDN2e1K25jSSlRawt0RngoCQO2EP
7aWmEDAlL3shgzkl68pj1fevQokxC/40C1yExIgAa9C31+bjtMz4Xb1SfN1SSveW
7uWEY/eG9eLsrSE1CeBDvh6B8BRdyuIHgPhux4Tgc/bUtBGFQ29NuXwKh3QCeEy9
9wWrRGx3U69eP06Ey7P5js3jPTQs80bjJewyGaiPQF5XHB89To8Dg8VfXjEV49Dx
Pa3OLL5QsQloKfEBiEhQeGfKYImC00pVYAxc0qpmnr9T+25Ri1TLdF1EBAwriSYE
5p9kSW+ZIht0lvzsdPNm
=xDU3
-----END PGP SIGNATURE-----
Merge tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc updates from Michael Ellerman:
- Update of all defconfigs
- Addition of a bunch of config options to modernise our defconfigs
- Some PS3 updates from Geoff
- Optimised memcmp for 64 bit from Anton
- Fix for kprobes that allows 'perf probe' to work from Naveen
- Several cxl updates from Ian & Ryan
- Expanded support for the '24x7' PMU from Cody & Sukadev
- Freescale updates from Scott:
"Highlights include 8xx optimizations, some more work on datapath
device tree content, e300 machine check support, t1040 corenet
error reporting, and various cleanups and fixes"
* tag 'powerpc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (102 commits)
cxl: Add missing return statement after handling AFU errror
cxl: Fail AFU initialisation if an invalid configuration record is found
cxl: Export optional AFU configuration record in sysfs
powerpc/mm: Warn on flushing tlb page in kernel context
powerpc/powernv: Add OPAL soft-poweroff routine
powerpc/perf/hv-24x7: Document sysfs event description entries
powerpc/perf/hv-gpci: add the remaining gpci requests
powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated
powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
perf: add PMU_EVENT_ATTR_STRING() helper
perf: provide sysfs_show for struct perf_pmu_events_attr
powerpc/kernel: Avoid initializing device-tree pointer twice
powerpc: Remove old compile time disabled syscall tracing code
powerpc/kernel: Make syscall_exit a local label
cxl: Fix device_node reference counting
powerpc/mm: bail out early when flushing TLB page
powerpc: defconfigs: add MTD_SPI_NOR (new dependency for M25P80)
perf/powerpc: reset event hw state when adding it to the PMU
powerpc/qe: Use strlcpy()
...
Freescale updates from Scott:
"Highlights include 8xx optimizations, some more work on datapath device
tree content, e300 machine check support, t1040 corenet error reporting,
and various cleanups and fixes."
As commit 50ba08f3 ("of/fdt: Don't clear initial_boot_params
if fdt_check_header() fails") does, the device-tree pointer
"initial_boot_params" is initialized by early_init_dt_verify(),
which is called by early_init_devtree(). So we needn't explicitly
initialize that again in early_init_devtree().
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have code to do syscall tracing which is disabled at compile time by
default. It's not been touched since the dawn of time (ie. v2.6.12).
There are now better ways to do syscall tracing, ie. using the
raw_syscall, or syscall tracepoints.
For the specific case of tracing syscalls at boot on a system that
doesn't get to userspace, you can boot with:
trace_event=syscalls tp_printk=on
Which will trace syscalls from boot, and echo all output to the console.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently when we back trace something that is in a syscall we see
something like this:
[c000000000000000] [c000000000000000] SyS_read+0x6c/0x110
[c000000000000000] [c000000000000000] syscall_exit+0x0/0x98
Although it's entirely correct, seeing syscall_exit at the bottom can be
confusing - we were exiting from a syscall and then called SyS_read() ?
If we instead change syscall_exit to be a local label we get something
more intuitive:
[c0000001fa46fde0] [c00000000026719c] SyS_read+0x6c/0x110
[c0000001fa46fe30] [c000000000009264] system_call+0x38/0xd0
ie. we were handling a system call, and it was SyS_read().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All accessed to PGD entries are done via 0(r11).
By using lower part of swapper_pg_dir as load index to r11, we can remove the
ori instruction.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
L1 base address is now aligned so we can insert L1 index into r11 directly and
then preserve r10
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kernel MMU handling code handles validity of entries via _PMD_PRESENT which
corresponds to V bit in MD_TWC and MI_TWC. When the V bit is not set, MPC8xx
triggers TLBError exception. So we don't have to check that and branch ourself
to TLBError. We can set TLB entries with non present entries, remove all those
tests and let the 8xx handle it. This reduce the number of cycle when the
entries are valid which is the case most of the time, and doesn't significantly
increase the time for handling invalid entries.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Since commit 33fb845a6f ("powerpc/8xx: Don't use MD_TWC for walk"), MD_EPN and
MD_TWC are not writen anymore in FixupDAR so saving r3 has become useless.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
On powerpc 8xx, in TLB entries, 0x400 bit is set to 1 for read-only pages
and is set to 0 for RW pages. So we should use _PAGE_RO instead of _PAGE_RW
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Remove slice_set_psize() which is not used.
It was added in 3a8247cc2c "powerpc: Only demote individual slices
rather than whole process" but was never used.
Remove vsx_assist_exception() which is not used.
It was added in ce48b21007 "powerpc: Add VSX context save/restore,
ptrace and signal support" but was never used.
Remove generic_mach_cpu_die() which is not used.
Its last caller was removed in 375f561a41 "powerpc/powernv: Always go
into nap mode when CPU is offline".
Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt() which are
unused.
These were introduced in c5d56332fd "[POWERPC] Add general support for
mpc7448hpc2 (Taiga) platform" but were never used.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[mpe: Update changelog with details on when/why they are unused]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
RTAS events require arguments be passed in big endian while hypercalls
have their arguments passed in registers and the values should therefore
be in CPU endian.
The "ibm,suspend_me" 'RTAS' call makes a sequence of hypercalls to setup
one true RTAS call. This means that "ibm,suspend_me" is handled
specially in the ppc_rtas() syscall.
The ppc_rtas() syscall has its arguments in big endian and can therefore
pass these arguments directly to the RTAS call. "ibm,suspend_me" is
handled specially from within ppc_rtas() (by calling rtas_ibm_suspend_me())
which has left an endian bug on little endian systems due to the
requirement of hypercalls. The return value from rtas_ibm_suspend_me()
gets returned in cpu endian, and is left unconverted, also a bug on
little endian systems.
rtas_ibm_suspend_me() does not actually make use of the rtas_args that
it is passed. This patch removes the convoluted use of the rtas_args
struct to pass params to rtas_ibm_suspend_me() in favour of passing what
it needs as actual arguments. This patch also ensures the two callers of
rtas_ibm_suspend_me() pass function parameters in cpu endian and in the
case of ppc_rtas(), converts the return value.
migrate_store() (the other caller of rtas_ibm_suspend_me()) is from a
sysfs file which deals with everything in cpu endian so this function
only underwent cleanup.
This patch has been tested with KVM both LE and BE and on PowerVM both
LE and BE. Under QEMU/KVM the migration happens without touching these
code pathes.
For PowerVM there is no obvious regression on BE and the LE code path
now provides the correct parameters to the hypervisor.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Resource management
- Clip bridge windows to fit in upstream windows (Yinghai Lu)
Virtualization
- Mark Atheros AR93xx to avoid using bus reset (Alex Williamson)
Miscellaneous
- Update Richard Zhu's email address (Lucas Stach)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUwqpDAAoJEFmIoMA60/r8ykIQAINgkP/iPaFMTPkTSfzTJCMY
oQVNGha4FDt6Ic1UWGyS/sYUpywSnALxlYWxVZTm5r+sGQ2yJBo6veuxvCI09YFw
lWqf6lfvkFSthWCo7pHLoNaIjKJUNCy4a2han31aAIScMCNX4YF60YorMSjQBST8
smLMG75U3U9VWaXYsV1e5gTvLa5IQh4lgaTgMAOXqd+6WcAR4WwOgD2sR06o2X43
63JF2U+ieuA789Xu2IS92TmMMESD5haEZATqdGPtxpnqyHxmBNu0Y4JkkBWD2S92
HvveOoLBT2TBfICkftvCJscBLHh7PZMIx9nLx58SnijVzX+hzVr4Zfc96MZU50MK
DuNbbZn3sO902ukOEpfih7Mg0tDxCxNytleEdAnXmZuqf+odbd/Y4AA0Hg6w7GEY
OsVGbQAT/knlTfsSZsivtmUl7l1SXzrozv+q4f4szY95v34S9pm0sWzz0IBn7oKj
h7N9Vslr3lyEudOUo1OrFq+0arDw53kwOOkIavMUH0nvTqKs4cmXBcGMfo1EfMa+
3YhjwbgpvtZ3AXi2NSBk4gIGZEmQslvgRStLhgXVDl+9DieK+sw1Vx4cKe8gu9mD
c7zPStEsJBJgd3v+8s8avwo8R0oPZb6MsCKFjjaYojTvpfFmfX0YyWE/TzYoUm6Z
+BTyA8t0+3jTArTs/Zid
=HRy7
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"These are fixes for:
- a resource management problem that causes a Radeon "Fatal error
during GPU init" on machines where the BIOS programmed an invalid
Root Port window. This was a regression in v3.16.
- an Atheros AR93xx device that doesn't handle PCI bus resets
correctly. This was a regression in v3.14.
- an out-of-date email address"
* tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Update Richard Zhu's email address
sparc/PCI: Clip bridge windows to fit in upstream windows
powerpc/PCI: Clip bridge windows to fit in upstream windows
parisc/PCI: Clip bridge windows to fit in upstream windows
mn10300/PCI: Clip bridge windows to fit in upstream windows
microblaze/PCI: Clip bridge windows to fit in upstream windows
ia64/PCI: Clip bridge windows to fit in upstream windows
frv/PCI: Clip bridge windows to fit in upstream windows
alpha/PCI: Clip bridge windows to fit in upstream windows
x86/PCI: Clip bridge windows to fit in upstream windows
PCI: Add pci_claim_bridge_resource() to clip window if necessary
PCI: Add pci_bus_clip_resource() to clip to fit upstream window
PCI: Pass bridge device, not bus, when updating bridge windows
PCI: Mark Atheros AR93xx to avoid bus reset
PCI: Add flag for devices where we can't use bus reset
When PE's frozen count hits maximal allowed frozen times, which is
5 currently, it will be forced to be offline permanently. Once the
PE is removed permanently, rebooting machine is required to bring
the PE back. It's not convienent when testing EEH functionality.
The patch exports the maximal allowed frozen times through debugfs
entry (/sys/kernel/debug/powerpc/eeh_max_freezes).
Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The conditions that one specific PE's frozen count exceeds the maximal
allowed times (EEH_MAX_ALLOWED_FREEZES) and it's in isolated or recovery
state indicate the PE was removed permanently implicitly. The patch
introduces flag EEH_PE_REMOVED to indicate that explicitly so that we
don't depend on the fixed maximal allowed times, which can be varied as
we do in subsequent patch.
Flag EEH_PE_REMOVED is expected to be marked for the PE whose frozen
count exceeds the maximal allowed times, or just failed from recovery.
Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
PE#0 should be regarded as valid for P7IOC, while it's invalid for
PHB3. The patch adds flag EEH_VALID_PE_ZERO to differentiate those
two cases. Without the patch, we possibly see frozen PE#0 state is
cleared without EEH recovery taken on P7IOC as following kernel logs
indicate:
[root@ltcfbl8eb ~]# dmesg
:
pci 0000:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0000:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0001:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0001:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0002:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0002:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0003:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:20 : [PE# 002] Secondary bus 32..63 associated with PE#2
:
EEH: Clear non-existing PHB#3-PE#0
EEH: PHB location: U78AE.001.WZS00M9-P1-002
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When calling to early_setup(), we pick "boot_paca" up for the master CPU
and initialize that with initialise_paca(). At that point, the SLB
shadow buffer isn't populated yet. Updating the SLB shadow buffer should
corrupt what we had in physical address 0 where the trap instruction is
usually stored.
This hasn't been observed to cause any trouble in practice, but is
obviously fishy.
Fixes: 6f4441ef70 ("powerpc: Dynamically allocate slb_shadow from memblock")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Once upon a time, at least 9 years ago (< 2.6.12), _TIF_SYSCALL_T_OR_A
meant "TRACE or AUDIT". But these days it means TRACE or AUDIT or
SECCOMP or TRACEPOINT or NOHZ.
All of those are implemented via syscall_dotrace() so rename the flag to
that to try and clarify things.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
pci_dn->phb is set to phb in update_dn_pci_info(), if succeed.
This patch removes the duplication of pci_dn->phb initialization.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
So the boards which has COMMON_CLK enabled don't have to
invoke this in its board specific file.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Acked-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Every PCI-PCI bridge window should fit inside an upstream bridge window
because orphaned address space is unreachable from the primary side of the
upstream bridge. If we inherit invalid bridge windows that overlap an
upstream window from firmware, clip them to fit and update the bridge
accordingly.
[bhelgaas: changelog]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
Reported-by: Marek Kordik <kordikmarek@gmail.com>
Fixes: 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Gavin Shan <gwshan@linux.vnet.ibm.com>
CC: Anton Blanchard <anton@samba.org>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Wei Yang <weiyang@linux.vnet.ibm.com>
CC: Andrew Murray <amurray@embedded-bits.co.uk>
CC: linuxppc-dev@lists.ozlabs.org
This reverts commit 7c5c92ed56.
Although this did fix the bug it was aimed at, it also broke secondary
startup on platforms that use give/take_timebase(). Unfortunately we
didn't detect that while it was in next.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have two arrays in kvm_host_state that contain register values for
the PMU. Currently we only create an asm-offsets symbol for the base of
the arrays, and do the array offset in the assembly code.
Creating an asm-offsets symbol for each field individually makes the
code much nicer to read, particularly for the MMCRx/SIxR/SDAR fields, and
might have helped us notice the recent double restore bug we had in this
code.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Alexander Graf <agraf@suse.de>