For a while we've been having issues with seemingly random interrupts
coming from nvidia cards when resuming them. Originally the fix for this
was thought to be just re-arming the MSI interrupt registers right after
re-allocating our IRQs, however it seems a lot of what we do is both
wrong and not even nessecary.
This was made apparent by what appeared to be a regression in the
mainline kernel that started introducing suspend/resume issues for
nouveau:
a0c9259dc4 (irq/matrix: Spread interrupts on allocation)
After this commit was introduced, we started getting interrupts from the
GPU before we actually re-allocated our own IRQ (see references below)
and assigned the IRQ handler. Investigating this turned out that the
problem was not with the commit, but the fact that nouveau even
free/allocates it's irqs before and after suspend/resume.
For starters: drivers in the linux kernel haven't had to handle
freeing/re-allocating their IRQs during suspend/resume cycles for quite
a while now. Nouveau seems to be one of the few drivers left that still
does this, despite the fact there's no reason we actually need to since
disabling interrupts from the device side should be enough, as the
kernel is already smart enough to know to disable host-side interrupts
for us before going into suspend. Since we were tearing down our IRQs by
hand however, that means there was a short period during resume where
interrupts could be received before we re-allocated our IRQ which would
lead to us getting an unhandled IRQ. Since we never handle said IRQ and
re-arm the interrupt registers, this would cause us to miss all of the
interrupts from the GPU and cause our init process to start timing out
on anything requiring interrupts.
So, since this whole setup/teardown every suspend/resume cycle is
useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor
functions instead so they're only called at driver load and driver
unload. This should fix most of the issues with pending interrupts on
resume, along with getting suspend/resume for nouveau to work again.
As well, this probably means we can also just remove the msi rearm call
inside nvkm_pci_init(). But since our main focus here is to fix
suspend/resume before 4.15, we'll save that for a later patch.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
On my GP107 when I load nouveau after unloading it, for some reason the
GPU stopped sending or the CPU stopped receiving interrupts if MSI was
enabled.
Doing a rearm once before getting any interrupts fixes this.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
It appears that MSI does not work on either G5 PPC nor on a E5500-based
platform, where other hardware is reported to work fine with MSI.
Both tests were conducted with NV4x hardware, so perhaps other (or even
this) hardware can be made to work. It's still possible to force-enable
with config=NvMSI=1 on load.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
v2: rename and group functions
v4: change copyright information
move printing of pcie speeds into oneinit,
rename all pcie functions to nvkm_pcie_*
don't try to raise the pcie version when no higher one is supported
v5: revert Copyright changes and rename nvkm_pcie_raise_version to nvkm_pcie_set_version
v6: remove some useless pci_is_pcie checks and rework messages
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
If the hardware supports extended tag field (8-bit ones), then enable it.
This is usually done by the VBIOS, but not on some MBPs (see fdo#86537).
In case extended tag field is not supported, 5-bit tag field is used which
limits the possible number of requests to 32. Apparently bits 7:0 of
0x08841c stores some number of outstanding requests, so cap it to 32 if
extended tag is unsupported.
Fixes: fdo#86537
v2: Restrict changes to chipsets >= 0x84
v3:
* Add nvkm_pci_mask to pci.h
* Mask bit 8 before setting it
v4:
* Rename `add` argument of nvkm_pci_mask to `value`
* Move code from nvkm_pci_init to g84_pci_init and remove PCIe and chipset
checks
v5:
* Rebase code on latest PCI structure
* Restore PCIe check
* Fix namings in nvkm_pci_mask
* Rephrase part of the commit message
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>