When the PMU/SEC2 LS FWs have booted, they'll send a message to the host
with various information, including the configuration of message/command
queues that are available.
Move the handling for this to the relevant subdevs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This moves the code to generate commands for the ACR unit of the PMU/SEC2 LS
firmwares to those subdevs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Takes the command queue pointer directly instead of requiring a function to
lookup based on an queue type, as well as an explicit timeout value.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.
Arbitrary private data passed to callbacks is to allow for something other
than struct nvkm_msgqueue to be passed into the callback (like the pointer
to the subdev itself, for example), and the return code will be used where
we'd like to detect failure from synchronous messages.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.
This is an incremental step towards that goal.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.
This is an incremental step towards that goal.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.
This is an incremental step towards that goal.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
NVDEC is available from GM107, and we currently only have a stub
implementation anyway, let's make it explicit.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
ACR LS FW loading is moving out of SECBOOT and into their specific subdevs,
and the available GP108/GV100 FWs differ from the other GP10x boards.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
ACR LS FW loading is moving out of SECBOOT and into their specific subdevs,
and the available GP107/GP108 FWs have interface differences.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
ACR LS FW loading is moving out of SECBOOT and into their specific subdevs,
and the available GM20B/GP10B FWs have interface differences.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
PMU, SEC2 and GR will be modified to register their falcons with ACR before
the main commit switching everything over.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Will be used in upcoming commits to allow subdevs to better customise
themselves based on which (if any) firmware is available.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We want to be able to register falcons with ACR during the constructor for
the subdev it belongs to, however, we may not have access to the falcon's
registers prior to DEVINIT.
Delay touching registers until the first time the falcon is acquired.
This may temporarily break secboot on non-production boards due to not
being able to determine whether the falcon is in debug or production mode,
the new ACR subdev will not have this issue, and it's not a use-case that's
terribly important for bisectability.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Turing introduced a new simplified page kind
scheme, reducing the number of possible page
kinds from 256 to 16. It also is the first
NVIDIA GPU in which the highest possible page
kind value is not reserved as an "invalid" page
kind.
To address this, the invalid page kind is made
an explicit property of the MMU HAL, and a new
table of page kinds is added to the tu102 MMU
HAL.
One hardware change not addressed here is that
0x00 is technically no longer a supported page
kind, and pitch surfaces are instead intended to
share the block-linear generic page kind 0x06.
However, because that will be a rather invasive
change to nouveau and 0x00 still works fine in
practice on Turing hardware, addressing this new
behavior is deferred.
Signed-off-by: James Jones <jajones@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There are extra registers that need to be programmed to make the level 2
cache work on GP10B, such as the stream ID register that is used when an
SMMU is used to translate memory addresses.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There is no BAR2 on GP10B and there is no need to map through BAR2
because all memory is shared between the GPU and the CPU. Add a custom
implementation of the fault sub-device that uses nvkm_memory_addr()
instead of nvkm_memory_bar2() to return the address of a pinned fault
buffer.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Added GPIO is "Power Alert". It's uncertain if this
GPIO is set on GPU initialization or only if a change is detected by the
GPU at runtime.
This GPIO can be found on Tesla and sometimes on Fermi GPUs.
Untested, wrote according to documentation.
Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Added GPIO is "Thermal and External Power Detect". It's uncertain if this
GPIO is set on GPU initialization or only if a change is detected by the
GPU at runtime.
This GPIO can be found in Rankine and Curie and rarely on Tesla GPUs
VBIOS.
Untested, wrote according to documentation.
Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Currently, nouveau doesn't check if GPU is missing power. This
patch makes nouveau fail when this happens on latest GPUs.
It checks GPIO function 121 (External Power Emergency), which
should detect power problems on GPU initialization.
This can be disabled with nouveau.config=NvPowerChecks=1
Tested on TU104, GP106 and GF100.
v3:
* Add config override for disabling power checks
Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
One gpio was in wrong place, moved it for better readability.
Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There's already a condition in place which attempts to detect this, but
since we've begun to require a PMU subdev even on boards where we don't
load a custom FW, it's become inaccurate.
This will prevent unnecessarily running a periodic fan update thread on
GP100 and newer, where we don't yet override the default PMU FW.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The bulk SPDX addition made all these files into GPL-2.0 licensed files.
However the remainder of the project is MIT-licensed, these files
(primarily header files) were simply missing the boiler plate and got
caught up in the global update.
Fixes: b24413180f (License cleanup: add SPDX GPL-2.0 license identifier to files with no license)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We have a need for this now with updated SEC2 LS FW images that have an
incompatible interface from the previous version.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
For a while, we've had the problem of i2c bus access not grabbing
a runtime PM ref when it's being used in userspace by i2c-dev, resulting
in nouveau spamming the kernel log with errors if anything attempts to
access the i2c bus while the GPU is in runtime suspend. An example:
[ 130.078386] nouveau 0000:01:00.0: i2c: aux 000d: begin idle timeout ffffffff
Since the GPU is in runtime suspend, the MMIO region that the i2c bus is
on isn't accessible. On x86, the standard behavior for accessing an
unavailable MMIO region is to just return ~0.
Except, that turned out to be a lie. While computers with a clean
concious will return ~0 in this scenario, some machines will actually
completely hang a CPU on certian bad MMIO accesses. This was witnessed
with someone's Lenovo ThinkPad P50, where sensors-detect attempting to
access the i2c bus while the GPU was suspended would result in a CPU
hang:
CPU: 5 PID: 12438 Comm: sensors-detect Not tainted 5.0.0-0.rc4.git3.1.fc30.x86_64 #1
Hardware name: LENOVO 20EQS64N17/20EQS64N17, BIOS N1EET74W (1.47 ) 11/21/2017
RIP: 0010:ioread32+0x2b/0x30
Code: 81 ff ff ff 03 00 77 20 48 81 ff 00 00 01 00 76 05 0f b7 d7 ed c3
48 c7 c6 e1 0c 36 96 e8 2d ff ff ff b8 ff ff ff ff c3 8b 07 <c3> 0f 1f
40 00 49 89 f0 48 81 fe ff ff 03 00 76 04 40 88 3e c3 48
RSP: 0018:ffffaac3c5007b48 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff13
RAX: 0000000001111000 RBX: 0000000001111000 RCX: 0000043017a97186
RDX: 0000000000000aaa RSI: 0000000000000005 RDI: ffffaac3c400e4e4
RBP: ffff9e6443902c00 R08: ffffaac3c400e4e4 R09: ffffaac3c5007be7
R10: 0000000000000004 R11: 0000000000000001 R12: ffff9e6445dd0000
R13: 000000000000e4e4 R14: 00000000000003c4 R15: 0000000000000000
FS: 00007f253155a740(0000) GS:ffff9e644f600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005630d1500358 CR3: 0000000417c44006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
g94_i2c_aux_xfer+0x326/0x850 [nouveau]
nvkm_i2c_aux_i2c_xfer+0x9e/0x140 [nouveau]
__i2c_transfer+0x14b/0x620
i2c_smbus_xfer_emulated+0x159/0x680
? _raw_spin_unlock_irqrestore+0x1/0x60
? rt_mutex_slowlock.constprop.0+0x13d/0x1e0
? __lock_is_held+0x59/0xa0
__i2c_smbus_xfer+0x138/0x5a0
i2c_smbus_xfer+0x4f/0x80
i2cdev_ioctl_smbus+0x162/0x2d0 [i2c_dev]
i2cdev_ioctl+0x1db/0x2c0 [i2c_dev]
do_vfs_ioctl+0x408/0x750
ksys_ioctl+0x5e/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x60/0x1e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f25317f546b
Code: 0f 1e fa 48 8b 05 1d da 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d ed d9 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc88caab68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00005630d0fe7260 RCX: 00007f25317f546b
RDX: 00005630d1598e80 RSI: 0000000000000720 RDI: 0000000000000003
RBP: 00005630d155b968 R08: 0000000000000001 R09: 00005630d15a1da0
R10: 0000000000000070 R11: 0000000000000246 R12: 00005630d1598e80
R13: 00005630d12f3d28 R14: 0000000000000720 R15: 00005630d12f3ce0
watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sensors-detect:12438]
Yikes! While I wanted to try to make it so that accessing an i2c bus on
nouveau would wake up the GPU as needed, airlied pointed out that pretty
much any usecase for userspace accessing an i2c bus on a GPU (mainly for
the DDC brightness control that some displays have) is going to only be
useful while there's at least one display enabled on the GPU anyway, and
the GPU never sleeps while there's displays running.
Since teaching the i2c bus to wake up the GPU on userspace accesses is a
good deal more difficult than it might seem, mostly due to the fact that
we have to use the i2c bus during runtime resume of the GPU, we instead
opt for the easiest solution: don't let userspace access i2c busses on
the GPU at all while it's in runtime suspend.
Changes since v1:
* Also disable i2c busses that run over DP AUX
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>