The function nvkm_vmm_ctor() is not called outside of the file defining
it, so make it static.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Some GPU units are capable of supporting "replayable" page faults, where
the execution unit will wait for SW to fixup GPU page tables rather than
triggering a channel-fatal fault.
This feature isn't useful (it's harmful, even) unless something like HMM
is being used to manage events appearing in the replayable fault buffer,
so, it's disabled by default.
This commit allows a client to request it be enabled.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Host methods exist to do at least some of what we need, but we are not
currently pushing replay/cancels through a channel like UVM does as it's
not clear whether it's necessary in our case (UVM also updates PTEs with
the GPU).
UVM also pushes a software method for fault cancels on Pascal, seemingly
because the host methods don't appear to be sufficient. If/when we want
to push the replay/cancel on the GPU, we can re-purpose the cancellation
code here to implement that swmthd.
Keep it simple for now, until we figure out exactly what we need here.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This provides a somewhat more direct method of manipulating the GPU page
tables, which will be required to support SVM.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
NVKM is currently responsible for managing the allocation of a client's
GPU address-space, but there's various use-cases (ie. HMM address-space
mirroring) where giving a client more direct control is desirable.
This commit allows for a VMM to be created where the area allocated for
NVKM is limited to a client-specified window, the remainder of address-
space is controlled directly by the client.
Leaving a window is necessary to support various internal requirements,
but also to support existing allocation interfaces as not all of the HW
is capable of working with a HMM allocation.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Aside from being a nice cleanup, these will to allow the upcoming direct
page mapping interfaces to play nicely with normal mappings.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
VEID support hacked in here, as it's the most convenient place for now.
Will be refined once it's better understood.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
- Fixes addition of stolen memory base address to PTEs.
- Removes support for compression.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
These are the new priviledged interfaces to the VMM backends, and expose
some functionality that wasn't previously available.
It's now possible to allocate a chunk of address-space (even all of it),
without causing page tables to be allocated up-front, and then map into
it at arbitrary locations. This is the basic primitive used to support
features such as sparse mapping, or to allow userspace control over its
own address-space, or HMM (where the GPU driver isn't in control of the
address-space layout).
Rather than being tied to a subtle combination of memory object and VMA
properties, arguments that control map flags (ro, kind, etc) are passed
explicitly at map time.
The compatibility hacks to implement the old frontend on top of the new
driver backends have been replaced with something similar to implement
the old frontend's interfaces on top of the new frontend.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for:
- 64KiB/2MiB big page sizes (128KiB not supported by HW with new PT layout).
- System-memory PTs.
- LPTE "invalid" state.
- (Tegra) Use of video memory aperture.
- Sparse PDEs/PTEs.
- Additional blocklinear kinds.
- 49-bit address-space.
GP100 supports an entirely new 5-level page table layout that provides
an expanded 49-bit address-space. It also supports the layout present
on previous generations, which we've been making do with until now.
This commit implements support for the new layout, and enables it by
default.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for:
- 64KiB big page size.
- System-memory PTs.
- LPTE "invalid" state.
- (Tegra) Use of video memory aperture.
Adds support for marking LPTEs invalid, resulting in the corresponding
SPTEs being ignored, which is supposed to speed up TLB invalidates.
On The Tegra side, this will switch to using the video memory aperture
for all mappings. The HW will still target non-coherent system memory,
but this aperture needs to be selected in order to support compression.
Tegra's instmem backend somewhat cheated to get this effect previously.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is the common code to support a rework of the VMM backends.
It adds support for more than 2 levels of page table nesting, which
is required to be able to support GP100's MMU layout.
Sparse mappings (that don't cause MMU faults when accessed) are now
supported, where the backend provides it.
Dual-PT handling had to become more sophisticated to support sparse,
but this also allows us to support an optimisation the MMU provides
on GK104 and newer.
Certain operations can now be combined into a single page tree walk
to avoid some overhead, but also enables optimsations like skipping
PTE unmap writes when the PT will be destroyed anyway.
The old backend has been hacked up to forward requests onto the new
backend, if present, so that it's possible to bisect between issues
in the backend changes vs the upcoming frontend changes.
Until the new frontend has been merged, new backends will leak BAR2
page tables on module unload. This is expected, and it's not worth
the effort of hacking around this as it doesn't effect runtime.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for:
- Selection of old/new-style page table layout (GP100MmuLayout=0/1).
- System-memory PDs.
New layout disabled by default for the moment, as we don't have a
backend that can handle it yet.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is the first chunk of the new VMM code that provides the structures
needed to describe a GPU virtual address-space layout, as well as common
interfaces to handle VMM creation, and connecting instances to a VMM.
The constructor now allocates the PD itself, rather than having the user
handle that manually. This won't/can't be used until after all backends
have been ported to these interfaces, so a little bit of memory will be
wasted on Fermi and newer for a couple of commits in the series.
Compatibility has been hacked into the old code to allow each GPU backend
to be ported individually.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>