linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Chris Wilson	f28d42663e	drm/i915/gt: Move scratch page into system memory on all platforms The scratch page should never be accessed, and is only assigned as a filler page to redirection invalid userspace access. It is not of a performance concern and so we prefer to have a single consistent configuration across all platforms, reducing the pressure on device memory and avoiding the direct device access that would be required to initialise the scratch page. Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220926155018.109678-1-matthew.auld@intel.com	2022-09-27 11:32:10 +01:00
Chris Wilson	c286558f58	drm/i915/gt: Use i915_vm_put on ppgtt_create error paths Now that the scratch page and page directories have a reference back to the i915_address_space, we cannot do an immediate free of the ppgtt upon error as those buffer objects will perform a later i915_vm_put in their deferred frees. The downside is that by replacing the onion unwind along the error paths, the ppgtt cleanup must handle a partially constructed vm. This includes ensuring that the vm->cleanup is set prior to the error path. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6900 Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Fixes: `4d8151ae53` ("drm/i915: Don't free shared locks while shared") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v5.14+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220926153333.102195-1-matthew.auld@intel.com	2022-09-27 09:31:51 +01:00
Michael Cheng	61c5ed946d	drm/i915/gt: replace cache_clflush_range Replace all occurrence of cache_clflush_range with drm_clflush_virt_range. This will prevent compile errors on non-x86 platforms. Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220321223819.72833-6-michael.cheng@intel.com	2022-03-22 10:10:53 -07:00
Matthew Auld	6f84aa1cd4	drm/i915/gtt: add xehpsdv_ppgtt_insert_entry If this is LMEM then we get a 32 entry PT, with each PTE pointing to some 64K block of memory, otherwise it's just the usual 512 entry PT. This very much assumes the caller knows what they are doing. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220218184752.7524-11-ramalingam.c@intel.com	2022-02-19 22:26:46 -08:00
Matthew Auld	5189e3126e	drm/i915: support 64K GTT pages for discrete cards discrete cards optimise 64K GTT pages for local-memory, since everything should be allocated at 64K granularity. We say goodbye to sparse entries, and instead get a compact 256B page-table for 64K pages, which should be more cache friendly. 4K pages for local-memory are no longer supported by the HW. v4: don't return uninitialized err in igt_ppgtt_compact Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Signed-off-by: Robert Beckett <bob.beckett@collabora.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220218184752.7524-8-ramalingam.c@intel.com	2022-02-19 20:33:47 -08:00
Thomas Hellström	39a2bd34c9	drm/i915: Use the vma resource as argument for gtt binding / unbinding When introducing asynchronous unbinding, the vma itself may no longer be alive when the actual binding or unbinding takes place. Update the gtt i915_vma_ops accordingly to take a struct i915_vma_resource instead of a struct i915_vma for the bind_vma() and unbind_vma() ops. Similarly change the insert_entries() op for struct i915_address_space. Replace a couple of i915_vma_snapshot members with their newly introduced i915_vma_resource counterparts, since they have the same lifetime. Also make sure to avoid changing the struct i915_vma_flags (in particular the bind flags) async. That should now only be done sync under the vm mutex. v2: - Update the vma_res::bound_flags when binding to the aliased ggtt v6: - Remove I915_VMA_ALLOC_BIT (Matthew Auld) - Change some members of struct i915_vma_resource from unsigned long to u64 (Matthew Auld) v7: - Fix vma resource size parameters to be u64 rather than unsigned long (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-3-thomas.hellstrom@linux.intel.com	2022-01-11 10:53:14 +01:00
Matthew Auld	fef53be028	drm/i915/gtt/xehpsdv: move scratch page to system memory On some platforms the hw has dropped support for 4K GTT pages when dealing with LMEM, and due to the design of 64K GTT pages in the hw, we can only mark the entire page-table as operating in 64K GTT mode, since the enable bit is still on the pde, and not the pte. And since we we still need to allow 4K GTT pages for SMEM objects, we can't have a "normal" 4K page-table with scratch pointing to LMEM, since that's undefined from the hw pov. The simplest solution is to just move the 64K scratch page to SMEM on such platforms and call it a day, since that should work for all configurations. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211208141613.7251-4-ramalingam.c@intel.com	2021-12-09 22:09:29 +05:30
Michael Cheng	5f97816762	drm/i915: Introduce new macros for i915 PTE Certain functions within i915 uses macros that are defined for specific architectures by the mmu, such as _PAGE_RW and _PAGE_PRESENT (Some architectures don't even have these macros defined, like ARM64). Instead of re-using bits defined for the CPU, we should use bits defined for i915. This patch introduces two new 64 bit macros, GEN8_PAGE_PRESENT and GEN8_PAGE_RW, to check for bits 0 and 1 and, to replace all occurrences of _PAGE_RW and _PAGE_PRESENT within i915. v2(Michael Cheng): Use GEN8_ instead of I915_ Signed-off-by: Michael Cheng <michael.cheng@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> [ Move defines together with other GEN8 defines ] Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211206215245.513677-2-michael.cheng@intel.com	2021-12-06 22:21:03 -08:00
Matthew Auld	b0cc4dca4f	drm/i915/gtt: stop caching the scratch page Normal users shouldn't be hitting this, likely this would indicate a userspace bug. So don't bother caching, which should be safe now that we manually flush the page. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211028092638.3142258-2-matthew.auld@intel.com	2021-10-29 09:03:42 +01:00
Matthew Auld	5926ff80c9	drm/i915/gtt: drop unneeded make_unshrinkable We already do this when mapping the pages. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-3-matthew.auld@intel.com	2021-10-22 13:19:22 +01:00
Thomas Hellström	a259cc14ec	drm/i915: Reduce the number of objects subject to memcpy recover We really only need memcpy restore for objects that affect the operability of the migrate context. That is, primarily the page-table objects of the migrate VM. Add an object flag, I915_BO_ALLOC_PM_EARLY for objects that need early restores using memcpy and a way to assign LMEM page-table object flags to be used by the vms. Restore objects without this flag with the gpu blitter and only objects carrying the flag using TTM memcpy. Initially mark the migrate, gt, gtt and vgpu vms to use this flag, and defer for a later audit which vms actually need it. Most importantly, user- allocated vms with pinned page-table objects can be restored using the blitter. Performance-wise memcpy restore is probably as fast as gpu restore if not faster, but using gpu restore will help tackling future restrictions in mappable LMEM size. v4: - Don't mark the aliasing ppgtt page table flags for early resume, but rather the ggtt page table flags as intended. (Matthew Auld) - The check for user buffer objects during early resume is pointless, since they are never marked I915_BO_ALLOC_PM_EARLY. (Matthew Auld) v5: - Mark GuC LMEM objects with I915_BO_ALLOC_PM_EARLY to have them restored before we fire up the migrate context. Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-8-thomas.hellstrom@linux.intel.com	2021-09-24 08:19:16 +02:00
Matthew Auld	502d0609fc	drm/i915/gtt: add some flushing for the 64K GTT path If we need to mark the PDE as operating in 64K GTT mode, we should be paranoid and flush the extra writes, like we already do for the PTEs. On some platforms the clflush can apparently add the just the right amount of magical delay to force the GPU to see the updated entry. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210903155317.1854012-1-matthew.auld@intel.com	2021-09-08 09:35:37 +01:00
Matthew Auld	8f88ca76b3	drm/i915/gtt: drop the page table optimisation We skip filling out the pt with scratch entries if the va range covers the entire pt, since we later have to fill it with the PTEs for the object pages anyway. However this might leave open a small window where the PTEs don't point to anything valid for the HW to consume. When for example using 2M GTT pages this fill_px() showed up as being quite significant in perf measurements, and ends up being completely wasted since we ignore the pt and just use the pde directly. Anyway, currently we have our PTE construction split between alloc and insert, which is probably slightly iffy nowadays, since the alloc doesn't actually allocate anything anymore, instead it just sets up the page directories and points the PTEs at the scratch page. Later when we do the insert step we re-program the PTEs again. Better might be to squash the alloc and insert into a single step, then bringing back this optimisation(along with some others) should be possible. Fixes: `1482667324` ("drm/i915: Only initialize partially filled pagetables") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: <stable@vger.kernel.org> # v4.15+ Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210713130431.2392740-1-matthew.auld@intel.com	2021-07-14 09:15:53 +01:00
Chris Wilson	3607e1e9ba	drm/i915/gt: Add a routine to iterate over the pagetables of a GTT In the next patch, we will want to look at the dma addresses of individual page tables, so add a routine to iterate over them. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210617063018.92802-6-thomas.hellstrom@linux.intel.com	2021-06-17 14:23:03 +01:00
Chris Wilson	0dcd6fdf3b	drm/i915/gt: Add an insert_entry for gen8_ppgtt In the next patch, we will want to write a PTE for an explicit dma address, outside of the usual vma. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210617063018.92802-5-thomas.hellstrom@linux.intel.com	2021-06-17 14:23:02 +01:00
Lucas De Marchi	c816723b6b	drm/i915/gt: replace IS_GEN and friends with GRAPHICS_VER This was done by the following semantic patch: @@ expression i915; @@ - INTEL_GEN(i915) + GRAPHICS_VER(i915) @@ expression i915; expression E; @@ - INTEL_GEN(i915) >= E + GRAPHICS_VER(i915) >= E @@ expression dev_priv; expression E; @@ - !IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) != E @@ expression dev_priv; expression E; @@ - IS_GEN(dev_priv, E) + GRAPHICS_VER(dev_priv) == E @@ expression dev_priv; expression from, until; @@ - IS_GEN_RANGE(dev_priv, from, until) + IS_GRAPHICS_VER(dev_priv, from, until) @def@ expression E; identifier id =~ "^gen$"; @@ - id = GRAPHICS_VER(E) + ver = GRAPHICS_VER(E) @@ identifier def.id; @@ - id + ver It also takes care of renaming the variable we assign to GRAPHICS_VER() so to use "ver" rather than "gen". Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210605155356.4183026-2-lucas.demarchi@intel.com	2021-06-05 15:09:06 -07:00
Matthew Auld	6aed5673f0	drm/i915/gtt/dgfx: place the PD in LMEM It's a requirement that for dgfx we place all the paging structures in device local-memory. v2: use i915_coherent_map_type() v3: improve the shared dma-resv object comment Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210427085417.120246-4-matthew.auld@intel.com	2021-04-27 16:21:47 +01:00
Matthew Auld	529b9ec809	drm/i915/gtt: map the PD up front We need to generalise our accessor for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code that for simple single page shmemfs object will return a plain kmap, that is then kept for the lifetime of the page directory. Note that keeping the mapping around is a potential concern here, since while the vma is pinned the mapping remains there for the PDs underneath, or at least until the used_count reaches zero, at which point we can safely destroy the mapping. For 32b this will be even worse since the address space is more limited, but since this change mostly impacts full ppGTT platforms, the justification is that for modern platforms we shouldn't care too much about 32b. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210427085417.120246-3-matthew.auld@intel.com	2021-04-27 16:21:47 +01:00
Lv Yunlong	ac69496fe6	drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp Our code analyzer reported a double free bug. In gen8_preallocate_top_level_pdp, pde and pde->pt.base are allocated via alloc_pd(vm) with one reference. If pin_pt_dma() failed, pde->pt.base is freed by i915_gem_object_put() with a reference dropped. Then free_pd calls free_px() defined in intel_ppgtt.c, which calls i915_gem_object_put() to put pde->pt.base again. As pde->pt.base is protected by refcount, so the second put will not free pde->pt.base actually. But, maybe it is better to remove the first put? Fixes: `82adf90113` ("drm/i915/gt: Shrink i915_page_directory's slab bucket") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210426124340.4238-1-lyl2019@mail.ustc.edu.cn	2021-04-27 10:20:46 +01:00
Matthew Auld	11724eea0d	drm/i915/gtt/dg1: add PTE_LM plumbing for ppGTT For the PTEs we get an LM bit, to signal whether the page resides in SMEM or LMEM. BSpec: 45040 v2: just use gen8_pte_encode for dg1 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20210203171231.551338-2-matthew.auld@intel.com Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2021-03-24 19:31:58 +01:00
Chris Wilson	2f8aa3b80e	drm/i915/gt: Add some missing blank lines after declaration Trivial checkpatch cleanup. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210122192913.4518-2-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2021-03-24 19:30:35 +01:00
Chris Wilson	9834dfef55	drm/i915/gt: Prune inlines Remove all the manual inlines from non-critical sections in gt/ add/remove: 2/0 grow/shrink: 0/3 up/down: 762/-1473 (-711) Function old new delta mi_set_context.isra - 602 +602 write_dma_entry - 160 +160 __set_pd_entry 214 69 -145 clear_pd_entry 190 42 -148 ring_request_alloc 2021 841 -1180 Total: Before=1605086, After=1604375, chg -0.04% Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210113152224.29794-1-chris@chris-wilson.co.uk	2021-01-14 15:40:57 +00:00
Chris Wilson	fa812ce96a	drm/i915/gt: Onion unwind for scratch page allocation failure In switching to using objects for our ppGTT scratch pages, care was not taken to avoid trying to unref NULL objects on failure. And for gen6 ppGTT, it appears we forgot entirely to unwind after a partial allocation failure. Fixes: `89351925a4` ("drm/i915/gt: Switch to object allocations for page directories") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019083444.1286-1-chris@chris-wilson.co.uk	2020-10-19 12:36:38 +01:00
Tvrtko Ursulin	8a473dbadc	drm/i915: Fix DMA mapped scatterlist walks When walking DMA mapped scatterlists sg_dma_len has to be used since it can be different (coalesced) from the backing store entry. This also means we have to end the walk when encountering a zero length DMA entry and cannot rely on the normal sg list end marker. Both issues were there in theory for some time but were hidden by the fact Intel IOMMU driver was never coalescing entries. As there are ongoing efforts to change this we need to start handling it. v2: * Use unsigned int for local storing sg_dma_len. (Logan) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> References: `85d1225ec0` ("drm/i915: Introduce & use new lightweight SGL iterators") References: `b31144c0da` ("drm/i915: Micro-optimise gen6_ppgtt_insert_entries()") Reported-by: Tom Murphy <murphyt7@tcd.ie> Suggested-by: Tom Murphy <murphyt7@tcd.ie> # __sgt_iter Suggested-by: Logan Gunthorpe <logang@deltatee.com> # __sgt_iter Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-1-tvrtko.ursulin@linux.intel.com	2020-10-06 12:49:07 +01:00
Chris Wilson	82adf90113	drm/i915/gt: Shrink i915_page_directory's slab bucket kmalloc uses power-of-two slab buckets for small allocations (up to a few pages). Since i915_page_directory is a page of pointers, plus a couple more, this is rounded up to 8K, and we waste nearly 50% of that allocation. Long terms this leads to poor memory utilisation, bloating the kernel footprint, but the problem is exacerbated by our conservative preallocation scheme for binding VMA. As we are required to allocate all levels for each vma just in case we need to insert them upon binding, this leads to a large multiplication factor for a single page vma. By halving the allocation we need for the page directory structure, we halve the impact of that factor, bringing workloads that once fitted into memory, hopefully back to fitting into memory. We maintain the split between i915_page_directory and i915_page_table as we only need half the allocation for the lowest, most populous, level. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-3-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2020-09-07 14:24:23 +03:00
Chris Wilson	89351925a4	drm/i915/gt: Switch to object allocations for page directories The GEM object is grossly overweight for the practicality of tracking large numbers of individual pages, yet it is currently our only abstraction for tracking DMA allocations. Since those allocations need to be reserved upfront before an operation, and that we need to break away from simple system memory, we need to ditch using plain struct page wrappers. In the process, we drop the WC mapping as we ended up clflushing everything anyway due to various issues across a wider range of platforms. Though in a future step, we need to drop the kmap_atomic approach which suggests we need to pre-map all the pages and keep them mapped. v2: Verify our large scratch page is suitably DMA aligned; and manually clear the scratch since we are allocating plain struct pages full of prior content. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-2-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2020-09-07 14:24:08 +03:00
Chris Wilson	cd0452aa2a	drm/i915: Preallocate stashes for vma page-directories We need to make the DMA allocations used for page directories to be performed up front so that we can include those allocations in our memory reservation pass. The downside is that we have to assume the worst case, even before we know the final layout, and always allocate enough page directories for this object, even when there will be overlap. This unfortunately can be quite expensive, especially as we have to clear/reset the page directories and DMA pages, but it should only be required during early phases of a workload when new objects are being discovered, or after memory/eviction pressure when we need to rebind. Once we reach steady state, the objects should not be moved and we no longer need to preallocating the pages tables. It should be noted that the lifetime for the page directories DMA is more or less decoupled from individual fences as they will be shared across objects across timelines. v2: Only allocate enough PD space for the PTE we may use, we do not need to allocate PD that will be left as scratch. v3: Store the shift unto the first PD level to encapsulate the different PTE counts for gen6/gen8. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729164219.5737-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2020-09-07 14:24:05 +03:00
Nathan Chancellor	b2379ba2b9	drm/i915: Remove duplicate inline specifier on write_pte When building with clang: drivers/gpu/drm/i915/gt/gen8_ppgtt.c:392:24: warning: duplicate 'inline' declaration specifier [-Wduplicate-decl-specifier] declaration specifier [-Wduplicate-decl-specifier] static __always_inline inline void ^ include/linux/compiler_types.h:138:16: note: expanded from macro 'inline' #define inline inline __gnu_inline __inline_maybe_unused notrace ^ 1 warning generated. __always_inline is defined as 'inline __attribute__((__always_inline))' so we do not need to specify it twice. Fixes: `84eac0c659` ("drm/i915/gt: Force pte cacheline to main memory") Link: https://github.com/ClangBuiltLinux/linux/issues/1024 Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200513182340.3968668-1-natechancellor@gmail.com	2020-05-13 20:01:28 +01:00
Mika Kuoppala	84eac0c659	drm/i915/gt: Force pte cacheline to main memory We have problems of tgl not seeing a valid pte entry when iommu is enabled. Add heavy handed flushing of entry modification by flushing the cpu, cacheline and then wcb. This forces the pte out to main memory past this point regarless of promises of coherency. This is an evolution of an experimental patch from Chris Wilson of adding wmb for coherent partners, by adding a clflush to force the cache->memory step. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1840 Testcase: igt/gem_exec_fence/parallel Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200511160803.15407-1-mika.kuoppala@linux.intel.com	2020-05-11 17:25:07 +01:00
Jani Nikula	9e859eb9d0	drm/i915/vgpu: improve vgpu abstractions Add intel_vgpu_register() abstraction, rename i915_detect_vgpu() to intel_vgpu_detect() to match other function naming, un-inline intel_vgpu_active(), intel_vgpu_has_full_ppgtt() and intel_vgpu_has_huge_gtt() to reduce header interdependencies. The i915_vgpu.[ch] filename and intel_vgpu_ prefix discrepancy remains. Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227144408.24345-1-jani.nikula@intel.com	2020-03-03 17:46:54 +02:00
Daniele Ceraolo Spurio	69edc390a5	drm/i915/ggtt: do not set bits 1-11 in gen12 ptes On TGL, bits 2-4 in the GGTT PTE are not ignored anymore and are instead used for some extra VT-d capabilities. We don't (yet?) have support for those capabilities, but, given that we shared the pte_encode function betweed GGTT and PPGTT, we still set those bits to the PPGTT PPAT values. The DMA engine gets very confused when those bits are set while the iommu is enabled, leading to errors. E.g. when loading the GuC we get: [ 9.796218] DMAR: DRHD: handling fault status reg 2 [ 9.796235] DMAR: [DMA Write] Request device [00:02.0] PASID ffffffff fault addr 0 [fault reason 02] Present bit in context entry is clear [ 9.899215] [drm:intel_guc_fw_upload [i915]] ERROR GuC firmware signature verification failed To fix this, just have dedicated gen8_pte_encode function per type of gtt. Also, explicitly set vm->pte_encode for gen8_ppgtt, even if we don't use it, to make sure we don't accidentally assign it to the GGTT one, like we do for gen6_ppgtt, in case we need it in the future. Reported-by: "Sodhi, Vunny" <vunny.sodhi@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200226185657.26445-1-daniele.ceraolospurio@intel.com	2020-02-27 22:38:11 +00:00
Matthew Auld	8e78871bc1	drm/i915/userptr: fix size calculation If we create a rather large userptr object(e.g 1ULL << 32) we might shift past the type-width of num_pages: (int)num_pages << PAGE_SHIFT, resulting in a totally bogus sg_table, which fortunately will eventually manifest as: gen8_ppgtt_insert_huge:463 GEM_BUG_ON(iter->sg->length < page_size) kernel BUG at drivers/gpu/drm/i915/gt/gen8_ppgtt.c:463! v2: more unsigned long prefer I915_GTT_PAGE_SIZE Fixes: `5cc9ed4b9a` ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200117132413.1170563-2-matthew.auld@intel.com	2020-01-17 19:14:03 +00:00
Matthew Auld	2c86e55d2a	drm/i915/gtt: split up i915_gem_gtt Attempt to split i915_gem_gtt.[ch] into more manageable chunks. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200107134009.3255354-1-chris@chris-wilson.co.uk	2020-01-07 19:27:36 +00:00

33 commits