linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Chris Wilson	c5ba5b2465	drm/i915: Apply the GTT write flush for all !llc machines We also see the delayed GTT write issue on i915g/i915gm, so let's presume that it is a universal problem for all !llc machines, and that we just haven't yet noticed on g33, gen4 and gen5 machines. v2: Use a register that exists on all platforms Testcase: igt/gem_mmap_gtt/coherency # i915gm References: https://bugs.freedesktop.org/show_bug.cgi?id=102577 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170907184520.5032-1-chris@chris-wilson.co.uk Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>	2017-09-07 21:46:42 +01:00
Ville Syrjälä	750fae2324	i915: Fix obj size vs. alignment for drm_pci_alloc() drm_pci_alloc() refuses to cooperate if the passed alignment exceeds the object size. So round up the obj size to the next power of two as well to make this actually work. Obviously things work just fine as long as the size was a power of two to begin with. However kms_cursor_crc doesn't always use power of two sizes so we hit a failure when we try to allocate the phys memory. Testcase: igt/kms_cursor_crc Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170907143203.13055-1-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-09-07 20:12:36 +03:00
Tvrtko Ursulin	5602452e4c	drm/i915: Use __sg_alloc_table_from_pages for userptr allocations With the addition of __sg_alloc_table_from_pages we can control the maximum coalescing size and eliminate a separate path for allocating backing store here. Similar to `871dfbd67d` ("drm/i915: Allow compaction upto SWIOTLB max segment size") this enables more compact sg lists to be created and so has a beneficial effect on workloads with many and/or large objects of this class. v2: * Rename helper to i915_sg_segment_size and fix swiotlb override. * Commit message update. v3: * Actually include the swiotlb override fix. v4: * Regroup parameters a bit. (Chris Wilson) v5: * Rebase for swiotlb_max_segment. * Add DMA map failure handling as in `abb0deacb5` ("drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping"). v6: Handle swiotlb_max_segment() returning 1. (Joonas Lahtinen) v7: Rebase. v8: Commit spelling fix. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: linux-kernel@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170803091417.23677-1-tvrtko.ursulin@linux.intel.com	2017-09-07 10:48:29 +01:00
Chris Wilson	912d572d63	drm/i915: wire up shrinkctl->nr_scanned shrink_slab() allows us to report back the number of objects we successfully scanned (out of the target shrinkctl->nr_to_scan). As report the number of pages owned by each GEM object as a separate item to the shrinker, we cannot precisely control the number of shrinker objects we scan on each pass; and indeed may free more than requested. If we fail to tell the shrinker about the number of objects we process, it will continue to hold a grudge against us as any objects left unscanned are added to the next reclaim -- and so we will keep on "unfairly" shrinking our own slab in comparison to other slabs. Link: http://lkml.kernel.org/r/20170822135325.9191-2-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Shaohua Li <shli@fb.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-09-06 17:27:25 -07:00
Chris Wilson	88c880bbde	drm/i915: Lift has-pinned-pages assert to caller of ____i915_gem_object_get_pages i915_gem_object_attach_phys() is trying to swap out its shmemfs pages for a new set of physically contiguous pages, but unfortunately triggers an assert inside get-pages. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102561 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170906135220.13508-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>	2017-09-06 22:12:10 +01:00
Ville Syrjälä	6910d85250	drm/i915: Add __rcu to radix tree slot pointer radix_tree_for_each_slot() wants an __rcu annotated pointer for the slot. So let's add the annotation. Fixes the following sparse warnings: i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces) i915_gem.c:2217:9: expected void slot i915_gem.c:2217:9: got void [noderef] <asn:4> i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces) i915_gem.c:2217:9: expected void slot i915_gem.c:2217:9: got void [noderef] <asn:4> i915_gem.c:2217:9: warning: incorrect type in argument 1 (different address spaces) i915_gem.c:2217:9: expected void [noderef] <asn:4>slot i915_gem.c:2217:9: got void slot i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces) i915_gem.c:2217:9: expected void slot i915_gem.c:2217:9: got void [noderef] <asn:4> Cc: Chris Wilson <chris@chris-wilson.co.uk> Fixes: `96d7763452` ("drm/i915: Use a radixtree for random access to the object's backing storage") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-1-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (cherry picked from commit `c23aa71bcf`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2017-09-05 10:00:14 -07:00
Ville Syrjälä	afe722bee4	drm/i915: io unmap functions want __iomem Don't cast away the __iomem from the io_mapping functions so that sparse won't be so unhappy when we pass the pointer to the unmap functions. Instead let's move the cast to where we actually use the pointer. Fixes the following sparse warnings: i915_gem.c:1022:33: warning: incorrect type in argument 1 (different address spaces) i915_gem.c:1022:33: expected void [noderef] <asn:2>vaddr i915_gem.c:1022:33: got void [assigned] vaddr i915_gem.c:1027:34: warning: incorrect type in argument 1 (different address spaces) i915_gem.c:1027:34: expected void [noderef] <asn:2>vaddr i915_gem.c:1027:34: got void [assigned] vaddr i915_gem.c:1199:33: warning: incorrect type in argument 1 (different address spaces) i915_gem.c:1199:33: expected void [noderef] <asn:2>vaddr i915_gem.c:1199:33: got void [assigned] vaddr i915_gem.c:1204:34: warning: incorrect type in argument 1 (different address spaces) i915_gem.c:1204:34: expected void [noderef] <asn:2>vaddr i915_gem.c:1204:34: got void [assigned] vaddr Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-2-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-09-04 19:31:55 +03:00
Ville Syrjälä	c23aa71bcf	drm/i915: Add __rcu to radix tree slot pointer radix_tree_for_each_slot() wants an __rcu annotated pointer for the slot. So let's add the annotation. Fixes the following sparse warnings: i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces) i915_gem.c:2217:9: expected void slot i915_gem.c:2217:9: got void [noderef] <asn:4> i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces) i915_gem.c:2217:9: expected void slot i915_gem.c:2217:9: got void [noderef] <asn:4> i915_gem.c:2217:9: warning: incorrect type in argument 1 (different address spaces) i915_gem.c:2217:9: expected void [noderef] <asn:4>slot i915_gem.c:2217:9: got void slot i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces) i915_gem.c:2217:9: expected void slot i915_gem.c:2217:9: got void [noderef] <asn:4> Cc: Chris Wilson <chris@chris-wilson.co.uk> Fixes: `96d7763452` ("drm/i915: Use a radixtree for random access to the object's backing storage") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-1-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2017-09-04 19:31:40 +03:00
Chris Wilson	fa3722f649	drm/i915: Ignore duplicate VMA stored within the per-object handle LUT By using drm_gem_flink/drm_gem_open on an object using the same fd, it is possible for a client to create multiple handles pointing to the same object (tied to the same contexts and VMA), as exemplified by igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible since forever, we cannot assume that the handle:(fpriv, object) is unique and so must handle the multiple users of a single VMA. v2: Added commentary noise. Testcase: igt/gem_close Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355 Fixes: `d1b48c1e71` ("drm/i915: Replace execbuf vma ht with an idr") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk Tested-by: Marta Lofstedt <marta.lofstedt@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> (cherry-picked from commit `3ffff01749`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2017-08-30 12:17:22 -07:00
Chris Wilson	0168bdfc9c	drm/i915: Always wake the device to flush the GTT Since we hold the device wakeref when writing through the GTT (otherwise the writes would fail), we presumed that before the device sleeps those writes would naturally be flushed and that we wouldn't need our mmio read trick. However, that presumption seems false and a sleepy bxt seems to require us to always manually flush the GTT writes prior to direct access. Fixes: `e2a2aa36a5` ("drm/i915: Check we have an wake device before flushing GTT writes") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170829192546.1087-1-chris@chris-wilson.co.uk (cherry picked from commit `b69a784f5e`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2017-08-30 12:08:14 -07:00
Chris Wilson	3b24e7e810	drm/i915: Recreate vmapping even when the object is pinned Sometimes we know we are the only user of the bo, but since we take a protective pin_pages early on, an attempt to change the vmap on the object is denied because it is busy. i915_gem_object_pin_map() cannot tell from our single pin_count if the operation is safe. Instead we must pass that information down from the caller in the manner of I915_MAP_OVERRIDE. This issue has existed from the introduction of the mapping, but was never noticed as the only place where this conflict might happen is for cached kernel buffers (such as allocated by i915_gem_batch_pool_get()). Until recently there was only a single user (the cmdparser) so no conflicts ever occurred. However, we now use it to allocate batches for different operations (using MAP_WC on !llc for writes) in addition to the existing shadow batch (using MAP_WB for reads). We could either keep both mappings cached, or use a different write mechanism if we detect a MAP_WB already exists (i.e. clflush afterwards), but as we haven't seen this issue in the wild (it requires hitting the GPU reloc path in addition to the cmdparser) for simplicity just allow the mappings to be recreated. v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows about all the valid values. Fixes: `7dd4f6729f` ("drm/i915: Async GPU relocation processing") Testcase: igt/gem_lut_handle # byt, completely by accident Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (cherry picked from commit `a575c67617`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2017-08-30 12:08:11 -07:00
Chris Wilson	b69a784f5e	drm/i915: Always wake the device to flush the GTT Since we hold the device wakeref when writing through the GTT (otherwise the writes would fail), we presumed that before the device sleeps those writes would naturally be flushed and that we wouldn't need our mmio read trick. However, that presumption seems false and a sleepy bxt seems to require us to always manually flush the GTT writes prior to direct access. Fixes: `e2a2aa36a5` ("drm/i915: Check we have an wake device before flushing GTT writes") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170829192546.1087-1-chris@chris-wilson.co.uk	2017-08-30 09:10:35 +01:00
Chris Wilson	fc692bd31b	drm/i915: Discard the request queue if we fail to sleep before suspend If we fail to clear the outstanding request queue before suspending, mark those requests as lost. References: https://bugs.freedesktop.org/show_bug.cgi?id=102037 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-3-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-08-29 16:56:39 +01:00
Chris Wilson	f36325f378	drm/i915: Clear wedged status upon resume When we wake up from suspend, the device has been powered down and should come back afresh. We should be able to safely remove the wedged status from the previous session and start afresh. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-2-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>	2017-08-29 16:56:38 +01:00
Chris Wilson	cad9946c2a	drm/i915: Always sanity check engine state upon idling When we do a locked idle we know that afterwards all requests have been completed and the engines have been cleared of tasks. For whatever reason, this doesn't always happen and we may go into a suspend with ELSP still full, and this causes an issue upon resume as we get very, very confused. If the engines refuse to idle, mark the device as wedged. In the process we get rid of the maybe unused open-coded version of wait_for_engines reported by Nick Desaulniers and Matthias Kaehlcke. v2: Suppress the -EIO before suspend, but keep it for seqno wrap. References: https://bugs.freedesktop.org/show_bug.cgi?id=101891 References: https://bugs.freedesktop.org/show_bug.cgi?id=102456 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthias Kaehlcke <mka@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>	2017-08-29 16:56:04 +01:00
Chris Wilson	a575c67617	drm/i915: Recreate vmapping even when the object is pinned Sometimes we know we are the only user of the bo, but since we take a protective pin_pages early on, an attempt to change the vmap on the object is denied because it is busy. i915_gem_object_pin_map() cannot tell from our single pin_count if the operation is safe. Instead we must pass that information down from the caller in the manner of I915_MAP_OVERRIDE. This issue has existed from the introduction of the mapping, but was never noticed as the only place where this conflict might happen is for cached kernel buffers (such as allocated by i915_gem_batch_pool_get()). Until recently there was only a single user (the cmdparser) so no conflicts ever occurred. However, we now use it to allocate batches for different operations (using MAP_WC on !llc for writes) in addition to the existing shadow batch (using MAP_WB for reads). We could either keep both mappings cached, or use a different write mechanism if we detect a MAP_WB already exists (i.e. clflush afterwards), but as we haven't seen this issue in the wild (it requires hitting the GPU reloc path in addition to the cmdparser) for simplicity just allow the mappings to be recreated. v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows about all the valid values. Fixes: `7dd4f6729f` ("drm/i915: Async GPU relocation processing") Testcase: igt/gem_lut_handle # byt, completely by accident Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-08-29 10:39:08 +01:00
Chris Wilson	3ffff01749	drm/i915: Ignore duplicate VMA stored within the per-object handle LUT By using drm_gem_flink/drm_gem_open on an object using the same fd, it is possible for a client to create multiple handles pointing to the same object (tied to the same contexts and VMA), as exemplified by igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible since forever, we cannot assume that the handle:(fpriv, object) is unique and so must handle the multiple users of a single VMA. v2: Added commentary noise. Testcase: igt/gem_close Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355 Fixes: `d1b48c1e71` ("drm/i915: Replace execbuf vma ht with an idr") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk Tested-by: Marta Lofstedt <marta.lofstedt@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-08-24 15:28:05 +01:00
Chris Wilson	67b4804025	drm/i915: Assert that the handle->vma lut is empty on object close Make sure that we are not leaking an entry in the ctx->handles_lut by asserting that the object was removed prior to being freed. This should be enforced by all such handles being removed by i915_gem_close_object. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-2-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-08-24 15:22:46 +01:00
Chris Wilson	432295d7b9	drm/i915: Assert the context is not closed on object-close During the context-close, we should be decoupling all the vma from the object so that upon object-closing we shouldn't see any vma from the already closed contexts. So include a check upon closing the object that the context is still open. v2: Eek, the fpriv check is required for shared objects. Double eek, BAT passed? Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-1-chris@chris-wilson.co.uk Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-08-24 15:22:21 +01:00
Chris Wilson	d1b48c1e71	drm/i915: Replace execbuf vma ht with an idr This was the competing idea long ago, but it was only with the rewrite of the idr as an radixtree and using the radixtree directly ourselves, along with the realisation that we can store the vma directly in the radixtree and only need a list for the reverse mapping, that made the patch performant enough to displace using a hashtable. Though the vma ht is fast and doesn't require any extra allocation (as we can embed the node inside the vma), it does require a thread for resizing and serialization and will have the occasional slow lookup. That is hairy enough to investigate alternatives and favour them if equivalent in peak performance. One advantage of allocating an indirection entry is that we can support a single shared bo between many clients, something that was done on a first-come first-serve basis for shared GGTT vma previously. To offset the extra allocations, we create yet another kmem_cache for them. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk	2017-08-18 11:59:02 +01:00
Chris Wilson	b805014897	drm/i915: Handle full s64 precision for wait-ioctl The wait-ioctl is optionally supplied a timeout with nanosecond precision in a s64 field. We use nsecs_to_jiffies64() to convert that into the jiffies consumed by the scheduler, but internally nsecs_to_jiffies64() does not guard against overflow (as it's purpose is for use by the scheduler and not drivers!). So we must guard against the overflow ourselves, and in the process note that we may then return much earlier than the timeout selected by the user, so don't report ETIME unless we do hit the timeout. (Woe betold us though if the user waits for a year (32bit) and the request is still not complete!) v2: Refine overflow detection (to not include an overffow itself) Reported-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170811105731.9482-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-08-15 16:25:25 +01:00
Chris Wilson	b8f55be644	drm/i915: Split obj->cache_coherent to track r/w Another month, another story in the cache coherency saga. This time, we come to the realisation that i915_gem_object_is_coherent() has been reporting whether we can read from the target without requiring a cache invalidate; but we were using it in places for testing whether we could write into the object without requiring a cache flush. So split the tracking into two, one to decide before reads, one after writes. See commit `e27ab73d17` ("drm/i915: Mark CPU cache as dirty on every transition for CPU writes") for the previous entry in this saga. v2: Be verbose v3: Remove unused function (i915_gem_object_is_coherent) v4: Fix inverted coherency check prior to execbuf (from v2) v5: Add comment for nasty code where we are optimising on gcc's behalf. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101109 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101555 Testcase: igt/kms_mmap_write_crc Testcase: igt/kms_pwrite_crc Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170811111116.10373-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-08-15 15:46:57 +01:00
Chris Wilson	8fb6a5df46	drm/i915: Call the unlocked version of i915_gem_object_get_pages() When we hold for the lock for swapping out the shmem pages for the physically contiguous pages, we have to call the unlocked version of get_pages! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101934 Fixes: 35d23516946e ("drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-2-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:56:11 +02:00
Chris Wilson	8eeb79060b	drm/i915: Move i915_gem_object_phys_attach() Prevent a forward declaration in the next patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:56:11 +02:00
Chris Wilson	6ea1d55d31	drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately Actually transferring from shmemfs to the physically contiguous set of pages should be wholly guarded by its obj->mm.lock! v2: Remember to free the old pages. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170726160038.29487-2-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:56:10 +02:00
Chris Wilson	d1d1ebf412	drm/i915: Don't touch fence->error when resetting an innocent request If the request has been completed before the reset took effect, we don't need to mark it up as being a victim. Touching fence->error after the fence has been signaled is detected by dma_fence_set_error() and triggers a BUG: [ 231.743133] kernel BUG at ./include/linux/dma-fence.h:434! [ 231.743156] invalid opcode: 0000 [#1] SMP KASAN [ 231.743172] Modules linked in: i915 drm_kms_helper drm iptable_nat nf_nat_ipv4 nf_nat x86_pkg_temp_thermal iosf_mbi i2c_algo_bit cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font fbdev [last unloaded: drm] [ 231.743221] CPU: 2 PID: 20 Comm: kworker/2:0 Tainted: G U 4.13.0-rc1+ #52 [ 231.743236] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011 [ 231.743363] Workqueue: events_long i915_hangcheck_elapsed [i915] [ 231.743382] task: ffff8801f42e9780 task.stack: ffff8801f42f8000 [ 231.743489] RIP: 0010:i915_gem_reset_engine+0x45a/0x460 [i915] [ 231.743505] RSP: 0018:ffff8801f42ff770 EFLAGS: 00010202 [ 231.743521] RAX: 0000000000000007 RBX: ffff8801bf6b1880 RCX: ffffffffa02881a6 [ 231.743537] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff8801bf6b18c8 [ 231.743551] RBP: ffff8801f42ff7c8 R08: 0000000000000001 R09: 0000000000000000 [ 231.743566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801edb02d00 [ 231.743581] R13: ffff8801e19d4200 R14: 000000000000001d R15: ffff8801ce2a4000 [ 231.743599] FS: 0000000000000000(0000) GS:ffff8801f5a80000(0000) knlGS:0000000000000000 [ 231.743614] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 231.743629] CR2: 00007f0ebd1add10 CR3: 0000000002621000 CR4: 00000000000406e0 [ 231.743643] Call Trace: [ 231.743752] i915_gem_reset+0x6c/0x150 [i915] [ 231.743853] i915_reset+0x175/0x210 [i915] [ 231.743958] i915_reset_device+0x33b/0x350 [i915] [ 231.744061] ? valleyview_pipestat_irq_handler+0xe0/0xe0 [i915] [ 231.744081] ? trace_hardirqs_off_caller+0x70/0x110 [ 231.744102] ? _raw_spin_unlock_irqrestore+0x46/0x50 [ 231.744120] ? find_held_lock+0x119/0x150 [ 231.744138] ? mark_lock+0x6d/0x850 [ 231.744241] ? gen8_gt_irq_ack+0x1f0/0x1f0 [i915] [ 231.744262] ? work_on_cpu_safe+0x60/0x60 [ 231.744284] ? rcu_read_lock_sched_held+0x57/0xa0 [ 231.744400] ? gen6_read32+0x2ba/0x320 [i915] [ 231.744506] i915_handle_error+0x382/0x5f0 [i915] [ 231.744611] ? gen6_rps_reset_ei+0x20/0x20 [i915] [ 231.744630] ? vsnprintf+0x128/0x8e0 [ 231.744649] ? pointer+0x6b0/0x6b0 [ 231.744667] ? debug_check_no_locks_freed+0x1a0/0x1a0 [ 231.744688] ? scnprintf+0x92/0xe0 [ 231.744706] ? snprintf+0xb0/0xb0 [ 231.744820] hangcheck_declare_hang+0x15a/0x1a0 [i915] [ 231.744932] ? engine_stuck+0x440/0x440 [i915] [ 231.744951] ? rcu_read_lock_sched_held+0x57/0xa0 [ 231.745062] ? gen6_read32+0x2ba/0x320 [i915] [ 231.745173] ? gen6_read16+0x320/0x320 [i915] [ 231.745284] ? intel_engine_get_active_head+0x91/0x170 [i915] [ 231.745401] i915_hangcheck_elapsed+0x3d8/0x400 [i915] [ 231.745424] process_one_work+0x3e8/0xac0 [ 231.745444] ? pwq_dec_nr_in_flight+0x110/0x110 [ 231.745464] ? do_raw_spin_lock+0x8e/0x120 [ 231.745484] worker_thread+0x8d/0x720 [ 231.745506] kthread+0x19e/0x1f0 [ 231.745524] ? process_one_work+0xac0/0xac0 [ 231.745541] ? kthread_create_on_node+0xa0/0xa0 [ 231.745560] ret_from_fork+0x27/0x40 [ 231.745581] Code: 8b 7d c8 e8 49 0d 02 e1 49 8b 7f 38 48 8b 75 b8 48 83 c7 10 e8 b8 89 be e1 e9 95 fc ff ff 4c 89 e7 e8 4b b9 ff ff e9 30 ff ff ff <0f> 0b 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fe [ 231.745767] RIP: i915_gem_reset_engine+0x45a/0x460 [i915] RSP: ffff8801f42ff770 At first glance this looks to be related to commit `c64992e035` ("drm/i915: Look for active requests earlier in the reset path"), but it could easily happen before as well. On the other hand, we no longer logged victims due to the active_request being dropped earlier. v2: Be trickier to unwind the incomplete request as we cannot rely on request retirement for the lockless per-engine reset. v3: Reprobe the active request at the time of the reset. Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch> Fixes: `c64992e035` ("drm/i915: Look for active requests earlier in the reset path") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-15-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1 Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:38:57 +02:00
Chris Wilson	77b25a972b	drm/i915: Make i915_gem_context_mark_guilty() safe for unlocked updates Since we make call i915_gem_context_mark_guilty() concurrently when resetting different engines in parallel, we need to make sure that our updates are safe for the unlocked access. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-12-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:38:47 +02:00
Chris Wilson	ed454f2cd6	drm/i915: Clear engine irq posted following a reset When the GPU is reset, we want to discard all pending notifications as either we have manually completed them, or they are no longer applicable. Make sure we do reset the engine->irq_posted prior to re-enabling the engine (e.g. the interrupt tasklets) in i915_gem_reset_finish_engine(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-11-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:38:47 +02:00
Chris Wilson	bf2eac3bee	drm/i915: Assert that machine is wedged for nop_submit_request We should only ever do nop_submit_request when the machine is wedged, so assert it is so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-10-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:38:46 +02:00
Chris Wilson	3d7adbbf49	drm/i915: Wake up waiters after setting the WEDGED bit After setting the WEDGED bit, make sure that we do wake up waiters as they may not be waiting for a request completion yet, just for its execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-9-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:38:46 +02:00
Chris Wilson	5e32d7482e	drm/i915: Clear execlist port[] before updating seqno on wedging When we wedge the device, we clear out the in-flight requests and advance the breadcrumb to indicate they are complete. However, the breadcrumb advance includes an assert that the engine is idle, so that advancement needs to be the last step to ensure we pass our own sanity checks. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-7-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:38:45 +02:00
Daniel Vetter	64282ea2d2	Merge airlied/drm-next into drm-intel-next-queued Resync with upstream to avoid git getting too badly confused. Also, we have a conflict with the drm_vblank_cleanup removal, which cannot be resolved by simply taking our side. Bake that in properly. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2017-07-27 09:33:49 +02:00
Daniel Vetter	8b5d27b911	drm/i915: Remove intel_flip_work infrastructure This gets rid of all the interactions between the legacy flip code and the modeset code. Yay! This highlights an ommission in the atomic paths, where we fail to apply a boost to the pending rendering when we miss the target vblank. But the existing code is still dead and can be removed. v2: Note that the boosting doesn't work in atomic (Chris). Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170720175754.30751-7-daniel.vetter@ffwll.ch	2017-07-20 22:45:36 +02:00
Dave Airlie	2d62c799f8	Merge tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel into drm-next 2nd round of 4.14 features: - prep for deferred fbdev setup - refactor fixed 16.16 computations and skl+ wm code (Mahesh Kumar) - more cnl paches (Rodrigo, Imre et al) - tighten context cleanup and handling (Chris Wilson) - fix interlaced handling on skl+ (Mahesh Kumar) - small bits as usual * tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel: (84 commits) drm/i915: Update DRIVER_DATE to 20170717 drm/i915: Protect against deferred fbdev setup drm/i915/fbdev: Always forward hotplug events drm/i915/skl+: unify cpp value in WM calculation drm/i915/skl+: WM calculation don't require height drm/i915: Addition wrapper for fixed16.16 operation drm/i915: cleanup fixed-point wrappers naming drm/i915: Always perform internal fixed16 division in 64 bits drm/i915: take-out common clamping code of fixed16 wrappers drm/i915/cnl: Add missing type case. drm/i915/cnl: Add max allowed Cannonlake DC. drm/i915: Make DP-MST connector info work drm/i915/cnl: Get DDI clock based on PLLs. drm/i915/cnl: Inherit RPS stuff from previous platforms. drm/i915/cnl: Gen10 render context size. drm/i915/cnl: Don't trust VBT's alternate pin for port D for now. drm/i915: Fix the kernel panic when using aliasing ppgtt drm/i915/cnl: Cannonlake color init. drm/i915/cnl: Add force wake for gen10+. x86/gpu: CNL uses the same GMS values as SKL ...	2017-07-20 11:31:43 +10:00
Michal Hocko	dbb329561a	drm/i915: use __GFP_RETRY_MAYFAIL Commit `24f8e00a8a` ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations") has tried to remove disruptive OOM killer because the userspace should be able to cope with allocation failures. At the time only __GFP_NORETRY could achieve that and it turned out that this would fail the allocations just too easily. So "drm/i915: Remove __GFP_NORETRY from our buffer allocator" removed it and hoped for a better solution. __GFP_RETRY_MAYFAIL is that solution. It will keep retrying the allocation until there is no more progress and we would go OOM. Instead we fail the allocation and let the caller to deal with it. Link: http://lkml.kernel.org/r/20170623085345.11304-6-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Alex Belits <alex.belits@cavium.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David Daney <david.daney@cavium.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: NeilBrown <neilb@suse.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-07-12 16:26:04 -07:00
Chris Wilson	7b92c1bd05	drm/i915: Avoid keeping waitboost active for signaling threads Once a client has requested a waitboost, we keep that waitboost active until all clients are no longer waiting. This is because we don't distinguish which waiter deserves the boost. However, with the advent of fence signaling, the signaler threads appear as waiters to the RPS interrupt handler. So instead of using a single boolean to track when to keep the waitboost active, use a counter of all outstanding waitboosted requests. At this point, I have removed all vestiges of the rate limiting on clients. Whilst this means that compositors should remain more fluid, it also means that boosts are more prevalent. See commit `b29c19b645` ("drm/i915: Boost RPS frequency for CPU stalls") for a longer discussion on the pros and cons of both approaches. A drawback of this implementation is that it requires constant request submission to keep the waitboost trimmed (as it is now cancelled when the request is completed). This will be fine for a busy system, but near idle the boosts may be kept for longer than desired (effectively tens of vblanks worstcase) and there is a reliance on rc6 instead. v2: Remove defunct rps.client_lock Reported-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170628123548.9236-1-chris@chris-wilson.co.uk	2017-06-28 15:23:12 +01:00
Chris Wilson	13f8458f9a	drm/i915: Drop flushing of the object free list/worker from i915_gem_suspend i915_gem_suspend() is called from all of our finalization paths (suspend, hibernate, unload). i915_gem_drain_freed_objects() adds an arbitrary delay as it uses an rcu_barrier() to ensure that there are no more freed objects in flight, and this delay causes a large amount of variability in suspend timings. For S3 suspend, we do not need to free pages as doing so does not impact at all upon the system in its suspended state, unlike S4 hibernation where we do want the hibernation image to be as small as possible. Therefore we can forgo waiting inside i915_gem_suspend(), so long as we ensure that we do cleanup before unload (see i915_gem_load_cleanup()) and prefer to reap our objects prior to hibernation (see i915_gem_freeze()). Removing the rcu_barrier() from i915_gem_suspend() improves S3 latency by about 30ms on Skylake (ymmv). Reported-by: David Weinehall <david.weinehall@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: David Weinehall <david.weinehall@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170627173731.11566-1-chris@chris-wilson.co.uk Tested-by: David Weinehall <david.weinehall@linux.intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>	2017-06-28 12:15:20 +01:00
Chris Wilson	36703e79a9	drm/i915: Break modeset deadlocks on reset Trying to do a modeset from within a reset is fraught with danger. We can fall into a cyclic deadlock where the modeset is waiting on a previous modeset that is waiting on a request, and since the GPU hung that request completion is waiting on the reset. As modesetting doesn't allow its locks to be broken and restarted, or for its own reset mechanism to take over the display, we have to do something very evil instead. If we detect that we are stuck waiting to prepare the display reset (by using a very simple timeout), resort to cancelling all in-flight requests and throwing the user data into /dev/null, which is marginally better than the driver locking up and keeping that data to itself. This is not a fix; this is just a workaround that unbreaks machines until we can resolve the deadlock in a way that doesn't lose data! v2: Move the retirement from set-wegded to the i915_reset() error path, after which we no longer any delayed worker cleanup for i915_handle_error() v3: C abuse for syntactic sugar v4: Cover all waits with the timeout to catch more driver breakage References: https://bugs.freedesktop.org/show_bug.cgi?id=99093 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170622105625.16952-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-06-23 13:41:55 +01:00
Chris Wilson	4ee056f418	drm/i915: Cancel pending execlist tasklet upon wedging Highly unlikely, but if the stop_machine() did suspend the tasklet, we want to make sure that when it wakes it finds there is nothing to do. Otherwise, it will loudly complain that the ELSP port tracking no longer matches the hardware, and we will be mightly confused. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170621124804.4529-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-06-21 15:15:33 +01:00
Michel Thierry	a1ef70e144	drm/i915: Add support for per engine reset recovery This change implements support for per-engine reset as an initial, less intrusive hang recovery option to be attempted before falling back to the legacy full GPU reset recovery mode if necessary. This is only supported from Gen8 onwards. Hangchecker determines which engines are hung and invokes error handler to recover from it. Error handler schedules recovery for each of those engines that are hung. The recovery procedure is as follows, - identifies the request that caused the hang and it is dropped - force engine to idle: this is done by issuing a reset request - reset the engine - re-init the engine to resume submissions. If engine reset fails then we fall back to heavy weight full gpu reset which resets all engines and reinitiazes complete state of HW and SW. v2: Rebase. v3: s/engine_reset/reset_engine/; freeze engine and irqs before calling i915_gem_reset_engine (Chris). v4: Rebase, modify i915_gem_reset_prepare to use a ring mask and reuse the function for reset_engine. v5: intel_reset_engine_start/cancel instead of request/unrequest_reset. v6: Clean up reset_engine function to not require mutex, i.e. no need to call revoke/restore_fences and _retire_requests (Chris). v7: Remove leftovers from v5, i.e. no need to disable irq, hold forcewake or wakeup the handoff bit (Chris). v8: engine_retire_requests should be (and it was) static; explain that we have to re-init the engine after reset, which is why the init_hw call is needed; check reset-in-progress flag (Chris). v9: Rebase, include code to pass the active request to gem_reset_engine (as it is already done in full reset). Remove unnecessary intel_reset_engine_start/cancel, these are executed as part of the reset. v10: Rebase, use the right I915_RESET_ENGINE flag. v11: Fixup to call reset_finish_engine even on error. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-6-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-6-chris@chris-wilson.co.uk	2017-06-20 21:00:16 +01:00
Michel Thierry	c64992e035	drm/i915: Look for active requests earlier in the reset path And store the active request so that we only search for it once. v2: Check for request completion inside _prepare_engine, don't use ECANCELED, remove unnecessary null checks (Chris). v3: Capture active requests during reset_prepare and store it the engine hangcheck obj. v4: Rename commit, change i915_gem_reset_request to just confirm the active_request is still incomplete, instead of engine_stalled (Chris). v5: With style; pass the active request to gem_reset_engine, keep single return in reset_prepare_engine (Chris). v6: Moved before reset-engine code appears (Chris) Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5) Signed-off-by: Michel Thierry <michel.thierry@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-2-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-3-chris@chris-wilson.co.uk	2017-06-20 21:00:03 +01:00
Chris Wilson	829a0af29f	drm/i915: Group all the global context information together Create a substruct to hold all the global context state under drm_i915_private. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk	2017-06-20 17:13:40 +01:00
Chris Wilson	7dd4f6729f	drm/i915: Async GPU relocation processing If the user requires patching of their batch or auxiliary buffers, we currently make the alterations on the cpu. If they are active on the GPU at the time, we wait under the struct_mutex for them to finish executing before we rewrite the contents. This happens if shared relocation trees are used between different contexts with separate address space (and the buffers then have different addresses in each), the 3D state will need to be adjusted between execution on each context. However, we don't need to use the CPU to do the relocation patching, as we could queue commands to the GPU to perform it and use fences to serialise the operation with the current activity and future - so the operation on the GPU appears just as atomic as performing it immediately. Performing the relocation rewrites on the GPU is not free, in terms of pure throughput, the number of relocations/s is about halved - but more importantly so is the time under the struct_mutex. v2: Break out the request/batch allocation for clearer error flow. v3: A few asserts to ensure rq ordering is maintained Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-06-16 16:54:05 +01:00
Chris Wilson	8a2421bd0d	drm/i915: Wait upon userptr get-user-pages within execbuffer This simply hides the EAGAIN caused by userptr when userspace causes resource contention. However, it is quite beneficial with highly contended userptr users as we avoid repeating the setup costs and kernel-user context switches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>	2017-06-16 16:54:05 +01:00
Chris Wilson	4ff4b44cbb	drm/i915: Store a direct lookup from object handle to vma The advent of full-ppgtt lead to an extra indirection between the object and its binding. That extra indirection has a noticeable impact on how fast we can convert from the user handles to our internal vma for execbuffer. In order to bypass the extra indirection, we use a resizable hashtable to jump from the object to the per-ctx vma. rhashtable was considered but we don't need the online resizing feature and the extra complexity proved to undermine its usefulness. Instead, we simply reallocate the hastable on demand in a background task and serialize it before iterating. In non-full-ppgtt modes, multiple files and multiple contexts can share the same vma. This leads to having multiple possible handle->vma links, so we only use the first to establish the fast path. The majority of buffers are not shared and so we should still be able to realise speedups with multiple clients. v2: Prettier names, more magic. v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-06-16 16:54:04 +01:00
Chris Wilson	7fc92e96c3	drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty For ease of use (i.e. avoiding a few checks and function calls), store the object's cache coherency next to the cache is dirty bit. Specifically this patch aims to reduce the frequency of no-op calls to i915_gem_object_clflush() to counter-act the increase of such calls for GPU only objects in the previous patch. v2: Replace cache_dirty & ~cache_coherent with cache_dirty && !cache_coherent as gcc generates much better code for the latter (Tvrtko) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Tested-by: Dongwon Kim <dongwon.kim@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-06-16 14:52:27 +01:00
Chris Wilson	e27ab73d17	drm/i915: Mark CPU cache as dirty on every transition for CPU writes Currently, we only mark the CPU cache as dirty if we skip a clflush. This leads to some confusion where we have to ask if the object is in the write domain or missed a clflush. If we always mark the cache as dirty, this becomes a much simply question to answer. The goal remains to do as few clflushes as required and to do them as late as possible, in the hope of deferring the work to a kthread and not block the caller (e.g. execbuf, flips). v2: Always call clflush before GPU execution when the cache_dirty flag is set. This may cause some extra work on llc systems that migrate dirty buffers back and forth - but we do try to limit that by only setting cache_dirty at the end of the gpu sequence. v3: Always mark the cache as dirty upon a level change, as we need to invalidate any stale cachelines due to external writes. Reported-by: Dongwon Kim <dongwon.kim@intel.com> Fixes: `a6a7cc4b7d` ("drm/i915: Always flush the dirty CPU cache when pinning the scanout") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Tested-by: Dongwon Kim <dongwon.kim@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2017-06-16 14:50:52 +01:00
Chris Wilson	0f6ab55d7a	drm/i915: Only restrict noreclaim in the early shrink passes In our first pass, we do not want to use reclaim at all as we want to solely reap the i915 buffer caches (its purgeable pages). But we don't mind it initiates IO or pulls via the FS (but it shouldn't anyway as we say no to reclaim!). Just drop the GFP_IO constraint for simplicity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-3-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-06-14 10:53:37 +01:00
Chris Wilson	eaf4180155	drm/i915: Remove __GFP_NORETRY from our buffer allocator I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It struggles with handling reclaim of our dirty buffers and relies on reclaim via kswapd. As a result, a single pass of direct reclaim is unreliable when i915 occupies the majority of available memory, and the only means of effectively waiting on kswapd to amke progress is by not setting the __GFP_NORETRY flag and lopping. That leaves us with the dilemma of invoking the oomkiller instead of propagating the allocation failure back to userspace where it can be handled more gracefully (one hopes). In the future we may have __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory and the oomkiller would have been invoked. Until then, let the oomkiller wreck havoc. v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL v3: Update comments that direct reclaim only appears to be ignoring our dirty buffers! Fixes: `24f8e00a8a` ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations") Testcase: igt/gem_tiled_swapping Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Michal Hocko <mhocko@suse.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2017-06-14 10:53:09 +01:00
Chris Wilson	4846bf0ca8	drm/i915: Encourage our shrinker more when our shmemfs allocations fails Commit `24f8e00a8a` ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations") made the bold decision to try and avoid the oomkiller by reporting -ENOMEM to userspace if our allocation failed after attempting to free enough buffer objects. In short, it appears we were giving up too easily (even before we start wondering if one pass of reclaim is as strong as we would like). Part of the problem is that if we only shrink just enough pages for our expected allocation, the likelihood of those pages becoming available to us is less than 100% To counter-act that we ask for twice the number of pages to be made available. Furthermore, we allow the shrinker to pull pages from the active list in later passes. v2: Be a little more cautious in paging out gfx buffers, and leave that to a more balanced approach from shrink_slab(). Important when combined with "drm/i915: Start writeback from the shrinker" as anything shrunk is immediately swapped out and so should be more conservative. Fixes: `24f8e00a8a` ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-1-chris@chris-wilson.co.uk	2017-06-14 10:51:50 +01:00

... 9 10 11 12 13 ...

2327 commits