In order to better encapsulate the FBC implementation,
introduce a small helper to do the plane<->FBC instance
association.
We'll also try to structure the plane init code such
that introducing multiple FBC instances will be easier
down the line.
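Roughly, the helper ends up looking something like this (a sketch only;
the i915->fbc field layout and the i9xx_plane_has_fbc() usage here are
assumptions, not the exact upstream diff):

static struct intel_fbc *i9xx_plane_fbc(struct drm_i915_private *i915,
					enum i9xx_plane_id i9xx_plane)
{
	if (i9xx_plane_has_fbc(i915, i9xx_plane))
		return i915->fbc;	/* single FBC instance for now */

	return NULL;
}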
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211124113652.22090-13-ville.syrjala@linux.intel.com
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
The underrun code doesn't need to know any details about FBC, so
just pass in the whole device rather than a specific FBC instance.
We could make this a bit more fine grained by also passing in the
pipe to intel_fbc_handle_fifo_underrun_irq() and letting the FBC
code figure out which FBC instance (if any) is active on said pipe.
But that seems a bit overkill for this so don't bother.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211124113652.22090-11-ville.syrjala@linux.intel.com
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
In the future we may have multiple planes on the same pipe
capable of using FBC. Prepare for that by tracking FBC usage
per-plane rather than per-crtc.
v2: s/intel_get_crtc_for_pipe/intel_crtc_for_pipe/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211124113652.22090-9-ville.syrjala@linux.intel.com
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Pass the FBC instance instead of the crtc to a bunch of places.
We also adjust intel_fbc_post_update() to do the
intel_fbc_get_reg_params() things instead of doing them from the lower
level function (which also gets called for front buffer tracking).
Nothing in there will change during front buffer updates.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211124113652.22090-8-ville.syrjala@linux.intel.com
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
There's no need to store all this stuff in intel_fbc_state_cache.
Just check it all against the plane/crtc states and store only
what we need. Probably more should get nuked still, but this
is a start.
So what we'll do is:
- each plane will check its own state and update its local
no_fbc_reason
- the per-plane no_fbc_reason (if any) then gets propagated
to the cache->no_fbc_reason while doing the actual update
- fbc->no_fbc_reason gets updated in the end with either
the value from the cache or directly from frontbuffer
tracking
It's still a bit messy, but should hopefully get cleaned up
more in the future. At least now we can observe each plane's
reasons for rejecting FBC more consistently, and we don't
have so much redundant state stored all over the place.
v2: store no_fbc_reason per-plane instead of per-pipe
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211124113652.22090-4-ville.syrjala@linux.intel.com
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
With both integrated and discrete Intel GPUs in a system, the current
global check of intel_iommu_gfx_mapped, as done from intel_vtd_active(),
may not be completely accurate.
In this patch we add an i915 parameter to intel_vtd_active() in order to
prepare it for multiple GPUs, and we also change the check away from the
Intel-specific intel_iommu_gfx_mapped (a global exported by the Intel IOMMU
driver) to probing the presence of an IOMMU on a specific device using
device_iommu_mapped().
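A minimal sketch of the resulting check (trimmed; any remaining
fallbacks such as the run-as-guest case are left out here):

static bool intel_vtd_active(struct drm_i915_private *i915)
{
	if (device_iommu_mapped(i915->drm.dev))
		return true;

	/* ...other checks (e.g. running as a guest) stay as they were... */
	return false;
}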
This will return true both for IOMMU pass-through and address translation
modes, which matches the current behaviour. If in the future we wanted to
distinguish between these two modes we could either use
iommu_get_domain_for_dev() and check for __IOMMU_DOMAIN_PAGING bit
indicating address translation, or ask for a new API to be exported from
the IOMMU core code.
v2:
* Check for dmar translation specifically, not just iommu domain. (Baolu)
v3:
* Go back to plain "any domain" check for now, rewrite commit message.
v4:
* Use device_iommu_mapped. (Robin, Baolu)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211126141424.493753-1-tvrtko.ursulin@linux.intel.com
The Tile F format (Tile 4 in the bspec) is a 4K tile organized into
64B subtiles with the same basic shape as legacy Tile Y; it will be
supported by Display13.
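On the uapi side this boils down to a new framebuffer modifier,
sketched here in drm_fourcc.h style (value 9 per the v6 note below):

#define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9)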
v2: - Fixed wrong case condition(Jani Nikula)
- Increased I915_FORMAT_MOD_F_TILED up to 12(Imre Deak)
v3: - s/I915_TILING_F/TILING_4/g
- s/I915_FORMAT_MOD_F_TILED/I915_FORMAT_MOD_4_TILED/g
- Removed unneeded fencing code
v4: - Rebased, fixed merge conflict with new table-oriented
format modifier checking(Stan)
- Replaced the rest of the "Tile F" mentions with "Tile 4" (Stan)
v5: - Still had to remove some Tile F mentions
- Moved has_4tile from adlp to DG2 (Ramalingam C)
- Check specifically for DG2, not for Display13 in general (Imre)
v6: - Moved Tile4 associating struct for modifier/display to
the beginning(Imre Deak)
- Removed unneeded case I915_FORMAT_MOD_4_TILED modifier
checks(Imre Deak)
- Fixed I915_FORMAT_MOD_4_TILED to be 9 instead of 12
(Imre Deak)
v7: - Fixed display_ver to { 13, 13 }(Imre Deak)
- Removed redundant newline(Imre Deak)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Juha-Pekka Heikkilä <juha-pekka.heikkila@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211122211420.31584-1-stanislav.lisovskiy@intel.com
Fix the recently introduced 'make htmldocs' warnings:
$ make htmldocs 2>&1 > /dev/null | grep i915
./drivers/gpu/drm/i915/display/intel_fbc.c:635: warning: Excess function parameter 'i915' description in 'intel_fbc_is_active'
./drivers/gpu/drm/i915/display/intel_fbc.c:1638: warning: Excess function parameter 'i915' description in 'intel_fbc_handle_fifo_underrun_irq'
./drivers/gpu/drm/i915/display/intel_fbc.c:635: warning: Function parameter or member 'fbc' not described in 'intel_fbc_is_active'
./drivers/gpu/drm/i915/display/intel_fbc.c:635: warning: Excess function parameter 'i915' description in 'intel_fbc_is_active'
./drivers/gpu/drm/i915/display/intel_fbc.c:1638: warning: Function parameter or member 'fbc' not described in 'intel_fbc_handle_fifo_underrun_irq'
./drivers/gpu/drm/i915/display/intel_fbc.c:1638: warning: Excess function parameter 'i915' description in 'intel_fbc_handle_fifo_underrun_irq'
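The fix amounts to renaming the stale @i915 parameter descriptions to
@fbc so the kernel-doc matches the new signatures, roughly like this
(exact doc wording elided):

/**
 * intel_fbc_is_active - Is FBC active?
 * @fbc: The FBC instance
 *
 * ...
 */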
Fixes: e49a656b92 ("drm/i915/fbc: Start passing around intel_fbc")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211115140549.27629-1-jani.nikula@intel.com
With multiple fbc instances we need to find the right one for each
plane. Rather than going looking for the right instance every time
let's just replace the has_fbc boolean with a pointer that gets us
there straight away.
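With the pointer in place, looking up the instance for a plane becomes
trivial; a sketch (the plane->fbc field name is an assumption here):

static struct intel_fbc *plane_fbc(const struct intel_plane *plane)
{
	/* was: a has_fbc bool plus a search for the matching instance */
	return plane->fbc;	/* NULL when the plane can't do FBC */
}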
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-18-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
In preparation for multiple FBC instances start passing around
intel_fbc pointers rather than i915 pointers. And once there are
multiple of these we can't rely on container_of() to get back to
the i915, so we toss in an fbc->i915 pointer already.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-17-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
The FBC register defines are a mess:
- namespace changes between DPFC_, FBC_, and some platform
specific prefix at a whim
- ilk+ reuses most g4x bits but still has some separate bit
defines elsewhere
- it's not clear from the defines that the bit defines are
shared
So let's clean it up:
- both the g4x and ilk registers share the same defines now
- only defines which conflict have a _PLATFORM suffix, everything
else just gets comments to indicate which platforms do what
- the namespace is now consistently DPFC_
- the SNB system agent fence registers also get a consistent namespace
- REG_BIT() & co. for everything
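The target style, illustrated with made-up offsets/bits (not the real
register layout):

#define DPFC_CONTROL			_MMIO(0x3208)	/* ilk+ lives at a different offset */
#define   DPFC_CTL_EN			REG_BIT(31)
#define   DPFC_CTL_FENCE_EN_G4X		REG_BIT(29)	/* g4x-snb */
#define   DPFC_CTL_FENCE_EN_IVB		REG_BIT(28)	/* ivb+ */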
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-13-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Just use the same mask for ivb/hsw as for bdw+. The extra bit
in the bdw mask is mbz on ivb/hsw anyway, so keeping separate
masks is just pointless complexity.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-12-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Pull the direct FBC register frobbing out from the debugfs code
into the fbc code. Also add a vfunc for this so we don't need
extra platform checks.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-11-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Eliminate yet another if-ladder by adding a .nuke() vfunc.
We also rename all the *_recompress() stuff to *_nuke() since
that's the terminology the spec uses. "recompress" is also
a bit confusing in that it perhaps implies that this triggers
an immediate recompression. Depending on the hardware that
may well not be the case, and in general we don't
specifically know when the hardware decides to compress.
So all we do is "nuke" the current compressed framebuffer
and leave it up to the hardware to recompress later if it
so chooses.
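Roughly the vtable shape after this (member list and signatures are
approximate, not the exact upstream struct):

struct intel_fbc;

struct intel_fbc_funcs {
	void (*activate)(struct intel_fbc *fbc);
	void (*deactivate)(struct intel_fbc *fbc);
	bool (*is_active)(struct intel_fbc *fbc);
	bool (*is_compressing)(struct intel_fbc *fbc);
	void (*nuke)(struct intel_fbc *fbc);	/* invalidate the current cfb */
};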
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-8-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Declutter the *_fbc_activate() functions by pulling all the
control register value computations into helpers.
I left the enable bit in *_fbc_activate() in the hopes of maybe
using the helpers in the *_fbc_deactivate() paths as well instead
of the current rmw approach. That won't be possible quite yet
since we clobber the fbc->params before deactivating
FBC so we could end up changing some of the values live, which
given FBC's lack of/poor double buffering would likely not go
so well.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-6-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
No need to tiptoe around programming DPFC_FENCE_YOFF with
params->fence_y_offset vs. 0. If the fence is not enabled
it doesn't even matter what we program here.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-4-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
We have two identical copies of the snb+ system agent
CPU fence programming code. Extract into a helper.
Also there's no real point in insisting that we
program 0 into DPFC_CPU_FENCE_OFFSET when the fence is
disabled. So just always stick the computed Y offset there
whether or not the fence is actually used.
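A sketch of the extracted helper (register/field names approximate,
and the params lookup simplified):

static void snb_fbc_program_fence(struct drm_i915_private *i915,
				  const struct intel_fbc_reg_params *params)
{
	u32 ctl = 0;

	if (params->fence_id >= 0)
		ctl = SNB_CPU_FENCE_ENABLE | params->fence_id;

	intel_de_write(i915, SNB_DPFC_CTL_SA, ctl);
	/* always write the Y offset; it's ignored when the fence is off */
	intel_de_write(i915, DPFC_CPU_FENCE_OFFSET, params->fence_y_offset);
}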
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211104144520.22605-2-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
The next patch needs to distinguish between a view's mapping and scanout
stride. Rename the current stride parameter to mapping_stride with the
script below. mapping_stride will keep the same meaning as stride had
on all platforms so far, while the meaning of it will change on ADLP.
No functional changes.
@@
identifier intel_fb_view;
identifier i915_color_plane_view;
identifier color_plane;
expression e;
type T;
@@
struct intel_fb_view {
...
struct i915_color_plane_view {
...
- T stride;
+ T mapping_stride;
...
} color_plane[e];
...
};
@@
struct i915_color_plane_view pv;
@@
pv.
- stride
+ mapping_stride
@@
struct i915_color_plane_view *pvp;
@@
pvp->
- stride
+ mapping_stride
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211026225105.2783797-6-imre.deak@intel.com
Legacy cursor APIs are handled by intel_legacy_cursor_update(), which
calls drm_atomic_helper_update_plane() when going through the
slow/atomic path to update the cursor, which was the case for PSR2
selective fetch.
drm_atomic_helper_update_plane() sets
drm_atomic_state->legacy_cursor_update to true when updating the
cursor plane, to allow several cursor updates to happen within the
same frame, as userspace does that.
If drivers waited for a vblank increment at the end of every cursor
movement that would cause a visible lag in the cursor.
But this optimization does not work properly with the PSR2 selective fetch
dirty area calculation. For example, if within a single frame the cursor
had 3 moves, the final dirty area programmed to PSR2_MAN_TRK_CTL would
be based on the second movement as the old state and the third movement
as the new state, not updating the area where the cursor was in the first
state.
So here we switch back to the fast path approach in
intel_legacy_cursor_update() and handle cursor movements as
frontbuffer rendering (psr_force_hw_tracking_exit()), which is not
optimal for power savings but is the solution we have until
mailbox style updates are implemented.
Also remove the cursor workaround, now that the issue is properly
understood and it is known that the workaround will never cover all cases.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210930001409.254817-5-jose.souza@intel.com
FBC+Yf tiling seems to work just fine, and unlike with linear
the hardware does appear to correctly calculate the CFB stride
when using the override stride on both cfl and glk. So no need
for any additional tweaks.
Cc: Uma Shankar <uma.shankar@intel.com> #v2
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210924141330.1515-1-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
On FBC1 we can specify an arbitrary cfb stride. The hw will
simply throw away any compressed line that would exceed the
specified limit and keep using the uncompressed data instead.
Thus we can allow arbitrary compression limits.
The one thing we have to keep in mind though is that the cfb
stride is specified in units of 32B (gen2) or 64B (gen3+).
Fortunately X-tile is already 128B (gen2) or 512B (gen3+) wide
so as long as we limit ourselves to the same 4x compression
limit that FBC2 has we are guaranteed to have a sufficiently
aligned cfb stride.
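The gen3+ numbers behind that guarantee, written out as a compile-time
check (constant names invented for illustration; needs
<linux/build_bug.h>):

#define GEN3_X_TILE_WIDTH_BYTES		512
#define GEN3_CFB_STRIDE_UNIT_BYTES	64
#define FBC2_MAX_LIMIT			4

/* worst-case cfb stride is still a whole number of stride units */
static_assert(GEN3_X_TILE_WIDTH_BYTES / FBC2_MAX_LIMIT %
	      GEN3_CFB_STRIDE_UNIT_BYTES == 0);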
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210921152517.803-5-ville.syrjala@linux.intel.com
There are some weird corner cases in FBC which require
FBC segments to be separated by at least one extra cacheline.
Make sure that is present.
v2: Respin to fit in with skl_fbc_min_cfb_stride()
v3: Make it build
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> #v1
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210921181245.15091-1-ville.syrjala@linux.intel.com
Apply the same 512 byte FBC segment alignment to glk+ as we use
on skl+. The only real difference is that we now have a dedicated
register for the FBC override stride. Not 100% sure which
platforms really need the 512B alignment, but it's easiest
to just do it on everything.
Also the hardware no longer seems to miscalculate the CFB stride
for linear, so we can omit the use of the override stride for
linear unless the stride is misaligned.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210921152517.803-3-ville.syrjala@linux.intel.com
The code to calculate the cfb stride/size is a bit of a mess.
The cfb size is getting calculated based purely on the plane
stride and plane height. That doesn't account for extra
alignment we want for the cfb stride. The gen9 override
stride OTOH is just calculated based on the plane width, and
it does try to make things more aligned but any extra alignment
added there is not considered in the cfb size calculations.
So I'm not at all convinced this is working as intended. Additionally
the compression limit handling is split between the cfb allocation
code and g4x_dpfc_ctl_limit() (for the 16bpp case), which is just
confusing.
Let's streamline the whole thing:
- Start with the plane stride, convert that into cfb stride (cfb is
always 4 bytes per pixel). All the calculations will assume 1:1
compression limit since that will give us the max values, and we
don't yet know how much stolen memory we will be able to allocate
- Align the cfb stride to 512 bytes on modern platforms. This guarantees
the 4 line segment will be 512 byte aligned regardless of the final
compression limit we choose later. The 512 byte alignment for the
segment is required by at least some of the platforms, and just doing
it always seems like the easiest option
- Figure out if we need to use the override stride or not. For X-tiled
it's never needed since the plane stride is already 512 byte aligned,
for Y-tiled it will be needed if the plane stride is not a multiple
of 512 bytes, and for linear it's apparently always needed because the
hardware miscalculates the cfb stride as PLANE_STRIDE*512 instead of
the PLANE_STRIDE*64 that it should use with linear.
- The cfb size will be calculated based on the aligned cfb stride to
guarantee we actually reserved enough stolen memory and the FBC hw
won't end up scribbling over whatever else is allocated in stolen
- The compression limit handling we just do fully in the cfb allocation
code to make things less confusing
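A hedged sketch of the stride/size math described above (helper names
invented, constants as stated in the list; this is not the upstream
code):

static unsigned int sketch_cfb_stride(unsigned int plane_stride, int cpp)
{
	/* cfb is always 4 bytes per pixel; assume the 1:1 limit for the max */
	unsigned int stride = plane_stride * 4 / cpp;

	/* 512B alignment keeps the 4 line segment aligned for any later limit */
	return ALIGN(stride, 512);
}

static unsigned int sketch_cfb_size(unsigned int cfb_stride,
				    unsigned int plane_height)
{
	/* size from the aligned stride so the stolen reservation is enough */
	return cfb_stride * plane_height;
}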
v2: Write the min cfb segment stride calculation in a more
explicit way to make it clear what is going on
v3: Remember to update fbc->limit when changing to 16bpp
Reviewed-by: Uma Shankar <uma.shankar@intel.com> #v2
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210923042151.19052-1-ville.syrjala@linux.intel.com
Polish the FBC stride override stuff:
- just call it override_cfb_stride since it'll be used on
more gens later
- Use REG_BIT() & co. for the registers and give everything
a CHICKEN_ prefix since glk+ will have a different register
for this
- Use intel_de_rmw() for the RMW
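Sketch of the resulting programming style (register/field names as
described above, but treat them as approximate):

static void skl_fbc_program_cfb_stride(struct drm_i915_private *i915,
				       unsigned int override_cfb_stride)
{
	u32 val = 0;

	if (override_cfb_stride)
		val = CHICKEN_FBC_STRIDE_OVERRIDE |
		      CHICKEN_FBC_STRIDE(override_cfb_stride);

	/* single read-modify-write via intel_de_rmw() */
	intel_de_rmw(i915, CHICKEN_MISC_4,
		     CHICKEN_FBC_STRIDE_OVERRIDE | CHICKEN_FBC_STRIDE_MASK, val);
}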
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210702204603.596-5-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
On ILK+ we currently do a nuke right after activating FBC. If my
memory isn't playing tricks on me this is actually required if
FBC didn't stay disabled for a full frame. In that case the
deactivate+reactivate may not invalidate the cfb. I'd have to
double check to be sure though.
So let's keep the nuke, and just extend it backwards to cover
all the platforms by doing it a bit higher up.
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210702204603.596-4-ville.syrjala@linux.intel.com
If we hit the error path here we unconditionally call
i915_gem_stolen_remove_node, even though we only allocate the
compressed_llb on older platforms. Therefore we should first check that
we actually allocated the node before trying to remove it.
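The guarded cleanup, roughly (helper name invented here; in the real
code this is just the condition on the error path, and compressed_llb
is assumed to be an embedded drm_mm_node):

static void free_llb_if_allocated(struct drm_i915_private *i915,
				  struct intel_fbc *fbc)
{
	/* only older platforms allocate the llb, so check before removing */
	if (drm_mm_node_allocated(&fbc->compressed_llb))
		i915_gem_stolen_remove_node(i915, &fbc->compressed_llb);
}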
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3709
Fixes: 46b2c40e0a ("drm/i915/fbc: Allocate llb before cfb")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210701090326.1056452-1-matthew.auld@intel.com
Since the llb allocation has a fixed size, let's grab it before
the potentially variable sized cfb. That should avoid some allocation
failure cases once we allow different compression ratios for FBC1.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210610183237.3920-10-ville.syrjala@linux.intel.com
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>