Patch series "New page table range API", v6.
This patchset changes the API used by the MM to set up page table entries.
The four APIs are:
set_ptes(mm, addr, ptep, pte, nr)
update_mmu_cache_range(vma, addr, ptep, nr)
flush_dcache_folio(folio)
flush_icache_pages(vma, page, nr)
flush_dcache_folio() isn't technically new, but no architecture
implemented it, so I've done that for them. The old APIs remain around
but are mostly implemented by calling the new interfaces.
The new APIs are based around setting up N page table entries at once.
The N entries belong to the same PMD, the same folio and the same VMA, so
ptep++ is a legitimate operation, and locking is taken care of for you.
Some architectures can do a better job of it than just a loop, but I have
hesitated to make too deep a change to architectures I don't understand
well.
One thing I have changed in every architecture is that PG_arch_1 is now a
per-folio bit instead of a per-page bit when used for dcache clean/dirty
tracking. This was something that would have to happen eventually, and it
makes sense to do it now rather than iterate over every page involved in a
cache flush and figure out if it needs to happen.
The point of all this is better performance, and Fengwei Yin has measured
improvement on x86. I suspect you'll see improvement on your architecture
too. Try the new will-it-scale test mentioned here:
https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
You'll need to run it on an XFS filesystem and have
CONFIG_TRANSPARENT_HUGEPAGE set.
This patchset is the basis for much of the anonymous large folio work
being done by Ryan, so it's received quite a lot of testing over the last
few months.
This patch (of 38):
Determine if a value lies within a range more efficiently (subtraction +
comparison vs two comparisons and an AND). It also has useful (under some
circumstances) behaviour if the range exceeds the maximum value of the
type. Convert all the conflicting definitions of in_range() within the
kernel; some can use the generic definition while others need their own
definition.
Link: https://lkml.kernel.org/r/20230802151406.3735276-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230802151406.3735276-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Another issue found by KASAN. The bit finding is buried inside the
dp_for_each_set_bit() macro (that passes on to for_each_set_bit() that
calls the bit stuff. These bit functions want an unsigned long pointer
as input and just dumbly casting leads to out-of-bounds accesses.
This fixes that.
Signed-off-by: Carsten Haitzler <carsten.haitzler@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: James Qian Wang <james.qian.wang@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210204131102.68658-1-carsten.haitzler@foss.arm.com
D32 is simple version of D71, the difference is:
- Only has one pipeline
- Drop the periph block and merge it to GCU
v2: Rebase.
v3: Isolate the block counting fix to a new patch
Signed-off-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Reviewed-by: Mihail Atanassov <mihail.atanassov@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191210084828.19664-3-james.qian.wang@arm.com
- Y0L2 and P010 are block (tiled) format, Update the kemeda logic to
compatible with such block format.
- Since DRM introduced a general block information to drm_format_info,
the format_caps->tiled_size no long needed, delete it.
- Build some fb utils functions for code sharing.
Signed-off-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
dp_wait_cond() currently returns the number of retries left over which
is hardly an useful information. Convert to returning -ETIMEDOUT when
the wait times out, or 0 (zero) when condition is met before deadline.
Also convert the users of the function to return the error value.
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Add two sysfs node: core_id, config_id, user can read them to fetch the
HW product information.
Also, use memset to initialize config_id, rather than quirky C syntax.
Courtesy of Nathan Chancellor <natechancellor@gmail.com>.
Signed-off-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
[Merged Nathan's patch that uses memset to initialize config_id into
original patch as the fixes tag changed due to rebase, reworded the
commit to reference the merged patch]
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
1. Add detailed layer/layer_state definitions
2. Add d71_layer_init to report layer features and capabilities according
to D71 layer block.
3. Add d71_layer_updat/disable
v2: Rebase.
Signed-off-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
[removed d71_layer_dump() from this commit]
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
D71 consists of a number of Register Blocks, every Block controls a
specific HW function, every block has a common block_header to represent
its type and pipeline information.
GCU (Global Control Unit) is the first Block which describe the global
information of D71 HW, Like number of block contained and the number of
pipeline supported.
So the d71_enum_resources parsed GCU and create pipeline according
the GCU configuration, and then iterate and detect the blocks that
indicated by the GCU and block_header.
And this change also added two struct d71_dev/d71_pipeline to extend
komeda_dev/komeda_pipeline to add some d71 only members.
v2:
- Return the specific errno not -1.
- Use DRM_DEBUG as default debug msg printer.
Signed-off-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Implement a simple wrapper for platform module to build komeda to module,
Also add a very simple D71 layer code to show how to discover a product.
Komeda driver direct bind the product ENTRY function xxx_identity to DT
compatible name like:
d71_product = {
.product_id = MALIDP_D71_PRODUCT_ID,
.identify = d71_identify,
},
const struct of_device_id komeda_of_match[] = {
{ .compatible = "arm,mali-d71", .data = &d71_product, },
{},
};
Then when linux found a matched DT node and call driver to probe, we can
easily get the of data, and call into the product to do the identify:
komeda_bind()
{
...
product = of_device_get_match_data(dev);
product->identify();
...
}
Changes in v4:
- Replaced kzalloc with devm_kzalloc
Changes in v3:
- Fixed style problem found by checkpatch.pl --strict.
Signed-off-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
1. Added a brief definition of komeda_dev/pipeline/component, this change
didn't add the detailed component features and capabilities, which will
be added in the following changes.
2. Corresponding resources discovery and initialzation functions.
Changes in v4:
- Deleted unnecessary headers
Changes in v3:
- Fixed style problem found by checkpatch.pl --strict.
Changes in v2:
- Unified abbreviation of "pipeline" to "pipe".
Signed-off-by: James Qian Wang (Arm Technology China) <james.qian.wang@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>