Merge branch 'docs-net-page_pool-sync-dev-and-kdoc'
Jakub Kicinski says:

====================
docs: net: page_pool: sync dev and kdoc

Document PP_FLAG_DMA_SYNC_DEV based on recent conversation.
Use kdoc to document structs and functions, to avoid duplication.

Olek, this will conflict with your work, but I think that trying to
make progress in parallel is the best course of action...
Retargeting at net-next to make it a little less bad.
====================

Link: https://lore.kernel.org/r/20230802161821.3621985-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit 36e68eadd3

3 changed files with 181 additions and 94 deletions
Documentation/networking/page_pool.rst:

@@ -64,84 +64,68 @@ This lockless guarantee naturally comes from running under a NAPI softirq.
 The protection doesn't strictly have to be NAPI, any guarantee that allocating
 a page will cause no race conditions is enough.
 
-* page_pool_create(): Create a pool.
-    * flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
-    * order: 2^order pages on allocation
-    * pool_size: size of the ptr_ring
-    * nid: preferred NUMA node for allocation
-    * dev: struct device. Used on DMA operations
-    * dma_dir: DMA direction
-    * max_len: max DMA sync memory size
-    * offset: DMA address offset
+.. kernel-doc:: net/core/page_pool.c
+   :identifiers: page_pool_create
 
-* page_pool_put_page(): The outcome of this depends on the page refcnt. If the
-  driver bumps the refcnt > 1 this will unmap the page. If the page refcnt is 1
-  the allocator owns the page and will try to recycle it in one of the pool
-  caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device
-  using dma_sync_single_range_for_device().
+.. kernel-doc:: include/net/page_pool.h
+   :identifiers: struct page_pool_params
 
-* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA sync
-  for the entire memory area configured in area pool->max_len.
+.. kernel-doc:: include/net/page_pool.h
+   :identifiers: page_pool_put_page page_pool_put_full_page
+                 page_pool_recycle_direct page_pool_dev_alloc_pages
+                 page_pool_get_dma_addr page_pool_get_dma_dir
 
-* page_pool_recycle_direct(): Similar to page_pool_put_full_page() but caller
-  must guarantee safe context (e.g NAPI), since it will recycle the page
-  directly into the pool fast cache.
+.. kernel-doc:: net/core/page_pool.c
+   :identifiers: page_pool_put_page_bulk page_pool_get_stats
 
-* page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool
-  caches.
+DMA sync
+--------
+Driver is always responsible for syncing the pages for the CPU.
+Drivers may choose to take care of syncing for the device as well
+or set the ``PP_FLAG_DMA_SYNC_DEV`` flag to request that pages
+allocated from the page pool are already synced for the device.
 
-* page_pool_get_dma_addr(): Retrieve the stored DMA address.
+If ``PP_FLAG_DMA_SYNC_DEV`` is set, the driver must inform the core what portion
+of the buffer has to be synced. This allows the core to avoid syncing the entire
+page when the driver knows that the device only accessed a portion of the page.
 
-* page_pool_get_dma_dir(): Retrieve the stored DMA direction.
+Most drivers will reserve headroom in front of the frame. This part
+of the buffer is not touched by the device, so to avoid syncing
+it drivers can set the ``offset`` field in struct page_pool_params
+appropriately.
 
-* page_pool_put_page_bulk(): Tries to refill a number of pages into the
-  ptr_ring cache holding ptr_ring producer lock. If the ptr_ring is full,
-  page_pool_put_page_bulk() will release leftover pages to the page allocator.
-  page_pool_put_page_bulk() is suitable to be run inside the driver NAPI tx
-  completion loop for the XDP_REDIRECT use case.
-  Please note the caller must not use data area after running
-  page_pool_put_page_bulk(), as this function overwrites it.
+For pages recycled on the XDP xmit and skb paths the page pool will
+use the ``max_len`` member of struct page_pool_params to decide how
+much of the page needs to be synced (starting at ``offset``).
+When directly freeing pages in the driver (page_pool_put_page())
+the ``dma_sync_size`` argument specifies how much of the buffer needs
+to be synced.
 
-* page_pool_get_stats(): Retrieve statistics about the page_pool. This API
-  is only available if the kernel has been configured with
-  ``CONFIG_PAGE_POOL_STATS=y``. A pointer to a caller allocated ``struct
-  page_pool_stats`` structure is passed to this API which is filled in. The
-  caller can then report those stats to the user (perhaps via ethtool,
-  debugfs, etc.). See below for an example usage of this API.
+If in doubt set ``offset`` to 0, ``max_len`` to ``PAGE_SIZE`` and
+pass -1 as ``dma_sync_size``. That combination of arguments is always
+correct.
+
+Note that the syncing parameters are for the entire page.
+This is important to remember when using fragments (``PP_FLAG_PAGE_FRAG``),
+where allocated buffers may be smaller than a full page.
+Unless the driver author really understands page pool internals
+it's recommended to always use ``offset = 0``, ``max_len = PAGE_SIZE``
+with fragmented page pools.
 
 Stats API and structures
 ------------------------
 If the kernel is configured with ``CONFIG_PAGE_POOL_STATS=y``, the API
-``page_pool_get_stats()`` and structures described below are available. It
-takes a pointer to a ``struct page_pool`` and a pointer to a ``struct
-page_pool_stats`` allocated by the caller.
+page_pool_get_stats() and structures described below are available.
+It takes a pointer to a ``struct page_pool`` and a pointer to a struct
+page_pool_stats allocated by the caller.
 
-The API will fill in the provided ``struct page_pool_stats`` with
+The API will fill in the provided struct page_pool_stats with
 statistics about the page_pool.
 
-The stats structure has the following fields::
-
-    struct page_pool_stats {
-        struct page_pool_alloc_stats alloc_stats;
-        struct page_pool_recycle_stats recycle_stats;
-    };
-
-
-The ``struct page_pool_alloc_stats`` has the following fields:
-  * ``fast``: successful fast path allocations
-  * ``slow``: slow path order-0 allocations
-  * ``slow_high_order``: slow path high order allocations
-  * ``empty``: ptr ring is empty, so a slow path allocation was forced.
-  * ``refill``: an allocation which triggered a refill of the cache
-  * ``waive``: pages obtained from the ptr ring that cannot be added to
-    the cache due to a NUMA mismatch.
-
-The ``struct page_pool_recycle_stats`` has the following fields:
-  * ``cached``: recycling placed page in the page pool cache
-  * ``cache_full``: page pool cache was full
-  * ``ring``: page placed into the ptr ring
-  * ``ring_full``: page released from page pool because the ptr ring was full
-  * ``released_refcnt``: page released (and not recycled) because refcnt > 1
+.. kernel-doc:: include/net/page_pool.h
+   :identifiers: struct page_pool_recycle_stats
+                 struct page_pool_alloc_stats
+                 struct page_pool_stats
 
 Coding examples
 ===============
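A minimal sketch of how a driver might apply the DMA-sync guidance above: reserve headroom the device never writes via ``offset`` and bound the device sync with ``max_len``. The pool size and the use of XDP_PACKET_HEADROOM are assumptions for illustration, not part of this commit.

#include <linux/bpf.h>		/* XDP_PACKET_HEADROOM */
#include <net/page_pool.h>

static struct page_pool *my_rx_pool_create(struct device *dev)
{
	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.order		= 0,
		.pool_size	= 256,			/* assumed ring size */
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
		/* device never touches the headroom, skip syncing it */
		.offset		= XDP_PACKET_HEADROOM,
		/* sync at most the usable area after the headroom */
		.max_len	= PAGE_SIZE - XDP_PACKET_HEADROOM,
	};

	return page_pool_create(&pp);	/* ERR_PTR()-encoded on failure */
}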
include/net/page_pool.h:

@@ -70,47 +70,76 @@ struct pp_alloc_cache {
 	struct page *cache[PP_ALLOC_CACHE_SIZE];
 };
 
+/**
+ * struct page_pool_params - page pool parameters
+ * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG
+ * @order:	2^order pages on allocation
+ * @pool_size:	size of the ptr_ring
+ * @nid:	NUMA node id to allocate from pages from
+ * @dev:	device, for DMA pre-mapping purposes
+ * @napi:	NAPI which is the sole consumer of pages, otherwise NULL
+ * @dma_dir:	DMA mapping direction
+ * @max_len:	max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV
+ * @offset:	DMA sync address offset for PP_FLAG_DMA_SYNC_DEV
+ */
 struct page_pool_params {
 	unsigned int	flags;
 	unsigned int	order;
 	unsigned int	pool_size;
-	int		nid;  /* Numa node id to allocate from pages from */
-	struct device	*dev; /* device, for DMA pre-mapping purposes */
-	struct napi_struct *napi; /* Sole consumer of pages, otherwise NULL */
-	enum dma_data_direction dma_dir; /* DMA mapping direction */
-	unsigned int	max_len; /* max DMA sync memory size */
-	unsigned int	offset;  /* DMA addr offset */
+	int		nid;
+	struct device	*dev;
+	struct napi_struct *napi;
+	enum dma_data_direction dma_dir;
+	unsigned int	max_len;
+	unsigned int	offset;
+/* private: used by test code only */
 	void (*init_callback)(struct page *page, void *arg);
 	void *init_arg;
 };
 
 #ifdef CONFIG_PAGE_POOL_STATS
+/**
+ * struct page_pool_alloc_stats - allocation statistics
+ * @fast:	successful fast path allocations
+ * @slow:	slow path order-0 allocations
+ * @slow_high_order: slow path high order allocations
+ * @empty:	ptr ring is empty, so a slow path allocation was forced
+ * @refill:	an allocation which triggered a refill of the cache
+ * @waive:	pages obtained from the ptr ring that cannot be added to
+ *		the cache due to a NUMA mismatch
+ */
 struct page_pool_alloc_stats {
-	u64 fast; /* fast path allocations */
-	u64 slow; /* slow-path order 0 allocations */
-	u64 slow_high_order; /* slow-path high order allocations */
-	u64 empty; /* failed refills due to empty ptr ring, forcing
-		    * slow path allocation
-		    */
-	u64 refill; /* allocations via successful refill */
-	u64 waive;  /* failed refills due to numa zone mismatch */
+	u64 fast;
+	u64 slow;
+	u64 slow_high_order;
+	u64 empty;
+	u64 refill;
+	u64 waive;
 };
 
+/**
+ * struct page_pool_recycle_stats - recycling (freeing) statistics
+ * @cached:	recycling placed page in the page pool cache
+ * @cache_full:	page pool cache was full
+ * @ring:	page placed into the ptr ring
+ * @ring_full:	page released from page pool because the ptr ring was full
+ * @released_refcnt: page released (and not recycled) because refcnt > 1
+ */
 struct page_pool_recycle_stats {
-	u64 cached;	/* recycling placed page in the cache. */
-	u64 cache_full; /* cache was full */
-	u64 ring;	/* recycling placed page back into ptr ring */
-	u64 ring_full;	/* page was released from page-pool because
-			 * PTR ring was full.
-			 */
-	u64 released_refcnt; /* page released because of elevated
-			      * refcnt
-			      */
+	u64 cached;
+	u64 cache_full;
+	u64 ring;
+	u64 ring_full;
+	u64 released_refcnt;
 };
 
-/* This struct wraps the above stats structs so users of the
- * page_pool_get_stats API can pass a single argument when requesting the
- * stats for the page pool.
- */
+/**
+ * struct page_pool_stats - combined page pool use statistics
+ * @alloc_stats:	see struct page_pool_alloc_stats
+ * @recycle_stats:	see struct page_pool_recycle_stats
+ *
+ * Wrapper struct for combining page pool stats with different storage
+ * requirements.
+ */
 struct page_pool_stats {
 	struct page_pool_alloc_stats alloc_stats;
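A minimal sketch of consuming the wrapper struct documented above, e.g. for a debugfs dump. The helper is hypothetical, assumes ``CONFIG_PAGE_POOL_STATS=y``, and only reads fields named in the kdoc.

#ifdef CONFIG_PAGE_POOL_STATS
/* pages the pool kept for reuse: fast-cache hits plus ptr_ring returns */
static u64 my_pp_total_recycled(const struct page_pool_stats *stats)
{
	const struct page_pool_recycle_stats *r = &stats->recycle_stats;

	return r->cached + r->ring;
}
#endif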
@@ -211,6 +240,12 @@ struct page_pool {
 
 struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
 
+/**
+ * page_pool_dev_alloc_pages() - allocate a page.
+ * @pool:	pool from which to allocate
+ *
+ * Get a page from the page allocator or page_pool caches.
+ */
 static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
 {
 	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
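A minimal sketch of page_pool_dev_alloc_pages() in an RX refill loop; the ``my_rx_ring`` layout is hypothetical.

struct my_rx_ring {			/* hypothetical driver state */
	struct page_pool *pool;
	unsigned int free_descs;
};

static int my_rx_refill(struct my_rx_ring *ring)
{
	struct page *page;

	while (ring->free_descs) {
		page = page_pool_dev_alloc_pages(ring->pool);
		if (!page)
			return -ENOMEM;	/* retry on the next NAPI poll */
		/* program the HW descriptor, e.g. with
		 * page_pool_get_dma_addr(page) plus the headroom
		 */
		ring->free_descs--;
	}
	return 0;
}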
@@ -230,8 +265,12 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 	return page_pool_alloc_frag(pool, offset, size, gfp);
 }
 
-/* get the stored dma direction. A driver might decide to treat this locally and
- * avoid the extra cache line from page_pool to determine the direction
+/**
+ * page_pool_get_dma_dir() - Retrieve the stored DMA direction.
+ * @pool:	pool from which page was allocated
+ *
+ * Get the stored dma direction. A driver might decide to store this locally
+ * and avoid the extra cache line from page_pool to determine the direction.
  */
 static
 inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
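A minimal sketch of the pattern the new kdoc suggests: read the direction once at setup and cache it in driver state, keeping the hot path off the page_pool cache line. ``my_priv`` is hypothetical.

struct my_priv {			/* hypothetical driver state */
	struct page_pool *pool;
	enum dma_data_direction dma_dir;
};

static void my_priv_cache_dma_dir(struct my_priv *priv)
{
	/* one read at setup time, reused on every packet afterwards */
	priv->dma_dir = page_pool_get_dma_dir(priv->pool);
}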
@@ -321,6 +360,19 @@ static inline bool page_pool_is_last_frag(struct page_pool *pool,
 		(page_pool_defrag_page(page, 1) == 0);
 }
 
+/**
+ * page_pool_put_page() - release a reference to a page pool page
+ * @pool:	pool from which page was allocated
+ * @page:	page to release a reference on
+ * @dma_sync_size: how much of the page may have been touched by the device
+ * @allow_direct: released by the consumer, allow lockless caching
+ *
+ * The outcome of this depends on the page refcnt. If the driver bumps
+ * the refcnt > 1 this will unmap the page. If the page refcnt is 1
+ * the allocator owns the page and will try to recycle it in one of the pool
+ * caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device
+ * using dma_sync_single_range_for_device().
+ */
 static inline void page_pool_put_page(struct page_pool *pool,
 				      struct page *page,
 				      unsigned int dma_sync_size,
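A minimal sketch of releasing a page with a known touched length, per the @dma_sync_size kdoc above; the wrapper and ``pkt_len`` are illustrative. Outside NAPI context @allow_direct must be false.

static void my_rx_drop(struct page_pool *pool, struct page *page,
		       unsigned int pkt_len)
{
	/* only pkt_len bytes may have been written by the device */
	page_pool_put_page(pool, page, pkt_len, false);
}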
@@ -337,14 +389,29 @@ static inline void page_pool_put_page(struct page_pool *pool,
 #endif
 }
 
-/* Same as above but will try to sync the entire area pool->max_len */
+/**
+ * page_pool_put_full_page() - release a reference on a page pool page
+ * @pool:	pool from which page was allocated
+ * @page:	page to release a reference on
+ * @allow_direct: released by the consumer, allow lockless caching
+ *
+ * Similar to page_pool_put_page(), but will DMA sync the entire memory area
+ * as configured in &page_pool_params.max_len.
+ */
 static inline void page_pool_put_full_page(struct page_pool *pool,
 					   struct page *page, bool allow_direct)
 {
 	page_pool_put_page(pool, page, -1, allow_direct);
 }
 
-/* Same as above but the caller must guarantee safe context. e.g NAPI */
+/**
+ * page_pool_recycle_direct() - release a reference on a page pool page
+ * @pool:	pool from which page was allocated
+ * @page:	page to release a reference on
+ *
+ * Similar to page_pool_put_full_page() but caller must guarantee safe context
+ * (e.g NAPI), since it will recycle the page directly into the pool fast cache.
+ */
 static inline void page_pool_recycle_direct(struct page_pool *pool,
 					    struct page *page)
 {
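A minimal sketch contrasting the two wrappers by calling context; both helpers are made up for illustration.

/* NAPI poll loop: context is safe, page may go straight to the fast cache */
static void my_napi_drop(struct page_pool *pool, struct page *page)
{
	page_pool_recycle_direct(pool, page);
}

/* any context (e.g. teardown): full max_len sync, no lockless caching */
static void my_any_ctx_drop(struct page_pool *pool, struct page *page)
{
	page_pool_put_full_page(pool, page, false);
}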
@@ -354,6 +421,13 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 #define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
 		(sizeof(dma_addr_t) > sizeof(unsigned long))
 
+/**
+ * page_pool_get_dma_addr() - Retrieve the stored DMA address.
+ * @page:	page allocated from a page pool
+ *
+ * Fetch the DMA address of the page. The page pool to which the page belongs
+ * must have been created with PP_FLAG_DMA_MAP.
+ */
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
 	dma_addr_t ret = page->dma_addr;
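A minimal sketch of the CPU-side counterpart: the pool only ever syncs for the device, so the driver syncs for the CPU before parsing a received frame, using the stored DMA address. The XDP_PACKET_HEADROOM offset mirrors the assumed pool ``offset`` and is not from this commit.

static void my_rx_sync_for_cpu(struct device *dev, struct page *page,
			       unsigned int pkt_len)
{
	dma_addr_t dma = page_pool_get_dma_addr(page);

	/* make the packet bytes visible to the CPU before parsing */
	dma_sync_single_for_cpu(dev, dma + XDP_PACKET_HEADROOM,
				pkt_len, DMA_FROM_DEVICE);
}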
net/core/page_pool.c:

@@ -58,6 +58,17 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
 	"rx_pp_recycle_released_ref",
 };
 
+/**
+ * page_pool_get_stats() - fetch page pool stats
+ * @pool:	pool from which page was allocated
+ * @stats:	struct page_pool_stats to fill in
+ *
+ * Retrieve statistics about the page_pool. This API is only available
+ * if the kernel has been configured with ``CONFIG_PAGE_POOL_STATS=y``.
+ * A pointer to a caller allocated struct page_pool_stats structure
+ * is passed to this API which is filled in. The caller can then report
+ * those stats to the user (perhaps via ethtool, debugfs, etc.).
+ */
 bool page_pool_get_stats(struct page_pool *pool,
 			 struct page_pool_stats *stats)
 {
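A minimal sketch of the call as documented; the pr_info() reporting is illustrative, and the snapshot struct is zero-initialized defensively since the API adds into the caller-provided counters.

static void my_report_stats(struct page_pool *pool)
{
	struct page_pool_stats stats = {};

	if (!page_pool_get_stats(pool, &stats))
		return;

	pr_info("pp: fast allocs %llu, recycled via ring %llu\n",
		(unsigned long long)stats.alloc_stats.fast,
		(unsigned long long)stats.recycle_stats.ring);
}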
@@ -224,6 +235,10 @@ static int page_pool_init(struct page_pool *pool,
 	return 0;
 }
 
+/**
+ * page_pool_create() - create a page pool.
+ * @params:	parameters, see struct page_pool_params
+ */
 struct page_pool *page_pool_create(const struct page_pool_params *params)
 {
 	struct page_pool *pool;
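A minimal sketch of the create call; the return value is ERR_PTR()-encoded on failure, so it is checked with IS_ERR() rather than against NULL. The wrapper is hypothetical.

#include <linux/err.h>

static int my_pool_init(const struct page_pool_params *pp,
			struct page_pool **out)
{
	struct page_pool *pool = page_pool_create(pp);

	if (IS_ERR(pool))
		return PTR_ERR(pool);
	*out = pool;
	return 0;
}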
@@ -626,7 +641,21 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
 }
 EXPORT_SYMBOL(page_pool_put_defragged_page);
 
-/* Caller must not use data area after call, as this function overwrites it */
+/**
+ * page_pool_put_page_bulk() - release references on multiple pages
+ * @pool:	pool from which pages were allocated
+ * @data:	array holding page pointers
+ * @count:	number of pages in @data
+ *
+ * Tries to refill a number of pages into the ptr_ring cache holding ptr_ring
+ * producer lock. If the ptr_ring is full, page_pool_put_page_bulk()
+ * will release leftover pages to the page allocator.
+ * page_pool_put_page_bulk() is suitable to be run inside the driver NAPI tx
+ * completion loop for the XDP_REDIRECT use case.
+ *
+ * Please note the caller must not use data area after running
+ * page_pool_put_page_bulk(), as this function overwrites it.
+ */
 void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 			     int count)
 {
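A minimal sketch of the XDP_REDIRECT completion pattern the kdoc describes: batch completed pages into a scratch array and bulk-free them. The batch size and helper are assumptions; per the warning above, the scratch array must not be read again afterwards.

#define MY_BULK_MAX 16			/* assumed batch size */

static void my_xdp_tx_clean(struct page_pool *pool,
			    struct page **done, int n_done)
{
	void *bulk[MY_BULK_MAX];	/* scratch, clobbered by the call */
	int i, n = 0;

	for (i = 0; i < n_done; i++) {
		bulk[n++] = done[i];
		if (n == MY_BULK_MAX) {
			page_pool_put_page_bulk(pool, bulk, n);
			n = 0;
		}
	}
	if (n)
		page_pool_put_page_bulk(pool, bulk, n);
}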