linux/mm/zpdesc.h

/* SPDX-License-Identifier: GPL-2.0 */
/* zpdesc.h: zswap.zpool memory descriptor
 *
 * Written by Alex Shi <alexs@kernel.org>
 *	Hyeonggon Yoo <42.hyeyoo@gmail.com>
 */
#ifndef __MM_ZPDESC_H__
#define __MM_ZPDESC_H__

#include <linux/migrate.h>
#include <linux/pagemap.h>

/*
 * struct zpdesc -	Memory descriptor for zpool memory.
 * @flags:		Page flags, mostly unused by zsmalloc.
 * @lru:		Indirectly used by page migration.
 * @movable_ops:	Used by page migration.
 * @next:		Next zpdesc in a zspage in zsmalloc zpool.
 * @handle:		For huge zspage in zsmalloc zpool.
 * @zspage:		Points to the zspage this zpdesc is a part of.
 * @first_obj_offset:	First object offset in zsmalloc zpool.
 * @_refcount:		The number of references to this zpdesc.
 *
 * This struct overlays struct page for now. Do not modify without a good
 * understanding of the issues. In particular, do not expand into the overlap
 * with memcg_data.
 *
 * Page flags used:
 * * PG_private identifies the first component page.
 * * PG_locked is used by page migration code.
 */
struct zpdesc {
	unsigned long flags;
	struct list_head lru;
	unsigned long movable_ops;
	union {
		struct zpdesc *next;
		unsigned long handle;
	};
	struct zspage *zspage;
	/*
	 * Only the lower 24 bits are available for offset, limiting a page
	 * to 16 MiB. The upper 8 bits are reserved for PGTY_zsmalloc.
	 *
	 * Do not access this field directly.
	 * Instead, use {get,set}_first_obj_offset() helpers.
	 */
	unsigned int first_obj_offset;
	atomic_t _refcount;
};

#define ZPDESC_MATCH(pg, zp) \
	static_assert(offsetof(struct page, pg) == offsetof(struct zpdesc, zp))

ZPDESC_MATCH(flags, flags);
ZPDESC_MATCH(lru, lru);
ZPDESC_MATCH(mapping, movable_ops);
ZPDESC_MATCH(__folio_index, next);
ZPDESC_MATCH(__folio_index, handle);
ZPDESC_MATCH(private, zspage);
ZPDESC_MATCH(page_type, first_obj_offset);
ZPDESC_MATCH(_refcount, _refcount);
#undef ZPDESC_MATCH
static_assert(sizeof(struct zpdesc) <= sizeof(struct page));
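
/*
 * A minimal sketch (not part of this header) of how the
 * {get,set}_first_obj_offset() helpers mentioned above can honour the
 * 24-bit split: the low 24 bits carry the offset, the top 8 bits belong
 * to PGTY_zsmalloc. The mask name and exact bodies are assumptions for
 * illustration; the real helpers live in mm/zsmalloc.c.
 *
 *	#define FIRST_OBJ_OFFSET_MASK	0xffffffU
 *
 *	static unsigned int get_first_obj_offset(struct zpdesc *zpdesc)
 *	{
 *		return zpdesc->first_obj_offset & FIRST_OBJ_OFFSET_MASK;
 *	}
 *
 *	static void set_first_obj_offset(struct zpdesc *zpdesc,
 *					 unsigned int offset)
 *	{
 *		// Preserve the page-type byte, replace only the offset bits.
 *		zpdesc->first_obj_offset &= ~FIRST_OBJ_OFFSET_MASK;
 *		zpdesc->first_obj_offset |= offset & FIRST_OBJ_OFFSET_MASK;
 *	}
 */
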
/**
 * zpdesc_page - The first struct page allocated for a zpdesc
 * @zp: The zpdesc.
 *
 * A convenience wrapper for converting zpdesc to the first struct page of the
 * underlying folio, to communicate with code not yet converted to folio or
 * struct zpdesc.
 */
#define zpdesc_page(zp)			(_Generic((zp),			\
	const struct zpdesc *:		(const struct page *)(zp),	\
	struct zpdesc *:		(struct page *)(zp)))

/**
 * zpdesc_folio - The folio allocated for a zpdesc
 * @zp: The zpdesc.
 *
 * Zpdescs are descriptors for zpool memory. The zpool memory itself is
 * allocated as folios that contain the zpool objects, and zpdesc uses specific
 * fields in the first struct page of the folio - those fields are now accessed
 * by struct zpdesc.
 *
 * It is occasionally necessary to convert back to a folio in order to
 * communicate with the rest of the mm. Please use this helper function
 * instead of casting yourself, as the implementation may change in the future.
 */
#define zpdesc_folio(zp)		(_Generic((zp),			\
	const struct zpdesc *:		(const struct folio *)(zp),	\
	struct zpdesc *:		(struct folio *)(zp)))

/**
 * page_zpdesc - Converts from first struct page to zpdesc.
 * @p: The first (either head of compound or single) page of zpdesc.
 *
 * A temporary wrapper to convert struct page to struct zpdesc in situations
 * where we know the page is the compound head, or single order-0 page.
 *
 * Long-term ideally everything would work with struct zpdesc directly or go
 * through folio to struct zpdesc.
 *
 * Return: The zpdesc which contains this page.
 */
#define page_zpdesc(p)			(_Generic((p),			\
	const struct page *:		(const struct zpdesc *)(p),	\
	struct page *:			(struct zpdesc *)(p)))
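
/*
 * Illustrative usage (a sketch, not taken from this header): all three
 * converters above are pure casts, so a conversion round-trip is free and
 * returns the original pointer. 'gfp' and the freshly allocated page are
 * hypothetical here.
 *
 *	struct page *page = alloc_page(gfp);
 *	struct zpdesc *zpdesc = page_zpdesc(page);
 *
 *	// zpdesc_page(zpdesc) == page
 *	// zpdesc_folio(zpdesc) == page_folio(page)
 */
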
static inline void zpdesc_lock(struct zpdesc *zpdesc)
{
	folio_lock(zpdesc_folio(zpdesc));
}

static inline bool zpdesc_trylock(struct zpdesc *zpdesc)
{
	return folio_trylock(zpdesc_folio(zpdesc));
}

static inline void zpdesc_unlock(struct zpdesc *zpdesc)
{
	folio_unlock(zpdesc_folio(zpdesc));
}

static inline void zpdesc_wait_locked(struct zpdesc *zpdesc)
{
	folio_wait_locked(zpdesc_folio(zpdesc));
}

static inline void zpdesc_get(struct zpdesc *zpdesc)
{
	folio_get(zpdesc_folio(zpdesc));
}

static inline void zpdesc_put(struct zpdesc *zpdesc)
{
	folio_put(zpdesc_folio(zpdesc));
}

static inline void *kmap_local_zpdesc(struct zpdesc *zpdesc)
{
	return kmap_local_page(zpdesc_page(zpdesc));
}

static inline unsigned long zpdesc_pfn(struct zpdesc *zpdesc)
{
	return page_to_pfn(zpdesc_page(zpdesc));
}

static inline struct zpdesc *pfn_zpdesc(unsigned long pfn)
{
	return page_zpdesc(pfn_to_page(pfn));
}
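
/*
 * A consequence of the definitions above (illustrative sketch only):
 * zpdesc_pfn() and pfn_zpdesc() are inverses for a zpdesc backing an
 * order-0 page, since both go through the struct page of the same pfn.
 *
 *	unsigned long pfn = zpdesc_pfn(zpdesc);
 *	VM_WARN_ON(pfn_zpdesc(pfn) != zpdesc);
 */
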
static inline void __zpdesc_set_movable(struct zpdesc *zpdesc)
{
mm: convert "movable" flag in page->mapping to a page flag Instead, let's use a page flag. As the page flag can result in false-positives, glue it to the page types for which we support/implement movable_ops page migration. We are reusing PG_uptodate, that is for example used to track file system state and does not apply to movable_ops pages. So warning in case it is set in page_has_movable_ops() on other page types could result in false-positive warnings. Likely we could set the bit using a non-atomic update: in contrast to page->mapping, we could have others trying to update the flags concurrently when trying to lock the folio. In isolate_movable_ops_page(), we already take care of that by checking if the page has movable_ops before locking it. Let's start with the atomic variant, we could later switch to the non-atomic variant once we are sure other cases are similarly fine. Once we perform the switch, we'll have to introduce __SETPAGEFLAG_NOOP(). [david@redhat.com: add missing `:' in kerneldoc] Link: https://lkml.kernel.org/r/d96e2916-2c43-462c-b6a1-2375ef397d8b@redhat.com Link: https://lkml.kernel.org/r/20250704102524.326966-21-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brendan Jackman <jackmanb@google.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pé rez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gregory Price <gourry@gourry.net> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-04 12:25:14 +02:00
	SetPageMovableOps(zpdesc_page(zpdesc));
}

static inline void __zpdesc_set_zsmalloc(struct zpdesc *zpdesc)
{
	__SetPageZsmalloc(zpdesc_page(zpdesc));
}

static inline struct zone *zpdesc_zone(struct zpdesc *zpdesc)
{
	return page_zone(zpdesc_page(zpdesc));
}

static inline bool zpdesc_is_locked(struct zpdesc *zpdesc)
{
	return folio_test_locked(zpdesc_folio(zpdesc));
}

#endif