linux/fs
Johannes Weiner 91cdcd8d62 mm: zswap: optimize zswap pool size tracking
Profiling the munmap() of a zswapped memory region shows 60% of the total
cycles currently going into updating the zswap_pool_total_size.

There are three consumers of this counter:
- store, to enforce the globally configured pool limit
- meminfo & debugfs, to report the size to the user
- shrink, to determine the batch size for each cycle

Instead of aggregating every time an entry enters or exits the zswap
pool, aggregate the value from the zpools on-demand:

- Stores aggregate the counter anyway upon success. Aggregating to
  check the limit instead is the same amount of work.

- Meminfo & debugfs might benefit somewhat from a pre-aggregated
  counter, but aren't exactly hotpaths.

- Shrinking can aggregate once for every cycle instead of doing it for
  every freed entry. As the shrinker might work on tens or hundreds of
  objects per scan cycle, this is a large reduction in aggregations.
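To make the contrast concrete, here is a minimal userspace C sketch of on-demand aggregation. All names (`struct toy_pool`, `struct toy_zswap`, `toy_total_size()`, `toy_store()`, `toy_free()`, `toy_shrink_cycle()`) are hypothetical and are not the kernel's zswap or zpool API. The `aggregations` field exists only to count how often the per-pool sum is paid. Before this change, the sum over all pools was recomputed every time an entry entered or left the pool. In the sketch, only stores and each shrink cycle sum the pools, while frees adjust a single pool and never aggregate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a zpool: each backend pool tracks its own size. */
struct toy_pool {
	uint64_t bytes;
};

#define TOY_NR_POOLS 4

struct toy_zswap {
	struct toy_pool pools[TOY_NR_POOLS];
	uint64_t max_bytes;	/* globally configured pool limit */
	uint64_t aggregations;	/* illustration only: number of sums taken */
};

/* On-demand aggregation: sum the per-pool sizes only when a consumer asks. */
static uint64_t toy_total_size(struct toy_zswap *z)
{
	uint64_t total = 0;

	z->aggregations++;
	for (size_t i = 0; i < TOY_NR_POOLS; i++)
		total += z->pools[i].bytes;
	return total;
}

/* Store: aggregate once to enforce the limit, then charge a single pool. */
static bool toy_store(struct toy_zswap *z, size_t pool, uint64_t len)
{
	assert(pool < TOY_NR_POOLS);
	if (toy_total_size(z) + len > z->max_bytes)
		return false;	/* over the limit: reject the store */
	z->pools[pool].bytes += len;
	return true;
}

/* Free (swapin, swapoff, munmap): adjust one pool, no aggregation at all. */
static void toy_free(struct toy_zswap *z, size_t pool, uint64_t len)
{
	assert(pool < TOY_NR_POOLS && z->pools[pool].bytes >= len);
	z->pools[pool].bytes -= len;
}

/*
 * Shrink: aggregate once per cycle, then free entries from one pool until
 * the snapshot drops to the target. Individual frees do not re-aggregate.
 */
static unsigned toy_shrink_cycle(struct toy_zswap *z, size_t pool,
				 uint64_t entry_len, uint64_t target)
{
	uint64_t total = toy_total_size(z);	/* one sum for the whole cycle */
	unsigned freed = 0;

	assert(entry_len > 0);
	while (total > target && z->pools[pool].bytes >= entry_len) {
		toy_free(z, pool, entry_len);
		total -= entry_len;
		freed++;
	}
	return freed;
}
```

In this model, any number of `toy_free()` calls during an unmap leave `aggregations` unchanged; the next store or shrink cycle pays for exactly one sum.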

The paths that benefit dramatically are swapin, swapoff, and unmaps. 
There could be millions of pages being processed until somebody asks for
the pool size again.  This eliminates the pool size updates from those
paths entirely.

Top profile entries for a 24G range munmap(), before:

    38.54%  zswap-unmap  [kernel.kallsyms]  [k] zs_zpool_total_size
    12.51%  zswap-unmap  [kernel.kallsyms]  [k] zpool_get_total_size
     9.10%  zswap-unmap  [kernel.kallsyms]  [k] zswap_update_total_size
     2.95%  zswap-unmap  [kernel.kallsyms]  [k] obj_cgroup_uncharge_zswap
     2.88%  zswap-unmap  [kernel.kallsyms]  [k] __slab_free
     2.86%  zswap-unmap  [kernel.kallsyms]  [k] xas_store

and after:

     7.70%  zswap-unmap  [kernel.kallsyms]  [k] __slab_free
     7.16%  zswap-unmap  [kernel.kallsyms]  [k] obj_cgroup_uncharge_zswap
     6.74%  zswap-unmap  [kernel.kallsyms]  [k] xas_store

Moving to a single atomic in zswap that is updated by the backends was
also briefly considered, since zswap only cares about the sum of all
pools anyway. However, zram directly needs per-pool information out of
zsmalloc. To keep the backend from having to update two atomics every
time, I opted for the lazy aggregation instead for now.
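The trade-off against the rejected alternative can be sketched the same way, again with hypothetical names (`struct toy_backend_pool`, `toy_global_bytes`, and the `toy_*` functions) and C11 atomics standing in for the kernel's own primitives. A single global atomic makes reading the total cheap, but because a direct user such as zram still needs the per-pool figure, every backend update then touches two shared counters. The lazy scheme keeps one counter per update and moves the cost to readers, which only run when somebody asks for the pool size.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend pool; a direct user (zram-style) reads this size. */
struct toy_backend_pool {
	atomic_uint_least64_t bytes;
};

/* Rejected alternative: one global total that the backends also maintain. */
static atomic_uint_least64_t toy_global_bytes;

/* Eager global sum: each allocation updates two shared atomics. */
static void toy_alloc_two_atomics(struct toy_backend_pool *p, uint64_t len)
{
	atomic_fetch_add_explicit(&p->bytes, len, memory_order_relaxed);
	atomic_fetch_add_explicit(&toy_global_bytes, len, memory_order_relaxed);
}

/* Reading the eager total is a single load: cheap to read, costly to keep. */
static uint64_t toy_global_total(void)
{
	return atomic_load_explicit(&toy_global_bytes, memory_order_relaxed);
}

/* Lazy aggregation: each allocation updates only the per-pool counter. */
static void toy_alloc_one_atomic(struct toy_backend_pool *p, uint64_t len)
{
	atomic_fetch_add_explicit(&p->bytes, len, memory_order_relaxed);
}

/* zswap-side total, summed from the pools only when a consumer asks. */
static uint64_t toy_sum_pools(struct toy_backend_pool *pools, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += atomic_load_explicit(&pools[i].bytes,
					      memory_order_relaxed);
	return total;
}
```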

Link: https://lkml.kernel.org/r/20240312153901.3441-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:55:47 -07:00
9p fs/9p: fix uninitialized values during inode evict 2024-03-25 14:16:06 +00:00
adfs
affs
afs afs: Fix occasional rmdir-then-VNOVNODE with generic/011 2024-03-14 12:13:21 +01:00
autofs
bcachefs bcachefs fixes for v6.9-rc4 2024-04-11 11:24:55 -07:00
befs
bfs
btrfs for-6.9-rc2-tag 2024-04-08 13:11:11 -07:00
cachefiles
ceph ceph: switch to use cap_delay_lock for the unlink delay list 2024-04-11 22:56:28 +02:00
coda
configfs
cramfs fs,block: yield devices early 2024-03-27 13:17:15 +01:00
crypto
debugfs
devpts
dlm dlm for 6.9 2024-03-18 15:39:48 -07:00
ecryptfs
efivarfs
efs
erofs erofs: drop experimental warning for FSDAX 2024-03-25 10:48:15 +08:00
exfat Description for this pull request: 2024-03-21 09:47:12 -07:00
exportfs
ext2
ext4 fs,block: yield devices early 2024-03-27 13:17:15 +01:00
f2fs fs,block: yield devices early 2024-03-27 13:17:15 +01:00
fat - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min 2024-03-14 18:03:09 -07:00
freevxfs
fuse fuse update for 6.9 2024-03-15 09:47:14 -07:00
gfs2 gfs2 fix 2024-03-25 10:53:39 -07:00
hfs
hfsplus
hostfs
hpfs
hugetlbfs
iomap
isofs
jbd2
jffs2
jfs fs,block: yield devices early 2024-03-27 13:17:15 +01:00
kernfs kernfs: annotate different lockdep class for of->mutex of writable files 2024-04-14 06:55:46 -04:00
lockd
minix
netfs fscache: Fix error handling in fscache_begin_operation() 2024-03-18 10:33:48 +01:00
nfs NFS client updates for Linux 6.9 2024-03-16 11:44:00 -07:00
nfs_common
nfsd nfsd-6.9 fixes: 2024-04-06 09:37:50 -07:00
nilfs2 nilfs2: fix OOB in nilfs_set_de_type 2024-04-16 15:39:52 -07:00
nls
notify
ntfs3
ocfs2 - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min 2024-03-14 18:03:09 -07:00
omfs
openpromfs
orangefs
overlayfs ovl: relax WARN_ON in ovl_verify_area() 2024-03-17 15:59:41 +02:00
proc mm: zswap: optimize zswap pool size tracking 2024-04-25 20:55:47 -07:00
pstore
qnx4
qnx6
quota
ramfs
reiserfs fs,block: yield devices early 2024-03-27 13:17:15 +01:00
romfs fs,block: yield devices early 2024-03-27 13:17:15 +01:00
smb smb3: fix broken reconnect when password changing on the server by allowing password rotation 2024-04-11 16:03:48 -05:00
squashfs Squashfs: check the inode number is not the invalid value of zero 2024-04-16 15:39:50 -07:00
sysfs
sysv
tracefs eventfs: Fix kernel-doc comments to functions 2024-04-11 17:42:09 -04:00
ubifs This pull request contains updates for UBI and UBIFS: 2024-03-21 15:09:29 -07:00
udf
ufs
unicode
vboxsf vboxsf: explicitly deny setlease attempts 2024-04-03 16:06:39 +02:00
verity
xfs Bug fixes for 6.9-rc3: 2024-04-06 09:14:18 -07:00
zonefs zonefs: Use str_plural() to fix Coccinelle warning 2024-04-10 07:23:47 +09:00
aio.c aio: Fix null ptr deref in aio_complete() wakeup 2024-04-05 11:20:28 +02:00
anon_inodes.c
attr.c
backing-file.c
bad_inode.c
binfmt_elf_fdpic.c binfmt: replace deprecated strncpy 2024-03-21 20:20:52 -07:00
binfmt_elf_test.c
binfmt_elf.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
buffer.c
char_dev.c
compat_binfmt_elf.c
coredump.c
d_path.c
dax.c
dcache.c
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c
exec.c execve fixes for v6.9-rc2 2024-03-27 09:57:30 -07:00
fcntl.c
fhandle.c
file_table.c
file.c
filesystems.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c
fsopen.c
init.c
inode.c bcachefs updates for 6.9 2024-03-15 09:00:09 -07:00
internal.h
ioctl.c
Kconfig - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
Kconfig.binfmt
kernel_read_file.c
libfs.c
locks.c
Makefile
mbcache.c
mnt_idmapping.c
mount.h
mpage.c
namei.c security: Place security_path_post_mknod() where the original IMA call was 2024-04-03 10:21:32 -07:00
namespace.c
nsfs.c
open.c
pidfs.c
pipe.c
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c
readdir.c
remap_range.c
select.c
seq_file.c
signalfd.c
splice.c
stack.c
stat.c
statfs.c
super.c fs,block: yield devices early 2024-03-27 13:17:15 +01:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c
utimes.c
xattr.c