linux/Documentation
Barry Song e7ac4daeed mm: count zeromap read and set for swapout and swapin
When the proportion of folios from the zeromap is small, missing their
accounting may not significantly impact profiling.  However, it's easy to
construct a scenario where this becomes an issue—for example, allocating
1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT,
and then swapping it back in.  In this case, the swap-out and swap-in
counts seem to vanish into a black hole, potentially causing semantic
ambiguity.

On the other hand, Usama reported that zero-filled pages can exceed 10% in
workloads utilizing zswap, while Hailong noted that some app in Android
have more than 6% zero-filled pages.  Before commit 0ca0c24e32 ("mm:
store zero pages to be swapped out in a bitmap"), both zswap and zRAM
implemented similar optimizations, leading to these optimized-out pages
being counted in either zswap or zRAM counters (with pswpin/pswpout also
increasing for zRAM).  With zeromap functioning prior to both zswap and
zRAM, userspace will no longer detect these swap-out and swap-in actions.

We have three ways to address this:

1. Introduce a dedicated counter specifically for the zeromap.

2. Use pswpin/pswpout accounting, treating the zero map as a standard
   backend.  This approach aligns with zRAM's current handling of
   same-page fills at the device level.  However, it would mean losing the
   optimized-out page counters previously available in zRAM and would not
   align with systems using zswap.  Additionally, as noted by Nhat Pham,
   pswpin/pswpout counters apply only to I/O done directly to the backend
   device.

3. Count zeromap pages under zswap, aligning with system behavior when
   zswap is enabled.  However, this would not be consistent with zRAM, nor
   would it align with systems lacking both zswap and zRAM.

Given the complications with options 2 and 3, this patch selects
option 1.

We can find these counters from /proc/vmstat (counters for the whole
system) and memcg's memory.stat (counters for the interested memcg).

For example:

$ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
swpin_zero 1648
swpout_zero 33536

$ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
swpin_zero 3905
swpout_zero 3985

This patch does not address any specific zeromap bug, but the missing
swpout and swpin counts for zero-filled pages can be highly confusing and
may mislead user-space agents that rely on changes in these counters as
indicators.  Therefore, we add a Fixes tag to encourage the inclusion of
this counter in any kernel versions with zeromap.

Many thanks to Kanchana for the contribution of changing
count_objcg_event() to count_objcg_events() to support large folios[1],
which has now been incorporated into this patch.

[1] https://lkml.kernel.org/r/20241001053222.6944-5-kanchana.p.sridhar@intel.com

Link: https://lkml.kernel.org/r/20241107011246.59137-1-21cnbao@gmail.com
Fixes: 0ca0c24e32 ("mm: store zero pages to be swapped out in a bitmap")
Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Hailong Liu <hailong.liu@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 00:00:37 -08:00
..
ABI
accel
accounting
admin-guide mm: count zeromap read and set for swapout and swapin 2024-11-11 00:00:37 -08:00
arch
block
bpf
cdrom
core-api arm64 fixes for -rc4 2024-10-17 09:51:03 -07:00
cpu-freq
crypto
dev-tools
devicetree phy fixes for 6.12 2024-11-03 10:19:34 -10:00
doc-guide
driver-api
fault-injection
fb
features
filesystems vfs-6.12-rc6.fixes 2024-11-01 07:37:10 -10:00
firmware_class
firmware-guide
fpga
gpu
hid
hwmon
i2c
iio docs: iio: ad7380: fix supply for ad7380-4 2024-10-24 18:30:47 +01:00
images
infiniband
input
isdn
kbuild
kernel-hacking
leds
litmus-tests
livepatch
locking
maintainer
mhi
misc-devices
mm
netlabel
netlink
networking docs: networking: packet_mmap: replace dead links with archive.org links 2024-10-28 15:47:10 -07:00
nvdimm
nvme
PCI
pcmcia
peci
power
process soc: fixes for 6.12 2024-10-17 09:43:36 -07:00
RCU
rust RISC-V: disallow gcc + rust builds 2024-10-25 06:18:37 -07:00
scheduler
scsi
security
sound
sphinx
sphinx-static
spi
staging
target
tee
timers
tools
trace
translations
usb
userspace-api mseal: update mseal.rst 2024-10-28 21:40:41 -07:00
virt KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL 2024-10-20 07:31:05 -04:00
w1
watchdog
wmi
.gitignore
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py
docutils.conf
dontdiff
index.rst
Kconfig
Makefile
memory-barriers.txt
SubmittingPatches
subsystem-apis.rst