linux/Documentation
Kaiyang Zhao f77f0c7514 mm,memcg: provide per-cgroup counters for NUMA balancing operations
The ability to observe the demotion and promotion decisions made by the
kernel on a per-cgroup basis is important for monitoring and tuning
containerized workloads on machines equipped with tiered memory.

Different containers in the system may experience drastically different
memory tiering actions that cannot be distinguished from the global
counters alone.

For example, a container running a workload with much hotter memory
accesses will likely see more promotions and fewer demotions, potentially
depriving a colocated container of top tier memory to such an extent that
its performance degrades unacceptably.

For another example, some containers may exhibit longer periods between
data reuse, causing many more numa_hint_faults than numa_pages_migrated.
In this case, tuning hot_threshold_ms may be appropriate, but the signal
can easily be lost if only global counters are available.
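
As a rough illustration of the signal described above, the following
sketch compares two snapshots of a cgroup's counters and reports how many
hint faults occurred per migrated page over the interval. The helper name
faults_per_migration and the dict-based snapshot format are assumptions
made for this example, not part of the patch, and no particular ratio is
implied to be a tuning threshold.

```python
def faults_per_migration(before, after):
    """Return NUMA hint faults per migrated page between two snapshots.

    `before` and `after` are dicts holding cumulative counter values,
    keyed by the counter names used in memory.stat (hypothetical format
    chosen for this sketch). Returns None when no pages were migrated
    in the interval, since the ratio is undefined in that case.
    """
    faults = after["numa_hint_faults"] - before["numa_hint_faults"]
    migrated = after["numa_pages_migrated"] - before["numa_pages_migrated"]
    if migrated <= 0:
        return None
    return faults / migrated


if __name__ == "__main__":
    # Made-up sample values, purely to show the call shape.
    t0 = {"numa_hint_faults": 100, "numa_pages_migrated": 10}
    t1 = {"numa_hint_faults": 1100, "numa_pages_migrated": 20}
    print(faults_per_migration(t0, t1))
```

A ratio that stays high for one cgroup while remaining low for its
neighbours is the kind of per-container difference that global counters
would average away.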

In the long term, we hope to introduce per-cgroup control of promotion and
demotion actions to implement memory placement policies in tiering.

This patch set adds seven counters to memory.stat in a cgroup:
numa_pages_migrated, numa_pte_updates, numa_hint_faults, pgdemote_kswapd,
pgdemote_khugepaged, pgdemote_direct and pgpromote_success.  pgdemote_*
and pgpromote_success are also available in memory.numa_stat.
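
A minimal userspace sketch of reading these counters follows. It assumes
memory.stat uses its usual "name value" line format and that the cgroup
directory is passed in by the caller; the function names
parse_numa_counters and read_numa_counters are hypothetical and exist only
for this example.

```python
# The seven counter names listed in the commit message above.
NUMA_COUNTERS = (
    "numa_pages_migrated",
    "numa_pte_updates",
    "numa_hint_faults",
    "pgdemote_kswapd",
    "pgdemote_khugepaged",
    "pgdemote_direct",
    "pgpromote_success",
)


def parse_numa_counters(stat_text):
    """Extract the NUMA balancing counters from memory.stat contents.

    Lines that are not "name value" pairs, or whose name is not one of
    the seven counters, are ignored. Counters absent from the text are
    simply missing from the returned dict.
    """
    found = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in NUMA_COUNTERS:
            found[parts[0]] = int(parts[1])
    return found


def read_numa_counters(cgroup_dir):
    """Read memory.stat from a cgroup directory (path supplied by caller)."""
    with open(f"{cgroup_dir}/memory.stat") as f:
        return parse_numa_counters(f.read())


if __name__ == "__main__":
    # Made-up sample contents, used instead of a real cgroup file.
    sample = "anon 4096\nnuma_pages_migrated 12\nnuma_hint_faults 340\n"
    print(parse_numa_counters(sample))
```

Keeping the parsing separate from the file access makes the sketch usable
on saved snapshots as well as on a live cgroup.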

count_memcg_events_mm() is added to count multiple event occurrences at
once, and get_mem_cgroup_from_folio() is added because we need to get a
reference to the memcg of a folio before it's migrated to track
numa_pages_migrated.  The accounting of PGDEMOTE_* is moved to
shrink_inactive_list() before being changed to per-cgroup.

[kaiyang2@cs.cmu.edu: add documentation of the memcg counters in cgroup-v2.rst]
  Link: https://lkml.kernel.org/r/20240814235122.252309-1-kaiyang2@cs.cmu.edu
Link: https://lkml.kernel.org/r/20240814174227.30639-1-kaiyang2@cs.cmu.edu
Signed-off-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Xu <weixugc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:36 -07:00
ABI powerpc fixes for 6.11 #2 2024-08-17 19:23:02 -07:00
accel
accounting
admin-guide mm,memcg: provide per-cgroup counters for NUMA balancing operations 2024-09-03 21:15:36 -07:00
arch docs: move numa=fake description to kernel-parameters.txt 2024-09-03 21:15:32 -07:00
block
bpf
cdrom
core-api workqueue: doc: Fix function name, remove markers 2024-08-05 18:33:36 -10:00
cpu-freq
crypto
dev-tools kfence: introduce burst mode 2024-09-01 20:26:03 -07:00
devicetree USB fixes for 6.11-rc6 2024-09-01 07:06:28 +12:00
doc-guide
driver-api thermal: core: Update thermal zone registration documentation 2024-08-02 13:22:37 +02:00
fault-injection
fb
features LoongArch: Add ARCH_HAS_DEBUG_VM_PGTABLE support 2024-07-20 22:40:59 +08:00
filesystems fs: remove calls to set and clear the folio error flag 2024-09-01 20:26:04 -07:00
firmware_class
firmware-guide
fpga
gpu
hid
hwmon
i2c
iio
images
infiniband
input
isdn
kbuild Documentation/llvm: turn make command for ccache into code block 2024-08-16 21:34:12 +09:00
kernel-hacking
leds
litmus-tests
livepatch
locking
maintainer
mhi
misc-devices
mm mm: remove follow_page() 2024-09-01 20:26:01 -07:00
netlabel
netlink ethtool: rss: echo the context number back 2024-07-25 16:23:47 -07:00
networking ethtool: rss: echo the context number back 2024-07-25 16:23:47 -07:00
nvdimm
nvme
PCI Merge branch 'pci/misc' 2024-07-19 10:10:33 -05:00
pcmcia
peci
power
process net: drop special comment style 2024-08-23 10:21:02 +01:00
RCU
rust Rust changes for v6.11 2024-07-27 13:44:54 -07:00
scheduler
scsi
security
sound
sphinx
sphinx-static
spi
staging
target
tee
timers
tools
trace ftrace: Rewrite of function graph tracer 2024-07-18 13:36:33 -07:00
translations pci-v6.11-changes 2024-07-19 19:03:18 -07:00
usb
userspace-api media: v4l: Fix missing tabular column hint for Y14P format 2024-07-30 08:36:29 +02:00
virt KVM/arm64 fixes for 6.11, round #1 2024-08-13 06:06:27 -04:00
w1
watchdog
wmi platform/x86: msi-wmi-platform: Fix spelling mistakes 2024-07-31 12:37:01 +03:00
.gitignore
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py
docutils.conf
dontdiff
index.rst
Kconfig
Makefile
memory-barriers.txt
SubmittingPatches
subsystem-apis.rst