linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 04:38:03 +00:00

Baolin Wang d2136d749d mm: support multi-size THP numa balancing

Now the anonymous page allocation already supports multi-size THP (mTHP), but the numa balancing still prohibits mTHP migration even though it is an exclusive mapping, which is unreasonable.

Allow scanning mTHP:
Commit `859d4adc34` ("mm: numa: do not trap faults on shared data section pages") skips shared CoW pages' NUMA page migration to avoid shared data segment migration. In addition, commit `80d47f5de5` ("mm: don't try to NUMA-migrate COW pages that have other uses") change to use page_count() to avoid GUP pages migration, that will also skip the mTHP numa scanning. Theoretically, we can use folio_maybe_dma_pinned() to detect the GUP issue, although there is still a GUP race, the issue seems to have been resolved by commit `80d47f5de5`. Meanwhile, use the folio_likely_mapped_shared() to skip shared CoW pages though this is not a precise sharers count. To check if the folio is shared, ideally we want to make sure every page is mapped to the same process, but doing that seems expensive and using the estimated mapcount seems can work when running autonuma benchmark.

Allow migrating mTHP:
As mentioned in the previous thread[1], large folios (including THP) are more susceptible to false sharing issues among threads than 4K base page, leading to pages ping-pong back and forth during numa balancing, which is currently not easy to resolve. Therefore, as a start to support mTHP numa balancing, we can follow the PMD mapped THP's strategy, that means we can reuse the 2-stage filter in should_numa_migrate_memory() to check if the mTHP is being heavily contended among threads (through checking the CPU id and pid of the last access) to avoid false sharing at some degree. Thus, we can restore all PTE maps upon the first hint page fault of a large folio to follow the PMD mapped THP's strategy. In the future, we can continue to optimize the NUMA balancing algorithm to avoid the false sharing issue with large folios as much as possible.

Performance data:
Machine environment: 2 nodes, 128 cores Intel(R) Xeon(R) Platinum
Base: 2024-03-25 mm-unstable branch
Enable mTHP to run autonuma-benchmark

mTHP:16K
                        Base      Patched
numa01                  224.70    143.48
numa01_THREAD_ALLOC     118.05    47.43
numa02                  13.45     9.29
numa02_SMT              14.80     7.50

mTHP:64K
                        Base      Patched
numa01                  216.15    114.40
numa01_THREAD_ALLOC     115.35    47.41
numa02                  13.24     9.25
numa02_SMT              14.67     7.34

mTHP:128K
                        Base      Patched
numa01                  205.13    144.45
numa01_THREAD_ALLOC     112.93    41.88
numa02                  13.16     9.18
numa02_SMT              14.81     7.49

[1] https://lore.kernel.org/all/20231117100745.fnpijbk4xgmals3k@techsingularity.net/

[baolin.wang@linux.alibaba.com: v3]
Link: https://lkml.kernel.org/r/c33a5c0b0a0323b1f8ed53772f50501f4b196e25.1712132950.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/d28d276d599c26df7f38c9de8446f60e22dd1950.1711683069.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2024-04-25 20:56:30 -07:00
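
The 2-stage filter that the commit message above says is reused for mTHP can be illustrated with a small user-space sketch. This is not the kernel's code: `struct folio_sketch`, `struct last_access`, and `two_stage_filter()` are hypothetical names invented for illustration, and the real should_numa_migrate_memory() performs additional checks that are not modeled here. The sketch only shows the core idea, assuming that a folio is allowed to migrate when two consecutive hinting faults come from the same destination node, which damps ping-pong between nodes under false sharing.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the "last accessor" record kept per folio. */
struct last_access {
    bool valid; /* false until the first hinting fault has been recorded */
    int nid;    /* NUMA node of the CPU that took the last hinting fault */
    int pid;    /* task that took it (recorded only; unused by this sketch) */
};

/* Hypothetical, heavily simplified folio: current node plus last accessor. */
struct folio_sketch {
    struct last_access last;
    int nid; /* node the folio currently resides on */
};

/*
 * Return true if the folio may migrate to dst_nid.
 * Every call records the current accessor, then compares the
 * destination node with the previously recorded accessor's node.
 */
bool two_stage_filter(struct folio_sketch *f, int dst_nid, int pid)
{
    struct last_access prev = f->last;

    f->last = (struct last_access){ .valid = true, .nid = dst_nid, .pid = pid };

    if (f->nid == dst_nid)
        return false; /* already local, nothing to migrate */

    if (prev.valid && prev.nid != dst_nid)
        return false; /* previous fault came from another node: contended */

    return true; /* no history yet, or two faults in a row from dst_nid */
}
```

With a folio on node 0, alternating faults from nodes 1 and 2 are rejected by this sketch, while a second consecutive fault from node 1 is accepted.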
damon
kasan	fix missing vmalloc.h includes	2024-04-25 20:55:49 -07:00
kfence	mm: introduce slabobj_ext to support slab object extensions	2024-04-25 20:55:51 -07:00
kmsan
backing-dev.c	mm: backing-dev: use group allocation/free of per-cpu counters API	2024-04-25 20:56:12 -07:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c
cma.c
cma.h
compaction.c	mm: enable page allocation tagging	2024-04-25 20:55:54 -07:00
debug_page_alloc.c	mm: page_alloc: consolidate free page accounting	2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c	fix missing vmalloc.h includes	2024-04-25 20:55:49 -07:00
debug.c	mm: switch mm->get_unmapped_area() to a flag	2024-04-25 20:56:25 -07:00
dmapool_test.c
dmapool.c
early_ioremap.c
fadvise.c
fail_page_alloc.c
failslab.c
filemap.c	mm/filemap: optimize filemap folio adding	2024-04-25 20:56:09 -07:00
folio-compat.c	mm: remove __set_page_dirty_nobuffers()	2024-04-25 20:56:25 -07:00
gup_test.c
gup_test.h
gup.c	mm/gup: handle hugetlb in the generic follow_page_mask code	2024-04-25 20:56:23 -07:00
highmem.c
hmm.c	mm/treewide: replace pXd_huge() with pXd_leaf()	2024-04-25 20:55:46 -07:00
huge_memory.c	thp: add thp_get_unmapped_area_vmflags()	2024-04-25 20:56:26 -07:00
hugetlb_cgroup.c
hugetlb_vmemmap.c
hugetlb_vmemmap.h
hugetlb.c	mm/gup: handle hugetlb in the generic follow_page_mask code	2024-04-25 20:56:23 -07:00
hwpoison-inject.c
init-mm.c
internal.h	mm: init_mlocked_on_free_v3	2024-04-25 20:56:29 -07:00
interval_tree.c
io-mapping.c
ioremap.c
Kconfig	mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES	2024-04-25 20:56:20 -07:00
Kconfig.debug
khugepaged.c
kmemleak.c	mm/kmemleak: compact kmemleak_object further	2024-04-25 20:56:05 -07:00
ksm.c
list_lru.c
maccess.c
madvise.c	mm: add pmd_folio()	2024-04-25 20:56:19 -07:00
Makefile	mm/kmemleak: disable KASAN instrumentation in kmemleak	2024-04-25 20:56:05 -07:00
mapping_dirty_helpers.c
memblock.c
memcontrol.c	mm, slab: move slab_memcg hooks to mm/memcontrol.c	2024-04-25 20:56:16 -07:00
memfd.c
memory_hotplug.c	mm: record the migration reason for struct migration_target_control	2024-04-25 20:56:06 -07:00
memory-failure.c	mm: record the migration reason for struct migration_target_control	2024-04-25 20:56:06 -07:00
memory-tiers.c
memory.c	mm: support multi-size THP numa balancing	2024-04-25 20:56:30 -07:00
mempolicy.c	mm: add pmd_folio()	2024-04-25 20:56:19 -07:00
mempool.c	mempool: hook up to memory allocation profiling	2024-04-25 20:55:56 -07:00
memremap.c
memtest.c	memtest: use {READ,WRITE}_ONCE in memory scanning	2024-03-13 12:12:21 -07:00
migrate_device.c	mm: convert migrate_vma_collect_pmd to use a folio	2024-04-25 20:56:19 -07:00
migrate.c	remove references to page->flags in documentation	2024-04-25 20:56:15 -07:00
mincore.c
mlock.c	mm: add pmd_folio()	2024-04-25 20:56:19 -07:00
mm_init.c	mm: init_mlocked_on_free_v3	2024-04-25 20:56:29 -07:00
mm_slot.h
mmap_lock.c
mmap.c	mm: take placement mappings gap into account	2024-04-25 20:56:28 -07:00
mmu_gather.c
mmu_notifier.c
mmzone.c
mprotect.c	mm: support multi-size THP numa balancing	2024-04-25 20:56:30 -07:00
mremap.c	mm: remove "prot" parameter from move_pte()	2024-04-25 20:56:24 -07:00
msync.c
nommu.c	mm: remove follow_pfn	2024-04-25 20:56:12 -07:00
oom_kill.c
page_alloc.c	mm: init_mlocked_on_free_v3	2024-04-25 20:56:29 -07:00
page_counter.c
page_ext.c	mm: make page_ext_get() take a const argument	2024-04-25 20:56:14 -07:00
page_idle.c
page_io.c	arm64: mm: swap: support THP_SWAP on hardware with MTE	2024-04-25 20:56:07 -07:00
page_isolation.c	mm: page_isolation: prepare for hygienic freelists	2024-04-25 20:56:04 -07:00
page_owner.c	mm: introduce slabobj_ext to support slab object extensions	2024-04-25 20:55:51 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c
page_vma_mapped.c
page-writeback.c
pagewalk.c
percpu-internal.h	mm: percpu: add codetag reference into pcpuobj_ext	2024-04-25 20:55:56 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c	percpu: clean up all mappings when pcpu_map_pages() fails	2024-04-25 20:55:49 -07:00
percpu.c	mm: percpu: enable per-cpu allocation tagging	2024-04-25 20:55:56 -07:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c
ptdump.c
readahead.c	mm/readahead: break read-ahead loop if filemap_add_folio return -ENOMEM	2024-04-25 20:56:07 -07:00
rmap.c	remove references to page->flags in documentation	2024-04-25 20:56:15 -07:00
rodata_test.c
secretmem.c
shmem_quota.c	tmpfs: fix race on handling dquot rbtree	2024-03-26 11:07:23 -07:00
shmem.c	mm: switch mm->get_unmapped_area() to a flag	2024-04-25 20:56:25 -07:00
show_mem.c	lib: add memory allocations report in show_mem()	2024-04-25 20:55:57 -07:00
shrinker_debug.c
shrinker.c
shuffle.c
shuffle.h
slab_common.c	mm/slab: enable slab allocation tagging for kmalloc and friends	2024-04-25 20:55:55 -07:00
slab.h	mm, slab: move slab_memcg hooks to mm/memcontrol.c	2024-04-25 20:56:16 -07:00
slub.c	mm, slab: move slab_memcg hooks to mm/memcontrol.c	2024-04-25 20:56:16 -07:00
sparse-vmemmap.c
sparse.c	mm: move array mem_section init code out of memory_present()	2024-04-25 20:56:16 -07:00
swap_cgroup.c
swap_slots.c	arm64: mm: swap: support THP_SWAP on hardware with MTE	2024-04-25 20:56:07 -07:00
swap_state.c	mm: add is_huge_zero_folio()	2024-04-25 20:56:18 -07:00
swap.c	mm: add is_huge_zero_folio()	2024-04-25 20:56:18 -07:00
swap.h
swapfile.c	arm64: mm: swap: support THP_SWAP on hardware with MTE	2024-04-25 20:56:07 -07:00
truncate.c
usercopy.c
userfaultfd.c	mm: add pmd_folio()	2024-04-25 20:56:19 -07:00
util.c	mm: switch mm->get_unmapped_area() to a flag	2024-04-25 20:56:25 -07:00
vmalloc.c	mm/vmalloc.c: optimize to reduce arguments of alloc_vmap_area()	2024-04-25 20:56:08 -07:00
vmpressure.c
vmscan.c	mm: hold PTL from the first PTE while reclaiming a large folio	2024-04-25 20:56:08 -07:00
vmstat.c
workingset.c
z3fold.c	mm: zpool: return pool size in pages	2024-04-25 20:55:48 -07:00
zbud.c	mm: zpool: return pool size in pages	2024-04-25 20:55:48 -07:00
zpool.c	mm: zpool: return pool size in pages	2024-04-25 20:55:48 -07:00
zsmalloc.c	mm: zpool: return pool size in pages	2024-04-25 20:55:48 -07:00
zswap.c	zswap: replace RB tree with xarray	2024-04-25 20:56:18 -07:00