linux/mm
Arjun Roy 7a7f094635 tcp: Use per-vma locking for receive zerocopy
Per-VMA locking allows us to lock a struct vm_area_struct without
taking the process-wide mmap lock in read mode.

Consider a workload where the mmap lock is constantly being taken in
write mode. While a writer holds the lock, all zerocopy receives
block - even though, in principle, the memory ranges being used by
TCP are not touched by the operations that need the mmap write lock.
This results in performance degradation.

Now consider another workload where the mmap lock is never taken in
write mode, but there are many TCP connections using receive zerocopy
that are concurrently receiving. These connections all take the mmap
lock in read mode, which never blocks, but still performs atomic ops
on this single process-wide lock: its cache line bounces between CPUs,
and that contention shows up as additional CPU overhead.

However, with per-vma locking, both of these problems can be avoided.

As a test, I ran an RPC-style request/response workload with 4KB
payloads and receive zerocopy enabled, with 100 simultaneous TCP
connections. I measured perf cycles within the
find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and
without per-vma locking enabled.

When using process-wide mmap semaphore read locking, about 1% of
measured perf cycles were within this path. With per-VMA locking, this
value dropped to about 0.45%.
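The find_tcp_vma path measured above follows a fast-path/fallback
pattern: try the per-VMA lock first, and only fall back to the mmap
read lock if that fails. A simplified sketch of that pattern follows
(kernel-internal code, not buildable standalone; the caller is assumed
to initialize *mmap_locked, and error handling is abbreviated):

```c
/* Sketch: lock just the VMA via lock_vma_under_rcu() when
 * CONFIG_PER_VMA_LOCK is available; otherwise, or if the per-VMA
 * lock cannot be taken, fall back to the process-wide mmap lock. */
static struct vm_area_struct *find_tcp_vma(struct mm_struct *mm,
					   unsigned long address,
					   bool *mmap_locked)
{
	struct vm_area_struct *vma = NULL;

#ifdef CONFIG_PER_VMA_LOCK
	vma = lock_vma_under_rcu(mm, address);
#endif
	if (vma) {
		*mmap_locked = false;
	} else {
		mmap_read_lock(mm);
		vma = vma_lookup(mm, address);
		*mmap_locked = true;
	}

	/* Only VMAs backed by TCP zerocopy mappings qualify. */
	if (!vma || vma->vm_ops != &tcp_vm_ops) {
		if (*mmap_locked)
			mmap_read_unlock(mm);
		else if (vma)
			vma_end_read(vma);
		return NULL;
	}
	return vma;
}
```

The caller later releases whichever lock was taken, using *mmap_locked
to pick between mmap_read_unlock() and vma_end_read().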

Signed-off-by: Arjun Roy <arjunroy@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-18 11:16:00 +01:00
damon mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp() 2023-06-12 11:31:52 -07:00
kasan kasan: hw_tags: avoid invalid virt_to_page() 2023-05-02 17:23:27 -07:00
kfence mm: kfence: fix false positives on big endian 2023-05-17 15:24:33 -07:00
kmsan
backing-dev.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c
cma.c
cma.h
compaction.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
debug_page_ref.c
debug_vm_pgtable.c
debug.c
dmapool_test.c
dmapool.c dmapool: link blocks across pages 2023-05-06 10:33:38 -07:00
early_ioremap.c
fadvise.c
failslab.c
filemap.c page cache: fix page_cache_next/prev_miss off by one 2023-06-12 11:31:52 -07:00
folio-compat.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
frontswap.c
gup_test.c mm/gup_test: fix ioctl fail for compat task 2023-06-12 11:31:51 -07:00
gup_test.h
gup.c x86-64: make access_ok() independent of LAM 2023-05-03 10:37:22 -07:00
highmem.c
hmm.c
huge_memory.c
hugetlb_cgroup.c
hugetlb_vmemmap.c
hugetlb_vmemmap.h
hugetlb.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
hwpoison-inject.c
init-mm.c IOMMU Updates for Linux 6.4 2023-04-30 13:00:38 -07:00
internal.h
interval_tree.c
io-mapping.c
ioremap.c
Kconfig - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-05-29 16:14:28 +01:00
khugepaged.c
kmemleak.c
ksm.c mm/ksm: move disabling KSM from s390/gmap code to KSM code 2023-05-02 17:21:50 -07:00
list_lru.c
maccess.c
madvise.c Add support for new Linear Address Masking CPU feature. This is similar 2023-04-28 09:43:49 -07:00
Makefile - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
mapping_dirty_helpers.c
memblock.c
memcontrol.c
memfd.c
memory_hotplug.c
memory-failure.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
memory-tiers.c
memory.c tcp: Use per-vma locking for receive zerocopy 2023-06-18 11:16:00 +01:00
mempolicy.c mm/mempolicy: correctly update prev when policy is equal on mbind 2023-05-02 17:23:27 -07:00
mempool.c
memremap.c
memtest.c
migrate_device.c
migrate.c Add support for new Linear Address Masking CPU feature. This is similar 2023-04-28 09:43:49 -07:00
mincore.c
mlock.c
mm_init.c
mm_slot.h
mmap_lock.c
mmap.c mm/mmap/vma_merge: always check invariants 2023-05-06 10:10:07 -07:00
mmu_gather.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c
oom_kill.c
page_alloc.c - Some DAMON cleanups from Kefeng Wang 2023-05-04 13:09:43 -07:00
page_counter.c
page_ext.c
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm: page_table_check: Ensure user pages are not slab pages 2023-05-29 16:14:28 +01:00
page_vma_mapped.c
page-writeback.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgalloc-track.h
pgtable-generic.c
process_vm_access.c
ptdump.c
readahead.c
rmap.c mm,unmap: avoid flushing TLB in batch if PTE is inaccessible 2023-04-27 13:42:16 -07:00
rodata_test.c
secretmem.c
shmem.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
shrinker_debug.c mm: shrinkers: fix race condition on debugfs cleanup 2023-05-17 15:24:33 -07:00
shuffle.c
shuffle.h
slab_common.c
slab.c
slab.h - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
slub.c - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
sparse-vmemmap.c
sparse.c
swap_cgroup.c
swap_slots.c
swap_state.c
swap.c
swap.h
swapfile.c
truncate.c
usercopy.c
userfaultfd.c
util.c
vmalloc.c
vmpressure.c
vmscan.c mm: shrinkers: fix race condition on debugfs cleanup 2023-05-17 15:24:33 -07:00
vmstat.c
workingset.c
z3fold.c
zbud.c
zpool.c
zsmalloc.c zsmalloc: move LRU update from zs_map_object() to zs_malloc() 2023-05-17 15:24:33 -07:00
zswap.c zswap: do not shrink if cgroup may not zswap 2023-06-12 11:31:52 -07:00