linux/io_uring
Linus Torvalds 4de520f1fc io_uring-futex-2023-10-30

Merge tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux

Pull io_uring futex support from Jens Axboe:
 "This adds support for using futexes through io_uring - first futex
  wake and wait, and then the vectored variant of waiting, futex waitv.

  For all of wait/wake/waitv, we support the bitset variant, as the
  'normal' variants can be easily implemented on top of that.

  PI and requeue are not supported through io_uring, just the above
  mentioned parts. This may change in the future, but in the spirit of
  keeping this small (and based on what people have been asking for),
  this is what we currently have.

  Wake support is pretty straightforward; most of the thought has gone
  into the wait side, to avoid needing to offload wait operations to a
  blocking context. Instead, we rely on the usual callbacks to retry and
  post a completion event when appropriate.

  As far as I can recall, the first request for futex support with
  io_uring came from Andres Freund, working on postgres. His aio rework
  of postgres was one of the early adopters of io_uring, and futex
  support was a natural extension for that. This is relevant from both a
  usability point of view, as well as for efficiency and performance. In
  Andres's words, for the former:

     Futex wait support in io_uring makes it a lot easier to avoid
     deadlocks in concurrent programs that have their own buffer pool:
     Obviously pages in the application buffer pool have to be locked
     during IO. If the initiator of IO A needs to wait for a held lock
     B, the holder of lock B might wait for the IO A to complete. The
     ability to wait for a lock and IO completions at the same time
     provides an efficient way to avoid such deadlocks.

  and in terms of efficiency, even without unlocking the full potential
  yet, Andres says:

     Futex wake support in io_uring is useful because it allows for more
     efficient directed wakeups. For some "locks" postgres has queues
     implemented in userspace, with wakeup logic that cannot easily be
     implemented with FUTEX_WAKE_BITSET on a single "futex word"
     (imagine waiting for journal flushes to have completed up to a
     certain point).

     Thus a "lock release" sometimes needs to wake up many processes in a
     row. A quick-and-dirty conversion to doing these wakeups via
     io_uring led to a 3% throughput increase, with 12% fewer context
     switches, albeit in a fairly extreme workload"

* tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux:
  io_uring: add support for vectored futex waits
  futex: make the vectored futex operations available
  futex: make futex_parse_waitv() available as a helper
  futex: add wake_data to struct futex_q
  io_uring: add support for futex wake and wait
  futex: abstract out a __futex_wake_mark() helper
  futex: factor out the futex wake handling
  futex: move FUTEX2_VALID_MASK to futex.h
2023-11-01 11:25:08 -10:00
..
advise.c
advise.h
alloc_cache.h
cancel.c io_uring: add support for futex wake and wait 2023-09-29 02:36:57 -06:00
cancel.h io_uring: add support for futex wake and wait 2023-09-29 02:36:57 -06:00
epoll.c
epoll.h
fdinfo.c io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid 2023-10-25 07:44:14 -06:00
fdinfo.h
filetable.c
filetable.h
fs.c io_uring/fs: remove sqe->rw_flags checking from LINKAT 2023-09-29 03:07:09 -06:00
fs.h
futex.c io_uring: add support for vectored futex waits 2023-09-29 02:37:08 -06:00
futex.h io_uring: add support for vectored futex waits 2023-09-29 02:37:08 -06:00
io_uring.c io_uring-futex-2023-10-30 2023-11-01 11:25:08 -10:00
io_uring.h for-6.7/io_uring-2023-10-30 2023-11-01 11:09:19 -10:00
io-wq.c io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls() 2023-10-05 14:11:18 -06:00
io-wq.h io_uring: break out of iowq iopoll on teardown 2023-09-07 09:02:27 -06:00
kbuf.c for-6.7/io_uring-2023-10-30 2023-11-01 11:09:19 -10:00
kbuf.h
Makefile io_uring: add support for futex wake and wait 2023-09-29 02:36:57 -06:00
msg_ring.c
msg_ring.h
net.c io_uring/net: fix iter retargeting for selected buf 2023-09-14 10:12:55 -06:00
net.h
nop.c
nop.h
notif.c
notif.h
opdef.c io_uring: add support for vectored futex waits 2023-09-29 02:37:08 -06:00
opdef.h io_uring/rw: mark readv/writev as vectored in the opcode definition 2023-09-21 12:00:46 -06:00
openclose.c io_uring: use files_lookup_fd_locked() 2023-10-19 11:02:49 +02:00
openclose.h
poll.c io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups 2023-10-19 06:42:29 -06:00
poll.h
refs.h
rsrc.c io_uring/rsrc: cleanup io_pin_pages() 2023-10-02 18:25:23 -06:00
rsrc.h
rw.c for-6.7/io_uring-2023-10-30 2023-11-01 11:09:19 -10:00
rw.h io_uring/rw: add support for IORING_OP_READ_MULTISHOT 2023-09-21 12:02:30 -06:00
slist.h
splice.c
splice.h
sqpoll.c io_uring: Don't set affinity on a dying sqpoll thread 2023-08-30 09:53:44 -06:00
sqpoll.h
statx.c
statx.h
sync.c
sync.h
tctx.c
tctx.h
timeout.c
timeout.h
uring_cmd.c io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT 2023-10-19 16:42:03 -06:00
uring_cmd.h
waitid.c io_uring: add IORING_OP_WAITID support 2023-09-21 12:04:45 -06:00
waitid.h io_uring: add IORING_OP_WAITID support 2023-09-21 12:04:45 -06:00
xattr.c
xattr.h