The directive used to set the value of the "max_udp_payload_size" transport
parameter. According to RFC 9000, Section 18.2, the value specifies the size
of buffer for reading incoming datagrams:
This limit does act as an additional constraint on datagram size in
the same way as the path MTU, but it is a property of the endpoint
and not the path; see Section 14. It is expected that this is the
space an endpoint dedicates to holding incoming packets.
Current QUIC implementation uses the maximum possible buffer size (65527) for
reading datagrams.
HTTP and Stream variables $remote_addr and $binary_remote_addr rely on
constant client address, particularly because they are cacheable.
However, QUIC client may migrate to a new address. While there's no perfect
way to handle this, the proposed solution is to copy client address to QUIC
stream at stream creation.
The change also fixes truncated $remote_addr if migration happened while the
stream was active. The reason is addr_text string was copied to stream by
value.
The existing logic to evaluate multi header "$sent_http_*" variables,
such as $sent_http_cache_control, as previously introduced in 1.23.0,
doesn't take into account that one or more elements can be cleared,
yet still present in a linked list, pointed to by the next field.
Such elements don't contribute to the resulting variable length, an
attempt to append a separator for them ends up in out of bounds write.
This is not possible with standard modules, though at least one third
party module is known to override multi header values this way, so it
makes sense to harden the logic.
The fix restores a generic boundary check.
Previously, the value was not set and remained zero. While in nginx code the
value of c->sockaddr is accessed without taking c->socklen into account,
invalid c->socklen could lead to unexpected results in third-party modules.
Previously, the post-migration value of addr_text could be truncated, if
it was longer than the previous one. Also, the new value always included
port, which should not be there.
According to RFC 9000, 8.2.4. Failed Path Validation,
the following value is recommended as a validation timeout:
A value of three times the larger of the current PTO
or the PTO for the new path (using kInitialRtt, as
defined in [QUIC-RECOVERY]) is RECOMMENDED.
The change adds PTO of the new path to the equation as the lower bound.
Path validation packets containing PATH_CHALLENGE frames are sent separately
from regular frame queue, because of the need to use a decicated path and pad
the packets. The packets are sent periodically, separately from the regular
probe/lost detection mechanism. A path validation packet is resent up to 3
times, each time after PTO expiration, with increasing per-path PTO backoff.
The check is needed for clients in order to unblock a server due to
anti-amplification limits, and it seems to make no sense for servers.
See RFC 9002, A.6 and A.8 for a further explanation.
This makes max_ack_delay to now always account, notably including
PATH_CHALLENGE timers as noted in the last paragraph of 9000, 9.4,
unlike when it was only used when there are packets in flight.
While here, fixed nearby style.
Previously, ssl_encryption_application was hardcoded. Before 9553eea74f2a,
ngx_quic_frame_sendto() was used only for PATH_CHALLENGE/PATH_RESPONSE sent
at the application level only. Since 9553eea74f2a, ngx_quic_frame_sendto()
is also used for CONNECTION_CLOSE, which can be sent at initial level after
SSL handshake error or rejection. This resulted in packet encryption error.
Now level is copied from frame, which fixes the error.
Previously, before sending CONNECTION_CLOSE to client, all pending frames
were sent. This is redundant and could prevent CONNECTION_CLOSE from being
sent due to congestion control. Now pending frames are freed and
CONNECTION_CLOSE is sent without congestion control, as advised by RFC 9002:
Packets containing frames besides ACK or CONNECTION_CLOSE frames
count toward congestion control limits and are considered to be in flight.
Do not corrupt frame data chain pointer on ngx_quic_read_buffer() error.
The error leads to closing a QUIC connection where the frame may be used
as part of the QUIC connection tear down, which envolves writing pending
frames, including this one.
The rcf->studies list is unconditionally accessed by ngx_regex_cleanup(),
and this used to cause NULL pointer dereference if allocation
failed. Fix is to set cleanup handler only when allocation succeeds.
Previously, waiting on a shared connection was not allowed, because the only
type of such connection was plain UDP. However, QUIC stream connections are
also shared since they share socket descriptor with the listen connection.
Meanwhile, it's perfectly normal to wait on such connections.
The issue manifested itself with stream write errors when the amount of data
exceeded stream buffer size or flow control. Now no error is triggered
and Stream write module is allowed to wait for buffer space to become available.
When a stream is created by client, it's often the case that nginx will send
immediate response on that stream. An example is HTTP/3 request stream, which
in most cases quickly replies with at least HTTP headers.
QUIC stream init handlers are called from a posted event. Output QUIC
frames are also sent to client from a posted event, called the push event.
If the push event is posted before the stream init event, then output produced
by stream may trigger sending an extra UDP datagram. To address this, push
event is now re-posted when a new stream init event is posted.
An example is handling 0-RTT packets. Client typically sends an init packet
coalesced with a 0-RTT packet. Previously, nginx replied with a padded CRYPTO
datagram, followed by a 1-RTT stream reply datagram. Now CRYPTO and STREAM
packets are coalesced in one reply datagram, which saves bandwidth.
Other examples include coalescing 1-RTT first stream response, and
MAX_STREAMS/STREAM sent in response to ACK/STREAM.
It now uses custom alloc_aligned() wrapper for all allocations,
therefore all allocations are larger than expected by (64 + sizeof(void*)).
Further, they are seen as allocations of 1 element. Relevant calculations
were adjusted to reflect this, and state allocation is now protected
with a flag to avoid misinterpreting other allocations as the zlib
deflate_state allocation.
Further, it no longer forces window bits to 13 on compression level 1,
so the comment was adjusted to reflect this.
When establishing a connection to the backend, nginx blocks reading
from the client with ngx_mail_proxy_block_read(). Previously, such
events were lost, and in some cases this resulted in connection hangs.
Notably, this affected mail_imap_ssl.t on Windows, since the test
closes connections after requesting authentication, but without
waiting for any responses (so the connection close events might be
lost).
Fix is to post an event to read from the client after connecting to
the backend if there were blocked events.
SSL context is not present if the default server has neither certificates nor
ssl_reject_handshake enabled. Previously, this led to null pointer dereference
before it would be caught with configuration checks.
Additionally, non-default servers with distinct SSL contexts need to initialize
compatibility layer in order to complete a QUIC handshake.
This ensures that errors which happen during logging to syslog are logged
with proper context, such as "while logging to syslog" and the server name.
Prodded by Safar Safarly.
During initial startup the ngx_cycle->hostname is not available, and
previously this resulted in incorrect logging. Instead, hostname from the
configuration being parsed is now preserved in the syslog peer structure
and then used during logging.
Similarly, ngx_cycle->log might not match the configuration where the
syslog peer is defined if the configuration is not yet fully applied,
and previously this resulted in unexpected logging of syslog errors
and debug information. Instead, cf->cycle->new_log is now referenced
in the syslog peer structure and used for logging, similarly to how it
is done in other modules.
Similarly to ticket #274 (7354:1812f1d79d84), early request finalization
without calling ngx_http_run_posted_requests() resulted in a connection
hang (a socket leak) if the 400 (Bad Request) error was generated in
ngx_http_v2_state_process_header() due to invalid request headers and
"return 444" was used in error_page 400.
As tested with tlsfuzzer with LibreSSL 3.7.0, the following errors are
certainly client-related:
SSL_do_handshake() failed (SSL: error:14026073:SSL routines:ACCEPT_SR_CLNT_HELLO:bad packet length)
SSL_do_handshake() failed (SSL: error:1402612C:SSL routines:ACCEPT_SR_CLNT_HELLO:ssl3 session id too long)
SSL_do_handshake() failed (SSL: error:140380EA:SSL routines:ACCEPT_SR_KEY_EXCH:tls rsa encrypted value length is wrong)
Accordingly, the SSL_R_BAD_PACKET_LENGTH ("bad packet length"),
SSL_R_SSL3_SESSION_ID_TOO_LONG ("ssl3 session id too long"),
SSL_R_TLS_RSA_ENCRYPTED_VALUE_LENGTH_IS_WRONG ("tls rsa encrypted value
length is wrong") errors are now logged at the "info" level.
To further differentiate client-related errors and adjust logging levels
of various SSL errors, nginx was tested with tlsfuzzer with multiple
OpenSSL versions (3.1.0-beta1, 3.0.8, 1.1.1t, 1.1.0l, 1.0.2u, 1.0.1u,
1.0.0s, 0.9.8zh).
The following errors were observed during tlsfuzzer runs with OpenSSL 3.0.8,
and are clearly client-related:
SSL_do_handshake() failed (SSL: error:0A000092:SSL routines::data length too long)
SSL_do_handshake() failed (SSL: error:0A0000A0:SSL routines::length too short)
SSL_do_handshake() failed (SSL: error:0A000124:SSL routines::bad legacy version)
SSL_do_handshake() failed (SSL: error:0A000178:SSL routines::no shared signature algorithms)
Accordingly, the SSL_R_DATA_LENGTH_TOO_LONG ("data length too long"),
SSL_R_LENGTH_TOO_SHORT ("length too short"), SSL_R_BAD_LEGACY_VERSION
("bad legacy version"), and SSL_R_NO_SHARED_SIGNATURE_ALGORITHMS
("no shared signature algorithms", misspelled as "sigature" in OpenSSL 1.0.2)
errors are now logged at the "info" level.
Additionally, the following errors were observed with OpenSSL 3.0.8 and
with TLSv1.3 enabled:
SSL_do_handshake() failed (SSL: error:0A00006F:SSL routines::bad digest length)
SSL_do_handshake() failed (SSL: error:0A000070:SSL routines::missing sigalgs extension)
SSL_do_handshake() failed (SSL: error:0A000096:SSL routines::encrypted length too long)
SSL_do_handshake() failed (SSL: error:0A00010F:SSL routines::bad length)
SSL_read() failed (SSL: error:0A00007A:SSL routines::bad key update)
SSL_read() failed (SSL: error:0A000125:SSL routines::mixed handshake and non handshake data)
Accordingly, the SSL_R_BAD_DIGEST_LENGTH ("bad digest length"),
SSL_R_MISSING_SIGALGS_EXTENSION ("missing sigalgs extension"),
SSL_R_ENCRYPTED_LENGTH_TOO_LONG ("encrypted length too long"),
SSL_R_BAD_LENGTH ("bad length"), SSL_R_BAD_KEY_UPDATE ("bad key update"),
and SSL_R_MIXED_HANDSHAKE_AND_NON_HANDSHAKE_DATA ("mixed handshake and non
handshake data") errors are now logged at the "info" level.
Additionally, the following errors were observed with OpenSSL 1.1.1t:
SSL_do_handshake() failed (SSL: error:14094091:SSL routines:ssl3_read_bytes:data between ccs and finished)
SSL_do_handshake() failed (SSL: error:14094199:SSL routines:ssl3_read_bytes:too many warn alerts)
SSL_read() failed (SSL: error:1408F0C6:SSL routines:ssl3_get_record:packet length too long)
SSL_read() failed (SSL: error:14094085:SSL routines:ssl3_read_bytes:ccs received early)
Accordingly, the SSL_R_CCS_RECEIVED_EARLY ("ccs received early"),
SSL_R_DATA_BETWEEN_CCS_AND_FINISHED ("data between ccs and finished"),
SSL_R_PACKET_LENGTH_TOO_LONG ("packet length too long"), and
SSL_R_TOO_MANY_WARN_ALERTS ("too many warn alerts") errors are now logged
at the "info" level.
Additionally, the following errors were observed with OpenSSL 1.0.2u:
SSL_do_handshake() failed (SSL: error:1407612A:SSL routines:SSL23_GET_CLIENT_HELLO:record too small)
SSL_do_handshake() failed (SSL: error:1408C09A:SSL routines:ssl3_get_finished:got a fin before a ccs)
Accordingly, the SSL_R_RECORD_TOO_SMALL ("record too small") and
SSL_R_GOT_A_FIN_BEFORE_A_CCS ("got a fin before a ccs") errors are now
logged at the "info" level.
No additional client-related errors were observed while testing with
OpenSSL 3.1.0-beta1, OpenSSL 1.1.0l, OpenSSL 1.0.1u, OpenSSL 1.0.0s,
and OpenSSL 0.9.8zh.
In some cases there might be multiple errors in the OpenSSL error queue,
notably when a libcrypto call fails, and then the SSL layer generates
an error itself. For example, the following errors were observed
with OpenSSL 3.0.8 with TLSv1.3 enabled:
SSL_do_handshake() failed (SSL: error:02800066:Diffie-Hellman routines::invalid public key error:0A000132:SSL routines::bad ecpoint)
SSL_do_handshake() failed (SSL: error:08000066:elliptic curve routines::invalid encoding error:0A000132:SSL routines::bad ecpoint)
SSL_do_handshake() failed (SSL: error:0800006B:elliptic curve routines::point is not on curve error:0A000132:SSL routines::bad ecpoint)
In such cases it seems to be better to determine logging level based on
the last error in the error queue (the one added by the SSL layer,
SSL_R_BAD_ECPOINT in all of the above example example errors). To do so,
the ngx_ssl_connection_error() function was changed to use
ERR_peek_last_error().
An UTF-8 octet sequence cannot start with a 11111xxx byte (above 0xf8),
see https://datatracker.ietf.org/doc/html/rfc3629#section-3. Previously,
such bytes were accepted by ngx_utf8_decode() and misinterpreted as 11110xxx
bytes (as in a 4-byte sequence). While unlikely, this can potentially cause
issues.
Fix is to explicitly reject such bytes in ngx_utf8_decode().
Just a drive letter might not correctly represent file system being used,
notably when using symlinks (as created by "mklink /d"). As such, instead
of trying to call GetDiskFreeSpace() with just a drive letter, we now always
use GetDiskFreeSpace() with full path.
Further, it looks like the code to use just a drive letter never worked,
since it tried to test name[2] instead of name[1] to be ':'.
This ensures that ngx_win32_rename_file() will support non-ASCII names
when supported by the wrappers.
Notably, this is used by PUT requests in the dav module when overwriting
existing files with non-ASCII names (ticket #1433).
Previously, ngx_win32_rename_file() retried on all errors returned by
MoveFile() to a temporary name. It only make sense, however, to retry
when the destination file already exists, similarly to the condition
when ngx_win32_rename_file() is called. Retrying on other errors is
meaningless and might result in an infinite loop.