If the MariaDB client is running with --ssl --ssl-verify-server-cert, it should not trust any application-level traffic received prior to the completion of the TLS handshake and the validation of the server's TLS certificate.
This commit modifies the server to unconditionally send an error packet to the client, prior to authentication and prior to TLS handshake and server certificate validation:
2023-06-05 15:24:07 0 [Note] sql/mariadbd: ready for connections.
Version: '10.11.4-MariaDB-debug' socket: '/tmp/tmp.P4FvcEcKrH/mysql.sock' port: 3306 Source distribution
Attempt to connect to it with --ssl --ssl-verify-server-cert:
$ client/mariadb -h 127.0.0.1
ERROR 1815 (HY000): Internal error: Client will accept this error as genuine even if running with --ssl --ssl-verify-server-cert, and even though this error is sent in plaintext PRIOR TO TLS HANDSHAKE.
Running tcpdump in the background confirms that the client is improperly accepting the error packet, even though it has been sent in plaintext and without a TLS handshake:
$ sudo tcpdump -n -X -i lo tcp port 3306
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
Andrew Hutchings (Inactive)
added a comment -
There are 3 possible errors which the server can send before the handshake succeeded:
too many connections
server ran out of threads
a TLS error/alert, e.g. no matching protocol, unsupported cipher suite, ....
plugin errors will be sent after the handshake, since the client hello packet will be encrypted.
That is fewer than I thought, which is good to know. I do think it is important to know which one of these has been hit, and in particular to distinguish the first two from the last one. It should at least be logged somewhere for debugging / post-mortem purposes, even if the original error code is not preserved.
Daniel Lenski (Inactive)
added a comment -
georg wrote:
There are 3 possible errors which the server can send before the handshake succeeded:
too many connections
server ran out of threads
a TLS error/alert, e.g. no matching protocol, unsupported cipher suite, ....
These cases need to be distinguished:
The "too many connections" and "out of threads" errors are application-level errors: they are sent by the server to the client.
The fact that these are sent by the server before the TLS handshake happens is bad. As I described above, the client may vary its retry behavior based on exactly what error it receives, and this could enable a variety of attacks… which rapidly get much, much more serious if the client starts automatically trying to connect to different servers based on errors it receives.
TLS errors/alerts are transport-level errors: they are generated by the client as a result of an inability to construct a satisfactorily-secure TLS channel between it and the server. Assuming they all get the same error code (CR_SSL_CONNECTION_ERROR, I think?) then an attacker can prevent a connection from succeeding, but can't cause a client to vary its behavior based on the contents of the error. (If a client receives a CR_SSL_CONNECTION_ERROR, it shouldn't vary its behavior based on the detailed error message; "TLS alert X" vs "not allowed cipher Y", etc.)
TheLinuxJedi, perhaps my statements above about this previously were unclear: it's not that the client needs to hide the details of any error which it encounters before the TLS handshake. It's just that the client should hide the details of an application-level error which it appears to have received from the server before the TLS handshake (because that error could actually be from a MITM attacker).
And indeed that is what my PR does in its current form: "Do not trust error packets received from the server prior to TLS handshake completion". It does not prevent the client from returning unique error codes for other conditions which are unrelated to application-layer traffic ("hostname not found", "TLS alert", "conflicting connection parameters don't make sense", etc).
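To make the rule concrete, here is a minimal Python sketch of the idea. It is not the code from the PR, and the names `parse_err_packet`, `check_packet`, `ServerError`, and `UntrustedPreTLSError` are hypothetical. It assumes the usual ERR packet layout: a 0xFF header byte, a 2-byte little-endian error code, an optional `#` plus 5-character SQL state, then the message.

```python
import struct
from typing import Optional, Tuple


class ServerError(Exception):
    """An error packet whose origin the client is willing to trust."""

    def __init__(self, code: int, message: str):
        super().__init__(f"ERROR {code}: {message}")
        self.code = code
        self.message = message


class UntrustedPreTLSError(Exception):
    """Generic error; details of the unauthenticated packet are withheld."""


def parse_err_packet(payload: bytes) -> Optional[Tuple[int, str]]:
    """Return (code, message) if the payload is an ERR packet, else None."""
    if len(payload) < 3 or payload[0] != 0xFF:
        return None
    (code,) = struct.unpack_from("<H", payload, 1)
    rest = payload[3:]
    # A '#' marker followed by a 5-character SQL state may precede the message.
    if rest[:1] == b"#" and len(rest) >= 6:
        rest = rest[6:]
    return code, rest.decode("utf-8", errors="replace")


def check_packet(payload: bytes, tls_required: bool, tls_established: bool) -> None:
    """Raise on ERR packets, hiding their contents if they arrived before TLS."""
    err = parse_err_packet(payload)
    if err is None:
        return
    if tls_required and not tls_established:
        # The packet could come from a MITM attacker: expose neither the
        # error code nor the message to the caller.
        raise UntrustedPreTLSError(
            "received an error packet before the TLS handshake completed; "
            "its contents cannot be authenticated"
        )
    raise ServerError(*err)
```

With this shape, a caller that sees `UntrustedPreTLSError` has nothing attacker-controlled to base retry or failover decisions on, while errors generated locally by the client (TLS alerts, name resolution failures) can keep their own distinct codes.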
Sergei Golubchik
added a comment -
I've counted 13 errors that a server can legally send before TLS is established.
Four of them can only happen before authentication (before TLS, if requested):
ER_BAD_HOST_ERROR
ER_CANT_CREATE_THREAD
ER_HOST_IS_BLOCKED
ER_HOST_NOT_PRIVILEGED
Others can happen before or after TLS:
EE_OUTOFMEMORY
ER_CONNECTION_KILLED
ER_HANDSHAKE_ERROR
ER_NET_FCNTL_ERROR
ER_NET_PACKET_TOO_LARGE
ER_NET_READ_ERROR
ER_NET_READ_INTERRUPTED
ER_NET_UNCOMPRESS_ERROR
ER_OUT_OF_RESOURCES
Even if we only consider the first four errors, ER_CANT_CREATE_THREAD is transient and the client might want to retry, ER_HOST_IS_BLOCKED and ER_HOST_NOT_PRIVILEGED are permanent. A client cannot just ignore those errors. And in the MitM case they can be spoofed indeed. I don't see how you can possibly solve this.
ER_ACCESS_DENIED can never come before TLS and a client can safely ignore it and all errors that aren't in the list above. But errors from the above list are more than enough for MitM to use.
Daniel Lenski (Inactive)
added a comment - edited
I've counted 13 errors that a server can legally send before TLS is established.
serg, thanks for researching this!
Even if we only consider the first four errors, ER_CANT_CREATE_THREAD is transient and the client might want to retry, ER_HOST_IS_BLOCKED and ER_HOST_NOT_PRIVILEGED are permanent. A client cannot just ignore those errors. And in the MitM case they can be spoofed indeed. I don't see how you can possibly solve this.
You can solve this by designing the applications (client+server) and the protocol to ensure an appropriate separation of concerns between the transport layer (TLS to ensure authenticity of the server and end-to-end encryption of the communications with it) and the application layer (validating the user's credentials to access the database).
If a TCP-based client-server protocol wants to use TLS, then wrapping the TCP socket in a TLS socket should be the very first thing that happens, before any communication over the socket. There shouldn't be any exchange of plaintext packets prior to the TLS handshake.
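As a rough illustration of that ordering, here is a Python sketch using the standard `ssl` module. The helper names `make_verifying_context` and `connect_tls_first` are hypothetical, and this shows the TLS-first pattern in general, not how the MariaDB protocol currently behaves (the server sends its plaintext greeting first, so this would not interoperate with it as-is).

```python
import socket
import ssl
from typing import Optional


def make_verifying_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client context that verifies the certificate chain and hostname."""
    # create_default_context() turns on CERT_REQUIRED and hostname checking.
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def connect_tls_first(host: str, port: int, ctx: ssl.SSLContext,
                      timeout: float = 10.0) -> ssl.SSLSocket:
    """Open a TCP connection and immediately wrap it in TLS."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # The handshake and certificate validation run inside wrap_socket();
        # no application data is read from or written to `raw` beforehand.
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

Anything the application reads from the returned socket has already passed certificate and hostname validation, so there is no window in which an unauthenticated error or greeting can reach the application layer.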
Modern web browsers using TLS (especially TLSv1.3 with ECH), and VoIP apps, and TLS-based VPNs (like those supported by openconnect, which I contribute to) basically get this right, and leak no application-layer information prior to TLS establishment… but MariaDB does this quite badly and leaks a ton of information. This seems to be largely a consequence of the unstructured approach to bolting TLS on to the client and server code. The description of the connection protocol appears to have been written after-the-fact to reflect how MariaDB server and Connector/C use TLS, rather than designed in advance.
Even in the happy case where the server doesn't send any pre-handshake error to the client, the server sends a "greeting" packet to the client (containing server version information) and the client sends back a login request packet (with no credentials, but with the client's charset and flags) in plaintext before the TLS handshake starts.
This makes MariaDB client-server connections an exploitable and target-rich environment for pervasive MITM attackers. A government agency could, for example, fingerprint the plaintext client+server greeting packets to determine the exact versions, pull out the ones that appear to be from interesting parts of the world based on the plaintext preferred client charset, and manipulate them in various ways with MITM and downgrade attacks using this vulnerability, as well as the long-known MDEV-28634… and all of that without needing to actually do any TLS cracking.
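To show how little effort reading those plaintext fields takes, here is a small Python sketch that a passive observer could run over captured bytes. The function names are hypothetical, and the layout is assumed from the published protocol description: a 3-byte little-endian length and 1-byte sequence number, then, for the greeting, a protocol version byte followed by a NUL-terminated version string; for the pre-TLS client packet, 4 bytes of capability flags, 4 bytes of maximum packet size, then a 1-byte collation id.

```python
from typing import Tuple


def split_packet(data: bytes) -> Tuple[int, bytes]:
    """Split one protocol packet into (sequence_id, payload)."""
    if len(data) < 4:
        raise ValueError("truncated packet header")
    length = int.from_bytes(data[0:3], "little")
    payload = data[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated packet payload")
    return data[3], payload


def server_version_from_greeting(payload: bytes) -> str:
    """Extract the version string from a server greeting payload."""
    # payload[0] is the protocol version; a NUL-terminated string follows.
    end = payload.index(b"\x00", 1)
    return payload[1:end].decode("ascii", errors="replace")


def collation_from_ssl_request(payload: bytes) -> int:
    """Extract the client's collation id from the pre-TLS client packet."""
    if len(payload) < 9:
        raise ValueError("payload too short to contain a collation id")
    # Bytes 0-3: capability flags; bytes 4-7: max packet size; byte 8: collation.
    return payload[8]
```

No keys, certificates, or active interference are needed for either function; both operate purely on bytes that cross the wire before the TLS handshake.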
For all I know, the NSA or CSIC or GCHQ or יחידה 8200 or the Chinese/Iranian/Indian/Russian/$COUNTRY equivalents have already figured this out themselves, and have been MITM'ing MariaDB connections on the Internet at massive scale for years.
(UPDATE: Spun this off into CONC-654 and MDEV-31585; it turns out that there's a server-side mistake in TLS setup, which makes it impossible to fix the client-side leakage without a server-side fix as well.)
I've been studying the client-server protocol and implementations pretty carefully for a couple weeks now, and I'm convinced that these vulnerabilities are entirely solvable and in a backwards-compatible way, but it'd require a concerted effort to prioritize the code and design changes.