Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23740

ssl connection fails when server and client certs signed by same CA

Details

    Description

      "vanilla" ssl connections work, but assigning a cert to the server and a client cert causes "ssl protocol failures", sometimes obscure version mismatch errors.

      On investigation it turns out the that problem is in viossl.c and how ssl errors are handled.
      ssl_should_retry calls SSL_get_error to check for the current error, but the documentation for SSL_get_error says:
      "The current thread's
      error queue must be empty before the TLS/SSL I/O operation is
      attempted, or SSL_get_error() will not work reliably."

      (attaching segment of mysqld.trace of the working, patched version...note how X509_R_CERT_ALREADY_IN_HASH_TABLE
      errors get dropped by the patched code; otherwise they would cause a fatal error in the SSL connection, even though they are harmless, coming from the X509 certs being taken from both the server and the client)

      It took a couple of iterations to clean up these errors, because there was some race conditions on when the errors came in vs. when they get handled in the code.

      Solution: add ERR_clear_error() before each SSL_(read|write|etc) calls. And check for whatever errors are in the "queue" in the ssl_should_retry routine. The mysqld.trace shows that sometimes several errors are queued up, so just dismissing the first one won't be enough.

      Patch attached; it has some extra whitespace changes, and extra DBUG_PRINT's for diagnostics

      NOTE: v10.6 viossl.c has the same code in the 10.2 viossl.c, with minor changes that do not alter the problem noted in this bugreport

      Attachments

        1. MariaDB.patch
          3 kB
        2. mariaSSL.tgz
          9 kB
        3. mysqld.trace
          2 kB

        Issue Links

          Activity

            celane
            + case SSL_ERROR_SYSCALL:
            + DBUG_PRINT("info",("harmless? ssl_error=%d",ssl_error));
            + ERR_print_errors_fp(DBUG_FILE);
            + break;

            this part of the patch makes me uncomfortable. If SSL_ERROR_SYSCALL happens to be persistent, it will run into endless loop. Maybe it is not harmless.

            wlad Vladislav Vaintroub added a comment - celane + case SSL_ERROR_SYSCALL: + DBUG_PRINT("info",("harmless? ssl_error=%d",ssl_error)); + ERR_print_errors_fp(DBUG_FILE); + break; this part of the patch makes me uncomfortable. If SSL_ERROR_SYSCALL happens to be persistent, it will run into endless loop. Maybe it is not harmless.

            I downgraded Prio from Critical to Major, due to lack of evidence something like this would happen often (and partly, due to my own inability to reproduce this, even with some effort)

            wlad Vladislav Vaintroub added a comment - I downgraded Prio from Critical to Major, due to lack of evidence something like this would happen often (and partly, due to my own inability to reproduce this, even with some effort)

            I'd still like to know if https://github.com/mariadb/server/commit/582e482d63876fef72af672776fffe7fec2bee4b fixes the error in the environment where it happens (as mentioned above, I got some trouble recreating the error).
            If it does, maybe it makes sense just to push a small local fix for some obscure? alleged? OpenSSL bug . Ignoring SSL_ERROR_SYSCALL seems to dangerous for a fix, to me.

            celane, could you check that patch https://github.com/mariadb/server/commit/582e482d63876fef72af672776fffe7fec2bee4b ?

            wlad Vladislav Vaintroub added a comment - I'd still like to know if https://github.com/mariadb/server/commit/582e482d63876fef72af672776fffe7fec2bee4b fixes the error in the environment where it happens (as mentioned above, I got some trouble recreating the error). If it does, maybe it makes sense just to push a small local fix for some obscure? alleged? OpenSSL bug . Ignoring SSL_ERROR_SYSCALL seems to dangerous for a fix, to me. celane , could you check that patch https://github.com/mariadb/server/commit/582e482d63876fef72af672776fffe7fec2bee4b ?
            celane Charles Lane added a comment -

            Finally grabbed a clone of the git repository of 10.6 and did a "cherry pick" of your patch. Compiled okay, and put it on one of my hosts that were having the ssl problem (with 10.2)
            And your patch does seem to fix the problem I was seeing.

            But since the problem is only with openssl < 1.1.0, and (it seems) only with whatever
            openssl configuration that I'm running, a minimal patch is certainly the way to go.

            I expect that in the next few months I'll be upgrading all my SL7 hosts to Centos8,
            (with mariadb 10.6 and openssl 1.1.1) which I've tried and had zero problems.

            celane Charles Lane added a comment - Finally grabbed a clone of the git repository of 10.6 and did a "cherry pick" of your patch. Compiled okay, and put it on one of my hosts that were having the ssl problem (with 10.2) And your patch does seem to fix the problem I was seeing. But since the problem is only with openssl < 1.1.0, and (it seems) only with whatever openssl configuration that I'm running, a minimal patch is certainly the way to go. I expect that in the next few months I'll be upgrading all my SL7 hosts to Centos8, (with mariadb 10.6 and openssl 1.1.1) which I've tried and had zero problems.
            celane Charles Lane added a comment -

            OKAY!
            I think finally I can give a more complete diagnosis.
            The 'cert with duplicate hash' error is deceptive: it wasn't actually a duplicate certificate, it was a duplicate CRL.
            (which is probably why the 'testing' didn't see the problem, you have to set up a CRL directory, and add to it both my
            'subCA.crl' and 'bundle.crl'), and tell MariaDB to use a 'crl directory'.

            My 'bundle.crl' = 'subCA.crl' + 'rootCA.crl' for applications that only allow giving them a single CRL file.
            So moving that bundle.crl to a different directory, and the OPENSSL-1.0.2 error goes away, and MariaDB
            is happy with my SSL connections.

            I don't recall seeing anything in the SSL docs saying 'don't do that', regarding the CRL files, but the real problem is
            that SSL errors are obscure at best, and too often dropped on the floor.

            So I strongly suggest putting a
            DBUG_PRINT("error", ("ssl_error: %s", ERR_error_string(ssl_error,NULL)));

            everywhere in viossl.c where a non-harmless ssl error is thrown, so that at least
            a --debug enabled MariaDB will see the errors. Even better is to write the error to where it
            will show up in the MariaDB error log file.

            celane Charles Lane added a comment - OKAY! I think finally I can give a more complete diagnosis. The 'cert with duplicate hash' error is deceptive: it wasn't actually a duplicate certificate, it was a duplicate CRL. (which is probably why the 'testing' didn't see the problem, you have to set up a CRL directory, and add to it both my 'subCA.crl' and 'bundle.crl'), and tell MariaDB to use a 'crl directory'. My 'bundle.crl' = 'subCA.crl' + 'rootCA.crl' for applications that only allow giving them a single CRL file. So moving that bundle.crl to a different directory, and the OPENSSL-1.0.2 error goes away, and MariaDB is happy with my SSL connections. I don't recall seeing anything in the SSL docs saying 'don't do that', regarding the CRL files, but the real problem is that SSL errors are obscure at best, and too often dropped on the floor. So I strongly suggest putting a DBUG_PRINT("error", ("ssl_error: %s", ERR_error_string(ssl_error,NULL))); everywhere in viossl.c where a non-harmless ssl error is thrown, so that at least a --debug enabled MariaDB will see the errors. Even better is to write the error to where it will show up in the MariaDB error log file.

            People

              wlad Vladislav Vaintroub
              celane Charles Lane
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.