MariaDB MaxScale / MXS-5258

delayed_retry should not retry interrupted writes

    Description

      The delayed_retry feature allows writes to be retried either when they fail or when no valid server is found. Retrying a failed write is unsafe, and the documentation lists this as a downside of the feature. As a side effect, transaction_replay is also less safe to use in general, as writes may be duplicated.

      The following example shows a case where retrying a write is safe.

      MariaDB [test]> CREATE TABLE t1(id INT PRIMARY KEY AUTO_INCREMENT, data INT);
      Query OK, 0 rows affected (0.001 sec)
       
      -- Server goes down
       
      MariaDB [test]> INSERT INTO t1(data) VALUES (1);
       
      -- Server comes back up
       
      Query OK, 1 row affected (0.000 sec)
      

      This is safe because it only delays the routing of the write. No actual retry takes place; the query just appears to be "stuck" inside readwritesplit for a moment.

      However, the following is not safe to retry.

      MariaDB [test]> CREATE TABLE t1(id INT PRIMARY KEY AUTO_INCREMENT, data INT);
      Query OK, 0 rows affected (0.001 sec)
       
      MariaDB [test]> INSERT INTO t1(data) VALUES (1);
       
      -- Server goes down
       
      -- Server comes back up
       
      Query OK, 1 row affected (0.000 sec)
      

      This is because the query had already been sent to the backend database and it is not known whether it was committed. It may still have been in the TCP buffers in MaxScale, or it may have committed with the OK response on its way to MaxScale when the network connection was lost.
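
The duplication risk can be illustrated with a small toy model. This is only a sketch and not MaxScale code: `ToyServer`, `LostReply` and `insert_with_blind_retry` are hypothetical names invented for the illustration. The server commits the row and then loses the OK reply, and a proxy that retries on any failure ends up inserting the row twice.

```python
# Hypothetical toy model, not MaxScale code: a server that commits a write
# but whose OK reply is lost before the proxy sees it.


class LostReply(Exception):
    """The connection dropped before the reply reached the proxy."""


class ToyServer:
    def __init__(self):
        self.rows = []                # committed rows of t1
        self.drop_next_reply = False  # simulate a connection loss after commit

    def insert(self, data):
        self.rows.append(data)        # the write commits first...
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise LostReply()         # ...then the OK is lost in transit
        return "OK"


def insert_with_blind_retry(server, data):
    """Retry on any failure, without knowing whether the write committed."""
    try:
        return server.insert(data)
    except LostReply:
        # The proxy cannot tell that the first attempt already committed.
        return server.insert(data)


server = ToyServer()
server.drop_next_reply = True
result = insert_with_blind_retry(server, 1)
print(result, server.rows)  # the client sees one OK, the table holds two rows
```

The client receives a single OK, yet the table contains the value twice, which is the kind of duplicate the interrupted-write case can produce.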

      The default mode of operation for delayed_retry should be closer to what transaction_replay_safe_commit does in 24.08: writes outside of transactions are never retried. A refinement of this would be for delayed_retry to never retry writes that readwritesplit has already sent to the server. This would make delayed_retry safe to use in general, as the write could be delayed internally in readwritesplit as long as it has never been successfully sent downstream.
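
The proposed refinement could be sketched roughly as follows. This is an illustration under assumptions, not the readwritesplit implementation: `PendingWrite`, its `sent_to_server` flag and `may_delay_and_retry` are hypothetical names, and the sketch assumes the router can track whether a write has already been handed to a backend.

```python
from dataclasses import dataclass


@dataclass
class PendingWrite:
    """Hypothetical bookkeeping for a write held inside the router."""
    sql: str
    sent_to_server: bool = False  # assumed flag: set once the write went downstream


def may_delay_and_retry(write: PendingWrite) -> bool:
    """Allow delaying/retrying only if the write was never sent to a server."""
    return not write.sent_to_server


# First scenario: the server was already down, so the write was never routed.
queued = PendingWrite("INSERT INTO t1(data) VALUES (1)")

# Second scenario: the write was sent and the server went down afterwards.
interrupted = PendingWrite("INSERT INTO t1(data) VALUES (1)", sent_to_server=True)

print(may_delay_and_retry(queued))       # True: safe to hold and route later
print(may_delay_and_retry(interrupted))  # False: outcome unknown, do not retry
```

With a rule of this shape, the first example in the description is still handled transparently, while the second would return an error to the client instead of risking a duplicate write.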

            People

              markus makela markus makela