Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36666

atomic.alter_table still times out often

Details

    Description

      Cross reference.

      Test is heavy on server restarts, it became even worse after tests for VECTOR were added. Either split the test or optimise it somehow so that restarts don't take that much time.

      Attachments

        Issue Links

          Activity

            From its beginning, this test has been constructed as some nested loops that will kill and restart the server at each test step. Furthermore, some of the test step combinations could be entirely redundant, that is, they are only applicable to some particular storage engines.

            As far as I understand, currently the test atomic.alter_table is using 2×13×16=416 kill+restart cycles because it is covering both InnoDB and MyISAM.

            For MyISAM, most kill+restart steps could be completely unnecessary. In any case, a very low-hanging fruit might be to split MyISAM into a separate atomic.alter_table_myisam test (similar to atomic.alter_table_aria and atomic.alter_table_rocksdb), to halve the number of kill+restart cycles in atomic.alter_table itself.

            Furthermore, I would suggest a canonical (music) approach where instead of testing multiple kill+restart cycles across one table definition at time, we would create multiple tables at different stages, and let most kill+restart cycles cover multiple different ALTER TABLE steps.

            In this way, instead of using 1 table at a time with 13×16 kill+restart cycles per storage engine, we might create 13 tables and use 16 restarts, or create 13×16=208 tables in equally many concurrent connections, plus one more connection that would kill+restart the server (only one time).

            The kill mechanism would have to be rewritten too, replacing things like debug_crash_here("ddl_log_alter_after_create_frm") with DEBUG_SYNC points, so that each crash recovery step would be blocked by their own DEBUG_SYNC point in their own concurrent connection, and the server would be killed and restarted in one more connection.

            An early example of this approach is the test innodb.autoinc_persistent, which MDEV-6076 introduced in MariaDB Server 10.2. That test is restarting the server 6 times, while the original test in MySQL 8.0 would restart the server 10 times.

            marko Marko Mäkelä added a comment - From its beginning, this test has been constructed as some nested loops that will kill and restart the server at each test step. Furthermore, some of the test step combinations could be entirely redundant, that is, they are only applicable to some particular storage engines. As far as I understand, currently the test atomic.alter_table is using 2×13×16=416 kill+restart cycles because it is covering both InnoDB and MyISAM. For MyISAM, most kill+restart steps could be completely unnecessary. In any case, a very low-hanging fruit might be to split MyISAM into a separate atomic.alter_table_myisam test (similar to atomic.alter_table_aria and atomic.alter_table_rocksdb ), to halve the number of kill+restart cycles in atomic.alter_table itself. Furthermore, I would suggest a canonical (music) approach where instead of testing multiple kill+restart cycles across one table definition at time, we would create multiple tables at different stages, and let most kill+restart cycles cover multiple different ALTER TABLE steps. In this way, instead of using 1 table at a time with 13×16 kill+restart cycles per storage engine, we might create 13 tables and use 16 restarts, or create 13×16=208 tables in equally many concurrent connections, plus one more connection that would kill+restart the server (only one time). The kill mechanism would have to be rewritten too, replacing things like debug_crash_here("ddl_log_alter_after_create_frm") with DEBUG_SYNC points, so that each crash recovery step would be blocked by their own DEBUG_SYNC point in their own concurrent connection, and the server would be killed and restarted in one more connection. An early example of this approach is the test innodb.autoinc_persistent , which MDEV-6076 introduced in MariaDB Server 10.2. That test is restarting the server 6 times, while the original test in MySQL 8.0 would restart the server 10 times.
            svoj Sergey Vojtovich added a comment -

            I have created https://github.com/MariaDB/server/pull/4013, which, expectedly, takes simpler approach, that is splits MyISAM and InnoDB tests. Also disabled InnoDB where it wasn't required, it makes atomic.alter_table_myisam 3x faster than atomic.alter_table.

            I expect these steps to reduce failure frequency 10x at the very least. We can look into those once again in another iteration of "No read leafs in the forest" effort.

            I'm unsure how MyISAM can avoid kill+restart steps, but I like the idea with multiple connections.

            svoj Sergey Vojtovich added a comment - I have created https://github.com/MariaDB/server/pull/4013 , which, expectedly, takes simpler approach, that is splits MyISAM and InnoDB tests. Also disabled InnoDB where it wasn't required, it makes atomic.alter_table_myisam 3x faster than atomic.alter_table. I expect these steps to reduce failure frequency 10x at the very least. We can look into those once again in another iteration of "No read leafs in the forest" effort. I'm unsure how MyISAM can avoid kill+restart steps, but I like the idea with multiple connections.
            svoj Sergey Vojtovich added a comment -

            Since you analysed it so deeply, it'd be nice if you could review PR#4013. Feel free to decline review request if you're busy.

            svoj Sergey Vojtovich added a comment - Since you analysed it so deeply, it'd be nice if you could review PR#4013 . Feel free to decline review request if you're busy.
            marko Marko Mäkelä added a comment -

            The simple approach looks OK. But, I would suggest to apply the change to the earliest applicable branch (10.6) already, so that the test names and their scope will be identical between branches.

            marko Marko Mäkelä added a comment - The simple approach looks OK. But, I would suggest to apply the change to the earliest applicable branch (10.6) already, so that the test names and their scope will be identical between branches.

            People

              svoj Sergey Vojtovich
              svoj Sergey Vojtovich
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.