Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15213

UPDATEs are slow after instant ADD COLUMN

Details

    Description

      Consider the following test case, test_instant_alter.sql (that was created mostly to illustrate how fast instant ADD COLUMN is relative to other statements against the table):

      show variables like '%version%';
      drop table if exists t;
      create table t(id int auto_increment primary key, c1 int);
      insert into t(c1) values (0);
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
      insert into t(c1) select rand()*1000 from t;
       
      select count(*) from t;
      show table status like 't'\G
      analyze table t;
      show table status like 't'\G
       
      alter table t add column c2 char(200) default repeat('a',200);
       
      show table status like 't'\G
       
      update t set c2 = 'b'; -- 1
      update t set c2 = 'c'; -- 2
      update t set c2 = 'd'; -- 3
       
      analyze table t;
      show table status like 't'\G
      optimize table t;
      show table status like 't'\G
       
      show create table t\G
       
      alter table t force;
      update t set c2 = 'e'; -- 4
      update t set c2 = 'f'; -- 5
       
      analyze table t;
      show table status like 't'\G
      

      To prove the point (how fast is instant ALTER TABLE ... ADD COLUMN is and how much it influeces later changes to the rows of the table), I run it under MariaDB 10.3.5, 10.2.12 and Percona Server 5.7.20.

      It turned out that while ALTER TABLE itself is indeed instant in 10.3.5, subsequent updates to all rows of the table that set non-default value to newly added column (marked with 1, 2 and 3 in the comments) are executed unexpectedly slow, both comparing to similar updates after full table rebuild (ALTER TABLE t FORCE, marked as 5 and 6 in the comments). Moreover, total time to run the entire test case is much worse on 10.35 comparing to 10.2.12 and other versions the same scenario was tested against.

      Results are as follows, for example on my Ubuntu 14.04 netbook:

      openxs@ao756:~/dbs/maria10.3$ time bin/mysql -vvv -uroot --socket=/tmp/mariadb.sock test <~/test_instant_alter.sql > ~/instant_1035_new_innodb.txt
       
      real    6m46.066s
      user    0m0.022s
      sys     0m0.004s
       
      openxs@ao756:~/dbs/maria10.2$ time bin/mysql -vvv -uroot --socket=/tmp/mariadb.sock test <~/test_instant_alter.sql > ~/instant_10212_innodb.txt
       
      real    4m35.802s
      user    0m0.014s
      sys     0m0.013s
      

      or the following on Fedroa 27 quadcore box:

      [openxs@fc23 maria10.3]$ time bin/mysql -vvv -uroot --socket=/tmp/mariadb.sock test <~/test_instant_alter.sql > ~/instant_1035_innodb.txt
       
      real    5m40.820s
      user    0m0.009s
      sys     0m0.005s
       
      [openxs@fc23 maria10.2]$ time bin/mysql -vvv -uroot --socket=/tmp/mariadb.sock test <~/test_instant_alter.sql > ~/instant_10213_innodb.txt
       
      real    3m46.452s
      user    0m0.011s
      sys     0m0.003s
      

      In all cases servers where freshly installed and started with --no-defaults, like this:

      openxs@ao756:~$ bin/mysqld_safe --no-defaults --port=3308 --socket=/tmp/mariadb.sock &
      

      Relative times of individual statements were as follows, for example:

      --------------
      alter table t add column c2 char(200) default repeat('a',200)
      --------------
       
      Query OK, 0 rows affected (0.035 sec)
      Records: 0  Duplicates: 0  Warnings: 0
       
      --------------
      show table status like 't'
      --------------
       
      *************************** 1. row ***************************
                 Name: t
               Engine: InnoDB
              Version: 11
           Row_format: Dynamic
                 Rows: 523848
       Avg_row_length: 31
          Data_length: 16269312
      Max_data_length: 0
         Index_length: 0
            Data_free: 4194304
       Auto_increment: 786406
          Create_time: 2018-02-05 15:04:54
          Update_time: 2018-02-05 15:04:54
           Check_time: NULL
            Collation: latin1_swedish_ci
             Checksum: NULL
       Create_options:
              Comment:
      1 row in set (0.000 sec)
       
      --------------
      update t set c2 = 'b'
      --------------
       
      Query OK, 524288 rows affected (56.504 sec)
      Rows matched: 524288  Changed: 524288  Warnings: 0
       
      --------------
      update t set c2 = 'c'
      --------------
       
      Query OK, 524288 rows affected (56.892 sec)
      Rows matched: 524288  Changed: 524288  Warnings: 0
       
      --------------
      update t set c2 = 'd'
      --------------
       
      Query OK, 524288 rows affected (1 min 3.755 sec)
      Rows matched: 524288  Changed: 524288  Warnings: 0
       
      ...
      --------------
      alter table t force
      --------------
       
      Query OK, 0 rows affected (39.913 sec)
      Records: 0  Duplicates: 0  Warnings: 0
       
      --------------
      update t set c2 = 'e'
      --------------
       
      Query OK, 524288 rows affected (27.636 sec)
      Rows matched: 524288  Changed: 524288  Warnings: 0
       
      --------------
      update t set c2 = 'f'
      --------------
       
      Query OK, 524288 rows affected (1 min 35.323 sec)
      Rows matched: 524288  Changed: 524288  Warnings: 0
      

      In case of 10.2.12 we see:

      --------------
      alter table t add column c2 char(200) default repeat('a',200)
      --------------
       
      Query OK, 0 rows affected (30.02 sec)
      Records: 0  Duplicates: 0  Warnings: 0
       
      --------------
      show table status like 't'
      --------------
       
      *************************** 1. row ***************************
                 Name: t
               Engine: InnoDB
              Version: 11
           Row_format: Dynamic
                 Rows: 516360
       Avg_row_length: 289
          Data_length: 149569536
      Max_data_length: 0
         Index_length: 0
            Data_free: 5242880
       Auto_increment: 786406
          Create_time: 2018-01-30 19:18:45
          Update_time: NULL
           Check_time: NULL
            Collation: latin1_swedish_ci
             Checksum: NULL
       Create_options:
              Comment:
      1 row in set (0.00 sec)
       
      --------------
      update t set c2 = 'b'
      --------------
       
      Query OK, 524288 rows affected (27.39 sec)
      Rows matched: 524288  Changed: 524288  Warnings: 0
       
      --------------
      update t set c2 = 'c'
      --------------
       
      Query OK, 524288 rows affected (32.87 sec)
      Rows matched: 524288  Changed: 524288  Warnings: 0
       
      --------------
      update t set c2 = 'd'
      --------------
       
      Query OK, 524288 rows affected (32.98 sec)
      Rows matched: 524288  Changed: 524288  Warnings: 0
      

      So, on 10.3.5 the time to execute UPDATE varies a lot, but is larger than in 10.2.12 after normal ALTER, and also those executed immediately after ALTER are usually slower.

      Attachments

        Issue Links

          Activity

            axel Axel Schwenke added a comment -

            Here you go marko:

            +---------------+----------+----------+----------+
            | label         | min      | avg      | max      |
            +---------------+----------+----------+----------+
            | insert empty  | 0.712973 | 0.843348 | 1.003764 |
            | insert string | 0.576478 | 0.741412 | 0.923602 |
            +---------------+----------+----------+----------+
            

            and this is the test case:

            create table t1 (id serial, c1 int, c2 char(10) default '');
            insert into t1 (c1) values (rand()*1000);
            insert into t1 (c1) select rand()*1000 from t1;
            -- repeat last insert 17 times, yields 256K rows
            update t1 set c2="foobar";
            update t1 set c2="";
            

            I repeated everything 100 times. The unexpected outcome is, that updating the rows to a longer row image (causing page splits) is actually faster than the second update where the row image shrinks.

            axel Axel Schwenke added a comment - Here you go marko : +---------------+----------+----------+----------+ | label | min | avg | max | +---------------+----------+----------+----------+ | insert empty | 0.712973 | 0.843348 | 1.003764 | | insert string | 0.576478 | 0.741412 | 0.923602 | +---------------+----------+----------+----------+ and this is the test case: create table t1 (id serial, c1 int, c2 char(10) default ''); insert into t1 (c1) values (rand()*1000); insert into t1 (c1) select rand()*1000 from t1; -- repeat last insert 17 times, yields 256K rows update t1 set c2="foobar"; update t1 set c2=""; I repeated everything 100 times. The unexpected outcome is, that updating the rows to a longer row image (causing page splits) is actually faster than the second update where the row image shrinks.
            axel Axel Schwenke added a comment -

            Here is another interesting result. Above I mentioned that ALTER TABLE ... FORCE after the instantly added column runs 5x faster than the full table update while both had the same effect of adding the new column to each rows image.

            I was curious if that holds for different table sizes, so I cooked a test case where I create the table and fill it with some rows, then INSTANT ADD a column with default "foobar" and measure the time for ALTER TABLE FORCE. Then I do the same, but instead of ALTER TABLE FORCE I do a full table update, setting the new column to "snafu!" (not the default value, but the same number of chars). The second variant I call "expand", the first "force".

            Here are the execution times for different table sizes:

            scale   expand  force   ratio
            -----------------------------
            15      0,38    0,24    158%    
            16      0,83    0,39    216%    
            17      1,32    1,14    116%    
            18      2,98    1,38    215%    
            19      5,95    1,78    335%    
            20      11,22   2,70    415%    
            21      23,93   3,70    647%    
            22      48,36   6,34    763%    
            23      106,82  12,31   868%    
            24      242,10  24,67   982%    
            25      630,39  45,21   1394%
            

            "sacle" is ld(rownum). So the line with scale=20 corresponds to the test case I used before.

            Now the funny thing is that ALTER TABLE FORCE scales better than linear where "expand" grows about linear. Execution time for "expand" doubles when the number of rows doubles (scale increases by 1). But "force" is faster than that.

            I put both series in a log-log diagram and added a regression line. ALTER TABLE is not exactly on that line, but "expand" very much. See attached image scaling.png.

            axel Axel Schwenke added a comment - Here is another interesting result. Above I mentioned that ALTER TABLE ... FORCE after the instantly added column runs 5x faster than the full table update while both had the same effect of adding the new column to each rows image. I was curious if that holds for different table sizes, so I cooked a test case where I create the table and fill it with some rows, then INSTANT ADD a column with default "foobar" and measure the time for ALTER TABLE FORCE. Then I do the same, but instead of ALTER TABLE FORCE I do a full table update, setting the new column to "snafu!" (not the default value, but the same number of chars). The second variant I call "expand", the first "force". Here are the execution times for different table sizes: scale expand force ratio ----------------------------- 15 0,38 0,24 158% 16 0,83 0,39 216% 17 1,32 1,14 116% 18 2,98 1,38 215% 19 5,95 1,78 335% 20 11,22 2,70 415% 21 23,93 3,70 647% 22 48,36 6,34 763% 23 106,82 12,31 868% 24 242,10 24,67 982% 25 630,39 45,21 1394% "sacle" is ld(rownum). So the line with scale=20 corresponds to the test case I used before. Now the funny thing is that ALTER TABLE FORCE scales better than linear where "expand" grows about linear. Execution time for "expand" doubles when the number of rows doubles (scale increases by 1). But "force" is faster than that. I put both series in a log-log diagram and added a regression line. ALTER TABLE is not exactly on that line, but "expand" very much. See attached image scaling.png.

            axel, you could also test rebuilding the table with ALTER TABLE t1 ALGORITHM=COPY. That should be slower than ALTER TABLE t1 FORCE (when it is executed by ALGORITHM=INPLACE). The reason is that with MySQL 5.7’s WL#7277 in MariaDB 10.2, the table is being built one page at a time, rather than one row at a time. With ALGORITHM=COPY for large tables, the results may be skewed due to the InnoDB change buffer.

            In any case, there is not much that can be done to improve the performance of the UPDATE instantly added columns from the initially default values to something else, and I think that this issue should be closed as "Won’t fix".

            The generic UPDATE performance regression in 10.3 will be covered by MDEV-15914.

            marko Marko Mäkelä added a comment - axel , you could also test rebuilding the table with ALTER TABLE t1 ALGORITHM=COPY . That should be slower than ALTER TABLE t1 FORCE (when it is executed by ALGORITHM=INPLACE ). The reason is that with MySQL 5.7’s WL#7277 in MariaDB 10.2, the table is being built one page at a time, rather than one row at a time. With ALGORITHM=COPY for large tables, the results may be skewed due to the InnoDB change buffer. In any case, there is not much that can be done to improve the performance of the UPDATE instantly added columns from the initially default values to something else, and I think that this issue should be closed as "Won’t fix". The generic UPDATE performance regression in 10.3 will be covered by MDEV-15914 .
            axel Axel Schwenke added a comment -

            The observed behavior of the first UPDATE being slow is expected. It's also not slower than expected.

            axel Axel Schwenke added a comment - The observed behavior of the first UPDATE being slow is expected. It's also not slower than expected.

            MDEV-26740 has been filed for the fact that the native ALTER TABLE (without ALGORITHM=COPY) is growing the data files quicker.

            marko Marko Mäkelä added a comment - MDEV-26740 has been filed for the fact that the native ALTER TABLE (without ALGORITHM=COPY ) is growing the data files quicker.

            People

              axel Axel Schwenke
              valerii Valerii Kravchuk
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.