Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17520

Instant ALTER TABLE for failure-free column type changes

Details

    Description

      The original InnoDB ROW_FORMAT=REDUNDANT is essentially storing each field as variable-length and possibly NULL. For that format, we can trivially allow an instantaneous change of a column from NOT NULL to NULL.

      The space-optimized row formats COMPACT and DYNAMIC do not allow NULL values to be represented for columns that were originally declared as NOT NULL. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for ROW_FORMAT=REDUNDANT in MDEV-15563 would require the table to be rebuilt when it is in the COMPACT or DYNAMIC format.

      Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:

      1. To prevent downgrade or IMPORT TABLESPACE to older MariaDB releases, add a MDEV-15562 metadata record.
      2. For ROW_FORMAT=COMPACT or ROW_FORMAT=REDUNDANT, if the metadata record is present, allow clustered index leaf pages to be in ROW_FORMAT=REDUNDANT with the differences noted below.

      The clustered index leaf pages of the table would be gradually converted into something that resembles ROW_FORMAT=REDUNDANT as a result of modifications. This will increase the size usage a little. Also, ROW_FORMAT=REDUNDANT limits the maximum in-page record size to 16,383 bytes, which for innodb_page_size=64k is less than the limit for COMPACT or DYNAMIC.

      Because secondary index records would remain in the original ROW_FORMAT, secondary indexes may have to be rebuilt when an indexed column is changed. That is, changing an indexed column from NOT NULL to NULL will require the indexes to be rebuilt if ROW_FORMAT is not REDUNDANT.

      Any INSERT or UPDATE after an instant ALTER that removes a NOT NULL constraint (or changes a column to a wider type later in MDEV-15563) will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles ROW_FORMAT=REDUNDANT, with the following differences:

      • NULL columns will occupy 0 bytes of storage, also when fixed-length.
      • CHAR(n) (unless NULL) will occupy n*mbminlenn*mbmaxlen bytes, instead of n*mbmaxlen.
      • For ROW_FORMAT=DYNAMIC, no local prefix of off-page columns will be stored. For ROW_FORMAT=COMPACT, we will continue to write the 768-byte prefixes.

      In ROW_FORMAT=REDUNDANT, the record header will store the length of each column (including fixed-length and NULL columns), using n_fields or 2·n_fields bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·n_fields+1 bytes.

      The metadata BLOB that was introduced in MDEV-15562 will be augmented. The flag 1U << 14 will be set in dict_instant_t::non_pk_col_map[] for ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC columns that were originally created as NOT NULL but no longer carry this attribute. Based on this information, we will initialize n_nullable and n_core_null_bytes for the clustered index based on the original column definition, instead of the latest one. Secondary indexes and ROW_FORMAT=REDUNDANT will use the latest definition.

      Attachments

        Issue Links

          Activity

            The test main.function_defaults_innodb turned out to be crashing randomly, and after my fix, it is returning wrong result. That is currently the only disabled test.

            marko Marko Mäkelä added a comment - The test main.function_defaults_innodb turned out to be crashing randomly, and after my fix, it is returning wrong result. That is currently the only disabled test.
            mleich Matthias Leich added a comment - - edited

            I have no idea if the test which follows is expected to assert or not on 
            commit c3ca982a0d5abb358f95c7d94ca14382dc81e1ee (HEAD -> bb-10.4-MDEV-17520, origin/bb-10.4-MDEV-17520)
            Date:   Tue Dec 4 17:25:28 2018 +0200
            (It does not fail on a 10.4 mid of November)
             
            --source include/have_innodb.inc
            --disable_abort_on_error
             
             
            --source include/have_innodb.inc
            CREATE TABLE t1 (col1 INT, col2 INT, col_int INTEGER, col_text TEXT ) ENGINE = InnoDB  ;
            SET AUTOCOMMIT = 0 ;
            SET @fill_amount = (@@innodb_page_size / 2 ) + 1  ;
            ALTER TABLE t1 ADD PRIMARY KEY IF NOT EXISTS ( col1, col_int ), ALGORITHM = DEFAULT, LOCK = SHARED  ;
            ALTER TABLE t1 DROP PRIMARY KEY, LOCK = DEFAULT  ;
            INSERT INTO t1 (col1,col2, col_int, col_text) VALUES ( 217, 217, 217, REPEAT(SUBSTR(CAST( 217 AS CHAR),1,1), @fill_amount) ) ;
            ALTER TABLE t1 CHANGE COLUMN col1 col1 INT, LOCK = NONE  ;
            INSERT INTO t1 (col1,col2, col_int, col_text) VALUES ( 43, 43, 43 - 1, REPEAT(SUBSTR(CAST( 43 AS CHAR),1,1), @fill_amount) ) ;
            ROLLBACK  ;
            --enable_abort_on_error
            SHOW PROCESSLIST;    # <-- Just for "convincing"  MTR that there is a crash.
             
            storage/innobase/btr/btr0cur.cc:8206: void btr_rec_free_externally_stored_fields(dict_index_t*, rec_t*, const ulint*, page_zip_des_t*, bool, mtr_t*): Assertion `(index->table)->not_redundant() == !!rec_offs_comp(offsets)' failed.
            ...
            uery (0x62b000000220): ROLLBACK
            Connection ID (thread ID): 9
            Status: NOT_KILLED
             
            3  <signal handler called>
            #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
            #5  0x00007f30acfc7f5d in __GI_abort () at abort.c:90
            #6  0x00007f30acfbdf17 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55a62957dbe0 "(index->table)->not_redundant() == !!rec_offs_comp(offsets)", file=file@entry=0x55a6295762e0 "storage/innobase/btr/btr0cur.cc", line=line@entry=8206, function=function@entry=0x55a629584860 <btr_rec_free_externally_stored_fields(dict_index_t*, unsigned char*, unsigned long const*, page_zip_des_t*, bool, mtr_t*)::__PRETTY_FUNCTION__> "void btr_rec_free_externally_stored_fields(dict_index_t*, rec_t*, const ulint*, page_zip_des_t*, bool, mtr_t*)") at assert.c:92
            #7  0x00007f30acfbdfc2 in __GI___assert_fail (assertion=0x55a62957dbe0 "(index->table)->not_redundant() == !!rec_offs_comp(offsets)", file=0x55a6295762e0 "storage/innobase/btr/btr0cur.cc", line=8206, function=0x55a629584860 <btr_rec_free_externally_stored_fields(dict_index_t*, unsigned char*, unsigned long const*, page_zip_des_t*, bool, mtr_t*)::__PRETTY_FUNCTION__> "void btr_rec_free_externally_stored_fields(dict_index_t*, rec_t*, const ulint*, page_zip_des_t*, bool, mtr_t*)") at assert.c:101
            #8  0x000055a6284f2f4f in btr_rec_free_externally_stored_fields (index=0x617000041408, rec=0x7f30a3b24121 "", offsets=0x61a000092b08, page_zip=0x0, rollback=true, mtr=0x7f30972f2ab0) at storage/innobase/btr/btr0cur.cc:8206
            #9  0x000055a6284e9a18 in btr_cur_pessimistic_delete (err=0x7f30972f29f0, has_reserved_extents=0, cursor=0x61b000084378, flags=0, rollback=true, mtr=0x7f30972f2ab0) at storage/innobase/btr/btr0cur.cc:5888
            #10 0x000055a62877f8cc in row_undo_ins_remove_clust_rec (node=0x61b000084308) at storage/innobase/row/row0uins.cc:187
            #11 0x000055a628781ee6 in row_undo_ins (node=0x61b000084308, thr=0x61700004b248) at storage/innobase/row/row0uins.cc:588
            #12 0x000055a628373823 in row_undo (node=0x61b000084308, thr=0x61700004b248) at storage/innobase/row/row0undo.cc:439
            #13 0x000055a628373cde in row_undo_step (thr=0x61700004b248) at storage/innobase/row/row0undo.cc:499
            #14 0x000055a62822362a in que_thr_step (thr=0x61700004b248) at storage/innobase/que/que0que.cc:1040
            #15 0x000055a628223a4e in que_run_threads_low (thr=0x61700004b248) at storage/innobase/que/que0que.cc:1104
            #16 0x000055a628223e37 in que_run_threads (thr=0x61700004b248) at storage/innobase/que/que0que.cc:1144
            #17 0x000055a628435714 in trx_rollback_to_savepoint_low (trx=0x7f30a3ea1908, savept=0x0) at storage/innobase/trx/trx0roll.cc:137
            #18 0x000055a628436004 in trx_rollback_for_mysql_low (trx=0x7f30a3ea1908) at storage/innobase/trx/trx0roll.cc:206
            #19 0x000055a628436894 in trx_rollback_for_mysql (trx=0x7f30a3ea1908) at storage/innobase/trx/trx0roll.cc:232
            #20 0x000055a627fc9ac8 in innobase_rollback (hton=0x614000002248, thd=0x62a0000de208, rollback_trx=true) at storage/innobase/handler/ha_innodb.cc:4716
            #21 0x000055a627a91130 in ha_rollback_trans (thd=0x62a0000de208, all=true) at sql/handler.cc:1722
            #22 0x000055a627712c75 in trans_rollback (thd=0x62a0000de208) at sql/transaction.cc:423
            #23 0x000055a62735c0f8 in mysql_execute_command (thd=0x62a0000de208) at sql/sql_parse.cc:5837
            #24 0x000055a62736a990 in mysql_parse (thd=0x62a0000de208, rawbuf=0x62b000000220 "ROLLBACK", length=8, parser_state=0x7f30972f5dd0, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:8092
            #25 0x000055a627341df3 in dispatch_command (command=COM_QUERY, thd=0x62a0000de208, packet=0x629000271209 "ROLLBACK  ", packet_length=10, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:1850
            #26 0x000055a62733ea1f in do_command (thd=0x62a0000de208) at sql/sql_parse.cc:1395
            #27 0x000055a6276e10e5 in do_handle_one_connection (connect=0x6080000010a8) at sql/sql_connect.cc:1402
            #28 0x000055a6276e09d7 in handle_one_connection (arg=0x6080000010a8) at sql/sql_connect.cc:1308
            

            mleich Matthias Leich added a comment - - edited I have no idea if the test which follows is expected to assert or not on commit c3ca982a0d5abb358f95c7d94ca14382dc81e1ee (HEAD -> bb-10.4-MDEV-17520, origin/bb-10.4-MDEV-17520) Date: Tue Dec 4 17:25:28 2018 +0200 (It does not fail on a 10.4 mid of November)   --source include/have_innodb.inc --disable_abort_on_error     --source include/have_innodb.inc CREATE TABLE t1 (col1 INT, col2 INT, col_int INTEGER, col_text TEXT ) ENGINE = InnoDB ; SET AUTOCOMMIT = 0 ; SET @fill_amount = (@@innodb_page_size / 2 ) + 1 ; ALTER TABLE t1 ADD PRIMARY KEY IF NOT EXISTS ( col1, col_int ), ALGORITHM = DEFAULT, LOCK = SHARED ; ALTER TABLE t1 DROP PRIMARY KEY, LOCK = DEFAULT ; INSERT INTO t1 (col1,col2, col_int, col_text) VALUES ( 217, 217, 217, REPEAT(SUBSTR(CAST( 217 AS CHAR),1,1), @fill_amount) ) ; ALTER TABLE t1 CHANGE COLUMN col1 col1 INT, LOCK = NONE ; INSERT INTO t1 (col1,col2, col_int, col_text) VALUES ( 43, 43, 43 - 1, REPEAT(SUBSTR(CAST( 43 AS CHAR),1,1), @fill_amount) ) ; ROLLBACK ; --enable_abort_on_error SHOW PROCESSLIST; # <-- Just for "convincing" MTR that there is a crash.   storage/innobase/btr/btr0cur.cc:8206: void btr_rec_free_externally_stored_fields(dict_index_t*, rec_t*, const ulint*, page_zip_des_t*, bool, mtr_t*): Assertion `(index->table)->not_redundant() == !!rec_offs_comp(offsets)' failed. ... uery (0x62b000000220): ROLLBACK Connection ID (thread ID): 9 Status: NOT_KILLED   3 <signal handler called> #4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #5 0x00007f30acfc7f5d in __GI_abort () at abort.c:90 #6 0x00007f30acfbdf17 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55a62957dbe0 "(index->table)->not_redundant() == !!rec_offs_comp(offsets)", file=file@entry=0x55a6295762e0 "storage/innobase/btr/btr0cur.cc", line=line@entry=8206, function=function@entry=0x55a629584860 <btr_rec_free_externally_stored_fields(dict_index_t*, unsigned char*, unsigned long const*, page_zip_des_t*, bool, mtr_t*)::__PRETTY_FUNCTION__> "void btr_rec_free_externally_stored_fields(dict_index_t*, rec_t*, const ulint*, page_zip_des_t*, bool, mtr_t*)") at assert.c:92 #7 0x00007f30acfbdfc2 in __GI___assert_fail (assertion=0x55a62957dbe0 "(index->table)->not_redundant() == !!rec_offs_comp(offsets)", file=0x55a6295762e0 "storage/innobase/btr/btr0cur.cc", line=8206, function=0x55a629584860 <btr_rec_free_externally_stored_fields(dict_index_t*, unsigned char*, unsigned long const*, page_zip_des_t*, bool, mtr_t*)::__PRETTY_FUNCTION__> "void btr_rec_free_externally_stored_fields(dict_index_t*, rec_t*, const ulint*, page_zip_des_t*, bool, mtr_t*)") at assert.c:101 #8 0x000055a6284f2f4f in btr_rec_free_externally_stored_fields (index=0x617000041408, rec=0x7f30a3b24121 "", offsets=0x61a000092b08, page_zip=0x0, rollback=true, mtr=0x7f30972f2ab0) at storage/innobase/btr/btr0cur.cc:8206 #9 0x000055a6284e9a18 in btr_cur_pessimistic_delete (err=0x7f30972f29f0, has_reserved_extents=0, cursor=0x61b000084378, flags=0, rollback=true, mtr=0x7f30972f2ab0) at storage/innobase/btr/btr0cur.cc:5888 #10 0x000055a62877f8cc in row_undo_ins_remove_clust_rec (node=0x61b000084308) at storage/innobase/row/row0uins.cc:187 #11 0x000055a628781ee6 in row_undo_ins (node=0x61b000084308, thr=0x61700004b248) at storage/innobase/row/row0uins.cc:588 #12 0x000055a628373823 in row_undo (node=0x61b000084308, thr=0x61700004b248) at storage/innobase/row/row0undo.cc:439 #13 0x000055a628373cde in row_undo_step (thr=0x61700004b248) at storage/innobase/row/row0undo.cc:499 #14 0x000055a62822362a in que_thr_step (thr=0x61700004b248) at storage/innobase/que/que0que.cc:1040 #15 0x000055a628223a4e in que_run_threads_low (thr=0x61700004b248) at storage/innobase/que/que0que.cc:1104 #16 0x000055a628223e37 in que_run_threads (thr=0x61700004b248) at storage/innobase/que/que0que.cc:1144 #17 0x000055a628435714 in trx_rollback_to_savepoint_low (trx=0x7f30a3ea1908, savept=0x0) at storage/innobase/trx/trx0roll.cc:137 #18 0x000055a628436004 in trx_rollback_for_mysql_low (trx=0x7f30a3ea1908) at storage/innobase/trx/trx0roll.cc:206 #19 0x000055a628436894 in trx_rollback_for_mysql (trx=0x7f30a3ea1908) at storage/innobase/trx/trx0roll.cc:232 #20 0x000055a627fc9ac8 in innobase_rollback (hton=0x614000002248, thd=0x62a0000de208, rollback_trx=true) at storage/innobase/handler/ha_innodb.cc:4716 #21 0x000055a627a91130 in ha_rollback_trans (thd=0x62a0000de208, all=true) at sql/handler.cc:1722 #22 0x000055a627712c75 in trans_rollback (thd=0x62a0000de208) at sql/transaction.cc:423 #23 0x000055a62735c0f8 in mysql_execute_command (thd=0x62a0000de208) at sql/sql_parse.cc:5837 #24 0x000055a62736a990 in mysql_parse (thd=0x62a0000de208, rawbuf=0x62b000000220 "ROLLBACK", length=8, parser_state=0x7f30972f5dd0, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:8092 #25 0x000055a627341df3 in dispatch_command (command=COM_QUERY, thd=0x62a0000de208, packet=0x629000271209 "ROLLBACK ", packet_length=10, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:1850 #26 0x000055a62733ea1f in do_command (thd=0x62a0000de208) at sql/sql_parse.cc:1395 #27 0x000055a6276e10e5 in do_handle_one_connection (connect=0x6080000010a8) at sql/sql_connect.cc:1402 #28 0x000055a6276e09d7 in handle_one_connection (arg=0x6080000010a8) at sql/sql_connect.cc:1308

            mleich, the assertions on rec_offs_comp() or page_rec_is_comp() or page_is_comp() are certainly failing due to this work. I just need a list of all such assertion failures (with stack traces), so that I can evaluate them. I have relaxed many such assertions with || index->dual_format(), and it should be applicable also in this case.
            This work is introducing dual-format leaf pages for the clustered index of COMPACT or DYNAMIC tables.

            Today I finished addressing the correctness-critical FIXME comments, except the one in dict_stats_analyze_index_level().

            marko Marko Mäkelä added a comment - mleich , the assertions on rec_offs_comp() or page_rec_is_comp() or page_is_comp() are certainly failing due to this work. I just need a list of all such assertion failures (with stack traces), so that I can evaluate them. I have relaxed many such assertions with || index->dual_format() , and it should be applicable also in this case. This work is introducing dual-format leaf pages for the clustered index of COMPACT or DYNAMIC tables. Today I finished addressing the correctness-critical FIXME comments, except the one in dict_stats_analyze_index_level() .

            I rebased the work on the latest 10.4 (slightly after the 10.4.1 release) and pushed to bb-10.4-MDEV-17520-2.
            This is very unlikely to be completed within the MariaDB Server 10.4 series.

            The main reason why the work was stalled is the increased space usage due to the dual-format leaf pages.

            For ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC, a better approach would be to store multiple versions of the table definition in the metadata BLOB that was introduced in MDEV-15562, and to use a new status value to indicate that a 'dictionary version number' field is present in the record header. This would be close to the original idea of MDEV-11424. The biggest unknown here is implementing the conversions between different dictionary versions. Ultimately, we would not only support NOT NULL removal, but also widening string or integer columns (MDEV-15563).

            marko Marko Mäkelä added a comment - I rebased the work on the latest 10.4 (slightly after the 10.4.1 release) and pushed to bb-10.4-MDEV-17520-2 . This is very unlikely to be completed within the MariaDB Server 10.4 series. The main reason why the work was stalled is the increased space usage due to the dual-format leaf pages. For ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC , a better approach would be to store multiple versions of the table definition in the metadata BLOB that was introduced in MDEV-15562 , and to use a new status value to indicate that a 'dictionary version number' field is present in the record header. This would be close to the original idea of MDEV-11424 . The biggest unknown here is implementing the conversions between different dictionary versions. Ultimately, we would not only support NOT NULL removal, but also widening string or integer columns ( MDEV-15563 ).

            MDEV-17520_sec_indexes.10.4.3.diff could be useful when reviving this work. It applies cleanly to mariadb-10.4.3. The following tests would fail: main.ctype_utf8mb4_innodb innodb.instant_alter innodb.innodb-alter-nullable main.selectivity_innodb main.function_defaults_innodb

            marko Marko Mäkelä added a comment - MDEV-17520_sec_indexes.10.4.3.diff could be useful when reviving this work. It applies cleanly to mariadb-10.4.3. The following tests would fail: main.ctype_utf8mb4_innodb innodb.instant_alter innodb.innodb-alter-nullable main.selectivity_innodb main.function_defaults_innodb

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.