Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15564

Avoid table rebuild in ALTER TABLE on collation or charset changes

Details

    Description

      Some character encodings are proper subsets of others. ASCII is a proper subset of latin1 or latin2 or UTF-8. The 3-byte UTF-8 is a proper subset of the 4-byte UTF-8.

      When the character encoding of a column changes such that the storage encoding does not change, we should avoid rebuilding the table.

      Likewise, when the collation of a column changes such that the encoding stays the same, we should avoid rebuilding the table.

      Collations do not matter if the column is not indexed.

      If the collation of an indexed column changes, then the affected indexes may have to be rebuilt. For certain collations we might avoid that as well. For example, changing the encoding and collation from binary 3-byte to binary 4-byte UTF-8 might not require any change.

      When the collation change involves a column that is part of the PRIMARY KEY and we have determined that a collation change is necessary, then the whole table will have to be rebuilt.

      Attachments

        Issue Links

          Activity

            Correct: ROW_FORMAT=REDUNDANT will reserve n*mbmaxlen bytes for {{CHAR(n)}}. It will do that even when the column is NULL. The idea is to allow update-in-place. The later ROW_FORMAT optimize the space and allocate n*mbminlen to n*mbmaxlen bytes. Trailing space is ‘compressed away’. For example, 'фыва' is 4×2 bytes. If the column were CHAR(8), we would allocate only 8 bytes for it, because the trailing space 'фыва ' does not need to be stored. If there was one more letter before the trailing space, we would have to allocate more space.

            It could be that {{CHAR(n)}} with a variable-length character encoding is actually internally stored as VARCHAR, that is, an explicit length is stored. I cannot remember it.

            For ROW_FORMAT=REDUNDANT, I think that it should be OK to bend the existing rules and allow an instant extension of the maximum length of the column, that is, allow CHAR(200) to be converted to a bigger mbmaxlen without changing the existing data. This would be a step towards MDEV-15563, which would allow CHAR or VARCHAR to be instantaneously extended arbitrarily in ROW_FORMAT=REDUNDANT.

            marko Marko Mäkelä added a comment - Correct: ROW_FORMAT=REDUNDANT will reserve n*mbmaxlen bytes for {{CHAR(n)}}. It will do that even when the column is NULL . The idea is to allow update-in-place. The later ROW_FORMAT optimize the space and allocate  n*mbminlen to n*mbmaxlen  bytes. Trailing space is ‘compressed away’. For example, 'фыва' is 4×2 bytes. If the column were CHAR(8) , we would allocate only 8 bytes for it, because the trailing space 'фыва ' does not need to be stored. If there was one more letter before the trailing space, we would have to allocate more space. It could be that {{CHAR(n)}} with a variable-length character encoding is actually internally stored as VARCHAR , that is, an explicit length is stored. I cannot remember it. For ROW_FORMAT=REDUNDANT , I think that it should be OK to bend the existing rules and allow an instant extension of the maximum length of the column, that is, allow CHAR(200) to be converted to a bigger mbmaxlen without changing the existing data. This would be a step towards MDEV-15563 , which would allow CHAR or VARCHAR to be instantaneously extended arbitrarily in ROW_FORMAT=REDUNDANT .

            Thanks, this looks generally good. I am mainly requesting some code refactoring.

            Please document (with examples) in which way the file format is being effectively changed, as witnessed by the change to the CHECK TABLE function btr_index_rec_validate(). We must prevent older MariaDB versions from accessing tables where the file format has been changed.

            marko Marko Mäkelä added a comment - Thanks, this looks generally good. I am mainly requesting some code refactoring. Please document (with examples) in which way the file format is being effectively changed, as witnessed by the change to the CHECK TABLE function btr_index_rec_validate() . We must prevent older MariaDB versions from accessing tables where the file format has been changed.

            Some necessary code changes are common between this and MDEV-15563. I think that MDEV-15563 is closer to completion at this point, and should be submitted first.

            marko Marko Mäkelä added a comment - Some necessary code changes are common between this and MDEV-15563 . I think that MDEV-15563 is closer to completion at this point, and should be submitted first.
            bar Alexander Barkov added a comment - - edited

            I've taken a look into a number of commits:

            @kevgs pushed 6 commits.

            20cb37e add GBK
            053e48e add UJIS
            4e3f6ef add SJIS
            54e4cc0 add BIG5
            90b4b4f add latin2
            3237b5e added latin7

            There is no need to list all ASCII compatible character set names and check them using strcmp().
            Please check:

            (cs->state & MY_CS_NONASCII)
            

            to know if a character set is not ASCII compatible.

            Note, SJIS is not ASCII compatible.
            This change is probably not correct:
            https://github.com/MariaDB/server/commit/54e4cc0b6571a6a929538eca46f940c465e42e68

            bar Alexander Barkov added a comment - - edited I've taken a look into a number of commits: @kevgs pushed 6 commits. 20cb37e add GBK 053e48e add UJIS 4e3f6ef add SJIS 54e4cc0 add BIG5 90b4b4f add latin2 3237b5e added latin7 There is no need to list all ASCII compatible character set names and check them using strcmp(). Please check: (cs->state & MY_CS_NONASCII) to know if a character set is not ASCII compatible. Note, SJIS is not ASCII compatible. This change is probably not correct: https://github.com/MariaDB/server/commit/54e4cc0b6571a6a929538eca46f940c465e42e68

            Due to conflicts with MDEV-15563, I refactored the InnoDB part of this and also the functions Field_str::is_equal() and Field_varstring::is_equal().

            The following bugs were filed for limitations that I think we should try to fix soon:
            MDEV-18583 Avoid copying when changing the type of an indexed column
            MDEV-18584 Avoid copying when altering CHAR column in InnoDB table

            marko Marko Mäkelä added a comment - Due to conflicts with MDEV-15563 , I refactored the InnoDB part of this and also the functions Field_str::is_equal() and Field_varstring::is_equal() . The following bugs were filed for limitations that I think we should try to fix soon: MDEV-18583 Avoid copying when changing the type of an indexed column MDEV-18584 Avoid copying when altering CHAR column in InnoDB table

            People

              kevg Eugene Kosov (Inactive)
              marko Marko Mäkelä
              Votes:
              3 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.