Details

    Description

      The innodb_zip.* and innodb.innodb-16k tests are failing on the s390x architecture with the following error message:

      ER_TOO_BIG_ROWSIZE (1118): Row size too large (> 8126). Changing some columns to TEXT or BLOB may help. In current row format, BLOB prefix of 0 bytes is stored inline.
      

      I am attaching the full log of the tests.

          Activity

            Support Case CS0378173 connected to this ticket.

            Attached log "mariadb-s390x-tests.log" contains 36 cases of ER_TOO_BIG_ROWSIZE and 1 case of ER_LOCK_WAIT_TIMEOUT.

            edward Edward Stoever added the comment above.

            Added files from CS0378173.

            edward Edward Stoever added the comment above.

            I see that mariadb-s390x-build.txt includes WITH_ZLIB=system. Which version of that library is provided by the operating system? The file mariadb-s390x-tests.log does not include any server error logs.

            Some tests of ROW_FORMAT=COMPRESSED depend on a particular zlib version. I remember that a change of the definition of the function compressBound() caused some trouble some 10 to 15 years ago.

            A quick check in the zlib changelog on my AMD64 system suggests that versions 1.2.8 and 1.2.11 could work.

            marko Marko Mäkelä added the comment above.

            The version used is the latest (1.2.11) but with this patch applied. It provides hardware-accelerated compression on s390x, and I think it could be the root cause of this issue.

            danyspin97 Danilo Spinella added the comment above.

            I see that the patch indeed does redefine the compressBound() function. It could very well explain those test failures. At least some tests are trying to exercise some internal limits, such as triggering a compressed page overflow.

            If you want those tests to pass, they should be adjusted for this limit somehow.

            The following is me thinking aloud. MariaDB does provide WITH_ZLIB=bundled. The MariaDB Foundation CI does include some S390x systems. Why not apply that patch there?

            Related to that: In MDEV-19935 we replaced zlib’s built-in crc32() function with accelerated ones on IA-32, AMD64, ARMv8 and POWER 8 or 9, but nothing for s390x yet. Maybe you’d want to fix that as well? danblack should be able to coordinate. For InnoDB, you’d also want to implement an accelerated crc32c() function.

            Also, unrelated to this, but related to s390x, maybe you would want to implement MDEV-26769? I tried but in the end gave up; somehow the programming interface felt different from the POWER 8 one, or we had some libraries missing from our build environment.

            marko Marko Mäkelä added the comment above.

            The following is me thinking aloud. MariaDB does provide WITH_ZLIB=bundled. The MariaDB Foundation CI does include some S390x systems. Why not apply that patch there?

            Yea, that would definitely be a good idea. The best solution would be for zlib to merge this patch, but as far as I remember, they are not accepting any pull requests at the moment. I will try running the tests without this patch just to confirm that it is the root cause.

            Regarding MDEV-19935 and MDEV-26769, ideally it would be great to have them implemented for s390x as well. Unfortunately I have no experience in developing on such a platform, so I can't be of much help. I think for the time being it would be okay to leave it like this, as we didn't receive any report about MariaDB being significantly slower on this architecture.

            danyspin97 Danilo Spinella added the comment above.

            Just checked and without the patch the tests succeed.

            danyspin97 Danilo Spinella added the comment above.

            danyspin97, thank you.

            For what it is worth, before we did our ‘homebrew’ my_checksum() for the ISO 3309 CRC-32 polynomial, I was advocating the idea of simply getting an optimized implementation via zlib. After some research, it looked like many performance enhancement patches had been waiting to be merged for a long time, so rolling our own version was the most practical option.

            marko Marko Mäkelä added the comment above.

            I see that we already have a builder where tests fail in a similar way:

            https://buildbot.mariadb.org/#/builders/328/builds/1495/steps/6/logs/stdio

            So, this could be fixed by adjusting the tests in some way. I may need interactive access to that builder to be able to do that quickly, or I might try to patch the compressBound() function in a similar way on my local AMD64 based system.

            marko Marko Mäkelä added the comment above.

            I see that we already have a builder where tests fail in a similar way:

            This is interesting. Yea, it would be great if the tests could be patched. Is it safe to skip them in the meantime? Or is there some issue when MariaDB uses the patched zlib?

            danyspin97 Danilo Spinella added the comment above.

            I think that the failures can be safely ignored in the meantime. My design of the InnoDB ROW_FORMAT=COMPRESSED is conservative: it is assumed that any compression operation may fail (due to the lack of space). The only potentially invalid (but in my opinion, reasonable) assumption is that any zlib version can decompress what any other zlib version compressed.

            There was a parameter innodb_log_compressed_pages that was deprecated and ignored in MDEV-12353. It was intentionally disabled by default in MySQL, but MDEV-6935 enabled it by default in MariaDB. The commit messages stated an overly optimistic assumption that one would never need to execute crash recovery or apply backed up logs with a different zlib version. This assumption turned out to be blatantly false in MDEV-13247.

            For some time, I wanted to get rid of this format (see MDEV-23497), but I have since changed my mind, and I understand that this format can be useful for read-mostly workloads. My biggest pain point with this was the buffer pool block descriptor overhead, which was significantly reduced in MDEV-27058.

            marko Marko Mäkelä added the comment above.

            I applied the following patch to simulate the patched s390x compressBound() in any cmake -DWITH_ZLIB=bundled build:

            diff --git a/zlib/compress.c b/zlib/compress.c
            index e2db404abf8..ab232aaaef6 100644
            --- a/zlib/compress.c
            +++ b/zlib/compress.c
            @@ -81,6 +81,28 @@ int ZEXPORT compress (dest, destLen, source, sourceLen)
             uLong ZEXPORT compressBound (sourceLen)
                 uLong sourceLen;
             {
            -    return sourceLen + (sourceLen >> 12) + (sourceLen >> 14) +
            -           (sourceLen >> 25) + 13;
            +
            +#define DFLTCC_BLOCK_HEADER_BITS 3
            +#define DFLTCC_HLITS_COUNT_BITS 5
            +#define DFLTCC_HDISTS_COUNT_BITS 5
            +#define DFLTCC_HCLENS_COUNT_BITS 4
            +#define DFLTCC_MAX_HCLENS 19
            +#define DFLTCC_HCLEN_BITS 3
            +#define DFLTCC_MAX_HLITS 286
            +#define DFLTCC_MAX_HDISTS 30
            +#define DFLTCC_MAX_HLIT_HDIST_BITS 7
            +#define DFLTCC_MAX_SYMBOL_BITS 16
            +#define DFLTCC_MAX_EOBS_BITS 15
            +#define DFLTCC_MAX_PADDING_BITS 7
            +#define DEFLATE_BOUND_COMPLEN(source_len) \
            +    ((DFLTCC_BLOCK_HEADER_BITS + \
            +      DFLTCC_HLITS_COUNT_BITS + \
            +      DFLTCC_HDISTS_COUNT_BITS + \
            +      DFLTCC_HCLENS_COUNT_BITS + \
            +      DFLTCC_MAX_HCLENS * DFLTCC_HCLEN_BITS + \
            +      (DFLTCC_MAX_HLITS + DFLTCC_MAX_HDISTS) * DFLTCC_MAX_HLIT_HDIST_BITS + \
            +      (source_len) * DFLTCC_MAX_SYMBOL_BITS + \
            +      DFLTCC_MAX_EOBS_BITS + \
            +      DFLTCC_MAX_PADDING_BITS) >> 3)
            +    return DEFLATE_BOUND_COMPLEN(sourceLen) + 6;
             }
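
To get a feel for how much the bound changes, the two formulas from the diff above can be put side by side. The following is only a sketch: the function names `stock_bound`, `dfltcc_bound` and `print_bounds` are hypothetical, the fixed overhead is simply the sum of the DFLTCC_* bit counts from the patch, and how InnoDB consumes compressBound() is not shown in this ticket.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Fixed overhead in bits, summed from the DFLTCC_* constants in the patch:
   3 + 5 + 5 + 4 + 19*3 + (286+30)*7 + 15 + 7 = 2308. */
enum { DFLTCC_FIXED_BITS = 3 + 5 + 5 + 4 + 19 * 3 + (286 + 30) * 7 + 15 + 7 };

/* Bound from the removed lines of the diff (stock zlib 1.2.11). */
unsigned long stock_bound(unsigned long source_len)
{
    return source_len + (source_len >> 12) + (source_len >> 14) +
           (source_len >> 25) + 13;
}

/* Bound from the added lines: 16 bits per input byte plus the fixed
   overhead, converted to bytes, plus 6. */
unsigned long dfltcc_bound(unsigned long source_len)
{
    /* Guard against overflow of the multiplication below. */
    assert(source_len <= (ULONG_MAX - DFLTCC_FIXED_BITS) / 16);
    return ((source_len * 16 + DFLTCC_FIXED_BITS) >> 3) + 6;
}

/* Hypothetical helper: print both bounds for a few input sizes. */
void print_bounds(void)
{
    static const unsigned long sizes[] = {0, 64, 1024, 8192, 16384};
    size_t i;

    for (i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        printf("%6lu bytes: stock %6lu, DFLTCC %6lu\n", sizes[i],
               stock_bound(sizes[i]), dfltcc_bound(sizes[i]));
}
```

For a 16384-byte input the stock formula yields 16402 bytes, while the DFLTCC formula yields 33062 bytes, roughly twice as much. For an empty input the bounds are 13 and 294 bytes respectively, so the difference is also large for very small inputs.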
            

            In the server error log I see the following:

            10.5 52b32c60c26b512ccf9b1233d7f54c4b56499df3

            2022-02-16 11:40:24 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
            

            In most failed tests, a CREATE TABLE statement fails with ER_TOO_BIG_ROWSIZE (1118). In innodb_zip.wl6347_comp_indx_stat, the observed data file sizes are larger. For the test innodb_zip.bug36169, a debug assertion (which is only enabled in debug builds) fails:

            10.5 52b32c60c26b512ccf9b1233d7f54c4b56499df3

            mariadbd: /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:12905: bool create_table_info_t::row_size_is_acceptable(const dict_index_t&, bool) const: Assertion `info.max_leaf_size != 0' failed.
            

            The full list of test failures is as follows:

            Completed: Failed 28/5574 tests, 99.50% were successful.
             
            Failing test(s): innodb.innodb-16k innodb_zip.page_size innodb_zip.index_large_prefix innodb_zip.prefix_index_liftedlimit innodb_zip.wl6347_comp_indx_stat innodb_zip.bug36172 innodb_zip.bug52745 innodb_zip.wl6344_compress_level innodb_zip.bug36169
            

            I will try to tweak all these tests so that they will pass with the normal compressBound() as well as with the weaker compressBound() guarantee that is associated with the s390x DFLTCC instruction.

            marko Marko Mäkelä added the comment above.

            The following version of the function generates the same AMD64 code for me:

            uLong ZEXPORT compressBound (sourceLen)
                uLong sourceLen;
            {
                return (sourceLen * 16 + 2308) / 8 + 6;
            }
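
The constant 2308 in this shorter form is the sum of the fixed bit counts in the DFLTCC macros (3 + 5 + 5 + 4 + 19*3 + (286+30)*7 + 15 + 7). A minimal sketch that checks the two forms against each other follows; the function names `bound_from_macros`, `bound_simplified` and `bounds_agree_up_to` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Constants copied from the patch quoted earlier in this ticket. */
#define DFLTCC_BLOCK_HEADER_BITS 3
#define DFLTCC_HLITS_COUNT_BITS 5
#define DFLTCC_HDISTS_COUNT_BITS 5
#define DFLTCC_HCLENS_COUNT_BITS 4
#define DFLTCC_MAX_HCLENS 19
#define DFLTCC_HCLEN_BITS 3
#define DFLTCC_MAX_HLITS 286
#define DFLTCC_MAX_HDISTS 30
#define DFLTCC_MAX_HLIT_HDIST_BITS 7
#define DFLTCC_MAX_SYMBOL_BITS 16
#define DFLTCC_MAX_EOBS_BITS 15
#define DFLTCC_MAX_PADDING_BITS 7
#define DEFLATE_BOUND_COMPLEN(source_len) \
    ((DFLTCC_BLOCK_HEADER_BITS + \
      DFLTCC_HLITS_COUNT_BITS + \
      DFLTCC_HDISTS_COUNT_BITS + \
      DFLTCC_HCLENS_COUNT_BITS + \
      DFLTCC_MAX_HCLENS * DFLTCC_HCLEN_BITS + \
      (DFLTCC_MAX_HLITS + DFLTCC_MAX_HDISTS) * DFLTCC_MAX_HLIT_HDIST_BITS + \
      (source_len) * DFLTCC_MAX_SYMBOL_BITS + \
      DFLTCC_MAX_EOBS_BITS + \
      DFLTCC_MAX_PADDING_BITS) >> 3)

/* The macro-based form used in the simulated patch. */
unsigned long bound_from_macros(unsigned long source_len)
{
    return DEFLATE_BOUND_COMPLEN(source_len) + 6;
}

/* The shorter form quoted above. */
unsigned long bound_simplified(unsigned long source_len)
{
    return (source_len * 16 + 2308) / 8 + 6;
}

/* Returns 1 if both forms agree for every length in [0, limit], else 0. */
int bounds_agree_up_to(unsigned long limit)
{
    unsigned long len;

    for (len = 0; len <= limit; len++)
        if (bound_from_macros(len) != bound_simplified(len))
            return 0;
    return 1;
}
```

Because the operand is unsigned, the shift by 3 and the division by 8 are the same operation, so the two forms should agree for every length that does not overflow the multiplication by 16.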
            

            marko Marko Mäkelä added the comment above.

            danyspin97, for some time, the results from my test push are available at https://buildbot.mariadb.org/#/grid?branch=st-10.5-MDEV-27634
            For reference, I am quoting the names of all failed tests on the s390x builders that have completed so far:

            s390x-rhel-8

            archive.archive
            connect.zip
            main.column_compression
            main.column_compression_rpl
            main.column_compression_rpl
            main.column_compression_rpl
            main.func_compress
            main.func_math
            main.mysqlbinlog_row_compressed
            main.mysqlbinlog_stmt_compressed

            s390x-sles-15

            archive.archive
            connect.zip
            main.column_compression
            main.column_compression_rpl
            main.column_compression_rpl
            main.column_compression_rpl
            main.func_compress
            main.mysqlbinlog_row_compressed
            main.mysqlbinlog_stmt_compressed
            perfschema.show_aggregate

            s390x-ubuntu-2004

            main.column_compression
            main.func_math

            Analysis

            The builders s390x-rhel-8-rpm-autobake and s390x-sles-15-rpm-autobake had not completed as of this writing. I would assume that they will fare in a similar way to the non-autobake builds.

            The test perfschema.show_aggregate is failing very often on various platforms, including AMD64. The other failures look like they may be directly related to the zlib version.

            As you can see, the Docker container that pretends to be Ubuntu 20.04 had fewer failing tests; perhaps they are not applying that s390x DFLTCC patch? The failure of the test main.column_compression might be explained by the bug MDEV-24797; I suggest that if it is so, you’d post any analysis there.

            None of the remaining tests should be related to InnoDB and this ticket.

            I plan to push the fix to the 10.2 branch and merge from there to later-version branches.

            marko Marko Mäkelä added the comment above.

            Thank you @Marko Mäkelä for the help and the fix. Regarding the other tests, I also think that they are not related to the s390x DFLTCC patch, as they are currently succeeding for us. The main.func_math test is currently failing in some configurations and could be related to MDEV-26645.

            danyspin97 Danilo Spinella added the comment above.

            The test case innodb.row_size_error_log_warnings_3 that was added along with the fix of MDEV-20194 failed to take the modified compressBound() into account. I have now fixed that in 10.4 (where we do not have any s390x builders at the moment) and merged the change to 10.5 and 10.6.

            marko Marko Mäkelä added the comment above.

            People

              marko Marko Mäkelä
              danyspin97 Danilo Spinella