MariaDB Server
MDEV-14051

'Undo log record is too big.' error occurring in very narrow range of string lengths

Details

    Description

      A user is seeing an 'Undo log record is too big.' error that occurs only when they update 2 mediumtext fields at once, and only when the length of the existing value in each of those fields happens to be between 3962 and 4030 characters.

      Some other interesting notes about this:

      • The table definition has ROW_FORMAT=COMPACT, so it uses the Antelope format.
      • The only indexes on the two mediumtext fields have prefix lengths of 255.
      • The problem is repeatable on MySQL 5.5 and 5.7, and MariaDB 10.1 and 10.2, so I don't think it has been fixed in any released version of MySQL or MariaDB.

      I have attached an SQL file that can be used to reproduce this issue.

      Upstream bug report:

      https://bugs.mysql.com/bug.php?id=88150

          Activity

            Some thoughts from marko:

            For ROW_FORMAT=COMPACT (or ROW_FORMAT=REDUNDANT), a local prefix of 768
            bytes will be stored in the clustered index leaf page record for all
            columns that are longer than that.

            This could be a limitation that was previously unknown to me, and it is
            theoretically possible that we can fix it. I would have to carefully
            analyze the test case first. I suppose that InnoDB’s practice of
            dynamically deciding which columns to move off-page is related to this.

            It is also possible that the problem was made bigger by a BLOB bug fix that
            sometimes causes non-updated columns to be added to the update vector (and
            to the undo log record):

            commit ce0a1e85e24e48b8171f767b44330da635a6ea0a
            Author: Annamalai Gurusami <annamalai.gurusami@oracle.com>
            Date: Wed Apr 23 18:45:35 2014 +0530

            Bug #16963396 INNODB: USE OF LARGE EXTERNALLY-STORED FIELDS MAKES CRASH
            RECOVERY LOSE DATA

            This fix is only present in 5.7.5 (and MariaDB 10.2.2) onwards. It is a
            follow-up to an earlier fix by me, which may cause redo log overrun (making
            InnoDB crash-unsafe):

            commit 41bb3537ba507799ab0143acd75ccab72192931e
            Author: Marko Mäkelä <marko.makela@oracle.com>
            Date: Mon Aug 29 11:16:42 2011 +0300

            Bug#12704861 Corruption after a crash during BLOB update

            That fix was present in 5.1.60, 5.5.17, and 5.6.4. But with that earlier fix, the
            undo log records should not be "padded" with non-updated columns. I am only
            saying that the problem is likely worse in 10.2.

            The ultimate fix would be a new undo log format that would lift any size
            restrictions, as outlined in https://jira.mariadb.org/browse/MDEV-11657

            GeoffMontee Geoff Montee (Inactive) added a comment
            claudio.nanni Claudio Nanni added a comment -

            For completeness: the problem is also reproducible with a single field; the upper limit appears to be 8061 characters.

            +------------------+
            | LENGTH(a_str_18) |
            +------------------+
            |            63381 |
            +------------------+
            1 row in set (0.02 sec)

            mysql> update test_tab set a_str_18=CONCAT(a_str_18,1);
            Query OK, 1 row affected (0.00 sec)
            Rows matched: 1 Changed: 1 Warnings: 0

            mysql> UPDATE test_tab set a_str_18=LEFT(a_str_18,8000);
            Query OK, 1 row affected (0.02 sec)
            Rows matched: 1 Changed: 1 Warnings: 0

            mysql> update test_tab set a_str_18=CONCAT(a_str_18,1);
            ERROR 1713 (HY000): Undo log record is too big.

            claudio.nanni Claudio Nanni added a comment

            I believe that the problem must be in the function dtuple_convert_big_rec() or in the related logic.
            When a clustered index leaf page record is too long to fit in a page, InnoDB would start moving the longest columns to off-page storage.
            The goal is to have room for at least 2 records in each leaf page. (For ROW_FORMAT=COMPRESSED, the minimum is 1 record for leaf pages and 2 for non-leaf pages.)
            Some boundary condition must be off by a few bytes.

            marko Marko Mäkelä added a comment

            My guess was wrong, and this indeed is a bug with the InnoDB undo log format.
            Here is how I debugged it. I took the attached test, prepended the line

            --source include/have_innodb.inc
            

            saved into a .test file, executed with

            ./mtr --manual-gdb …
            

            and then

            break ha_innobase::update_row
            run
            break trx_undo_page_report_modify
            c
            fin
            

            In this function, I started to single-step. The first invocation of trx_undo_page_report_modify() can ‘safely’ fail; there is a fallback that retries with an empty undo log page.
            On the second invocation, we can see the following:

            (gdb) p first_free
            $3 = 56
            (gdb) p undo_page
            $4 = (ib_page_t *) 0x7fffef1f4000 ""
            (gdb) display ptr-$3-$4
            

            After some single-stepping, I set breakpoints at every return statement in the function.
            There are two loops that cause the problem:

            		for (i = 0; i < upd_get_n_fields(update) - extended; i++) {
             
            			ulint	pos = upd_get_nth_field(update, i)->field_no;
             
            			/* Write field number to undo log */
            			if (trx_undo_left(undo_page, ptr) < 5) {
             
            				return(0);
            			}
            …
             
            			if (flen != UNIV_SQL_NULL) {
            				if (trx_undo_left(undo_page, ptr) < flen) {
             
            					return(0);
            				}
             
            				ut_memcpy(ptr, field, flen);
            				ptr += flen;
            			}
            		}
            

            The first one (above) logs every updated column.
            The second loop (below) logs every indexed column, including columns that were already logged above:

            		for (col_no = 0; col_no < dict_table_get_n_cols(table);
            		     col_no++) {
             
            			const dict_col_t*	col
            				= dict_table_get_nth_col(table, col_no);
             
            			if (col->ord_part) {
            				ulint	pos;
             
            				/* Write field number to undo log */
            				if (trx_undo_left(undo_page, ptr) < 5 + 15) {
             
            					return(0);
            				}
             
            				pos = dict_index_get_nth_col_pos(index,
            								 col_no);
            				ptr += mach_write_compressed(ptr, pos);
            …
            			}
            		}
            

            The space runs out during this loop, apparently at the very last user column. We abort at ptr-$3-$4=15933, right here in the second loop:

            				/* Write field number to undo log */
            				if (trx_undo_left(undo_page, ptr) < 5 + 15) {
             
            					return(0);
            				}
            

            Because this second loop is writing the column number to the undo log record, the fix might be as simple as omitting those indexed columns that were logged as updated.
            I will have to carefully check that such a change would be safe with the undo log parsing code.

            Furthermore, in MariaDB 10.2 (and MySQL 5.7) the undo log format has changed due to indexed virtual columns (MDEV-5800).
            Omitting the numbers of updated non-virtual columns must be determined to be safe also when indexed virtual columns are present (and affected by the update).

            marko Marko Mäkelä added a comment

            I pushed to bb-5.5-marko.
            The merge upwards will require some effort. At least in 10.2 there are additional tests failing, because the undo log record size limitations are no longer being exceeded.
            (In 10.2, also the code is slightly different due to spatial indexes and indexed virtual columns.)

            marko Marko Mäkelä added a comment

            Merged up to 10.2 so far.

            marko Marko Mäkelä added a comment

            This bug caused a permanent ‘corruption’ of data files that causes performance regression for accessing secondary indexes:
            MDEV-14799 After UPDATE of indexed columns, old values will not be purged from secondary indexes

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä
              GeoffMontee Geoff Montee (Inactive)
              Votes: 2
              Watchers: 6
