Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13937

Aria engine: Internal Error 160 after partition handling

Details

    • 10.2.10, 5.5.58

    Description

      Partitions get corrupted: internal error 160.
      Unable to repair (with Alter table).
      Symptom: The AutoIncrement value goes wild (from a few thousand, to numbers with 20+ digits).
      Viewing an allegedly uncorrupted partition shows extremely distorted data.
      Fault appears to follow adding a new partition (partitions by date) although the new
      partition is for a future date, so not yet used.
      Only fix discovered, so far, is to drop the table.
      Not yet isolated the exact trigger.
      Tables schema: ARIA ROW_FORMAT=PAGE TRANSACTIONAL=1 PAGE_CHECKSUM=1
      (FYI: same application code using MyISAM does not fail)
      What causes internal error 160?

      Attachments

        Activity

          Error 160 is probably "There's no partition in the table for the given value". Apparently it's caused by the described corruption.
          How often does it happen? Would you be able to temporarily enable general log until the next time the problem occurs? Or, do you have binary logs which would also contain the story of the table?
          Does the table contain highly confidential data? If not, next time when you encounter it, could you please back up files related to the table (frm, MAI, MAD and partition files as well), and upload them to ftp.askmonty.org/private?

          elenst Elena Stepanova added a comment - Error 160 is probably "There's no partition in the table for the given value" . Apparently it's caused by the described corruption. How often does it happen? Would you be able to temporarily enable general log until the next time the problem occurs? Or, do you have binary logs which would also contain the story of the table? Does the table contain highly confidential data? If not, next time when you encounter it, could you please back up files related to the table (frm, MAI, MAD and partition files as well), and upload them to ftp.askmonty.org/private?

          All the files for the table transferred.
          Also queryDrop.log to show sequence.
          In getting you the files, I found I can reproduce the fault at will (on this table):
          It requires an "alter table drop partition". The partition dropped is one for the future: not used - i.e. empty.
          As I have many tables that are modified in the same way, but only a small number get corrupted, I wondered if it was data structure dependent: so I added a "dummy" field to the schema; Nope, no change. Likewise I set defaults for all fields; Nope.

          My app creates a missing table automatically, including 2 "future" partitions, which exist to catch midnight write issues. So
          normally they are empty on the day of creation. By experiment I found the repeatable issue.
          However, my app often creates this very problem: it does not delete that partition. All it does it create more future partitions, and delete old, expired, ones. So i deduce the issue can be triggered by other partition operations. Also I sometimes see the AutoIncrement values go mad (as the app runs), but not always get an error on check table (manually check).

          FYI: This is a "logging" app. Records are written only: never updated: old records are dropped when old partitions are deleted. I have problems with MyIsam table corruption on a PC that "BugChecks", so I thought I would try ARIA. The MyISAM tables never get corrupted as I see here.

          AndyMidd Andy Middleton added a comment - All the files for the table transferred. Also queryDrop.log to show sequence. In getting you the files, I found I can reproduce the fault at will (on this table): It requires an "alter table drop partition". The partition dropped is one for the future: not used - i.e. empty. As I have many tables that are modified in the same way, but only a small number get corrupted, I wondered if it was data structure dependent: so I added a "dummy" field to the schema; Nope, no change. Likewise I set defaults for all fields; Nope. My app creates a missing table automatically, including 2 "future" partitions, which exist to catch midnight write issues. So normally they are empty on the day of creation. By experiment I found the repeatable issue. However, my app often creates this very problem: it does not delete that partition. All it does it create more future partitions, and delete old, expired, ones. So i deduce the issue can be triggered by other partition operations. Also I sometimes see the AutoIncrement values go mad (as the app runs), but not always get an error on check table (manually check). FYI: This is a "logging" app. Records are written only: never updated: old records are dropped when old partitions are deleted. I have problems with MyIsam table corruption on a PC that "BugChecks", so I thought I would try ARIA. The MyISAM tables never get corrupted as I see here.

          Thank you for the data and observations, I was able to reproduce the problem.
          If your application creates the table with the option ROW_FORMAT=PAGE, as a workaround, try to remove it, it seems to help (although I'm not yet sure why, hopefully further investigation will show it).
          On a separate note, I also suggest to remove TRANSACTIONAL=1 – it is not related to the problem, but it causes unnecessary warnings.

          Test case

          --source include/have_partition.inc
           
          CREATE TABLE t1 (
            pk BIGINT NOT NULL AUTO_INCREMENT,
            dt DATETIME DEFAULT NULL,
            PRIMARY KEY (pk, dt),
            KEY (dt)
          ) ENGINE=Aria ROW_FORMAT=PAGE
            PARTITION BY RANGE  COLUMNS(dt) (
              PARTITION `P20171231` VALUES LESS THAN ('2017-12-31'),
              PARTITION `P20181231` VALUES LESS THAN ('2018-12-31')
          );
           
          INSERT INTO t1 VALUES (1,'2017-09-28 15:12:00');
           
          SELECT * FROM t1;
          ALTER TABLE t1 DROP PARTITION P20181231;
          SELECT * FROM t1;
           
          # Cleanup
          DROP TABLE t1;
          

          Before DROP PARTITION

          MariaDB [test]> SELECT * FROM t1;
          +----+---------------------+
          | pk | dt                  |
          +----+---------------------+
          |  1 | 2017-09-28 15:12:00 |
          +----+---------------------+
          1 row in set (0.00 sec)
          

          After DROP PARTITION

          MariaDB [test]> SELECT * FROM t1;
          +----------------------+---------------------+
          | pk                   | dt                  |
          +----------------------+---------------------+
          | -7421932185906577408 | 2341-02-25 16:00:00 |
          +----------------------+---------------------+
          1 row in set (0.00 sec)
          

          elenst Elena Stepanova added a comment - Thank you for the data and observations, I was able to reproduce the problem. If your application creates the table with the option ROW_FORMAT=PAGE , as a workaround, try to remove it, it seems to help (although I'm not yet sure why, hopefully further investigation will show it). On a separate note, I also suggest to remove TRANSACTIONAL=1 – it is not related to the problem, but it causes unnecessary warnings. Test case --source include/have_partition.inc   CREATE TABLE t1 ( pk BIGINT NOT NULL AUTO_INCREMENT, dt DATETIME DEFAULT NULL , PRIMARY KEY (pk, dt), KEY (dt) ) ENGINE=Aria ROW_FORMAT=PAGE PARTITION BY RANGE COLUMNS(dt) ( PARTITION `P20171231` VALUES LESS THAN ( '2017-12-31' ), PARTITION `P20181231` VALUES LESS THAN ( '2018-12-31' ) );   INSERT INTO t1 VALUES (1, '2017-09-28 15:12:00' );   SELECT * FROM t1; ALTER TABLE t1 DROP PARTITION P20181231; SELECT * FROM t1;   # Cleanup DROP TABLE t1; Before DROP PARTITION MariaDB [test]> SELECT * FROM t1; +----+---------------------+ | pk | dt | +----+---------------------+ | 1 | 2017-09-28 15:12:00 | +----+---------------------+ 1 row in set (0.00 sec) After DROP PARTITION MariaDB [test]> SELECT * FROM t1; +----------------------+---------------------+ | pk | dt | +----------------------+---------------------+ | -7421932185906577408 | 2341-02-25 16:00:00 | +----------------------+---------------------+ 1 row in set (0.00 sec)

          A simple test case, indeed. I would add a "Check" table, finally.
          It also fails with ADD partition, instead of Drop.
          I have examples of the pk being corrupted with/without Internal Error 160 when the table is Checked.
          Adding/Dropping partitions when the table is empty is OK. Truncate of such a corrupted table clears the error.
          So I conclude the partition change is causing a corrupting write to the files of another partition.
          I have a number of tables, partitions all handled the same way: not all get corrupted: I can't find the isolating factor.

          AndyMidd Andy Middleton added a comment - A simple test case, indeed. I would add a "Check" table, finally. It also fails with ADD partition, instead of Drop. I have examples of the pk being corrupted with/without Internal Error 160 when the table is Checked. Adding/Dropping partitions when the table is empty is OK. Truncate of such a corrupted table clears the error. So I conclude the partition change is causing a corrupting write to the files of another partition. I have a number of tables, partitions all handled the same way: not all get corrupted: I can't find the isolating factor.

          Given my test case, I suppose the combination of indexes was important there, otherwise I would have removed unnecessary ones. It's most likely not the only combination which contributes to the failure, but it might be the factor that makes some of your tables get corrupted while others remain intact.

          elenst Elena Stepanova added a comment - Given my test case, I suppose the combination of indexes was important there, otherwise I would have removed unnecessary ones. It's most likely not the only combination which contributes to the failure, but it might be the factor that makes some of your tables get corrupted while others remain intact.

          So, hopes high, I tried removing all indices, except the pk. No improvement. All my tables have the same pk. Pity!
          Is this row,date pk the issue?

          AndyMidd Andy Middleton added a comment - So, hopes high, I tried removing all indices, except the pk. No improvement. All my tables have the same pk. Pity! Is this row,date pk the issue?

          You are right. Even with different keys, there is still corruption (with my test case too), it just looked less obvious, easier to miss, e.g. if I remove the secondary index and replace PK with UNIQUE, I get this:

          CREATE TABLE t1 (
            pk BIGINT NOT NULL AUTO_INCREMENT,
            dt DATETIME DEFAULT NULL,
            UNIQUE (pk, dt)
          ) ENGINE=Aria ROW_FORMAT=PAGE
            PARTITION BY RANGE  COLUMNS(dt) (
              PARTITION `P20171231` VALUES LESS THAN ('2017-12-31'),
              PARTITION `P20181231` VALUES LESS THAN ('2018-12-31')
          );
           
          INSERT INTO t1 VALUES (1,'2017-09-28 15:12:00');
          

          MariaDB [test]> SELECT * FROM t1;
          +----+---------------------+
          | pk | dt                  |
          +----+---------------------+
          |  1 | 2017-09-28 15:12:00 |
          +----+---------------------+
          1 row in set (0.00 sec)
           
          MariaDB [test]> ALTER TABLE t1 DROP PARTITION P20181231;
          Query OK, 0 rows affected (0.55 sec)
          Records: 0  Duplicates: 0  Warnings: 0
           
          MariaDB [test]> SELECT * FROM t1;
          +----+------+
          | pk | dt   |
          +----+------+
          |  1 | NULL |
          +----+------+
          1 row in set (0.00 sec)
          

          So yes, if you can afford altering tables, the best chance is replacing ROW_FORMAT=PAGE with e.g. ROW_FORMAT=FIXED.

          elenst Elena Stepanova added a comment - You are right. Even with different keys, there is still corruption (with my test case too), it just looked less obvious, easier to miss, e.g. if I remove the secondary index and replace PK with UNIQUE, I get this: CREATE TABLE t1 ( pk BIGINT NOT NULL AUTO_INCREMENT, dt DATETIME DEFAULT NULL , UNIQUE (pk, dt) ) ENGINE=Aria ROW_FORMAT=PAGE PARTITION BY RANGE COLUMNS(dt) ( PARTITION `P20171231` VALUES LESS THAN ( '2017-12-31' ), PARTITION `P20181231` VALUES LESS THAN ( '2018-12-31' ) ); INSERT INTO t1 VALUES (1, '2017-09-28 15:12:00' ); MariaDB [test]> SELECT * FROM t1; + ----+---------------------+ | pk | dt | + ----+---------------------+ | 1 | 2017-09-28 15:12:00 | + ----+---------------------+ 1 row in set (0.00 sec)   MariaDB [test]> ALTER TABLE t1 DROP PARTITION P20181231; Query OK, 0 rows affected (0.55 sec) Records: 0 Duplicates: 0 Warnings: 0   MariaDB [test]> SELECT * FROM t1; + ----+------+ | pk | dt | + ----+------+ | 1 | NULL | + ----+------+ 1 row in set (0.00 sec) So yes, if you can afford altering tables, the best chance is replacing ROW_FORMAT=PAGE with e.g. ROW_FORMAT=FIXED .

          Thanks, but I particularly wanted "crash proof" tables: According to this page: https://mariadb.com/kb/en/library/aria-faq/
          "In the current development phase Aria tables created with TRANSACTIONAL=1 are crashsafe":
          But setting that forces "Page" format. "Fixed" behaves like MyISAM, I believe, and won't help me survive those pesky bugchecks/inquistive finger resets.
          What I don't know; is ARIA in "fixed" still better than MyISAM?
          Am I the only person using ARIA with partitions for logging time based data? ooohh, the excitement!

          AndyMidd Andy Middleton added a comment - Thanks, but I particularly wanted "crash proof" tables: According to this page: https://mariadb.com/kb/en/library/aria-faq/ "In the current development phase Aria tables created with TRANSACTIONAL=1 are crashsafe": But setting that forces "Page" format. "Fixed" behaves like MyISAM, I believe, and won't help me survive those pesky bugchecks/inquistive finger resets. What I don't know; is ARIA in "fixed" still better than MyISAM? Am I the only person using ARIA with partitions for logging time based data? ooohh, the excitement!

          The thing is, TRANSACTIONAL=1 is not supported for partitioned tables, so you are probably not getting much out of it anyway:

          MariaDB [test]> create table t1 (a int) engine=Aria transactional=1 partition by hash(a) partitions 2;
          Query OK, 0 rows affected, 1 warning (0.67 sec)
           
          MariaDB [test]> show warnings;
          +---------+------+---------------------------------------------------------------------------------------+
          | Level   | Code | Message                                                                               |
          +---------+------+---------------------------------------------------------------------------------------+
          | Warning | 1478 | Table storage engine 'partition' does not support the create option 'TRANSACTIONAL=1' |
          +---------+------+---------------------------------------------------------------------------------------+
          1 row in set (0.00 sec)
          

          I don't see it in the KB though, greenman, could you please make sure the limitation is documented?

          elenst Elena Stepanova added a comment - The thing is, TRANSACTIONAL=1 is not supported for partitioned tables, so you are probably not getting much out of it anyway: MariaDB [test]> create table t1 (a int ) engine=Aria transactional=1 partition by hash(a) partitions 2; Query OK, 0 rows affected, 1 warning (0.67 sec)   MariaDB [test]> show warnings; + ---------+------+---------------------------------------------------------------------------------------+ | Level | Code | Message | + ---------+------+---------------------------------------------------------------------------------------+ | Warning | 1478 | Table storage engine 'partition' does not support the create option 'TRANSACTIONAL=1' | + ---------+------+---------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) I don't see it in the KB though, greenman , could you please make sure the limitation is documented?

          People

            serg Sergei Golubchik
            AndyMidd Andy Middleton
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.