[MDEV-15885] Wrong error message to ALTER TABLE ... ADD KEY ... Created: 2018-04-16  Updated: 2020-12-17

Status: Open
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Locking
Affects Version/s: 10.2.15
Fix Version/s: 10.2

Type: Bug Priority: Minor
Reporter: Matthias Leich Assignee: Vladislav Vaintroub
Resolution: Unresolved Votes: 0
Labels: ddl, mdl
Environment:

Linux: openSUSE 42.3 (x86_64) – IMHO not important
MariaDB compiled from source with debug


Attachments: File Bug1.tgz    

 Description   

Source:
---------
MariaDB 10.2 commit 87af52d7dd733e71fc7a9e39b882a4fd44f41fec
2018-04-16 compiled with debug

My test

SET GLOBAL SQL_MODE = 'traditional';
SET GLOBAL wait_timeout = 45;
SET GLOBAL lock_wait_timeout = 2;
SET GLOBAL innodb_lock_wait_timeout = 1;
CREATE TABLE t1 (col1 INT, col4 CHAR(100)) ENGINE = InnoDB ;

Now four concurrent sessions start to run one SQL in a loop:
Session 1 --- INSERT INTO t1 (col1,col4) VALUES (1,'a') , (1,'a') ;
Session 2 --- ALTER TABLE t1 ADD UNIQUE KEY uidx ( col4 );
Session 3 --- ALTER TABLE t1 ADD               KEY  idx ( col1 );
                     The session aborts immediate if hitting error 1702.
Session 4 --- ALTER TABLE t1 DROP KEY idx;

After some time (you might need several test runs)

  • the session 3 which runs ALTER TABLE t1 ADD KEY idx ( col1 )
    could harvest the server response ERROR HY000: Index uidx is corrupted
  • but the final checks after the four concurrent sessions have disconnected
    show that there is no Index uidx and no corruption at all.

     SHOW CREATE TABLE t1;
      Table   Create Table
       t1      CREATE TABLE `t1` (
       `col1` int(11) DEFAULT NULL,
       `col4` char(100) DEFAULT NULL
       ) ENGINE=InnoDB DEFAULT CHARSET=latin1
       SELECT * FROM t1 LIMIT 3;
       col1    col4
       1       a
       1       a
       1       a
       CHECK TABLE t1 EXTENDED;
       Table   Op      Msg_type        Msg_text
       test.t1 check   status  OK
       ANALYZE TABLE t1;
       Table   Op      Msg_type        Msg_text
        test.t1 analyze status  OK
    

Protocols of the concurrent sessions:

1 --- All INSERTS passed
2 --- masses of ERROR 23000: Duplicate entry 'a' for key 'uidx'
3 --- One ERROR HY000: Index uidx is corrupted
4 --- masses of ERROR 42000: Can't DROP INDEX `idx`; check that it exists

IMHO the bug itself isn't that important because

  • there is no obvious damage to the table or indexes
  • running concurrent DDL on the same table is extreme unlikely
    but the error message from the server is in two points ugly
    1. A user might assume that something around his table is rotten, get in panic
    and either waste time with DDL or restore backups in order to fix something
    or try the same, but make an error and introduce by that damages
    2. ALTER TABLE t1 ADD KEY idx ( col1 ) harvesting an error message
    referring to an index with different name (uidx) looks very strange.


 Comments   
Comment by Matthias Leich [ 2018-04-16 ]

How to use the archive?
------------------------------

cd <tree>/mysql-test
tar xvzf <archive>
Feel free to inspect the files run_mysqltest_prt.* which belong
to my test run.

./mtr --mem ml001

Please just ignore the majority of test output.
.....
CREATE TABLE t1 (col1 INT, col4 CHAR(100)) ENGINE = InnoDB ;
./mysqltest_background.sh 1523893362 1523893422 /mariadb/Source/10.2/mysql-test/var/tmp/my.sql1 1
./mysqltest_background.sh 1523893362 1523893422 /mariadb/Source/10.2/mysql-test/var/tmp/my.sql2 2
./mysqltest_background.sh 1523893362 1523893422 /mariadb/Source/10.2/mysql-test/var/tmp/my.sql3 3
./mysqltest_background.sh 1523893362 1523893422 /mariadb/Source/10.2/mysql-test/var/tmp/my.sql4 4
SHOW CREATE TABLE t1;
Table Create Table
t1 CREATE TABLE `t1` (
`col1` int(11) DEFAULT NULL,
`col4` char(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
SELECT * FROM t1 LIMIT 3;
col1 col4
1 a
1 a
1 a
CHECK TABLE t1 EXTENDED;
Table Op Msg_type Msg_text
test.t1 check status OK
ANALYZE TABLE t1;
Table Op Msg_type Msg_text
test.t1 analyze status OK
DROP TABLE t1;
"Was the error pattern found ---- Begin"
ERROR HY000: Index uidx is corrupted <== In case you see that than session replayed
and run_mysqltest_prt.3 shows that error message.
"Was the error pattern found ---- End"
main.ml001 'innodb' [ pass ] 70179

MTR's internal check of the test case 'main.ml001' failed.
....

Comment by Marko Mäkelä [ 2018-04-17 ]

The error 'index is corrupted' is technically correct, but it would be nicer to report 'duplicate key error' in this particular case. The following scenario is possible:

  1. ADD UNIQUE INDEX copies the data and creates the index.
  2. Optionally, any log from concurrent DML is fully replayed, and the unique index is completed.
  3. Subsequent DML operations start to update the completed but not yet committed index. If the uniqueness is violated, the ALTER TABLE must return an error, not the DML. The uncommitted index will be corrupted.
  4. ALTER TABLE notices that the index was corrupted as a result of DML that completed after index creation.

Note that secondary indexes will be created one at a time. If multiple indexes are created, the first ones would be directly updated by concurrent DML while some indexes are still being created. Also, uncommitted added indexes would be directly updated by DML while the MDL upgrade is in progress, so that the ALTER TABLE would be committed.

At least in some cases, the corrupted index would be reported as a duplicate key. Here is an excerpt from the 10.0 innodb.innodb-index-online test:

# Now, create the index without any concurrent DML, while no duplicate exists.
SET DEBUG_SYNC = 'row_log_apply_after SIGNAL created WAIT_FOR dml_done';
--send
ALTER TABLE t1 ADD UNIQUE INDEX(c2);
connection default;
SET DEBUG_SYNC = 'now WAIT_FOR created';
# At this point, the index has been created inside InnoDB but not yet
# in the MySQL data dictionary.
eval $innodb_metrics_select;
# A duplicate key error should now be triggered by InnoDB, but reported
# by the ALTER TABLE because the index does not 'officially' exist yet.
INSERT INTO t1 VALUES(6,3,1);
SET DEBUG_SYNC = 'now SIGNAL dml_done';
connection con1;
# This is due to the duplicate entry (6,3,1).
--error ER_DUP_UNKNOWN_IN_INDEX
reap;
DELETE FROM t1 WHERE c1=6;
ALTER TABLE t1 ADD UNIQUE INDEX(c2);

Comment by Matthias Leich [ 2018-04-17 ]

I am aware of the complexity but have also problems to agree with the report 'duplicate key error'.
IMHO the error message should mostly explain to the user why his ALTER TABLE t1 ADD KEY idx ( col1 ) could not pass.
The "ALTER TABLE t1 ADD KEY idx ( col1 )" operation itself

  • should be never capable to
  • cause a "duplicate key error" (idx is not defined UNIQUE)
  • affect a key with a name != "idx" like the "uidx" here so that this key "uidx" is than corrupted
    So a user should rather never get just such an error message if his DDL could not be served.
  • might be denied because of various reasons like
  • (arbitrary concurrency): the table and/or other indexes are permanent damaged/unusable
    Than please report:
    We cannot perform the DDL because this or that needs to be fixed but avoid any
    maybe upcoming impression that the shape of the "ALTER TABLE t1 ADD KEY idx ( col1 )"
    might be the reason.
  • (concurrency required):
  • MDL lock or similar not got, Deadlock etc.: Ok report the locking problem
  • the table and/or other indexes are caused by some concurrent already ongoing operation
    temporary in some "unusable" state or similar.
    Than please:
    Report that we cannot perform the DDL because some parallel ongoing SQL prevents that
    but please do not mention 'duplicate key error' or 'index corrupted' at all.
    What if the "ADD UNIQUE INDEX" sets a MDL lock preventing concurrent DDL on the same table?
Comment by Marko Mäkelä [ 2018-04-17 ]

I did not realize that there are multiple concurrent DDL on the table. I was under the impression that multiple DDL would be prevented by metadata lock (MDL), also during the time when online ALTER has downgraded the metadata lock. In that case, this would not be an InnoDB bug, but a bug in MDL.

Comment by Marko Mäkelä [ 2018-04-24 ]

The problem is that a running ALTER TABLE is not preventing another ALTER TABLE from being attempted on the same table.

Generated at Thu Feb 08 08:24:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.