I have a table partitioned by range, created with the TokuDB engine on MariaDB 10.0.12. When a single client inserts into this table, everything works as expected. However, as soon as a second client inserts more data into the same table, I get the following error: "Failed to read auto-increment value from storage engine". When I check the table status, the Auto_increment value is 2049. The auto-increment column is defined as "id bigint not null auto_increment".
ALTER TABLE order_books partition by range (time) (
partition p0 values less than (1507495611880)
);
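The report's table is order_books; the test scripts in this thread use test.auto_inc_test with the same column layout, as in the CREATE TABLE discussed below:

CREATE TABLE `auto_inc_test` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `time` bigint(20) NOT NULL,
  PRIMARY KEY (`time`,`id`)
) ENGINE=tokudb;

The attached Java reproducer follows: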
public class MultipleClientsInsertingDataIntoPartitionedTokudbTable {
    public static final String INSERT_STATEMENT = "INSERT INTO test.auto_inc_test (time) values (?)";

    private final CountDownLatch countDownLatch;

    public MultipleClientsInsertingDataIntoPartitionedTokudbTable() {
        countDownLatch = new CountDownLatch(2);
    }

    public static void main(String[] args) throws InterruptedException {
        MultipleClientsInsertingDataIntoPartitionedTokudbTable multipleClientsInsertingDataIntoPartitionedTokudbTable =
                new MultipleClientsInsertingDataIntoPartitionedTokudbTable();
        multipleClientsInsertingDataIntoPartitionedTokudbTable.start();
    }

    // start(), createDataSource() and insertData() follow; see the tweaked
    // version attached at the end of this report for the full class
}
Rich Prohaska
added a comment - MariaDB will not create an InnoDB table with an auto increment column that is not the first column in the primary key. This tells me that there are likely to be bugs in MariaDB's handling of auto increment, since no other engine (except TokuDB) supports it.
See the following example.
MariaDB [test]> CREATE TABLE `auto_inc_test` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `time` bigint(20) NOT NULL, PRIMARY KEY (`time`,`id`) ) ENGINE=tokudb;
Query OK, 0 rows affected (0.05 sec)
MariaDB [test]> drop table auto_inc_test;
Query OK, 0 rows affected (0.02 sec)
MariaDB [test]> CREATE TABLE `auto_inc_test` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `time` bigint(20) NOT NULL, PRIMARY KEY (`time`,`id`) ) ENGINE=innodb;
ERROR 1075 (42000): Incorrect table definition; there can be only one auto column and it must be defined as a key
"no other engine" is not exactly correct. The table flag HA_AUTO_PART_KEY is responsible for this feature, and it is announced by Blackhole, Federated, FederatedX, Aria, MyISAM, Merge, and TokuDB engines. InnoDB and XtraDB, indeed, do not have it.
Sergei Golubchik
added a comment - "no other engine" is not exactly correct. The table flag HA_AUTO_PART_KEY is responsible for this feature, and it is announced by Blackhole, Federated, FederatedX, Aria, MyISAM, Merge, and TokuDB engines. InnoDB and XtraDB, indeed, do not have it.
Rich Prohaska
added a comment - This test program deadlocks. One thread owns the partition auto inc mutex and is blocked trying to get a tokudb lock on a PK. The other thread owns a tokudb lock on a PK that conflicts with the first thread, and is blocked trying to get the partition auto inc mutex. Eventually, tokudb's lock timeout expires, and the first thread gets a lock timeout error. This hits the debug assert (on a debug build), or returns an error to the app (on a release build).
Sergei Golubchik
added a comment - prohaska7, thanks. Could you elaborate a bit, please? Where is the tokudb lock on the PK taken, and on what code execution path is it taken before the partition auto inc mutex?
Rich Prohaska
added a comment (edited) - Given table mdev6605 (t bigint not null, id bigint not null auto_increment, primary key(t,id)) engine=tokudb partition by range(t) (partition p0 values less than (1)):
t1: begin; insert into mdev6605 (t) values (0);
t1 holds a lock on the PK; the lock is taken on the ha_tokudb::index_read, handler::index_read_map, handler::get_auto_increment, ha_partition::get_auto_increment call stack (innermost frame first).
t2: begin; insert into mdev6605 (t) values (0);
t2 holds the partition auto inc mutex and is blocked on the tokudb PK lock.
t1: insert into mdev6605 (t) values (0);
t1 is blocked on the partition auto inc mutex.
Eventually t2 times out in tokudb's lock manager, t2 rolls back, and the deadlock is removed.
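Spelled out as two client sessions (a sketch of the interleaving above, using the same mdev6605 table; t1 and t2 are separate connections):

-- t1:
begin;
insert into mdev6605 (t) values (0);  -- t1 now holds the TokuDB PK lock (taken inside get_auto_increment)
-- t2:
begin;
insert into mdev6605 (t) values (0);  -- t2 takes the partition auto-inc mutex, then blocks on t1's PK lock
-- t1:
insert into mdev6605 (t) values (0);  -- t1 blocks on the auto-inc mutex held by t2: deadlock
-- t2 eventually hits TokuDB's lock timeout, rolls back, and the deadlock clears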
Sergei Golubchik
added a comment - Thanks. Simple test case without partitioning:
create table t1 (a int auto_increment, b bigint(20), primary key (b,a)) engine=tokudb;
start transaction;
insert t1 (b) values (1);
connect (con2,localhost,root);
start transaction;
insert t1 (b) values (1); -- crash!
That means the partitioning auto-inc mutex is not the issue here. handler::get_auto_increment() crashes because it doesn't expect an error, but, I believe, a lock timeout is a perfectly valid status and it should be expected. Compare with:
create table t1 (a int auto_increment, b bigint(20), primary key (b,a)) engine=tokudb;
start transaction;
insert t1 select max(a)+1, 1 from t1 where b=1;
connect (con2,localhost,root);
start transaction;
insert t1 select max(a)+1, 1 from t1 where b=1;
This has the same semantics, and the second insert also times out on a lock wait.
Sergei Golubchik
added a comment - True. And the deadlock is caused by the partitioning auto-inc mutex. But the underlying error will happen in debug or release builds, with or without partitioning. See the second test case, the one with insert ... select. The second thread needs to read the value from the index, and the first thread keeps it locked. There can be no workaround: if the first thread doesn't commit, the second will eventually time out.
I'll fix the deadlock, though, so that the first thread would be able to commit.
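As a side note (not from the thread): while there is no real workaround short of committing the first transaction quickly, the length of the stall before the error is governed by TokuDB's lock-wait timeout, so a session that prefers to fail fast can shorten it:

SET SESSION tokudb_lock_timeout = 1000;  -- milliseconds; shortens the wait before the lock-timeout error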
Reproducible on current 5.5 and 10.0 trees.
On debug builds, the test case causes the assertion failure:
5.5/sql/handler.cc:2828: virtual void handler::get_auto_increment(ulonglong, ulonglong, ulonglong, ulonglong*, ulonglong*): Assertion `0' failed.
140822 1:02:19 [ERROR] mysqld got signal 6 ;
#6 0x00007fd2448d46f1 in *__GI___assert_fail (assertion=0xe3553d "0", file=<optimized out>, line=2828, function=0xe38480 "virtual void handler::get_auto_increment(ulonglong, ulonglong, ulonglong, ulonglong*, ulonglong*)") at assert.c:81
#7 0x00000000007e8819 in handler::get_auto_increment (this=0x7fd22605c078, offset=1, increment=1, nb_desired_values=1, first_value=0x7fd2461b7878, nb_reserved_values=0x7fd2461b7880) at 5.5/sql/handler.cc:2828
#8 0x00007fd21d013d8e in ha_tokudb::get_auto_increment (this=0x7fd22605c078, offset=1, increment=1, nb_desired_values=1, first_value=0x7fd2461b7878, nb_reserved_values=0x7fd2461b7880) at 5.5/storage/tokudb/ha_tokudb.cc:7365
#9 0x0000000000cb0e36 in ha_partition::get_auto_increment (this=0x7fd226093678, offset=1, increment=1, nb_desired_values=1000, first_value=0x7fd2461b78e8, nb_reserved_values=0x7fd2461b78e0) at 5.5/sql/ha_partition.cc:7863
#10 0x00000000007e8259 in handler::update_auto_increment (this=0x7fd226093678) at 5.5/sql/handler.cc:2641
#11 0x0000000000ca8bb4 in ha_partition::write_row (this=0x7fd226093678, buf=0x7fd2309bd988 "\377") at 5.5/sql/ha_partition.cc:3543
#12 0x00000000007ed30d in handler::ha_write_row (this=0x7fd226093678, buf=0x7fd2309bd988 "\377") at 5.5/sql/handler.cc:5210
#13 0x00000000006175e7 in write_record (thd=0x7fd230a2a060, table=0x7fd2260db860, info=0x7fd2461b7bb0) at 5.5/sql/sql_insert.cc:1843
#14 0x00000000006152d1 in mysql_insert (thd=0x7fd230a2a060, table_list=0x7fd2260520d8, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false) at 5.5/sql/sql_insert.cc:986
#15 0x000000000063370c in mysql_execute_command (thd=0x7fd230a2a060) at 5.5/sql/sql_parse.cc:2973
#16 0x000000000063b0d2 in mysql_parse (thd=0x7fd230a2a060, rawbuf=0x7fd2260aa078 "INSERT INTO test.auto_inc_test (time) values (1408654935118),(1408654935119),(1408654935119),(1408654935119),(1408654935119),(1408654935119),(1408654935119),(1408654935119),(1408654935119),(1408654935"..., length=16044, parser_state=0x7fd2461b8680) at 5.5/sql/sql_parse.cc:5799
#17 0x000000000062f11d in dispatch_command (command=COM_QUERY, thd=0x7fd230a2a060, packet=0x7fd230a2f061 "INSERT INTO test.auto_inc_test (time) values (1408654935118),(1408654935119),(1408654935119),(1408654935119),(1408654935119),(1408654935119),(1408654935119),(1408654935119),(1408654935119),(1408654935"..., packet_length=16044) at 5.5/sql/sql_parse.cc:1079
#18 0x000000000062e2a9 in do_command (thd=0x7fd230a2a060) at 5.5/sql/sql_parse.cc:793
#19 0x00000000007300aa in do_handle_one_connection (thd_arg=0x7fd230a2a060) at 5.5/sql/sql_connect.cc:1266
#20 0x000000000072fb69 in handle_one_connection (arg=0x7fd230a2a060) at 5.5/sql/sql_connect.cc:1181
#21 0x00007fd245e35b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#22 0x00007fd24498520d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
Slightly tweaked Java class (imports added, connection parameters changed to MTR-like). Apparently, Java 7 is required for it to work (it uses try-with-resources). Only fragments of the class body survived here, so the version below is reconstructed around them; the connection parameters and batch size are assumed values, marked as such in comments.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.CountDownLatch;
import javax.sql.DataSource;
import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;

public class MultipleClientsInsertingDataIntoPartitionedTokudbTable {
    public static final String INSERT_STATEMENT = "INSERT INTO test.auto_inc_test (time) values (?)";

    private final CountDownLatch countDownLatch;

    public MultipleClientsInsertingDataIntoPartitionedTokudbTable() {
        countDownLatch = new CountDownLatch(2);
    }

    public static void main(String[] args) throws InterruptedException {
        MultipleClientsInsertingDataIntoPartitionedTokudbTable multipleClientsInsertingDataIntoPartitionedTokudbTable =
                new MultipleClientsInsertingDataIntoPartitionedTokudbTable();
        multipleClientsInsertingDataIntoPartitionedTokudbTable.start();
    }

    private void start() throws InterruptedException {
        final DataSource dataSource = createDataSource();
        Runnable inserter = new Runnable() {
            @Override
            public void run() {
                insertData(dataSource);
                countDownLatch.countDown();
            }
        };
        new Thread(inserter).start(); // two concurrent clients, as in the bug description
        new Thread(inserter).start();
        countDownLatch.await();
    }

    private DataSource createDataSource() {
        MysqlDataSource mysqlDataSource = new MysqlDataSource();
        mysqlDataSource.setURL("jdbc:mysql://127.0.0.1:3306/test"); // MTR-like parameters: assumed values
        mysqlDataSource.setUser("root");
        mysqlDataSource.setPassword("");
        return mysqlDataSource;
    }

    private void insertData(DataSource dataSource) {
        try (Connection connection = dataSource.getConnection();
             PreparedStatement preparedStatement = connection
                     .prepareStatement(INSERT_STATEMENT)) {
            for (int i = 0; i < 2048; i++) { // batch size assumed; values are current timestamps
                preparedStatement.setLong(1, System.currentTimeMillis());
                preparedStatement.addBatch();
            }
            preparedStatement.executeBatch();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
Also slightly tweaked version of SQL (table name fixed, COMPRESSION clause removed):
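Presumably along these lines, combining the CREATE TABLE from the comments above with the partitioning clause from the report (a reconstruction):

CREATE TABLE `auto_inc_test` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `time` bigint(20) NOT NULL,
  PRIMARY KEY (`time`,`id`)
) ENGINE=tokudb
PARTITION BY RANGE (`time`)
(PARTITION p0 VALUES LESS THAN (1507495611880));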
Couldn't reproduce on the Tokutek MariaDB 5.5.38 build.