Sachin's implementation returns a false UNIQUE error when using long, nearly duplicate BLOBs:
MariaDB [test]> show create table t1\G
Table: t1
Create Table: CREATE TABLE `t1` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob DEFAULT NULL,
  PRIMARY KEY (`c1`),
  UNIQUE KEY `c2` (`c2`)
) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MariaDB [test]> insert into t1 values (NULL, CONCAT(REPEAT('foobar', 1000000), '1'));
Query OK, 1 row affected (0.04 sec)

MariaDB [test]> insert into t1 values (NULL, CONCAT(REPEAT('foobar', 1000000), '2'));
ERROR 1062 (23000): Duplicate entry 'foobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoob' for key 'c2'

MariaDB [test]> show create table t2\G
Table: t2
Create Table: CREATE TABLE `t2` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob DEFAULT NULL,
  PRIMARY KEY (`c1`),
  UNIQUE KEY `c2` (`c2`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

MariaDB [test]> insert into t2 values (NULL, CONCAT(REPEAT('foobar', 1000000), '1'));
Query OK, 1 row affected (0.09 sec)

MariaDB [test]> insert into t2 values (NULL, CONCAT(REPEAT('foobar', 1000000), '2'));
ERROR 1062 (23000): Duplicate entry 'foobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoob' for key 'c2'

Executive summary
Both implementations of MDEV-371 have functional deficits. For the cases where they work, the implementation from Shubham is faster.
Details
Test case: create a table (auto_increment PK, mediumblob), load 23K/47K rows (1 or 2 times the 23746 files in a MariaDB source tree). The blob content is CONCAT($file, LOAD_FILE($file), $nr) with $nr from (1,2), so it is guaranteed to be unique. When loading only 23K rows, the first few hundred bytes of each blob are also guaranteed to be unique.
The table was MyISAM or InnoDB; the UNIQUE index was created either before or after loading the rows.
Nothing special was configured on the MariaDB side beyond a huge key_buffer and innodb_buffer_pool; default isolation level, default XtraDB.
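As a self-contained illustration of the row construction (a minimal stand-in for the full prepare.sh listed under Files below; the two /etc paths here are placeholders, not the real file list):

```shell
# Sketch of how each test row is built: path + file contents + run
# number, so every blob is unique even across two loading passes.
printf '%s\n' /etc/hostname /etc/hosts > files.txt   # placeholder file list
for i in 1 2; do
  while read -r F; do
    echo "INSERT INTO t1 VALUES (NULL, CONCAT('$F', LOAD_FILE('$F'), '$i'));"
  done < files.txt
done
```

With only one pass (the 23K-row case), the leading file path alone already makes each blob prefix unique.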
Functional behavior
See below for an explanation of the numbers in parentheses.
Sachin
| | MyISAM | InnoDB |
| CREATE TABLE w/ UNIQUE | ok | ok |
| ADD UNIQUE to empty table | ok | ok |
| ADD UNIQUE to filled table | fail (1) | fail (1) |
| LOAD INTO TABLE w/ UNIQUE | fail (1) | fail (1) |
Shubham
| | MyISAM | InnoDB |
| CREATE TABLE w/ UNIQUE | fail (2) | ok |
| ADD UNIQUE to empty table | fail (2) | ok |
| ADD UNIQUE to filled table | sometimes (3) | sometimes (5) |
| LOAD INTO TABLE w/ UNIQUE | ok (4) | sometimes (5) |
Execution time (seconds) for 23K rows
| | Sachin | Shubham |
| LOAD MyISAM | 6.46 | 6.74 |
| INDEX MyISAM | 3.30 | 2.68 |
| LOAD INDEXED MyISAM | 11.02 | 7.90 |

| | Sachin | Shubham |
| LOAD InnoDB | 24.19 | 23.70 |
| INDEX InnoDB | 30.24 | (fail) |
| LOAD INDEXED InnoDB | 30.89 | 27.35 |
Numbers are averages from 3 runs. In general the execution times didn't differ much between runs.
Explanation of failures
(1) same failure mode for MyISAM and InnoDB; nearly identical, long blobs cause false unique violations:
create table t1 (c1 int unsigned primary key auto_increment, c2 mediumblob);
insert into t1 values (NULL, CONCAT(REPEAT('foobar',1000000), '1'));
insert into t1 values (NULL, CONCAT(REPEAT('foobar',1000000), '2'));
create unique index c2 on t1 (c2);
-> ERROR 1062 (23000): Duplicate entry ...
(2) CREATE TABLE works, but adding an index or accessing the table gives an "Incorrect key file" error:
CREATE TABLE `t1` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob DEFAULT NULL,
  PRIMARY KEY (`c1`),
  UNIQUE KEY (`c2`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-> ok

show create table t1;
-> ERROR 126 (HY000): Incorrect key file for table './test/t1'; try to repair it
-> Error (Code 126): Incorrect key file for table './test/t1'; try to repair it
-> Error (Code 1034): Incorrect key file for table 't1'; try to repair it
(3) adding the UNIQUE index after loading my workload of 47K rows works, but I haven't found a synthetic sequence of inserts that behaves the same way.
(4) this requires a trick: create the table w/o unique key, load the data, create the index, then delete all rows (cf. the preparation steps in test.sh). TRUNCATE will not work; it leaves you with a broken key file again.
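Reading (4) together with the preparation block in test.sh, the trick can be sketched as follows (my hedged reconstruction, not verified against either patch; table and index names as in the test files):

```sql
-- Sketched workaround from (4): build the UNIQUE index while the table
-- is filled, then empty it with DELETE so the index stays usable.
CREATE TABLE t1 (c1 INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
                 c2 MEDIUMBLOB) ENGINE=MyISAM;
-- ... load the rows (see load_t1.sql) ...
CREATE UNIQUE INDEX c2 ON t1 (c2);
DELETE FROM t1;   -- TRUNCATE would leave a broken key file
-- loading into the table w/ UNIQUE index now works
```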
(5) strange UNIQUE violation, showing the field as NULL. Again I haven't found a synthetic sequence of inserts that triggers this behavior, but with my test data it is repeatable:
MariaDB [test]> \. tables.sql
-> ok
MariaDB [test]> \. load_t2.sql
-> ok
MariaDB [test]> select count(*) from t2;
+----------+
| count(*) |
+----------+
|    23746 |
+----------+
1 row in set (0.02 sec)
MariaDB [test]> select c1 from t2 where c2 is null;
Empty set (0.81 sec)
MariaDB [test]> create unique index c2 on t2 (c2);
ERROR 1062 (23000): Duplicate entry 'NULL' for key 'c2'
Files
my.cnf
[mysqld_safe]

[mysqld]
performance-schema = false

#####files and sockets
pid-file=/tmp/mysqld.pid.sysbench

#####non innodb options (fixed)
max_connections = 400
back_log = 150
table_open_cache = 800
key_buffer_size = 8G
query_cache_type = 0
join_buffer_size = 32K
sort_buffer_size = 32K
max_allowed_packet = 16M

#####fixed innodb options
innodb_file_per_table = true
innodb_open_files = 100
innodb_data_file_path = ibdata1:50M:autoextend
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT_NO_FSYNC
innodb_log_buffer_size = 32M
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
innodb_buffer_pool_size = 32G
innodb_buffer_pool_instances = 8
innodb_adaptive_hash_index_partitions = 8
innodb_thread_concurrency = 0

#####tuning for SSD
innodb_adaptive_flushing = 1
innodb_flush_neighbors = 0
innodb_io_capacity = 2000
innodb_lru_scan_depth = 2000
innodb_purge_threads = 1
innodb_read_io_threads = 4
innodb_write_io_threads = 4
prepare.sh (creates the .sql files to load the tables)
#!/bin/bash

SIZE=${1:-2}

find /home/axel/mariadb-source/mariadb-10.0.26/ -type f -size -15M >files.txt

(
echo "USE test;"
for i in `seq $SIZE`
do
  while read F
  do
    echo "INSERT INTO t1 VALUES (NULL, CONCAT('$F', LOAD_FILE('$F'), '$i'));"
  done <files.txt
done
) > load_t1.sql

(
echo "USE test;"
for i in `seq $SIZE`
do
  echo "BEGIN;"
  while read F
  do
    echo "INSERT INTO t2 VALUES (NULL, CONCAT('$F', LOAD_FILE('$F'), '$i'));"
  done <files.txt
  echo "COMMIT;"
done
) > load_t2.sql
tables.sql (creates tables without UNIQUE constraint)
USE `test`;

DROP TABLE IF EXISTS `t1`;
CREATE TABLE `t1` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob DEFAULT NULL,
  PRIMARY KEY (`c1`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS `t2`;
CREATE TABLE `t2` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob DEFAULT NULL,
  PRIMARY KEY (`c1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
index_t1.sql (add UNIQUE index to t1, similar file for t2)
USE `test`;
ALTER TABLE `t1` ADD UNIQUE INDEX `c2` (`c2`);
test.sh (shell script to run the tests)
#!/bin/bash
#set -eu
USER="root"
SOCKET="/tmp/mysqld.sock.sysbench"
MYSQL="mysql -u $USER -S $SOCKET --show-warnings"

$MYSQL < tables.sql

t1=$(date +%s.%N);
date '+[%X] load into MyISAM w/o index'
$MYSQL -s < load_t1.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc

t1=$(date +%s.%N);
date '+[%X] indexing MyISAM'
$MYSQL -s < index_t1.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc

t1=$(date +%s.%N);
date '+[%X] load into InnoDB w/o index'
$MYSQL -s < load_t2.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc

t1=$(date +%s.%N);
date '+[%X] indexing InnoDB'
$MYSQL -s < index_t2.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc

$MYSQL < tables.sql
$MYSQL -s < load_t1.sql
$MYSQL -s < index_t1.sql
$MYSQL -s -e "delete from test.t1"
$MYSQL -s < index_t2.sql

t1=$(date +%s.%N);
date '+[%X] load into MyISAM w/ index'
$MYSQL -s < load_t1.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc

t1=$(date +%s.%N);
date '+[%X] load into InnoDB w/ index'
$MYSQL -s < load_t2.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc
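The timing idiom in test.sh repeats the same date/bc pair for every step; a hypothetical helper (my refactor sketch, not part of the script above) would collapse it to one function:

```shell
#!/bin/bash
# Hypothetical helper collapsing the repeated timing idiom from test.sh.
run_step() {
  label=$1; shift
  t1=$(date +%s.%N)
  date "+[%X] $label"
  "$@"                                 # run the actual step
  t2=$(date +%s.%N)
  echo -n "seconds: "; echo "$t2 - $t1" | bc
}

# usage ('true' stands in for a real step such as: $MYSQL -s < load_t1.sql)
run_step "load into MyISAM w/o index" true
```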
Hi Axel Schwenke
Actually I made some changes in gsoc_long_unique; now it is passing all the tests in
test.sh
╭─xyz@anom in ~/gsoc/mdev
╰$ ./test.sh
[09:04:54 PM] load into MyISAM w/o index
seconds: 42.279655724
[09:05:36 PM] indexing MyISAM
seconds: 10.086230505
[09:05:46 PM] load into InnoDB w/o index
seconds: 103.977850135
[09:07:30 PM] indexing InnoDB
seconds: 92.310543092
[09:09:05 PM] load into MyISAM w/ index
seconds: 81.708425681
[09:10:27 PM] load into InnoDB w/ index
seconds: 117.633465718
test2.sh
╭─xyz@anom in ~/gsoc/mdev
╰$ ./test2.sh
[09:12:38 PM] load into MyISAM w/o index
seconds: 48.508866006
[09:13:27 PM] indexing MyISAM
seconds: 26.044388930
[09:13:53 PM] load into InnoDB w/o index
seconds: 130.933864104
[09:16:04 PM] indexing InnoDB
seconds: 110.094988468
[09:18:52 PM] load into MyISAM w/ index
seconds: 73.351495860
[09:20:05 PM] load into InnoDB w/ index
seconds: 153.227781848