[MDEV-10730] benchmark MDEV-371 Created: 2016-09-02  Updated: 2021-07-27  Resolved: 2018-03-24

Status: Closed
Project: MariaDB Server
Component/s: Tests
Fix Version/s: N/A

Type: Task Priority: Minor
Reporter: Sergei Golubchik Assignee: Axel Schwenke
Resolution: Fixed Votes: 1
Labels: None

Attachments: Zip Archive mdev-10730.zip    

 Description   

We have two different implementations of MDEV-371.

Would be nice to have a performance comparison.



 Comments   
Comment by Axel Schwenke [ 2016-09-06 ]

Note: https://github.com/SachinSetiya/server doesn't compile out of the box. I fixed the (obvious) bug with this patch:

diff --git a/sql/sql_show.cc b/sql/sql_show.cc
index dc7417c..bcf2de6 100644
--- a/sql/sql_show.cc
+++ b/sql/sql_show.cc
@@ -1815,6 +1815,7 @@ int show_create_table(THD *thd, TABLE_LIST *table_list, String *packet,
   handlerton *hton;
   my_bitmap_map *old_map;
   int error= 0;
+  bool not_the_first_field= false;
   DBUG_ENTER("show_create_table");
   DBUG_PRINT("enter",("table: %s", table->s->table_name.str));

Comment by Axel Schwenke [ 2016-09-06 ]

Note: the branches GSoC__mdev-371 and gsoc-mdev-371 in https://github.com/shubhambaraiss/MariaDB show very similar behavior. However the first branch is 4 days younger. I built end test both now.

Comment by Axel Schwenke [ 2016-09-06 ]

Sachins implementation returns a false UNIQE error when using long, nearly duplicate BLOBs:

MariaDB [test]> show create table t1\G
       Table: t1
Create Table: CREATE TABLE `t1` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob DEFAULT NULL,
  PRIMARY KEY (`c1`),
  UNIQUE KEY `c2` (`c2`)
) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
 
MariaDB [test]> insert into t1 values (NULL, CONCAT(REPEAT('foobar', 1000000), '1'));
Query OK, 1 row affected (0.04 sec)
 
MariaDB [test]> insert into t1 values (NULL, CONCAT(REPEAT('foobar', 1000000), '2'));
ERROR 1062 (23000): Duplicate entry 'foobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoob' for key 'c2'

MariaDB [test]> show create table t2\G
       Table: t2
Create Table: CREATE TABLE `t2` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob DEFAULT NULL,
  PRIMARY KEY (`c1`),
  UNIQUE KEY `c2` (`c2`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
 
MariaDB [test]> insert into t2 values (NULL, CONCAT(REPEAT('foobar', 1000000), '1'));
Query OK, 1 row affected (0.09 sec)
 
MariaDB [test]> insert into t2 values (NULL, CONCAT(REPEAT('foobar', 1000000), '2'));
ERROR 1062 (23000): Duplicate entry 'foobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoobarfoob' for key 'c2'

Comment by Axel Schwenke [ 2016-09-06 ]

strange behavior in Shubhams implementation when adding an index to a pre-filled InnoDB table:

MariaDB [test]> show create table t2\G
       Table: t2
Create Table: CREATE TABLE `t2` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob,
  PRIMARY KEY (`c1`)
) ENGINE=InnoDB AUTO_INCREMENT=47493 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
 
MariaDB [test]> select count(*) from t2;
+----------+
| count(*) |
+----------+
|    47492 |
+----------+
1 row in set (0.03 sec)
 
MariaDB [test]> select c1 from t2 where c2 is null;
Empty set (1.46 sec)
 
MariaDB [test]> create unique index c2 on t2 (c2);
ERROR 1062 (23000): Duplicate entry 'NULL' for key 'c2'

Comment by Axel Schwenke [ 2016-09-06 ]

Executive summary

Both implementations of MDEV-371 have functional deficites. For the cases where they work, the implementation from Shubham is faster.

Details

Test case: create table (auto_index PK, mediumblob), load 23/47K rows (1 or 2 times the 23746 files in a MariaDB source tree). The blob content is CONCAT($file, READ_FILE($file), $nr) with $nr from (1,2) so it is guaranteed to be unique. When loading only 23K rows, the first few 100 bytes in each blob are guaranteed to be unique.

The table was MyISAM or InnoDB, the UNIQUE index was created before or after loading the rows.

Nothing special was configured on the MariaDB side. Huge key_buffer and innodb_buffer_pool, default isolation level, default XtraDB.

Functional behavior

Check below for an explanation of (number)

Sachin

  MyISAM InnoDB
CREATE TABLE w/ UNIQUE ok ok
ADD UNIQE to empty table ok ok
ADD UNIQE to filled table fail (1) fail (1)
LOAD INTO TABLE w/ UNIQUE fail (1) fail(1)

Shubham

  MyISAM InnoDB
CREATE TABLE w/ UNIQUE fail (2) ok
ADD UNIQE to empty table fail (2) ok
ADD UNIQE to filled table sometimes (3) sometimes (5)
LOAD INTO TABLE w/ UNIQUE ok (4) sometimes (5)

Execution time (seconds) for 23K rows

  Sachin Shubham
LOAD MyISAM 6,46 6,74
INDEX MyISAM 3,30 2,68
LOAD INDEXED MYISAM 11,02 7,90
  Sachin Shubham
LOAD InnoDB 24,19 23,70
INDEX InnoDB 30,24 (fail)
LOAD INDEXED InnoDB 30,89 27,35

Numbers are averages from 3 runs. In general the execution times didn't differ much between runs.

Explanation of failures

(1) same failure mode for MyISAM and InnoDB; nearly identical, long blobs cause false unique violations:

create table t1 (c1 int unsigned primary key auto_increment, c2 mediumblob);
insert into t1 values (NULL, CONCAT(REPEAT('foobar',1000000), '1'));
insert into t1 values (NULL, CONCAT(REPEAT('foobar',1000000), '2'));
create unique index c2 on t1 (c2);
-> ERROR 1062 (23000): Duplicate entry ...

(2) create table works ok, but add index or access to table gives "Incorrect key file" error.

CREATE TABLE `t1` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob DEFAULT NULL,
  PRIMARY KEY (`c1`),
  UNIQUE KEY (`c2`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-> ok
 
show create table t1;
-> ERROR 126 (HY000): Incorrect key file for table './test/t1'; try to repair it
-> Error (Code 126): Incorrect key file for table './test/t1'; try to repair it
-> Error (Code 1034): Incorrect key file for table 't1'; try to repair it

(3) adding the UNIQUE index after loading my workload of 47K rows works. But I haven't found a synthetic sequence of inserts that works the same way

(4) this requires a trick: create table w/ unique key, load data, create index, delete all rows. Truncate will not work, but leave you with a broken key file again

(5) strange UNIQUE violation, showing field as NULL. Again I haven't found a synthetic sequence of inserts that triggers this behavior, but with my test data it is repeatable.

MariaDB [test]> \. tables.sql
-> ok
MariaDB [test]> \. load_t2.sql
-> ok
MariaDB [test]> select count(*) from t2;
+----------+
| count(*) |
+----------+
|    23746 |
+----------+
1 row in set (0.02 sec)
MariaDB [test]> select c1 from t2 where c2 is null;
Empty set (0.81 sec)
MariaDB [test]> create unique index c2 on t2 (c2);
ERROR 1062 (23000): Duplicate entry 'NULL' for key 'c2'

Files

my.cnf

[mysqld_safe]
 
[mysqld]
 
performance-schema = false
 
#####files and sockets
pid-file=/tmp/mysqld.pid.sysbench
 
#####non innodb options (fixed)
max_connections = 400
back_log = 150
table_open_cache = 800
key_buffer_size = 8G
query_cache_type = 0
join_buffer_size = 32K
sort_buffer_size = 32K
max_allowed_packet = 16M
 
#####fixed innodb options
innodb_file_per_table = true
innodb_open_files = 100
innodb_data_file_path = ibdata1:50M:autoextend
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT_NO_FSYNC
innodb_log_buffer_size = 32M
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
innodb_buffer_pool_size = 32G
innodb_buffer_pool_instances = 8
innodb_adaptive_hash_index_partitions = 8
innodb_thread_concurrency = 0
 
#####tuning for SSD
innodb_adaptive_flushing = 1
innodb_flush_neighbors = 0
innodb_io_capacity = 2000
innodb_lru_scan_depth = 2000
innodb_purge_threads = 1
innodb_read_io_threads = 4
innodb_write_io_threads = 4

prepare.sh (creates the .sql file to load tables)

#!/bin/bash
 
SIZE=${1:-2}
 
find /home/axel/mariadb-source/mariadb-10.0.26/ -type f -size -15M >files.txt
 
(
  echo "USE test;"
  for i in `seq $SIZE`
  do
    while read F 
    do 
      echo "INSERT INTO t1 VALUES (NULL, CONCAT('$F', LOAD_FILE('$F'), '$i'));" 
    done <files.txt 
  done
) > load_t1.sql
 
(
  echo "USE test;"
  for i in `seq $SIZE`
  do
    echo "BEGIN;" 
    while read F 
    do 
      echo "INSERT INTO t2 VALUES (NULL, CONCAT('$F', LOAD_FILE('$F'), '$i'));" 
    done <files.txt 
    echo "COMMIT;"
  done
) > load_t2.sql

tables.sql (creates tables without UNIQUE constraint)

USE `test`;
 
DROP TABLE IF EXISTS `t1`;
CREATE TABLE `t1` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob DEFAULT NULL,
  PRIMARY KEY (`c1`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
 
DROP TABLE IF EXISTS `t2`;
CREATE TABLE `t2` (
  `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `c2` mediumblob DEFAULT NULL,
  PRIMARY KEY (`c1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

index_t1.sql (add UNIQUE index to t1, similar file for t2)

USE `test`;
ALTER TABLE `t1` ADD UNIQUE INDEX `c2` (`c2`);

test.sh (shell script to run the tests)

#!/bin/bash
#set -eu
USER="root"
SOCKET="/tmp/mysqld.sock.sysbench"
MYSQL="mysql -u $USER -S $SOCKET --show-warnings"
 
$MYSQL < tables.sql
 
t1=$(date +%s.%N);
date '+[%X] load into MyISAM w/o index'
$MYSQL -s < load_t1.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc
 
t1=$(date +%s.%N);
date '+[%X] indexing MyISAM'
$MYSQL -s < index_t1.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc
 
t1=$(date +%s.%N);
date '+[%X] load into InnoDB w/o index'
$MYSQL -s < load_t2.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc
 
t1=$(date +%s.%N);
date '+[%X] indexing InnoDB'
$MYSQL -s < index_t2.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc
 
$MYSQL < tables.sql
$MYSQL -s < load_t1.sql
$MYSQL -s < index_t1.sql
$MYSQL -s -e "delete from test.t1"
$MYSQL -s < index_t2.sql
 
t1=$(date +%s.%N);
date '+[%X] load into MyISAM w/ index'
$MYSQL -s < load_t1.sql
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc
 
t1=$(date +%s.%N);
date '+[%X] load into InnoDB w/ index'
$MYSQL -s < load_t2.sql 
t2=$(date +%s.%N);
echo -n "seconds: "; echo "$t2 - $t1" | bc

Comment by Axel Schwenke [ 2016-09-07 ]

mdev-10730.zip

Howto:

  1. unpack mariadb-10.0.25 source somewhere
  2. edit prepare.sh to point to the 10.0.25 source, run it
  3. start up the server with my.cnf.01 (you need to add datadir and socket)
  4. edit test.sh to use your socket. run it - should demonstrate broken innodb add index and broken loading into indexed myisam
Comment by sachin setiya (Inactive) [ 2016-09-07 ]

Hi Axel Schwenke
Actually I made some changes in gsoc_long_unique , now it is passing all the tests in

test.sh
╭─xyz@anom in ~/gsoc/mdev
╰$ ./test.sh
[09:04:54 PM] load into MyISAM w/o index
seconds: 42.279655724

[09:05:36 PM] indexing MyISAM
seconds: 10.086230505

[09:05:46 PM] load into InnoDB w/o index
seconds: 103.977850135

[09:07:30 PM] indexing InnoDB
seconds: 92.310543092

[09:09:05 PM] load into MyISAM w/ index
seconds: 81.708425681

[09:10:27 PM] load into InnoDB w/ index
seconds: 117.633465718

test2.sh

╭─xyz@anom in ~/gsoc/mdev
╰$ ./test2.sh
[09:12:38 PM] load into MyISAM w/o index
seconds: 48.508866006

[09:13:27 PM] indexing MyISAM
seconds: 26.044388930

[09:13:53 PM] load into InnoDB w/o index
seconds: 130.933864104

[09:16:04 PM] indexing InnoDB
seconds: 110.094988468

[09:18:52 PM] load into MyISAM w/ index
seconds: 73.351495860

[09:20:05 PM] load into InnoDB w/ index
seconds: 153.227781848

Comment by Axel Schwenke [ 2016-09-08 ]

@Sachin: confirmed. I cannot produce any false UNIQUE violations any more.

This is for loading 10 times 19905 files:

~/benchmark/MDEV-10730 $./test.sh 
[15:09:48] load into MyISAM w/o index
seconds: 31.668256020
 
[15:10:20] indexing MyISAM
seconds: 15.130124954
 
[15:10:35] load into InnoDB w/o index
seconds: 84.031919241
 
[15:11:59] indexing InnoDB
seconds: 68.915392555
 
[15:13:09] load into MyISAM w/ index
seconds: 46.187826893
 
[15:13:55] load into InnoDB w/ index
seconds: 88.617809105

I notice that adding the index to MyISAM takes rather long. Loading into indexed MyISAM is consistent performancewise - it need about the same time as loading + adding index.
For InnoDB there is much smaller impact of the UNIQUE index. 84s to 88s is much faster than 31s to 46s for MyISAM. Any idea why the impact is heavier on MyISAM?

Comment by sachin setiya (Inactive) [ 2016-09-08 ]

axel
I do not know, Currently I am busy working at where optimization.
Please give me 3 to 4 days, Then I will try to look for a solution.

Comment by Axel Schwenke [ 2017-08-10 ]

Re-Test with the implementation in MariaDb 10.3

Comment by Sergei Golubchik [ 2018-03-24 ]

akhilseshan, there's not much left to do here. We've chosen what implementation of MDEV-371 to use already. Besides, even if we weren't, this would be too little work for GSoC.

Generated at Thu Feb 08 07:44:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.