[MDEV-13653] ADD / DROP COLUMN is very slow on 10.3 debug build Created: 2017-08-25  Updated: 2017-11-17  Resolved: 2017-11-17

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Affects Version/s: 10.3
Fix Version/s: 10.3.3

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None


 Description   

ADD / DROP column on 10.3 debug build takes 5-10 seconds on a small enough table (e.g. 2000 rows) having a blob field. On 10.2 debug it only takes about 1 second.

NOTE: The test case is for reproducing only, don't put it into the regression test suite!

--source include/have_innodb.inc
 
--let $num = 2000
 
CREATE TABLE `t1` (
  `col1` int(11) NOT NULL,
  `col2` int(11) DEFAULT NULL,
  `col3` int(11) DEFAULT NULL,
  `col4` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
;
 
BEGIN;
--disable_query_log
--echo # Inserting $num rows...
while ($num)
{
  INSERT INTO t1 (col1,col2,col3,col4) VALUES (3,3,3,REPEAT('a',10000));
  --dec $num
}
--enable_query_log
COMMIT;
 
select now(6);
ALTER TABLE t1 ADD COLUMN extra INT;
select now(6);
ALTER TABLE t1 DROP COLUMN extra;
select now(6);
 
DROP TABLE t1;

10.3 578b2b05b8f

2017-08-26 02:06:06.273911
ALTER TABLE t1 ADD COLUMN extra INT;
select now(6);
now(6)
2017-08-26 02:06:15.255533
ALTER TABLE t1 DROP COLUMN extra;
select now(6);
now(6)
2017-08-26 02:06:20.596070

10.2 a544225d0a7

2017-08-26 02:08:49.037541
ALTER TABLE t1 ADD COLUMN extra INT;
select now(6);
now(6)
2017-08-26 02:08:50.230408
ALTER TABLE t1 DROP COLUMN extra;
select now(6);
now(6)
2017-08-26 02:08:51.368844



 Comments   
Comment by Marko Mäkelä [ 2017-08-28 ]

Both the ADD COLUMN and DROP COLUMN consume about 3.05 seconds on my system (10.2) when the server is built -DWITH_ASAN:BOOL=ON
Without AddressSanitizer, the time drops to about 1.52 seconds (total test time: 5.171s). The CPU utilization is about 130% during the ADD/DROP COLUMN.

With 10.3 debug and without AddressSanitizer, I can repeat the slow speed. The ADD COLUMN took 9.44 seconds and the DROP COLUMN took 6.96 seconds. The CPU utilization varies between 260% to 300% during the ADD COLUMN and stays around 110% during the DROP COLUMN.

The high CPU utilization during ADD COLUMN could be a sign of a mutex contention. "perf record -g" caught only one caller of ut_delay(), which is what InnoDB uses during mutex spins:

               - 1,41% row_purge_record_func
                  - 1,41% row_purge_reset_trx_id
                     - 1,41% row_purge_reposition_pcur
                          row_search_on_row_ref
                          btr_pcur_open_low
                        - btr_cur_search_to_nth_level
                           - 1,41% buf_page_get_gen
                              - 1,41% pfs_rw_lock_x_lock_func
                                   rw_lock_x_lock_func
                                   ut_delay

As hinted by the 1.41% above, modifying the test to wait all the DB_TRX_ID for the INSERTs to be reset (wait_all_purged.inc) did not have a significant performance impact. So, I think that that part of MDEV-12288/MDEV-13536 should be innocent.

Comment by Marko Mäkelä [ 2017-08-28 ]

Because I feared that this performance regression could somehow have been caused by MDEV-12288 or MDEV-13536, I tested the last 10.3 revision immediately before MDEV-12288, and various revisions between that and MDEV-13536. The performance regression was not present in any of these revisions: the test with debug build consistently completed in 5.2 or 5.3 seconds.

Finally, I narrowed the culprit to between these two:
commit 536215e32fc43aa423684e9807640dcf3453924b Added DBUG_ASSERT_AS_PRINTF compile flag
commit f753480c726c5c8137cb2f87cc3bc71e8d535098 Fixed assert when running rpl.rpl_parallel_partition

The first commit makes InnoDB debug assertions slightly slower by invoking _db_flush() in DBUG_ASSERT_SLOW(), but that is only increasing the test time to 5.5 seconds (almost 10%). The big regression must be caused by something else, and likely outside InnoDB code. The only commit in that above range that affects innoDB is Changed KEY names to use LEX_CSTRING, which should not affect the DML or ha_innobase::inplace_alter_table() code paths.

Comment by Michael Widenius [ 2017-11-17 ]

Problem was that db_flush() that was always done as part of ASSERT testing and db_flush did become a bit slower in on of my pushes to 10.2 to make dbug multi thread safe.

Fixed by Serg in end of August in 10.2 and merged some time ago in 10.3

I did run a test on 10.3 debug build and the time for the drop table is 1.5 seconds on my laptop. Running same test on 10.1, without any of the related patches, the time for the drop column is 5 seconds (this is because we still do a db_flush() in 10.1)

Comment by Michael Widenius [ 2017-11-17 ]

Fixed by recent merge from 10.2

Generated at Thu Feb 08 08:07:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.