[MDEV-5158] ALTER TABLE ... ENGINE=INNODB in one database blocks ALTER TABLE in other databases Created: 2013-10-19  Updated: 2019-12-14  Resolved: 2019-12-14

Status: Closed
Project: MariaDB Server
Component/s: wsrep
Fix Version/s: N/A

Type: Task Priority: Minor
Reporter: Jeff Armstrong Assignee: Jan Lindström (Inactive)
Resolution: Won't Fix Votes: 1
Labels: galera


 Description   

I have 400 databases on my server. If I issue an ALTER TABLE ... ENGINE=INNODB on different tables in different databases at the same time, the updates are serialised - i.e only one ALTER TABLE is executed on the server at a time.

All other ALTER TABLE on unrelated tables in unrelated databases block with a status of "checking permissions".

I have about 28,000 tables / 400 databases / 250G of MyISAM. This serialisation means that it takes 6 hours to perform a MyISAM to INNODB upgrade. The server load and iostats indicate that the server is almost idle this entire time.



 Comments   
Comment by Elena Stepanova [ 2013-10-21 ]

Hi Jeff,

Could you please provide the cnf file and structures (and the number of rows) for two tables which get serialized this way? I don't observe it on a primitive example, so it must be something specific either for the tables or for configuration.

CONNECTION 1 (altering a big table):
------------------------------
 
MariaDB [test]> use db1;
Database changed
MariaDB [db1]> create table t1 (i int) engine=MyISAM;
Query OK, 0 rows affected (0.28 sec)
 
MariaDB [db1]> insert into t1 values (1),(2),(3),(4);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0
 
MariaDB [db1]> insert into t1 select * from t1;
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0
 
...
 
MariaDB [db1]> insert into t1 select * from t1;
Query OK, 2097152 rows affected (48.55 sec)
Records: 2097152  Duplicates: 0  Warnings: 0
 
MariaDB [db1]> select now(); alter table t1 engine=InnoDB; select now();
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:32:45 |
+---------------------+
1 row in set (0.01 sec)
 
Query OK, 4194304 rows affected (11 min 52.97 sec)     
Records: 4194304  Duplicates: 0  Warnings: 0
 
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:44:38 |
+---------------------+
1 row in set (0.00 sec)
 
 
CONNECTION 2 (altering a small table many times):
--------------------------------------------------------------------------
 
MariaDB [test]> use db2;
Database changed
MariaDB [db2]> create table t2 (i int) engine=MyISAM;
Query OK, 0 rows affected (0.26 sec)
 
MariaDB [db2]> insert into t2 values (1),(2),(3),(4);
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0
 
MariaDB [db2]> select now(); alter table t2 engine=InnoDB; select now();
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:32:46 |
+---------------------+
1 row in set (0.00 sec)
 
Query OK, 4 rows affected (0.38 sec)               
Records: 4  Duplicates: 0  Warnings: 0
 
.............
 
MariaDB [db2]> select now(); alter table t2 engine=MyISAM; select now();
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:35:33 |
+---------------------+
1 row in set (0.00 sec)
 
Query OK, 4 rows affected (0.33 sec)               
Records: 4  Duplicates: 0  Warnings: 0
 
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:35:34 |
+---------------------+
1 row in set (0.00 sec)
 
MariaDB [db2]> select now(); alter table t2 engine=InnoDB; select now();
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:35:37 |
+---------------------+
1 row in set (0.00 sec)
 
Query OK, 4 rows affected (0.39 sec)               
Records: 4  Duplicates: 0  Warnings: 0
 
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:35:37 |
+---------------------+
1 row in set (0.00 sec)
 
.............................
 
MariaDB [db2]> select now(); alter table t2 engine=MyISAM; select now();
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:44:00 |
+---------------------+
1 row in set (0.00 sec)
 
Query OK, 4 rows affected (0.50 sec)               
Records: 4  Duplicates: 0  Warnings: 0
 
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:44:01 |
+---------------------+
1 row in set (0.00 sec)
 
MariaDB [db2]> select now(); alter table t2 engine=InnoDB; select now();
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:44:17 |
+---------------------+
1 row in set (0.00 sec)
 
Query OK, 4 rows affected (0.29 sec)               
Records: 4  Duplicates: 0  Warnings: 0
 
+---------------------+
| now()               |
+---------------------+
| 2013-10-21 10:44:18 |
+---------------------+
1 row in set (0.00 sec)
 
...............

(it's just a small fraction of the output, i was able to alter the table dozens of times, and every time it happened within a second or so.

Comment by Elena Stepanova [ 2013-10-21 ]

Sorry I didn't notice earlier that you were using MariaDB Galera, it might be important. So, does it happen when the server is running in standalone mode, or do you have an actual cluster?

Comment by Jeff Armstrong [ 2013-10-22 ]

I have a two node cluster configured, but only one node is actually up for the ALTER TABLE session. Node2 was closed down cleanly. Once the ALTER TABLE has completed, I zap the galera state on node2 and it comes up after SST.

I could try wsrep_on = OFF if you think that might be a valid test case? As this is a server setting, I would have to do this on the weekend as the cluster is active and used throughout the day. I will create a test based on your SQL and let you know how I get on in the next two days.

Regards
Jeff

Comment by Jeff Armstrong [ 2013-10-22 ]

Configuration: In order from three files:
port = 3306
socket = /var/run/mysqld/mysqld.sock
pid-file = /var/run/mysqld/mysqld.pid
user = mysql
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages = en_US
skip-external-locking
bind-address = 0.0.0.0
max_connections = 10000
table_open_cache = 50000
max_allowed_packet = 512M
group_concat_max_len = 50M
ignore_db_dirs = lost+found
interactive_timeout = 57600
wait_timeout = 57600
key_buffer_size = 512M
connect_timeout = 5
thread_cache_size = 128
sort_buffer_size = 4M
bulk_insert_buffer_size = 16M
tmp_table_size = 1G
max_heap_table_size = 1G
myisam_recover_options = BACKUP
myisam_sort_buffer_size = 512M
concurrent_insert = 2
read_buffer_size = 2M
read_rnd_buffer_size = 1M
query_cache_limit = 128K
query_cache_size = 64M
log_warnings = 2
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow_query.log
log_queries_not_using_indexes = 1
long_query_time = 10
log_slow_verbosity = query_plan
default_storage_engine = InnoDB
innodb_file_per_table = 1

binlog_cache_size = 1G
expire_logs_days = 10
innodb_adaptive_flushing_method = keep_average
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 8
innodb_buffer_pool_size = 100G
innodb_flush_log_at_trx_commit = 2
innodb_flush_neighbor_pages = none
innodb_io_capacity = 30000
innodb_lock_wait_timeout = 50
innodb_locks_unsafe_for_binlog = 1
innodb_log_block_size = 4096
innodb_log_buffer_size = 500M
innodb_log_file_size = 2G
innodb_log_files_in_group = 10
innodb_open_files = 8000
innodb_read_io_threads = 8
innodb_spin_wait_delay = 0
innodb_thread_concurrency = 4
innodb_write_io_threads = 8
max_binlog_size = 1G
max_heap_table_size = 1G
tmp_table_size = 1G
transaction-isolation = READ-COMMITTED

query_cache_type = 0
query_cache_size = 0
binlog_format = ROW
wsrep_provider_options = "gcache.size=400G; gcache.dir=/var/trx/galera"
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_sst_method = xtrabackup
wsrep_sst_auth = "XXXX:XXXX"
wsrep_certify_nonPK = 1
wsrep_convert_LOCK_to_trx = 0
wsrep_auto_increment_control = 1
wsrep_causal_reads = 0
wsrep_slave_threads = 16
wsrep_max_ws_rows = 500000
wsrep_max_ws_size = 4294967296

Comment by Elena Stepanova [ 2013-10-22 ]

It seems to be indeed related to wsrep. As soon as I enabled wsrep_provider / wsrep_cluster_address on a single-node setup, I got exactly the same blocking as described, even with the primitive two-table test.

Assigning to Seppo to confirm and say whether it's intentional.

Comment by Nirbhay Choubey (Inactive) [ 2013-12-02 ]

https://bugs.launchpad.net/galera/+bug/1257069

Comment by Aleksey Sanin (Inactive) [ 2013-12-03 ]

http://www.codership.com/wiki/doku.php?id=faq#qcluster_stalls_when_running_alter_on_a_table_which_is_not_used

Comment by Jan Lindström (Inactive) [ 2013-12-05 ]

Removing target because this is current limitation.

Generated at Thu Feb 08 07:02:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.