[MDEV-13869] MariaDB slow start Created: 2017-09-22  Updated: 2019-03-09  Resolved: 2018-02-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.2, 10.3.0
Fix Version/s: 10.2.13, 10.3.5

Type: Bug Priority: Major
Reporter: Dan Rimal Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: performance, startup
Environment:

ovz container, 2.6.32-042stab120.6, CentOS release 6.7 (Final)


Issue Links:
Blocks
blocks MDEV-14481 Execute InnoDB crash recovery in the ... Closed
Relates
relates to MDEV-9843 InnoDB hangs on startup between "Inno... Closed
relates to MDEV-18867 Long Time to Stop and Start Closed
relates to MDEV-15333 MariaDB (still) slow start Closed

 Description   

This DB has about 30GB and a lot of tables in many databases:

[mysql]# find ./ -name "*.ibd" | wc -l
90219

Start takes more than 5 minutes with heavy IO load made by mysql process:

2017-09-22 10:07:27 140632756656096 [Note] InnoDB: Highest supported file format is Barracuda.
2017-09-22 10:13:05 140632756656096 [Note] InnoDB: 128 out of 128 rollback segments are active.

During the startup, strace shows this:

cat /root/strace | grep open
 
[pid 29937] open("./deploy_test_b2c15/data_sources.ibd", O_RDONLY) = 1363
[pid 29937] open("./deploy_test_b2c15/data_sources.ibd", O_RDWR) = 1363
[pid 29937] open("./deploy_test_b2c15/data_sources_backup.isl", O_RDWR) = -1 ENOENT (No such file or directory)
[pid 29937] open("./deploy_test_b2c15/data_sources_backup.ibd", O_RDONLY) = 4108
[pid 29937] open("./deploy_test_b2c15/data_sources_backup.ibd", O_RDONLY) = 1364
[pid 29937] open("./deploy_test_b2c15/data_sources_backup.ibd", O_RDWR) = 1364
[pid 29937] open("./deploy_test_b2c15/data_sources_brands.isl", O_RDWR) = -1 ENOENT (No such file or directory)
[pid 29937] open("./deploy_test_b2c15/data_sources_brands.ibd", O_RDONLY) = 4108
[pid 29937] open("./deploy_test_b2c15/data_sources_brands.ibd", O_RDONLY) = 1365
[pid 29937] open("./deploy_test_b2c15/data_sources_brands.ibd", O_RDWR) = 1365
[pid 29937] open("./deploy_test_b2c15/data_sources_categories.isl", O_RDWR) = -1 ENOENT (No such file or directory)
[pid 29937] open("./deploy_test_b2c15/data_sources_categories.ibd", O_RDONLY) = 4108
[pid 29937] open("./deploy_test_b2c15/data_sources_categories.ibd", O_RDONLY) = 1366
[pid 29937] open("./deploy_test_b2c15/data_sources_categories.ibd", O_RDWR) = 1366
[pid 29937] open("./deploy_test_b2c15/data_sources_categories_brands.isl", O_RDWR) = -1 ENOENT (No such file or directory)
[pid 29937] open("./deploy_test_b2c15/data_sources_categories_brands.ibd", O_RDONLY) = 4108
[pid 29937] open("./deploy_test_b2c15/data_sources_categories_brands.ibd", O_RDONLY) = 1367
[pid 29937] open("./deploy_test_b2c15/data_sources_categories_brands.ibd", O_RDWR) = 1367
[pid 29937] open("./deploy_test_b2c15/data_sources_img_queue.isl", O_RDWR) = -1 ENOENT (No such file or directory)
[pid 29937] open("./deploy_test_b2c15/data_sources_img_queue.ibd", O_RDONLY) = 4108
[pid 29937] open("./deploy_test_b2c15/data_sources_img_queue.ibd", O_RDONLY) = 1368
[pid 29937] open("./deploy_test_b2c15/data_sources_img_queue.ibd", O_RDWR) = 1368
[pid 29937] open("./deploy_test_b2c15/data_sources_methods.isl", O_RDWR) = -1 ENOENT (No such file or directory)
[pid 29937] open("./deploy_test_b2c15/data_sources_methods.ibd", O_RDONLY) = 4108
[pid 29937] open("./deploy_test_b2c15/data_sources_methods.ibd", O_RDONLY) = 1369
[pid 29937] open("./deploy_test_b2c15/data_sources_methods.ibd", O_RDWR) = 1369
[pid 29937] open("./deploy_test_b2c15/default_delivery_address.isl", O_RDWR) = -1 ENOENT (No such file or directory)
[pid 29937] open("./deploy_test_b2c15/default_delivery_address.ibd", O_RDONLY) = 4108
[pid 29937] open("./deploy_test_b2c15/default_delivery_address.ibd", O_RDONLY) = 1370
[pid 29937] open("./deploy_test_b2c15/default_delivery_address.ibd", O_RDWR) = 1370??

Full strace:

[pid 29937] open("./demo_b2c15/novinky4_kategorie_priv_write.ibd", O_RDONLY) = 3810
[pid 29937] lseek(3810, 0, SEEK_CUR)    = 0
[pid 29937] lseek(3810, 0, SEEK_END)    = 131072
[pid 29937] lseek(3810, 0, SEEK_SET)    = 0
[pid 29937] pread(3810, "\343t\2[\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\351\31;\355\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
[pid 29937] close(3810)                 = 0
[pid 29937] open("./demo_b2c15/novinky4_kategorie_priv_write.ibd", O_RDWR) = 3810
[pid 29937] fcntl(3810, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid 29937] pread(3810, "\343t\2[\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\351\31;\355\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
[pid 29937] open("./demo_b2c15/novinky4_soubory.isl", O_RDWR) = -1 ENOENT (No such file or directory)
[pid 29937] open("./demo_b2c15/novinky4_soubory.ibd", O_RDONLY) = 4108
[pid 29937] fstat(4108, {st_mode=S_IFREG|0660, st_size=114688, ...}) = 0
[pid 29937] pread(4108, "W\352G\231\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\351\31`/\0\10\0\0\0\0\0\0"..., 65536, 0) = 65536
[pid 29937] close(4108)                 = 0
[pid 29937] close(3811)                 = 0
[pid 29937] open("./demo_b2c15/novinky4_soubory.ibd", O_RDONLY) = 3811
[pid 29937] lseek(3811, 0, SEEK_CUR)    = 0
[pid 29937] lseek(3811, 0, SEEK_END)    = 114688
[pid 29937] lseek(3811, 0, SEEK_SET)    = 0
[pid 29937] pread(3811, "W\352G\231\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\351\31`/\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
[pid 29937] close(3811)                 = 0
[pid 29937] open("./demo_b2c15/novinky4_soubory.ibd", O_RDWR) = 3811
[pid 29937] fcntl(3811, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid 29937] pread(3811, "W\352G\231\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\351\31`/\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
[pid 29937] open("./demo_b2c15/novinky5.isl", O_RDWR) = -1 ENOENT (No such file or directory)
[pid 29937] open("./demo_b2c15/novinky5.ibd", O_RDONLY) = 4108
[pid 29937] fstat(4108, {st_mode=S_IFREG|0660, st_size=212992, ...}) = 0
[pid 29937] pread(4108, "\6\242(\27\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\351\32\10\352\0\10\0\0\0\0\0\0"..., 65536, 0) = 65536
[pid 29937] close(4108)                 = 0
[pid 29937] close(3812)                 = 0
[pid 29937] open("./demo_b2c15/novinky5.ibd", O_RDONLY) = 3812
[pid 29937] lseek(3812, 0, SEEK_CUR)    = 0
[pid 29937] lseek(3812, 0, SEEK_END)    = 212992
[pid 29937] lseek(3812, 0, SEEK_SET)    = 0
[pid 29937] pread(3812, "\6\242(\27\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\351\32\10\352\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
[pid 29937] close(3812)                 = 0
[pid 29937] open("./demo_b2c15/novinky5.ibd", O_RDWR) = 3812
[pid 29937] fcntl(3812, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid 29937] pread(3812, "\6\242(\27\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\351\32\10\352\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
[pid 29937] open("./demo_b2c15/novinky5_formulare.isl", O_RDWR) = -1 ENOENT (No such file or directory)
[pid 29937] open("./demo_b2c15/novinky5_formulare.ibd", O_RDONLY) = 4108
[pid 29937] fstat(4108, {st_mode=S_IFREG|0660, st_size=114688, ...}) = 0
[pid 29937] pread(4108, "\253\270\202\216\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\351\32\323)\0\10\0\0\0\0\0\0"..., 65536, 0) = 65536
[pid 29937] close(4108)                 = 0
[pid 29937] close(3813)                 = 0

I found the very same problem (https://jira.mariadb.org/browse/MDEV-12610) closed as solved in 10.2.7. This is 10.2.8 and looks like problem appears again.

This db was 10.0.xx upgraded to 10.1.xx and finally to 10.2.8. Can make this upgrade this issue?



 Comments   
Comment by Marko Mäkelä [ 2018-01-16 ]

In MariaDB 10.2.2, we seem to always open all .ibd files on startup, even when no crash recovery is involved. (When crash recovery is involved, we do this on purpose, as noted in MDEV-9843.)

This mistake was made in the merge of MySQL 5.7.9 into MariaDB 10.2.2. The assignment validate=true in dict_check_sys_tables() causes InnoDB to read every single .ibd file at startup, even when no crash recovery is needed.

Comment by Marko Mäkelä [ 2018-01-17 ]

As far as I can tell, the purpose of the parameter bool validate of fil_ibd_open() is to perform some extra checks in case any inconsistencies were caused by the implementation of DATA DIRECTORY, which was originally introduced in MySQL 5.6 and MariaDB 10.0.

diff --git a/storage/innobase/dict/dict0load.cc b/storage/innobase/dict/dict0load.cc
index 15e14b9d9f7..725862944bc 100644
--- a/storage/innobase/dict/dict0load.cc
+++ b/storage/innobase/dict/dict0load.cc
@@ -1471,8 +1471,6 @@ dict_check_sys_tables(
 		char*	filepath = dict_get_first_path(space_id);
 
 		/* Check that the .ibd file exists. */
-		validate = true; /* Encryption */
-
 		dberr_t	err = fil_ibd_open(
 			validate,
 			!srv_read_only_mode && srv_log_file_size != 0,

With the above patch, we would skip reading the files in fil_ibd_open() at database startup, but then we would also fail to detect some corruption, because the tests innodb.table_flags and innodb.log_file_name would start to fail.

I think that we must try to implement some checks in fil_node_open_file() when first_time_open holds, so that we would always validate the data file when we open it for the very first time. That is a little tricky, because we will not have the dict_table_t available in that context.

Comment by Michael Caplan [ 2018-01-31 ]

Not a work around, but if you are hitting this issue and on a systemd, you might need to tweak the service settings. If TimeoutStartSec is hit when systemd is trying to start up mariadb, it will kill the process.

/etc/systemd/system/mariadb.service.d/tweak.conf

[Service]
TimeoutStartSec=7200

My MariaDB startup time is now weighing in at 36 minutes, which by default would hit the TimeoutStartSec threshold making startup impossible.

Even at 7200 seconds (2 hours), I got into trouble when upgrading from 10.2.11 to 10.2.12. The update process has another tax on restart (which I assume is similar to this recorded issue), that had the restart on upgrade hitting TimeoutStartSec. Not sure what would be an acceptable value for TimeoutStartSec.

Comment by Michael Caplan [ 2018-01-31 ]

@marko are there any workarounds for this issue for Maria 10.2? Or is a rollback to 10.1 the best bet?

Comment by Michael Caplan [ 2018-01-31 ]

FYI, my 36 minute start time is for a DB roughly 290GB and roughly 991K tables

Comment by Daniel Black [ 2018-02-01 ]

michaelcaplan FYI - working on getting co-operation between mariadb and systemd regarding startup/shutdown timings. https://github.com/MariaDB/server/pull/576 . Unfortunately requires systemd v236. So I'm really watching this to make sure I have some idea of which slow points in startup/shutdown can't be prevented.

Comment by Marko Mäkelä [ 2018-02-16 ]

Sorry, but it turns out that there is one more cause of reading all .ibd files at startup. That one was added in MySQL 5.7, which was first merged to MariaDB 10.2.2. See MDEV-15333.

Generated at Thu Feb 08 08:09:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.