[MDEV-12610] MariaDB start is slow Created: 2017-04-27  Updated: 2020-08-25  Resolved: 2017-06-09

Status: Closed
Project: MariaDB Server
Component/s: Encryption, Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1.22, 10.1.23
Fix Version/s: 10.1.25

Type: Bug Priority: Major
Reporter: Darshit Gavhane Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 2
Labels: mariadb, mariadb-restart, slow

Issue Links:
Duplicate
is duplicated by MDEV-15333 MariaDB (still) slow start Closed
Problem/Incident
is caused by MDEV-12623 InnoDB: Failing assertion: kv == 0 ||... Closed
Relates
relates to MDEV-9931 InnoDB reads first page of every .ibd... Closed
relates to MDEV-21715 Allow loading buffer pool dumps for e... Open
relates to MDEV-10198 MariaDB restart is too slow Closed

 Description   

It takes about 10 mins to start, had seen the same issue earlier in 10.1.18 which was fixed but has reoccurred in 10.1.22

2017-04-27 15:48:10 139983869696032 [Warning] option 'general_log': boolean value 'xxxxx.log' wasn't recognized. Set to OFF.
2017-04-27 15:48:10 139983869696032 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release.
2017-04-27 15:48:10 139983869696032 [Note] /usr/libexec/mysqld (mysqld 10.1.22-MariaDB) starting as process 19851 ...
2017-04-27 15:48:12 7f5088d3d820 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
2017-04-27 15:48:12 139983869696032 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2017-04-27 15:48:12 139983869696032 [Note] InnoDB: The InnoDB memory heap is disabled
2017-04-27 15:48:12 139983869696032 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-04-27 15:48:12 139983869696032 [Note] InnoDB: GCC builtin __sync_synchronize() is used for memory barrier
2017-04-27 15:48:12 139983869696032 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-04-27 15:48:12 139983869696032 [Note] InnoDB: Using Linux native AIO
2017-04-27 15:48:12 139983869696032 [Note] InnoDB: Using SSE crc32 instructions
2017-04-27 15:48:12 139983869696032 [Note] InnoDB: Initializing buffer pool, size = 2.0G
2017-04-27 15:48:12 139983869696032 [Note] InnoDB: Completed initialization of buffer pool
2017-04-27 15:48:13 139983869696032 [Note] InnoDB: Highest supported file format is Barracuda.
2017-04-27 15:58:24 139983869696032 [Note] InnoDB: 128 rollback segment(s) are active.
2017-04-27 15:58:24 139983869696032 [Note] InnoDB: Waiting for purge to start
2017-04-27 15:58:24 139983869696032 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.35-80.0 started; log sequence number 657936019547
2017-04-27 15:58:25 139980747806464 [Note] InnoDB: Dumping buffer pool(s) not yet started
2017-04-27 15:58:25 139983869696032 [Note] Plugin 'FEEDBACK' is disabled.
2017-04-27 15:58:25 139983869696032 [Note] Server socket created on IP: '::'.
2017-04-27 15:58:25 139983869696032 [Note] /usr/libexec/mysqld: ready for connections.
Version: '10.1.22-MariaDB'  socket: 'xxxxx.sock'  port: 3306  MariaDB Server



 Comments   
Comment by Marko Mäkelä [ 2017-05-03 ]

If this problem is reproducible, please run "perf top" during the hang, or use http://poormansprofiler.org/ or similar to find out what InnoDB is spending its time on.

Comment by Valerii Kravchuk [ 2017-06-01 ]

We have the same problem with 10.1.23 when there are many tables/.ibd files (24000+)

Comment by Alex Boag-Munroe [ 2017-06-03 ]

I've got the same issue on a system with a few hundred thousand tables, takes 10 minutes or so between

InnoDB: Highest supported file format is Barracuda.
and
[Note] InnoDB: 128 rollback segment(s) are active.

Currently running a debug build while investigating another issue and the log is full of entries like:

2017-06-03 14:33:44 140090905216896 [Note] InnoDB: Created tablespace for space 3981695 name SCHEMANAME/TABLENAME key_id 0 encryption 0.

According to perf top:

  34.05%  mysqld                                   [.] Check::operator()                                                                                                   
  23.32%  mysqld                                   [.] ut_list_map<ut_list_base<fil_node_t>, Check>

I'll run perf top again when I've got a normal startup on standard binaries.

Comment by Alex Boag-Munroe [ 2017-06-03 ]

Starting up while not on a debug binary shows:

   4.71%  [.] fsp_header_get_crypt_offset                                                                                                                                                                                                                                     
   1.91%  [.] ut_crc32_sse42

Comment by Marko Mäkelä [ 2017-06-05 ]

MDEV-9931 reports a similar problem in the past.
This one is likely a regression caused by my MDEV-12623 fix. The function fil_open_single_table_tablespace() should avoid the call to fsp_flags_try_adjust() if the tablespace flags are known to be correct.

We can trivially avoid this call when the default page size is being used, and neither page_compression nor encryption is used. To avoid the start-up penalty in those cases, maybe we should check if the SYS_TABLESPACES.FLAGS are in the incompatible format that was introduced in MariaDB 10.1.0.

Comment by Marko Mäkelä [ 2017-06-05 ]

Ninpo, thank you for the "perf top" results! While the root cause of this slowness appears to be that the function fsp_flags_try_adjust() is being unconditionally invoked on every .ibd file on startup, the top function fsp_header_get_crypt_offset() for the non-debug binary is noteworthy. That the page checksum validation (ut_crc32_sse42) takes some resources is known.

I think that the function fsp_header_get_crypt_offset() is unacceptably complex and slow. In MariaDB 10.2, I replaced the function with fsp_header_get_encryption_offset(), which uses a simple arithmetic formula that returns the equivalent result as the loop. I think that we should port this change to 10.1.

Comment by Alex Boag-Munroe [ 2017-06-05 ]

Probably worth noting that page compression has been in use on these systems at some point, though no tables are using it currently.

innodb_compression_algorithm=zlib is set in my.cnf for when we have looked at it.

Unless I'm misunderstanding what you said about page compression

At any rate, I'm glad I could help. Our 700k table schema seems destined to hit/find edge cases in Maria.

Comment by Jan Lindström (Inactive) [ 2017-06-07 ]

https://github.com/MariaDB/server/commit/6fb92442e2e33d0cbe56395972aef50cc8a84f5d

Comment by Marko Mäkelä [ 2017-06-07 ]

Please address my review comments.
Specifically, I would rather not invoke fil_node_open_file() from other subsystems, and I would try to reduce the size of the patch by trying to avoid changes to internal APIs.

Comment by Jan Lindström (Inactive) [ 2017-06-08 ]

After review fixes:
https://github.com/MariaDB/server/commit/6d0cc8321b603f971f600c3d1b4162da56fc2c7f

Comment by Jan Lindström (Inactive) [ 2017-06-09 ]

Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Thu Jun 8 15:40:25 2017 +0300

MDEV-12610: MariaDB start is slow

Problem appears to be that the function fsp_flags_try_adjust()
is being unconditionally invoked on every .ibd file on startup.
Based on performance investigation also the top function
fsp_header_get_crypt_offset() needs to addressed.

Ported implementation of fsp_header_get_encryption_offset()
function from 10.2 to fsp_header_get_crypt_offset().

Introduced a new function fil_crypt_read_crypt_data()
to read page 0 if it is not yet read.

fil_crypt_find_space_to_rotate(): Now that page 0 for every .ibd
file is not read on startup we need to check has page 0 read
from space that we investigate for key rotation, if it is not read
we read it.

fil_space_crypt_get_status(): Now that page 0 for every .ibd
file is not read on startup here also we need to read page 0
if it is not yet read it. This is needed
as tests use IS query to wait until background encryption
or decryption has finished and this function is used to
produce results.

fil_crypt_thread(): Add is_stopping condition for tablespace
so that we do not rotate pages if usage of tablespace should
be stopped. This was needed for failure seen on regression
testing.

fil_space_create: Remove page_0_crypt_read and extra
unnecessary info output.

fil_open_single_table_tablespace(): We call fsp_flags_try_adjust
only when when no errors has happened and server was not started
on read only mode and tablespace validation was requested or
flags contain other table options except low order bits to
FSP_FLAGS_POS_PAGE_SSIZE position.

fil_space_t::page_0_crypt_read removed.

Added test case innodb-first-page-read to test startup when
encryption is on and when encryption is off to check that not
for all tables page 0 is read on startup.

Comment by Nathan Landis [ 2018-01-26 ]

Anyone else still seeing this in recent 10.2 versions? I'm testing an upgrade from 10.0.13 > 10.1.30 > 10.2.12 and seem to be encountering this still (~12min startup delay on ~70k tables that wasn't there before)

Comment by Daniel Black [ 2018-01-27 ]

Logs and configuration would help verify this.

Comment by Nathan Landis [ 2018-01-29 ]

Apologies! Log and config below from the latest version I tried (10.2.7). I'm using MySQL Sandbox for intermediate upgrade testing. I know I'm several releases behind on that, upgrading and retrying (or trying an instance without sandbox) is on my todo list.

180126 16:26:03 mysqld_safe Starting mysqld daemon with databases from /redacted/dir
2018-01-26 16:26:03 139690100025120 [Note] /redacted/10.2.7/bin/mysqld (mysqld 10.2.7-MariaDB) starting as process 20851 ...
2018-01-26 16:26:04 139690100025120 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-01-26 16:26:04 139690100025120 [Note] InnoDB: Uses event mutexes
2018-01-26 16:26:04 139690100025120 [Note] InnoDB: Compressed tables use zlib 1.2.3
2018-01-26 16:26:04 139690100025120 [Note] InnoDB: Using Linux native AIO
2018-01-26 16:26:04 139690100025120 [Note] InnoDB: Number of pools: 1
2018-01-26 16:26:04 139690100025120 [Note] InnoDB: Using SSE2 crc32 instructions
2018-01-26 16:26:04 139690100025120 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2018-01-26 16:26:04 139690100025120 [Note] InnoDB: Completed initialization of buffer pool
2018-01-26 16:26:04 139689372313344 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-01-26 16:26:04 139690100025120 [Note] InnoDB: Highest supported file format is Barracuda.
2018-01-26 16:37:23 139690100025120 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-01-26 16:37:23 139690100025120 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-01-26 16:37:23 139690100025120 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2018-01-26 16:37:23 139690100025120 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-01-26 16:37:23 139690100025120 [Note] InnoDB: Waiting for purge to start
2018-01-26 16:37:23 139690100025120 [Note] InnoDB: 5.7.18 started; log sequence number 243661604721
2018-01-26 16:37:23 139690100025120 [Note] Plugin 'FEEDBACK' is disabled.
2018-01-26 16:37:23 139690100025120 [Note] Server socket created on IP: '127.0.0.1'.
2018-01-26 16:37:24 139690100025120 [Note] Reading of all Master_info entries succeded
2018-01-26 16:37:24 139690100025120 [Note] Added new Master_info '' to hash table
2018-01-26 16:37:24 139690100025120 [Note] /redacted/10.2.7/bin/mysqld: ready for connections.
Version: '10.2.7-MariaDB' socket: '/tmp/mysql_sandbox10207.sock' port: 10207 MariaDB Server

---config:

[mysql]
prompt='mysql [\h]

{\u}

(\d) > '
#

[client]
user = redacted
password = redacted
port = 10207
socket = /tmp/mysql_sandbox10207.sock

[mysqld]
user = mysql
port = 10207
socket = /tmp/mysql_sandbox10207.sock
basedir = /redacted/10.2.7
datadir = /redacted/sandbox/shared-data #msb_10_2_7/data
tmpdir = /redacted/sandbox/msb_10_2_7/tmp
lower_case_table_names = 0
pid-file = /redacted/sandbox/msb_10_2_7/data/mysql_sandbox10207.pid
bind-address = 127.0.0.1

  1. log-slow-queries = /redacted/sandbox/msb_10_2_7/data/msandbox-slow.log
  2. log = /redacted/sandbox/msb_10_2_7/data/msandbox-general.log
    #
  3. additional options passed through 'my_clause'
    #
    log-error=/redacted/sandbox/msb_10_2_7/msandbox.err
    innodb_buffer_pool_load_at_startup = OFF
Comment by Marko Mäkelä [ 2018-01-29 ]

lu_nate, MDEV-13869 was introduced in MariaDB 10.2.2 and has not been fixed yet.

Comment by Nathan Landis [ 2018-01-29 ]

Ah, that is very helpful Marko, I missed seeing that issue. I will stop banging my head against in now . Thanks for taking the time to reply!

Comment by Marko Mäkelä [ 2018-02-16 ]

As noted in MDEV-15333, I failed to merge this fix fully to MariaDB 10.2.7. I will try to fix it by 10.2.14, and will write a test to cover this with some instrumentation.

Generated at Thu Feb 08 07:59:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.