[MDEV-13612] Startup takes longer than 5 - 50 minutes after update from MariaDB 5.5.x to 10.2.6 Created: 2017-08-22  Updated: 2018-09-20  Resolved: 2017-11-22

Status: Closed
Project: MariaDB Server
Component/s: Platform Windows, Server, Storage Engine - InnoDB
Affects Version/s: 10.2.6
Fix Version/s: 10.2.10

Type: Bug Priority: Major
Reporter: Michael Graf Assignee: Vladislav Vaintroub
Resolution: Duplicate Votes: 0
Labels: need_feedback
Environment:

Windows Server 2012 R2 Standard


Attachments: Zip Archive Minidumps.zip     File QSZIS5B.err     File QSZIS5M.err     File my.ini     File my.ini     Text File show global variables.txt    
Issue Links:
Relates
relates to MDEV-13941 Innodb/Windows, 10.2 : High NTFS fra... Closed

 Description   

After the update from MariaDB 5.5.x to 10.2.6 the startup of a mirrored system takes now 5 - 50 minutes - with MariaDB 5.5.x the startup took not longer than 10 - 30s.
The MariaDB server was stopped always regularly - so no recovery action is necessary.
The cause for this behavior seems to be the page_cleaner.
Here is an extraction from the MariaDB errorlog:

2017-08-21 13:42:00 7676 [Note] InnoDB: Highest supported file format is Barracuda.
2017-08-21 14:32:36 7676 [Note] InnoDB: 128 out of 128 rollback segments are active.
2017-08-21 14:32:36 7676 [Note] InnoDB: Creating shared tablespace for temporary tables
2017-08-21 14:32:36 7676 [Note] InnoDB: Setting file '.\ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2017-08-21 14:32:36 7676 [Note] InnoDB: File '.\ibtmp1' size is now 12 MB.
2017-08-21 14:32:36 7676 [Note] InnoDB: Waiting for purge to start
2017-08-21 14:32:36 7676 [Note] InnoDB: 5.7.14 started; log sequence number 9834561730508
2017-08-21 14:32:36 19676 [Note] InnoDB: page_cleaner: 1000ms intended loop took 3036315ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)



 Comments   
Comment by Elena Stepanova [ 2017-08-24 ]

A similar bug MDEV-12610 was fixed in 10.2.7 (the quoted output there does not have a page_cleaner message, but 10.1 does not have this logging; the important part is the delay between InnoDB: Highest supported file format is Barracuda and the next line.

Could you please upgrade to a recent 10.2 version and check if it makes the problem go away?

Comment by Michael Graf [ 2017-08-25 ]

MariaDB 10.2.8 shows the same startup behavior like 10.2.6.

We went back to 5.5.x to ensure that not the hardware or the OS is the source of this slow startup behavior.
But with 5.5.x the startup times are between 10 - 30s.
So we installed the newest MariaDB 10.2.8 and again the startup time is between 5 - 50 minutes.

Comment by Michael Graf [ 2017-08-25 ]

I must amend that the slow startup behavior can always be reproduced at the mirrored system.

This means that when MariaDB 10.2.8 is started at the main system, stopped regularly and is then started at the backup system we can always reproduce this slow startup behavior.
The startup at the main system is almost as fast as with MariaDB 5.5.x and it was until now only once time that it needed 30 minutes for the startup.

So what system dependent information is stored in MariaDB 10.2.6 that the startup at the backup system is always between 5 - 50 minutes ?

Comment by Elena Stepanova [ 2017-08-25 ]

migra, what is the "main system" and what is the "backup system"? What's the difference between them in terms of hardware and OS? What is mirrored, is it data only, or MariaDB server settings as well?

Comment by Michael Graf [ 2017-08-30 ]

The main system is the system which is used normally to run MariaDB.
When MariaDB is running all changed data from the directory where MariaDB is running (e.g. d:\mariadb**) is mirrored by a 3rd party product to the backup system.
In case of maintenance or a hardware failure of the main system MariaDB is started on the backup system.
There is no difference in hardware or OS between the main system and the backup system.

When we recognized that MariaDB needs longer than 5-50 minutes to start on the backup system we created a checksum on the directory d:\mariadb**.
There was no difference.
So the 3rd party product works correctly.

Meanwhile we are went back to MariaDB 5.5.x and the backup system now starts again within 10-30s.

Comment by Michael Graf [ 2017-09-13 ]

Do you need any further help from us to understand the problem ?

Comment by Elena Stepanova [ 2017-10-01 ]

migra,

While the server is hanging, can you produce a process dump from it?
The attached global variables are still from 10.2.6, which had the bug MDEV-12610, could you please produce the similar output from 10.2.8 and attach it along with the full error log, from the beginning of server startup and till the end, and the process dump?

Thanks.

Comment by Michael Graf [ 2017-10-06 ]

We reproduced the tests with MariaDB 10.2.9.
MariaDB was started on QSZIS5M and stopped regularly.
All changes were mirrored to QSZIS5B.
When MariaDB was started on QSZIS5B the start took 30 minutes instead of 10-30s.
We created 6 Minidumps during the slow start on QSZIS5B.

Minidumps.zip my.ini QSZIS5B.err QSZIS5M.err

Comment by Elena Stepanova [ 2017-10-06 ]

Thank you.
Here are stack traces from dumps #1 and #5 (I suppose the others are similar). Only one thread is doing something meaningful.

Dump #1

 1      [External Code]
 2      mysqld.exe!SyncFileIO::execute(const IORequest & request={...})
 3      mysqld.exe!os_file_io(const IORequest & in_type, void * file=0x0000000000000124, void * buf, unsigned __int64 n=16384, unsigned __int64 offset=2211266560, dberr_t * err=0x000000d449ba8040)
 4      mysqld.exe!os_file_pread(const IORequest & type, void * file, void * buf, unsigned __int64 n, unsigned __int64 offset=2211266560, dberr_t * err=0x000000d449ba8040)
 5      mysqld.exe!os_file_read_page(const IORequest & type={...}, void * file=0x0000000000000124, void * buf=0x000000d452b40000, unsigned __int64 offset=2211266560, unsigned __int64 n=16384, unsigned __int64 * o=0x0000000000000000, bool exit_on_err)
 6      mysqld.exe!os_file_read_func(const IORequest & type, void * file, void * buf, unsigned __int64 offset, unsigned __int64 n=16384)
 7      mysqld.exe!fil_io(const IORequest & type, bool sync=true, const page_id_t & page_id, const page_size_t & page_size, unsigned __int64 byte_offset=0, unsigned __int64 len=16384, void * buf=0x000000d452b40000, void * message=0x000000d452300d00, bool ignore_missing_space=false)
 8      mysqld.exe!buf_read_page_low(dberr_t * err=0x000000d449ba8640, bool sync, unsigned __int64 type=0, unsigned __int64 mode, const page_id_t & page_id={...}, const page_size_t & page_size, bool unzip=false, bool ignore_missing_space=false)
 9      mysqld.exe!buf_read_page(const page_id_t & page_id={...}, const page_size_t & page_size)
 10     mysqld.exe!buf_page_get_gen(const page_id_t & page_id={...}, const page_size_t & page_size, unsigned __int64 rw_latch=1, buf_block_t * guess=0x0000000000000000, unsigned __int64 mode=10, const char * file=0x00007ff6eb2e1f88, unsigned int line=445, mtr_t * mtr=0x000000d449ba8d60, dberr_t * err=0x000000d449ba89e0)
 11     mysqld.exe!btr_pcur_move_to_next_page(btr_pcur_t * cursor=0x000000d449ba8c60, mtr_t * mtr=0x000000d449ba8d60)
 12     mysqld.exe!dict_getnext_system_low(btr_pcur_t * pcur=0x000000d449ba8c60, mtr_t * mtr=0x000000d449ba8d60)
 13     mysqld.exe!dict_check_sys_tables(bool validate=false)
 14     mysqld.exe!dict_check_tablespaces_and_store_max_id(bool validate)
 15     mysqld.exe!innobase_start_or_create_for_mysql()
 16     mysqld.exe!innobase_init(void * p)
 17     mysqld.exe!ha_initialize_handlerton(st_plugin_int * plugin=0x000000d44925c388)
 18     mysqld.exe!plugin_initialize(st_mem_root * tmp_root=0x000000d449baf488, st_plugin_int * plugin=0x000000d44925c388, int * argc=0x00007ff6eb84bfc4, char * * argv, bool options_only=false)
 19     mysqld.exe!plugin_init(int * argc=0x00007ff6eb84bfc4, char * * argv=0x000000d4491e5888, int flags=2)
 20     mysqld.exe!init_server_components()
 21     mysqld.exe!win_main(int argc=39, char * * argv=0x000000d4491e5888)
 22     mysqld.exe!mysql_service(void * p=0x000000d400000000)
 23     mysqld.exe!thread_start<void (__cdecl*)(void * __ptr64)>(void * const parameter=0x00007ff6eb123648)
 24     [External Code]

Dump #5

 2      mysqld.exe!SyncFileIO::execute(const IORequest & request={...})
 3      mysqld.exe!os_file_io(const IORequest & in_type, void * file=0x0000000000000124, void * buf, unsigned __int64 n=16384, unsigned __int64 offset=4900323328, dberr_t * err=0x000000d449ba8150)
 4      mysqld.exe!os_file_pread(const IORequest & type, void * file, void * buf, unsigned __int64 n, unsigned __int64 offset=4900323328, dberr_t * err=0x000000d449ba8150)
 5      mysqld.exe!os_file_read_page(const IORequest & type={...}, void * file=0x0000000000000124, void * buf=0x000000d947020000, unsigned __int64 offset=4900323328, unsigned __int64 n=16384, unsigned __int64 * o=0x0000000000000000, bool exit_on_err)
 6      mysqld.exe!os_file_read_func(const IORequest & type, void * file, void * buf, unsigned __int64 offset, unsigned __int64 n=16384)
 7      mysqld.exe!fil_io(const IORequest & type, bool sync=true, const page_id_t & page_id, const page_size_t & page_size, unsigned __int64 byte_offset=0, unsigned __int64 len=16384, void * buf=0x000000d947020000, void * message=0x000000d94637cf00, bool ignore_missing_space=false)
 8      mysqld.exe!buf_read_page_low(dberr_t * err=0x000000d449ba8750, bool sync, unsigned __int64 type=0, unsigned __int64 mode, const page_id_t & page_id={...}, const page_size_t & page_size, bool unzip=false, bool ignore_missing_space=false)
 9      mysqld.exe!buf_read_page(const page_id_t & page_id={...}, const page_size_t & page_size)
 10     mysqld.exe!buf_page_get_gen(const page_id_t & page_id={...}, const page_size_t & page_size, unsigned __int64 rw_latch=1, buf_block_t * guess=0x0000000000000000, unsigned __int64 mode=10, const char * file=0x00007ff6eb2e1f88, unsigned int line=445, mtr_t * mtr=0x000000d449ba8c80, dberr_t * err=0x000000d449ba8af0)
 11     mysqld.exe!btr_pcur_move_to_next_page(btr_pcur_t * cursor=0x000000d6ac11ed80, mtr_t * mtr=0x000000d449ba8c80)
 12     mysqld.exe!row_sel(sel_node_t * node=0x000000d6ac11e740, que_thr_t * thr=0x000000d6ac120960)
 13     mysqld.exe!row_sel_step(que_thr_t * thr=0x000000d6ac120960)
 14     mysqld.exe!que_thr_step(que_thr_t * thr)
 15     mysqld.exe!que_run_threads_low(que_thr_t * thr=0x000000d6ac120960)
 16     mysqld.exe!que_run_threads(que_thr_t * thr=0x000000d6ac120960)
 17     mysqld.exe!que_eval_sql(pars_info_t * info=0x0000000000000000, const char * sql, unsigned __int64 reserve_dict_mutex=0, trx_t * trx=0x000000d451ede708)
 18     mysqld.exe!row_merge_drop_temp_indexes()
 19     mysqld.exe!recv_recovery_rollback_active()
 20     mysqld.exe!innobase_start_or_create_for_mysql()
 21     mysqld.exe!innobase_init(void * p)
 22     mysqld.exe!ha_initialize_handlerton(st_plugin_int * plugin=0x000000d44925c388)
 23     mysqld.exe!plugin_initialize(st_mem_root * tmp_root=0x000000d449baf488, st_plugin_int * plugin=0x000000d44925c388, int * argc=0x00007ff6eb84bfc4, char * * argv, bool options_only=false)
 24     mysqld.exe!plugin_init(int * argc=0x00007ff6eb84bfc4, char * * argv=0x000000d4491e5888, int flags=2)
 25     mysqld.exe!init_server_components()
 26     mysqld.exe!win_main(int argc=39, char * * argv=0x000000d4491e5888)
 27     mysqld.exe!mysql_service(void * p=0x000000d400000000)
 28     mysqld.exe!thread_start<void (__cdecl*)(void * __ptr64)>(void * const parameter=0x00007ff6eb123648)

Comment by Vladislav Vaintroub [ 2017-10-06 ]

migra so you have 2 identical systems, system1 and system 2. Same versions of mariadb 10.2 binaries, same files (as proven by checksum). The database was shut down regularly. on system 1 startup is fast, and on system2 startup is slow? Is this the case?

Can you tell how big are the databases, in terms of number of tables, maybe overall size,too

Comment by Michael Graf [ 2017-10-10 ]

>The database was shut down regularly. on system 1 startup is fast, and on system2 startup is slow? Is this the case?
Yes.

>Can you tell how big are the databases, in terms of number of tables, maybe overall size,too

DB Name DB Size in MB Number of tables
---------------------------------------------------------------------------
attributes 48.7 3 (InnoDB)
eventmonitor 26629.1 379153 (InnoDB)
information_schema 0.1 62
mysql 0.6 24
performance_schema 0.0 17
qstestdata 8.2 15
status 15099.1 34 (InnoDB)
zisarchdata 29754.5 23924
zislog 14780.0 8
zisslmarchdata 6.3 8
zisslmcalculation 0.0 3
zisslmdata 2.1 55 (InnoDB)
zisslmexport 0.0 5
zissystem 2591.5 466 (InnoDB)

When Mariadb is started first time on QSZIS5B (system2) and the start is interrupted after about 5 minutes due to a restart of system 2 (we don't want to wait 30 minutes or more), the second start on system 2 (after the restart) last only 10-30s.
I think this is very interesting.

Comment by Vladislav Vaintroub [ 2017-10-10 ]

Thanks, It is absolutely interesting migra. One thing that comes to mind is sparseness and fragmentation. maybe you can check how fragmented are the big files on both systems. contig -a -v <path-to-file> from sysinternals can be used for this

here is its download https://docs.microsoft.com/en-us/sysinternals/downloads/contig

Comment by Vladislav Vaintroub [ 2017-10-10 ]

Just recently,we fixed sparseness and high fragmentation in 10.2, so it might be related https://jira.mariadb.org/browse/MDEV-13941 . There is also the 3rd party product you're using that , who knows, might not play very well with NTFS sparse files (i.e creating a lot of fragments)

Comment by Elena Stepanova [ 2017-11-09 ]

migra, did you have a chance to check if 10.2.10 works better for you?

Comment by Michael Graf [ 2017-11-09 ]

Currently we need the 2 servers for the release of our next software version.
We will do the test on monday.

Comment by Michael Graf [ 2017-11-22 ]

We testet the new MariaDB Server 10.2.10 3 times.
It started always under 30s .

Comment by Vladislav Vaintroub [ 2017-11-22 ]

Great, so as I suspected, it was this sparse-ness or fragmentation.

Comment by Marko Mäkelä [ 2018-09-20 ]

For the record, if you were using .ibd files, then startup with a large number of tables would be slow until MariaDB 10.1.32 and 10.2.14 fixed MDEV-12610 and MDEV-15333. The reason was that InnoDB startup would be reading the first page of each data file.

Generated at Thu Feb 08 08:06:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.