[MDEV-27074] container start-up to take a long time Created: 2021-11-17  Updated: 2022-03-29  Resolved: 2021-12-08

Status: Closed
Project: MariaDB Server
Component/s: Docker
Affects Version/s: 10.6.5
Fix Version/s: 10.2.42, 10.3.33, 10.4.23, 10.5.14, 10.6.6, 10.7.2

Type: Bug Priority: Minor
Reporter: Rob Schwyzer Assignee: Daniel Black
Resolution: Fixed Votes: 0
Labels: None
Environment:

Windows 10, Docker Desktop for Windows WSLv2, AMD Ryzen 5800X CPU, 32GB 3600MHz DDR4, and 2TB Samsung 980 Pro PCI-E 4.0 x4 SSD


Issue Links:
Problem/Incident
causes MDEV-28103 container /docker-entrypoint-initdb.d... Closed
Relates
relates to MDEV-23326 aria TRANSACTIONAL=1 significantly sl... Closed
relates to MDEV-27167 system variable deferred-networking t... Closed
relates to MDEV-28188 Database initialization commands can ... Closed

 Description   

On startup the image throws the following warnings, each of which seems to contribute to the container taking longer to finish start-up tasks, making full start-up take over a minute.

Least impactful is the first warning-

2021-11-17 20:47:02 1 [Warning] Failed to load slave replication state from table mysql.gtid_slave_pos: 1017: Can't find file: './mysql/' (errno: 2 "No such file or directory")

This one is probably fine as-is, but if it can be addressed, would be nice.

The bigger issues are these three-

2021-11-17 20:47:07 0 [Note] mariadbd: ready for connections.
Version: '10.6.5-MariaDB-1:10.6.5+maria~focal-log'  socket: '/run/mysqld/mysqld.sock'  port: 0  mariadb.org binary distribution
2021-11-17 20:47:07+00:00 [Note] [Entrypoint]: Temporary server started.
Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/leapseconds' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/tzdata.zi' as time zone. Skipping it.
2021-11-17 20:47:57 6 [Warning] 'proxies_priv' entry '@% root@3ca6dd5b620d' ignored in --skip-name-resolve mode.
2021-11-17 20:47:57+00:00 [Note] [Entrypoint]: Creating database test
2021-11-17 20:47:57+00:00 [Note] [Entrypoint]: Creating user tester
2021-11-17 20:47:57+00:00 [Note] [Entrypoint]: Giving user tester access to schema test
2021-11-17 20:47:57+00:00 [Note] [Entrypoint]: Stopping temporary server
2021-11-17 20:47:57 0 [Note] mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
2021-11-17 20:47:57 0 [Note] InnoDB: FTS optimize thread exiting.

Key here are the time stamps- it is 20:47:07 before the Warning: Unable to load... entries occur and then 20:47:57 after. It seems like the system waits for these to time out before proceeding.

The amount of time here doesn't look significant, but this was a faster than typical run for this- it can take longer. It also feels inordinately long compared to when the temporary server starts up (and also quite long compared to Nginx and PHP-FPM container start-up times, making MariaDB the bottleneck for launching new EMP stacks). If we can resolve or work-around this issue, the MariaDB Docker container will feel much more responsive and yield a much improved user experience.



 Comments   
Comment by Daniel Black [ 2021-11-26 ]

The unable to load message itself isn't incuring a timeout, these are just that these particular files are zoneinfo file. The timezone loading is quite IO intense. I did reduce the time in https://github.com/MariaDB/mariadb-docker/commit/88ff4eed96a7662c1a06b16397592af954770070, however there are better ways like LOCK TABLES ... WRITE on these that I haven't done. 50 seconds does seem like far too long however.

If you don't need the timezones - MARIADB_INITDB_SKIP_TZINFO=1 can help.

On the replication, yes there is an every growing list of things that should be implicitly part of mysql_install_db. It will be addressed soon.

I'll look into the number of restarts and what is being done in each. Do you have a mariadb configuration file that may hint as to which bits are taking time in the server itself?

Comment by Rob Schwyzer [ 2021-11-26 ]

[mariadb]
server_id=1001
 
max_connections=60
max_user_connections=50
 
lower_case_table_names=1
character_set_server=utf8mb4
collation_server=utf8mb4_unicode_ci
 
log_bin = mariadb-bin
binlog_format = ROW
sync_binlog=1
 
innodb_flush_log_at_trx_commit=1
innodb_buffer_pool_size=2G
innodb_log_file_size=2G

Comment by Daniel Black [ 2021-12-07 ]

Thanks rob.schwyzer@mariadb.com

Can you please test the image quay.io/danielgblack/mariadb-test:10.6_tz_init_lock_rather_than_alter which includes this change. Can you include -e MARIADB_RANDOM_ROOT_PASSWORD=1 in your test as the root password in the output provides a good separate between the time consumed by TZ initialization compared to the other generic user setup.

Enabling the binlog, esp with sync_binlog is obviously going to be added IO. Overall the above fix might improve this somewhat, but I think I can do better.

There's no need for a record of the binary log from the initialization (please correct me if there is). So the part 2, on top of the above, is to remove the the log-bin during initialization. So please test the timings of quay.io/danielgblack/mariadb-test:10.6_no_log_bin_on_temp too which is based on this change.

Comment by Rob Schwyzer [ 2021-12-08 ]

Thank you danblack!

I first tested the image as-is:

2021-12-08 21:09:06+00:00 [Note] [Entrypoint]: Temporary server started.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/leapseconds' as time zone. Skipping it.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/tzdata.zi' as time zone. Skipping it.
stack-db-1       | 2021-12-08 21:09:55 6 [Warning] 'proxies_priv' entry '@% root@361818db6f5d' ignored in --skip-name-resolve mode.

I next tested the image to use -e MARIADB_RANDOM_ROOT_PASSWORD=1 instead:

stack-db-1       | 2021-12-08 21:12:30+00:00 [Note] [Entrypoint]: Temporary server started.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/leapseconds' as time zone. Skipping it.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/tzdata.zi' as time zone. Skipping it.
stack-db-1       | 2021-12-08 21:13:18+00:00 [Note] [Entrypoint]: GENERATED ROOT PASSWORD: p&_%W9~FqsSI"jsx9|[:l_e{*"]4?1w.
stack-db-1       | 2021-12-08 21:13:18 6 [Warning] 'proxies_priv' entry '@% root@6154ea28819f' ignored in --skip-name-resolve mode.

No, notable improvement.

Now, using the 2nd image you provided instead, with the generated root password:

stack-db-1       | 2021-12-08 21:16:47+00:00 [Note] [Entrypoint]: Temporary server started.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/leapseconds' as time zone. Skipping it.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/tzdata.zi' as time zone. Skipping it.
stack-db-1       | 2021-12-08 21:16:48+00:00 [Note] [Entrypoint]: GENERATED ROOT PASSWORD: J+An%nDp|<Q054TG;mo>e^Du)zql>:#H
stack-db-1       | 2021-12-08 21:16:48 5 [Warning] 'proxies_priv' entry '@% root@897de353c4f9' ignored in --skip-name-resolve mode.

Night and day difference. Finally, testing with the original ENV config:

stack-db-1       | 2021-12-08 21:17:43+00:00 [Note] [Entrypoint]: Temporary server started.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/leapseconds' as time zone. Skipping it.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/tzdata.zi' as time zone. Skipping it.
stack-db-1       | 2021-12-08 21:17:45 5 [Warning] 'proxies_priv' entry '@% root@54fd176a0175' ignored in --skip-name-resolve mode.

So quay.io/danielgblack/mariadb-test:10.6_no_log_bin_on_temp is the real difference-maker here.

Finally, checking on my end with sync_binlog=1 commented out while using the original MariaDB 10.6 container image:

stack-db-1       | 2021-12-08 21:20:02+00:00 [Note] [Entrypoint]: Temporary server started.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/leapseconds' as time zone. Skipping it.
stack-db-1       | Warning: Unable to load '/usr/share/zoneinfo/tzdata.zi' as time zone. Skipping it.
stack-db-1       | 2021-12-08 21:20:04 6 [Warning] 'proxies_priv' entry '@% root@257c09280fc4' ignored in --skip-name-resolve mode.

As expected, this is all caused by sync_binlog=1 (I spent more time evaluating hardware resource use on my end yesterday and noticed my Samsung 980 Pro 2TB was reporting 100% usage despite pumping out only ~5MB/s which usually means it's doing an enormous number of tiny writes), so your work-around of disabling binary logging for the temporary server part fixes the issue caused by sync_binlog=1. And honestly... this is perfect! There are use-cases where sync_binlog=1 makes sense and where we want to test with that on, and quay.io/danielgblack/mariadb-test:10.6_no_log_bin_on_temp enables this use-case wonderfully.

If quay.io/danielgblack/mariadb-test:10.6_no_log_bin_on_temp can be merged upstream, my concerns here are fully addressed. Thank you!

Comment by Daniel Black [ 2021-12-08 ]

Thanks for testing.

https://github.com/MariaDB/mariadb-docker/commit/c6c9e6cd78d12cdb837348c644de4d543ad3dee1
https://github.com/MariaDB/mariadb-docker/commit/146e09c9c7aa5d301ac96c1939fdd112f802f4ae

Will be out next release, or maybe sooner if I get a few major features of healthcheck/mariadb-upgrade resolved.

I've omitted the TZ initialization improvements, but rerolled up a fix for MDEV-23326 and pinged the reviewer for a server fix. Then you can gain the last second of start up time .

Generated at Thu Feb 08 09:50:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.