[MDEV-14772] Mysqldump 2-3 times slower since MariaDB upgrade to 10.1.29 Created: 2017-12-26  Updated: 2021-09-16  Resolved: 2021-09-16

Status: Closed
Project: MariaDB Server
Component/s: Scripts & Clients
Affects Version/s: 10.1.29
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Stijn Assignee: Axel Schwenke
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

CentOS 7.4.1708 (VMWare)
45GB RAM
550GB disk space
8vCPU
innodb tables
2 production servers upgraded from 10.1.15 to 10.1.29


Attachments: File server.cnf    
Sprint: 10.1.32

 Description   

We upgraded 2 of our production database servers to 10.1.29 (from 10.1.15). We also updated some other yum packages at the same time. After the upgrade, the nightly mysqldump backup time increased from around 4 hours 15 minutes to 14 hours 40 minutes (roughly 3.5x) on our first database server, and from 6 hours 30 minutes to 13 hours 40 minutes (roughly 2x) on our second.

Normal database performance (sql query times) was not visibly impacted.

We use the following command for our mysqldumps:

echo 'show databases' | /usr/bin/mysql --defaults-extra-file=<file> | grep -v -f <file> | while read DB; do
  /usr/bin/mysqldump --defaults-extra-file=<file> --routines --events $DB > <file>.sql
done



 Comments   
Comment by Elena Stepanova [ 2017-12-26 ]

stijn, could you please give any hints on the data layout: how many databases does it dump, how many tables / routines / events, what's the amount of data, and which engines are mainly used? I don't mean the exact numbers (unless you have them handy), just the general idea.

Comment by Stijn [ 2017-12-26 ]

On the database servers we have 193 databases (first database server) and 171 databases (second database server). Each database has 158 tables, 2 events and 2 procedures, so about 28,000-30,000 tables per database server. Most tables are very small; a few reach 1GB, with 3-4GB at most. The biggest database is 12GB. All tables are innodb. Total data usage (/var/lib/mysql size) is 430GB and 442GB.

Comment by Elena Stepanova [ 2017-12-28 ]

stijn, thanks. Also, which exact packages are you using? What does the repository file point at?

Did you make any configuration changes at the same time?
Is it possible that some other jobs now run at the same time – not necessarily DB server related, possibly from other packages that you upgraded – which compete with the backup for IO, for example?

Comment by Stijn [ 2018-01-03 ]

We have the following packages installed of MariaDB:
MariaDB-client.x86_64 10.1.29-1.el7.centos
MariaDB-common.x86_64 10.1.29-1.el7.centos
MariaDB-server.x86_64 10.1.29-1.el7.centos
MariaDB-shared.x86_64 10.1.29-1.el7.centos

We upgraded from:
MariaDB-client.x86_64 10.1.15-1.el7.centos
MariaDB-common.x86_64 10.1.15-1.el7.centos
MariaDB-server.x86_64 10.1.15-1.el7.centos
MariaDB-shared.x86_64 10.1.15-1.el7.centos

They are pointed to http://yum.mariadb.org/10.1/centos7-amd64 .

There were no configuration changes made. Only the backups are running during the night.

Comment by Elena Stepanova [ 2018-01-09 ]

I couldn't reproduce the slowdown with 50 databases, 150 tables + 2 events + 2 procedures in each, 100K rows in each table, ~2G each database, ~100G total, same mysqldump command.
If anything, 10.1.29 was a little bit faster:

10.1.15

real    20m41.252s
user    3m29.476s
sys     0m33.788s

10.1.29

real    16m31.284s
user    3m0.912s
sys     0m27.644s

axel, could you please give it a try as we discussed?

I used mostly the provided config file, but had to reduce the InnoDB buffer pool size and InnoDB log file size; maybe you can use a bigger machine and keep those values intact as well.

Comment by Stijn [ 2018-05-31 ]

We identified a fix for this problem. It seems the issue occurs in mysqldump after a glibc update (from version 2.17-157 to version 2.17-196; the same with 2.17-222, the latest for CentOS 7 at the time of testing). This also occurs with MariaDB 10.3.

The mysqldump process is usually in the "D" (uninterruptible sleep) process state. When checking the mysqldump process with gdb, it is stuck most of the time in "__write_nocancel":
#0 0x00007fd00d4e4cd0 in __write_nocancel () from /lib64/libc.so.6
#1 0x00007fd00d46f0c3 in _IO_new_file_write () from /lib64/libc.so.6
#2 0x00007fd00d4708de in __GI__IO_do_write () from /lib64/libc.so.6
#3 0x00007fd00d46f820 in __GI__IO_file_xsputn () from /lib64/libc.so.6
#4 0x00007fd00d463e5b in fputs () from /lib64/libc.so.6
#5 0x0000557e91fd9f9a in dump_table (table=table@entry=0xXXXXXXXXXXXX "XXX", db=db@entry=0xXXXXXXXXXXXX "XXX") at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.7/client/mysqldump.c:4103
#6 0x0000557e91fda23b in dump_all_tables_in_db (database=0x************ "***") at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.7/client/mysqldump.c:4716
#7 0x0000557e91fcdbee in dump_databases (db_names=0x************) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.7/client/mysqldump.c:4526
#8 main (argc=1, argv=0x************) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.7/client/mysqldump.c:6227

What seems to be the problem is that after the glibc update, the stdout buffer mysqldump uses when writing to a file was decreased a lot (perhaps to 8KB; I have not found a way to check the size on a system, only to change it), causing slow dumps if the storage has high latency (nfs, ...).

The fix we use is to put "stdbuf -o 10MB" before the mysqldump command. After this fix, the speed of mysqldump is back to normal.

Comment by Axel Schwenke [ 2018-05-31 ]

Good catch! Can you elaborate on what kind of IO device you are dumping to? I wouldn't expect much impact from STDIO buffers unless the storage is a) slow and b) unbuffered. An alternative to stdbuf would be piping through buffer:

mysqldump ... | buffer -m ... > dumpfile
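If the buffer utility is not packaged on the system, dd with a large output block size gives a similar re-blocking effect (a sketch; the 1M block size, $DB variable, and output path are illustrative assumptions):

```shell
# Collect mysqldump's small stdout writes into 1MB chunks before
# they hit the (high-latency) NFS-backed dump file.
mysqldump --defaults-extra-file=<file> --routines --events "$DB" \
  | dd obs=1M of="/backup/$DB.sql" status=none
```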

Comment by Stijn [ 2018-05-31 ]

We use an nfs mount for storing the mysqldumps, with an underlying Nimble SAN (which is pretty fast). For a sample database the mysqldump runs at about 22MB/s with the fix, but only 5MB/s without it. I can cp and mv a test database sqldump file (3500MB) to the nfs partition at about 50MB/s, so raw throughput is not the problem.

I think the nfs client might be waiting, up to a small timeout, for more data than the standard 8KB writes provide before sending the data, or something similar.

Comment by Axel Schwenke [ 2021-09-16 ]

This affects an old version of the server (10.1). If this problem persists with an up-to-date version, please open a new ticket.

Generated at Thu Feb 08 08:16:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.