[MDEV-16883] MariaBackup very slow after upgrading from 10.2.14 to 10.3.8 Created: 2018-08-02 Updated: 2018-10-05 Resolved: 2018-10-05 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Backup |
| Affects Version/s: | 10.3.8 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Peter Warmer | Assignee: | Vladislav Vaintroub |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | mariabackup | ||
| Environment: |
Fedora Server 27, Galera cluster with 3 nodes, DB size ~ 1.4TB |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
We have a 3 node Galera cluster and a separate backup server where we run MariaBackup. The backup worked well with version 10.2.14. After upgrading the 4 servers to MariaDB / MariaBackup 10.3.8 the backup is much slower (as you can see in the attached log files). /usr/bin/ssh backup@$IP "/usr/bin/sudo /usr/local/bin/backup-mysql.sh" | mbstream -x -C $BACKUPDIR/$1/$WEEKDAY /usr/bin/mariabackup --host=$IP --user=backup --password=xxxx --use-memory=10G --prepare --target-dir $BACKUPDIR/$1/$WEEKDAY and this is called on the DB server: /usr/bin/mariabackup --backup --use-memory=10G --galera-info --ftwrl-wait-threshold=40 --ftwrl-wait-query-type=all --ftwrl-wait-timeout=180 --kill-long-queries-timeout=20 --kill-long-query-type=all --user=backup --password=xxxx --stream=xbstream --target-dir /var/tmp |
| Comments |
| Comment by Vladislav Vaintroub [ 2018-08-02 ] | ||||||||||||||||||||||
|
There is not a big difference in Mariabackup from 10.2 to 10.3, apart from necessary code changes to make it compilable due to innodb refactoring. (you can compare extra/mariabackup subtrees in the source code) To exclude the external factors, such as networking, can you peterw17 backup the same dataset in 10.2 and 10.3, locally, i,e without --stream, and measure the difference | ||||||||||||||||||||||
| Comment by Peter Warmer [ 2018-08-02 ] | ||||||||||||||||||||||
|
Hi Vladislav, | ||||||||||||||||||||||
| Comment by Peter Warmer [ 2018-08-03 ] | ||||||||||||||||||||||
|
Hi Vladislav, | ||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-08-03 ] | ||||||||||||||||||||||
|
what takes more time as compared to 10.2 --backup. or --prepare? | ||||||||||||||||||||||
| Comment by Peter Warmer [ 2018-08-03 ] | ||||||||||||||||||||||
|
The --backup took much longer, the --prepare not so much. So it's not --use-memory. | ||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-08-03 ] | ||||||||||||||||||||||
|
can you attach the exact command line, server's my.cnf file. | ||||||||||||||||||||||
| Comment by Peter Warmer [ 2018-08-06 ] | ||||||||||||||||||||||
|
There are 3 additional files attached now. The my.cnf of the DB server, backup-DB.sh which runs on the backup server and ssh-starts the backup-mysql.sh script on one of the DB nodes. | ||||||||||||||||||||||
| Comment by Peter Warmer [ 2018-08-08 ] | ||||||||||||||||||||||
|
I switched back to our original backup setup with --stream. And the --prepare part takes much longer with --stream than it did during the local backup I did for testing. So maybe the MariaDB client software is slowing the backup? There is no my.cnf configuration for the client on the backup server and I think there was was none before the update to 10.3 too. Are there chances to improve the client configuration? | ||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-08-08 ] | ||||||||||||||||||||||
|
--stream has nothing to do with mariadb client software. stream is storing output to the stdout, not to a directory. If there is something that slows down things, it is can be anything on your network, because people like to combine stream with netcat etc. | ||||||||||||||||||||||
| Comment by Peter Warmer [ 2018-08-10 ] | ||||||||||||||||||||||
|
So the real problem is the --backup part which takes in 10.3 twice the time than in 10.2 for us. And this is the same with --stream or local backup. We only made the update from 10.2 to 10.3 and didn't change anything else in the configs of the DB, OS and network. Could you find anything suspect in the attached config and log files? | ||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-08-10 ] | ||||||||||||||||||||||
|
To be honest, since there is not enough information I could gather from this bug report. There is no "how to reproduce" info, some speculations about --stream, and "it is slow for me" section.As I mentioned, there is no difference between 10.2 and 10.3. If you would like further help, I think you'd need to add "how to reproduce" section, or collect profiler info. Also, we would need the information on the size of the backup (the data, and the redo log). The current bug description is not very actionable. | ||||||||||||||||||||||
| Comment by Peter Warmer [ 2018-08-13 ] | ||||||||||||||||||||||
|
For the moment I have no additional information for you but one last question. If you compare the beginnings of the attached logs 10_2_14.log and 10_3_8.log there are in 10.3 many more "log scanned up to ..." lines than in 10.2. Is it possible that the size(s) of some buffer(s) should be increased? I tried --use-memory and --innodb-buffer-pool-size, both didn't change this. Do you have any other suggestions for increasings? | ||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-08-13 ] | ||||||||||||||||||||||
|
I gave the suggestion last time , changing the innodb_checksum_algorithm in mariabackup, to innodb and see if it fares better. Because the default has changed, and if you happen to have lot of pages checksummed with old checksum algorithm (which is possible if you upgraded e.g from the 10.1), then "log scanned up to.." is given by the redo log copying thread periodically, as a progress message. you can use --parallel to speed up the backup processing. You can backup when there is not a lot of write activity. I do not have suggestion other than that for the moment. a bug database is not right place for suggestions, you can try mailing list for example, this is better. | ||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-09-11 ] | ||||||||||||||||||||||
|
winstone, could you try to reproduce "very slow" mariabackup in 10.3 vs 10.2? If you do not find anything, feel free to close this one | ||||||||||||||||||||||
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-09-11 ] | ||||||||||||||||||||||
|
peterw17 | ||||||||||||||||||||||
| Comment by Peter Warmer [ 2018-09-12 ] | ||||||||||||||||||||||
|
Hello winstone,
.
.
The wsrep_incoming_addresses fit to "wsrep_cluster_address=gcomm://10.0.50.240,10.0.50.241,10.0.50.242" in my.cnf. Is there any other configuration for the cluster members that I missed? | ||||||||||||||||||||||
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-09-12 ] | ||||||||||||||||||||||
|
peterw17 , | ||||||||||||||||||||||
| Comment by Peter Warmer [ 2018-09-12 ] | ||||||||||||||||||||||
|
Hello winstone, root@dbcluster02:/var/backup/DB/db02#du --summarize Tue/ and the current disk usage of the DB is: root@dbcluster02:/var/backup/DB/db02#du --summarize /var/lib/mysql/ | ||||||||||||||||||||||
| Comment by Stijn [ 2018-09-26 ] | ||||||||||||||||||||||
|
We had a similar issue with mysqldump being 2-3 times slower after the upgrade (issue https://jira.mariadb.org/browse/MDEV-14772 ). The fix was to use "stdbuf -o 10MB" in front of the backup command. | ||||||||||||||||||||||
| Comment by Peter Warmer [ 2018-09-27 ] | ||||||||||||||||||||||
|
Hello Stijn, | ||||||||||||||||||||||
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-10-05 ] | ||||||||||||||||||||||
|
hello stijn, excuse me for delayed feedback. You are operating evidently with increasing data volumes. I intended to try to reproduce with loaded data near to you size,
| ||||||||||||||||||||||
| Comment by Vladislav Vaintroub [ 2018-10-05 ] | ||||||||||||||||||||||
|
I'm going to close this one, because nobody could localize the problem. Backup time in your case can depend on many factors, such as network load and database load. Check if something will change if you use --innodb_checksum_algorithm=innodb (for mariabackup), it could have an impact since As I mentioned, there is practically no difference in backup from 10.2 to 10.3, so the difference you see will mostly likely be something external. |