[MDEV-14339] Different Galera performance behavior regarding to flushing between MySQL and MariaDB Created: 2017-11-09  Updated: 2020-08-25  Resolved: 2019-05-23

Status: Closed
Project: MariaDB Server
Component/s: Galera, Replication, Storage Engine - InnoDB
Affects Version/s: 10.1.17, 10.1, 10.2.8, 10.2.13
Fix Version/s: 10.4.4

Type: Bug Priority: Critical
Reporter: Hartmut Holzgraefe Assignee: Jan Lindström (Inactive)
Resolution: Duplicate Votes: 1
Labels: None

Attachments: Text File mariadb-10.1.17-bench1.txt     Text File mariadb-10.1.17-bench2.txt     Text File mariadb-10.2.9-bench1.txt     Text File mariadb-10.2.9-bench2.txt     File mariadb-10.3.11-1.log     File mariadb-10.3.11-2.log     File mariadb-10.3.11-3.log     Text File mysql-5.7.18-bench1.txt     Text File mysql-5.7.18-bench2.txt    
Issue Links:
Duplicate
duplicates MDEV-16509 MariaDB 10.1+ with wsrep_on=ON perfor... Closed
Relates
relates to MDEV-11376 AliSQL: [Feature] Issue#11 REDO LOG G... Stalled
relates to MDEV-11937 InnoDB flushes redo log too often Closed
relates to MDEV-16442 Document tc.log usage by default and ... Closed

 Description   

When doing sysbench tests against standalone mysqld vs. single node Galera instance vs. two node cluster performance differs between MySQL and MariaDB setups.

With both MySQL and MariaDB standaline non-Galera nodes show best transactions per second throughput when not having to do any commit time fsync, as expected. Meaning that things are fast with innodb_flush_at_trx_commit != 1 and log_bin disabled or sync_binlog=OFF

A single node MySQL Galera setup shows almost the same numbers, is fast in the same config combinations.

A MariaDB Galera 10.x cluster, with 0 <= x <= 2 at least, shows a slightly different behavior though. Here only the combination "innodb_flush_at_trx_commit != 1, log_bin enabled, sync_binlog=OFF" is almost as fast as a standalone non-galera-enabled instance with the same "no fsync" settings. When disabling log_bin transactions per second go down quite a bit, to similar values as if innodb_flush_at_trx_commit was actually enabled.

I first suspected that group_commit might play a role, but Binlog_commits and Binlog_group_commits counters were always showing equal values in SHOW STATUS output, so this can be ruled out.

When running sysbench against a 2 node cluster performance differences were not as massive as in the single node setup, which was to be expected, but still visible.

So while turning off innodb_flush_at_trx_commit; and sync_binlog if log_bin is enabled, is usually considered safe in a cluster (if the hardware setup is redundant enough, e.g. simultaneous power failure on all nodes can be ruled out), and to improve performance/throughput a little bit, it turns out that with MariaDB the performance improvement is only visible if binlog writing is enabled, which is at least counter-intuitive.



 Comments   
Comment by Hartmut Holzgraefe [ 2017-11-09 ]

Actual benchmark numbers to follow in a bit, I unfortunately created a bit of a mess with my recorded files that I need to sort out first. Or maybe I'll even actually re-run all tests to be sure that there are no mixups left ...

Comment by Hartmut Holzgraefe [ 2018-02-08 ]

Finally, numbers. Unfortunately I ran into a lot of unrelated problems when trying to reproduce this reliably. I'm still not always getting clear numbers like listed below in all cases, but the general trend is now clear:

I'm comparing MariaDB 10.2.9 and MySQL 5.7.18 here as these were the current versions when I originally started testing.

Numbers are produced by setting up three different server layouts:

  • standalone without galera
  • galera enabled, but only a single node configured
  • galera with two nodes

Numbers are generated with Sysbench 1.0, running on the second node
against mysqld on the first.

sysbench oltp_read_write --db-driver=mysql --mysql-host=... --mysql-user=sysbench --mysql-password='' --table_size=100000 --tables=10 --events=1000000 --threads=16 [prepare|run]

Sysbench runs against variations of:

  • setup: no galera, single node, dual node
  • disk type: ssd or hdd (only for the benchmarked node, the other node in dual node setup always has SSD and syncs disabled)
  • log-bin on or off (column 'log' in output below)
  • sync-binlog on or of (column 'syn')
  • innodb-flush-at-trx-commit 0 or 1 (column 'trx')

This leads to 3*2*2*2*2 = 48 variants tested for each server version. Full data for MySQL 5.7.18, MariaDB 10.2.9 and MariaDB 10.1.17 will be attached, for now lets look at the HDD values for MySQL 5.7 and MariaDB 10.2 only. Rows are ordered by transactions per second (tps) result from sysbench, so order may differ between mysqld versions tested.

First testing standalone:

== hdd, standalone ==
 
mariadb-10.2.9               mysql-5.7.18
 
log syn trx      tps         log syn trx      tps
 0   0   0      279.37        0   0   0      247.38
 0   1   0      266.57        1   0   0      226.05
 1   0   0      238.91        0   1   0      220.14
 1   0   1       89.17        0   1   1       78.08
 0   1   1       72.91        0   0   1       76.03
 0   0   1       71.10        1   0   1       74.57
 1   1   0       68.60        1   1   0       61.89
 1   1   1       46.85        1   1   1       51.09

We see that results are roughly similar, things are substantially faster when no syncs happen (trx=0 and syn=0 or log=0 so that syn doesn't matter)

Now looking at single node Galera setup:

== hdd, single node ==
 
mariadb-10.2.9              mysql-5.7.18
 
log syn trx      tps        log syn trx      tps
 1   0   0      314.59       0   1   0      241.61
 0   1   0       31.83       0   0   0      228.04
 0   0   0       29.00       1   0   0      200.84
 1   1   0       16.98       1   1   0       15.78
 1   0   1       15.81       1   0   1       14.77
 0   1   1       11.69       1   1   1        8.34
 0   0   1       11.22       0   0   1        8.04
 1   1   1        8.50       0   1   1        7.70

For MySQL (right column) the "no syncing" results are similar to "no galera", but when syncs come into play things get much slower than in standalone.

On the MariaDB side on the other hand the number for "log-bin, but no syncs on commit" is even a bit higher than without galera, but the other two "no sync" cases get much slower (though still about twice as fast as with sync).

Similar effects can be seen on a dual node cluster:

=== hdd, 2 nodes ==
 
mariadb-10.2.9              mysql-5.7.18
 
log syn trx      tps        log syn trx      tps
 1   0   0      157.27       1   0   0      221.49
 0   1   0       27.89       0   1   0      215.77
 0   0   0       27.19       0   0   0      185.29
 1   1   0       16.67       1   1   0       16.5
 1   0   1       16.45       1   0   1       14.30
 0   1   1       11.34       1   1   1        8.30
 0   0   1       11.12       0   0   1        7.82
 1   1   1        9.05       0   1   1        7.67

Comment by Hartmut Holzgraefe [ 2018-02-08 ]

For each server versions "sysbench run" was executed twice after "sysbench prepare", "bench1" and "bench2" files contain results for first and second run.

Comment by Hartmut Holzgraefe [ 2018-02-08 ]

Galera versions:

  • MySQL 5.7.18 -> wsrep_25.12 Galera 3.21(r8678538)
  • MariaDB 10.2.8 -> wsrep_25.20 Galera 25.3.20(r3703)
  • MariaDB 10.1.17 -> wsrep_25.16 Galera 25.3.17(r3619)
Comment by Andrei Elkin [ 2018-03-08 ]

The Ali ticket aims at closely related issue.

Comment by Andrei Elkin [ 2018-03-08 ]

Non-galera specific case MDEV-11937 can provide some hints.

Comment by Andrei Elkin [ 2018-11-09 ]

I checked the up-to-date 10.1.37 sources code trying to find any apparent suspect for performance degradation when the single galera node changes from ---log-bin=ON to OFF as well as tried verifying benchmarks results.

I could not find any apparent glitch in the sources. While the binlog grouping obviously can't work in --log-bin=OFF
disabled/skipped binlog the lack of grouping in innodb_flush_log_at_trx_commit = 0 should not be critical. The original benchmarking confirms that - see the two last rows of the zero nodes:

log syn trx      tps
 1   0   0      578.71
 0   0   0      574.69

In my benchmarking of 1 node the two numbers are

35
40

I used mysqlslap with 10 connections loading the server mostly (over 90%) with write operations. I did not run it long time, being content with 60 seconds thinking that
the previously detected 10 times slowness would show itself. But it did not.

PS. I am not whether it anyhow mattered in original benchmarking, in my case I spotted that
I did not need any causal consitency so I turned off this

set @@global.wsrep_causal_reads=0;

which made few of my application's selects run at normal speed.

I think it makes sense to measure the figures once again using the original method,
until that I set the bug status Can-not-repeat.

In case it will still be reproduced, I will need to know exact configuration (my.cnf, the single node start script - I use {{ mtr galera.test --start }} for that) as well as the benchmarking
script.

Comment by Hartmut Holzgraefe [ 2018-11-28 ]

Could not reproduce it with the backup of my original test setup (may not have recovered the actual final backup of it).

With manual testing I now see it again though.

Platform: Ubuntu 18.04.1, datadir on SSD

Versions tried: 10.2.8, 10.2.19, 10.3.11

Basic single node configuration:

[mysqld]
datadir=/usr/local/mysql/var
server-id=1
wsrep_on=ON
wsrep_provider=/usr/local/mariadb-10.3.11-linux-systemd-x86_64/lib/galera/libgalera_smm.so
wsrep_cluster_name=test_cluster
wsrep_cluster_address=gcomm://192.168.23.15
wsrep_sst_method=rsync
wsrep_sst_auth=root
wsrep_node_address=192.168.23.15
wsrep_node_name=node-1
binlog-format=ROW

With just extra

[mysqld]
log-bin

using defaults for sync-binlog and innodb-flush-log-at-trx-commit I get about 90tps on all three versions.

With extra

[mysqld]
log-bin
sync-binlog=0
innodb-flush-log-at-trx-commit=0

I see a speedup to about 1000tps.

When I disable the binlog:

[mysqld]
# log-bin
sync-binlog=0
innodb-flush-log-at-trx-commit=0

I get about 250-300tps.

For running sysbench I use

mysql -h 192.168.23.15 -u root -e "drop database if exists sbtest"
mysql -h 192.168.23.15 -u root -e "create database sbtest"
/usr/bin/sysbench oltp_read_write --db-driver=mysql --mysql-host=192.168.23.15 --mysql-user=root --table_size=100000 --tables=10 --events=1000000 --threads=16 prepare
/usr/bin/sysbench oltp_read_write --db-driver=mysql --mysql-host=192.168.23.15 --mysql-user=root --table_size=100000 --tables=10 --events=1000000 --threads=16 run
/usr/bin/sysbench oltp_read_write --db-driver=mysql --mysql-host=192.168.23.15 --mysql-user=root --table_size=100000 --tables=10 --events=1000000 --threads=16 run

I'm going to attach the sysbench output for 10.3.11

Comment by Hartmut Holzgraefe [ 2018-11-28 ]

mariadb-10.3.11-1.log – log-bin enabled, sync/flush off, ~1000tps
mariadb-10.3.11-2.log – log-bin disabled, sync/flush off, ~300tps
mariadb-10.3.11-3.log – log-bin enabled, sync/flush on, ~100tps

Comment by Hartmut Holzgraefe [ 2018-11-28 ]

See also https://jira.mariadb.org/browse/MDEV-16509

Comment by Ranjan Ghosh [ 2018-12-02 ]

Hm. I cannot see any difference. I get 3000 tps consistently with the script mentioned above. BTW: Isn't sync_binlog=0 the default anyway?

Comment by Seppo Jaakola [ 2019-05-23 ]

The fix for MDEV-16509 improved the performance of non binlogging use case (log_bin==OFF). And it looks like MDEV-16509 is a duplicate of this issue, so imo this issue could be closed.

Generated at Thu Feb 08 08:12:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.