[MDEV-9274] I/O performance regression Created: 2015-12-13  Updated: 2021-09-16  Resolved: 2021-09-16

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.0.22
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: azurit Assignee: Axel Schwenke
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

Debian linux, 64bit


Attachments: PNG File downgrade-5.5.png     File my.cnf     PNG File upgrade-10.0.png    

 Description   

After upgrading to version 10.0.22, write performance is MUCH lower than on version 5.5 - attaching CPU graphs (the upgrade was at 6:18, the downgrade at 21:13) and my.cnf. I'm using two 2 TB drives in a Linux software RAID1 array.



 Comments   
Comment by Elena Stepanova [ 2015-12-16 ]

axel, could you please look into this?

Comment by Daniel Black [ 2015-12-17 ]

azurit Could it be that 10.0 is using a completely different plan for a common query? Tune your slow query log (long_query_time / log_slow_verbosity / min_examined_row_limit=(2K?)) to try to collect the right information. (Note that MariaDB 10.0+ has log_slow_verbosity=explain, which will make it easier to compare query plans.)
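For reference, the slow-log settings mentioned above could be set in my.cnf roughly like this; this is a sketch, and the 2000-row threshold is just the "2K?" value suggested in the comment, not a recommended default:

```
# my.cnf - sketch of the suggested slow-query-log settings
[mysqld]
slow_query_log         = 1
long_query_time        = 1        # seconds; tune to the workload
min_examined_row_limit = 2000     # the "2K?" threshold suggested above
log_slow_verbosity     = explain  # MariaDB 10.0+: include query plans in the slow log
```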

If that isn't the case, then a breakdown from perf (linux-base package) will aid diagnosis. It records code paths and doesn't include data.

Record data for a representative sample period (up to a minute, maybe two) on both 5.5 and 10.0. This shouldn't have a significant performance impact.

perf record -g -o {file} -p {mysqlpid}

To see its contents:

perf report --no-children --showcpuutilization -i {file}

Uploading the perf file will help diagnose this.

Since downgrading is a bit risky, keeping a 10.0 server as a STATEMENT-based replication slave may help diagnose this, if the I/O is caused by data-modification statements.

Comment by azurit [ 2016-01-28 ]

I created a 10.0 slave of my 5.5 master (which was previously the 10.0 server where I observed the I/O problem), but everything is running fine on the slave (I used the MIXED binary log format, as STATEMENT was generating lots of warnings). The only difference is that the slave is using an ext3 filesystem while the master has ext4 - the problem is probably related to ext4 (I also have other servers where I'm running 10.0 on ext3 without problems). Unfortunately I don't have any spare server with ext4, so I cannot try replication there. The only way I can see now is to, again, upgrade the server where the problem was observed, record the needed data, and downgrade it back to 5.5. What do you mean by 'downgrading is a bit risky'?

Comment by azurit [ 2016-05-01 ]

I'm observing the same problem on another two servers - upgrading to 10.0 caused a HUGE I/O load which was, in some cases, taking down the whole server. Downgrading back to 5.5 resolved the problem. Interestingly, I cannot find what these servers have in common:
server6 - Debian 8 Jessie, 64bit, ext3, linux sw RAID, custom kernel 3.2 + grsecurity
server7 - Debian 8 Jessie, 64bit, ext4, linux sw RAID + lvm, custom kernel 3.2 + grsecurity
server8 - Debian 7 Wheezy, 64bit, ext4, linux sw RAID, custom kernel 3.2 + grsecurity

And a server which is running 10.0 without any problems:
server2 - Debian 7 Wheezy, ext3, HW RAID, custom kernel 3.2 + grsecurity

Maybe the combination of linux software RAID and 10.0?

Comment by azurit [ 2016-10-01 ]

I would like to get back to this issue. I just tried to upgrade MariaDB to 10.1 and the same problem occurred. The big I/O load is probably caused by the jbd2 process, which is at the top of iotop output all the time (this wasn't happening with MariaDB 5.5). I'm able to 'fix' this by disabling filesystem barriers (mount -o barrier=0,remount /var). The jbd2 process is still at the top but doing very little I/O. Any suggestions?
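For anyone reproducing the workaround: the barrier change can be made persistent in /etc/fstab. This is a hypothetical entry (device and mount point are assumptions), and note that barrier=0 trades crash safety for lower journal I/O, so use it with care:

```
# /etc/fstab - hypothetical entry; barrier=0 disables ext4 journal write
# barriers, reducing jbd2 I/O load at the cost of crash durability
/dev/md1  /var  ext4  defaults,barrier=0  0  2
```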

To me, it looks like MariaDB > 5.5 is doing lots of little writes to the HDD, which triggers the jbd2 process very often and causes a high I/O load.
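The write pattern described above can be illustrated with a minimal sketch (this is an illustration of the hypothesis, not MariaDB's actual code): each fsync of a small write forces the ext3/ext4 journal thread (jbd2) to commit a transaction, so many small synchronous writes translate into many journal commits and high I/O load.

```python
import os

def small_sync_writes(path, n_writes=100, chunk=b"x" * 512):
    """Write n_writes small chunks, fsyncing after each one.

    On ext3/ext4, every fsync forces a journal commit by the jbd2
    kernel thread; a workload like this keeps jbd2 busy even though
    the total data volume is tiny.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(n_writes):
            os.write(fd, chunk)
            os.fsync(fd)  # durable write -> one jbd2 journal commit
    finally:
        os.close(fd)
    return os.path.getsize(path)
```

Setting innodb_flush_log_at_trx_commit=2 (as tried below) is the database-level analogue of dropping the per-write fsync and syncing once per second instead.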

Comment by azurit [ 2016-10-02 ]

I was also able to 'fix' it with barrier=1 on the FS (as before with 5.5) and innodb_flush_log_at_trx_commit=2 or innodb_flush_log_at_trx_commit=0 (no difference in I/O load between the two). Was there any change in InnoDB log flushing between 5.5 and 10.0?
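The workaround described above corresponds to a my.cnf change along these lines. Note the durability trade-off: with values 0 or 2, up to roughly a second of committed transactions can be lost on a crash or power failure, whereas the default of 1 fsyncs the redo log at every commit:

```
[mysqld]
# Flush the InnoDB redo log to the OS at commit but fsync only about
# once per second; fewer small synchronous writes (and fewer jbd2
# journal commits) at the cost of ~1s of durability on power loss.
innodb_flush_log_at_trx_commit = 2
```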

Comment by Axel Schwenke [ 2021-09-16 ]

This affects an old version of the server (10.0). If this problem persists with an up-to-date version, please open a new ticket.

Generated at Thu Feb 08 07:33:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.