[MDEV-6155] Upgraded servers from 5.5 to 10 and replication lag keeps growing Created: 2014-04-23  Updated: 2014-05-26  Due: 2014-05-23  Resolved: 2014-05-26

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.10
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Antonio Fernandes Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: replication
Environment:

CentOS x86_64 (master) and CentOS i386 (slave)



 Description   

I've used MariaDB 5.5 for months with replication without a hiccup. As the upgrade plan required, I upgraded the slave to 10 and then the master to 10 as well, and from that moment I've been seeing constantly increasing slave lag...

iotop shows:

  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  337 be/3 root        0.00 B/s    3.89 K/s  0.00 % 89.96 % [jbd2/md1-8]
 3313 be/4 mysql      11.66 K/s  116.61 K/s  0.00 %  5.59 % mysqld --~-port=3306

and jbd2 completely clogs the IO... which didn't happen before...

Checking the process list, and refreshing it constantly, I always see a Table lock in the COMMIT phase:

+----+-----------------+-----------------+---------+---------+--------+----------------------------------+------------------+----------+
| Id | User            | Host            | db      | Command | Time   | State                            | Info             | Progress |
+----+-----------------+-----------------+---------+---------+--------+----------------------------------+------------------+----------+
|  1 | event_scheduler | localhost       | NULL    | Daemon  |   3255 | Waiting on empty queue           | NULL             |    0.000 |
| 10 | system user     |                 | NULL    | Connect |   1991 | Waiting for master to send event | NULL             |    0.000 |
| 11 | system user     |                 | MCPC_SI | Connect | 127178 | Table lock                       | COMMIT           |    0.000 |
| 15 | root            | localhost       | NULL    | Sleep   |      0 |                                  | NULL             |    0.000 |
| 16 | root            | localhost:41879 | MCPC_SI | Sleep   |     15 |                                  | NULL             |    0.000 |
| 17 | root            | localhost       | NULL    | Query   |      0 | init                             | SHOW PROCESSLIST |    0.000 |
+----+-----------------+-----------------+---------+---------+--------+----------------------------------+------------------+----------+

I've changed from InnoDB to MyISAM (it could be a storage engine bottleneck), but without any change...

Any clues on what the cause might be (besides the change from 5.5 to 10)?

Thank you,
António Fernandes



 Comments   
Comment by Elena Stepanova [ 2014-04-23 ]

Hi,

When you said

I've changed from InnoDB to MyISAM (it could be a storage engine bottleneck), but without any change...

did you mean that you converted all your existing tables from InnoDB to MyISAM (ran ALTER TABLE .. ENGINE=MyISAM), and still get the same IO problem?
It's important to know, because there were lots of changes in InnoDB between 5.5 and 5.6/10.0, IO-related included, so we need to know if we can safely ignore them while looking into it.
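For reference, the engine check and conversion being discussed would look like the following (using the database and table names that appear later in this report; run against the actual tables in question):

```sql
-- See which engine each table in the schema currently uses
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'MCPC_SI';

-- Rebuild one table under MyISAM (this rewrites the whole table)
ALTER TABLE MCPC_SI.tbl_clientes ENGINE=MyISAM;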

Also, did you try to stop mysqld and check that when it's not running, jbd2 doesn't clog the IO? I agree it would be a questionable coincidence that it started doing so exactly when you upgraded from 5.5 to 10.0, but coincidences do happen, so it would be good to rule it out.

Comment by Antonio Fernandes [ 2014-04-23 ]

Hi Elena,

I converted just some tables. With innotop I saw that some specific tables were constantly locked, and I changed only those (about 3 or 4).

As for stopping mysqld: the minute I execute STOP SLAVE, the IO goes back to "normal" (and so does jbd2). I also tried remounting the filesystem (ext4) with noatime, and that didn't visibly change the replication performance.

I can now add some more information based on continuous digging:

pt-ioprofile
Wed Apr 23 15:25:48 WEST 2014
Tracing process ID 1925
     total       read    pread64   pwrite64      write      fsync    _llseek filename
 28.534776   0.000000   0.000000   0.029010   0.000000  28.505766   0.000000 /var/lib/mysql/ib_logfile1
  0.030686   0.000000   0.000000   0.000000   0.019066   0.000000   0.011620 /var/lib/mysql/relay-log.info
  0.021215   0.008409   0.004655   0.000000   0.000000   0.000000   0.008151 /var/lib/mysql/MCPC_Marcas/tbl_produtos#P#p6.MYD
  0.018563   0.003063   0.001420   0.011352   0.000000   0.000000   0.002728 /var/lib/mysql/MCPC_SI/tbl_clientes.MYD
  0.008746   0.000000   0.000000   0.000000   0.005074   0.000000   0.003672 /var/lib/mysql/master.info
  0.005730   0.000000   0.000000   0.000000   0.005730   0.000000   0.000000 /var/lib/mysql/lpvs2-relay-bin.000031
  0.004688   0.002383   0.000745   0.000000   0.000000   0.000000   0.001560 /var/lib/mysql/MCPC_SI/tbl_clientes_empresas.MYD
  0.002076   0.000000   0.000023   0.002053   0.000000   0.000000   0.000000 /var/lib/mysql/MCPC_SI/tbl_clientes.MYI
  0.001146   0.000000   0.000000   0.001146   0.000000   0.000000   0.000000 /var/lib/mysql/MCPC_SI/tbl_clientes_empresas.MYI
  0.000899   0.000899   0.000000   0.000000   0.000000   0.000000   0.000000 /var/lib/mysql/lpvs2-relay-bin.000011

I have a bunch of scripts that import data from flat files into MariaDB. They do it line by line with autocommit ON (the default). I've changed those scripts to wrap subsets of lines in transactions, and the loading is way faster...

I suspect that line-by-line inserts (with a commit per line as well) perform worse than on 5.5. Since the slave has older/weaker hardware, it lags... while the master keeps ingesting data at its normal pace...
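The batching change described above can be sketched as follows. This uses Python with the stdlib sqlite3 module purely as a stand-in to show the pattern (the real scripts target MariaDB and are not included in this report); the point is one commit per batch instead of one per row:

```python
import sqlite3

def load_rows(conn, rows, batch_size=1000):
    """Insert rows in batches, committing once per batch instead of per row.

    With autocommit per row, every INSERT pays the cost of a log flush;
    batching amortizes that cost over batch_size rows.
    """
    cur = conn.cursor()
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        cur.executemany("INSERT INTO t(val) VALUES (?)", batch)
        conn.commit()  # one durable commit per batch, not per row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(val INTEGER)")
load_rows(conn, [(n,) for n in range(2500)], batch_size=1000)
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2500
```

The same structure applies to a MariaDB connector: replace the per-row commit with an explicit transaction around each chunk of the flat file.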

Best regards,
António Fernandes

Comment by Elena Stepanova [ 2014-04-23 ]

They do it line by line with autocommit ON (the default). I've changed those scripts to wrap subsets of lines in transactions, and the loading is way faster...

I suspect that line-by-line inserts (with a commit per line as well) perform worse than on 5.5

Indeed, there were many IO-related changes in InnoDB between 5.5 and 5.6/10.0, and since the transactional approach is the one being favored, single-statement transactions might require more tuning. What's weird, though, is that you observed it on MyISAM tables – for them it shouldn't matter whether you do inserts as big transactions or single statements. Maybe it so happened that you converted different tables, though, not the ones being inserted into.

Given that we are still talking about InnoDB, you could also consider adjusting innodb_flush_log_at_trx_commit ( https://mariadb.com/kb/en/xtradbinnodb-server-system-variables/#innodb_flush_log_at_trx_commit ). The default value 1 is the safest, but since you are tuning a slave, you might be able to afford using a different value. Although, if you have already switched to big transactions, it probably won't make much difference.
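For reference, a relaxed flush setting on the slave would be a my.cnf entry like the one below; the value semantics are standard InnoDB behavior, and both non-default values trade crash safety on the slave for reduced IO:

```ini
[mysqld]
# 1 (default) = write and fsync the redo log on every commit; safest.
# 2 = write to the OS cache on every commit, fsync roughly once per second.
# 0 = write and fsync roughly once per second; up to ~1 second of
#     transactions can be lost if mysqld crashes.
innodb_flush_log_at_trx_commit = 2
```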

There are other ways to tune InnoDB-related IO, see http://dev.mysql.com/doc/refman/5.6/en/optimizing-innodb-diskio.html .

Comment by Antonio Fernandes [ 2014-04-23 ]

Hi,

Changing innodb_flush_log_at_trx_commit to 0 did in fact reduce IO (but it can introduce inconsistency between slave and master in the event of a crash):

Total DISK READ: 453.38 K/s | Total DISK WRITE: 782.76 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  337 be/3 root        0.00 B/s    0.00 B/s  0.00 % 22.65 % [jbd2/md1-8]
 1972 be/4 mysql       0.00 B/s  616.13 K/s  0.00 %  0.61 % mysqld --~-port=3306
 1970 be/4 mysql       0.00 B/s   89.13 K/s  0.00 %  0.12 % mysqld --~-port=3306
 1938 be/4 mysql       0.00 B/s    0.00 B/s  0.00 %  0.09 % mysqld --~-port=3306
 1939 be/4 mysql       0.00 B/s    0.00 B/s  0.00 %  0.04 % mysqld --~-port=3306
 1980 be/4 mysql     449.50 K/s    0.00 B/s  0.00 %  0.00 % mysqld --~-port=3306

Now it's CPU-bound... but that's OK since it's catching up.

As for InnoDB vs MyISAM, I'll continue my tests and post them here.

Best regards,
António Fernandes

Comment by Antonio Fernandes [ 2014-04-28 ]

Hi,

I'm starting to monitor the two servers with percona-cacti-templates (never had the need before). I started today with our master server. If all goes well, tomorrow I'll roll it out to the problematic slave.

BTW, is there any procedure for running a MyISAM read/write benchmark comparing 5.5 and 10?

Best regards,
Antonio

Comment by Elena Stepanova [ 2014-04-28 ]

I'm not sure what you mean by a procedure.
If you need a benchmarking tool, sysbench is widely used in the MySQL world, although there are others. Alternatively, especially if the existing tools don't represent your workload well enough, you can create your own flow and run it against both 5.5 and 10.0.
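A typical sysbench comparison run might look like the sketch below (hostname, credentials, and table sizes are placeholders; this assumes sysbench 1.x with its bundled oltp_* Lua scripts, and needs a live server for each version under test):

```shell
# Prepare test tables, run a read/write workload for 60s, then clean up.
sysbench oltp_read_write \
  --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=secret \
  --tables=4 --table-size=100000 prepare

sysbench oltp_read_write \
  --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=secret \
  --tables=4 --table-size=100000 --threads=4 --time=60 run

sysbench oltp_read_write \
  --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=secret \
  --tables=4 cleanup
```

Running the same `run` step against a 5.5 and a 10.0 server gives comparable transactions-per-second and latency figures.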

Comment by Antonio Fernandes [ 2014-05-15 ]

Hello Elena,

I couldn't find the time to really test this thoroughly.

Do you know if anyone measured single-threaded table lock/unlock performance for MyISAM between 5.5 and 10?

If not, you could close the ticket, because I worked around the problem (by not replicating the 6.2 million MyISAM records that are updated every night).

Regards,
António

Comment by Elena Stepanova [ 2014-05-26 ]

I haven't heard of this specific benchmark being done.

As discussed above, closing for now. If you have more information that needs to be looked at, please comment to re-open the report.

Generated at Thu Feb 08 07:09:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.