|
At the first glance, it seems to work the same way in 10.0 and 10.1. Can it be that your 10.1 uses parallel replication, while 10.0 does not?
With parallel replication, Slave SQL thread (the main one) does not actually replicate events, so it indeed shows the the process time. Or did you mean something different? If so, can you provide an example?
|
|
Hi Elena,
Both 10.0 and 10.1 use parallel replication. Actually I've paid attention to it while using mytop, but that doesn't matter much.
Here is how it looks like in 10.0 (the mytop snippet)
Id User Host/IP DB Time Cmd State Query
|
-- ---- ------- -- ---- --- ----- ----------
|
|
10031878 system us database 40376 Connec Waiting COMMIT
|
10031879 system us database 40376 Connec Waiting COMMIT
|
10031888 system us database 40376 Connec Waiting COMMIT
|
10031891 system us database 40376 Connec Waiting COMMIT
|
10031892 system us database 40376 Connec Waiting COMMIT
|
10031893 system us database 40376 Connec Waiting COMMIT
|
10031895 system us database 40376 Connec Waiting COMMIT
|
|
Here thread's time is the slave replication lag.
in 10.1 is just 0..all the time. I can check replication lag using slave status, but it's very useful to see such behaviour in mytop or 'show processlist' where SQL Thread's time is actually the delay of a slave (that thread).
Thanks!
|
|
Which one of these do you think is the SQL thread? I'd say none, all of them are worker threads. Can you provide a "fuller" snippet, which includes SQL thread, both for 10.0 and 10.1?
Also, when you are talking about the desired status of SQL thread, which do you actually mean - the SQL thread or parallel worker threads?
|
|
Elena, you are absolutely right, my mistake. I've mentioned all the time parallel worker threads... sorry that I've mistaken you...
So the problem is that parallel worker threads don't display the time behind master in TIME field like in 10.0
Thanks!
|
|
It works for me though (see below), that's why I'm asking for a bigger snippet. It's possible that in your example all work has been done by other thread(s) – I doubt you actually have seven worker threads, do you?
|
Initial state, no events yet
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|
| 3 | root | localhost:37497 | test | Query | 0 | init | show processlist | 0.000 |
|
| 4 | system user | | NULL | Connect | 12 | Waiting for master to send event | NULL | 0.000 |
|
| 5 | system user | | NULL | Connect | 12 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 6 | system user | | NULL | Connect | 12 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 7 | system user | | NULL | Connect | 12 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 8 | system user | | NULL | Connect | 12 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 9 | system user | | NULL | Connect | 11 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL | 0.000 |
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|
7 rows in set (0.00 sec)
|
|
CREATE TABLE .. SELECT SLEEP was received on 36th second
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
|
| 3 | root | localhost:37497 | test | Query | 0 | init | show processlist | 0.000 |
|
| 4 | system user | | NULL | Connect | 36 | Waiting for master to send event | NULL | 0.000 |
|
| 5 | system user | | NULL | Connect | 36 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 6 | system user | | NULL | Connect | 36 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 7 | system user | | NULL | Connect | 36 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 8 | system user | | test | Connect | 1 | User sleep | create table t1 as select sleep(5) | 0.000 |
|
| 9 | system user | | NULL | Connect | 35 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL | 0.000 |
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
|
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
|
| 3 | root | localhost:37497 | test | Query | 0 | init | show processlist | 0.000 |
|
| 4 | system user | | NULL | Connect | 38 | Waiting for master to send event | NULL | 0.000 |
|
| 5 | system user | | NULL | Connect | 38 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 6 | system user | | NULL | Connect | 38 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 7 | system user | | NULL | Connect | 38 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 8 | system user | | test | Connect | 4 | User sleep | create table t1 as select sleep(5) | 0.000 |
|
| 9 | system user | | NULL | Connect | 38 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL | 0.000 |
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
|
|
Since 40th second all threads are idle again, but time for the one that processed the event shows when it happened
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|
| 3 | root | localhost:37497 | test | Query | 0 | init | show processlist | 0.000 |
|
| 4 | system user | | NULL | Connect | 44 | Waiting for master to send event | NULL | 0.000 |
|
| 5 | system user | | NULL | Connect | 44 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 6 | system user | | NULL | Connect | 44 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 7 | system user | | NULL | Connect | 44 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 8 | system user | | NULL | Connect | 4 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 9 | system user | | NULL | Connect | 44 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL | 0.000 |
|
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|
MariaDB [test]> select @@version;
|
+-----------------+
|
| @@version |
|
+-----------------+
|
| 10.1.12-MariaDB |
|
+-----------------+
|
1 row in set (0.00 sec)
|
If it's different for you, apart from output, please also paste your cnf file(s) (or point at them in previous bug reports if they're already there).
Thanks.
|
|
Hi Elena,
I've made another setup/test and the results are still the same:
MariaDB [(none)]> show processlist;
|
+----+-------------+-----------+----------+---------+------+-----------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+----+-------------+-----------+----------+---------+------+-----------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| 2 | system user | | NULL | Connect | 81 | Waiting for master to send event | NULL | 0.000 |
|
| 3 | system user | | NULL | Connect | 81 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 4 | system user | | NULL | Connect | 81 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 5 | system user | | NULL | Connect | 81 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 6 | system user | | NULL | Connect | 81 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 7 | system user | | NULL | Connect | 81 | Waiting for work from SQL thread | NULL | 0.000 |
|
| 8 | system user | | NULL | Connect | 0 | after apply log event | NULL | 0.000 |
|
| 9 | system user | | database | Connect | 0 | Unlocking tables | COMMIT | 0.000 |
|
| 10 | system user | | database | Connect | 0 | init | INSERT INTO sometable SET 1='1', 2='2 | 0.000 |
|
| 11 | system user | | database | Connect | 0 | Waiting for table level lock | INSERT INTO sometable SET 1='2', 2='3 | 0.000 |
|
| 12 | system user | | NULL | Connect | 0 | after apply log event | NULL | 0.000 |
|
| 13 | system user | | NULL | Connect | 30 | Waiting for room in worker thread event queue | NULL | 0.000 |
|
| 31 | root | localhost | NULL | Query | 0 | init | show processlist | 0.000 |
|
+----+-------------+-----------+----------+---------+------+-----------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
13 rows in set (0.00 sec)
|
MariaDB [(none)]> select @@version;
|
+-----------------+
|
| @@version |
|
+-----------------+
|
| 10.1.12-MariaDB |
|
+-----------------+
|
1 row in set (0.00 sec)
|
The slave is behind master for around 100 seconds during the checks I've made.
The special slave settings:
slave_domain_parallel_threads=5
|
slave_parallel_threads=10
|
slave_parallel_max_queued=524288
|
slave_compressed_protocol=1
|
Another thing that might be important and I've missed (appology) to previously update you - master is mariadb 10.0
10.0.23 to be exact with binlog_format=STATEMENT and binlog_commit_wait_count=50
That's the only thought I have of why your results are different than mine.
I've only started with 10.1 and some employees that have access to slaves use 'show processlist' to get an idea about delay because 'show slave status' requires special permissions I can't give.
Thanks!
Alex
OT/P.S.:
BTW I can't stop 10.1 service without prior stopping all replication processes. Something similar solved here: https://jira.mariadb.org/browse/MDEV-9595
However, my setup is simple - just 1 replication channel with just 5 parallel replication worker threads. And also log looks different. But I still have to stop replication and only then can stop the server itself.
Otherwise I have to kill mysqld process. I'll recheck that again with the 10.1.13 release...
It looks like this (CentOS7)
Main PID: 29531 (mysqld)
|
Status: "Shutdown in progress"
|
CGroup: /system.slice/mariadb.service
|
└─29531 /usr/sbin/mysqld
|
...
|
Mar 14 20:39:59 DB01 mysqld[29531]: 2016-03-14 20:39:59 139767204940544 [Note] Master 'somedb': Slave I/O thread exiting, read up to log 'mysql-bin.060602', position 21453783
|
Mar 14 20:41:29 DB01 systemd[1]: mariadb.service stop-sigterm timed out. Skipping SIGKILL.
|
Mar 14 20:42:59 DB01 systemd[1]: mariadb.service stop-final-sigterm timed out. Skipping SIGKILL. Entering failed mode.
|
Mar 14 20:42:59 DB01 systemd[1]: Stopped MariaDB database server.
|
Mar 14 20:42:59 DB01 systemd[1]: Unit mariadb.service entered failed state.
|
Mar 14 20:42:59 DB01 systemd[1]: mariadb.service failed.
|
|
|
ShivaS,
Regarding the original problem:
Can you provide an example of how it works for you on 10.0 and on 10.1 without parallel slave?
I see that I might have misunderstood your complaint before, but I'm not certain I understand it right now, because a single snapshot without a progress leaves room for different interpretation.
But if I do understand your complaint correctly, it should work the same way on 10.0 and on a non-parallel 10.1.
Thanks
|
|
Hi Elena,
I've tested that without parallel slave, and the result is the same.
If 10.1 replicates from 10.0, it doesn't report correct time for the slave thread
Have a look at my quick tests:
10.1 <- 10.0
delay 20 sec and increasing (since parallel replication is off)
No info, unless I check it with slave status commands.
1st processlist
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| 52977 | system user | | NULL | Connect | 64 | Waiting for master to send event | NULL | 0.000 |
|
| 52978 | system user | | database | Connect | 0 | freeing items | COMMIT | 0.000 |
|
| 52979 | system user | | NULL | Connect | 64 | Waiting for master to send event | NULL | 0.000 |
|
| 52980 | system user | | database | Connect | 0 | init | INSERT INTO table SET a='b',b='c | 0.000 |
|
| 52985 | root | localhost | NULL | Query | 0 | init | show processlist | 0.000 |
|
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
|
|
2nd processlist (in a few seconds)
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| 52977 | system user | | NULL | Connect | 80 | Waiting for master to send event | NULL | 0.000 |
|
| 52978 | system user | | database | Connect | 0 | init | INSERT INTO table SET a='b',b='c | 0.000 |
|
| 52979 | system user | | NULL | Connect | 80 | Waiting for master to send event | NULL | 0.000 |
|
| 52980 | system user | | database | Connect | 0 | NULL | COMMIT | 0.000 |
|
| 52985 | root | localhost | NULL | Query | 0 | init | show processlist | 0.000 |
|
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
|
And here is how I expect it to look like
10.0 <- 10.0
relevant threads have real delays (like 288 and 4433 seconds)
1st processlist
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| 8176 | replication | 1.1.1.1:1 | NULL | Binlog Dump | 244157 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL | 0.000 |
|
| 56211 | system user | | NULL | Connect | 10 | Waiting for master to send event | NULL | 0.000 |
|
| 56212 | system user | | NULL | Connect | 92 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL | 0.000 |
|
| 56213 | system user | | NULL | Connect | 10 | Queueing master event to the relay log | NULL | 0.000 |
|
| 56214 | system user | | database | Connect | 288 | init | INSERT INTO table SET a='b',b='c | 0.000 |
|
| 56215 | system user | | NULL | Connect | 10 | Waiting for master to send event | NULL | 0.000 |
|
| 56216 | system user | | NULL | Connect | 93 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL | 0.000 |
|
| 56217 | system user | | NULL | Connect | 10 | Queueing master event to the relay log | NULL | 0.000 |
|
| 56218 | system user | | NULL | Connect | 4433 | freeing items | NULL | 0.000 |
|
| 56220 | root | localhost | NULL | Query | 0 | init | show processlist | 0.000 |
|
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
2nd processlist (in a few seconds)
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
| 8176 | replication | 1.1.1.1:1 | NULL | Binlog Dump | 244184 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL | 0.000 |
|
| 56211 | system user | | NULL | Connect | 37 | Waiting for master to send event | NULL | 0.000 |
|
| 56212 | system user | | NULL | Connect | 93 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL | 0.000 |
|
| 56213 | system user | | NULL | Connect | 37 | Waiting for master to send event | NULL | 0.000 |
|
| 56214 | system user | | database | Connect | 300 | init | INSERT INTO table SET a='b',b='c | 0.000 |
|
| 56215 | system user | | NULL | Connect | 37 | Waiting for master to send event | NULL | 0.000 |
|
| 56216 | system user | | NULL | Connect | 93 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL | 0.000 |
|
| 56217 | system user | | NULL | Connect | 37 | Waiting for master to send event | NULL | 0.000 |
|
| 56218 | system user | | database | Connect | 4449 | System lock | INSERT INTO table SET x='y',y='z | 0.000 |
|
| 56228 | root | localhost | NULL | Query | 0 | init | show processlist | 0.000 |
|
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|
|
|
|
|
|
Hi Elena,
Just switched master servers to 10.1 and have the same problem with slaves.
Seems like it doesn't depend on master OS/version, it's solely 10.1 problem.
Thanks
|