[MDEV-9720] wrong TIME field value for SQL Thread in processlist Created: 2016-03-13  Updated: 2020-10-20

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.12
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Alex Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Hello,
The TIME field in processlist for a SQL Thread should display the "time in seconds between the last replicated event's timestamp and the slave machine's real time" (from official doc https://mariadb.com/kb/en/mariadb/show-processlist/).
Or in simple words display/calculate "seconds behind master" value.

Instead it seems to display the regular process TIME value (time the process spent in a certain state), which is usually 0 in replication.

It's ok in 5.5 and 10.0.

Thanks,
Alex



 Comments   
Comment by Elena Stepanova [ 2016-03-14 ]

At the first glance, it seems to work the same way in 10.0 and 10.1. Can it be that your 10.1 uses parallel replication, while 10.0 does not?

With parallel replication, Slave SQL thread (the main one) does not actually replicate events, so it indeed shows the the process time. Or did you mean something different? If so, can you provide an example?

Comment by Alex [ 2016-03-14 ]

Hi Elena,
Both 10.0 and 10.1 use parallel replication. Actually I've paid attention to it while using mytop, but that doesn't matter much.

Here is how it looks like in 10.0 (the mytop snippet)

      Id      User         Host/IP         DB       Time    Cmd    State Query
       --      ----         -------         --       ----    ---    ----- ----------
 
10031878 system us                   database      40376 Connec  Waiting COMMIT
 10031879 system us                   database      40376 Connec  Waiting COMMIT
 10031888 system us                   database      40376 Connec  Waiting COMMIT
 10031891 system us                   database      40376 Connec  Waiting COMMIT
 10031892 system us                   database      40376 Connec  Waiting COMMIT
 10031893 system us                   database      40376 Connec  Waiting COMMIT
 10031895 system us                   database      40376 Connec  Waiting COMMIT

Here thread's time is the slave replication lag.
in 10.1 is just 0..all the time. I can check replication lag using slave status, but it's very useful to see such behaviour in mytop or 'show processlist' where SQL Thread's time is actually the delay of a slave (that thread).

Thanks!

Comment by Elena Stepanova [ 2016-03-14 ]

Which one of these do you think is the SQL thread? I'd say none, all of them are worker threads. Can you provide a "fuller" snippet, which includes SQL thread, both for 10.0 and 10.1?
Also, when you are talking about the desired status of SQL thread, which do you actually mean - the SQL thread or parallel worker threads?

Comment by Alex [ 2016-03-14 ]

Elena, you are absolutely right, my mistake. I've mentioned all the time parallel worker threads... sorry that I've mistaken you...

So the problem is that parallel worker threads don't display the time behind master in TIME field like in 10.0

Thanks!

Comment by Elena Stepanova [ 2016-03-14 ]

It works for me though (see below), that's why I'm asking for a bigger snippet. It's possible that in your example all work has been done by other thread(s) – I doubt you actually have seven worker threads, do you?

Initial state, no events yet

+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
| Id | User        | Host            | db   | Command | Time | State                                                                       | Info             | Progress |
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|  3 | root        | localhost:37497 | test | Query   |    0 | init                                                                        | show processlist |    0.000 |
|  4 | system user |                 | NULL | Connect |   12 | Waiting for master to send event                                            | NULL             |    0.000 |
|  5 | system user |                 | NULL | Connect |   12 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  6 | system user |                 | NULL | Connect |   12 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  7 | system user |                 | NULL | Connect |   12 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  8 | system user |                 | NULL | Connect |   12 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  9 | system user |                 | NULL | Connect |   11 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL             |    0.000 |
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
7 rows in set (0.00 sec)

CREATE TABLE .. SELECT SLEEP was received on 36th second

+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
| Id | User        | Host            | db   | Command | Time | State                                                                       | Info                               | Progress |
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
|  3 | root        | localhost:37497 | test | Query   |    0 | init                                                                        | show processlist                   |    0.000 |
|  4 | system user |                 | NULL | Connect |   36 | Waiting for master to send event                                            | NULL                               |    0.000 |
|  5 | system user |                 | NULL | Connect |   36 | Waiting for work from SQL thread                                            | NULL                               |    0.000 |
|  6 | system user |                 | NULL | Connect |   36 | Waiting for work from SQL thread                                            | NULL                               |    0.000 |
|  7 | system user |                 | NULL | Connect |   36 | Waiting for work from SQL thread                                            | NULL                               |    0.000 |
|  8 | system user |                 | test | Connect |    1 | User sleep                                                                  | create table t1 as select sleep(5) |    0.000 |
|  9 | system user |                 | NULL | Connect |   35 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL                               |    0.000 |
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
 
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
| Id | User        | Host            | db   | Command | Time | State                                                                       | Info                               | Progress |
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+
|  3 | root        | localhost:37497 | test | Query   |    0 | init                                                                        | show processlist                   |    0.000 |
|  4 | system user |                 | NULL | Connect |   38 | Waiting for master to send event                                            | NULL                               |    0.000 |
|  5 | system user |                 | NULL | Connect |   38 | Waiting for work from SQL thread                                            | NULL                               |    0.000 |
|  6 | system user |                 | NULL | Connect |   38 | Waiting for work from SQL thread                                            | NULL                               |    0.000 |
|  7 | system user |                 | NULL | Connect |   38 | Waiting for work from SQL thread                                            | NULL                               |    0.000 |
|  8 | system user |                 | test | Connect |    4 | User sleep                                                                  | create table t1 as select sleep(5) |    0.000 |
|  9 | system user |                 | NULL | Connect |   38 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL                               |    0.000 |
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------------------------+----------+

Since 40th second all threads are idle again, but time for the one that processed the event shows when it happened

+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
| Id | User        | Host            | db   | Command | Time | State                                                                       | Info             | Progress |
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+
|  3 | root        | localhost:37497 | test | Query   |    0 | init                                                                        | show processlist |    0.000 |
|  4 | system user |                 | NULL | Connect |   44 | Waiting for master to send event                                            | NULL             |    0.000 |
|  5 | system user |                 | NULL | Connect |   44 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  6 | system user |                 | NULL | Connect |   44 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  7 | system user |                 | NULL | Connect |   44 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  8 | system user |                 | NULL | Connect |    4 | Waiting for work from SQL thread                                            | NULL             |    0.000 |
|  9 | system user |                 | NULL | Connect |   44 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL             |    0.000 |
+----+-------------+-----------------+------+---------+------+-----------------------------------------------------------------------------+------------------+----------+

MariaDB [test]> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 10.1.12-MariaDB |
+-----------------+
1 row in set (0.00 sec)

If it's different for you, apart from output, please also paste your cnf file(s) (or point at them in previous bug reports if they're already there).

Thanks.

Comment by Alex [ 2016-03-14 ]

Hi Elena,
I've made another setup/test and the results are still the same:

MariaDB [(none)]> show processlist;
+----+-------------+-----------+----------+---------+------+-----------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id | User        | Host      | db       | Command | Time | State                                         | Info                                                                                                 | Progress |
+----+-------------+-----------+----------+---------+------+-----------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
|  2 | system user |           | NULL     | Connect |   81 | Waiting for master to send event              | NULL                                                                                                 |    0.000 |
|  3 | system user |           | NULL     | Connect |   81 | Waiting for work from SQL thread              | NULL                                                                                                 |    0.000 |
|  4 | system user |           | NULL     | Connect |   81 | Waiting for work from SQL thread              | NULL                                                                                                 |    0.000 |
|  5 | system user |           | NULL     | Connect |   81 | Waiting for work from SQL thread              | NULL                                                                                                 |    0.000 |
|  6 | system user |           | NULL     | Connect |   81 | Waiting for work from SQL thread              | NULL                                                                                                 |    0.000 |
|  7 | system user |           | NULL     | Connect |   81 | Waiting for work from SQL thread              | NULL                                                                                                 |    0.000 |
|  8 | system user |           | NULL     | Connect |    0 | after apply log event                         | NULL                                                                                                 |    0.000 |
|  9 | system user |           | database | Connect |    0 | Unlocking tables                              | COMMIT                                                                                               |    0.000 |
| 10 | system user |           | database | Connect |    0 | init                                          | INSERT  INTO sometable SET 1='1', 2='2 |    0.000 |
| 11 | system user |           | database | Connect |    0 | Waiting for table level lock                  | INSERT  INTO sometable SET 1='2', 2='3 |    0.000 |
| 12 | system user |           | NULL     | Connect |    0 | after apply log event                         | NULL                                                                                                 |    0.000 |
| 13 | system user |           | NULL     | Connect |   30 | Waiting for room in worker thread event queue | NULL                                                                                                 |    0.000 |
| 31 | root        | localhost | NULL     | Query   |    0 | init                                          | show processlist                                                                                     |    0.000 |
+----+-------------+-----------+----------+---------+------+-----------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
13 rows in set (0.00 sec)

MariaDB [(none)]> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 10.1.12-MariaDB |
+-----------------+
1 row in set (0.00 sec)

The slave is behind master for around 100 seconds during the checks I've made.

The special slave settings:

slave_domain_parallel_threads=5
slave_parallel_threads=10
slave_parallel_max_queued=524288
slave_compressed_protocol=1

Another thing that might be important and I've missed (appology) to previously update you - master is mariadb 10.0
10.0.23 to be exact with binlog_format=STATEMENT and binlog_commit_wait_count=50

That's the only thought I have of why your results are different than mine.
I've only started with 10.1 and some employees that have access to slaves use 'show processlist' to get an idea about delay because 'show slave status' requires special permissions I can't give.

Thanks!
Alex

OT/P.S.:
BTW I can't stop 10.1 service without prior stopping all replication processes. Something similar solved here: https://jira.mariadb.org/browse/MDEV-9595
However, my setup is simple - just 1 replication channel with just 5 parallel replication worker threads. And also log looks different. But I still have to stop replication and only then can stop the server itself.
Otherwise I have to kill mysqld process. I'll recheck that again with the 10.1.13 release...

It looks like this (CentOS7)

Main PID: 29531 (mysqld)
   Status: "Shutdown in progress"
   CGroup: /system.slice/mariadb.service
           └─29531 /usr/sbin/mysqld
...
Mar 14 20:39:59 DB01 mysqld[29531]: 2016-03-14 20:39:59 139767204940544 [Note] Master 'somedb': Slave I/O thread exiting, read up to log 'mysql-bin.060602', position 21453783
Mar 14 20:41:29 DB01 systemd[1]: mariadb.service stop-sigterm timed out. Skipping SIGKILL.
Mar 14 20:42:59 DB01 systemd[1]: mariadb.service stop-final-sigterm timed out. Skipping SIGKILL. Entering failed mode.
Mar 14 20:42:59 DB01 systemd[1]: Stopped MariaDB database server.
Mar 14 20:42:59 DB01 systemd[1]: Unit mariadb.service entered failed state.
Mar 14 20:42:59 DB01 systemd[1]: mariadb.service failed.

Comment by Elena Stepanova [ 2016-03-19 ]

ShivaS,
Regarding the original problem:
Can you provide an example of how it works for you on 10.0 and on 10.1 without parallel slave?
I see that I might have misunderstood your complaint before, but I'm not certain I understand it right now, because a single snapshot without a progress leaves room for different interpretation.
But if I do understand your complaint correctly, it should work the same way on 10.0 and on a non-parallel 10.1.

Thanks

Comment by Alex [ 2016-03-20 ]

Hi Elena,
I've tested that without parallel slave, and the result is the same.
If 10.1 replicates from 10.0, it doesn't report correct time for the slave thread
Have a look at my quick tests:

10.1 <- 10.0
delay 20 sec and increasing (since parallel replication is off)
No info, unless I check it with slave status commands.

1st processlist

+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id    | User        | Host      | db       | Command | Time | State                            | Info                                                                                                 | Progress |
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
| 52977 | system user |           | NULL     | Connect |   64 | Waiting for master to send event | NULL                                                                                                 |    0.000 |
| 52978 | system user |           | database | Connect |    0 | freeing items                    | COMMIT                                                                                               |    0.000 |
| 52979 | system user |           | NULL     | Connect |   64 | Waiting for master to send event | NULL                                                                                                 |    0.000 |
| 52980 | system user |           | database | Connect |    0 | init                             | INSERT  INTO table SET a='b',b='c								        |    0.000 |
| 52985 | root        | localhost | NULL     | Query   |    0 | init                             | show processlist                                                                                     |    0.000 |
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
 

2nd processlist (in a few seconds)

+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id    | User        | Host      | db       | Command | Time | State                            | Info                                                                                                 | Progress |
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
| 52977 | system user |           | NULL     | Connect |   80 | Waiting for master to send event | NULL                                                                                                 |    0.000 |
| 52978 | system user |           | database | Connect |    0 | init                             | INSERT  INTO table SET a='b',b='c								        |    0.000 |
| 52979 | system user |           | NULL     | Connect |   80 | Waiting for master to send event | NULL                                                                                                 |    0.000 |
| 52980 | system user |           | database | Connect |    0 | NULL                             | COMMIT                                                                                               |    0.000 |
| 52985 | root        | localhost | NULL     | Query   |    0 | init                             | show processlist                                                                                     |    0.000 |
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+

And here is how I expect it to look like

10.0 <- 10.0
relevant threads have real delays (like 288 and 4433 seconds)

1st processlist

+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id    | User        | Host      | db       | Command | Time | State                            | Info                                                                                                 | Progress |
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|  8176 | replication | 1.1.1.1:1 | NULL     | Binlog Dump | 244157 | Master has sent all binlog to slave; waiting for binlog to be updated       | NULL                                                |    0.000 |
| 56211 | system user |           | NULL     | Connect     |     10 | Waiting for master to send event                                            | NULL                                                |    0.000 |
| 56212 | system user |           | NULL     | Connect     |     92 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL                                                |    0.000 |
| 56213 | system user |           | NULL     | Connect     |     10 | Queueing master event to the relay log                                      | NULL                                                |    0.000 |
| 56214 | system user |           | database | Connect     |    288 | init                                                                        | INSERT  INTO table SET a='b',b='c                   |    0.000 |
| 56215 | system user |           | NULL     | Connect     |     10 | Waiting for master to send event                                            | NULL                                                |    0.000 |
| 56216 | system user |           | NULL     | Connect     |     93 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL                                                |    0.000 |
| 56217 | system user |           | NULL     | Connect     |     10 | Queueing master event to the relay log                                      | NULL                                                |    0.000 |
| 56218 | system user |           | NULL     | Connect     |   4433 | freeing items                                                               | NULL                                                |    0.000 |
| 56220 | root        | localhost | NULL     | Query       |      0 | init                                                                        | show processlist                                    |    0.000 |
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+

2nd processlist (in a few seconds)

+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id    | User        | Host      | db       | Command | Time | State                            | Info                                                                                                 | Progress |
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
|  8176 | replication | 1.1.1.1:1 | NULL     | Binlog Dump | 244184 | Master has sent all binlog to slave; waiting for binlog to be updated       | NULL                                                |    0.000 |
| 56211 | system user |           | NULL     | Connect     |     37 | Waiting for master to send event                                            | NULL                                                |    0.000 |
| 56212 | system user |           | NULL     | Connect     |     93 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL                                                |    0.000 |
| 56213 | system user |           | NULL     | Connect     |     37 | Waiting for master to send event                                            | NULL                                                |    0.000 |
| 56214 | system user |           | database | Connect     |    300 | init                                                                        | INSERT  INTO table SET a='b',b='c                   |    0.000 |
| 56215 | system user |           | NULL     | Connect     |     37 | Waiting for master to send event                                            | NULL                                                |    0.000 |
| 56216 | system user |           | NULL     | Connect     |     93 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL                                                |    0.000 |
| 56217 | system user |           | NULL     | Connect     |     37 | Waiting for master to send event                                            | NULL                                                |    0.000 |
| 56218 | system user |           | database | Connect     |   4449 | System lock                                                                 | INSERT  INTO table SET x='y',y='z                   |    0.000 |
| 56228 | root        | localhost | NULL     | Query       |      0 | init                                                                        | show processlist                                    |    0.000 |
+-------+-------------+-----------+----------+---------+------+----------------------------------+------------------------------------------------------------------------------------------------------+----------+
 
 

Comment by Alex [ 2016-03-26 ]

Hi Elena,
Just switched master servers to 10.1 and have the same problem with slaves.
Seems like it doesn't depend on master OS/version, it's solely 10.1 problem.

Thanks

Generated at Thu Feb 08 07:36:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.