[MDEV-26196] After enabling the parallel replication slave lag is increasing very high Created: 2021-07-21  Updated: 2021-11-22  Resolved: 2021-11-22

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.5.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Pon Suresh Pandian (Inactive) Assignee: Andrei Elkin
Resolution: Won't Fix Votes: 1
Labels: need_feedback
Environment:

CentOS Linux release 7.7.1908 (Core)


Attachments: File master_10.0     Text File mysqld.log     Text File op.txt     HTML File replication_lag     File slave_10.5    

 Description   

Hi Team,

I have configured the replication from 10.0.38 to 10.5.11. Then I have started the sysbench to load some data. Then I checked the slave lag in my slave server. There is no slave lag and replication is running on single thread.

Then I have enabled the parallel replication,

SET GLOBAL slave_parallel_threads=4;
SET GLOBAL binlog_commit_wait_count=5;
SET GLOBAL sync_binlog = 1;

Again started the sysbench to load some data. Now my slave lag was increasing slowly.

[root@ip-172-31-8-142 sysbench]# sysbench oltp_insert.lua --tables=5 --table_size=20000000  --mysql-host=127.0.0.1 --db-driver=mysql --mysql-user=proxy --mysql-password=Proxy@123 --mysql-port=3306 --mysql-db=sbtest  --time=100 --report-interval=10 --threads=2 prepare
sysbench 1.0.17 (using system LuaJIT 2.0.4)
 
Initializing worker threads...
 
Creating table 'sbtest1'...
Creating table 'sbtest2'...
Inserting 20000000 records into 'sbtest1'
Inserting 20000000 records into 'sbtest2'

Here I have attached my config files for both master & slave server.

Also attached the show slave status output and show process output.

MariaDB [sbtest]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Queueing master event to the relay log
                   Master_Host: 172.31.8.142
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.000006
           Read_Master_Log_Pos: 587112382
                Relay_Log_File: ip-172-31-15-223-relay-bin.000002
                 Relay_Log_Pos: 3637059
         Relay_Master_Log_File: mysql-bin.000006
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 475320641
               Relay_Log_Space: 115429116
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: 398
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 1
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 201-1-11477
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: optimistic
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Waiting for room in worker thread event queue
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 9941
    Slave_Transactional_Groups: 0
1 row in set (0.270 sec)

MariaDB [sbtest]> show processlist;
+-----+-------------+-----------+--------+--------------+------+-----------------------------------------------+------------------+----------+
| Id  | User        | Host      | db     | Command      | Time | State                                         | Info             | Progress |
+-----+-------------+-----------+--------+--------------+------+-----------------------------------------------+------------------+----------+
|   4 | root        | localhost | sbtest | Query        |    0 | starting                                      | show processlist |    0.000 |
| 181 | system user |           | NULL   | Slave_IO     |   13 | Waiting for master to send event              | NULL             |    0.000 |
| 183 | system user |           | NULL   | Slave_worker |   13 | Waiting for work from SQL thread              | NULL             |    0.000 |
| 184 | system user |           | NULL   | Slave_worker |   13 | Waiting for work from SQL thread              | NULL             |    0.000 |
| 185 | system user |           | NULL   | Slave_worker |    0 | Waiting for prior transaction to commit       | NULL             |    0.000 |
| 186 | system user |           | NULL   | Slave_worker |    0 | closing tables                                | NULL             |    0.000 |
| 182 | system user |           | NULL   | Slave_SQL    |   11 | Waiting for room in worker thread event queue | NULL             |    0.000 |
+-----+-------------+-----------+--------+--------------+------+-----------------------------------------------+------------------+----------+
7 rows in set (0.000 sec)

MariaDB [sbtest]> show global variables like 'slave%';
+----------------------------------+-----------------------------------------------+
| Variable_name                    | Value                                         |
+----------------------------------+-----------------------------------------------+
| slave_compressed_protocol        | OFF                                           |
| slave_ddl_exec_mode              | IDEMPOTENT                                    |
| slave_domain_parallel_threads    | 0                                             |
| slave_exec_mode                  | STRICT                                        |
| slave_load_tmpdir                | /tmp                                          |
| slave_max_allowed_packet         | 1073741824                                    |
| slave_net_timeout                | 60                                            |
| slave_parallel_max_queued        | 131072                                        |
| slave_parallel_mode              | optimistic                                    |
| slave_parallel_threads           | 4                                             |
| slave_parallel_workers           | 4                                             |
| slave_run_triggers_for_rbr       | NO                                            |
| slave_skip_errors                | OFF                                           |
| slave_sql_verify_checksum        | ON                                            |
| slave_transaction_retries        | 10                                            |
| slave_transaction_retry_errors   | 1158,1159,1160,1161,1205,1213,1429,2013,12701 |
| slave_transaction_retry_interval | 0                                             |
| slave_type_conversions           |                                               |
+----------------------------------+-----------------------------------------------+
18 rows in set (0.001 sec)

I have tried to increase the "slave_parallel_max_queued" parameter upto 128 MB. But no luck still the slave is increasing.
Also changed the "slave_parallel_mode" to conservative mode but no luck.



 Comments   
Comment by Andrei Elkin [ 2021-08-10 ]

ponsuresh.pandians: It could be just a configuration issue.

Notice

waiting for room in worker queue

status which must be explained by too small value of


slave_parallel_max_queued | 131072

Please try increasing it and let us know.

Comment by Pon Suresh Pandian (Inactive) [ 2021-08-10 ]

Hi Andrei,

I have already increased upto 128MB but no luck the lag was increased.

Comment by Andrei Elkin [ 2021-08-10 ]

ponsuresh.pandians, howdy. Could you please provide Show-Slave-Status and Show-Processlist with the 128 MB slave_parallel_max_queued?

Adding to that, naturally a few periodical (hourly?) show reports could be very useful
in this (growing?) delay scenario.

Comment by Pon Suresh Pandian (Inactive) [ 2021-08-11 ]

Hi Andrei,

Sure.. Please find the requested outputs. After I stopped the sysbench on master server the lag is increasing upto 867 sec..

MariaDB [(none)]> show global variables like 'slave_parallel_max_queued';
+---------------------------+-----------+
| Variable_name             | Value     |
+---------------------------+-----------+
| slave_parallel_max_queued | 134217728 |
+---------------------------+-----------+
1 row in set (0.001 sec)
 
MariaDB [(none)]> select 134217728/1024/1024;
+---------------------+
| 134217728/1024/1024 |
+---------------------+
|        128.00000000 |
+---------------------+
1 row in set (0.000 sec)
 
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 172.31.24.106
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.000004
           Read_Master_Log_Pos: 889720716
                Relay_Log_File: ip-172-31-17-237-relay-bin.000002
                 Relay_Log_Pos: 889012661
         Relay_Master_Log_File: mysql-bin.000003
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 889012575
               Relay_Log_Space: 1963532221
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: 165
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 1
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 201-1-52064
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: optimistic
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Waiting for room in worker thread event queue
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 5462
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
 
MariaDB [(none)]> show full processlist;
+----+-------------+-----------+------+--------------+------+-----------------------------------------------+-----------------------+----------+
| Id | User        | Host      | db   | Command      | Time | State                                         | Info                  | Progress |
+----+-------------+-----------+------+--------------+------+-----------------------------------------------+-----------------------+----------+
| 67 | system user |           | NULL | Slave_IO     |  513 | Waiting for master to send event              | NULL                  |    0.000 |
| 69 | system user |           | NULL | Slave_worker |    0 | Waiting for prior transaction to commit       | NULL                  |    0.000 |
| 70 | system user |           | NULL | Slave_worker |    0 | Waiting for prior transaction to commit       | NULL                  |    0.000 |
| 71 | system user |           | NULL | Slave_worker |    0 | closing tables                                | NULL                  |    0.000 |
| 72 | system user |           | NULL | Slave_worker |    0 | Waiting for prior transaction to commit       | NULL                  |    0.000 |
| 68 | system user |           | NULL | Slave_SQL    |   92 | Waiting for room in worker thread event queue | NULL                  |    0.000 |
| 73 | root        | localhost | NULL | Query        |    0 | starting                                      | show full processlist |    0.000 |
+----+-------------+-----------+------+--------------+------+-----------------------------------------------+-----------------------+----------+
7 rows in set (0.000 sec)

Comment by Andrei Elkin [ 2021-08-11 ]

ponsuresh.pandians, howdy.

In the latest report we have 3 workers waiting for the 4th which the following state:

71 | system user | | NULL | Slave_worker | 0 | closing tables

Having just one show-report, it's not clear, whether this state takes some time (hardly, but still needs attention).
We can also suspect that 71 has been processing an about128 MB size event.
Could you please check (with mysqlbinlog) four (as the number of workers) groups of events starting at mysql-bin.000003:889012575 to report all event sizes.
Just paste the first line of each event's output, e.g

# at 4334
#010909  4:46:40 server id 1  end_log_pos 4376 CRC32 0x66c7e680 	GTID 0-1-16 ddl

And secondly, like I said in the previous comment, I need few measurements, say separated from each other for 5 mins.
I hope they can be provided.

Thanks.

Andrei

Comment by Pon Suresh Pandian (Inactive) [ 2021-08-11 ]

Hi Andrei,

I have decoded the binlog for both master & slave servers. Please check it. Also I have attached the periodical report for the show full processlist & show slave status output.

From Master :
-------------

[root@ip-172-31-24-106 mysql]# less /tmp/t.txt | grep -i 889012575 -A 5 -B10
### SET
###   @1=8316 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1008847 /* INT meta=0 nullable=0 is_null=0 */
###   @3='59074100786-30395673519-92582365862-85036053474-42511661822-80049943387-22751241741-93066455078-33849578458-19833344165' /* STRING(120) meta=65144 nullable=0 is_null=0 */
###   @4='19961618934-29000529160-79561458616-08529793478-20871744749' /* STRING(60) meta=65084 nullable=0 is_null=0 */
### INSERT INTO `sbtest`.`sbtest2`
### SET
###   @1=8317 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1000411 /* INT meta=0 nullable=0 is_null=0 */
###   @3='00942787768-04589456858-56363613859-26954360214-03885137841-90790223243-49118401991-63723734666-20265439903-69406001720' /* STRING(120) meta=65144 nullable=0 is_null=0 */
###   @4='27567816782-90768939453-74039568820-61130096963-88901257584' /* STRING(60) meta=65084 nullable=0 is_null=0 */
### INSERT INTO `sbtest`.`sbtest2`
### SET
###   @1=8318 /* INT meta=0 nullable=0 is_null=0 */
###   @2=991417 /* INT meta=0 nullable=0 is_null=0 */
###   @3='85156321622-04195155158-81842030504-15530669746-15248374975-59125065042-70965205688-11740408230-33044241354-55067656456' /* STRING(120) meta=65144 nullable=0 is_null=0 */
--
### SET
###   @1=1602903 /* INT meta=0 nullable=0 is_null=0 */
###   @2=997980 /* INT meta=0 nullable=0 is_null=0 */
###   @3='19245401165-61634791196-08558032160-75446215334-11287894282-62158317689-33315747696-83647671351-85003256312-82116001844' /* STRING(120) meta=65144 nullable=0 is_null=0 */
###   @4='15427896165-07882275704-42136503939-69999184949-70109204507' /* STRING(60) meta=65084 nullable=0 is_null=0 */
### INSERT INTO `sbtest`.`sbtest2`
### SET
###   @1=1602904 /* INT meta=0 nullable=0 is_null=0 */
###   @2=851972 /* INT meta=0 nullable=0 is_null=0 */
###   @3='67961218718-57512777613-81282639629-28761376616-47038664074-13311110657-90397944965-29132245173-82793777758-84313735692' /* STRING(120) meta=65144 nullable=0 is_null=0 */
###   @4='01331952066-24282735035-73039394138-67544492870-22889012575' /* STRING(60) meta=65084 nullable=0 is_null=0 */
### INSERT INTO `sbtest`.`sbtest2`
### SET
###   @1=1602905 /* INT meta=0 nullable=0 is_null=0 */
###   @2=998372 /* INT meta=0 nullable=0 is_null=0 */
###   @3='57036828254-39203173382-99948836467-28245829044-30374712994-72642974076-41577532325-01878141374-11473509824-50658041341' /* STRING(120) meta=65144 nullable=0 is_null=0 */
--
###   @2=1008548 /* INT meta=0 nullable=0 is_null=0 */
###   @3='41765883594-35283064806-69400677627-91786190235-82678384612-28175692335-21016147975-36461569069-80759003260-74951359859' /* STRING(120) meta=65144 nullable=0 is_null=0 */
###   @4='92588648459-11114954858-79032460487-15440339861-13263701619' /* STRING(60) meta=65084 nullable=0 is_null=0 */
### INSERT INTO `sbtest`.`sbtest4`
### SET
###   @1=278627 /* INT meta=0 nullable=0 is_null=0 */
###   @2=1002246 /* INT meta=0 nullable=0 is_null=0 */
###   @3='97587198511-00696167561-74482020591-20340139370-76086500251-57467691776-53001417267-52915161341-99186397045-99945265376' /* STRING(120) meta=65144 nullable=0 is_null=0 */
###   @4='71593207192-21634929961-54476645761-82885980510-70652887516' /* STRING(60) meta=65084 nullable=0 is_null=0 */
# at 889012548
#210811  9:10:54 server id 1  end_log_pos 889012575   Xid = 5539
COMMIT/*!*/;
# at 889012575
#210811  9:10:54 server id 1  end_log_pos 889012613   GTID 201-1-5470
/*!100001 SET @@session.gtid_seq_no=5470*//*!*/;
BEGIN
/*!*/;
# at 889012613

From Slave :
------------

[root@ip-172-31-17-237 mysql]# less  /tmp/t.txt | grep -i 889012575
###   @4='27567816782-90768939453-74039568820-61130096963-88901257584' /* STRING(60) meta=65084 nullable=0 is_null=0 */
###   @4='01331952066-24282735035-73039394138-67544492870-22889012575' /* STRING(60) meta=65084 nullable=0 is_null=0 */

Comment by Andrei Elkin [ 2021-08-11 ]

ponsuresh.pandians, thank your for trying, but sorry, this attachment is not something that I asked. I can not see the event sizes from it.
To repeat, I need two lines, I paste them over again:

 # at 4334
 #010909  4:46:40 server id 1  end_log_pos 4376 ...

for each event in those 4 trx:s that I specified.
The two lines say the beginning and the end of the event, that is its size.

(On the 2nd line of the first event of each trx we will Gtid event so it should be clear
where the trx boundaries are).

Comment by Pon Suresh Pandian (Inactive) [ 2021-08-11 ]

Hi Andrei,

Please find the below output. This is for (mysql-bin.000003) binlog and (889012575) position output.

# at 889012548
#210811  9:10:54 server id 1  end_log_pos 889012575 	Xid = 5539
COMMIT/*!*/;
# at 889012575
#210811  9:10:54 server id 1  end_log_pos 889012613 	GTID 201-1-5470
/*!100001 SET @@session.gtid_seq_no=5470*//*!*/;
BEGIN
/*!*/;
# at 889012613

And I re ran the test again, this is for (mysql-bin.000007) binlog and (846206111) position

[root@ip-172-31-24-106 mysql]# less /tmp/t.txt | grep -i 846206111
###   @3='24739921552-92859805172-35456153230-10715366393-84620611105-45177215237-91470530765-11754233503-02242541633-98239803580' /* STRING(120) meta=65144 nullable=0 is_null=0 */
#210811 12:19:06 server id 1  end_log_pos 846206111 	Xid = 1662
# at 846206111

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 172.31.24.106
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.000008
           Read_Master_Log_Pos: 874659436
                Relay_Log_File: ip-172-31-17-237-relay-bin.000002
                 Relay_Log_Pos: 846206197
         Relay_Master_Log_File: mysql-bin.000007
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 0
                    Last_Error: 
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 846206111
               Relay_Log_Space: 1948470902
               Until_Condition: None
                Until_Log_File: 
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File: 
            Master_SSL_CA_Path: 
               Master_SSL_Cert: 
             Master_SSL_Cipher: 
                Master_SSL_Key: 
         Seconds_Behind_Master: 146
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 0
                Last_SQL_Error: 
   Replicate_Ignore_Server_Ids: 
              Master_Server_Id: 1
                Master_SSL_Crl: 
            Master_SSL_Crlpath: 
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 201-1-61512
       Replicate_Do_Domain_Ids: 
   Replicate_Ignore_Domain_Ids: 
                 Parallel_Mode: optimistic
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Waiting for room in worker thread event queue
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 1631
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)

Comment by Andrei Elkin [ 2021-08-11 ]

Sorry again. I thought I made it clear that for analysis I need the sizes of every event in 4 event groups that started with
GTID 201-1-5470.

May I ask what was unclear in my request?

Comment by Pon Suresh Pandian (Inactive) [ 2021-08-11 ]

Hi Andrei,

I can't get the event size. So here i have attached the decoded binlog (mysql-bin.000007) please check it.

Link - https://drive.google.com/file/d/1zwIWbpJQU-KIfy2E7GFPAr0Qzp1zysMS/view?usp=sharing

Comment by Andrei Elkin [ 2021-08-12 ]

ponsuresh.pandians, hi. Why did you send me a different file now? I don't think it makes me easier to analyze.
To extract the info I asked simply run

mysqlbinlog  --start-position=889012575 --stop-position=`find yourself to count 4 consequent GTID events. The fifth is where to stop`  master-bin.000003 | grep  '^#'

To
> I can't get the event size.
is not an explanation why you could not do it. Next time please don't shy to ask specifically, I can help you with constructing a grep invocation etc.

Comment by Pon Suresh Pandian (Inactive) [ 2021-08-15 ]

Hi Andrei,

Unfortunately I have terminated that AWS instance. So I created new instance and re produced this scenario.

Here I mentioned requested logs Please check it.

[root@ip-172-31-72-170 mysql]# mysqlbinlog  --start-position=196084134 --stop-position=196085486  mysql-bin.000001 | grep  '^#'
# at 4
#210815 11:16:45 server id 1  end_log_pos 248 	Start: binlog v 4, server v 10.0.38-MariaDB created 210815 11:16:45 at startup
# Warning: this binlog is either in use or was not closed properly.
# at 196084134
#210815 11:32:38 server id 1  end_log_pos 196084172 	GTID 201-1-4034
# at 196084172
# at 196084227
#210815 11:32:38 server id 1  end_log_pos 196084227 	Table_map: `sbtest`.`sbtest4` mapped to number 80
#210815 11:32:38 server id 1  end_log_pos 196084445 	Write_rows: table id 80 flags: STMT_END_F
# at 196084445
#210815 11:32:38 server id 1  end_log_pos 196084472 	Xid = 4082
# at 196084472
#210815 11:32:38 server id 1  end_log_pos 196084510 	GTID 201-1-4035
# at 196084510
# at 196084565
#210815 11:32:38 server id 1  end_log_pos 196084565 	Table_map: `sbtest`.`sbtest4` mapped to number 80
#210815 11:32:38 server id 1  end_log_pos 196084783 	Write_rows: table id 80 flags: STMT_END_F
# at 196084783
#210815 11:32:38 server id 1  end_log_pos 196084810 	Xid = 4083
# at 196084810
#210815 11:32:38 server id 1  end_log_pos 196084848 	GTID 201-1-4036
# at 196084848
# at 196084903
#210815 11:32:38 server id 1  end_log_pos 196084903 	Table_map: `sbtest`.`sbtest4` mapped to number 80
#210815 11:32:38 server id 1  end_log_pos 196085121 	Write_rows: table id 80 flags: STMT_END_F
# at 196085121
#210815 11:32:38 server id 1  end_log_pos 196085148 	Xid = 4084
# at 196085148
#210815 11:32:38 server id 1  end_log_pos 196085186 	GTID 201-1-4037
# at 196085186
# at 196085241
#210815 11:32:38 server id 1  end_log_pos 196085241 	Table_map: `sbtest`.`sbtest4` mapped to number 80
#210815 11:32:38 server id 1  end_log_pos 196085459 	Write_rows: table id 80 flags: STMT_END_F
# at 196085459
#210815 11:32:38 server id 1  end_log_pos 196085486 	Xid = 4085
# End of log file

Comment by Andrei Elkin [ 2021-08-16 ]

ponsuresh.pandians, salute.

I need all bits of information belonging on any existing instance of your configuration.
As it's about a grown latency that includes few periodic reports (from START SLAVE till a moment you will observe a
significantly grown value, do that once in a min or tens of sec:s - depends on your load) from

  • SHOW SLAVE STATUS
  • show processlist
  • save binary logs from which I will need further details once I process the 2 SHOW:s.
  • the same for the error logs

My plan is to find the starting point of any (hopefully) sharp jump of SBM, and having that
look into the binlog for data that may have played some role.

As you can see the last paste is not meaningful to the plan.

I am waiting for such way collected data.

Thank you.

Comment by Pon Suresh Pandian (Inactive) [ 2021-08-18 ]

Hi Andrei,

Here I have attached the show slave status & show processlist outputs. This stats was collected on every 10 sec.

Also Attached the MariaDB error log from slave server. Please check it.

Binlogs - https://drive.google.com/drive/folders/1r8Yy4LvmQ73ocamT2Al24L0recgahAm-?usp=sharing

Comment by Andrei Elkin [ 2021-08-18 ]

ponsuresh.pandians, hi. Thank you for a good piece of reporting!

I see that SMB constantly increases and most of time by the value greater than your 10 sec watching period.
Processlist shows that there's "always" a worker in closing tables state. This may not relate to the parallel replication and should be explained why it apparently takes some long time. Maybe that deals with the fact that
your load is apparently not on transactional (Innodb) tables:

Slave_Non_Transactional_Groups: 4397
    Slave_Transactional_Groups: 0

Notice that in the non-transactional case the parallel replication may not scale up at all.
The simplest I can recommend is to set the table engine to Innodb and check.

You can also dwell yourself into profiling - see SHOW PROFILE ALL and how to set it on.

At this point I really needed not the binlog attachment. Thank you once more for providing that!

Comment by Pon Suresh Pandian (Inactive) [ 2021-08-19 ]

Hi Andrei,

I am using sysbench tool to simulated the load. So all the tables are INNODB (transactional) tables. After enabling the parallel threads, replication lag was increasing this behaviour is completely strange for me ..

MariaDB [sbtest]> show create table sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT 0,
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=2287661 DEFAULT CHARSET=latin1
1 row in set (0.000 sec)
 
MariaDB [sbtest]> show create table sbtest2\G
*************************** 1. row ***************************
       Table: sbtest2
Create Table: CREATE TABLE `sbtest2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT 0,
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_2` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=2286165 DEFAULT CHARSET=latin1
1 row in set (0.000 sec)

Comment by Chris Calender (Inactive) [ 2021-09-21 ]

Re-opening this issue, as the feedback was provided and we're able to reproduce on our side. I'm hoping it just closed inadvertently, but if you do need further information from Support, please let us know. This is a customer-facing bug.

Comment by Andrei Elkin [ 2021-09-22 ]

julien.fritsch, I would not mind that at all. But first my questions need to be attended.

Comment by Pon Suresh Pandian (Inactive) [ 2021-09-22 ]

Hi Julien Fritsch ,

Sure...Once again I will re create my test environment and get back to you with more details...

Comment by Andrei Elkin [ 2021-09-24 ]

ponsuresh.pandians: In your last Show-Slave-Status reports I had to suspect non-transactional (not innodb) load was run on your slave.

Now when you report on slave again could you please make sure these two (I am pasting what was the last time)

Slave_Non_Transactional_Groups: 4397
Slave_Transactional_Groups: 0

attributes are present?

Your show-create-table, sorry, does not reveal on what server they were done. Even if the def:s are from the slave server (please confirm), I still need the Slave_{,Non}_Transactional_Groups in order to proceed.

Comment by Pon Suresh Pandian (Inactive) [ 2021-09-27 ]

Hi Andrei,

Thanks for your help. As per your suggestion I did below steps,

I have installed 10.4.19 & configured the master slave GTID based replication.

And loaded some data using sys bench tool. I checked the "show slave status" output. The "Slave_Transactional_Groups" was increasing.

Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 4492

At the same I installed 10.0.38 as a master & 10.5.11 as a slave then loaded some data using sys bench tool. I checked the "show slave status" output. In this case "Slave_Non_Transactional_Groups" was increasing.

Slave_Non_Transactional_Groups: 68809
    Slave_Transactional_Groups: 0

On both scenario my master & slave tables are in INNODB engine.

MariaDB [sbtest]> select TABLE_SCHEMA,TABLE_NAME,ENGINE from information_schema.TABLES where TABLE_SCHEMA='sbtest';
+--------------+------------+--------+
| TABLE_SCHEMA | TABLE_NAME | ENGINE |
+--------------+------------+--------+
| sbtest       | sbtest2    | InnoDB |
| sbtest       | sbtest3    | InnoDB |
| sbtest       | sbtest1    | InnoDB |
| sbtest       | sbtest4    | InnoDB |
| sbtest       | sbtest5    | InnoDB |
+--------------+------------+--------+
5 rows in set (0.000 sec)

Comment by Andrei Elkin [ 2021-09-28 ]

After more discussion and some analysis it turned out the 10.0 master just can't provide parallelizable load.
More to that even if it's transactional the slave interprets it in its status as the opposite (to explain the observed stats).

Comment by Andrei Elkin [ 2021-11-22 ]

salle, thanks for the heads up! Won't fix is done to it.

Generated at Thu Feb 08 09:43:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.