Uploaded image for project: 'MariaDB MaxScale'
  1. MariaDB MaxScale
  2. MXS-4677

MaxScale BinlogRouter skips large transactions causing data Inconsistency on attached slave

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 6.4.7
    • 2.5.28, 6.4.9, 22.08.8, 23.02.4
    • binlogrouter
    • None
    • OS: CentOS Stream release 8
      MariaDB: 10.11.2
      MaxScale: 6.4.7
    • MXS-SPRINT-187, MXS-SPRINT-188

    Description

      Due to the presence of longtext columns in the usersDiscoveredLessons table, a substantial amount of data ends up in the binlog. Since we are using row-based replication, every row is copied into the binlog, exacerbating the data volume. Table creation script:

      MariaDB [(none)]> show create table test_long_transaction_bug.usersDiscoveredLessons \G;
      *************************** 1. row ***************************
             Table: usersDiscoveredLessons
      Create Table: CREATE TABLE `usersDiscoveredLessons` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `userId` int(11) unsigned NOT NULL,
        `groupId` int(11) unsigned DEFAULT NULL,
        `slideIds` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL CHECK (json_valid(`slideIds`)),
        `videoIds` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL CHECK (json_valid(`videoIds`)),
        `taskIds` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL CHECK (json_valid(`taskIds`)),
        `createdAt` timestamp NOT NULL DEFAULT current_timestamp(),
        `updatedAt` timestamp NOT NULL DEFAULT current_timestamp(),
        `deletedAt` timestamp NULL DEFAULT NULL,
        PRIMARY KEY (`id`),
        UNIQUE KEY `userId_groupId_uniq` (`userId`,`groupId`)
      ) ENGINE=InnoDB AUTO_INCREMENT=29297215 DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci
      1 row in set (0.001 sec)
      

      Data sample (the array of slideIds values is significantly shortened, it could include up to 500 Ids):

      MariaDB [(none)]> select * from test_long_transaction_bug.usersDiscoveredLessons LIMIT 1 \G;
      *************************** 1. row ***************************
             id: 28566982
         userId: 102592167
         groupId: 2
      slideIds: [114586208,112104238,28115292,...,96955794]
        videoIds: [112312333,123123444]
      taskIds: []
      createdAt: 2023-07-13 00:00:00
      updatedAt: 2023-07-13 00:00:00
      deletedAt: NULL
      1 row in set (0.001 sec)
      

      The size of the file from which data will be loaded into the table:

      [root@sc-db-main-stage-01 ~]# ls -lah /var/lib/mysql/user*
      -rw-r--r-- 1 root root 7.9G Jul 18 18:47 /var/lib/mysql/usersDiscoveredLessons.sql
      

      The sizes of binlogs:

      $ ansible sc-db-main-stage-0* -i inventory_stage -u lexusrules -b -m shell -a "ls -lath /var/lib/mysql/mysqld*"
      sc-db-main-stage-03 | CHANGED | rc=0 >>
      -rw-rw---- 1 mysql mysql 331M Jul 20 07:46 /var/lib/mysql/mysqld-bin.000005
      -rw-rw---- 1 mysql mysql   20 Jul 19 13:39 /var/lib/mysql/mysqld-bin.index
      sc-db-main-stage-01 | CHANGED | rc=0 >>
      -rw-rw---- 1 mysql mysql 331M Jul 20 07:46 /var/lib/mysql/mysqld-bin.000402
      -rw-rw---- 1 mysql mysql   20 Jul 19 13:39 /var/lib/mysql/mysqld-bin.index
      sc-db-main-stage-04 | CHANGED | rc=0 >>
      -rw-rw---- 1 mysql mysql 331M Jul 20 07:46 /var/lib/mysql/mysqld-bin.000008
      -rw-rw---- 1 mysql mysql   20 Jul 19 13:39 /var/lib/mysql/mysqld-bin.index
      sc-db-main-stage-02 | CHANGED | rc=0 >>
      -rw-rw---- 1 mysql mysql 331M Jul 20 07:46 /var/lib/mysql/mysqld-bin.000400
      -rw-rw---- 1 mysql mysql   20 Jul 19 13:39 /var/lib/mysql/mysqld-bin.index
      

      I am demonstrating that the table is empty after truncating and before initiating the load:

      $ ansible sc-db-main-stage-0* -i inventory_stage -u lexusrules -b -m shell -a "mysql -e 'select count(*) from test_long_transaction_bug.usersDiscoveredLessons;'"
      sc-db-main-stage-01 | CHANGED | rc=0 >>
      count(*)
      0
      sc-db-main-stage-02 | CHANGED | rc=0 >>
      count(*)
      0
      sc-db-main-stage-03 | CHANGED | rc=0 >>
      count(*)
      0
      sc-db-main-stage-04 | CHANGED | rc=0 >>
      count(*)
      0
      

      The actual execution of the load:

      [root@sc-db-main-stage-01 ~]# mysql test_long_transaction_bug -e "SET autocommit=0; source /var/lib/mysql/usersDiscoveredLessons.sql; COMMIT;" &
      

      MaxScale binlog router error:

      [root@sc-maxscale-binlog-stage-01 ~]# tail -n500 /var/log/maxscale/maxscale.log  | grep -v warning
      2023-07-20 05:10:07   error  : (1260) [MariaDBProtocol] Client (172.21.128.78) sent an invalid HandShakeResponse.
      2023-07-20 07:10:04   error  : (1293) [MariaDBProtocol] Client (172.21.128.78) sent an invalid HandShakeResponse.
      2023-07-20 07:59:08   error  : (srv_main_repl); Error received during replication from '172.18.16.194:3306': std::bad_alloc
      

      The result - the data is inconsistent:

      $ ansible sc-db-main-stage-0* -i inventory_stage -u lexusrules -b -m shell -a "mysql -e 'select count(*) from test_long_transaction_bug.usersDiscoveredLessons;'"
      sc-db-main-stage-01 | CHANGED | rc=0 >>
      count(*)
      632852
      sc-db-main-stage-02 | CHANGED | rc=0 >>
      count(*)
      632852
      sc-db-main-stage-03 | CHANGED | rc=0 >>
      count(*)
      632852
      sc-db-main-stage-04 | CHANGED | rc=0 >>
      count(*)
      0
      

      The cluster continues to operate with inconsistent data without any apparent issues:

      $ ansible sc_maxscale -i inventory_stage -u lexusrules -b -m shell -a "maxctrl list servers"
      sc-maxscale-stage-01 | CHANGED | rc=0 >>
      ┌────────────────────────┬───────────────┬──────┬─────────────┬───────────────────────┬─────────────────┐
      │ Server                 │ Address       │ Port │ Connections │ State                 │ GTID            │
      ├────────────────────────┼───────────────┼──────┼─────────────┼───────────────────────┼─────────────────┤
      │ sc-db-main-stage-01    │ 172.18.16.194 │ 3306 │ 582         │ Master, Running       │ 0-14-336882805  │
      ├────────────────────────┼───────────────┼──────┼─────────────┼───────────────────────┼─────────────────┤
      │ sc-db-main-stage-02    │ 172.18.16.103 │ 3306 │ 383         │ Slave, Running        │ 0-14-336882805  │
      ├────────────────────────┼───────────────┼──────┼─────────────┼───────────────────────┼─────────────────┤
      │ sc-db-main-stage-03    │ 172.18.16.197 │ 3306 │ 383         │ Slave, Running        │ 0-14-336882805  │
      ├────────────────────────┼───────────────┼──────┼─────────────┼───────────────────────┼─────────────────┤
      │ sc-db-main-stage-04    │ 172.18.16.114 │ 3306 │ 0           │ Maintenance, Running  │ 0-14-336882805  │
      ├────────────────────────┼───────────────┼──────┼─────────────┼───────────────────────┼─────────────────┤
      │ sc-maxscale-binlog     │ 172.18.16.69  │ 3306 │ 0           │ Binlog Relay, Running │ 0-14-336882805  │
      └────────────────────────┴───────────────┴──────┴─────────────┴───────────────────────┴─────────────────┘
      

      The standard binlog size is 1Gb, but due to large transactions, it can significantly increase (in this case, up to 17Gb). The sizes of new binlogs:

      $ ansible sc-db-main-stage-0* -i inventory_stage -u lexusrules -b -m shell -a "ls -lath /var/lib/mysql/mysqld*"
      [DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names to new standard, use callbacks_enabled instead. This feature will be removed from ansible-core in version 2.15. Deprecation warnings
      can be disabled by setting deprecation_warnings=False in ansible.cfg.
      sc-db-main-stage-01 | CHANGED | rc=0 >>
      -rw-rw---- 1 mysql mysql 9.2M Jul 20 08:11 /var/lib/mysql/mysqld-bin.000403
      -rw-rw---- 1 mysql mysql  17G Jul 20 07:58 /var/lib/mysql/mysqld-bin.000402
      -rw-rw---- 1 mysql mysql   40 Jul 20 07:58 /var/lib/mysql/mysqld-bin.index
      sc-db-main-stage-02 | CHANGED | rc=0 >>
      -rw-rw---- 1 mysql mysql 9.2M Jul 20 08:11 /var/lib/mysql/mysqld-bin.000401
      -rw-rw---- 1 mysql mysql  17G Jul 20 08:05 /var/lib/mysql/mysqld-bin.000400
      -rw-rw---- 1 mysql mysql   40 Jul 20 08:05 /var/lib/mysql/mysqld-bin.index
      sc-db-main-stage-03 | CHANGED | rc=0 >>
      -rw-rw---- 1 mysql mysql 9.2M Jul 20 08:11 /var/lib/mysql/mysqld-bin.000006
      -rw-rw---- 1 mysql mysql  17G Jul 20 08:04 /var/lib/mysql/mysqld-bin.000005
      -rw-rw---- 1 mysql mysql   40 Jul 20 08:04 /var/lib/mysql/mysqld-bin.index
      sc-db-main-stage-04 | CHANGED | rc=0 >>
      -rw-rw---- 1 mysql mysql 348M Jul 20 08:11 /var/lib/mysql/mysqld-bin.000008
      -rw-rw---- 1 mysql mysql   20 Jul 19 13:39 /var/lib/mysql/mysqld-bin.index
      

      On version 6.4.3, we encountered the following error. Subsequently, the MaxScale binlog router became unresponsive and necessitated a restart.

      2023-06-30 13:54:36 error: (srv_main_repl); Error received during replication from '172.21.17.206:3306': Could not write final ROTATE to /var/lib/maxscale/binlogs/mysqld-bin.385707
      2023-06-30 13:56:03 error: (srv_main_repl); Error received during replication from '172.21.17.206:3306': Could not write final ROTATE to /var/lib/maxscale/binlogs/mysqld-bin.385710
      

      Attachments

        Issue Links

          Activity

            People

              markus makela markus makela
              lexusrules ALEKSEI KHAMIDOV
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.