[MDEV-16886] mysqdump --master-data=N crashes when SELECT BINLOG_GTID_POS() returns NULL for current master coordinates, for whatever reason Created: 2018-08-02  Updated: 2022-02-22

Status: Open
Project: MariaDB Server
Component/s: Backup, Scripts & Clients
Affects Version/s: 10.2.11, 10.2.16
Fix Version/s: 10.2

Type: Bug Priority: Major
Reporter: Valerii Kravchuk Assignee: Brandon Nesterenko
Resolution: Unresolved Votes: 0
Labels: None
Environment:

RHEL with kernel 3.10.0-862.3.2.el7.x86_64


Issue Links:
Blocks
is blocked by MDEV-17133 Binlog truncated in the middle of eve... Closed

 Description   

Sometimes it happens so that SELECT BINLOG_GTID_POS() returns NULL for specific master coordinates that come from SHOW MASTER STATUS:

MariaDB [(none)]> show master status;
+------------------+-----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+-----------+--------------+------------------+
| mysql-bin.001552 | 313974469 | | |
+------------------+-----------+--------------+------------------+
 
MariaDB [(none)]> SELECT BINLOG_GTID_POS("mysql-bin.001552",313974469);
+-----------------------------------------------+
| BINLOG_GTID_POS("mysql-bin.001552",313974469) |
+-----------------------------------------------+
| NULL |
+-----------------------------------------------+

It seems the code of mysqldump is not ready for this, see https://github.com/MariaDB/server/blob/10.2/client/mysqldump.c#L1298:

static int
get_gtid_pos(char *out_gtid_pos, int master)
{
  MYSQL_RES *res;
  MYSQL_ROW row;
  int found;
 
  if (mysql_query_with_error_report(mysql, &res,
                                    (master ?
                                     "SELECT @@GLOBAL.gtid_binlog_pos" :
                                     "SELECT @@GLOBAL.gtid_slave_pos")))
    return 1;
 
  found= 0;
  if ((row= mysql_fetch_row(res)))
  {
    strmake(out_gtid_pos, row[0], MAX_GTID_LENGTH-1);
    found++;
  }
  mysql_free_result(res);
 
  return (found != 1);
}

As a result, it crashes with the following backtrace:

gdb --args /apps/hpbx_db_prod/mysql/bin/mysqldump --single-transaction --default-character-set utf8 --add-drop-table --routines --master-data=2 -uroot -p******* --databases test --socket=/apps/hpbx_db_prod/mysql/mysql_pPBX1.sock
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /apps/hpbx_db_prod/mariadb-10.2.11-linux-x86_64/bin/mysqldump...done.
(gdb) run
Starting program: /apps/hpbx_db_prod/mysql/bin/mysqldump --single-transaction --default-character-set utf8 --add-drop-table --routines --master-data=2 -uroot ...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
-- MySQL dump 10.16 Distrib 10.2.11-MariaDB, for Linux (x86_64)
--
-- Host: localhost Database: test
-- ------------------------------------------------------
-- Server version 10.2.11-MariaDB-log
 
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
 
Program received signal SIGSEGV, Segmentation fault.
strmake (dst=0x7fffffffd550 "", src=0x0, length=1022) at /home/buildbot/buildbot/build/strings/strmake.c:36
36 /home/buildbot/buildbot/build/strings/strmake.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 libgcc-4.8.5-28.0.1.el7_5.1.x86_64 libstdc++-4.8.5-28.0.1.el7_5.1.x86_64
(gdb) bt
#0 strmake (dst=0x7fffffffd550 "", src=0x0, length=1022) at /home/buildbot/buildbot/build/strings/strmake.c:36
#1 0x0000000000411efa in get_binlog_gtid_pos (out_gtid_pos=<optimized out>, binlog_pos_offset=<optimized out>, binlog_pos_file=<optimized out>)
at /home/buildbot/buildbot/build/client/mysqldump.c:1295
#2 do_show_master_status (mysql_con=<optimized out>, consistent_binlog_pos=1, have_mariadb_gtid=1, use_gtid=0)
at /home/buildbot/buildbot/build/client/mysqldump.c:5115
#3 0x000000000041ba70 in main (argc=1, argv=0xb556a0) at /home/buildbot/buildbot/build/client/mysqldump.c:6094

becasue src in strmake call is 0. This should be checked and crash should be prevented. It's better to generate CHANGE MASTER with normal binary log name and position coordinates (and output a warning about the problem with getting GTID) than just crash silently.



 Comments   
Comment by Valerii Kravchuk [ 2018-08-02 ]

Sorry, it seems my conclusions based on older code, but anyway, both SELECT @@GLOBAL.gtid_binlog_pos and SELECT @@GLOBAL.gtid_slave_pos may also return NULL.according to KB (see https://mariadb.com/kb/en/library/gtid/#gtid_binlog_pos), and it seems this case is not taken into account.

Comment by Elena Stepanova [ 2018-08-02 ]

I don't see why gtid_binlog_pos would ever be NULL, so while crashing mysqldump doesn't look good, I'm afraid it's the least of our problems in this case. Maybe it's even beneficial that it crashed rather than write some garbage to the dump file and pretend that nothing happened.

Let's find out first why it is NULL.

Comment by Valerii Kravchuk [ 2018-08-02 ]

I know at least one easy way to get NULL there:

MariaDB [test]> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| pc-PC-bin.000042 |      385 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.001 sec)
 
MariaDB [test]> select  BINLOG_GTID_POS('pc-PC-bin.000042',385);
+-----------------------------------------+
| BINLOG_GTID_POS('pc-PC-bin.000042',385) |
+-----------------------------------------+
| 0-1-71                                  |
+-----------------------------------------+
1 row in set (0.035 sec)
 
MariaDB [test]> reset master;
Query OK, 0 rows affected (0.669 sec)
 
MariaDB [test]> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| pc-PC-bin.000001 |      328 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.001 sec)
 
MariaDB [test]> SELECT @@GLOBAL.gtid_binlog_pos;
+--------------------------+
| @@GLOBAL.gtid_binlog_pos |
+--------------------------+
|                          |
+--------------------------+
1 row in set (0.001 sec)
 
MariaDB [test]> select  BINLOG_GTID_POS('pc-PC-bin.000001',328);
+-----------------------------------------+
| BINLOG_GTID_POS('pc-PC-bin.000001',328) |
+-----------------------------------------+
|                                         |
+-----------------------------------------+
1 row in set (0.024 sec)

Comment by Elena Stepanova [ 2018-08-02 ]

It's not NULL, it's empty line. Empty lines are easy, but I don't expect them to cause any troubles. Do they?

Comment by Valerii Kravchuk [ 2018-08-03 ]

Indeed, it's empty string, '', and mysqldump works with that. So, we still have to figure out what exactly happens in that crashing case. Manual clearly states, though, that we may get the value of NULL, and mysqldump is not ready for that.

Comment by Valerii Kravchuk [ 2018-08-03 ]

Important details: this started to happen since kernel upgrade from kernel-3.10.0-327.18.2.el7.x86_64 to 3.10.0-862.3.2.el7.x86_64 (Meltdown related).

Comment by Elena Stepanova [ 2018-08-03 ]

I'm assigning it to sachin.setiya.007 to fix the mysqldump crash, although I still think that it is merely a cosmetic issue – I agree, it doesn't look good when it crashes, but nothing bad really happens. It's not the server where each crash is a temporary DoS at least, and potential data corruption at worst; and on the bright side, with the failing mysqldump there are more chances that the problem won't be overlooked.

The underlying issue, that BINLOG_GTID_POS and MASTER STATUS somehow went out of sync, seems much more severe. mysqldump, if it runs with options which make it be aware of master/slave status, must fail if such discrepancy occur, the only difference should be that it produces a proper error message.

Comment by Sachin Setiya (Inactive) [ 2018-09-05 ]

This will solve the mysqldump issue , but why we are getting gtid null it will require more investigation

diff --git a/client/mysqldump.c b/client/mysqldump.c
index f67e52c1b12..e504d5c8536 100644
--- a/client/mysqldump.c
+++ b/client/mysqldump.c
@@ -1289,7 +1289,7 @@ get_binlog_gtid_pos(char *binlog_pos_file, char *binlog_pos_offset,
     return err;
 
   err= 1;
-  if ((row= mysql_fetch_row(res)))
+  if ((row= mysql_fetch_row(res)) && row[0])
   {
     strmake(out_gtid_pos, row[0], MAX_GTID_LENGTH-1);
     err= 0;
@@ -1324,7 +1324,7 @@ get_gtid_pos(char *out_gtid_pos, int master)
     return 1;
 
   found= 0;
-  if ((row= mysql_fetch_row(res)))
+  if ((row= mysql_fetch_row(res)) && row[0])
   {
     strmake(out_gtid_pos, row[0], MAX_GTID_LENGTH-1);
     found++;

Comment by Andrei Elkin [ 2018-09-26 ]

sachin.setiya.007 Something like

set @@global.gtid_binlog_state='0-1000-1' /* no such gtid really in binlog*/;

would make NULL out of the function.
However I couldn't call up any crash with doing that and running

mysqldump --master-data=$i --host=127.0.0.1 --port=14280 --user=root  --gtid test

where $i=1,2.

Comment by Andrei Elkin [ 2018-10-08 ]

To refine my previous comment, actually "physical" presence of gtid is not necessary and mere a reference
to it via Gtid_list suffices. NULL out of get_binlog_gtid_pos() must be caused by

SHOW STATUS LIKE 'binlog_snapshot_%'

results which provide the functions' arguments. E.g in case the offset does not point at the end of an event (e.g to an non-existing "future" byte) the function indeed returns NULL.

Comment by Sachin Setiya (Inactive) [ 2018-10-08 ]

Hi!

Can you provide "Show binlog events " for latest binary log file
Show status like 'binlod_snapshot%'
and
gtid_binlog_pos
gtid_binlog_state

Comment by Andrei Elkin [ 2018-10-16 ]

Seeing the binlog file:pos as correct we shall suspect reading "dirty" data from binlog by SELECT BINLOG_GTID_POS() which seems to be possible through MDEV-17133 (the patch is in review). The SELECT function processes binlog files pretty much similarly to the Dump
thread. The MDEV-17133 patch is in review, so I suggest we make that one out and try it here.

Comment by Andrei Elkin [ 2018-10-16 ]

'Relates to' is set only to retain the benefit of a doubt. MDEV-17133 must be the actual issue causing this one. I suggest we wait for its fixes (in review) and try out its patch here.

Generated at Thu Feb 08 08:32:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.