[MDEV-7268] Column of table cannot be converted from type 'decimal(0,?)' to type ' 'decimal(10,7)' Created: 2014-12-05  Updated: 2023-12-11  Resolved: 2015-06-09

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 5.5.40
Fix Version/s: 5.5.44

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Alexander Barkov
Resolution: Fixed Votes: 1
Labels: None
Environment:

centos-6.5
x86_64


Attachments: File lightning.bin     File master-lightning.frm     File slave-_lightning_old.frm     File slave-lightning.frm    
Issue Links:
Blocks
is blocked by MDEV-32987 CREATE TABLE ... LIKE is not document... Closed
Sprint: 5.5.44

 Description   

If the middle of a pt-online-schema change using --set-vars binlogformat=ROW the following replication error.

The table was created on 5.0.34 and has since been upgraded over time to 5.5.29. Could it be the column is the old decimal format?

http://forums.mysql.com/read.php?26,619649,619718#msg-619718 references (Bug #16416302).

Server version: 5.5.29-MariaDB-log MariaDB Server
Same effects where observed on 5.5.40 slave.

MariaDB [weather]> show slave status\G 
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.124.35
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.046692
          Read_Master_Log_Pos: 67654425
               Relay_Log_File: mysqld-relay-bin.930329
                Relay_Log_Pos: 30748354
        Relay_Master_Log_File: mysql-bin.046557
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1677
                   Last_Error: Column 3 of table 'weather.lightning' cannot be converted from type 'decimal(0,?)' to type 'decimal(10,7)'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 30748070
              Relay_Log_Space: 14291648459
              Until_Condition: None

on master:

mysqlbinlog --start-position 30748070  mysql-bin.046557 | more
 
BEGIN
/*!*/;
# at 30748141
# at 30748202
# at 30748274
#141204 10:12:43 server id 1  end_log_pos 30748202      Table_map: `weather`.`lightning` mapped to number 177647
#141204 10:12:43 server id 1  end_log_pos 30748274      Table_map: `weather`.`_lightning_new` mapped to number 177646
#141204 10:12:43 server id 1  end_log_pos 30748356      Delete_rows: table id 177647 flags: STMT_END_F
 
BINLOG '
65h/VBMBAAAAPQAAACou1QEAAO+1AgAAAAEAB3dlYXRoZXIACWxpZ2h0bmluZwAHAw8MAAAADwQK
AAoAYA==
65h/VBMBAAAASAAAAHIu1QEAAO61AgAAAAEAB3dlYXRoZXIADl9saWdodG5pbmdfbmV3AAcDDwz2
9vYPCgoACgcKBwUBCgBg
65h/VBkBAAAAUgAAAMQu1QEAAO+1AgAAAAEAB/+AG4GOBAVHUEFUU5OG1tNFEgAAIC0zMy42MzA3
NDc5IDE1Mi44MzIwNjk3ICAtNDEuMgJHUw==
'/*!*/;
# at 30748356
#141204 10:12:43 server id 1  end_log_pos 30748383      Xid = 12786901894
COMMIT/*!*/;

decoded version

mysqlbinlog --start-position 30748070  --verbose --base64-output=decode-rows mysql-bin.046557 | more
 
BEGIN
/*!*/;
# at 30748141
# at 30748202
# at 30748274
#141204 10:12:43 server id 1  end_log_pos 30748202      Table_map: `weather`.`lightning` mapped to number 177647
#141204 10:12:43 server id 1  end_log_pos 30748274      Table_map: `weather`.`_lightning_new` mapped to number 177646
#141204 10:12:43 server id 1  end_log_pos 30748356      Delete_rows: table id 177647 flags: STMT_END_F
### DELETE FROM `weather`.`lightning`
### WHERE
###   @1=76448027
###   @2='GPATS'
###   @3=2009-11-16 09:51:23
###   @4=!! Don't know how to handle column type=0 meta=0 (0000)# at 30748356
#141204 10:12:43 server id 1  end_log_pos 30748383      Xid = 12786901894
COMMIT/*!*/;
 

| lightning | CREATE TABLE `lightning` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `source` varchar(10) NOT NULL DEFAULT '',
  `timestamp_utc` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `latitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
  `longitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
  `amperage` decimal(5,1) DEFAULT NULL,
  `event_type` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_lightning_1` (`source`,`timestamp_utc`,`latitude`,`longitude`)
) ENGINE=InnoDB AUTO_INCREMENT=484282187 DEFAULT CHARSET=latin1 MAX_ROWS=1000000000 ROW_FORMAT=DYNAMIC |
 
 
| _lightning_new | CREATE TABLE `_lightning_new` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `source` varchar(10) NOT NULL DEFAULT '',
  `timestamp_utc` datetime(3) NOT NULL COMMENT 'UTC time (millisecond precision)',
  `latitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
  `longitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
  `amperage` decimal(5,1) DEFAULT NULL,
  `event_type` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_lightning_1` (`source`,`timestamp_utc`,`latitude`,`longitude`)
) ENGINE=InnoDB AUTO_INCREMENT=484282186 DEFAULT CHARSET=latin1 MAX_ROWS=1000000000 ROW_FORMAT=DYNAMIC |

After changing the format to doubles I had different conversion error at the same position.

alter table  _lightning_new modify latitude double  NOT NULL DEFAULT 0.0000000,modify  longitude  double  NOT NULL DEFAULT 0.0000000, modify amperage double DEFAULT NULL;
 
slart slave;
 
MariaDB [weather]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.124.35
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.046693
          Read_Master_Log_Pos: 34510815
               Relay_Log_File: mysqld-relay-bin.930329
                Relay_Log_Pos: 30748354
        Relay_Master_Log_File: mysql-bin.046557
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1677
                   Last_Error: Column 3 of table 'weather._lightning_new' cannot be converted from type 'decimal(10,7)' to type 'double'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 30748070
              Relay_Log_Space: 14363364234
              Until_Condition: None



 Comments   
Comment by Daniel Black [ 2014-12-05 ]

set global slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY';
start slave;

141205  9:25:11 [ERROR] Slave SQL: Column 3 of table 'weather.lightning' cannot be converted from type 'decimal(0,?)' to type 'decimal(10,7)', Error_code: 1677
141205  9:25:11 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.046557' positio
n 30748070
141205  9:26:56 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.046557' at position 30748070, relay log '/var/mysql_log/mysqld-relay-bin.930329' position: 30
748354
141205  9:26:56 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 5.5.29-MariaDB-log
key_buffer_size=134217728
read_buffer_size=8388608
max_used_connections=12
max_threads=202
thread_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3443474 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7fafec0008c0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fafffb4ed98 thread_stack 0x48000
??:0(my_print_stacktrace)[0xa975ee]
??:0(handle_fatal_signal)[0x6d4fcc]
??:0(??)[0x329080f710]
??:0(Field_decimal::store(char const*, unsigned int, charset_info_st const*))[0x6cb812]
??:0(Field_decimal::reset())[0x6cc3f8]
??:0(create_virtual_tmp_table(THD*, List<Create_field>&))[0x5a19d1]
??:0(table_def::create_conversion_table(THD*, Relay_log_info*, TABLE*) const)[0x62f486]
??:0(table_def::compatible_with(THD*, Relay_log_info*, TABLE*, TABLE**) const)[0x62ffa9]
??:0(Rows_log_event::do_apply_event(Relay_log_info const*))[0x7aeb49]
??:0(apply_event_and_update_pos(Log_event*, THD*, Relay_log_info*))[0x50df22]
??:0(exec_relay_log_event(THD*, Relay_log_info*))[0x515c30]
??:0(handle_slave_sql)[0x516f44]
??:0(??)[0x32908079d1]
??:0(??)[0x32904e8b6d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): is an invalid pointer
Connection ID (thread ID): 126479
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_conditi
on_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,s
ubquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_j
oin_buffer_size=off,table_elimination=on,extended_keys=off

Comment by Elena Stepanova [ 2014-12-05 ]

Hi Daniel,

What exactly pt-online-schema-change was executing?
I assume it is relevant somehow since you mentioned it; but I neither see the actual change it was performing, nor whether it was done on master or on slave. Could you please specify?

Thanks.

Comment by Daniel Black [ 2014-12-05 ]

Was performed on the master

pt-online-schema-change --execute  --set-vars binlog_format=ROW --check-interval=9 --max-lag=12 --chunk-time=5 
 --alter "MODIFY timestamp_utc DATETIME(3) NULL COMMENT 'UTC time (millisecond precision)'" D=weather,t=lightning

Other examples I saw indicate that the binlog entry in a insert into lightning and a corresponding insert into _lightning_new.

I then changed the slave directly with

pt-online-schema-change  --execute --no-drop-old-table --recursion-method=none --set-vars sql_log_bin=0 --chunk-size=1000 --alter "MODIFY timestamp_utc DATETIME(3) NULL COMMENT 'UTC time (millisecond precision)'" D=weather,t=lightning,u=openquery,p=xxx,h=localhost
 
recreated the _lightning_new table
create table _lightning_new like lightning;

To enable the replication to continue. Even with these changes the row based replication entry wouldn't process and the same replication error occurred.

Other positions and their binlog decodes:

MariaDB [(none)]> set global sql_slave_skip_counter=1; start slave; select sleep(1); show slave status\G
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.124.35
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.046720
          Read_Master_Log_Pos: 41006219
               Relay_Log_File: mysqld-relay-bin.930329
                Relay_Log_Pos: 30749956
        Relay_Master_Log_File: mysql-bin.046557
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1677
                   Last_Error: Column 3 of table 'weather.lightning' cannot be converted from type 'decimal(0,?)' to type 'decimal(10,7)'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 30749672
              Relay_Log_Space: 17201077980
 
#141204 10:12:43 server id 1  end_log_pos 30749804      Table_map: `weather`.`lightning` mapped to number 177649
#141204 10:12:43 server id 1  end_log_pos 30749876      Table_map: `weather`.`_lightning_new` mapped to number 177646
#141204 10:12:43 server id 1  end_log_pos 30749958      Write_rows: table id 177649
#141204 10:12:43 server id 1  end_log_pos 30750023      Write_rows: table id 177646 flags: STMT_END_F
### INSERT INTO `weather`.`lightning`
### SET
###   @1=484282188
###   @2='GPATS'
###   @3=2014-12-03 23:12:36
###   @4=!! Don't know how to handle column type=0 meta=0 (0000)### INSERT INTO `weather`.`_lightning_new`
### SET
###   @1=484282188
###   @2='GPATS'
###   @3=919636211-25-91 28:24:32
###   @4=000000000
###   @5=000000000
###   @6=000028157.200000000
###   @7='SB/+A1\x00\x00\x00\x00\x00\x00\x00\x03\x0f\x0c���\x0f\x00\x00\x00\x0a\x00\x00\x00\x07\x0a\x07\x0a\x01\x05\x0a\x004y`DMw\x0aMzEzIDEyOC41\x00\x00\x00\x00\x00\x00\x00�~BqI\x00\x0
0�~'
# at 30750023
#141204 10:12:43 server id 1  end_log_pos 30750050      Xid = 12786901928
COMMIT/*!*/;

 set global sql_slave_skip_counter=1; start slave; select sleep(1); show slave status\G
 
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.124.35
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.046720
          Read_Master_Log_Pos: 74418347
               Relay_Log_File: mysqld-relay-bin.930329
                Relay_Log_Pos: 30750334
        Relay_Master_Log_File: mysql-bin.046557
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1677
                   Last_Error: Column 3 of table 'weather.lightning' cannot be converted from type 'decimal(0,?)' to type 'decimal(10,7)'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 30750050
              Relay_Log_Space: 17234490108
 
#141204 10:12:43 server id 1  end_log_pos 30750182      Table_map: `weather`.`lightning` mapped to number 177649
#141204 10:12:43 server id 1  end_log_pos 30750254      Table_map: `weather`.`_lightning_new` mapped to number 177646
#141204 10:12:43 server id 1  end_log_pos 30750336      Write_rows: table id 177649
#141204 10:12:43 server id 1  end_log_pos 30750401      Write_rows: table id 177646 flags: STMT_END_F
### INSERT INTO `weather`.`lightning`
### SET
###   @1=484282189
###   @2='GPATS'
###   @3=2014-12-03 23:12:36
###   @4=!! Don't know how to handle column type=0 meta=0 (0000)### INSERT INTO `weather`.`_lightning_new`
### SET
###   @1=484282189
###   @2='GPATS'
###   @3=919636211-25-91 28:24:32
###   @4=000000000
###   @5=000000000
###   @6=000028412.200000000

Because I still had the _lightning_old table I created a table with create table lightning_test like _lightning_old and did a diff on the frm files and they where identical. I suspect that means that they where both in the new decimal format.

I ended up adding

set global replicate_ignore_table='weather.lightning,weather._lightning_new'; start slave;

Which successfully ignored all those tables and caught up.

Comment by Daniel Black [ 2014-12-06 ]

attached: extract of binlogs on positions mentioned above from the relay log

Comment by Daniel Black [ 2014-12-06 ]

from master:

Jun 21 2011 lightning.frm

lightning CREATE TABLE `lightning` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`source` varchar(10) NOT NULL DEFAULT '',
`timestamp_utc` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`latitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
`longitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
`amperage` decimal(5,1) DEFAULT NULL,
`event_type` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_lightning_1` (`source`,`timestamp_utc`,`latitude`,`longitude`)
) ENGINE=InnoDB AUTO_INCREMENT=489156411 DEFAULT CHARSET=latin1 MAX_ROWS=1000000000 ROW_FORMAT=DYNAMIC

From slave:

Dec 5 12:17 lightning.frm
May 11 2014 _lightning_old.frm

date of lightning_old corresponds to about when the test server was created (from mysqldump from memory)

Comment by Daniel Black [ 2014-12-07 ]

I've removed the replication filter and replication is flowing freely on updates to the lightning table. The cause of the replication fail seems to be some interaction between the original table and the trigger on the _lightning_new table.

Comment by Elena Stepanova [ 2014-12-08 ]

So, there are three mysteries here (just describing them as intermediate notes so I don't forget).

First, the master-lightning.frm has columns with type DECIMAL.
It shouldn't happen if the table was initially created in 5.0.x – it has type NEWDECIMAL (as slave-lightning.frm has).
I went back to 5.0.27 and still it creates the table with NEWDECIMAL.
The mystery is how to get DECIMAL on 5.x.

Second, the master-lightning.frm shows MySQL version 5.2.6 (slave-lightning.frm shows 5.5.29 as expected). It means that table on master hasn't been upgraded for quite a while – which can be true, but then, why the problem only appeared now?

Third, according to the very first error, it must have happened before pt-online-schema-change did anything to the table. All versions of the script that I checked do the following (in the given order):

  • create _table_new;
  • alter _table_new;
  • create after-update/delete/insert triggers from table to _table_new;
  • insert into _table_new select from table;
  • rename table to _table_old, _table_new to table;
  • drop _table_old.

In the binlog we see that DELETE touched two tables, lightning and _lightning_new. It means that the DELETE happened after triggers were created, either before or after _lightning_new was populated (the next error, with doubles, suggests it was after, but we can't say for sure since the event is shown corrupted); but it certainly happened before lightning was overridden. And obviously since it didn't happen on master yet, it couldn't have happened on the slave. So, whatever type mismatch existed between the table on master and slave, it shouldn't have anything to do with pt-online-schema-change. But such a coincidence is almost unthinkable; and besides, the problem should have happened many times before. Hence the third mystery.

Comment by Elena Stepanova [ 2014-12-08 ]

danblack,

Questions:

Is it possible at all that the table was initially created with MySQL 4.1, not 5.0?

The attached master-lightning.frm is the old table structure, without microseconds in timestamp_utc; and in your initial description, you mention two tables on master, lightning and _lightning_new.
But with the commandline for pt-online-schema-change that you quoted as executed on master, it shouldn't be possible – it should have renamed lightning to _lightning_old, _lightning_new to lightning and then should have dropped _lightning_old. Can you shed some light on it? Is it possible that pt-online-schema-change actually failed (partially failed) on the master?

Which version of pt-online-schema-change do you use?

Can you check the binlog (if you still have it) to see which parts of the schema change process made it to the log before the DELETE event that broke replication?
By pt-online-schema-change process I mean creation of _lightning_new, triggers, populating _lightning_new, renaming, dropping _lightning_old, and whatever happened in between.

Comment by Daniel Black [ 2014-12-09 ]

> Is it possible at all that the table was initially created with MySQL 4.1, not 5.0?
> The mystery is how to get DECIMAL on 5.x.

I can't see any other realistic option other than the original server was before 5.0.3. Differs from the client account however.

> master-lightning.frm shows MySQL version 5.2.6 (slave-lightning.frm shows 5.5.29 as expected). It means that table on master hasn't been upgraded for quite a while

yes.

> – which can be true, but then, why the problem only appeared now?

despite the binlogs events appearing corrupted by mysqlbinlog, it appears they contain two transactions, one on lightning, and one on _lightning_new

> The attached master-lightning.frm is the old table structure,.. Can you shed some light on it? Is it possible that pt-online-schema-change actually failed (partially failed) on the master?

Yes. it did. It was aborted and triggers etc where removed before it completed.

> version of pt-online-schema-change

pt-online-schema-change 2.1.9

(latest is 2.1.12 which i just updated too).

only upstream known bug is "Fixed bug 1340728: pt-online-schema-change doesn’t work with HASH indexes" (ref: http://www.percona.com/doc/percona-toolkit/2.2/release_notes.html#v2-2-10-released-2014-08-06
)

> Can you check the binlog (if you still have it) to see which parts of the schema change process made it to the log before the DELETE event that broke replication?

mysqlbinlog --base64-output=decode-rows --stop-position 30749956 --verbose mysqld-relay-bin.930329 | tail -n 300

#141204 10:12:43 server id 1  end_log_pos 30746281      Query   thread_id=72014676      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
BEGIN
/*!*/;
# at 30746565
#141204 10:12:43 server id 1  end_log_pos 30746492      Query   thread_id=72014676      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
DELETE FROM lightning
WHERE latitude = '-14.7705896'
AND longitude = '126.2087432'
AND source = 'GPATS'
AND timestamp_utc = '2009-11-16 06:34:56'
/*!*/;
# at 30746776
#141204 10:12:43 server id 1  end_log_pos 30746519      Xid = 12786901782
COMMIT/*!*/;
# at 30746803
#141204 10:12:43 server id 1  end_log_pos 30747188      Query   thread_id=72021643      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
SET @@session.sql_mode=1074266112/*!*/;
CREATE TABLE `weather`.`_lightning_new` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `source` varchar(10) NOT NULL DEFAULT '',
  `timestamp_utc` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `latitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
  `longitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
  `amperage` decimal(5,1) DEFAULT NULL,
  `event_type` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_lightning_1` (`source`,`timestamp_utc`,`latitude`,`longitude`)
) ENGINE=InnoDB AUTO_INCREMENT=484282186 DEFAULT CHARSET=latin1 MAX_ROWS=1000000000 ROW_FORMAT=DYNAMIC
/*!*/;
# at 30747472
#141204 10:12:43 server id 1  end_log_pos 30747377      Query   thread_id=72021643      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
ALTER TABLE `weather`.`_lightning_new` MODIFY timestamp_utc DATETIME(3) NOT NULL COMMENT 'UTC time (millisecond precision)'
/*!*/;
# at 30747661
#141204 10:12:43 server id 1  end_log_pos 30747677      Query   thread_id=72021643      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
CREATE DEFINER=`dba`@`localhost` TRIGGER `pt_osc_weather_lightning_del` AFTER DELETE ON `weather`.`lightning` FOR EACH ROW DELETE IGNORE FROM `weather`.`_lightning_new` WHERE `weather`.`_lightning_new`.`id` <=> OLD.`id`
/*!*/;
# at 30747961
#141204 10:12:43 server id 1  end_log_pos 30747748      Query   thread_id=72021573      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
BEGIN
/*!*/;
# at 30748032
#141204 10:12:43 server id 1  end_log_pos 30747776      Intvar
SET INSERT_ID=484282186/*!*/;
# at 30748060
#141204 10:12:43 server id 1  end_log_pos 30748043      Query   thread_id=72021573      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
insert into lightning (timestamp_utc, latitude, longitude, amperage, source, event_type) values ('2014-12-03 23:12:36.720', round(-13.2426187, 7), round(128.2046041, 7), round(-17.5, 1), 'GPATS', 'GS')
/*!*/;
# at 30748327
#141204 10:12:43 server id 1  end_log_pos 30748070      Xid = 12786901867
COMMIT/*!*/;
# at 30748354
#141204 10:12:43 server id 1  end_log_pos 30748141      Query   thread_id=72014676      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
BEGIN
/*!*/;
# at 30748425
# at 30748486
# at 30748558
#141204 10:12:43 server id 1  end_log_pos 30748202      Table_map: `weather`.`lightning` mapped to number 177647
#141204 10:12:43 server id 1  end_log_pos 30748274      Table_map: `weather`.`_lightning_new` mapped to number 177646
#141204 10:12:43 server id 1  end_log_pos 30748356      Delete_rows: table id 177647 flags: STMT_END_F
### DELETE FROM `weather`.`lightning`
### WHERE
###   @1=76448027
###   @2='GPATS'
###   @3=2009-11-16 09:51:23
###   @4=!! Don't know how to handle column type=0 meta=0 (0000)# at 30748640
#141204 10:12:43 server id 1  end_log_pos 30748383      Xid = 12786901894
COMMIT/*!*/;
# at 30748667
#141204 10:12:43 server id 1  end_log_pos 30748831      Query   thread_id=72021643      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
SET @@session.sql_mode=1074266112/*!*/;
CREATE DEFINER=`dba`@`localhost` TRIGGER `pt_osc_weather_lightning_upd` AFTER UPDATE ON `weather`.`lightning` FOR EACH ROW REPLACE INTO `weather`.`_lightning_new` (`id`, `source`, `timestamp_utc`, `latitude`, `longitude`, `amperage`, `event_type`) VALUES (NEW.`id`, NEW.`source`, NEW.`timestamp_utc`, NEW.`latitude`, NEW.`longitude`, NEW.`amperage`, NEW.`event_type`)
/*!*/;
# at 30749115
#141204 10:12:43 server id 1  end_log_pos 30748902      Query   thread_id=72021573      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
BEGIN
/*!*/;
# at 30749186
#141204 10:12:43 server id 1  end_log_pos 30748930      Intvar
SET INSERT_ID=484282187/*!*/;
# at 30749214
#141204 10:12:43 server id 1  end_log_pos 30749197      Query   thread_id=72021573      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
insert into lightning (timestamp_utc, latitude, longitude, amperage, source, event_type) values ('2014-12-03 23:12:36.752', round(-13.2328134, 7), round(128.1822273, 7), round(-14.1, 1), 'GPATS', 'GS')
/*!*/;
# at 30749481
#141204 10:12:43 server id 1  end_log_pos 30749224      Xid = 12786901916
COMMIT/*!*/;
# at 30749508
#141204 10:12:43 server id 1  end_log_pos 30749672      Query   thread_id=72021643      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
SET @@session.sql_mode=1074266112/*!*/;
CREATE DEFINER=`dba`@`localhost` TRIGGER `pt_osc_weather_lightning_ins` AFTER INSERT ON `weather`.`lightning` FOR EACH ROW REPLACE INTO `weather`.`_lightning_new` (`id`, `source`, `timestamp_utc`, `latitude`, `longitude`, `amperage`, `event_type`) VALUES (NEW.`id`, NEW.`source`, NEW.`timestamp_utc`, NEW.`latitude`, NEW.`longitude`, NEW.`amperage`, NEW.`event_type`)
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

 
 mysqlbinlog --base64-output=decode-rows --start-position 30749956 --verbose mysqld-relay-bin.930329 | more
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 30749956
#141204 10:12:43 server id 1  end_log_pos 30749743      Query   thread_id=72021573      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
SET @@session.pseudo_thread_id=72021573/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 30750027
# at 30750088
# at 30750160
# at 30750242
#141204 10:12:43 server id 1  end_log_pos 30749804      Table_map: `weather`.`lightning` mapped to number 177649
#141204 10:12:43 server id 1  end_log_pos 30749876      Table_map: `weather`.`_lightning_new` mapped to number 177646
#141204 10:12:43 server id 1  end_log_pos 30749958      Write_rows: table id 177649
#141204 10:12:43 server id 1  end_log_pos 30750023      Write_rows: table id 177646 flags: STMT_END_F
### INSERT INTO `weather`.`lightning`
### SET
###   @1=484282188
###   @2='GPATS'
###   @3=2014-12-03 23:12:36
###   @4=!! Don't know how to handle column type=0 meta=0 (0000)### INSERT INTO `weather`.`_lightning_new`
### SET
###   @1=484282188
###   @2='GPATS'
###   @3=919636211-25-91 28:24:32
###   @4=000000000
###   @5=000000000
###   @6=000028157.200000000
###   @7=***Corrupted replication event was detected. Not printing the value***
# at 30750307
#141204 10:12:43 server id 1  end_log_pos 30750050      Xid = 12786901928
COMMIT/*!*/;
# at 30750334
#141204 10:12:43 server id 1  end_log_pos 30750121      Query   thread_id=72021573      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
BEGIN
/*!*/;
# at 30750405
# at 30750466
# at 30750538
# at 30750620
#141204 10:12:43 server id 1  end_log_pos 30750182      Table_map: `weather`.`lightning` mapped to number 177649
#141204 10:12:43 server id 1  end_log_pos 30750254      Table_map: `weather`.`_lightning_new` mapped to number 177646
#141204 10:12:43 server id 1  end_log_pos 30750336      Write_rows: table id 177649
#141204 10:12:43 server id 1  end_log_pos 30750401      Write_rows: table id 177646 flags: STMT_END_F
### INSERT INTO `weather`.`lightning`
### SET
###   @1=484282189
###   @2='GPATS'
###   @3=2014-12-03 23:12:36
###   @4=!! Don't know how to handle column type=0 meta=0 (0000)### INSERT INTO `weather`.`_lightning_new`
### SET
###   @1=484282189
###   @2='GPATS'
###   @3=919636211-25-91 28:24:32
###   @4=000000000
###   @5=000000000
###   @6=000028412.200000000
###   @7=***Corrupted replication event was detected. Not printing the value***
# at 30750685
#141204 10:12:43 server id 1  end_log_pos 30750428      Xid = 12786901939
COMMIT/*!*/;
# at 30750712
#141204 10:12:43 server id 1  end_log_pos 30750499      Query   thread_id=72021661      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
BEGIN
/*!*/;
# at 30750783
#141204 10:12:43 server id 1  end_log_pos 30750527      Intvar
SET INSERT_ID=36453078/*!*/;
# at 30750811
#141204 10:12:43 server id 1  end_log_pos 30751012      Query   thread_id=72021661      exec_time=0     error_code=0
use `weather`/*!*/;
SET TIMESTAMP=1417648363.748530/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
INSERT INTO aviation_obs
....
# at 30751296
#141204 10:12:43 server id 1  end_log_pos 30751040      Intvar
SET INSERT_ID=310074628/*!*/;
# at 30751324
#141204 10:12:43 server id 1  end_log_pos 30751392      Query   thread_id=72021661      exec_time=0     error_code=0
SET TIMESTAMP=1417648363.750896/*!*/;
INSERT INTO aviation_obs_elements
...
# at 30755084
#141204 10:12:43 server id 1  end_log_pos 30754871      Query   thread_id=72014676      exec_time=0     error_code=0
SET TIMESTAMP=1417648363/*!*/;
BEGIN
/*!*/;
# at 30755155
# at 30755216
# at 30755288
#141204 10:12:43 server id 1  end_log_pos 30754932      Table_map: `weather`.`lightning` mapped to number 177649
#141204 10:12:43 server id 1  end_log_pos 30755004      Table_map: `weather`.`_lightning_new` mapped to number 177646
#141204 10:12:43 server id 1  end_log_pos 30755086      Delete_rows: table id 177649 flags: STMT_END_F
### DELETE FROM `weather`.`lightning`
### WHERE
###   @1=76432279
###   @2='GPATS'
###   @3=2009-11-16 06:45:03
###   @4=!! Don't know how to handle column type=0 meta=0 (0000)# at 30755370
#141204 10:12:43 server id 1  end_log_pos 30755113      Xid = 12786902038
COMMIT/*!*/;
# at 30755397
 
....
 
 

Comment by Elena Stepanova [ 2014-12-09 ]

Okay, it starts making sense now.
Lets assume for the time being that the table was created in 4.x (it might still not be true, but then there must be a hole in 5.x that allows creating DECIMAL columns; so it would be very interesting to make sure with absolute certainty that the table origins from 5.0 – in this case we need to search for the hole).

Mystery 1 from the earlier comment:

So, if the table origins from 4.1, the appearance of DECIMAL is natural.
The fact that it survived as DECIMAL till now has already been explained – even mysql_upgrade or ALTER don't change that.

Mysteries 2 and 3:

The table on the master has version 5.2.6 which is apparently last time it was ALTER-ed.
On the other hand, the table on the slave has version 5.5.29, and both slave-lightning and slave-_lightning_old have NEWDECIMAL. Most likely the slave was initialized at some point on a 5.x version, e.g. from a master dump, so the tables were created with new fields. So, the type difference has been there for quite a while.

The problem only appeared now because apparently the server uses binlog_format=mixed (is it so?). While it was operating normally, updates to the `lightning` table were SBR-safe and thus were written to the binary log in the statement format (see the binlog fragment in the last comment). They were applied as such on the slave, internal type difference didn't matter, and the life went on.
But as soon as pt-online-schema-change created triggers on the table, the same updates became unsafe for SBR, and they started getting converted into the row format. And row format is where the problem with type mismatch shows up.

And for even more clarification: why exactly pt-online-schema-change failed? What was the error?

Comment by Elena Stepanova [ 2014-12-09 ]

The "what's next" part.

The workaround for the problem is pretty obvious: do the job that pt-online-schema-change was trying to do – create a new table via the full create table statement (not via CREATE TABLE .. LIKE), alter it as you need, populate from the old table, and then switch to the new one.

For actual fixing, I see two points of misbehavior here. I don't know which of them can and must be fixed, so I will reassign it to the datatype expert bar to make the decision.

Problem 1: the deprecated datatype does not get upgraded.
When mysql_upgrade from a 5.x version checks the table taken from 4.1, it does actually notice that something is seriously wrong with it, and says that ALTER TABLE .. FORCE must be performed (and it actually performs it). However, ALTER TABLE .. FORCE does not change DECIMAL to NEWDECIMAL. Maybe it should?

Problem 2: When a row event which was written for DECIMAL is applied to NEWDECIMAL, the described error occurs. It doesn't seem reasonable, I suppose the slave can handle it better.

Comment by Daniel Black [ 2014-12-10 ]

> Most likely the slave was initialized at some point on a 5.x version, e.g. from a master dump

Yes it was.

> why exactly pt-online-schema-change failed? What was the error?

It was user aborted due to slave errors.

> server uses binlog_format=mixed (is it so?

Yes. In fact this in very new too (last 1-2 weeks ago).

> Problem 1
> ALTER TABLE .. FORCE does not change DECIMAL to NEWDECIMAL. Maybe it should

Only its really needed, with some really big warnings, or leave it for the user to fix, a large table alter can take several hours compared to what is normally a really quick mysql_update.

> Problem 2: When a row event which was written for DECIMAL is applied to NEWDECIMAL, the described error occurs. It doesn't seem reasonable, I suppose the slave can handle it better.

Especially considering its handing it now fine. I guess its reverted to SBR method since its still mixed.

Problem 3:

seg fault with set global slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY'; (comment 1)

Comment by Elena Stepanova [ 2014-12-10 ]

> Problem 1
> ALTER TABLE .. FORCE does not change DECIMAL to NEWDECIMAL. Maybe it should
Only its really needed, with some really big warnings, or leave it for the user to fix, a large table alter can take several hours compared to what is normally a really quick mysql_update.

Yeah, but keep in mind that mysql_upgrade already performs ALTER TABLE .. FORCE when it encounters a table like that; it's just that this ALTER TABLE doesn't change the internal column type.

I guess its reverted to SBR method since its still mixed.

It should have if you removed the triggers that pt-online-schema-change created, but if you didn't I think it shouldn't have reverted to SBR. In the latter case it would be interesting to see how the binlog looks now.

Problem 3:
seg fault with set global slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY'

Right, thanks for reminding. And at some point I also had a debug assertion failure that I hopefully will be able to reproduce. Depending on what bar decides about the first two problems, the crash/assertion can either stay a part of this issue (if it gets fixed), or go into a separate one, since segfault should be fixed regardless the fate of the other problems.

Comment by Daniel Black [ 2014-12-10 ]

It should have if you removed the triggers that pt-online-schema-change created,

Yes I removed the triggers, then added replication filters, then started replication to bypass the issue.

In the latter case it would be interesting to see how the binlog looks now.

Not applicable. And enough interesting aspects already

On note there were other slaves, older and perhaps having an original old decimal table definition that didn't encounter this replication error.

Comment by Elena Stepanova [ 2014-12-10 ]

The segfault was an upstream issue, fixed in MySQL 5.5.34 – not sure exactly by which patch, but it was reproducible on older versions of MySQL and on MariaDB 5.5 up to revno 3975 (the merge of MySQL 5.5.34 into MariaDB 5.5), and disappeared after that. Now it returns the error instead:

               Last_SQL_Errno: 1678
               Last_SQL_Error: Can't create conversion table for table 'test.lightning'

which is understandable since the allowed conversions are defined explicitly ( http://dev.mysql.com/doc/refman/5.5/en/replication-features-differing-tables.html#replication-features-attribute-promotion ) and DECIMAL=>NEWDECIMAL is not one of them.

Generated at Thu Feb 08 07:18:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.