|
set global slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY';
|
start slave;
|
141205 9:25:11 [ERROR] Slave SQL: Column 3 of table 'weather.lightning' cannot be converted from type 'decimal(0,?)' to type 'decimal(10,7)', Error_code: 1677
|
141205 9:25:11 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.046557' positio
|
n 30748070
|
141205 9:26:56 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.046557' at position 30748070, relay log '/var/mysql_log/mysqld-relay-bin.930329' position: 30
|
748354
|
141205 9:26:56 [ERROR] mysqld got signal 11 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 5.5.29-MariaDB-log
|
key_buffer_size=134217728
|
read_buffer_size=8388608
|
max_used_connections=12
|
max_threads=202
|
thread_count=1
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3443474 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0x7fafec0008c0
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7fafffb4ed98 thread_stack 0x48000
|
??:0(my_print_stacktrace)[0xa975ee]
|
??:0(handle_fatal_signal)[0x6d4fcc]
|
??:0(??)[0x329080f710]
|
??:0(Field_decimal::store(char const*, unsigned int, charset_info_st const*))[0x6cb812]
|
??:0(Field_decimal::reset())[0x6cc3f8]
|
??:0(create_virtual_tmp_table(THD*, List<Create_field>&))[0x5a19d1]
|
??:0(table_def::create_conversion_table(THD*, Relay_log_info*, TABLE*) const)[0x62f486]
|
??:0(table_def::compatible_with(THD*, Relay_log_info*, TABLE*, TABLE**) const)[0x62ffa9]
|
??:0(Rows_log_event::do_apply_event(Relay_log_info const*))[0x7aeb49]
|
??:0(apply_event_and_update_pos(Log_event*, THD*, Relay_log_info*))[0x50df22]
|
??:0(exec_relay_log_event(THD*, Relay_log_info*))[0x515c30]
|
??:0(handle_slave_sql)[0x516f44]
|
??:0(??)[0x32908079d1]
|
??:0(??)[0x32904e8b6d]
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x0): is an invalid pointer
|
Connection ID (thread ID): 126479
|
Status: NOT_KILLED
|
|
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_conditi
|
on_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,s
|
ubquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_j
|
oin_buffer_size=off,table_elimination=on,extended_keys=off
|
|
|
Hi Daniel,
What exactly pt-online-schema-change was executing?
I assume it is relevant somehow since you mentioned it; but I neither see the actual change it was performing, nor whether it was done on master or on slave. Could you please specify?
Thanks.
|
|
Was performed on the master
pt-online-schema-change --execute --set-vars binlog_format=ROW --check-interval=9 --max-lag=12 --chunk-time=5
|
--alter "MODIFY timestamp_utc DATETIME(3) NULL COMMENT 'UTC time (millisecond precision)'" D=weather,t=lightning
|
Other examples I saw indicate that the binlog entry in a insert into lightning and a corresponding insert into _lightning_new.
I then changed the slave directly with
pt-online-schema-change --execute --no-drop-old-table --recursion-method=none --set-vars sql_log_bin=0 --chunk-size=1000 --alter "MODIFY timestamp_utc DATETIME(3) NULL COMMENT 'UTC time (millisecond precision)'" D=weather,t=lightning,u=openquery,p=xxx,h=localhost
|
|
recreated the _lightning_new table
|
create table _lightning_new like lightning;
|
To enable the replication to continue. Even with these changes the row based replication entry wouldn't process and the same replication error occurred.
Other positions and their binlog decodes:
MariaDB [(none)]> set global sql_slave_skip_counter=1; start slave; select sleep(1); show slave status\G
|
Slave_IO_State: Waiting for master to send event
|
Master_Host: 192.168.124.35
|
Master_User: replication
|
Master_Port: 3306
|
Connect_Retry: 60
|
Master_Log_File: mysql-bin.046720
|
Read_Master_Log_Pos: 41006219
|
Relay_Log_File: mysqld-relay-bin.930329
|
Relay_Log_Pos: 30749956
|
Relay_Master_Log_File: mysql-bin.046557
|
Slave_IO_Running: Yes
|
Slave_SQL_Running: No
|
Replicate_Do_DB:
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 1677
|
Last_Error: Column 3 of table 'weather.lightning' cannot be converted from type 'decimal(0,?)' to type 'decimal(10,7)'
|
Skip_Counter: 0
|
Exec_Master_Log_Pos: 30749672
|
Relay_Log_Space: 17201077980
|
|
#141204 10:12:43 server id 1 end_log_pos 30749804 Table_map: `weather`.`lightning` mapped to number 177649
|
#141204 10:12:43 server id 1 end_log_pos 30749876 Table_map: `weather`.`_lightning_new` mapped to number 177646
|
#141204 10:12:43 server id 1 end_log_pos 30749958 Write_rows: table id 177649
|
#141204 10:12:43 server id 1 end_log_pos 30750023 Write_rows: table id 177646 flags: STMT_END_F
|
### INSERT INTO `weather`.`lightning`
|
### SET
|
### @1=484282188
|
### @2='GPATS'
|
### @3=2014-12-03 23:12:36
|
### @4=!! Don't know how to handle column type=0 meta=0 (0000)### INSERT INTO `weather`.`_lightning_new`
|
### SET
|
### @1=484282188
|
### @2='GPATS'
|
### @3=919636211-25-91 28:24:32
|
### @4=000000000
|
### @5=000000000
|
### @6=000028157.200000000
|
### @7='SB/+A1\x00\x00\x00\x00\x00\x00\x00\x03\x0f\x0c���\x0f\x00\x00\x00\x0a\x00\x00\x00\x07\x0a\x07\x0a\x01\x05\x0a\x004y`DMw\x0aMzEzIDEyOC41\x00\x00\x00\x00\x00\x00\x00�~BqI\x00\x0
|
0�~'
|
# at 30750023
|
#141204 10:12:43 server id 1 end_log_pos 30750050 Xid = 12786901928
|
COMMIT/*!*/;
|
set global sql_slave_skip_counter=1; start slave; select sleep(1); show slave status\G
|
|
*************************** 1. row ***************************
|
Slave_IO_State: Waiting for master to send event
|
Master_Host: 192.168.124.35
|
Master_User: replication
|
Master_Port: 3306
|
Connect_Retry: 60
|
Master_Log_File: mysql-bin.046720
|
Read_Master_Log_Pos: 74418347
|
Relay_Log_File: mysqld-relay-bin.930329
|
Relay_Log_Pos: 30750334
|
Relay_Master_Log_File: mysql-bin.046557
|
Slave_IO_Running: Yes
|
Slave_SQL_Running: No
|
Replicate_Do_DB:
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 1677
|
Last_Error: Column 3 of table 'weather.lightning' cannot be converted from type 'decimal(0,?)' to type 'decimal(10,7)'
|
Skip_Counter: 0
|
Exec_Master_Log_Pos: 30750050
|
Relay_Log_Space: 17234490108
|
|
#141204 10:12:43 server id 1 end_log_pos 30750182 Table_map: `weather`.`lightning` mapped to number 177649
|
#141204 10:12:43 server id 1 end_log_pos 30750254 Table_map: `weather`.`_lightning_new` mapped to number 177646
|
#141204 10:12:43 server id 1 end_log_pos 30750336 Write_rows: table id 177649
|
#141204 10:12:43 server id 1 end_log_pos 30750401 Write_rows: table id 177646 flags: STMT_END_F
|
### INSERT INTO `weather`.`lightning`
|
### SET
|
### @1=484282189
|
### @2='GPATS'
|
### @3=2014-12-03 23:12:36
|
### @4=!! Don't know how to handle column type=0 meta=0 (0000)### INSERT INTO `weather`.`_lightning_new`
|
### SET
|
### @1=484282189
|
### @2='GPATS'
|
### @3=919636211-25-91 28:24:32
|
### @4=000000000
|
### @5=000000000
|
### @6=000028412.200000000
|
Because I still had the _lightning_old table I created a table with create table lightning_test like _lightning_old and did a diff on the frm files and they where identical. I suspect that means that they where both in the new decimal format.
I ended up adding
set global replicate_ignore_table='weather.lightning,weather._lightning_new'; start slave;
|
Which successfully ignored all those tables and caught up.
|
|
attached: extract of binlogs on positions mentioned above from the relay log
|
|
from master:
Jun 21 2011 lightning.frm
| lightning |
CREATE TABLE `lightning` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`source` varchar(10) NOT NULL DEFAULT '',
`timestamp_utc` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`latitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
`longitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
`amperage` decimal(5,1) DEFAULT NULL,
`event_type` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_lightning_1` (`source`,`timestamp_utc`,`latitude`,`longitude`)
) ENGINE=InnoDB AUTO_INCREMENT=489156411 DEFAULT CHARSET=latin1 MAX_ROWS=1000000000 ROW_FORMAT=DYNAMIC |
From slave:
Dec 5 12:17 lightning.frm
May 11 2014 _lightning_old.frm
date of lightning_old corresponds to about when the test server was created (from mysqldump from memory)
|
|
I've removed the replication filter and replication is flowing freely on updates to the lightning table. The cause of the replication fail seems to be some interaction between the original table and the trigger on the _lightning_new table.
|
|
So, there are three mysteries here (just describing them as intermediate notes so I don't forget).
First, the master-lightning.frm has columns with type DECIMAL.
It shouldn't happen if the table was initially created in 5.0.x – it has type NEWDECIMAL (as slave-lightning.frm has).
I went back to 5.0.27 and still it creates the table with NEWDECIMAL.
The mystery is how to get DECIMAL on 5.x.
Second, the master-lightning.frm shows MySQL version 5.2.6 (slave-lightning.frm shows 5.5.29 as expected). It means that table on master hasn't been upgraded for quite a while – which can be true, but then, why the problem only appeared now?
Third, according to the very first error, it must have happened before pt-online-schema-change did anything to the table. All versions of the script that I checked do the following (in the given order):
- create _table_new;
- alter _table_new;
- create after-update/delete/insert triggers from table to _table_new;
- insert into _table_new select from table;
- rename table to _table_old, _table_new to table;
- drop _table_old.
In the binlog we see that DELETE touched two tables, lightning and _lightning_new. It means that the DELETE happened after triggers were created, either before or after _lightning_new was populated (the next error, with doubles, suggests it was after, but we can't say for sure since the event is shown corrupted); but it certainly happened before lightning was overridden. And obviously since it didn't happen on master yet, it couldn't have happened on the slave. So, whatever type mismatch existed between the table on master and slave, it shouldn't have anything to do with pt-online-schema-change. But such a coincidence is almost unthinkable; and besides, the problem should have happened many times before. Hence the third mystery.
|
|
danblack,
Questions:
Is it possible at all that the table was initially created with MySQL 4.1, not 5.0?
The attached master-lightning.frm is the old table structure, without microseconds in timestamp_utc; and in your initial description, you mention two tables on master, lightning and _lightning_new.
But with the commandline for pt-online-schema-change that you quoted as executed on master, it shouldn't be possible – it should have renamed lightning to _lightning_old, _lightning_new to lightning and then should have dropped _lightning_old. Can you shed some light on it? Is it possible that pt-online-schema-change actually failed (partially failed) on the master?
Which version of pt-online-schema-change do you use?
Can you check the binlog (if you still have it) to see which parts of the schema change process made it to the log before the DELETE event that broke replication?
By pt-online-schema-change process I mean creation of _lightning_new, triggers, populating _lightning_new, renaming, dropping _lightning_old, and whatever happened in between.
|
|
> Is it possible at all that the table was initially created with MySQL 4.1, not 5.0?
> The mystery is how to get DECIMAL on 5.x.
I can't see any other realistic option other than the original server was before 5.0.3 . Differs from the client account however.
> master-lightning.frm shows MySQL version 5.2.6 (slave-lightning.frm shows 5.5.29 as expected). It means that table on master hasn't been upgraded for quite a while
yes.
> – which can be true, but then, why the problem only appeared now?
despite the binlogs events appearing corrupted by mysqlbinlog, it appears they contain two transactions, one on lightning, and one on _lightning_new
> The attached master-lightning.frm is the old table structure,.. Can you shed some light on it? Is it possible that pt-online-schema-change actually failed (partially failed) on the master?
Yes. it did. It was aborted and triggers etc where removed before it completed.
> version of pt-online-schema-change
pt-online-schema-change 2.1.9
(latest is 2.1.12 which i just updated too).
only upstream known bug is "Fixed bug 1340728: pt-online-schema-change doesn’t work with HASH indexes" (ref: http://www.percona.com/doc/percona-toolkit/2.2/release_notes.html#v2-2-10-released-2014-08-06
)
> Can you check the binlog (if you still have it) to see which parts of the schema change process made it to the log before the DELETE event that broke replication?
mysqlbinlog --base64-output=decode-rows --stop-position 30749956 --verbose mysqld-relay-bin.930329 | tail -n 300
#141204 10:12:43 server id 1 end_log_pos 30746281 Query thread_id=72014676 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
BEGIN
|
/*!*/;
|
# at 30746565
|
#141204 10:12:43 server id 1 end_log_pos 30746492 Query thread_id=72014676 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
DELETE FROM lightning
|
WHERE latitude = '-14.7705896'
|
AND longitude = '126.2087432'
|
AND source = 'GPATS'
|
AND timestamp_utc = '2009-11-16 06:34:56'
|
/*!*/;
|
# at 30746776
|
#141204 10:12:43 server id 1 end_log_pos 30746519 Xid = 12786901782
|
COMMIT/*!*/;
|
# at 30746803
|
#141204 10:12:43 server id 1 end_log_pos 30747188 Query thread_id=72021643 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
SET @@session.sql_mode=1074266112/*!*/;
|
CREATE TABLE `weather`.`_lightning_new` (
|
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
|
`source` varchar(10) NOT NULL DEFAULT '',
|
`timestamp_utc` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
|
`latitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
|
`longitude` decimal(10,7) NOT NULL DEFAULT '0.0000000',
|
`amperage` decimal(5,1) DEFAULT NULL,
|
`event_type` varchar(10) DEFAULT NULL,
|
PRIMARY KEY (`id`),
|
UNIQUE KEY `idx_lightning_1` (`source`,`timestamp_utc`,`latitude`,`longitude`)
|
) ENGINE=InnoDB AUTO_INCREMENT=484282186 DEFAULT CHARSET=latin1 MAX_ROWS=1000000000 ROW_FORMAT=DYNAMIC
|
/*!*/;
|
# at 30747472
|
#141204 10:12:43 server id 1 end_log_pos 30747377 Query thread_id=72021643 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
ALTER TABLE `weather`.`_lightning_new` MODIFY timestamp_utc DATETIME(3) NOT NULL COMMENT 'UTC time (millisecond precision)'
|
/*!*/;
|
# at 30747661
|
#141204 10:12:43 server id 1 end_log_pos 30747677 Query thread_id=72021643 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
CREATE DEFINER=`dba`@`localhost` TRIGGER `pt_osc_weather_lightning_del` AFTER DELETE ON `weather`.`lightning` FOR EACH ROW DELETE IGNORE FROM `weather`.`_lightning_new` WHERE `weather`.`_lightning_new`.`id` <=> OLD.`id`
|
/*!*/;
|
# at 30747961
|
#141204 10:12:43 server id 1 end_log_pos 30747748 Query thread_id=72021573 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
SET @@session.sql_mode=1073741824/*!*/;
|
BEGIN
|
/*!*/;
|
# at 30748032
|
#141204 10:12:43 server id 1 end_log_pos 30747776 Intvar
|
SET INSERT_ID=484282186/*!*/;
|
# at 30748060
|
#141204 10:12:43 server id 1 end_log_pos 30748043 Query thread_id=72021573 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
insert into lightning (timestamp_utc, latitude, longitude, amperage, source, event_type) values ('2014-12-03 23:12:36.720', round(-13.2426187, 7), round(128.2046041, 7), round(-17.5, 1), 'GPATS', 'GS')
|
/*!*/;
|
# at 30748327
|
#141204 10:12:43 server id 1 end_log_pos 30748070 Xid = 12786901867
|
COMMIT/*!*/;
|
# at 30748354
|
#141204 10:12:43 server id 1 end_log_pos 30748141 Query thread_id=72014676 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
BEGIN
|
/*!*/;
|
# at 30748425
|
# at 30748486
|
# at 30748558
|
#141204 10:12:43 server id 1 end_log_pos 30748202 Table_map: `weather`.`lightning` mapped to number 177647
|
#141204 10:12:43 server id 1 end_log_pos 30748274 Table_map: `weather`.`_lightning_new` mapped to number 177646
|
#141204 10:12:43 server id 1 end_log_pos 30748356 Delete_rows: table id 177647 flags: STMT_END_F
|
### DELETE FROM `weather`.`lightning`
|
### WHERE
|
### @1=76448027
|
### @2='GPATS'
|
### @3=2009-11-16 09:51:23
|
### @4=!! Don't know how to handle column type=0 meta=0 (0000)# at 30748640
|
#141204 10:12:43 server id 1 end_log_pos 30748383 Xid = 12786901894
|
COMMIT/*!*/;
|
# at 30748667
|
#141204 10:12:43 server id 1 end_log_pos 30748831 Query thread_id=72021643 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
SET @@session.sql_mode=1074266112/*!*/;
|
CREATE DEFINER=`dba`@`localhost` TRIGGER `pt_osc_weather_lightning_upd` AFTER UPDATE ON `weather`.`lightning` FOR EACH ROW REPLACE INTO `weather`.`_lightning_new` (`id`, `source`, `timestamp_utc`, `latitude`, `longitude`, `amperage`, `event_type`) VALUES (NEW.`id`, NEW.`source`, NEW.`timestamp_utc`, NEW.`latitude`, NEW.`longitude`, NEW.`amperage`, NEW.`event_type`)
|
/*!*/;
|
# at 30749115
|
#141204 10:12:43 server id 1 end_log_pos 30748902 Query thread_id=72021573 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
SET @@session.sql_mode=1073741824/*!*/;
|
BEGIN
|
/*!*/;
|
# at 30749186
|
#141204 10:12:43 server id 1 end_log_pos 30748930 Intvar
|
SET INSERT_ID=484282187/*!*/;
|
# at 30749214
|
#141204 10:12:43 server id 1 end_log_pos 30749197 Query thread_id=72021573 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
insert into lightning (timestamp_utc, latitude, longitude, amperage, source, event_type) values ('2014-12-03 23:12:36.752', round(-13.2328134, 7), round(128.1822273, 7), round(-14.1, 1), 'GPATS', 'GS')
|
/*!*/;
|
# at 30749481
|
#141204 10:12:43 server id 1 end_log_pos 30749224 Xid = 12786901916
|
COMMIT/*!*/;
|
# at 30749508
|
#141204 10:12:43 server id 1 end_log_pos 30749672 Query thread_id=72021643 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
SET @@session.sql_mode=1074266112/*!*/;
|
CREATE DEFINER=`dba`@`localhost` TRIGGER `pt_osc_weather_lightning_ins` AFTER INSERT ON `weather`.`lightning` FOR EACH ROW REPLACE INTO `weather`.`_lightning_new` (`id`, `source`, `timestamp_utc`, `latitude`, `longitude`, `amperage`, `event_type`) VALUES (NEW.`id`, NEW.`source`, NEW.`timestamp_utc`, NEW.`latitude`, NEW.`longitude`, NEW.`amperage`, NEW.`event_type`)
|
/*!*/;
|
DELIMITER ;
|
# End of log file
|
ROLLBACK /* added by mysqlbinlog */;
|
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
|
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
|
|
mysqlbinlog --base64-output=decode-rows --start-position 30749956 --verbose mysqld-relay-bin.930329 | more
|
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
|
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
|
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
|
DELIMITER /*!*/;
|
# at 30749956
|
#141204 10:12:43 server id 1 end_log_pos 30749743 Query thread_id=72021573 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
SET @@session.pseudo_thread_id=72021573/*!*/;
|
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
|
SET @@session.sql_mode=1073741824/*!*/;
|
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
|
/*!\C latin1 *//*!*/;
|
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
|
SET @@session.lc_time_names=0/*!*/;
|
SET @@session.collation_database=DEFAULT/*!*/;
|
BEGIN
|
/*!*/;
|
# at 30750027
|
# at 30750088
|
# at 30750160
|
# at 30750242
|
#141204 10:12:43 server id 1 end_log_pos 30749804 Table_map: `weather`.`lightning` mapped to number 177649
|
#141204 10:12:43 server id 1 end_log_pos 30749876 Table_map: `weather`.`_lightning_new` mapped to number 177646
|
#141204 10:12:43 server id 1 end_log_pos 30749958 Write_rows: table id 177649
|
#141204 10:12:43 server id 1 end_log_pos 30750023 Write_rows: table id 177646 flags: STMT_END_F
|
### INSERT INTO `weather`.`lightning`
|
### SET
|
### @1=484282188
|
### @2='GPATS'
|
### @3=2014-12-03 23:12:36
|
### @4=!! Don't know how to handle column type=0 meta=0 (0000)### INSERT INTO `weather`.`_lightning_new`
|
### SET
|
### @1=484282188
|
### @2='GPATS'
|
### @3=919636211-25-91 28:24:32
|
### @4=000000000
|
### @5=000000000
|
### @6=000028157.200000000
|
### @7=***Corrupted replication event was detected. Not printing the value***
|
# at 30750307
|
#141204 10:12:43 server id 1 end_log_pos 30750050 Xid = 12786901928
|
COMMIT/*!*/;
|
# at 30750334
|
#141204 10:12:43 server id 1 end_log_pos 30750121 Query thread_id=72021573 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
BEGIN
|
/*!*/;
|
# at 30750405
|
# at 30750466
|
# at 30750538
|
# at 30750620
|
#141204 10:12:43 server id 1 end_log_pos 30750182 Table_map: `weather`.`lightning` mapped to number 177649
|
#141204 10:12:43 server id 1 end_log_pos 30750254 Table_map: `weather`.`_lightning_new` mapped to number 177646
|
#141204 10:12:43 server id 1 end_log_pos 30750336 Write_rows: table id 177649
|
#141204 10:12:43 server id 1 end_log_pos 30750401 Write_rows: table id 177646 flags: STMT_END_F
|
### INSERT INTO `weather`.`lightning`
|
### SET
|
### @1=484282189
|
### @2='GPATS'
|
### @3=2014-12-03 23:12:36
|
### @4=!! Don't know how to handle column type=0 meta=0 (0000)### INSERT INTO `weather`.`_lightning_new`
|
### SET
|
### @1=484282189
|
### @2='GPATS'
|
### @3=919636211-25-91 28:24:32
|
### @4=000000000
|
### @5=000000000
|
### @6=000028412.200000000
|
### @7=***Corrupted replication event was detected. Not printing the value***
|
# at 30750685
|
#141204 10:12:43 server id 1 end_log_pos 30750428 Xid = 12786901939
|
COMMIT/*!*/;
|
# at 30750712
|
#141204 10:12:43 server id 1 end_log_pos 30750499 Query thread_id=72021661 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
BEGIN
|
/*!*/;
|
# at 30750783
|
#141204 10:12:43 server id 1 end_log_pos 30750527 Intvar
|
SET INSERT_ID=36453078/*!*/;
|
# at 30750811
|
#141204 10:12:43 server id 1 end_log_pos 30751012 Query thread_id=72021661 exec_time=0 error_code=0
|
use `weather`/*!*/;
|
SET TIMESTAMP=1417648363.748530/*!*/;
|
SET @@session.time_zone='SYSTEM'/*!*/;
|
INSERT INTO aviation_obs
|
....
|
# at 30751296
|
#141204 10:12:43 server id 1 end_log_pos 30751040 Intvar
|
SET INSERT_ID=310074628/*!*/;
|
# at 30751324
|
#141204 10:12:43 server id 1 end_log_pos 30751392 Query thread_id=72021661 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363.750896/*!*/;
|
INSERT INTO aviation_obs_elements
|
...
|
# at 30755084
|
#141204 10:12:43 server id 1 end_log_pos 30754871 Query thread_id=72014676 exec_time=0 error_code=0
|
SET TIMESTAMP=1417648363/*!*/;
|
BEGIN
|
/*!*/;
|
# at 30755155
|
# at 30755216
|
# at 30755288
|
#141204 10:12:43 server id 1 end_log_pos 30754932 Table_map: `weather`.`lightning` mapped to number 177649
|
#141204 10:12:43 server id 1 end_log_pos 30755004 Table_map: `weather`.`_lightning_new` mapped to number 177646
|
#141204 10:12:43 server id 1 end_log_pos 30755086 Delete_rows: table id 177649 flags: STMT_END_F
|
### DELETE FROM `weather`.`lightning`
|
### WHERE
|
### @1=76432279
|
### @2='GPATS'
|
### @3=2009-11-16 06:45:03
|
### @4=!! Don't know how to handle column type=0 meta=0 (0000)# at 30755370
|
#141204 10:12:43 server id 1 end_log_pos 30755113 Xid = 12786902038
|
COMMIT/*!*/;
|
# at 30755397
|
|
....
|
|
|
|
|
Okay, it starts making sense now.
Lets assume for the time being that the table was created in 4.x (it might still not be true, but then there must be a hole in 5.x that allows creating DECIMAL columns; so it would be very interesting to make sure with absolute certainty that the table origins from 5.0 – in this case we need to search for the hole).
Mystery 1 from the earlier comment:
So, if the table origins from 4.1, the appearance of DECIMAL is natural.
The fact that it survived as DECIMAL till now has already been explained – even mysql_upgrade or ALTER don't change that.
Mysteries 2 and 3:
The table on the master has version 5.2.6 which is apparently last time it was ALTER-ed.
On the other hand, the table on the slave has version 5.5.29, and both slave-lightning and slave-_lightning_old have NEWDECIMAL. Most likely the slave was initialized at some point on a 5.x version, e.g. from a master dump, so the tables were created with new fields. So, the type difference has been there for quite a while.
The problem only appeared now because apparently the server uses binlog_format=mixed (is it so?). While it was operating normally, updates to the `lightning` table were SBR-safe and thus were written to the binary log in the statement format (see the binlog fragment in the last comment). They were applied as such on the slave, internal type difference didn't matter, and the life went on.
But as soon as pt-online-schema-change created triggers on the table, the same updates became unsafe for SBR, and they started getting converted into the row format. And row format is where the problem with type mismatch shows up.
And for even more clarification: why exactly pt-online-schema-change failed? What was the error?
|
|
The "what's next" part.
The workaround for the problem is pretty obvious: do the job that pt-online-schema-change was trying to do – create a new table via the full create table statement (not via CREATE TABLE .. LIKE), alter it as you need, populate from the old table, and then switch to the new one.
For actual fixing, I see two points of misbehavior here. I don't know which of them can and must be fixed, so I will reassign it to the datatype expert bar to make the decision.
Problem 1: the deprecated datatype does not get upgraded.
When mysql_upgrade from a 5.x version checks the table taken from 4.1, it does actually notice that something is seriously wrong with it, and says that ALTER TABLE .. FORCE must be performed (and it actually performs it). However, ALTER TABLE .. FORCE does not change DECIMAL to NEWDECIMAL. Maybe it should?
Problem 2: When a row event which was written for DECIMAL is applied to NEWDECIMAL, the described error occurs. It doesn't seem reasonable, I suppose the slave can handle it better.
|
|
> Most likely the slave was initialized at some point on a 5.x version, e.g. from a master dump
Yes it was.
> why exactly pt-online-schema-change failed? What was the error?
It was user aborted due to slave errors.
> server uses binlog_format=mixed (is it so?
Yes. In fact this in very new too (last 1-2 weeks ago).
> Problem 1
> ALTER TABLE .. FORCE does not change DECIMAL to NEWDECIMAL. Maybe it should
Only its really needed, with some really big warnings, or leave it for the user to fix, a large table alter can take several hours compared to what is normally a really quick mysql_update.
> Problem 2: When a row event which was written for DECIMAL is applied to NEWDECIMAL, the described error occurs. It doesn't seem reasonable, I suppose the slave can handle it better.
Especially considering its handing it now fine. I guess its reverted to SBR method since its still mixed.
Problem 3:
seg fault with set global slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY'; (comment 1)
|
> Problem 1
> ALTER TABLE .. FORCE does not change DECIMAL to NEWDECIMAL. Maybe it should
Only its really needed, with some really big warnings, or leave it for the user to fix, a large table alter can take several hours compared to what is normally a really quick mysql_update.
Yeah, but keep in mind that mysql_upgrade already performs ALTER TABLE .. FORCE when it encounters a table like that; it's just that this ALTER TABLE doesn't change the internal column type.
I guess its reverted to SBR method since its still mixed.
It should have if you removed the triggers that pt-online-schema-change created, but if you didn't I think it shouldn't have reverted to SBR. In the latter case it would be interesting to see how the binlog looks now.
Problem 3:
seg fault with set global slave_type_conversions='ALL_NON_LOSSY,ALL_LOSSY'
Right, thanks for reminding. And at some point I also had a debug assertion failure that I hopefully will be able to reproduce. Depending on what bar decides about the first two problems, the crash/assertion can either stay a part of this issue (if it gets fixed), or go into a separate one, since segfault should be fixed regardless the fate of the other problems.
|
It should have if you removed the triggers that pt-online-schema-change created,
Yes I removed the triggers, then added replication filters, then started replication to bypass the issue.
In the latter case it would be interesting to see how the binlog looks now.
Not applicable. And enough interesting aspects already 
On note there were other slaves, older and perhaps having an original old decimal table definition that didn't encounter this replication error.
|
|
The segfault was an upstream issue, fixed in MySQL 5.5.34 – not sure exactly by which patch, but it was reproducible on older versions of MySQL and on MariaDB 5.5 up to revno 3975 (the merge of MySQL 5.5.34 into MariaDB 5.5), and disappeared after that. Now it returns the error instead:
Last_SQL_Errno: 1678
|
Last_SQL_Error: Can't create conversion table for table 'test.lightning'
|
which is understandable since the allowed conversions are defined explicitly ( http://dev.mysql.com/doc/refman/5.5/en/replication-features-differing-tables.html#replication-features-attribute-promotion ) and DECIMAL=>NEWDECIMAL is not one of them.
|