Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.1.22
-
Debian Wheezy, amd64
Description
I have a MariaDB 10.1.22 server as a row-based replication slave from a MariaDB 5.5.55 server. I believe that the problem is internal to the MariaDB 10.1.22 server.
A few rows on the slave have a TIMESTAMP column which is storing the wrong value.
Both systems have their system timezones set to America/Chicago. My application will set the MariaDB session timezone appropriately for whatever it's doing. This ranges across the timezones in the USA, for example, EST5EDT, CST6CDT, MST, etc. The affected rows all would have had their session timezones set to PST8PDT.
Percona's pt-table-checksum identified these inconsistent rows. At first glance, there is no problem. I'll take as my example the row with ID 88809874. Selecting the "created" column (type TIMESTAMP) for that row shows '2016-11-06 01:01:27' both on the master and the slave. Fine.
But that's with the default timezone (Central). When I switch my session to PST8PDT and look again, I see: '2016-11-06 00:01:27' on the master and '2016-11-05 23:01:27' on the slave. [The master's version, shortly after midnight Pacific time, is correct.]
UNIX_TIMESTAMP(created) gives 1478415687 on the master and 1478412087 on the slave. One hour off.
I attempted to fix it by updating the master to increment the created timestamp by one second. Since I'm doing row-based replication, that should cause the right value to propagate to the slave. But it doesn't: on both master and slave the value is incremented by one second, but the hour offset remains.
I checked the master binary log both on the server and the relay log on the slave. In both logs, the created column's value goes from 1478415687 to 1478415688. But SELECT UNIX_TIMESTAMP(created) on the slave still gives 1478412088.
This is why I believe the problem is not related to the master: the relay log does have the correct UNIX timestamp.
The system's timezone (America/Chicago) was undergoing a Daylight Saving shift during that timestamp. '2016-11-06 01:01:28' is ambiguous in the system's timezone: it could refer to either 1478415688 or 1478412088. But in the session timezone, PST8PDT, the correct UNIX time (1478415688) is '2016-11-06 00:01:28', which is unambiguous.
It looks as though the fact that the system's time was ambiguous is "leaking" into the handling of the UNIX timestamp given via replication. But now I'm into the realm of speculation.
This is only happening to a handful of rows out of countless rows that would seem to fit all the criteria. It currently seems to be an issue in 7 rows of one table, and 27 rows of another. I could fix all these problem by setting the timestamp manually on the slave, but I have no problem leaving them as-is in case it's helpful to track down what's going on.
Please let me know what other information I can provide.
Attachments
Issue Links
- relates to
-
MDEV-11895 NO_ZERO_DATE affects timestamp values without any warnings
-
- Closed
-
Here is what I think happened, which explains most of the story above.
Episode 1: Master, 2016-11-06, some time after 1 AM
MariaDB [test]> select @@time_zone;
+-----------------+
| @@time_zone |
+-----------------+
| America/Chicago |
+-----------------+
1 row in set (0.00 sec)
set timestamp = 1478415687; # back to past
MariaDB [test]> create table t1 (t timestamp not null default current_timestamp);
Query OK, 0 rows affected (0.02 sec)
MariaDB [test]> insert into t1 values (null);
Query OK, 1 row affected (0.03 sec)
MariaDB [test]> select t, unix_timestamp(t) from t1;
+---------------------+-------------------+
| t | unix_timestamp(t) |
+---------------------+-------------------+
| 2016-11-06 01:01:27 | 1478415687 |
+---------------------+-------------------+
1 row in set (0.00 sec)
Episode 2: Master binary log, right afterwards
SET TIMESTAMP=1478415687/*!*/;
...
create table t1 (t timestamp not null default current_timestamp)
/*!*/;
# at 372
#161106 1:01:27 server id 1 end_log_pos 440 Query thread_id=12 exec_time=15445981 error_code=0
SET TIMESTAMP=1478415687/*!*/;
BEGIN
/*!*/;
# at 440
# at 481
#161106 1:01:27 server id 1 end_log_pos 481 Table_map: `test`.`t1` mapped to number 55
#161106 1:01:27 server id 1 end_log_pos 515 Write_rows: table id 55 flags: STMT_END_F
### INSERT INTO `test`.`t1`
### SET
### @1=1478415687
# at 515
#161106 1:01:27 server id 1 end_log_pos 542 Xid = 244
COMMIT/*!*/;
Episode 3: Slave, same time, after catching up with master
MariaDB [test]> select @@time_zone;
+-----------------+
| @@time_zone |
+-----------------+
| America/Chicago |
+-----------------+
1 row in set (0.00 sec)
MariaDB [test]> select t, unix_timestamp(t) from t1;
+---------------------+-------------------+
| t | unix_timestamp(t) |
+---------------------+-------------------+
| 2016-11-06 01:01:27 | 1478412087 |
+---------------------+-------------------+
1 row in set (0.00 sec)
So, no special magic is involved, and the timestamp already differs. In fact, it's not that easy to squeeze 1478415687 into the table. If instead of relying on the default value on the master, we would try to insert it explicitly, we'd see the difference on the master already:
Episode 4: Master again
MariaDB [test]> create table t2 like t1;
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> insert into t2 values(from_unixtime('1478415687'));
Query OK, 1 row affected (0.01 sec)
MariaDB [test]> select t, unix_timestamp(t) from t2;
+---------------------+-------------------+
| t | unix_timestamp(t) |
+---------------------+-------------------+
| 2016-11-06 01:01:27 | 1478412087 |
+---------------------+-------------------+
1 row in set (0.00 sec)
So, the value 1478412087 inserted on the slave in the episode 3 is not surprising. As you said, indeed, 1478415687 and 1478412087 in this timezone mean the same time.
Why mysqlbinlog shows 1478415687 while in fact 1478412087 is inserted – it's a question to internals, my guess is there are too many conversions from/to UNIX timestamp happening on the way. I'll pass it over to serg to decide whether it's wrong, maybe it is.
One thing that you mentioned that I couldn't reproduce right away was this:
I was trying to do it via UPDATE t1 SET t = t+1.
Depending on the combination of the current time/zones, it would either cause the master to have 1478412088 (and the slave, consequently), or it would make the replication abort due to the "row not found" error. However, there are many combinations and more than one way to increment the timestamp, it's quite possible that you did it the way that caused the result that you've described. In any case, it's not particularly important since the problem has started long time before that.