[MDEV-12672] Replicated TIMESTAMP fields given wrong value near DST change Created: 2017-05-03 Updated: 2019-04-29 Resolved: 2017-09-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Time zones |
| Affects Version/s: | 10.1.22 |
| Fix Version/s: | 10.1.27, 10.2.9 |
| Type: | Bug | Priority: | Major |
| Reporter: | Xan Charbonnet | Assignee: | Sergei Golubchik |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | replication, timezones | ||
| Environment: |
Debian Wheezy, amd64 |
||
| Issue Links: |
|
||||||||
| Description |
|
I have a MariaDB 10.1.22 server as a row-based replication slave from a MariaDB 5.5.55 server. I believe that the problem is internal to the MariaDB 10.1.22 server. A few rows on the slave have a TIMESTAMP column which is storing the wrong value. Both systems have their system timezones set to America/Chicago. My application will set the MariaDB session timezone appropriately for whatever it's doing. This ranges across the timezones in the USA, for example, EST5EDT, CST6CDT, MST, etc. The affected rows all would have had their session timezones set to PST8PDT. Percona's pt-table-checksum identified these inconsistent rows. At first glance, there is no problem. I'll take as my example the row with ID 88809874. Selecting the "created" column (type TIMESTAMP) for that row shows '2016-11-06 01:01:27' both on the master and the slave. Fine. But that's with the default timezone (Central). When I switch my session to PST8PDT and look again, I see: '2016-11-06 00:01:27' on the master and '2016-11-05 23:01:27' on the slave. [The master's version, shortly after midnight Pacific time, is correct.] UNIX_TIMESTAMP(created) gives 1478415687 on the master and 1478412087 on the slave. One hour off. I attempted to fix it by updating the master to increment the created timestamp by one second. Since I'm doing row-based replication, that should cause the right value to propagate to the slave. But it doesn't: on both master and slave the value is incremented by one second, but the hour offset remains. I checked the master binary log both on the server and the relay log on the slave. In both logs, the created column's value goes from 1478415687 to 1478415688. But SELECT UNIX_TIMESTAMP(created) on the slave still gives 1478412088. This is why I believe the problem is not related to the master: the relay log does have the correct UNIX timestamp. The system's timezone (America/Chicago) was undergoing a Daylight Saving shift during that timestamp. '2016-11-06 01:01:28' is ambiguous in the system's timezone: it could refer to either 1478415688 or 1478412088. But in the session timezone, PST8PDT, the correct UNIX time (1478415688) is '2016-11-06 00:01:28', which is unambiguous. It looks as though the fact that the system's time was ambiguous is "leaking" into the handling of the UNIX timestamp given via replication. But now I'm into the realm of speculation. This is only happening to a handful of rows out of countless rows that would seem to fit all the criteria. It currently seems to be an issue in 7 rows of one table, and 27 rows of another. I could fix all these problem by setting the timestamp manually on the slave, but I have no problem leaving them as-is in case it's helpful to track down what's going on. Please let me know what other information I can provide. |
| Comments |
| Comment by Elena Stepanova [ 2017-06-01 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here is what I think happened, which explains most of the story above.
So, no special magic is involved, and the timestamp already differs. In fact, it's not that easy to squeeze 1478415687 into the table. If instead of relying on the default value on the master, we would try to insert it explicitly, we'd see the difference on the master already:
So, the value 1478412087 inserted on the slave in the episode 3 is not surprising. As you said, indeed, 1478415687 and 1478412087 in this timezone mean the same time. Why mysqlbinlog shows 1478415687 while in fact 1478412087 is inserted – it's a question to internals, my guess is there are too many conversions from/to UNIX timestamp happening on the way. I'll pass it over to serg to decide whether it's wrong, maybe it is. One thing that you mentioned that I couldn't reproduce right away was this:
I was trying to do it via UPDATE t1 SET t = t+1. Depending on the combination of the current time/zones, it would either cause the master to have 1478412088 (and the slave, consequently), or it would make the replication abort due to the "row not found" error. However, there are many combinations and more than one way to increment the timestamp, it's quite possible that you did it the way that caused the result that you've described. In any case, it's not particularly important since the problem has started long time before that. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Xan Charbonnet [ 2017-06-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Elena, Thank you very much for taking a look at this! The extra bit that I think your analysis is missing is that the session time zone was "PST8PDT" when this timestamp was inserted. So in the session timezone, 1478415687 is not a weird timestamp, and there shouldn't be any problem squeezing that value into a table. The problem seems to be a result of the system's timezone being weird at that moment. I agree that 1478415687 might be hard to squeeze into a table in "America/Chicago"/"CST6CDT". But in "PST8PDT", it shouldn't be. Or am I wrong? For the part where I was unable to fix it by incrementing the timestamp: I'm not sure that's so easily dismissed. It sounds to me like it may actually be the same problem happening again. If the binlog contains a specific UNIX timestamp, and what gets recorded in the slave is a different UNIX timestamp, simply because the system was undergoing a DST change at the time, then something is very weird and wrong. And it sounds almost exactly like the "main" problem. I was updating the timestamp via "created = created + INTERVAL 1 SECOND". But I don't think that should matter, as long as the row-based binlog shows the correct UNIX time, which it does. It's the slave that's doing something weird with it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2017-06-16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hmm, I cannot repeat it. Here's what I tried:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Xan Charbonnet [ 2017-06-22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello Sergei, Thank you for looking into this. I also can't re-create this problem. And for the most part, it didn't take place: virtually all timestamps in this situation were handled correctly. But there are something like a couple of dozen rows which were bitten by this behavior. And not only that, but the behavior continues: when the binary log tells the slave to update to a new timestamp (a specific timestamp, not an offset), then it still handles the timestamp incorrectly. That's what makes me think it's worth digging into further. I can fix my slave by manually setting the timestamp, and that would very likely be the end of my problem. I'm leaving it unfixed for now in case it's helpful for diagnosing this. Please let me know what other information I can provide, or if you don't think it's worth investigating further, I'll just fix my data and move on. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2017-06-23 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It appears to be important which server versions participate in replication. 5.5 => 10.1 and 10.0 => 10.1 exhibit the problem; 5.5 => 5.5, 5.5 => 10.0, 10.0 => 10.1 and 10.1 => 10.1 don't. I might have made a mistake in one or two combinations, but there is certainly a dependency on the versions. My scenario below is for 5.5 => 10.1 combination, as in the initial description, but with current 5.5 and 10.1 builds. Maybe some steps are excessive, but executing all of them seems to trigger the result reliably.
Please note that with this scenario, even though we set PST8PDT on the master both times, it's not seen in the binary log:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2017-08-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've also repeated it by running this in 5.5:
And loading the binlog into 10.1 with
What happens here, 5.5 creates a table with MYSQL_TYPE_TIMESTAMP field, Field_timestamp. But 10.1 creates for the same CREATE TABLE statement a MYSQL_TYPE_TIMESTAMP2 field, Field_timestampf. And table_def::compatible_with() notices that field types differ and creates a conversion table. When a conversion table exists, unpack_row() uses Copy_field interface, and do_field_temporal() works via MYSQL_TIME structure. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2017-09-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Sent review suggestion to maria-developers@ |