|
Is it possible that these connections at some point ran set timestamp = ...?
That would freeze the value.
|
|
no, there is no way such a query was ever run there; plus,
show global variables like 'timestamp'
doesn't return anything.
|
|
It's a session variable; please try show variables like 'timestamp', or just select @@timestamp (run it a couple of times to see if it changes); but doing this only makes sense inside the connections where the values are broken.
Meanwhile, trying to look for other explanations,
- are all connections we are talking about local?
- you've set 10.0.19-galera as the affected version; is that really what you are using? If so, are you running a Galera cluster or a standalone server?
- can you run show variables from a connection where the time value is wrong and from a connection where it is correct, and attach both outputs?
- is there anything suspicious in the error log?
|
|
I can run those commands from inside the broken connections.
For such connections, show variables like 'timestamp' always returns the same value.
- no, the connections are not local.
- yes, it's Galera 10.0.19; there is a cluster of 3 machines, but the times are correct on all of them.
- sure, I attached both.
- no, I don't have anything there.
|
|
Attached logs: show variables output for both working and non-working connections.
|
|
Thanks.
The timestamp value is very round; it seems highly unlikely that it would get stuck at such a precise moment by chance.
If the connections didn't set the timestamp directly, could it be that they ran a procedure which set it, maybe by mistake?
Please consider the following example (overly simplified, but it can explain what I mean):
MariaDB [test]> create procedure pr()
    -> begin
    -> set timestamp = 1434363000;
    -> select * from t1 where t > timestamp;
    -> end |
Query OK, 0 rows affected (0.13 sec)
Calling this procedure will set the timestamp to the given value for the whole session (not just for the scope of the procedure), and from now on the functions based on it will be stuck.
You can try to run SHOW STATUS LIKE 'Com%' to see what commands these connections executed (hopefully they did not run FLUSH STATUS). The Com_set_option value will show whether any SET commands have been run. Of course, those could have set local variables as well, but at least it might give us a clue.
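For reference, the suspicious constant in the example above decodes like this (a quick sketch, not part of the original report; the value is the one quoted in this comment):

```python
from datetime import datetime, timezone

# The frozen value from the example above; a stuck NOW() in that
# session would keep returning this moment. Note it falls exactly
# on a 10-minute boundary, which is why it looks so "round".
frozen = 1434363000
print(datetime.fromtimestamp(frozen, tz=timezone.utc))
# 2015-06-15 10:10:00+00:00
```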
|
|
If you have additional information, please comment to re-open the report.
|
|
Sorry for the delay; we restarted the server and thought it was fixed, but it happened again.
This time the value is 1436786595.
SHOW STATUS LIKE 'Com%' shows Com_set_option = 1, but that's because the first query after establishing a connection is:
set names 'utf8'
It's the only query that sets variables.
All the others are simple SELECT/UPDATE/DELETE statements (80% of them are just "select 1" for keepalive).
|
|
I have the full query log for that connection; after removing the selects, updates and deletes, I am only left with connect, select db and set names 'utf8', so there should be something else, I guess.
|
|
More details: in SHOW PROCESSLIST that connection has quite a big Time value, which does not reset to 0 when I execute a command on that connection. For example, right now I have
[<<"2211649">>,<<"db_name">>,<<"80.92.89.69:40643">>, <<"db_name">>,<<"Sleep">>,<<"800659">>,<<>>,null, <<"0.000">>]
For connection id 2211649 the Time is 800659, which is WAY more than the timeout I have set (7200), so it should have been disconnected by now.
By the way, that value is not frozen; it shows approximately the difference between the real time and the now() value the connection is frozen on (on that connection, now() shows 2015-07-14 07:29:26, which is 800659 seconds behind the current time here).
I checked the logs; it was sending keepalive queries normally the whole time, every 20 minutes.
Also, the other connections there are fine; only 2 are corrupt.
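The arithmetic in this comment can be sketched as follows (a hypothetical illustration; the two constants are the values reported above, and the assumption is that the Time column tracks the gap between the real wall clock and the frozen session clock):

```python
from datetime import datetime, timedelta

# The stuck connection's frozen NOW() value and its processlist Time
# column, as reported above.
frozen_now = datetime(2015, 7, 14, 7, 29, 26)
processlist_time = 800659  # seconds

# If Time = real wall clock - frozen session clock, the real moment
# of the observation was:
real_time = frozen_now + timedelta(seconds=processlist_time)
print(real_time)  # 2015-07-23 13:53:45
```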
|
|
This happened on our server twice. This week on Ubuntu 14.04, Galera 10.0.20 (2 servers in the cluster). The first time was 3-12 months in the past (I don't remember exactly).
We don't use the "set timestamp" command. Connections come from Connector/J 5.1.35 via Apache Commons Pool (part of Tomcat 8.0.23). We don't use Docker.
It may be connected to Galera Cluster and the ALTER TABLE command (or not); I executed an ALTER TABLE ~5 min before the stuck time.
|
|
nirbhay_c, could you please look at it from the Galera perspective; maybe you will have some ideas about what's going on or how to investigate it further.
I don't remember ever having such complaints about a standalone server, and now we have two reports, both for Galera 10.0; statistically that's insufficient, yet noteworthy.
|
|
A very similar issue with no reproducible test case was reported here:
https://bugs.launchpad.net/percona-server/+bug/1372501
|
|
Can't reproduce. I do not see how the timestamp could freeze for a connection other than via SET TIMESTAMP.
|
|
The Launchpad bug seems to be the same.
As for a reproducible test case, I don't think we will have one; I have seen this bug only twice in ~6 months, the first time on 24.2.2015.
A trigger is mentioned on Launchpad; it could be related. We have one BEFORE UPDATE and one AFTER INSERT trigger. We also use a temporary table in some commands. We don't use procedures, events, views or prepared statements.
|
|
Some thoughts out of my code inspection:
- Query start/end time does not actually get updated during query execution.
This shouldn't be the case according to my code analysis.
- SELECT NOW() is cached by some proxy and doesn't actually reach the server.
Only possible if all other queries in that connection are cached too, since INFORMATION_SCHEMA.PROCESSLIST also shows a stale value.
- System call that loads current timestamp returns the same value in this thread (OS bug).
This might be the case: all reporters seem to have Ubuntu, most have 14.04. Can be verified with gdb.
- timestamp (and corresponding internal variables) is set internally after all.
Short of an explicit SET TIMESTAMP, this could only be done by some third-party loaded library/plugin (check galera?).
- timestamp (and corresponding internal variables) is set externally after all.
Doesn't seem to be the case, since Com_set_option=1.
- Some loss of precision during conversion of start_time.
Value of 1434363000.000000 looks very suspicious. But later it was claimed that value might be 1436786595 as well, which breaks this hypothesis.
- Query start/end time is adjusted internally.
This can only be done by some third party loaded library/plugin (check galera?).
So far I see no other reasonable way to get a persistent value out of SELECT NOW(). To get this issue solved we should eliminate all false assumptions.
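The precision hypothesis in the list above can be sanity-checked quickly (a small sketch using the two values reported in this ticket):

```python
# The first reported frozen value sits exactly on a 10-minute boundary,
# but the second does not, which argues against a systematic rounding
# of start_time during conversion.
first, second = 1434363000, 1436786595
print(first % 600, second % 600)  # 0 195

# Both values are also exactly representable as doubles (they are far
# below 2**53), so a float round-trip alone would not change them.
print(float(first) == first, float(second) == second)  # True True
```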
|
|
We are seeing the same problem every few weeks in mariadb galera 10.0.17 with 3 nodes. We haven't been able to reproduce the problem in laboratory. Our environment is Red Hat Enterprise Linux Server release 6.5 kernel 2.6.32-431.el6.x86_64.
We don't use "SET TIMESTAMP=..." in our application, but my understanding is that the replication and Galera logs are full of those statements, so we believe that's the cause of the problem.
Someone mentioned "ALTER table". We use it a lot too.
Someone mentioned large values of the time column in "show processlist". We see that too.
Someone else is reporting the same problem:
https://groups.google.com/forum/#!topic/codership-team/nwWSuVwwnU4
I hope any of this helps to narrow the problem.
|
|
Hello all,
We had the same issue a few months ago: a few connections got stuck in time, with the datetime functions always returning the same value until we restarted the connection. Restarting the connections was the only resolution. We have had no success reproducing it. I reported it on the Codership mailing list and they pointed me to this bug report.
If you need more info, or if we can assist with anything useful, please shout.
We have long-running services with persistent connections to the databases. We do not use SET TIMESTAMP in our code.
We use FreeBSD as the OS. Below are the version strings of mysql.
mysql Ver 14.14 Distrib 5.6.16, for FreeBSD9.2 (amd64) using EditLine wrapper
Server version: 5.6.16-log MySQL Community Server (GPL), wsrep_25.5.r4064
We spent some time searching for cases like this, but there are only a few reports of similar behavior:
https://bugs.mysql.com/bug.php?id=73999
https://bugs.launchpad.net/percona-server/+bug/1341606
https://www.percona.com/blog/2007/06/25/does-slow-query-log-logs-all-slow-queries/
Cheers
Rumen Palov
E-CARD LTD.
|
|
MariaDB, we seem to have identified the root cause of this bug and we have a fix:

commit 9dc0e58ac776e70e59f0dd6588e11bcc6f9765c0
Author: Daniele Sciascia <daniele.sciascia@galeracluster.com>
Date:   Thu Jul 14 14:29:59 2016 +0200

    MW-292 Reset timestamp after transaction replay

    Transaction replay causes the THD to re-apply the replication
    events from execution, using the same path appliers do. While
    applying the log events, the THD's timestamp is set to the
    timestamp of the event.
    Setting the timestamp explicitly causes function NOW() to
    always return the timestamp that was set. To avoid this behavior
    we reset the timestamp after replaying is done.

diff --git a/sql/sql_class.h b/sql/sql_class.h
index 5866621..f4a9c23 100644
--- a/sql/sql_class.h
+++ b/sql/sql_class.h
@@ -83,6 +83,7 @@ struct wsrep_thd_shadow {
   ulong tx_isolation;
   char *db;
   size_t db_length;
+  struct timeval user_time;
 };
 #endif

 class Reprepare_observer;
diff --git a/sql/wsrep_thd.cc b/sql/wsrep_thd.cc
index 9a64386..0778389 100644
--- a/sql/wsrep_thd.cc
+++ b/sql/wsrep_thd.cc
@@ -172,6 +172,8 @@ static void wsrep_prepare_bf_thd(THD *thd, struct wsrep_thd_shadow* shadow)
   shadow->db = thd->db;
   shadow->db_length = thd->db_length;
   thd->reset_db(NULL, 0);
+
+  shadow->user_time = thd->user_time;
 }

 static void wsrep_return_from_bf_mode(THD *thd, struct wsrep_thd_shadow* shadow)
@@ -182,6 +184,7 @@ static void wsrep_return_from_bf_mode(THD *thd, struct wsrep_thd_shadow* shadow)
   thd->net.vio = shadow->vio;
   thd->variables.tx_isolation = shadow->tx_isolation;
   thd->reset_db(shadow->db, shadow->db_length);
+  thd->user_time = shadow->user_time;
 }

 void wsrep_replay_transaction(THD *thd)
Test:

commit a0a0d45cce8b962ce9752aacb6b3b3c0468ea26e
Author: Philip Stoev <philip.stoev@galeracluster.com>
Date:   Fri Jul 15 01:13:32 2016 -0700

    Galera MTR Tests: Test case for MW-292 : NOW() returns stale timestamp after transaction replay

diff --git a/mysql-test/suite/galera/r/MW-292.result b/mysql-test/suite/galera/r/MW-292.result
new file mode 100644
index 0000000..f038f88
--- /dev/null
+++ b/mysql-test/suite/galera/r/MW-292.result
@@ -0,0 +1,30 @@
+CREATE TABLE rand_table (f1 FLOAT);
+CREATE TABLE t1 (f1 INTEGER PRIMARY KEY, f2 CHAR(1));
+INSERT INTO t1 VALUES (1, 'a');
+INSERT INTO t1 VALUES (2, 'a');
+SET AUTOCOMMIT=ON;
+START TRANSACTION;
+UPDATE t1 SET f2 = 'b' WHERE f1 = 1;
+SELECT * FROM t1 WHERE f1 = 2 FOR UPDATE;
+f1 f2
+2 a
+SET GLOBAL wsrep_provider_options = 'dbug=d,commit_monitor_enter_sync';
+COMMIT;;
+SET SESSION wsrep_sync_wait = 0;
+SET SESSION wsrep_on = 0;
+SET SESSION wsrep_on = 1;
+UPDATE t1 SET f2 = 'c' WHERE f1 = 2;
+SET GLOBAL wsrep_provider_options = 'dbug=';
+SET GLOBAL wsrep_provider_options = 'signal=commit_monitor_enter_sync';
+SELECT TIMEDIFF(SYSDATE(), NOW()) < 2;
+TIMEDIFF(SYSDATE(), NOW()) < 2
+1
+INSERT INTO rand_table VALUES (RAND()),(RAND()),(RAND()),(RAND()),(RAND());
+INSERT INTO rand_table VALUES (RAND()),(RAND()),(RAND()),(RAND()),(RAND());
+SELECT COUNT(DISTINCT f1) = 10 FROM rand_table;
+COUNT(DISTINCT f1) = 10
+1
+wsrep_local_replays
+1
+DROP TABLE t1;
+DROP TABLE rand_table;
diff --git a/mysql-test/suite/galera/t/MW-292.test b/mysql-test/suite/galera/t/MW-292.test
new file mode 100644
index 0000000..945d9f4
--- /dev/null
+++ b/mysql-test/suite/galera/t/MW-292.test
@@ -0,0 +1,79 @@
+#
+# MW-292 Reset timestamp after transaction replay
+#
+# We force transaction replay to happen and then we check that NOW() is not stuck in time.
+# As a bonus we also check that RAND() continues to return random values after replay.
+#
+
+--source include/galera_cluster.inc
+--source include/have_innodb.inc
+--source include/have_debug_sync.inc
+--source suite/galera/include/galera_have_debug_sync.inc
+
+--let $wsrep_local_replays_old = `SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_local_replays'`
+
+CREATE TABLE rand_table (f1 FLOAT);
+CREATE TABLE t1 (f1 INTEGER PRIMARY KEY, f2 CHAR(1));
+INSERT INTO t1 VALUES (1, 'a');
+INSERT INTO t1 VALUES (2, 'a');
+
+--connection node_1
+SET AUTOCOMMIT=ON;
+START TRANSACTION;
+
+UPDATE t1 SET f2 = 'b' WHERE f1 = 1;
+SELECT * FROM t1 WHERE f1 = 2 FOR UPDATE;
+
+# Block the commit
+--connect node_1a, 127.0.0.1, root, , test, $NODE_MYPORT_1
+--let $galera_sync_point = commit_monitor_enter_sync
+--source include/galera_set_sync_point.inc
+
+--connection node_1
+--send COMMIT;
+
+# Wait until commit is blocked
+--connection node_1a
+SET SESSION wsrep_sync_wait = 0;
+--source include/galera_wait_sync_point.inc
+
+# Issue a conflicting update on node #2
+--connection node_2
+UPDATE t1 SET f2 = 'c' WHERE f1 = 2;
+
+# Wait for both transactions to be blocked
+--connection node_1a
+--let $wait_condition = SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE STATE = 'System lock';
+--source include/wait_condition.inc
+
+--let $wait_condition = SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE STATE = 'init' AND INFO = 'COMMIT';
+--source include/wait_condition.inc
+
+# Unblock the commit
+--connection node_1a
+--source include/galera_clear_sync_point.inc
+--source include/galera_signal_sync_point.inc
+
+# Commit succeeds via replay
+--connection node_1
+--reap
+
+# Confirm that NOW() is not stuck in time relative to SYSDATE()
+--sleep 3
+SELECT TIMEDIFF(SYSDATE(), NOW()) < 2;
+
+INSERT INTO rand_table VALUES (RAND()),(RAND()),(RAND()),(RAND()),(RAND());
+INSERT INTO rand_table VALUES (RAND()),(RAND()),(RAND()),(RAND()),(RAND());
+
+SELECT COUNT(DISTINCT f1) = 10 FROM rand_table;
+
+# wsrep_local_replays has increased by 1
+--let $wsrep_local_replays_new = `SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_local_replays'`
+--disable_query_log
+--eval SELECT $wsrep_local_replays_new - $wsrep_local_replays_old = 1 AS wsrep_local_replays;
+--enable_query_log
+
+--connection node_2
+DROP TABLE t1;
+DROP TABLE rand_table;
|
|