|
To add to my post: I know Galera Cluster doesn't support UNLOCK TABLES, but I have never used that statement, so I don't know why this query shows that state.
|
|
Hi, unfortunately it is not feasible to troubleshoot such an issue with the amount of information provided.
To start troubleshooting 'connections stuck' we usually need at least:
1. Full Error log from all nodes
2. Exact commands executed during upgrade (if applicable)
3. Output of SQL commands while issue is present:
show global variables;
show global status;
show engine innodb status\G
show processlist;
do sleep(5);
show global status;
show engine innodb status\G
show processlist;
4. Additionally - basic OS stats may help to detect possible OS issues, at least:
top -cbn1
df -h
5. gdb stack traces provide a lot of info and are the most valuable way to troubleshoot such problems:
gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof mysqld) >> stack.txt
So - not much help here, but if you provide at least the Error log, we may try to see whether any related messages are there.
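The SQL outputs from step 3 and the OS stats from step 4 can be collected in one pass with a small script. This is only a sketch: the output paths are placeholders, and it assumes the mysql client picks up credentials from your client config (e.g. ~/.my.cnf) - adapt both for your environment.

```shell
#!/bin/sh
# Sketch: collect the diagnostic SQL outputs (step 3) and OS stats (step 4)
# into a single timestamped file. Paths and credential handling are
# placeholders - adjust for your setup.
OUT=diag_$(date +%Y%m%d_%H%M%S).txt

# Write the statements from step 3 to a file (quoted heredoc keeps \G intact)
cat > /tmp/diag.sql <<'EOF'
show global variables;
show global status;
show engine innodb status\G
show processlist;
do sleep(5);
show global status;
show engine innodb status\G
show processlist;
EOF

# Run them only if a mysql client is available on this host;
# credentials are assumed to come from the client config file
if command -v mysql >/dev/null 2>&1; then
    mysql < /tmp/diag.sql > "$OUT" 2>&1 || true
fi

# Step 4: basic OS stats appended to the same file
top -cbn1 >> "$OUT" 2>/dev/null || true
df -h >> "$OUT" 2>/dev/null || true
```

Running it while the issue is present yields a single file that can be attached here as-is.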
|
|
Hi Andrii.
I checked the error logs and there isn't any warning, notice, or error about this query.
One of the queries that gets stuck in this state is:
INSERT INTO `posts_current` (`id`, `url`, `text`, `time`, `user_id`, `verb`, `replies_count`, `plusoners_count`, `resharers_count`, `influence`) VALUES ('z12rcbvh4o2ljj1x504cfl0xdy2mftmipa0', 'https://plus.google.com/112423312294961736109/posts/UZJrbccHpTX', 'هر دو تاشون به من میان', '2017-09-01 04:28:33', '112423312294961736109', 'post', '0', '0', '0', '1.2') ON DUPLICATE KEY UPDATE replies_count = VALUES (replies_count) , plusoners_count = VALUES(plusoners_count) , resharers_count = VALUES(resharers_count) , influence = VALUES(influence)
|
I attached the outputs for number 3 (SQL commands) and number 4 (basic OS stats).
|
|
I think it's related to "ON DUPLICATE KEY UPDATE"
|
|
Please look at picture number 2.
I realized today that, besides this state (Unlocking tables), there are other states in which queries get stuck.
All of them are INSERT queries with ON DUPLICATE KEY UPDATE.
|
|
Hi Hamoon,
The outputs provided in response.sql don't indicate any stuck query or connection.
The state of connection '1651519' (with transaction id 25976454509) doesn't change from 'Unlocking tables', but if you look closely, the `INSERT INTO posts_current` queries have different values.
So, at most I can conclude that InnoDB is overloaded because of transaction 25942000280, but no 'query stuck' problems are observed:
---TRANSACTION 25942000280, ACTIVE 30327 sec
|
MySQL thread id 442596, OS thread handle 0x7faa0def1700, query id 180281015 **.**.**.60 root Sending data
|
SELECT /*!40001 SQL_NO_CACHE */ * FROM `jsons`
|
That also suggests that connection 1651519 does change its state, but (because of the overload?) the outputs always show only 'Unlocking tables'.
|
|
Oh, I'm so sorry Andrii
Yes, I manually edited all of the queries to hide the original data, but I can send the original data to you privately (though I don't know how).
So z12rcbvh4o2ljj1x504cfl0xdy2mftmipa0 and z12rcbvh4o2la1x504cfl0xdy2mftmipa0 are the same; I mistakenly substituted two different values for them while hiding the original data.
And about transaction id 25942000280:
it was a mysqldump backing up our data. It has finished now, but the other queries I attached (2.png) are still alive.
|
|
Thank you for the confirmation and the detailed outputs. With all this information I can conclude that most probably some problem is indeed present, but unfortunately we don't have a solution or an exact cause so far.
Things which may help to move forward:
1. You mentioned that the problem occurs 'sometimes'. Could you give a more specific frequency - e.g. once a week / once a month / a few times per day?
2. Please explain how you usually recover from the problem state. E.g. do you switch the write load over to one of the read nodes?
3. The gdb outputs mentioned earlier will indeed help the most. I understand that you may hesitate to run gdb in production, and I cannot guarantee complete safety, but maybe we can collect them as part of a switchover?
In my understanding it would be best if you collect them before attempting to KILL the queries, and also afterward.
4. Feel free to collect the same outputs as in response.sql whenever the issue occurs - maybe we will have a chance to spot a pattern in when the problem arises.
|
|
Thank you Andrii.
About your questions
1. I think it's related to MariaDB 10.2, because our data ran for about 3 years on MariaDB 10.0 and 10.1 with this configuration without any problem, but ever since we upgraded to MariaDB 10.2 this problem occurs. It appears very soon after we start MariaDB, because we have heavy write concurrency.
2. When this problem happens, I can't do anything with the session, such as killing it: it stays alive even when I try to kill it. But the database works normally and I can write to this table from a new session without any problem.
3. I don't know gdb, and in fact I'm afraid to run this command in production.
4. I'm going to run SHOW ENGINE INNODB STATUS and attach the output.
Sorry for my English 
Thank you
|
|
HamoonDBA, your English is good; it's just still not clear how your system recovers from the problem.
I understand that the server remains operational and the KILL command doesn't help - but for how long does the connection remain in 'Unlocking tables'? E.g. does it hang for a few hours after KILL, a few days, weeks, or until you restart the server? (And how long does it usually take before a restart, then?)
|
|
Thank you Andrii
Ok. We have three nodes in our Cluster.
When this problem happens, it remains until we restart the server. The important point is that I can't restart MariaDB gracefully, because MariaDB gets stuck waiting for this query to finish being killed. So I have to kill the mysqld process and restart; and because I couldn't restart gracefully, after the restart the node goes through SST.
Two of the nodes (node1 and node2) are stopped now, because I want to solve the problem first and then join them back into the cluster.
But on the last node (node3), after this problem happened again, the node crashed after three days with this error:
2017-09-05 20:20:40 0x7faa3381e700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.8/storage/innobase/dict/dict0stats.cc line 1572
InnoDB: Failing assertion: offsets_rec != NULL
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
170905 20:20:40 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.2.8-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=400
max_threads=502
thread_count=280
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1234023 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7fbb519aca7e]
/usr/sbin/mysqld(handle_fatal_signal+0x30d)[0x7fbb513f2dbd]
/lib64/libpthread.so.0(+0xf370)[0x7fbb5095b370]
/lib64/libc.so.6(gsignal+0x37)[0x7fbb4eee41d7]
/lib64/libc.so.6(abort+0x148)[0x7fbb4eee58c8]
/usr/sbin/mysqld(+0x423578)[0x7fbb511ae578]
/usr/sbin/mysqld(+0xa76c70)[0x7fbb51801c70]
/usr/sbin/mysqld(+0xa78e77)[0x7fbb51803e77]
/usr/sbin/mysqld(+0xa7c69d)[0x7fbb5180769d]
/usr/sbin/mysqld(+0xa7e733)[0x7fbb51809733]
/lib64/libpthread.so.0(+0x7dc5)[0x7fbb50953dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fbb4efa676d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
|
And after starting again, this problem (Unlocking tables state) happened again.
|
|
HamoonDBA, this looks related to MDEV-13534; it is possible that disabling InnoDB persistent statistics will work around the problem. There is a small chance it may help the stuck connection as well.
To disable persistent stats, put the following line into the [mysqld] section of your .cnf file:
innodb_stats_persistent=0
To disable persistent stats on the running server without a restart, execute this SQL command:
set global innodb_stats_persistent=0;
|
The bug is fixed in 10.2.9, which is not released yet.
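If disabling persistent statistics server-wide is undesirable, InnoDB also supports a per-table override via the STATS_PERSISTENT table option. A hedged sketch, using `posts_current` from the query quoted earlier in this thread as the example table:

```sql
-- Disable persistent statistics only for the affected table
-- (overrides innodb_stats_persistent for this table):
ALTER TABLE posts_current STATS_PERSISTENT=0;

-- Revert to the server-wide setting later:
ALTER TABLE posts_current STATS_PERSISTENT=DEFAULT;
```

This limits the workaround's scope to the table involved in the stuck INSERT ... ON DUPLICATE KEY UPDATE statements, while the rest of the schema keeps persistent stats.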
|
|
Hi Andrii
I disabled innodb_stats_persistent online, without a restart. After changing this variable it seems the stuck queries don't happen any more, but the already-stuck sessions still persist, so I tried to restart the server; as I said, though, I had to kill the MariaDB process with kill -9.
I'm testing again to see whether the problem happens again or not.
|
|
Andrii Nikitin, unfortunately I had to downgrade all of our nodes to MariaDB 10.0.
My boss forced me to downgrade (because we had only one node left), and I couldn't test further to be sure whether the problem is solved or not.

|