[MDEV-20546] INSERT SELECT randomly blocks in Copy to temp table Created: 2019-09-10 Updated: 2021-02-22 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Data Manipulation - Insert, Storage Engine - InnoDB |
| Affects Version/s: | 10.1.40, 10.3.27 |
| Fix Version/s: | 10.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Thomas Mischke | Assignee: | Oleksandr Byelkin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | innodb | ||
| Environment: |
Linux version 3.10.0-957.21.3.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Tue Jun 18 16:35:19 UTC 2019 |
||
| Attachments: |
|
| Description |
|
Hello, we have a INSERT SELECT statement that started to block randomly about a week ago. Normally the statement executes in less than 5 seconds, but when it blocks, it does not return in more than 3 hours (longer we did not wait). The statement blocks randomly, that means sometimes it runs fine and then a minute later it may block. The statement looks as follows: insert into findispyear(fyperi,fytyp,fywert,fyMenge,fyupdt,fylgrx,fymat,fywareh) Some tests showed that it is most likely that the table atad01 is involved in this issue: The atad01 gets quite a lot of inserts. As long as these inserts are running (or even some inserts are pending when the statement above starts), then it is more likely for the statement to block. When we stop the inserts into atad01, then no blocking could be observed. Therefore here the table definition of atad01 and an example of the insert statements we are using there: CREATE TABLE `atad01` ( INSERT INTO `atad01` (`ATBWID`, `ATZO01`, `ATZO02`, `ATZO03`, `ATZO04`, `ATZO05`, `ATZO06`, `ATZO07`, `ATZO08`, `ATZO09`, `ATZO10`, `ATBART`, `ATBLGN`, `ATBPOS`, `ATCHGN`, `ATKDLI`, `ATBNCH`, `ATPBVB`, `ATBWMG`, `ATBWWT`, `ATBWDT`, `ATBWZT`, `ATCMT1`, `ATUPDT`, `ATUPZT`, `ATUPUS`, `ATUPBS`, `ATSRCE`, `ATBUID`, `ATSTOCKEVALUATIONTYPE`, `ATIGNORELOCK`, `ATPBVBLGR`, `ATCMTUSER`, `ATCMTERROR`, `ATWTVB`, `ATCONSHIST`, `ATZOTECHID`) VALUES (9, 1, NULL, NULL, NULL, 46432, 24, NULL, NULL, 529, NULL, 'STD', 'PICK529', 1, NULL, 0, 'LENS', 12, -1, 0.8100, 20120619, 110337, NULL, 20120619, 110337, 'user', 'OV-VM', 'BULK', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL); At the moment we are still in the stage where we try to understand, what is blocking. So at the moment we would be glad for hints, what to check, when the statement hangs, to get more details about what is going on. As so often: We were not able to reproduce this behavior on our test server, it happens on the live server only. As mentioned above, the problem started last week. Before that, the statement never blocked before, and the table structures, database version, statements running on those tables did not change for many weeks. Just the size of atad01 is constantly growing (currently around 7GB and 16 million lines. We double checked the free disk space for table space and temp files, both show many GB free space. After stopping such a statement, it shows up in the slow query log. Here is an example of such an entry, where the statement took more than 300 seconds before it was stopped, but the lock time is close to 0.
Any suggestions, how to continue? Best regards, Thomas Mischke |
| Comments |
| Comment by Thomas Mischke [ 2021-02-18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
In the meantime we switched to a newer version of MariaDB (10.3.27). The problem still exists, and we found that quite a lot of our "insert into ... select ..." statements from time to time run into this problem. Therefore I would like to know, if there is anything we can do to help you to find the cause of this problem. Right now we have the problem again, that is one statement hangs around (which normally takes just a second or so), while another statement waits for the first to finish. Here are the details: This statement currently blocks:
The client already closed the connection, but the statement is still there. It shows "Sending data" as current doing. Half an hour later this statement was started: truncate table w_findispmon02RS It is currently hanging in "waiting for table metadata lock". In case it can help you, I attach the output of "show engine innodb status". Best regards, Thomas Mischke | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Thomas Mischke [ 2021-02-22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello again, after some tests at the weekend I think I can add some more details here. It seems that some index statistics are not refreshed in time and that there a wrong execution plan is selected, which takes several hours instead of just a few seconds. Here are the details: Each morning the table is truncated and filled with fresh data. This refreshing is done by several statements, all of them are doing some "insert into ... select ..." one after the other in one connection using multiple transactions. Nearly all of these statements are always fast. All these fast statements are reading data from other tables. Now a statement comes that reads from the same table it writes to. And this statement sometimes uses the wrong execution plan. Then I get something like the following:
But if I ask for the execution plan 10 seconds later again, I get a better execution plan:
What I did for now is adding "force index" to the statement, so that it always performs good. (I tried to insert a "select sleep(10)", but this was not transferred to our slave and so the slave came into the same situation). So for me the problem is solved. All I would like to know: Is this a known behavior? Is this considered a bug (because the execution plan should be fine right no matter when the sql is to be executed), or a feature (refresh of index statistics in the background for more performance on the insert statements itself), or is there even a configuration variable we can use to set this behavior? Best regards, Thomas Mischke |