[MDEV-20546] INSERT SELECT randomly blocks in Copy to temp table Created: 2019-09-10  Updated: 2021-02-22

Status: Open
Project: MariaDB Server
Component/s: Data Manipulation - Insert, Storage Engine - InnoDB
Affects Version/s: 10.1.40, 10.3.27
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Thomas Mischke Assignee: Oleksandr Byelkin
Resolution: Unresolved Votes: 0
Labels: innodb
Environment:

Linux version 3.10.0-957.21.3.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Tue Jun 18 16:35:19 UTC 2019


Attachments: Text File innodbstatus_insert_select_blocks.txt    

 Description   

Hello,

we have a INSERT SELECT statement that started to block randomly about a week ago. Normally the statement executes in less than 5 seconds, but when it blocks, it does not return in more than 3 hours (longer we did not wait).

The statement blocks randomly, that means sometimes it runs fine and then a minute later it may block.

The statement looks as follows:

insert into findispyear(fyperi,fytyp,fywert,fyMenge,fyupdt,fylgrx,fymat,fywareh)
select f1peri as Peri,'22. + WE aus Vorperioden' as Art,round(sum(atbwmg*atbwwt),4) as WEWert,
sum(atbwmg) as fyMenge,
date_format(current_date, '%Y%m%d')as fyupdt,
f1lgrx as LgRx,f1mat as fymat,atzo01 as fywareh
from werh01
join werd01 on wezo01=wercid
join atad01 on weatid=atpkey
join w_findispmon01 on f1whse=atzo01 and f1zo05=atzo05
where weendperi=f1peri
and atbwdt <(f1peri*100)+1
group by art,Peri,f1lgrx,f1mat,atzo01

Some tests showed that it is most likely that the table atad01 is involved in this issue: The atad01 gets quite a lot of inserts. As long as these inserts are running (or even some inserts are pending when the statement above starts), then it is more likely for the statement to block. When we stop the inserts into atad01, then no blocking could be observed.

Therefore here the table definition of atad01 and an example of the insert statements we are using there:

CREATE TABLE `atad01` (
`ATBWID` INT(11) NULL DEFAULT NULL,
`ATZO01` INT(11) NULL DEFAULT NULL,
`ATZO02` INT(11) NULL DEFAULT NULL,
`ATZO03` INT(11) NULL DEFAULT NULL,
`ATZO04` INT(11) NULL DEFAULT NULL,
`ATZO05` INT(11) NULL DEFAULT NULL,
`ATZO06` INT(11) NULL DEFAULT NULL,
`ATZO07` INT(11) NULL DEFAULT NULL,
`ATZO08` INT(11) NULL DEFAULT NULL,
`ATZO09` INT(11) NULL DEFAULT NULL,
`ATZO10` INT(11) NULL DEFAULT NULL,
`ATBART` VARCHAR(10) NULL DEFAULT NULL,
`ATBLGN` VARCHAR(64) NULL DEFAULT NULL,
`ATBPOS` INT(11) NULL DEFAULT NULL,
`ATCHGN` VARCHAR(10) NULL DEFAULT NULL,
`ATKDLI` INT(11) NULL DEFAULT NULL,
`ATBNCH` VARCHAR(10) NULL DEFAULT NULL,
`ATPBVB` INT(11) NULL DEFAULT NULL,
`ATBWMG` INT(11) NULL DEFAULT NULL,
`ATBWWT` DECIMAL(13,4) NULL DEFAULT NULL,
`ATBWDT` INT(11) NULL DEFAULT NULL,
`ATBWZT` INT(11) NULL DEFAULT NULL,
`ATCMT1` VARCHAR(77) NULL DEFAULT NULL,
`ATUPDT` INT(11) NULL DEFAULT NULL,
`ATUPZT` INT(11) NULL DEFAULT NULL,
`ATUPUS` VARCHAR(50) NULL DEFAULT NULL,
`ATUPBS` VARCHAR(50) NULL DEFAULT NULL,
`ATPKEY` INT(11) NOT NULL AUTO_INCREMENT,
`ATSRCE` VARCHAR(10) NULL DEFAULT NULL,
`ATBUID` INT(11) NULL DEFAULT NULL,
`ATSTOCKEVALUATIONTYPE` VARCHAR(25) NULL DEFAULT NULL,
`ATIGNORELOCK` BIT(1) NULL DEFAULT NULL,
`ATPBVBLGR` INT(11) NULL DEFAULT NULL,
`ATCMTUSER` VARCHAR(1024) NULL DEFAULT NULL,
`ATCMTERROR` VARCHAR(1024) NULL DEFAULT NULL,
`ATWTVB` DECIMAL(13,4) NULL DEFAULT NULL,
`ATCONSHIST` VARCHAR(10) NULL DEFAULT NULL,
`ATZOTECHID` INT(11) NULL DEFAULT NULL,
PRIMARY KEY (`ATPKEY`),
INDEX `Art_ARTicle` (`ATBART`, `ATZO05`),
INDEX `iATBWID_atad01` (`ATBWID`),
INDEX `iDocid` (`ATBLGN`, `ATZO03`, `ATZO04`),
INDEX `iLens` (`ATZO03`, `ATZO04`),
INDEX `iWDBWID` (`ATZO08`),
INDEX `iAS_KEY` (`ATBLGN`, `ATBPOS`),
INDEX `izo06` (`ATZO06`),
INDEX `iLocation` (`ATZO01`, `ATZO02`),
INDEX `iArticle` (`ATZO05`),
INDEX `atbwdt` (`ATBWDT`),
INDEX `iATZO07_atad01` (`ATZO07`),
INDEX `iATZOTECHID_atad01` (`ATZOTECHID`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
ROW_FORMAT=COMPACT
AUTO_INCREMENT=17721126

INSERT INTO `atad01` (`ATBWID`, `ATZO01`, `ATZO02`, `ATZO03`, `ATZO04`, `ATZO05`, `ATZO06`, `ATZO07`, `ATZO08`, `ATZO09`, `ATZO10`, `ATBART`, `ATBLGN`, `ATBPOS`, `ATCHGN`, `ATKDLI`, `ATBNCH`, `ATPBVB`, `ATBWMG`, `ATBWWT`, `ATBWDT`, `ATBWZT`, `ATCMT1`, `ATUPDT`, `ATUPZT`, `ATUPUS`, `ATUPBS`, `ATSRCE`, `ATBUID`, `ATSTOCKEVALUATIONTYPE`, `ATIGNORELOCK`, `ATPBVBLGR`, `ATCMTUSER`, `ATCMTERROR`, `ATWTVB`, `ATCONSHIST`, `ATZOTECHID`) VALUES (9, 1, NULL, NULL, NULL, 46432, 24, NULL, NULL, 529, NULL, 'STD', 'PICK529', 1, NULL, 0, 'LENS', 12, -1, 0.8100, 20120619, 110337, NULL, 20120619, 110337, 'user', 'OV-VM', 'BULK', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);

At the moment we are still in the stage where we try to understand, what is blocking. So at the moment we would be glad for hints, what to check, when the statement hangs, to get more details about what is going on.

As so often: We were not able to reproduce this behavior on our test server, it happens on the live server only.

As mentioned above, the problem started last week. Before that, the statement never blocked before, and the table structures, database version, statements running on those tables did not change for many weeks. Just the size of atad01 is constantly growing (currently around 7GB and 16 million lines.

We double checked the free disk space for table space and temp files, both show many GB free space.

After stopping such a statement, it shows up in the slow query log. Here is an example of such an entry, where the statement took more than 300 seconds before it was stopped, but the lock time is close to 0.

  1. Time: 190910 11:30:38
  2. User@Host: OWStudio[OWStudio] @ [192.168.184.6]
  3. Thread_id: 3381816 Schema: optovision QC_hit: No
  4. Query_time: 308.844872 Lock_time: 0.000713 Rows_sent: 0 Rows_examined: 0
  5. Rows_affected: 0
    SET timestamp=1568107838;
  6. WE-PeriodenAbgrenzung
    insert into findispyear(fyperi,fytyp,fywert,fyMenge,fyupdt,fylgrx,fymat,fywareh)
    select f1peri as Peri,'22. + WE aus Vorperioden' as Art,round(sum(atbwmg*atbwwt),4) as WEWert,
    sum(atbwmg) as fyMenge,
    date_format(current_date, '%Y%m%d')as fyupdt,
    f1lgrx as LgRx,f1mat as fymat,atzo01 as fywareh
    from werh01
    join werd01 on wezo01=wercid
    join atad01 on weatid=atpkey
    join w_findispmon01 on f1whse=atzo01 and f1zo05=atzo05
    where weendperi=f1peri
    and atbwdt <(f1peri*100)+1
    group by art,Peri,f1lgrx,f1mat,atzo01

Any suggestions, how to continue?

Best regards,

Thomas Mischke



 Comments   
Comment by Thomas Mischke [ 2021-02-18 ]

In the meantime we switched to a newer version of MariaDB (10.3.27). The problem still exists, and we found that quite a lot of our "insert into ... select ..." statements from time to time run into this problem.

Therefore I would like to know, if there is anything we can do to help you to find the cause of this problem.

Right now we have the problem again, that is one statement hangs around (which normally takes just a second or so), while another statement waits for the first to finish.

Here are the details:

This statement currently blocks:

insert into w_findispmon02RS(fdzo05,fdwhse,fdkalk,fdkpos,FDECOD,FDSTXT,FDPERI,            
FDhrMG ,FDnrMG,FDhrwt ,FDnrwt,fdupdt )                                     
select reb.fdzo05,NULL,'99DIF','93','SUMDIF',
case when beb.FDhrMG is null then 'Diff.durch fehlenden EB ' else 'Differenz AB+Bew-EB' end,
reb.FDPeri, coalesce(beb.FDhrMG, 0)-(reb.FDhrMG) , 0 ,                              
coalesce(beb.FDhrWT, 0)-(reb.FDhrWT+bob.FDhrWT) , 0  , date_format(current_date, '%Y%m%d')                                               
from w_findispmon02RS reb                                                   
left join w_findispmon02RS beb on beb.fdzo05=reb.fdzo05 and beb.fdperi=reb.fdperi and beb.fdkalk='99DIF'    
                   and beb.fdecod='EBLAN'    
left join w_findispmon02RS bob on bob.fdzo05=reb.fdzo05 and bob.fdperi=reb.fdperi and bob.fdkalk='99DIF'    
          and bob.fdecod='SUMDIF' and bob.fdkpos='92'                                    
where reb.fdkalk='90EB' and reb.fdecod='SUMEB'

The client already closed the connection, but the statement is still there. It shows "Sending data" as current doing.

Half an hour later this statement was started:

truncate table w_findispmon02RS

It is currently hanging in "waiting for table metadata lock".

In case it can help you, I attach the output of "show engine innodb status".
innodbstatus_insert_select_blocks.txt

Best regards,

Thomas Mischke

Comment by Thomas Mischke [ 2021-02-22 ]

Hello again,

after some tests at the weekend I think I can add some more details here. It seems that some index statistics are not refreshed in time and that there a wrong execution plan is selected, which takes several hours instead of just a few seconds.

Here are the details: Each morning the table is truncated and filled with fresh data. This refreshing is done by several statements, all of them are doing some "insert into ... select ..." one after the other in one connection using multiple transactions. Nearly all of these statements are always fast. All these fast statements are reading data from other tables.

Now a statement comes that reads from the same table it writes to. And this statement sometimes uses the wrong execution plan. Then I get something like the following:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE reb ref FDKALK,FDECOD FDKALK 43 const 97254 Using index condition; Using where; Using temporary
1 SIMPLE beb ref fdzo05,FDKALK,FDECOD FDKALK 43 const 89296 Using where
1 SIMPLE bob ref fdzo05,FDKALK,FDECOD FDECOD 43 const 1 Using where

But if I ask for the execution plan 10 seconds later again, I get a better execution plan:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE reb ref FDKALK,FDECOD FDKALK 43 const 97254 Using index condition; Using where; Using temporary
1 SIMPLE beb ref fdzo05,FDKALK,FDECOD fdzo05 5 test.reb.FDZO05 1 Using where
1 SIMPLE bob ref fdzo05,FDKALK,FDECOD fdzo05 5 test.reb.FDZO05 1 Using where

What I did for now is adding "force index" to the statement, so that it always performs good. (I tried to insert a "select sleep(10)", but this was not transferred to our slave and so the slave came into the same situation).

So for me the problem is solved. All I would like to know: Is this a known behavior? Is this considered a bug (because the execution plan should be fine right no matter when the sql is to be executed), or a feature (refresh of index statistics in the background for more performance on the insert statements itself), or is there even a configuration variable we can use to set this behavior?

Best regards,

Thomas Mischke

Generated at Thu Feb 08 09:00:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.