[MDEV-17401] LOAD DATA from very big file into MyISAM table results in EOF error and corrupt index Created: 2018-10-09  Updated: 2020-08-25  Resolved: 2019-01-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - MyISAM
Affects Version/s: 5.5, 10.0, 10.1, 10.3.10, 10.3.12, 10.2
Fix Version/s: 10.4.2, 10.2.19, 10.1.38, 5.5.63, 10.0.38, 10.3.13

Type: Bug Priority: Critical
Reporter: Geoff Montee (Inactive) Assignee: Oleksandr Byelkin
Resolution: Fixed Votes: 0
Labels: None

Attachments: File MDEV-17401.diff    
Issue Links:
Relates
relates to MDEV-16461 MyISAM creates defect indexes on varc... Confirmed

 Description   

A LOAD DATA statement is consistently failing with the following error:

MariaDB [(none)]> LOAD DATA  LOCAL INFILE '/home/ec2-user/tab.txt' INTO TABLE db1.`tab` IGNORE 0 LINES;
ERROR 9 (HY000): Unexpected end-of-file found when reading file '/tmp/STWraeLT' (Errcode: 175 "File too short; Expected more data in file")

The resulting table is also corrupt, and cannot seem to be repaired:

MariaDB [db1]> CHECK TABLE tab;
+-------------------+-------+----------+------------------------------------------------------------------------+
| Table             | Op    | Msg_type | Msg_text                                                               |
+-------------------+-------+----------+------------------------------------------------------------------------+
| db1.tab | check | error    | Can't read indexpage from filepos: -1                                  |
| db1.tab | check | Error    | Index for table './db1/tab.MYI' is corrupt; try to repair it |
| db1.tab | check | error    | Corrupt                                                                |
+-------------------+-------+----------+------------------------------------------------------------------------+
3 rows in set (1 min 51.569 sec)
 
MariaDB [db1]> REPAIR TABLE tab;
+-------------------+--------+----------+----------------------------------------------------------------------------------------------------------------------------+
| Table             | Op     | Msg_type | Msg_text                                                                                                                   |
+-------------------+--------+----------+----------------------------------------------------------------------------------------------------------------------------+
| db1.tab | repair | Error    | Unexpected end-of-file found when reading file '/tmp/STjk9WiV' (Errcode: 175 "File too short; Expected more data in file") |
| db1.tab | repair | Error    | Unexpected end-of-file found when reading file '/tmp/STpQXG9A' (Errcode: 175 "File too short; Expected more data in file") |
| db1.tab | repair | status   | OK                                                                                                                         |
+-------------------+--------+----------+----------------------------------------------------------------------------------------------------------------------------+
3 rows in set (36 min 31.957 sec)
 
MariaDB [db1]> CHECK TABLE tab;
+-------------------+-------+----------+-------------------------------------------------------------+
| Table             | Op    | Msg_type | Msg_text                                                    |
+-------------------+-------+----------+-------------------------------------------------------------+
| db1.tab | check | error    | Found key at page -1 that points to record outside datafile |
| db1.tab | check | error    | Corrupt                                                     |
+-------------------+-------+----------+-------------------------------------------------------------+
2 rows in set (15.393 sec)

The table has the following definition:

MariaDB [(none)]> SHOW CREATE TABLE db1.tab\G
*************************** 1. row ***************************
       Table: tab
Create Table: CREATE TABLE `tab` (
  `tab_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col2` int(10) unsigned NOT NULL,
  `col3` int(10) unsigned NOT NULL,
  `col4` int(10) unsigned NOT NULL,
  `col5` int(10) unsigned NOT NULL,
  `col6` tinyint(1) NOT NULL,
  `col7` double DEFAULT NULL,
  `col8` varchar(18) DEFAULT NULL,
  PRIMARY KEY (`tab_id`),
  UNIQUE KEY `unique_idx` (`col2`,`col3`,`col4`,`col6`),
  UNIQUE KEY `col8_idx` (`col8`),
  KEY `col3_idx` (`col3`,`col4`),
  KEY `col2_idx` (`col2`)
) ENGINE=MyISAM AUTO_INCREMENT=205163534 DEFAULT CHARSET=latin1
1 row in set (0.005 sec)

The file being loaded into the table is 14 GB large, and it has 205163533 rows in it.

The only non-default values in the configuration file are:

log_error=mysqld.err
myisam_sort_buffer_size = 4G
bulk_insert_buffer_size = 2G



 Comments   
Comment by Elena Stepanova [ 2018-10-09 ]

I could reproduce it, and it doesn't seem to be related to a lack of space in the tmpdir (I didn't monitor mine, but it had over 1T to begin with, so it's unlikely it temporarily ran out of space).
Reproducing is simple, but very time-consuming, so debugging might be difficult. I'll try to see if I can reduce it somehow, but experimenting itself also takes a lot of time.

Comment by Oleksandr Byelkin [ 2018-10-09 ]

Repeatable on non tmp dir (so it is not lack of space) and repatable only with :

--mysqld="--log-error=mysqld.err" --mysqld="--myisam-sort-buffer-size=4G" --mysqld="--bulk-insert-buffer-size=2G"

Comment by Elena Stepanova [ 2018-10-09 ]

Right, I also tried to run with

myisam_sort_buffer_size=1G
bulk_insert_buffer_size=1G

and it appears it helps.

Both values are dynamic, but I can't say yet with certainty whether changing the values at runtime helps; I tried to start the server with them, and at least LOAD DATA went without errors with these settings (CHECK TABLE is still running).

Comment by Oleksandr Byelkin [ 2018-10-10 ]

It fails with only -mysqld="-myisam-sort-buffer-size=4G"
(probably 32/64 bit problem)

Comment by Oleksandr Byelkin [ 2018-10-10 ]

--bulk-insert-buffer-size=2G is irrelevant to the fail, so the sort buffer is cause of the problem

Comment by Oleksandr Byelkin [ 2018-10-11 ]

problem is that linux can not read whole requested chunk of the file (see man 2 read)

Comment by Oleksandr Byelkin [ 2018-10-12 ]

revision-id: cc5c19ae5233ba90de086de76043774ae6c78cd7 (mariadb-5.5.61-30-gcc5c19ae523)
parent(s): acf8fc1ff8a7b2d49e25279670b04b8eb096ce0c
author: Oleksandr Byelkin
committer: Oleksandr Byelkin
timestamp: 2018-10-12 09:07:05 +0200
message:

MDEV-17401: LOAD DATA from very big file into MyISAM table results in EOF error and corrupt index

my_read fixed as in higher versions.
my_pread made as my_read aware of partial read of huge chunks of files
MY_FULL_IO enabled for file operations

Comment by Oleksandr Byelkin [ 2018-10-12 ]

I also added here patch for 10.3 (can be helpful for merges from 5.5 up)

Comment by Oleksandr Byelkin [ 2018-10-30 ]

Index checked in the new version:

MariaDB [test]> explain select sum(col2) from tab;
+------+-------------+-------+-------+---------------+----------+---------+------+-----------+-------------+
| id   | select_type | table | type  | possible_keys | key      | key_len | ref  | rows      | Extra       |
+------+-------------+-------+-------+---------------+----------+---------+------+-----------+-------------+
|    1 | SIMPLE      | tab   | index | NULL          | col2_idx | 4       | NULL | 205163533 | Using index |
+------+-------------+-------+-------+---------------+----------+---------+------+-----------+-------------+
1 row in set (0.00 sec)
 
MariaDB [test]> select sum(col2) from tab;
+-------------+
| sum(col2)   |
+-------------+
| 64923196368 |
+-------------+
1 row in set (4 min 34.69 sec)
 
sanja@SanjaLaptop:~/Downloads$ cat  motif_feature.txt|perl count.perl 
64923196368

Comment by Oleksandr Byelkin [ 2018-10-30 ]

revision-id: f95616dc4774328381044b53263d6a23ae19bfb4 (mariadb-5.5.61-39-gf95616dc477)
parent(s): 893ebb739e24637985892da1555f2e6f1a8ac5b6
author: Oleksandr Byelkin
committer: Oleksandr Byelkin
timestamp: 2018-10-30 19:36:17 +0100
message:

MDEV-17401: LOAD DATA from very big file into MyISAM table results in EOF error and corrupt index

my_read fixed as in higher versions.
my_pread made as my_read aware of partial read of huge chunks of files
MY_FULL_IO enabled for file operations

Comment by Richard Stracke [ 2019-01-18 ]

Issue reoccured in 10.3.12

Comment by Oleksandr Byelkin [ 2019-01-23 ]

OK, the problem in 10.3 is that the flag MY_FULL_IO is not set in merge_buffers()

Generated at Thu Feb 08 08:36:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.