[MDEV-11291] Unable to create an index on a connect table (CSV type) which is bigger than 300MB Created: 2016-11-16  Updated: 2017-10-17

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.1.18
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Jonny Zhu Assignee: Olivier Bertrand
Resolution: Unresolved Votes: 0
Labels: connect-engine, index, size


 Description   

Hello

I am trying to load a big CSV file (bigger than 1GB) into a connect table and create an index for it.

I fould if the CSV file is smaller than 200MB, erverything is ok.

If the CSV file is bigger than 300MB, it will be created successfully and could be displayed by "show index from <tablename>" statement. However the dnx file will cerntainly not be created. Then if I issue "optimize table <tablename>" statement, it will also return a OK message in 0.00 second, but nothing changed and nothing created on disk.

Another thing I found is that, if you create a connect table with a primary key for a csv file which has not been created on disk, index will also not created on disk even after I insert records into this table.

Could you please take a look at this issue?

Thank you very much.

Jonny Zhu
Nov.16th, 2016



 Comments   
Comment by Jonny Zhu [ 2016-11-16 ]

The steps to reproduce the issue:

1. Prepare the csv file by audit log for test data

[mysql@zhuvm1 test]$ cp /DMZENT01/datadg/db/server_audit.log /home/mysql/test/
[mysql@zhuvm1 test]$ ll
total 232
-rwxrwxrwx 1 root  root     213 Nov 11 01:13 mkbigcsv.sh
-rwxrwxrwx 1 mysql mysql    172 Nov 11 06:21 parse.pl
-rw-rw---- 1 mysql mysql 225774 Nov 16 09:12 server_audit.log

2. Repeat the csv file for 1500 times to get a large csv file (>300MB)

[mysql@zhuvm1 test]$ cat mkbigcsv.sh
#repeate a text file to make a large one
#sample:
#mkbigcsv <souce name> <how many times> <target name>
 
 
 
#!/bin/bash
 
echo $1
echo $2
echo $3
 
i=0
while [ $i -lt $2 ];
do
        cat $1 >> $3
        let i+=1
done
 
 
 
[mysql@zhuvm1 test]$ ./mkbigcsv.sh server_audit.log 1500 test.log
server_audit.log
1500
test.log
[mysql@zhuvm1 test]$ ll
total 524264
-rwxrwxrwx 1 root  root        204 Nov 16 09:17 mkbigcsv.sh
-rwxrwxrwx 1 mysql mysql       172 Nov 11 06:21 parse.pl
-rw-rw---- 1 mysql mysql    225774 Nov 16 09:12 server_audit.log
-rw-rw-r-- 1 mysql mysql 338661000 Nov 16 09:17 test.log

3. Create a connect table linked to the big csv file. The `timestamp` colume is the key.

 CREATE TABLE `test` (									
  `timestamp` varchar(30) NOT NULL,									
  `serverhost` varchar(255) DEFAULT NULL,									
  `username` varchar(255) DEFAULT NULL,									
  `host` varchar(255) DEFAULT NULL,									
  `connectionid` varchar(255) DEFAULT NULL,									
  `queryid` varchar(255) DEFAULT NULL,									
  `operation` varchar(255) DEFAULT NULL,									
  `dbname` varchar(255) DEFAULT NULL,									
  `object` varchar(10000) DEFAULT NULL,									
  `retcode` varchar(255) DEFAULT NULL,									
   KEY `idx` (`timestamp`)									
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `table_type`=CSV `file_name`='/home/mysql/test/test.log' `sep_char`=',';	
 
 
Query OK, 0 rows affected (0.00 sec)
 
 
 
 
09:22:21 (root@localhost) [sampledb]> show index from test;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test  |          1 | idx |            1 | timestamp   | A         |        NULL |     NULL | NULL   |      | XINDEX     |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)
 
 
 
 
09:25:29 (root@localhost) [sampledb]> optimize table test;
+---------------+----------+----------+----------+
| Table         | Op       | Msg_type | Msg_text |
+---------------+----------+----------+----------+
| sampledb.test | optimize | status   | OK       |
+---------------+----------+----------+----------+
1 row in set (0.00 sec)

4. Although you can see the index by "show index" statement, but it is not created on disk actually.

09:20:39 (root@localhost) [sampledb]> select count(*) from test;
ERROR 1296 (HY000): Got error 122 'Open error 2 in mode rb 
on /home/mysql/test/test.dnx: No such file or directory' from CONNECT
 
 
[mysql@zhuvm1 test]$ ll /home/mysql/test/
total 330956
-rwxrwxrwx 1 root  root        204 Nov 16 09:17 mkbigcsv.sh
-rwxrwxrwx 1 mysql mysql       172 Nov 11 06:21 parse.pl
-rw-rw---- 1 mysql mysql    225774 Nov 16 09:12 server_audit.log
-rw-rw-r-- 1 mysql mysql 338661000 Nov 16 09:17 test.log
[mysql@zhuvm1 test]$
 

Comment by Olivier Bertrand [ 2016-11-17 ]

I made another big CSV file from other means. When it was 278M large, an index could be made without problem also it is a fairly long process. But when it was increased to more than 500M, making the index failed. This was due to query time out.

What happens then depends on how the client program handles query timeout and also whether it uses transaction. If query timeout is not tested as an error, the definition of the index may be commited but no file being written.

Concerning your other problem, I cannot reproduce it:

CREATE TABLE xcsv (
 n int(11) key NOT NULL,
 msg char(32) NOT NULL
) ENGINE=CONNECT TABLE_TYPE=CSV FILE_NAME='x.csv';

At this point x.csv does not exist.

INSERT INTO xcsv VALUES(1,'One'),(2,'Two'),(3,'Three');

The index is made and the files x.csv and x.dnx both exist.

Comment by Olivier Bertrand [ 2016-11-17 ]

I made the index successfully on the big table (565M) using the release version of the MariaDB server (previously I was in debug mode). It took 1932 seconds to make it. I had a query timeout set to one hour.
So it seems that this is the cause of index making failure.

Comment by Jonny Zhu [ 2016-11-18 ]

Thank you for the reply. But as you can see in the commands and return messages I attached, it was "Query OK", not timeout error that I had.

Could you please let me know which version of MariaDB did you tried ? (I am now using 10.1.18 )

Thank you

Comment by Olivier Bertrand [ 2016-11-18 ]

Same one, 10.1.18 on Windows 7 32 bits.
However, in my example I constructed the index after creating the table. Now, to follow your example, I did it the same way:

MariaDB [california]> CREATE TABLE `bigxcsv` (
    ->   `Recordnb` int(8) NOT NULL COMMENT 'Patient Discharge Record Number',
    ->   `Diagnb` smallint(2) NOT NULL COMMENT 'Diagnosis number',
    ->   `RLN` varchar(9) NOT NULL COMMENT 'linkage number SSN',
    ->   `Diag` int(5) NOT NULL COMMENT 'diagnosis',
    ->   `DiagPresent` char(1) NOT NULL COMMENT 'diagnosis present at admission',
    ->   INDEX XDG (diag)
    -> ) ENGINE=CONNECT `table_type`=CSV `file_name`='E:/Californie/diag.csv' `header`=1;
Query OK, 0 rows affected (36 min 38.55 sec)

The big difference is that in your example the CREATE TABLE lasted 0 sec! No wonder the index file was not generated.

Now why was it not generated? Currently I can't tell not being able to reproduce this.

BTW, on which machine and operating system are you running?

Note: I used an index, not a primary key, because of duplicate values.

Comment by Jonny Zhu [ 2016-11-21 ]

0 second means, I think, the index was not created. If some errors happen, I expect some error messages returned by DB server but got nothing.

I have this problem on a CentOS7 x64 server. MariaDB was installed by "yum install" command. May I ask you to try the same again on a CentOS system?

Thank you

Jonny Zhu

Comment by Olivier Bertrand [ 2016-11-22 ]

Sorry, I have no such system available. Just Windows and Ubuntu.

Comment by Jonny Zhu [ 2016-11-24 ]

I understand that you are unable to reproduce the issue on windows platform. But as you can see the log I attached, this issue is happening at a 100% possiblilty on CentOS7 platform.

Coud you please let me know what I can do with it? Any suggestion would be good reference.

Jonny

Comment by Jonny Zhu [ 2016-12-07 ]

Has anybody of you a centos enviorment? I guess your team are also supporting MariaDB on CentOS platform, besides windows and ubuntu. And... I just tried, that it tooks no more than 30 min to install a new CenOS system with an empty MariaDB in a VMware / Hyper-V virtual machine.

Comment by Alice Sherepa [ 2017-04-21 ]

[Note] may be related to https://jira.mariadb.org/browse/MDEV-12551

Generated at Thu Feb 08 07:48:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.