Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12603

Insert command truncates ZIP CSV table

Details

    Description

      commands below show that after insert all rows in ZIP CSV table(CONNECT engine) disappear, zip file is also rewritten.

      --------------
      CREATE TABLE `c1` (
      	`n1` varchar(255)  NOT NULL,
       	`n2` varchar(255)  NOT NULL
      ) ENGINE=CONNECT  `table_type`=CSV `file_name`='/home/alice/t/d1.zip' sep_char=';' zipped=1
      --------------
      Query OK, 0 rows affected (0.06 sec)
       
      --------------
      select count(*) from c1
      --------------
      +----------+
      | count(*) |
      +----------+
      |      101 |
      +----------+
      1 row in set (0.00 sec)
       
      --------------
      insert into c1 select 1,1
      --------------
      Query OK, 1 row affected (0.00 sec)
      Records: 1  Duplicates: 0  Warnings: 0
       
      --------------
      select count(*) from c1
      --------------
      +----------+
      | count(*) |
      +----------+
      |        1 |
      +----------+
      1 row in set (0.00 sec)
      

      Attachments

        Activity

          alice Alice Sherepa added a comment -

          When table is created from zip file with mutiple files in it(option_list='Entry=.csv), all files in zip file are truncated with file with a name ".csv", if OPTION_LIST='MULENTRIES=1', then file inside zip file has name "-", so zip file is impossible to decompress it because of error in the name.

          CREATE TABLE `c1` (
          	`n1` varchar(255)  NOT NULL,
           	`n2` varchar(255)  NOT NULL
          ) ENGINE=CONNECT  `table_type`=CSV `file_name`='/home/alice/Downloads/Downloads.zip' zipped=1 sep_char="|" option_list='Entry=*.csv'
          --------------
          Query OK, 0 rows affected (0.05 sec)
           
          --------------
          insert into c1 values ('1','1')
          --------------
          Query OK, 1 row affected (0.00 sec)
           
           
          CREATE TABLE `c2` (
          	`n1` varchar(255)  NOT NULL,
           	`n2` varchar(255)  NOT NULL
          ) ENGINE=CONNECT  `table_type`=CSV `file_name`='/home/alice/Downloads/Downloads1.zip' zipped=1 sep_char="|" OPTION_LIST='MULENTRIES=1'
          --------------
          Query OK, 0 rows affected (0.02 sec)
           
          --------------
          insert into c2 values ('1','1')
          --------------
          Query OK, 1 row affected (0.00 sec)
          

          alice Alice Sherepa added a comment - When table is created from zip file with mutiple files in it(option_list='Entry= .csv), all files in zip file are truncated with file with a name " .csv", if OPTION_LIST='MULENTRIES=1', then file inside zip file has name "-", so zip file is impossible to decompress it because of error in the name. CREATE TABLE `c1` ( `n1` varchar (255) NOT NULL , `n2` varchar (255) NOT NULL ) ENGINE= CONNECT `table_type`=CSV `file_name`= '/home/alice/Downloads/Downloads.zip' zipped=1 sep_char= "|" option_list= 'Entry=*.csv' -------------- Query OK, 0 rows affected (0.05 sec)   -------------- insert into c1 values ( '1' , '1' ) -------------- Query OK, 1 row affected (0.00 sec)     CREATE TABLE `c2` ( `n1` varchar (255) NOT NULL , `n2` varchar (255) NOT NULL ) ENGINE= CONNECT `table_type`=CSV `file_name`= '/home/alice/Downloads/Downloads1.zip' zipped=1 sep_char= "|" OPTION_LIST= 'MULENTRIES=1' -------------- Query OK, 0 rows affected (0.02 sec)   -------------- insert into c2 values ( '1' , '1' ) -------------- Query OK, 1 row affected (0.00 sec)

          Please read the documentation at:
          https://mariadb.com/kb/en/mariadb/connect-zipped-file-tables/

          This is explained in the paragraph Creating new zipped tables

          bertrandop Olivier Bertrand added a comment - Please read the documentation at: https://mariadb.com/kb/en/mariadb/connect-zipped-file-tables/ This is explained in the paragraph Creating new zipped tables

          bertrandop:

          Hi Olivier,

          Regarding the initial issue (INSERT into an existing table overriding the contents), we know that it's the current design, but I'd like to ask you to reconsider before it's too late.

          As we are trying to achieve the state when CONNECT engine can be officially recommended for production, it is important that the engine provides some safety which can be normally expected from a stable engine.

          It is quite common that an engine has limitations upon certain types of DML/DDL, there is nothing wrong with it; but then it should actually refuse to perform the operation rather than silently damage the data. If you take ARCHIVE engine for example, it cannot do UPDATE, but then it just says so:

          MariaDB [test]> update ta set i = 2;
          ERROR 1031 (HY000): Storage engine ARCHIVE of the table `test`.`ta` doesn't have this option
          

          I believe CONNECT engine, when restrictions apply, should behave in a similar way.

          In this particular case, a more reasonable approach would be to allow to create the data file using the "Insert" method, but there is no functional value in allowing to replace the data the same way. If somebody wants to discard the existing data and create a new set, they can (and should) drop it explicitly. Replacing only opens the wide possibility for human errors.

          Please also note that while technically documentation does warn about data replacement, it does so not very clearly.
          First, it says

          This is a read-only implementation. No insert, update or delete.

          OK, for me it means that the user can be sure that no damage will be done – that's what "read-only" normally stands for.

          Then it says

          Creating new zipped tables
          Tables can be created to access already existing zip files. However, is it also possible to make the zip file from an existing file or table. Two ways are available to make the zip file:
          Insert method
          ...

          OK, so as long as I don't need to create new zip files, it does not concern me.
          But only there it finally says:

          when executing an insert statement the inserted records are not added but replace the existing ones

          Which contradicts the first statement and does not really have much to do with the section it is under.

          We cannot rely on users to read and memorize the fine print, and while technically the data loss would be their fault, they won't be happy.

          elenst Elena Stepanova added a comment - bertrandop : Hi Olivier, Regarding the initial issue ( INSERT into an existing table overriding the contents), we know that it's the current design, but I'd like to ask you to reconsider before it's too late. As we are trying to achieve the state when CONNECT engine can be officially recommended for production, it is important that the engine provides some safety which can be normally expected from a stable engine. It is quite common that an engine has limitations upon certain types of DML/DDL, there is nothing wrong with it; but then it should actually refuse to perform the operation rather than silently damage the data. If you take ARCHIVE engine for example, it cannot do UPDATE , but then it just says so: MariaDB [test]> update ta set i = 2; ERROR 1031 (HY000): Storage engine ARCHIVE of the table `test`.`ta` doesn't have this option I believe CONNECT engine, when restrictions apply, should behave in a similar way. In this particular case, a more reasonable approach would be to allow to create the data file using the "Insert" method, but there is no functional value in allowing to replace the data the same way. If somebody wants to discard the existing data and create a new set, they can (and should) drop it explicitly. Replacing only opens the wide possibility for human errors. Please also note that while technically documentation does warn about data replacement, it does so not very clearly. First, it says This is a read-only implementation. No insert, update or delete. OK, for me it means that the user can be sure that no damage will be done – that's what "read-only" normally stands for. Then it says Creating new zipped tables Tables can be created to access already existing zip files. However, is it also possible to make the zip file from an existing file or table. Two ways are available to make the zip file: Insert method ... OK, so as long as I don't need to create new zip files, it does not concern me. But only there it finally says: when executing an insert statement the inserted records are not added but replace the existing ones Which contradicts the first statement and does not really have much to do with the section it is under. We cannot rely on users to read and memorize the fine print, and while technically the data loss would be their fault, they won't be happy.

          I agree. I shall reconsider this.

          bertrandop Olivier Bertrand added a comment - I agree. I shall reconsider this.

          Done

          bertrandop Olivier Bertrand added a comment - Done

          People

            bertrandop Olivier Bertrand
            alice Alice Sherepa
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.