[MCOL-5509] columnstore_info.load_from_s3 | Misleading Messages Created: 2023-06-07  Updated: 2023-06-22  Resolved: 2023-06-22

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 23.02.4

Type: Bug Priority: Major
Reporter: Allen Herrera Assignee: Leonid Fedorov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-5013 Support Load data from AWS S3 : UDF... Closed
Assigned for Testing: Daniel Lee (Inactive)

 Description   

Reproduction

CREATE DATABASE db1;
 
CREATE TABLE db1.tab1 (
   id INT,
   color varchar(50),
   quantity INT
) ENGINE=ColumnStore;
 
# See developer comments for key/secret
SET columnstore_s3_region='us-west-2';
SET columnstore_s3_key='x';
SET columnstore_s3_secret='x';
CALL columnstore_info.load_from_s3('s3://allens-spark-test-bucket/','junk_data.csv','db1', 'tab1', ',', '"', '');
 
select count(*) from  db1.tab1;

Notice that the message reports "success": false and "inserted": 0:

+----------------------------------------------------------------------------------------------------+
| columnstore_dataload(bucket, filename, dbname, table_name, terminated_by, enclosed_by, escaped_by) |
+----------------------------------------------------------------------------------------------------+
| {"success": false, "inserted": 0, "processed": 0}                                                  |
+----------------------------------------------------------------------------------------------------+
1 row in set (3.100 sec)
 
Query OK, 0 rows affected (3.100 sec)

In reality, however, the records were created.



 Comments   
Comment by Leonid Fedorov [ 2023-06-19 ]

It will definitely be merged into develop-23-02 once QA confirms that the fix works. The current CMAPI state is the same in both branches, so there is no need for extra testing on the stable branch if this works on develop.

Comment by Daniel Lee (Inactive) [ 2023-06-19 ]

Build verified: develop branch

engine: d2d0e08690b280019f6afd112a5183dd8a140595
server: b704fd8f074710ec574ef3c40a6ba4ad72ba03e7
buildNo: 8021

Reproduced the hanging issue in 23.02.3.

Verified the fix in the build mentioned above.

MariaDB [mytest]> CALL columnstore_info.load_from_s3('s3://allens-spark-test-bucket/','junk_data.csv','db1', 'tab1', ',', '', '');
+----------------------------------------------------------------------------------------------------+
| columnstore_dataload(bucket, filename, dbname, table_name, terminated_by, enclosed_by, escaped_by) |
+----------------------------------------------------------------------------------------------------+
| {"success": true, "inserted": "5", "processed": "6"}                                               |
+----------------------------------------------------------------------------------------------------+
1 row in set (2.206 sec)

Query OK, 0 rows affected (2.209 sec)

NOTES

While trying to reproduce the issue in 23.02.3, I also ran some tests with the DBT3 orders and lineitem datasets using the "|" field terminator. Those loads did not show the reported issue.

Comment by Daniel Lee (Inactive) [ 2023-06-22 ]

Build verified: develop-23.02 branch

engine: 5342234f1500b307a6a7d0e5996bb7cf1a9900d4
server: 5a40ae2789db79b718cee3f53a411a9bd6f8309d
buildNo: 8030

MariaDB [(none)]>
MariaDB [(none)]> CREATE TABLE db1.tab1 (
-> id INT,
-> color varchar(50),
-> quantity INT
-> ) ENGINE=ColumnStore;
Query OK, 0 rows affected (0.143 sec)

MariaDB [(none)]> SET columnstore_s3_region='us-west-2';
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> SET columnstore_s3_key='x';
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> SET columnstore_s3_secret='x';
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> CALL columnstore_info.load_from_s3('s3://allens-spark-test-bucket/','junk_data.csv','db1', 'tab1', ',', '"', '');
+----------------------------------------------------------------------------------------------------+
| columnstore_dataload(bucket, filename, dbname, table_name, terminated_by, enclosed_by, escaped_by) |
+----------------------------------------------------------------------------------------------------+
| {"success": true, "inserted": "5", "processed": "6"}                                               |
+----------------------------------------------------------------------------------------------------+
1 row in set (2.234 sec)

Query OK, 0 rows affected (2.234 sec)

MariaDB [(none)]> select count(*) from db1.tab1;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.050 sec)

Comment by Leonid Fedorov [ 2023-06-22 ]

That is how cpimport reports it; you can check this manually. The file contains an extra empty line, which is skipped.
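
For readers comparing the two status messages in this ticket, the following is a minimal Python sketch of how the returned JSON string could be interpreted. It is not part of ColumnStore or CMAPI; the helper name `summarize_load_status` and the `skipped` field are hypothetical, and the sketch assumes that any gap between "processed" and "inserted" comes from skipped lines such as the empty line mentioned above.

```python
import json


def summarize_load_status(raw: str) -> dict:
    """Parse a load_from_s3 status string (hypothetical helper, for illustration only)."""
    status = json.loads(raw)
    # The 23.02.3 output in this ticket shows integer counts (0), while the
    # verified builds show string counts ("5"); int() accepts both forms.
    inserted = int(status["inserted"])
    processed = int(status["processed"])
    return {
        "success": bool(status["success"]),
        "inserted": inserted,
        "processed": processed,
        # Assumption: lines that were processed but not inserted were skipped.
        "skipped": processed - inserted,
    }


# Status strings copied from the transcripts in this ticket.
before_fix = summarize_load_status('{"success": false, "inserted": 0, "processed": 0}')
after_fix = summarize_load_status('{"success": true, "inserted": "5", "processed": "6"}')
print(before_fix)
print(after_fix)
```

With the verified output, the sketch reports one skipped line, which matches the five rows returned by the count query against db1.tab1.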
