[MCOL-5510] columnstore_info.load_from_s3 | Connection forever hangs if cpimport fails Created: 2023-06-07  Updated: 2023-06-22  Resolved: 2023-06-22

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 23.02.4

Type: Bug Priority: Major
Reporter: Allen Herrera Assignee: Leonid Fedorov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-5013 Support Load data from AWS S3 : UDF... Closed
Assigned for Testing: Daniel Lee (Inactive)

 Description   

Reproduction

 
CREATE DATABASE db1;
 
CREATE TABLE db1.tab1 (
   id INT,
   color varchar(50),
   quantity INT
) ENGINE=ColumnStore;
 
# See developer comments for key/secret
SET columnstore_s3_region='us-west-2';
SET columnstore_s3_key='x';
SET columnstore_s3_secret='x';
CALL columnstore_info.load_from_s3('s3://allens-spark-test-bucket/','wrong_junk_data.csv','db1', 'tab1', ',', '', '');

Notice the following in the Docker container when tailing /var/log/messages:

Jun  7 22:43:48 mcs1 writeengine[2098]: 48.894326 |0|0|0| I 19 CAL0075: ClearTableLock: Rollback dbfile    for table db1.tab1 (OID-3003), column 3008. HWM compressed column file: dbRoot-1; part#-0; seg#-0; rawFreeBlks-30 (abbrev); restoredChunk-24576 bytes; truncated to 32768 bytes
Jun  7 22:43:48 mcs1 writeengine[2098]: 48.899023 |0|0|0| I 19 CAL0085: ClearTableLock: Ending bulk rollback for table db1.tab1 (OID-3003); lock-19; initiated by cpimport.bin.
Jun  7 22:43:48 mcs1 cpimport.bin[2098]: 48.901176 |0|0|0| I 34 CAL0082: End BulkLoad: JobId-3003; status-FAILED

This is especially bad for SkySQL because users can't see these logs, and their connection simply hangs forever until a connection timeout is hit.
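Until the fix is available, a session-level statement timeout can at least keep the client from hanging indefinitely. This is a hedged workaround sketch, not part of the fix: max_statement_time is a standard MariaDB session variable, but whether it interrupts this particular hang in 23.02.3 was not verified in this ticket.

```sql
-- Hypothetical mitigation (not verified against this bug): abort the CALL
-- if it runs longer than 60 seconds instead of hanging until the
-- connection timeout.
SET SESSION max_statement_time=60;
CALL columnstore_info.load_from_s3('s3://allens-spark-test-bucket/','wrong_junk_data.csv','db1', 'tab1', ',', '', '');
```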



 Comments   
Comment by Leonid Fedorov [ 2023-06-19 ]

It will absolutely be merged into develop-23-02 once QA confirms the fix is working. The current CMAPI state is the same in both branches, so there is no need for extra testing on the stable branch if this works on develop.

Comment by Daniel Lee (Inactive) [ 2023-06-19 ]

Build verified: develop branch

engine: d2d0e08690b280019f6afd112a5183dd8a140595
server: b704fd8f074710ec574ef3c40a6ba4ad72ba03e7
buildNo: 8021

Reproduced the hanging issue in 23.02.3.

Verified the fix in the build mentioned above.

MariaDB [mytest]> CALL columnstore_info.load_from_s3('s3://allens-spark-test-bucket/','wrong_junk_data.csv','db1', 'tab1', ',', '', '');
+----------------------------------------------------------------------------------------------------------------------------------------------+
| columnstore_dataload(bucket, filename, dbname, table_name, terminated_by, enclosed_by, escaped_by)                                           |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| {"error": "2023-06-19 19:12:27 (13398) ERR : Actual error row count(11) exceeds the max error rows(10) allowed for table db1.tab1 [1451]\n"} |
+----------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (2.207 sec)

NOTES

When trying to reproduce the issue in 23.02.3, I also ran some tests with the DBT3 orders and lineitem datasets using the "|" field terminator; those did not exhibit the reported issue.

Comment by Daniel Lee (Inactive) [ 2023-06-22 ]

engine: 5342234f1500b307a6a7d0e5996bb7cf1a9900d4
server: 5a40ae2789db79b718cee3f53a411a9bd6f8309d
buildNo: 8030

MariaDB [(none)]> CREATE TABLE db1.tab1 (
-> id INT,
-> color varchar(50),
-> quantity INT
-> ) ENGINE=ColumnStore;
Query OK, 0 rows affected (0.143 sec)

MariaDB [(none)]> SET columnstore_s3_region='us-west-2';
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> SET columnstore_s3_key='x';
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> SET columnstore_s3_secret='x';
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> CALL columnstore_info.load_from_s3('s3://allens-spark-test-bucket/','wrong_junk_data.csv','db1', 'tab1', ',', '"', '');
+----------------------------------------------------------------------------------------------------------------------------------------------+
| columnstore_dataload(bucket, filename, dbname, table_name, terminated_by, enclosed_by, escaped_by)                                           |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| {"error": "2023-06-22 13:47:25 (13149) ERR : Actual error row count(11) exceeds the max error rows(10) allowed for table db1.tab1 [1451]\n"} |
+----------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (2.207 sec)

Query OK, 0 rows affected (2.207 sec)

Generated at Thu Feb 08 02:58:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.