[MCOL-4771] Rand() Killing PrimProc Under Certain Circumstances
Created: 2021-06-22  Updated: 2021-12-10  Resolved: 2021-08-10

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 5.5.2
Fix Version/s: 6.2.1, 6.2.2

Type: Task
Priority: Major
Reporter: Todd Stoffel (Inactive)
Assignee: Daniel Lee (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by MCOL-4487 ORDER BY RAND() ERROR 1815 (HY000): I... Closed
Sprint: 2021-9, 2021-10

 Description   
  • It occurs on some tables but not others (flights, but not airlines).
  • Sometimes it fails on LIMIT 1, sometimes on LIMIT 2, sometimes on 10, sometimes on 10000.
  • Errors vary. On Docker deployments the cluster is left unusable and must be recycled; on-prem, systemd repairs it.
  • Verified in 5.5.2 and 5.6.1

To reproduce:

Starting with the flights sample data (bts):
https://github.com/mariadb-corporation/mariadb-columnstore-samples

MariaDB [bts]> create table test as select *, md5(rand(1)) from flights;
ERROR 1815 (HY000): Internal error: st: 0 TupleBPS::receiveMultiPrimitiveMessages() IDB-2035: An internal error occurred.  Check the error log file & contact support
 
MariaDB [gjd]> use bts
Database changed
MariaDB [bts]> select rand() from flights limit 1;
+---------------------+
| rand()              |
+---------------------+
| 0.06580742547829396 |
+---------------------+
1 row in set (0.118 sec)
 
MariaDB [bts]> select rand() from flights limit 2;
ERROR 1815 (HY000): Internal error: TupleBPS::run() caught DistributedEngineComm::write: Broken Pipe error
MariaDB [bts]>

MariaDB [bts]> create table test as select *, md5(rand(1)) from flights;
ERROR 1815 (HY000): Internal error: IDB-2045: At least one PrimProc closed the connection unexpectedly.



 Comments   
Comment by David Hall (Inactive) [ 2021-08-06 ]

The code inserted to handle multiple rand() calls was unneeded and not thread safe, which caused the crash. The code added to make certain function objects use fDynamicFunctor handles the multiple rand() problem.
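
For readers unfamiliar with this class of bug, here is a minimal, generic sketch of why a stateful function object such as rand() must not be shared between worker threads. It is not the ColumnStore patch and does not show how fDynamicFunctor works; `RandFunctor` and `runWorkers` are hypothetical names invented for this illustration. The assumption is only that each call advances internal generator state, so concurrent calls on one shared instance would be a data race, while giving every worker its own instance avoids it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for a stateful SQL function object such as rand().
// Not ColumnStore code: the class and its members are invented for illustration.
class RandFunctor
{
 public:
  explicit RandFunctor(std::uint32_t seed) : engine_(seed), dist_(0.0, 1.0) {}

  // Every call mutates engine_, so calling next() on ONE shared instance
  // from several threads without synchronization would be a data race.
  double next() { return dist_(engine_); }

 private:
  std::mt19937 engine_;
  std::uniform_real_distribution<double> dist_;
};

// Safe pattern: each worker thread builds its own functor, so no generator
// state is shared. Each thread also writes only to its own results slot.
std::vector<std::vector<double>> runWorkers(std::size_t workers,
                                            std::size_t rowsPerWorker,
                                            std::uint32_t seed)
{
  std::vector<std::vector<double>> results(workers);
  std::vector<std::thread> threads;
  threads.reserve(workers);

  for (std::size_t w = 0; w < workers; ++w)
  {
    threads.emplace_back([&results, w, rowsPerWorker, seed] {
      RandFunctor local(seed + static_cast<std::uint32_t>(w));  // private instance
      results[w].reserve(rowsPerWorker);
      for (std::size_t i = 0; i < rowsPerWorker; ++i)
        results[w].push_back(local.next());
    });
  }

  for (auto& t : threads)
    t.join();

  return results;
}
```

Calling `runWorkers(4, 1000, 42)` twice yields identical per-worker sequences, because no state crosses thread boundaries. Sharing a single `RandFunctor` across the threads instead would make the outcome depend on scheduling, which is consistent with the "same statement may fail at different times" behavior reported in this ticket.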

Comment by Daniel Lee (Inactive) [ 2021-08-10 ]

Build verified: 6.2.1 (#2947)

Reproduced the issue in 5.6.1 using a 1 GB DBT3 database. The failure is intermittent: the same statement may succeed on one run and fail on the next.

MariaDB [mytest]> select rand() from orders limit 1;
ERROR 1815 (HY000): Internal error: IDB-2045: At least one PrimProc closed the connection unexpectedly.
MariaDB [mytest]> create table test as select *, md5(rand(1)) from orders;
ERROR 1815 (HY000): Internal error: IDB-2045: At least one PrimProc closed the connection unexpectedly.
MariaDB [mytest]> select rand() from orders limit 2;
+---------------------+
| rand()              |
+---------------------+
| 0.13503894408702752 |
| 0.11773851617960121 |
+---------------------+
2 rows in set (0.128 sec)
MariaDB [mytest]> select rand() from orders limit 2;
ERROR 1815 (HY000): Internal error: IDB-2045: At least one PrimProc closed the connection unexpectedly.

Verified the fix in 6.2.1. The issue is no longer occurring.

Generated at Thu Feb 08 02:52:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.