[MDEV-6232] incorrect row estimates in Connect
Created: 2014-05-12  Updated: 2014-05-22  Resolved: 2014-05-22

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.11
Fix Version/s: 10.0.12

Type: Bug
Priority: Major
Reporter: Sergei Golubchik
Assignee: Olivier Bertrand
Resolution: Won't Fix
Votes: 0
Labels: connect-engine


 Description   

Copy the following test case into a file under storage/connect/mysql-test/connect/t:

CREATE TABLE t1 (c INT PRIMARY KEY) ENGINE=CONNECT TABLE_TYPE=CSV;
INSERT INTO t1 VALUES (1),(2);
--query_vertical EXPLAIN SELECT AVG(c) FROM t1
drop table t1;

Observe that the result contains

rows    4

even though the table has only two rows. This becomes important when the number of rows in the table grows large.



 Comments   
Comment by Olivier Bertrand [ 2014-05-12 ]

I don't understand this result. A functional query such as the one above always returns 1 row.

Now the issue is about the "records" info item. When asked for the number of rows in a table, CONNECT can give an exact number for fixed-record-length tables by dividing the file size by the record length (lrecl). However, for variable-record-size tables such as DOS, CSV or FMT, CONNECT returns an estimate of the maximum number of rows, obtained by dividing the file size by the minimum record size. This is done to provide the information quickly, without reading the whole file.
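
The arithmetic described above can be sketched as follows. This is only an illustration, not CONNECT's actual code: the function names are hypothetical, and the 1-byte minimum record size and the LF-terminated file contents are assumptions chosen for the example. Under those assumptions the estimate for the two-row test table happens to be 4, which matches the reported output, but the source does not confirm that this is how the value was produced.

```python
def exact_rows_fixed(file_size: int, lrecl: int) -> int:
    """Exact row count for fixed-length records: file size / record length."""
    if lrecl <= 0:
        raise ValueError("lrecl must be positive")
    return file_size // lrecl


def max_rows_estimate(file_size: int, min_record_size: int) -> int:
    """Upper bound on the row count for variable-length records.

    No row can be shorter than min_record_size, so the file cannot
    hold more than file_size / min_record_size rows.
    """
    if min_record_size <= 0:
        raise ValueError("min_record_size must be positive")
    return file_size // min_record_size


# Assumed on-disk contents of the two-row CSV table (LF line endings).
data = b"1\n2\n"

# Assumed minimum record size of 1 byte (hypothetical value).
estimate = max_rows_estimate(len(data), 1)

# The exact count requires reading the whole file.
actual = data.count(b"\n")

print(estimate, actual)
```

The estimate never undercounts, but it can overshoot by a large factor when typical rows are much longer than the minimum record size.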

This can be reconsidered if some applications need an exact number. The discussion is open.

Comment by Sergei Golubchik [ 2014-05-22 ]

For CSV tables with variable row length, the Connect engine has no a priori way of knowing the exact number of rows in the table.
