[MDEV-6024] CONNECT PIVOT should use SELECT DISTINCT instead of SELECT * Created: 2014-04-04 Updated: 2014-04-08 Resolved: 2014-04-08 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | 10.0.11 |
| Type: | Task | Priority: | Minor |
| Reporter: | Kolbe Kegel (Inactive) | Assignee: | Olivier Bertrand |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | connect-engine | ||
| Description |
|
When creating a CONNECT PIVOT table, the engine executes SELECT * on the target table. This seems really inefficient to me. In an ideal situation, there will be some usable index on the pivot column, so SELECT DISTINCT pivotCol will be much faster. To find the names of columns, a separate query to find table structure could be executed. I can't think of why the engine would need to do SELECT * when creating the table. Is there some explanation I'm missing? |
| Comments |
| Comment by Olivier Bertrand [ 2014-04-07 ] | ||||||||||||||||||||||||||||||||||||
|
Well, the reason is that this table reading may have two purposes: 1 - Find the default pivot and/or fact columns when they have not been specified in the create table statement. It is true that this is not optimized when pivoting very big tables. As a matter of facts, all tests made so far where made on relatively small tables to have results of reasonable size. However, you're right, this should be optimized in case of big source tables. I shall study what can be done, for instance: a) Not doing the 1 above if the pivot and fact columns are specified. Thanks for your suggestion. | ||||||||||||||||||||||||||||||||||||
| Comment by Kolbe Kegel (Inactive) [ 2014-04-07 ] | ||||||||||||||||||||||||||||||||||||
|
Olivier, it would also be possible to use DESC or a query against information_schema.tables to get the list of columns and their qualities (data types, nullability, et cetera). That could help you reject a column from automatic pivot selection if it is a BLOB or if it is nullable or something, if you ever in the future want to use some logic like that. | ||||||||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2014-04-08 ] | ||||||||||||||||||||||||||||||||||||
|
Kolbe, many things can be done. However, don't forget that a Pivot table can be made on a Srcdef query that is not in any information schema table. Meanwhile, I have implemented and tested what I described in my last comment. Here is what can be told about it: a) Nothing can be done about pivot tables based on a Srcdef query. The current way to process them must be kept. The documentation should warn users that this is to avoid when referring to big tables. For instance I made tests on a 10 million rows table, diag described as:
Because the Recordnb and DiagPresent columns should not be in the pivot table I created a view: create view vdiag as select RLN, diagnb, diag from diag; And created the pivot table by: create table diag_pivot engine=connect table_type=pivot tabname=vdiag option_list='PivotCol=diagnb,FncCol=diag'; The CREATE TABLE was very fast (less than one second) but using the table by: select * from diag_pivot limit 50; took almost two hours, even I made on diag an index on (RLN,diagnb). This is because MySQL does nor use indexes when working on views (at least not in this case) Next step: create table diag_pivot2 engine=connect table_type=pivot tabname=diag option_list='PivotCol=diagnb,FncCol=diag'; Here I used the table than dropped the unwanted columns. The creation was as fast and the use faster (20 minutes instead of almost 2 hours) | ||||||||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2014-04-08 ] | ||||||||||||||||||||||||||||||||||||
|
In addition to what has been said in my last comments, I added a new option to the Pivot table allowing to make discovery skip unwanted columns from a source table. Doing so the last create example is: create table diag_pivot2 engine=connect table_type=pivot tabname=diag option_list='PivotCol=diagnb,FncCol=diag,SkipCol=recordnb;diagpresent'; Doing so there is no more need to create a view or to have to drop columns with an alter table. Note that because of the syntax of OPTION_LIST, the columns to skip must be separated by semi-colons. |