[MDEV-17902] CONNECT engine JDBC and ODBC very unclear documentation on Memory option - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.3.11
Fix Version/s: N/A
Component/s: Documentation, Storage Engine - Connect
Labels:
None
Environment:
centos7

Description

The documentation on Connect Engine ODBC - which Connect Engine JDBC references - is very unclear with regards to the Memory option. Suggest a clean rewrite of that doc section.

Memory=0 what does it mean? can it be used with order by? can it be used with random access (columns specified as indexed)?

Memory=1 same questions

Memory=2 same questions

similar to but not directly related, a new section clearly indicating how random access is enabled

if column is indexed - what will happen?
if column is not indexed - what will happen?
how do these interact with the scrollable option?

Thank you.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

Connect-1.6.8.docx
424 kB
2018-12-07 21:42

Activity

Ascending order - Click to sort in descending order

Olivier Bertrand added a comment - 2018-12-07 21:51

Sorry. I don't know why, but the online documentation has shifted numbers from the original document. You can find it attach here. Connect-1.6.8.docx It should say:

0 - No memory used.

1 - Memory size required is calculated during the first sequential table read. The allocated memory is filled during the second sequential read. Then the table rows are retrieved from memory.

2 - A first query is executed to get the result set size and the needed memory is allocated. It is filled on the first sequential reading. Then random access of the table is possible.

Let me know whether it is still unclear.

Olivier Bertrand added a comment - 2018-12-07 21:51 Sorry. I don't know why, but the online documentation has shifted numbers from the original document. You can find it attach here. Connect-1.6.8.docx It should say: 0 - No memory used. 1 - Memory size required is calculated during the first sequential table read. The allocated memory is filled during the second sequential read. Then the table rows are retrieved from memory. 2 - A first query is executed to get the result set size and the needed memory is allocated. It is filled on the first sequential reading. Then random access of the table is possible. Let me know whether it is still unclear.

Robert Dyas added a comment - 2018-12-07 22:07

What does Memory=3 do? I use it and it doesn't generate an error? is it deprecated?
Also, it appears that with some tables ORDER BY works with tables defined with Memory=0 but with other tables ORDER BY doesn't work but does with Memory=3. It appears to be driver dependent.

Robert Dyas added a comment - 2018-12-07 22:07 What does Memory=3 do? I use it and it doesn't generate an error? is it deprecated? Also, it appears that with some tables ORDER BY works with tables defined with Memory=0 but with other tables ORDER BY doesn't work but does with Memory=3. It appears to be driver dependent.

Olivier Bertrand added a comment - 2018-12-07 22:13

Memory=3 does not exists. Indeed it should be flagged as an error. About ORDER BY see MDEV-17461.

Olivier Bertrand added a comment - 2018-12-07 22:13 Memory=3 does not exists. Indeed it should be flagged as an error. About ORDER BY see MDEV-17461 .

Robert Dyas added a comment - 2018-12-11 20:59

Hi Olivier,

Yes still unclear.
Which memory options should be used in which scenarios?
For example, it looks like Memory=2 allows ORDER BY of the results, but Memory=0 does not. However, if the results set is huge Memory=2 can crash the server.

Your existing documentation is more on how it works internally... what would be better is documentation on how a user would choose which option is best.

IF the following is correct, maybe something like
Memory=0 Best for potentially large results sets or servers with limited memory. Results cannot use ORDER BY (and any other limitations).
Memroy=1 (I have no idea what the use case is)
Memory=2 Best for smaller results sets or servers with plenty of memory. Results can use ORDER BY.

I'm sure there is more to it than that, but that is the type of info I believe most people will benefit from.

Also a table listing the same type of info for when to INDEX a column and when not to would be helpful. Same for scrollable=YES/NO.

Thank you!

Robert Dyas added a comment - 2018-12-11 20:59 Hi Olivier, Yes still unclear. Which memory options should be used in which scenarios? For example, it looks like Memory=2 allows ORDER BY of the results, but Memory=0 does not. However, if the results set is huge Memory=2 can crash the server. Your existing documentation is more on how it works internally... what would be better is documentation on how a user would choose which option is best. IF the following is correct, maybe something like Memory=0 Best for potentially large results sets or servers with limited memory. Results cannot use ORDER BY (and any other limitations). Memroy=1 (I have no idea what the use case is) Memory=2 Best for smaller results sets or servers with plenty of memory. Results can use ORDER BY. I'm sure there is more to it than that, but that is the type of info I believe most people will benefit from. Also a table listing the same type of info for when to INDEX a column and when not to would be helpful. Same for scrollable=YES/NO. Thank you!

Olivier Bertrand added a comment - 2018-12-15 16:07

Here is how the corresponding chapter could be re-written:

Random Access of ODBC Tables

Note: In CONNECT version 1.03 ODBC tables are not indexable. Version 1.04 adds remote indexing facility to the ODBC table type.

However, some queries require random access to an ODBC table; for instance, when it is joined to another table or used in an order by queries applied to a long column or large tables.

There are several ways to enable random (position) access to a CONNECT ODBC table. They are depending on eventual indexing or from the following table options:

Option	Type	Used For
Block_Size	Integer	Specifying the rowset size.
Memory*	Integer	Storing the result set in memory.
Scrollable*	Boolean	Using a scrollable cursor.

: To be specified in the option_list.

Using memory means that the data retrieved from the remote table will be locally memory stored. It is not always possible, for instance with large tables, but is the faster way to deal with random access.

When dealing with small tables, the simpler way to enable random access is to specify a rowset size equal or larger than the table size (or the result set size if a push down where clause is used). This means that the whole result is in memory on the first FETCH and CONNECT will use it for further positional accesses.

Another way to have the result set in memory is to use the MEMORY option. This option can be set to the following values:

0 - No memory used (the default). Best when the table is read sequentially as in SELECT statements with only eventual WHERE clauses.
1 - Memory size required is calculated during the first sequential table read. The allocated memory is filled during the second sequential read. Then the table rows are retrieved from the memory. This to be used when the table will be accessed many times randomly such as in sub-select or being the target table of a join.
2 - A first query is executed to get the result set size and the needed memory is allocated. It is filled on the first sequential reading. Then random access of the table is possible. This can be used in case of ORDER BY clauses when MariaDB uses position reading.

Note that the best way to handle ORDER BY is to set the max_length_for_sort_data variable to a larger value (its default value is 1024 that is pretty small). Indeed, it requires less memory to be used, particularly when a WHERE clause limits the retrieved data set. This is because in the case of an ORDER BY query, MariaDB firstly retrieves sequentially the result set and the position of each records. Often the sort can be done from that result set if it is not too big. But if it is too big, or if it implies some “long” columns, only the positions are sorted and MariaDB retrieves the final result from the table read in random order. If setting the max_length_for_sort_data variable is not feasible or does not work, to be able to retrieve table data from memory after the first sequential read, the MEMORY option must be set to 2.

For tables too large to be stored in memory another possibility is to make your table to use a scrollable cursor. In this case each randomly accessed row can be retrieved from the data source specifying its cursor position, which is reasonably fast. However, scrollable cursors are not supported by all data sources.

From CONNECT version 1.04, another way to provide random access is to specify some columns to be indexed. This should be done only when the corresponding column of the source table is also indexed. This can be used for tables too large to be stored in memory and is similar to the remote indexing used by the MYSQL table type and by the FEDERATED engine.

Then remains the possibility to sequentially extract the requested data from the external table and to construct another local table from the data source. For instance, to construct a table containing the CUSTOMER table data, create the table as:

create table Custloc as select * from customer;

Now you can use custloc for fast database operations on the copied customer table data, eventually restricted by a WHERE clause.

Olivier Bertrand added a comment - 2018-12-15 16:07 Here is how the corresponding chapter could be re-written: Random Access of ODBC Tables Note : In CONNECT version 1.03 ODBC tables are not indexable. Version 1.04 adds remote indexing facility to the ODBC table type. However, some queries require random access to an ODBC table; for instance, when it is joined to another table or used in an order by queries applied to a long column or large tables. There are several ways to enable random (position) access to a CONNECT ODBC table. They are depending on eventual indexing or from the following table options: Option Type Used For Block_Size Integer Specifying the rowset size. Memory* Integer Storing the result set in memory. Scrollable* Boolean Using a scrollable cursor. : To be specified in the option_list. Using memory means that the data retrieved from the remote table will be locally memory stored. It is not always possible, for instance with large tables, but is the faster way to deal with random access. When dealing with small tables, the simpler way to enable random access is to specify a rowset size equal or larger than the table size (or the result set size if a push down where clause is used). This means that the whole result is in memory on the first FETCH and CONNECT will use it for further positional accesses. Another way to have the result set in memory is to use the MEMORY option. This option can be set to the following values: 0 - No memory used (the default). Best when the table is read sequentially as in SELECT statements with only eventual WHERE clauses. 1 - Memory size required is calculated during the first sequential table read. The allocated memory is filled during the second sequential read. Then the table rows are retrieved from the memory. This to be used when the table will be accessed many times randomly such as in sub-select or being the target table of a join. 2 - A first query is executed to get the result set size and the needed memory is allocated. It is filled on the first sequential reading. Then random access of the table is possible. This can be used in case of ORDER BY clauses when MariaDB uses position reading. Note that the best way to handle ORDER BY is to set the max_length_for_sort_data variable to a larger value (its default value is 1024 that is pretty small). Indeed, it requires less memory to be used, particularly when a WHERE clause limits the retrieved data set. This is because in the case of an ORDER BY query, MariaDB firstly retrieves sequentially the result set and the position of each records. Often the sort can be done from that result set if it is not too big. But if it is too big, or if it implies some “long” columns, only the positions are sorted and MariaDB retrieves the final result from the table read in random order. If setting the max_length_for_sort_data variable is not feasible or does not work, to be able to retrieve table data from memory after the first sequential read, the MEMORY option must be set to 2. For tables too large to be stored in memory another possibility is to make your table to use a scrollable cursor. In this case each randomly accessed row can be retrieved from the data source specifying its cursor position, which is reasonably fast. However, scrollable cursors are not supported by all data sources. From CONNECT version 1.04, another way to provide random access is to specify some columns to be indexed. This should be done only when the corresponding column of the source table is also indexed. This can be used for tables too large to be stored in memory and is similar to the remote indexing used by the MYSQL table type and by the FEDERATED engine. Then remains the possibility to sequentially extract the requested data from the external table and to construct another local table from the data source. For instance, to construct a table containing the CUSTOMER table data, create the table as: create table Custloc as select * from customer; Now you can use custloc for fast database operations on the copied customer table data, eventually restricted by a WHERE clause.

markus makela added a comment - 2022-10-03 16:37

The latest comment seems identical to what is in the KB right now. Should this be closed as the documentation seems complete?

markus makela added a comment - 2022-10-03 16:37 The latest comment seems identical to what is in the KB right now . Should this be closed as the documentation seems complete?

People

Assignee:: Ian Gilfillan

Reporter:: Robert Dyas

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2018-12-04 19:25

Updated:: 2022-10-05 11:51

Resolved:: 2022-10-05 11:51

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Activity

Random Access of ODBC Tables

People

Dates

Git Integration