Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 2.5.20, 6.3.1
- None
Description
When the duplicate checks are enabled, the cost of performing them grows rapidly as the number of tables increases. With around 50000 tables and the default duplicate checks, the checks take on average 25 seconds. With ignore_tables_regex=.*, the time drops to around 500 milliseconds, a large part of which is network latency. A minimal service definition of the kind used for such a measurement is sketched below.
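The following is only an illustrative sketch, assuming the service in question is a schemarouter service; the server names and credentials are made up, and only ignore_tables_regex=.* is taken from the description above.

{code}
# Sketch of a service with the duplicate check effectively disabled
# via ignore_tables_regex (server and credential names are invented).
[Sharded-Service]
type=service
router=schemarouter
servers=shard1,shard2
user=maxuser
password=maxpwd
ignore_tables_regex=.*
{code}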
The check is slow because, for each visible table, a table location lookup is done while the result is being iterated. Since the location lookup processes all tables (a somewhat naive approach), the full table list ends up being traversed once per row, which results in roughly quadratic complexity. By first inserting all the elements into the resulting container, the duplicate check can then be done in a single pass over the whole container. This reduces the complexity to linear, which scales far better.
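For illustration only, here is a minimal C++ sketch of the before/after shape of the check. The struct and function names are invented and do not correspond to the actual MaxScale code; the sketch only shows why the per-row lookup is quadratic and why collecting everything first allows a single-pass check.

{code:cpp}
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical representation of one result row: a table name and the
// server it was found on.
struct TableLocation
{
    std::string table;
    std::string server;
};

// Old approach, roughly quadratic: while iterating the result, look up the
// location of each table by scanning the whole list again.
std::unordered_set<std::string> duplicates_per_row_lookup(const std::vector<TableLocation>& rows)
{
    std::unordered_set<std::string> duplicates;

    for (const auto& row : rows)
    {
        for (const auto& other : rows)      // full scan for every row
        {
            if (row.table == other.table && row.server != other.server)
            {
                duplicates.insert(row.table);
            }
        }
    }

    return duplicates;
}

// New approach, linear on average: first insert everything into a container
// keyed by table name, then do the duplicate check in one pass over it.
std::unordered_set<std::string> duplicates_single_pass(const std::vector<TableLocation>& rows)
{
    std::unordered_map<std::string, std::unordered_set<std::string>> servers_by_table;

    for (const auto& row : rows)            // one pass to build the container
    {
        servers_by_table[row.table].insert(row.server);
    }

    std::unordered_set<std::string> duplicates;

    for (const auto& [table, servers] : servers_by_table)   // one pass to check
    {
        if (servers.size() > 1)
        {
            duplicates.insert(table);
        }
    }

    return duplicates;
}
{code}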