Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 10.2.2
    • Component/s: OTHER
    • Labels: None
    • Sprint: 10.2.2-1, 10.2.2-2, 10.2.2-3, 10.2.2-4

Description

Improve scalability by implementing a multi-instance table cache.
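
The patch itself is not attached to this ticket; the sketch below is only a rough, hypothetical illustration of the general idea behind a multi-instance (partitioned) table cache: several independent instances, each with its own mutex and its own list of unused TABLE objects, so that concurrent sessions spread their lock traffic over several mutexes instead of a single one. All names here are assumptions, not the server's actual classes.

    // Hypothetical sketch of a partitioned table cache (not the actual patch).
    #include <list>
    #include <mutex>
    #include <vector>

    struct TABLE;                       // stand-in for the server's TABLE object

    struct Table_cache_instance {
      std::mutex lock;                  // per-instance mutex
      std::list<TABLE*> free_tables;    // unused TABLE objects kept in this instance
    };

    class Table_cache {
      std::vector<Table_cache_instance> instances;
    public:
      explicit Table_cache(size_t n) : instances(n) {}

      // Spread sessions over instances, e.g. by thread id, so that different
      // connections mostly contend on different mutexes.
      Table_cache_instance& instance_for(size_t thread_id) {
        return instances[thread_id % instances.size()];
      }
    };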


Activity

Sergey Vojtovich (svoj) added a comment:

serg, please review the patch for this task.

Sergey Vojtovich (svoj) added a comment:

Waiting for feedback.

Sergey Vojtovich (svoj) added a comment:

serg, please review the last 2 patches for this task.

I'm still not happy with autosizing:

• the numbers used for autosizing are valid for my host; I'm not sure they'll work properly for others
• we do trylock() and then lock() rather often: up to 30% of cases (performance concern)
• additional code on a rather hot path (performance concern)
• I couldn't get a perfect 3 instances for my host with autosizing: it either gets 2 or raises the number of instances up to the limit (everything under 480 for waits)
• we can't avoid warm-up (bad for benchmarks)

Sergei Golubchik (serg) added a comment:

> we do trylock() and then lock() rather often: up to 30% of cases (performance concern)

You increase the number of instances when the lock/trylock ratio reaches 50%. Maybe you should do it earlier? At 30%, maybe?

> additional code on a rather hot path (performance concern)

That should normally be just ++mutex_nowaits, shouldn't it?

> I couldn't get a perfect 3 instances for my host with autosizing: it either gets 2 or raises the number of instances up to the limit (everything under 480 for waits)

Interesting. Why do you think that is? What did you do in your benchmarks? Have you never had only 1 instance?

> we can't avoid warm-up (bad for benchmarks)

True. How long a warm-up is needed? What was your impression?
Anyway, any proper benchmark does a warm-up, so if your warm-up is shorter than what benchmarks typically do, it should be fine.
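
For illustration, a minimal sketch of the trylock-then-lock accounting being discussed, assuming a counter named after the ++mutex_nowaits mentioned above; the surrounding structure and the threshold check are assumptions, not the actual patch:

    // Minimal sketch of the lock/trylock accounting (illustrative only).
    #include <mutex>

    struct Instance_lock {
      std::mutex lock;
      unsigned long mutex_nowaits = 0;   // acquisitions served immediately
      unsigned long mutex_waits   = 0;   // acquisitions that had to block

      void acquire() {
        if (lock.try_lock()) {
          ++mutex_nowaits;               // the cheap, common path
          return;
        }
        lock.lock();                     // contended: fall back to a blocking lock()
        ++mutex_waits;                   // counters are protected by the mutex itself
      }

      void release() { lock.unlock(); }

      // "Increase the number of instances when the lock/trylock ratio reaches
      // 50%" (or 30%) translates to a check like this; reading the counters
      // without the mutex is approximate, which is fine for a heuristic.
      bool contended(unsigned long percent) const {
        unsigned long total = mutex_waits + mutex_nowaits;
        return total != 0 && mutex_waits * 100 >= total * percent;
      }
    };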

Sergey Vojtovich (svoj) added a comment:

Yes, the number of instances is increased at 50%. As I mentioned, if I increase it at 48%, the number of instances quickly rises to the limit.

It's a bit more than ++mutex_nowaits, but close enough.

It was a multi-table OLTP RO benchmark with 40 threads. I had 1 instance initially.

With the current numbers, warm-up to 2 instances takes under 5 seconds. With lower numbers it was rising to the limit in under 1 minute.

There's another option, but it's a bit more expensive: count the number of waiting threads and activate instances when there are, e.g., 10 waiters. This will add 2 atomic adds per lock.
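
The waiter-counting alternative mentioned in the last paragraph could look roughly like this; the threshold of 10 waiters comes from the comment, while the names and structure are assumptions:

    // Rough sketch of the waiter-counting alternative (names are assumptions).
    #include <atomic>
    #include <mutex>

    struct Waiter_counted_lock {
      std::mutex lock;
      std::atomic<int> waiters{0};
      static constexpr int ACTIVATE_THRESHOLD = 10;   // e.g. 10 waiters

      // Returns true if one more instance should be activated.  The two
      // atomic adds below are the extra per-lock cost mentioned above.
      bool acquire() {
        int n = waiters.fetch_add(1) + 1;   // first atomic add: we want the lock
        lock.lock();
        waiters.fetch_sub(1);               // second atomic add: no longer waiting
        return n >= ACTIVATE_THRESHOLD;
      }

      void release() { lock.unlock(); }
    };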

Sergey Vojtovich (svoj) added a comment:

Changed assignee while waiting for feedback.

Sergei Golubchik (serg) added a comment:

ok to push (cd1b39b and whatever you have in the same branch)

Sergey Vojtovich (svoj) added a comment:

Final autosizing implementation:

An instance is considered contended if more than 20% of mutex acquisitions
can't be served immediately. Up to 100 000 probes may be performed to avoid
instance activation on short sporadic peaks. 100 000 is the estimated maximum
number of queries one instance can serve in one second.

These numbers work well on a 2 socket / 20 core / 40 thread Intel Broadwell
system, that is, the expected number of instances is activated within a
reasonable warm-up time. They may have to be adjusted for other systems.

Only TABLE object acquisition is instrumented. We intentionally avoid this
overhead on TABLE object release. All other table cache mutex acquisitions
are considered outside the hot path and are not instrumented either.
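
One way to read that heuristic is sketched below, with the 20% threshold and the 100 000-probe window taken from the comment; the structure and names are assumptions, and the actual code lives in the server's table cache implementation:

    // Illustrative sketch of the final heuristic (constants from the comment,
    // everything else assumed): within a window of at most 100 000 probed
    // acquisitions, more than 20% having to wait marks the instance contended.
    #include <mutex>

    struct Table_cache_instance {
      std::mutex lock;
      unsigned long probes = 0;   // instrumented acquisitions in current window
      unsigned long waits  = 0;   // acquisitions that could not be served at once

      static constexpr unsigned long MAX_PROBES        = 100000;  // ~1s of queries
      static constexpr unsigned long CONTENDED_PERCENT = 20;

      // Called on TABLE object acquisition only; release is intentionally
      // left uninstrumented, as the comment explains.
      // Returns true when the caller should activate one more instance.
      bool acquire_and_check() {
        bool waited = !lock.try_lock();
        if (waited)
          lock.lock();                     // blocking fallback
        probes++;                          // counters protected by 'lock'
        waits += waited;
        if (waits * 100 > MAX_PROBES * CONTENDED_PERCENT) {
          probes = waits = 0;              // contended: reset and grow
          return true;
        }
        if (probes >= MAX_PROBES)
          probes = waits = 0;              // short sporadic peak: start over
        return false;
      }

      void release() { lock.unlock(); }
    };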

People

Assignee: Sergey Vojtovich (svoj)
Reporter: Sergey Vojtovich (svoj)
Votes: 1
Watchers: 3

