Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-ost create too many sessions when server response slowly #859

Closed
sephiroth17lh opened this issue Jul 9, 2020 · 5 comments
Closed

gh-ost create too many sessions when server response slowly #859

sephiroth17lh opened this issue Jul 9, 2020 · 5 comments

Comments

@sephiroth17lh
Copy link

Summary

When I run lots of gh-ost instances(100) on a server(6c12t, 32GB), I found that the server was run out of resources(max-open-files). And then, I checked the sessions in mysqld and I found

mysql> select * from information_schema.processlist where state='statistics' and info like '%_ghc%' limit 1;
+-------+------+-----------------+---------+---------+------+------------+-------------------------------------------------------------------------------------+---------+-----------+---------------+
| ID    | USER | HOST            | DB      | COMMAND | TIME | STATE      | INFO                                                                                | TIME_MS | ROWS_SENT | ROWS_EXAMINED |
+-------+------+-----------------+---------+---------+------+------------+-------------------------------------------------------------------------------------+---------+-----------+---------------+
| 28392 | root | localhost:50304 | ddltest | Query   |   98 | statistics | select hint, value from `ddltest`.`_t87_ghc` where hint = 'heartbeat' and id <= 255 |   97262 |         0 |             0 |
+-------+------+-----------------+---------+---------+------+------------+-------------------------------------------------------------------------------------+---------+-----------+---------------+
1 row in set (24.52 sec)

mysql> select count(*) from information_schema.processlist where state='statistics' and info like '%_ghc%';
+----------+
| count(*) |
+----------+
|     4568 |
+----------+
1 row in set (15.87 sec)

Then, I used another sql "select info, count(*) from information_schema.processlist where state='statistics' and info like '%_ghc%' group by info". I found that some gh-ost created hundreds of session.

Code view

I checked the source code and found that the function collectReplicationLagmaybe the reason

// collectReplicationLag reads the latest changelog heartbeat value
func (this *Throttler) collectReplicationLag(firstThrottlingCollected chan<- bool) {
    collectFunc := func() error {
......
            if heartbeatValue, err := this.inspector.readChangelogState("heartbeat"); err != nil {
                return log.Errore(err)
            } else {
                this.parseChangelogHeartbeat(heartbeatValue)  // #### send query contain 'select hint, value..' #####
            }
......
    }

    collectFunc()
    firstThrottlingCollected <- true

    ticker := time.Tick(time.Duration(this.migrationContext.HeartbeatIntervalMilliseconds) * time.Millisecond)
    for range ticker {
        if atomic.LoadInt64(&this.finishedMigrating) > 0 {
            return
        }
        go collectFunc()   // #### Here continuely create session #####
    }
}

The golang code of gh-ost as bellow. It use go collectFunc() continuly create new session.

possible fix

Normally, if the server response slowly, it's means that the server should not apply online-ddl. But, if the server is suddenly on heavy pressure, gh-ost should reduce the queries on server rather than continuely create session uninterruptedly

@timvaillancourt
Copy link
Collaborator

@sephiroth17lh in #931 gh-ost was updated to limit the number of connections

Can you please test with the master branch and confirm?

@sephiroth17lh
Copy link
Author

@sephiroth17lh in #931 gh-ost was updated to limit the number of connections

Can you please test with the master branch and confirm?

Sorry for checking email not in time. I will test master branch in this weekend.

@sephiroth17lh
Copy link
Author

@sephiroth17lh in #931 gh-ost was updated to limit the number of connections

Can you please test with the master branch and confirm?

Hi. I tested master branch with 150 gh-ost instances started at the same time. The number of connections created by gh-ost is under control.

However, I found another problem, some sessions are blocked in "drop /* gh-ost */ table if exists lhtest._t_xxx_del" (xxx is the ordinal number of table name). Since this test is meaningless for a real online system, maybe metadata dead lock should considered by DBA.

the sessions on table "t_51" as following:
mysql> select * from metadata_locks where OBJECT_SCHEMA='lhtest' and object_name like '%t_51%' \G
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: lhtest
OBJECT_NAME: _t_51_del
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139829239327088
LOCK_TYPE: EXCLUSIVE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: PENDING
SOURCE: sql_parse.cc:6107
OWNER_THREAD_ID: 1078
OWNER_EVENT_ID: 263
*************************** 2. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: lhtest
OBJECT_NAME: _t_51_del
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139826889113696
LOCK_TYPE: SHARED_NO_READ_WRITE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:6107
OWNER_THREAD_ID: 1050
OWNER_EVENT_ID: 41
*************************** 3. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: lhtest
OBJECT_NAME: t_51
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139826888673968
LOCK_TYPE: SHARED_NO_READ_WRITE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:6107
OWNER_THREAD_ID: 1050
OWNER_EVENT_ID: 41
*************************** 4. row ***************************
OBJECT_TYPE: COLUMN STATISTICS
OBJECT_SCHEMA: lhtest
OBJECT_NAME: _t_51_del
COLUMN_NAME: id
OBJECT_INSTANCE_BEGIN: 139826889159136
LOCK_TYPE: SHARED_READ
LOCK_DURATION: STATEMENT
LOCK_STATUS: GRANTED
SOURCE: sql_base.cc:579
OWNER_THREAD_ID: 1050
OWNER_EVENT_ID: 41
4 rows in set (0.00 sec)

@shlomi-noach
Copy link
Contributor

I tested master branch with 150 gh-ost instances started at the same time.

We discourage multiple parallel gh-ost executions. I won't be looking into this scenario myself.

@timvaillancourt
Copy link
Collaborator

@sephiroth17lh thank you for confirming that fix!

I agree with @shlomi-noach that ghost is designed for one execution at a time

Closing this issue as solved. Feel free to start a new issue about this new finding

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants