Skip to content

Timeout errors #389

Closed
Closed
@andrielfn

Description

@andrielfn

Hi all,

It all started in the Heroku review-envs when we upgraded our application from (Erlang 21.3.7 and Elixir 1.8.2) to (Erlang 22.0.7 and Elixir 1.9.1).
We noticed, for some reason, that downgrading Erlang back to 21.3.7 solved the timeouts.

In this first moment, messages were not being consumed/produced and thousands of timeout errors were flooding the logs. Then we bumped to kafka_ex 0.10, which seemed to have improved a bit (fewer timeouts in the logs and messages being consumed).

However, in the last week, we started to get the same timeouts in production but starting with a lower frequency (~10 timeouts/day).

Some extra information:

  • In the Heroku review-envs we use Basic Kafka instances (Multi-Tenant);
  • In production, we use a Private Standard Kafka instance;
  • We currently have 28 topics created;

I also tried the kafka_ex on the master branch and with the new kayrock API, but seems that it didn't help. Lots of timeouts keep flooding the logs.

The error on 0.10:

2019-11-22T22:25:24.679435+00:00 app[web.1]: 22:25:24.672 [error] GenServer #PID<0.2066.0> terminating
2019-11-22T22:25:24.679445+00:00 app[web.1]: ** (MatchError) no match of right hand side value: :timeout
2019-11-22T22:25:24.679448+00:00 app[web.1]: (kafka_ex) lib/kafka_ex/gen_consumer.ex:678: KafkaEx.GenConsumer.consume/1
2019-11-22T22:25:24.679450+00:00 app[web.1]: (kafka_ex) lib/kafka_ex/gen_consumer.ex:630: KafkaEx.GenConsumer.handle_info/2
2019-11-22T22:25:24.679452+00:00 app[web.1]: (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
2019-11-22T22:25:24.679454+00:00 app[web.1]: (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
2019-11-22T22:25:24.679458+00:00 app[web.1]: (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
2019-11-22T22:25:24.679461+00:00 app[web.1]: Last message: :timeout
2019-11-22T22:25:24.679465+00:00 app[web.1]: State: %KafkaEx.GenConsumer.State{acked_offset: 0, auto_offset_reset: :earliest, commit_interval: 5000, commit_threshold: 500, committed_offset: 0, consumer_module: FOO.Consumer, consumer_state: nil, current_offset: 0, fetch_options: [auto_commit: false, worker_name: #PID<0.2289.0>], group: "foo.xxx.yyy", last_commit: -576460703586, partition: 0, topic: "bar.baz.bla", worker_name: #PID<0.2289.0>}

And the error now with master:

2019-11-28T19:30:08.763137+00:00 app[web.1]: 19:30:08.762 [debug] Successfully connected to broker "xxxxxxxxx.amazonaws.com":9096
2019-11-28T19:30:08.816252+00:00 app[web.1]: 19:30:08.815 [debug] Successfully connected to broker "xxxxxxxxx.amazonaws.com":9096
2019-11-28T19:30:08.863527+00:00 app[web.1]: 19:30:08.863 [error] Receiving data from broker "xxxxxxxxx.amazonaws.com":9096 failed with :timeout
2019-11-28T19:30:08.869546+00:00 app[web.1]: 19:30:08.869 [debug] Successfully connected to broker "xxxxxxxxx.amazonaws.com":9096
2019-11-28T19:30:08.879048+00:00 app[web.1]: 19:30:08.878 [error] GenServer #PID<0.9105.93> terminating
2019-11-28T19:30:08.879052+00:00 app[web.1]: ** (stop) {:error, :timeout}
2019-11-28T19:30:08.879053+00:00 app[web.1]: Last message: :timeout
2019-11-28T19:30:08.879063+00:00 app[web.1]: State: %KafkaEx.GenConsumer.State{acked_offset: 0, api_versions: %{fetch: 0, offset_commit: 0, offset_fetch: 0}, auto_offset_reset: :earliest, commit_interval: 5000, commit_threshold: 500, committed_offset: 0, consumer_module: FOO.Consumer, consumer_state: nil, current_offset: 0, fetch_options: [auto_commit: false, worker_name: #PID<0.19074.91>], generation_id: 136, group: "foo-51686.foo_service", last_commit: -576424651131, member_id: "kafka_ex-fe6cdc27-cf12-4c45-9beb-76fcc2a1cdf8", partition: 0, topic: "foo.zzz.yyy", worker_name: #PID<0.19074.91>}
2019-11-28T19:30:08.926883+00:00 app[web.1]: 19:30:08.926 [debug] Successfully connected to broker "xxxxxxxxx.amazonaws.com":9096
2019-11-28T19:30:08.971555+00:00 app[web.1]: 19:30:08.971 [debug] Successfully connected to broker "xxxxxxxxx.amazonaws.com":9096

The Successfully connected... and Receiving data from broker... messages are also logged indefinitely, many per second.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions