Description
Hi all,
It all started in our Heroku review environments when we upgraded our application from Erlang 21.3.7 / Elixir 1.8.2 to Erlang 22.0.7 / Elixir 1.9.1. At that point, messages were not being consumed or produced, and thousands of timeout errors were flooding the logs. We noticed that, for some reason, downgrading Erlang back to 21.3.7 resolved the timeouts. We then bumped to kafka_ex 0.10, which seemed to improve things a bit (fewer timeouts in the logs, and messages being consumed again).
However, in the last week we started getting the same timeouts in production, although at a lower frequency (~10 timeouts/day).
Some extra information:
- In the Heroku review-envs we use Basic Kafka instances (Multi-Tenant);
- In production, we use a Private Standard Kafka instance;
- We currently have 28 topics created.
I also tried kafka_ex from the master branch, with the new kayrock-based API, but it didn't seem to help: lots of timeouts keep flooding the logs.
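For reference, our kafka_ex configuration is roughly along these lines. This is an illustrative sketch, not our exact settings: the broker host is the anonymized one from the logs, and the sync_timeout value is just what we experimented with while debugging:

```elixir
# config/config.exs -- sketch, values are placeholders/assumptions
config :kafka_ex,
  # anonymized Heroku Kafka endpoint from the logs (SSL on port 9096)
  brokers: [{"xxxxxxxxx.amazonaws.com", 9096}],
  use_ssl: true,
  # on master, the new kayrock-based client is selected via this version string
  kafka_version: "kayrock",
  # raised from the default while trying to work around the timeouts
  sync_timeout: 10_000
```

Raising sync_timeout did not make the errors go away, only less frequent.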
The error on 0.10:
2019-11-22T22:25:24.679435+00:00 app[web.1]: 22:25:24.672 [error] GenServer #PID<0.2066.0> terminating
2019-11-22T22:25:24.679445+00:00 app[web.1]: ** (MatchError) no match of right hand side value: :timeout
2019-11-22T22:25:24.679448+00:00 app[web.1]: (kafka_ex) lib/kafka_ex/gen_consumer.ex:678: KafkaEx.GenConsumer.consume/1
2019-11-22T22:25:24.679450+00:00 app[web.1]: (kafka_ex) lib/kafka_ex/gen_consumer.ex:630: KafkaEx.GenConsumer.handle_info/2
2019-11-22T22:25:24.679452+00:00 app[web.1]: (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
2019-11-22T22:25:24.679454+00:00 app[web.1]: (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
2019-11-22T22:25:24.679458+00:00 app[web.1]: (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
2019-11-22T22:25:24.679461+00:00 app[web.1]: Last message: :timeout
2019-11-22T22:25:24.679465+00:00 app[web.1]: State: %KafkaEx.GenConsumer.State{acked_offset: 0, auto_offset_reset: :earliest, commit_interval: 5000, commit_threshold: 500, committed_offset: 0, consumer_module: FOO.Consumer, consumer_state: nil, current_offset: 0, fetch_options: [auto_commit: false, worker_name: #PID<0.2289.0>], group: "foo.xxx.yyy", last_commit: -576460703586, partition: 0, topic: "bar.baz.bla", worker_name: #PID<0.2289.0>}
And the error now with master:
2019-11-28T19:30:08.763137+00:00 app[web.1]: 19:30:08.762 [debug] Successfully connected to broker "xxxxxxxxx.amazonaws.com":9096
2019-11-28T19:30:08.816252+00:00 app[web.1]: 19:30:08.815 [debug] Successfully connected to broker "xxxxxxxxx.amazonaws.com":9096
2019-11-28T19:30:08.863527+00:00 app[web.1]: 19:30:08.863 [error] Receiving data from broker "xxxxxxxxx.amazonaws.com":9096 failed with :timeout
2019-11-28T19:30:08.869546+00:00 app[web.1]: 19:30:08.869 [debug] Successfully connected to broker "xxxxxxxxx.amazonaws.com":9096
2019-11-28T19:30:08.879048+00:00 app[web.1]: 19:30:08.878 [error] GenServer #PID<0.9105.93> terminating
2019-11-28T19:30:08.879052+00:00 app[web.1]: ** (stop) {:error, :timeout}
2019-11-28T19:30:08.879053+00:00 app[web.1]: Last message: :timeout
2019-11-28T19:30:08.879063+00:00 app[web.1]: State: %KafkaEx.GenConsumer.State{acked_offset: 0, api_versions: %{fetch: 0, offset_commit: 0, offset_fetch: 0}, auto_offset_reset: :earliest, commit_interval: 5000, commit_threshold: 500, committed_offset: 0, consumer_module: FOO.Consumer, consumer_state: nil, current_offset: 0, fetch_options: [auto_commit: false, worker_name: #PID<0.19074.91>], generation_id: 136, group: "foo-51686.foo_service", last_commit: -576424651131, member_id: "kafka_ex-fe6cdc27-cf12-4c45-9beb-76fcc2a1cdf8", partition: 0, topic: "foo.zzz.yyy", worker_name: #PID<0.19074.91>}
2019-11-28T19:30:08.926883+00:00 app[web.1]: 19:30:08.926 [debug] Successfully connected to broker "xxxxxxxxx.amazonaws.com":9096
2019-11-28T19:30:08.971555+00:00 app[web.1]: 19:30:08.971 [debug] Successfully connected to broker "xxxxxxxxx.amazonaws.com":9096
The "Successfully connected..." and "Receiving data from broker..." messages are also logged indefinitely, many per second.
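For completeness, the consumers are started via KafkaEx.ConsumerGroup under our application supervisor, along the lines of the sketch below. Module, group, and topic names are the anonymized ones from the logs above, and commit_interval matches the 5000 shown in the state dumps; the rest of the wiring is simplified:

```elixir
# Simplified sketch of the consumer supervision tree (names anonymized).
children = [
  %{
    id: KafkaEx.ConsumerGroup,
    start:
      {KafkaEx.ConsumerGroup, :start_link,
       [FOO.Consumer, "foo.xxx.yyy", ["bar.baz.bla"], [commit_interval: 5_000]]}
  }
]

Supervisor.start_link(children, strategy: :one_for_one)
```

When the timeouts hit, these GenConsumer processes terminate as shown above and get restarted by the supervisor, which is why the connect/timeout log lines repeat endlessly.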