
Commit 00b22bf

fulmicoton, guilload
authored and committed
The super large grace period of 1 day has proved to be harmful on Cicada. This PR lowers it to 2h. As a reminder: once a node is detected as dead, it enters a zombie state for 1h, during which we still share its KVs. From time of death + 1h to time of death + 2h, we no longer share the node. After 2h, we delete the node from the state.
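The timeline in the commit message can be sketched as a small classifier over the time elapsed since a node's detected death. This is a hypothetical illustration, not chitchat's implementation: `DeadNodeStage`, `dead_node_stage`, and the half-open boundary handling are assumptions, and the 1h zombie period is passed in as its own parameter rather than derived from the grace period.

```rust
use std::time::Duration;

/// Hypothetical stages of a node after it has been detected as dead.
#[derive(Debug, PartialEq, Eq)]
enum DeadNodeStage {
    /// Zombie: the node is still tracked and its KVs are still shared.
    Zombie,
    /// Past the zombie period but within the grace period: the node is no longer shared.
    NotShared,
    /// Grace period elapsed: the node is deleted from the cluster state.
    Deleted,
}

/// Classifies a dead node from the time elapsed since its detected time of death.
/// Assumption: boundaries are half-open, so exactly 1h is `NotShared`
/// and exactly 2h is `Deleted`.
fn dead_node_stage(
    since_death: Duration,
    zombie_period: Duration,
    grace_period: Duration,
) -> DeadNodeStage {
    if since_death < zombie_period {
        DeadNodeStage::Zombie
    } else if since_death < grace_period {
        DeadNodeStage::NotShared
    } else {
        DeadNodeStage::Deleted
    }
}

fn main() {
    let zombie_period = Duration::from_secs(60 * 60); // 1 hour
    let grace_period = Duration::from_secs(2 * 60 * 60); // 2 hours, the value set by this commit
    for minutes in [30u64, 90, 150] {
        let stage = dead_node_stage(Duration::from_secs(minutes * 60), zombie_period, grace_period);
        // Prints Zombie, then NotShared, then Deleted.
        println!("{minutes} min after death: {stage:?}");
    }
}
```

With the previous 1-day grace period, the `NotShared` window in this sketch would have stretched from 1h to 24h after death, which is the lingering state the commit shortens.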
1 parent e6c5396 commit 00b22bf


1 file changed, 6 lines added, 1 line removed


quickwit/quickwit-cluster/src/lib.rs

Lines changed: 6 additions & 1 deletion
@@ -28,6 +28,7 @@ mod metrics;
 mod node;
 
 use std::net::SocketAddr;
+use std::time::Duration;
 
 use async_trait::async_trait;
 pub use chitchat::transport::ChannelTransport;
@@ -147,13 +148,17 @@ pub async fn start_cluster_service(node_config: &NodeConfig) -> anyhow::Result<C
         indexing_tasks,
         indexing_cpu_capacity,
     };
+    let failure_detector_config = FailureDetectorConfig {
+        dead_node_grace_period: Duration::from_secs(2 * 60 * 60), // 2 hours
+        ..Default::default()
+    };
     let cluster = Cluster::join(
         cluster_id,
         self_node,
         gossip_listen_addr,
         peer_seed_addrs,
         node_config.gossip_interval,
-        FailureDetectorConfig::default(),
+        failure_detector_config,
         &CountingUdpTransport,
     )
     .await?;
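The new `failure_detector_config` uses Rust's struct update syntax: `dead_node_grace_period` is overridden and every other field keeps its `Default` value. A self-contained sketch of that pattern follows. The struct here is a hypothetical stand-in, not chitchat's real `FailureDetectorConfig`: only `dead_node_grace_period` comes from the diff, `phi_threshold` and its value are illustrative, and the 1-day default is inferred from the commit message.

```rust
use std::time::Duration;

/// Hypothetical stand-in for chitchat's `FailureDetectorConfig`.
/// Only `dead_node_grace_period` is taken from the diff; `phi_threshold` is illustrative.
#[derive(Debug, Clone, PartialEq)]
struct FailureDetectorConfig {
    phi_threshold: f64,
    dead_node_grace_period: Duration,
}

impl Default for FailureDetectorConfig {
    fn default() -> Self {
        Self {
            phi_threshold: 8.0, // illustrative value
            // The previous 1-day grace period described in the commit message.
            dead_node_grace_period: Duration::from_secs(24 * 60 * 60),
        }
    }
}

fn main() {
    // Same shape as the diff: override one field, take the rest from `Default`.
    let failure_detector_config = FailureDetectorConfig {
        dead_node_grace_period: Duration::from_secs(2 * 60 * 60), // 2 hours
        ..Default::default()
    };
    let default_config = FailureDetectorConfig::default();
    // Fields not listed explicitly are copied from the default value.
    assert_eq!(failure_detector_config.phi_threshold, default_config.phi_threshold);
    // Prints "7200s vs 86400s".
    println!(
        "{}s vs {}s",
        failure_detector_config.dead_node_grace_period.as_secs(),
        default_config.dead_node_grace_period.as_secs()
    );
}
```

Compared with constructing every field by hand, `..Default::default()` keeps the call site stable if the library later adds fields to its config.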
