
fix(canal/prometheus): fix a defect in the delay metrics #5426


Open
wants to merge 1 commit into
base: master


@sunxien sunxien commented Apr 5, 2025

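- [Defect description]
After canal.properties is modified in the Admin console, the Server restarts automatically, and the PUT/GET/ACK delay of the sync instance observed in Grafana stays persistently high.

- [Root cause]
The source database heartbeat fires normally, MemoryStoreWithBuffer advances the position normally, and profiling statistics are collected normally, yet the delay metrics keep climbing. The exec time that Prometheus scrapes through the collect interface never changes. Debugging showed that the StoreMetricsHolder held in memory by the StoreCollector is no longer the same reference as the one in the CanalInstance: the CanalInstance was rebuilt as a new instance on restart, but its new StoreMetricsHolder was never saved into the collector's in-memory map. The cause is the call to Map.putIfAbsent, which leaves the existing entry in place when the key is already present.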
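The difference between the two Map calls can be reproduced in isolation. The following is only a minimal sketch, not canal code: `StaleHolderDemo`, `Holder`, and the `"example"` key are hypothetical stand-ins, and it assumes the old entry is still in the map when the rebuilt instance registers itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration; these are not the real canal classes.
class StaleHolderDemo {

    /** Minimal stand-in for a per-instance metrics holder. */
    static final class Holder {
        final String generation;

        Holder(String generation) {
            this.generation = generation;
        }
    }

    /** Registers with putIfAbsent: an existing entry for the key is kept. */
    static void registerKeepingOld(Map<String, Holder> registry, String destination, Holder holder) {
        registry.putIfAbsent(destination, holder);
    }

    /** Registers with put: an existing entry for the key is replaced. */
    static void registerReplacing(Map<String, Holder> registry, String destination, Holder holder) {
        registry.put(destination, holder);
    }

    public static void main(String[] args) {
        Map<String, Holder> registry = new ConcurrentHashMap<>();
        Holder beforeRestart = new Holder("before-restart");
        Holder afterRestart = new Holder("after-restart");

        // First start, then a restart that builds a new holder for the same key.
        registerKeepingOld(registry, "example", beforeRestart);
        registerKeepingOld(registry, "example", afterRestart);
        System.out.println(registry.get("example").generation); // prints "before-restart"

        // With put, the rebuilt holder takes over.
        registerReplacing(registry, "example", afterRestart);
        System.out.println(registry.get("example").generation); // prints "after-restart"
    }
}
```

In the first case the registry keeps serving the pre-restart holder, which matches the symptom of a scraped value that never changes.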

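- [Fix result]

  • After the fix, the heartbeat advances normally once the Server restarts, and the delay drops immediately. A review of the other Collector code shows that they all call Map.put; only this one uses putIfAbsent, which was probably an oversight. (image: fix result)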

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
