fix(canal/client-adapter): fix adapter plugin error-retry defect #5427

Open
wants to merge 2 commits into
base: master


@sunxien sunxien commented Apr 5, 2025

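-[Defect description]
  If the server or an instance is restarted once while the adapter is running, or a network-disconnect test is run, the adapter's fetches through the client start failing. It then prints error logs continuously and never recovers. The adapter processor does have retry logic for errors, but the client's TCP connection has already been broken and is not re-established.

-[Root cause]
  The adapter processor's error handling is too coarse. It needs to distinguish whether an error came from the upstream get message call or from the downstream sync data call.
  If get message failed, especially because the TCP connection dropped, the processor should break out of the retry loop and try to reconnect. If the downstream sync data failed, it can then retry the write. Downstream plugin developers are responsible for their own reconnection mechanism and for idempotent writes; the framework only provides the retry.

  • Note: this PR also updates the .gitignore file.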
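
The upstream/downstream distinction described above could look roughly like the following. This is only a minimal sketch, not the code in this PR: `Upstream`, `Downstream`, `Outcome`, and `processBatch` are hypothetical names invented for illustration and are not part of the canal client API.

```java
import java.io.IOException;

final class AdapterLoopSketch {
    /** Hypothetical upstream connection; NOT the canal client API. */
    interface Upstream {
        void connect() throws IOException;
        void disconnect();
        String fetch() throws IOException; // stands in for "get message"
        void ack();
    }

    /** Hypothetical downstream writer; assumed to be idempotent. */
    interface Downstream {
        void sync(String batch) throws Exception; // stands in for "sync data"
    }

    enum Outcome { SYNCED, UPSTREAM_RECONNECTED, UPSTREAM_DOWN, SYNC_GAVE_UP }

    static Outcome processBatch(Upstream up, Downstream down, int maxSyncRetries) {
        String batch;
        try {
            batch = up.fetch();
        } catch (IOException fetchError) {
            // Upstream failure: retrying on a dead connection cannot succeed,
            // so skip the write retry entirely and rebuild the connection.
            up.disconnect();
            try {
                up.connect();
                return Outcome.UPSTREAM_RECONNECTED;
            } catch (IOException connectError) {
                return Outcome.UPSTREAM_DOWN; // caller decides when to try again
            }
        }
        for (int attempt = 1; attempt <= maxSyncRetries; attempt++) {
            try {
                down.sync(batch);
                up.ack();
                return Outcome.SYNCED;
            } catch (Exception syncError) {
                // Downstream failure: the batch is already in hand, so retry the write.
            }
        }
        return Outcome.SYNC_GAVE_UP;
    }

    public static void main(String[] args) {
        Upstream alwaysDown = new Upstream() {
            public void connect() throws IOException { throw new IOException("server unreachable"); }
            public void disconnect() { }
            public String fetch() throws IOException { throw new IOException("connection reset"); }
            public void ack() { }
        };
        System.out.println(processBatch(alwaysDown, batch -> { }, 3));
    }
}
```

The point of the split is that a fetch failure never enters the write retry loop, while a write failure never triggers a reconnect to the upstream.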

sunxien added 2 commits April 5, 2025 21:05
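-[Defect description]
   After canal.properties is modified in the Admin console, the Server restarts automatically, and Grafana then shows the PUT/GET/ACK delay of the sync instance staying high.

-[Root cause]
   Source database heartbeats fire normally, MemoryStoreWithBuffer advances the position normally, and profiling statistics are collected normally, yet the delay metrics keep climbing. The exec time that Prometheus scrapes through the collect interface never changes. Debugging showed that the StoreMetricsHolder held in memory by the StoreCollector is no longer the one referenced by the CanalInstance. When the CanalInstance restarts it is rebuilt as a new instance, but its new StoreMetricsHolder is never stored in the collector's in-memory map. The cause is the call to Map.putIfAbsent.

-[Result of the fix]
- After the fix, heartbeats advance normally once the Server restarts and the delay drops immediately. The other Collector classes all call Map.put; only this one uses putIfAbsent, probably an oversight.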
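
The stale-reference behaviour described in this commit message can be reproduced with a plain map. The following is a minimal sketch under stated assumptions: `CollectorRegistrySketch` and `Holder` are hypothetical stand-ins for StoreCollector and StoreMetricsHolder, and keying the map by destination name is an assumption rather than something taken from the canal source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CollectorRegistrySketch {
    /** Hypothetical stand-in for StoreMetricsHolder; only object identity matters here. */
    static final class Holder {
        final String label;
        Holder(String label) { this.label = label; }
    }

    // Keyed by destination name; this keying is an assumption for illustration.
    private final Map<String, Holder> holders = new ConcurrentHashMap<>();

    /** Keeps any existing entry, so a holder registered before a restart is never replaced. */
    void registerIfAbsent(String destination, Holder holder) {
        holders.putIfAbsent(destination, holder);
    }

    /** Always overwrites, so the registry follows the rebuilt instance. */
    void register(String destination, Holder holder) {
        holders.put(destination, holder);
    }

    Holder lookup(String destination) {
        return holders.get(destination);
    }

    public static void main(String[] args) {
        CollectorRegistrySketch registry = new CollectorRegistrySketch();
        Holder beforeRestart = new Holder("before-restart");
        Holder afterRestart = new Holder("after-restart");

        registry.registerIfAbsent("example", beforeRestart);
        registry.registerIfAbsent("example", afterRestart); // silently ignored
        System.out.println(registry.lookup("example").label); // prints "before-restart"

        registry.register("example", afterRestart);
        System.out.println(registry.lookup("example").label); // prints "after-restart"
    }
}
```

With `putIfAbsent`, the second registration is a no-op, so anything scraped from the map keeps reading the pre-restart holder, which matches the frozen exec time described above.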
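-[Defect description]
  If the server or an instance is restarted once while the adapter is running, or a network-disconnect test is run, the adapter's fetches through the client start failing. It then prints error logs continuously and never recovers. The adapter processor does have retry logic for errors, but the client's TCP connection has already been broken and is not re-established.

-[Root cause]
  The adapter processor's error handling is too coarse. It needs to distinguish whether an error came from the upstream get message call or from the downstream sync data call.
  If get message failed, especially because the TCP connection dropped, the processor should break out of the retry loop and try to reconnect. If the downstream sync data failed, it can then retry the write. Downstream plugin developers are responsible for their own reconnection mechanism and for idempotent writes; the framework only provides the retry.

- Note: this commit also updates the .gitignore file.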
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
