[Feature] Support KV cache offloading and disagg prefill with LMCache connector. #12953
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Do we still need lmcache_vllm, or just lmcache?
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
1cff5c6 to 12b713e
The lmcache-vllm repo is not needed if this PR gets merged.
model_executable: torch.nn.Module,
model_input: "ModelInputForGPUWithSamplingMetadata",
kv_caches: List[torch.Tensor],
hidden_or_intermediate_states: Union[torch.Tensor,
Why not send hidden_or_intermediate_states to the remote cache?
Currently LMCache assumes that the user only stores KV caches. I guess the API can be extended, but that requires some API changes. @YaoJiayi does this align with what you are thinking?
This is correct :) For now, the last token will be re-prefilled in the disaggregated prefill path (see the sketch after this thread).
LGTM.
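To make the "re-prefill the last token" point above concrete, here is a minimal hedged sketch; the function and variable names are hypothetical and not from this PR. Because only KV caches (no hidden states) are transferred from the prefill side, the receiving side can treat at most all-but-the-last prompt token as cached and must recompute the remainder, so it still produces the hidden state needed to sample the first output token.

```python
# Hypothetical illustration only -- not the actual PR logic. Shows why the last
# prompt token is recomputed when only KV caches (no hidden states) arrive.
def split_for_reprefill(prompt_token_ids, num_cached_tokens):
    """Return (tokens covered by the received KV cache, tokens to re-prefill)."""
    # Never count the final prompt token as cached: its hidden state is needed
    # to sample the first output token, so it must be recomputed locally.
    usable = min(num_cached_tokens, len(prompt_token_ids) - 1)
    return prompt_token_ids[:usable], prompt_token_ids[usable:]


cached, to_prefill = split_for_reprefill([101, 102, 103, 104], num_cached_tokens=4)
assert to_prefill == [104]  # the last token is re-prefilled even on a full cache hit
```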
… connector. (vllm-project#12953) Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
LMCache (https://github.com/LMCache/LMCache/tree/dev) uses the `kv_transfer` interface to support both KV cache offloading and disaggregated prefill. The original interfaces `recv_kv_caches_and_hidden_states` and `send_kv_caches_and_hidden_states` in `kv_connector` are used as wrappers to call `lmcache_retrieve_kv` (which retrieves KV caches from local CPU, local disk, or remote storage into vLLM's paged memory) and `lmcache_store_kv` (which extracts KV caches from vLLM's paged memory to local CPU, local disk, or remote storage), respectively.
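As a rough illustration of the wrapping described above, the sketch below mimics the send/receive pattern with a toy in-memory dict standing in for LMCache's local-CPU, local-disk, and remote backends. Only the method names `recv_kv_caches_and_hidden_states` / `send_kv_caches_and_hidden_states` and the idea of wrapping `lmcache_retrieve_kv` / `lmcache_store_kv` come from this PR; every other name, signature, and the storage backend are assumptions for illustration.

```python
# Illustrative sketch, not the PR's actual connector. A dict stands in for
# LMCache's storage tiers; the real code moves KV between vLLM's paged GPU
# memory and local CPU / local disk / remote storage.
from typing import Dict, List, Optional, Tuple

import torch

_toy_store: Dict[str, List[torch.Tensor]] = {}  # stand-in for LMCache backends


class ToyLMCacheConnector:
    """Mimics the wrapper pattern: store/retrieve KV caches, never hidden states."""

    def send_kv_caches_and_hidden_states(
            self, request_id: str, kv_caches: List[torch.Tensor],
            hidden_or_intermediate_states: Optional[torch.Tensor]) -> None:
        # Analogue of lmcache_store_kv: copy KV out of "paged memory" (here,
        # simply clone to CPU). Hidden states are intentionally not persisted.
        _toy_store[request_id] = [kv.detach().to("cpu") for kv in kv_caches]

    def recv_kv_caches_and_hidden_states(
            self, request_id: str
    ) -> Tuple[Optional[List[torch.Tensor]], Optional[torch.Tensor]]:
        # Analogue of lmcache_retrieve_kv: pull KV back toward paged memory.
        # Hidden states are always None, so the caller re-prefills the last token.
        return _toy_store.get(request_id), None
```

In this toy version, a decode instance would call `recv_kv_caches_and_hidden_states(request_id)` on arrival; a cache miss returns `(None, None)` and the caller falls back to a normal prefill.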