llama.cpp
feature

Cache reuse

An advanced technique for reducing
prompt-processing time: chunks of the previous
context that reappear in the new prompt are
"shifted" to their new positions in the KV cache
instead of being recomputed.

# prompt 0 (cached)
AAABCCCCDDDEEFGGHHHIIIxxx

# prompt 1 (reuse from prompt 0)
AAACCCCEEGGIIIyyy
requires:
RoPE positional encoding
use cases:
Partial context updates, Thinking mode
> llama-server --cache-reuse 256 [...]
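
The chunk matching illustrated by the two prompts above can be sketched roughly as follows. This is a simplified illustration, not llama.cpp's actual code: the function name `plan_cache_reuse` is hypothetical, single characters stand in for tokens, and `min_chunk` is assumed to play the role of the value passed to `--cache-reuse` (the minimum chunk length worth shifting). The sketch only plans which chunks could be reused; a real server would also have to move the cached entries to their new positions, which is where the RoPE requirement comes in.

```python
def plan_cache_reuse(cached, prompt, min_chunk):
    """Plan which cached chunks a new prompt could reuse (illustrative only).

    Returns (n_reused, shifts), where shifts is a list of
    (old_pos, new_pos, length) tuples for chunks found after the common prefix.
    """
    # Step 1: the ordinary common prefix is reused without any shifting.
    n_prefix = 0
    while (n_prefix < len(cached) and n_prefix < len(prompt)
           and cached[n_prefix] == prompt[n_prefix]):
        n_prefix += 1

    shifts = []
    if min_chunk <= 0:
        # Assumption: a non-positive threshold means chunk reuse is disabled.
        return n_prefix, shifts

    # Step 2: scan the rest of the cache for runs that match the prompt.
    head_c = n_prefix  # read position in the cached tokens
    head_p = n_prefix  # write position in the new prompt
    while head_c < len(cached) and head_p < len(prompt):
        n_match = 0
        while (head_c + n_match < len(cached)
               and head_p + n_match < len(prompt)
               and cached[head_c + n_match] == prompt[head_p + n_match]):
            n_match += 1

        if n_match >= min_chunk:
            # Long enough: this chunk would be shifted from head_c to head_p.
            shifts.append((head_c, head_p, n_match))
            head_c += n_match
            head_p += n_match
        else:
            # Too short to be worth shifting: skip one cached token.
            head_c += 1

    n_reused = n_prefix + sum(length for _, _, length in shifts)
    return n_reused, shifts


if __name__ == "__main__":
    prompt_0 = "AAABCCCCDDDEEFGGHHHIIIxxx"  # cached
    prompt_1 = "AAACCCCEEGGIIIyyy"          # new request
    n_reused, shifts = plan_cache_reuse(prompt_0, prompt_1, min_chunk=2)
    print(n_reused, shifts)
    # 14 [(4, 3, 4), (11, 7, 2), (14, 9, 2), (19, 11, 3)]
```

With a threshold of 2, only the trailing `yyy` (3 of 17 tokens) would still need processing in this toy case. Raising the threshold drops the short `EE` and `GG` chunks, and because the prompt position is then stuck waiting for `EE`, the later `III` chunk is no longer picked up either. This is one reason the threshold is a tuning choice rather than a fixed constant.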