Cache reuse
An advanced technique that reduces
prompt-processing time by "shifting" chunks
of the previous context to new positions,
so that matching chunks are not recomputed.
# prompt 0 (cached)
AAABCCCCDDDEEFGGHHHIIIxxx
# prompt 1 (reuse from prompt 0)
AAACCCCEEGGIIIyyy
> llama-server --cache-reuse 256 [...]
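
The matching in the example above can be sketched in a few lines of Python. This is an illustrative model only, not the llama-server implementation: the function name `plan_reuse` and its return format are hypothetical, each letter stands for one token, and `min_chunk` is assumed to play the role of the `--cache-reuse` value (the minimum chunk size worth shifting). The sketch only plans which chunks could be reused; no KV cache is touched.

```python
def plan_reuse(cached, prompt, min_chunk):
    """Plan which chunks of `cached` can be reused for `prompt`.

    Returns (chunks, n_reused), where each chunk is a tuple
    (position in cached, position in prompt, length) and n_reused is
    the number of leading prompt tokens covered by reused chunks.
    Hypothetical helper for illustration only.
    """
    # The common prefix is reusable as-is (no shift needed).
    n = 0
    while n < len(cached) and n < len(prompt) and cached[n] == prompt[n]:
        n += 1
    chunks = [(0, 0, n)] if n else []

    head_c = head_p = n  # read heads into the cached and the new prompt
    while head_c < len(cached) and head_p < len(prompt):
        # Measure how far the two sequences agree from the current heads.
        m = 0
        while (head_c + m < len(cached) and head_p + m < len(prompt)
               and cached[head_c + m] == prompt[head_p + m]):
            m += 1
        if m >= min_chunk:
            # Long enough: this chunk would be shifted from head_c to head_p.
            chunks.append((head_c, head_p, m))
            head_c += m
            head_p += m
        else:
            # Too short to be worth shifting: skip one cached token.
            head_c += 1
    return chunks, head_p


cached = "AAABCCCCDDDEEFGGHHHIIIxxx"  # prompt 0 (cached)
prompt = "AAACCCCEEGGIIIyyy"          # prompt 1 (reuse from prompt 0)
chunks, n_reused = plan_reuse(cached, prompt, min_chunk=2)
for src, dst, length in chunks:
    print(f"{cached[src:src + length]}: cached pos {src} -> new pos {dst}")
print(f"reused {n_reused} tokens, {len(prompt) - n_reused} left to process")
```

With `min_chunk=2`, the sketch reuses AAA, CCCC, EE, GG, and III (14 tokens) and leaves only `yyy` to process. Raising `min_chunk` to 4 keeps only the prefix and CCCC, which illustrates the trade-off behind the threshold: a larger value skips small matches that may not be worth shifting.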