KV-psi, using Linux PSI to trim an LLM KV cache
AI Summary
KV-psi uses Linux Pressure Stall Information (PSI) to dynamically trim an LLM's KV cache during inference, targeting memory-constrained edge devices with unified memory such as the Jetson Orin. It is aimed at developers deploying large language models on resource-limited hardware, where memory pressure can cause system stalls. Its distinguishing idea is to drive real-time cache eviction decisions from OS-level memory pressure signals rather than model-level heuristics, which could improve throughput and stability on unified memory systems.
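To make the idea concrete, here is a minimal sketch of reading memory pressure and turning it into a cache budget. It is not taken from KV-psi. The `/proc/pressure/memory` file and its `some`/`full` lines with `avg10`, `avg60`, `avg300`, and `total` fields are the standard Linux PSI interface; the `target_cache_tokens` policy, its thresholds, and all function names are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Standard PSI file on Linux kernels built with PSI support.
PSI_MEMORY_PATH = Path("/proc/pressure/memory")


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_memory_pressure(path: Path = PSI_MEMORY_PATH) -> dict[str, dict[str, float]]:
    """Read and parse the current memory pressure (Linux only)."""
    return parse_psi(path.read_text())


def target_cache_tokens(some_avg10: float, max_tokens: int,
                        low: float = 5.0, high: float = 40.0,
                        min_tokens: int = 256) -> int:
    """Hypothetical policy, not KV-psi's: shrink the KV cache token budget
    linearly as the 10-second 'some' pressure rises from `low` to `high`."""
    min_tokens = min(min_tokens, max_tokens)
    if some_avg10 <= low:
        return max_tokens
    if some_avg10 >= high:
        return min_tokens
    frac = (some_avg10 - low) / (high - low)
    return int(max_tokens - frac * (max_tokens - min_tokens))


if __name__ == "__main__":
    # Sample text in the PSI format, so the sketch also runs off Linux.
    sample = (
        "some avg10=22.50 avg60=1.00 avg300=0.10 total=12345\n"
        "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
    )
    psi = parse_psi(sample)
    print(target_cache_tokens(psi["some"]["avg10"], max_tokens=4096))
```

An inference loop could call `read_memory_pressure()` between decoding steps and evict cache entries down to the returned budget. Which entries to evict is a separate decision that this sketch leaves open.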
Cross-platform signals
Hacker News: 8 points, 0 comments
Updated 2026-07-01