KV-psi, using Linux PSI to trim an LLM KV cache

Tracked since 2026-06-28

AI Summary

KV-psi uses Linux Pressure Stall Information (PSI) to trim an LLM's KV cache dynamically during inference, targeting memory-constrained edge devices with unified memory such as the Jetson Orin. It is aimed at developers deploying large language models on resource-limited hardware, where memory pressure can stall the whole system. What sets the project apart is that it bases real-time cache eviction decisions on OS-level memory pressure signals rather than model-level heuristics, which could improve throughput and stability on unified memory systems.
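
To make the idea of a pressure-driven trim concrete, here is a minimal sketch of reading the kernel's memory PSI file and turning it into a cache budget. The file path and line format come from the Linux kernel's PSI interface; everything else is an assumption for illustration and is not taken from KV-psi itself: the function names `parse_psi` and `target_cache_tokens`, the 10% threshold, the halving policy, and the 256-token floor are all hypothetical.

```python
from pathlib import Path

# Kernel PSI interface for memory (requires a kernel built with PSI enabled).
PSI_MEMORY_PATH = Path("/proc/pressure/memory")


def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}}.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    avg* values are percentages of time stalled; total is in microseconds.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        entry = {}
        for field in fields:
            key, value = field.split("=", 1)
            entry[key] = int(value) if key == "total" else float(value)
        result[kind] = entry
    return result


def target_cache_tokens(current_tokens, some_avg10,
                        threshold=10.0, keep_fraction=0.5, min_tokens=256):
    """Hypothetical policy: shrink the KV cache when pressure crosses a threshold.

    Below the threshold the cache is left alone. At or above it, keep a
    fraction of the tokens, never dropping below min_tokens and never
    growing past the current size.
    """
    if some_avg10 < threshold:
        return current_tokens
    trimmed = max(min_tokens, int(current_tokens * keep_fraction))
    return min(current_tokens, trimmed)


if __name__ == "__main__":
    # Only runs on a Linux host that exposes PSI.
    if PSI_MEMORY_PATH.exists():
        psi = parse_psi(PSI_MEMORY_PATH.read_text())
        print("some avg10:", psi["some"]["avg10"])
        print("token budget:", target_cache_tokens(4096, psi["some"]["avg10"]))
```

A real implementation would also have to decide which cache entries to evict and might prefer PSI trigger notifications over polling; this sketch only covers the signal-to-budget step.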

Updated 2026-07-01