A small Llama 3 model running in Docker on private hardware. It answers from a little library of reference notes (retrieval-augmented generation) and cites what it used — so it stays grounded instead of guessing.
Answers come from the reference notes when relevant, with sources shown. Runs on CPU, so replies take a few seconds.