This week was a little lighter in terms of workload. With the lull in assignments, I decided to spend some time looking into running an LLM locally.
In the last few months, there have been many developments in the AI space aimed at making LLMs smaller and more efficient. Recently, companies like Alibaba, Google, Mistral, and DeepSeek have released open-source LLMs for public use, and I want to see how well they run on consumer hardware.
For what it’s worth, I have a fairly mid-range computer by modern standards. With an NVIDIA RTX 3060 and 32 GB of RAM, I can optimistically hope for 1-2 tokens per second, but that should be OK. I will be using Ollama to run the most recent Devstral Small model. If that ends up working well, I will eventually benchmark it against other models.
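To check that 1-2 tokens-per-second guess once things are installed, here is a minimal sketch of how I plan to measure generation speed. It assumes the Ollama server is running locally on its default port (11434) and that the model has already been pulled; the model tag and the response field names are based on my reading of the Ollama API docs, so treat them as assumptions rather than gospel.

```python
import requests

# Assumes the Ollama server is running locally on its default port
# and the model has already been pulled (e.g. via `ollama pull`).
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "devstral"  # placeholder tag; check `ollama list` for the exact name


def measure_tokens_per_second(prompt: str) -> float:
    """Send one non-streaming prompt and estimate generation speed."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=600,  # local generation at 1-2 tokens/s can take a while
    )
    response.raise_for_status()
    data = response.json()

    # Ollama reports eval_count (generated tokens) and eval_duration
    # (nanoseconds) in its response metadata.
    tokens = data["eval_count"]
    seconds = data["eval_duration"] / 1e9
    return tokens / seconds


if __name__ == "__main__":
    speed = measure_tokens_per_second("Write a Python function that reverses a string.")
    print(f"Roughly {speed:.2f} tokens per second")
```

If the real numbers land anywhere near that estimate, this is the kind of harness I would reuse to benchmark other models later.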
Getting into the full installation and usage of a locally run LLM may be a little too much for this post, so I’ll wrap up by mentioning that there are many alternatives to the Ollama and Devstral setup I have chosen.
In short, I chose Ollama because I want a GUI for interacting with whatever model I end up using. There are other open-source alternatives, like llama.cpp, but I don’t want to have to interact with my AI coding agent via the command line. Once I get things up and running, I will have more to share about the user experience of running these kinds of models.