Using LLMs on Your Phone Locally in Three Easy Steps
Recently, I’ve been exploring ways to run Large Language Models (LLMs) locally on my phone, and I’m excited to share what I’ve learned. With the latest mobile devices packing increasingly powerful hardware, running your own AI models on-device has become a reality. The recent release of smaller but capable Llama 3.2 models makes it possible to run them on both iOS and Android. This is particularly useful for tasks like drafting emails when you don’t have internet access or have privacy concerns about using cloud-based AI services.
iOS Users
If you’re using Apple devices like me, setting up a local LLM is surprisingly straightforward. Just head to the App Store and download “LLM Farm” — it’s an impressive local LLM runner developed by Artem (Link) and it’s completely open-source! While Apple’s own AI capabilities are on the horizon, using open-source models might be the way to go if you want to stay current with new small models or if privacy is a priority for you.
After downloading the app, simply open it and download Llama 3.2 from the model download menu. You can choose other models if you like, but I think Llama 3.2 is a solid choice.

Once the model is downloaded, go back to the main menu and tap the “+” symbol to add the model. You need to set it up by choosing the options (just the template and model) as shown below:

Now, you can start using it immediately for tasks like email composition.

Android Users
For Android devices, the setup process is a bit more involved, but it gives you complete control over your data, models, and chat interface. You’ll need a relatively recent device — I’m using my Pixel 7 Pro with 12GB of RAM and a capable processor, which handles the 3B model smoothly. Generally, you’ll want at least 6GB of RAM to run the Llama 3.2 3B model effectively. If you have less RAM or want faster inference, try Llama 3.2 1B instead. The good news is that you don’t need a rooted device — any regular phone will do!
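Not sure how much RAM your phone has? Once Termux is installed (step 2 below), a quick way to check is to read the standard Linux /proc interface:
head -3 /proc/meminfo
The MemTotal line is your device’s total RAM in kilobytes.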
Setup Steps:
1. Download a chat UI app like ConfiChat (Link), which offers a sleek and intuitive interface for interacting with your local LLM. Just head to the GitHub page here and go to the “Releases” section to download the pre-built Android package (APK), then install it on your device.

2. Similarly, install a terminal emulator like Termux (Link). You don’t need to grant it extensive permissions — the default access is sufficient. Once installed, run these commands:
*Before running the commands below, make sure you have disabled the auto-dim/screen-lock functions on your phone (or see the wake-lock tip right after the commands)!
pkg install proot-distro
pd install archlinux
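Alternatively, Termux ships a small wake-lock helper (in recent builds, as far as I can tell) that keeps the device awake during long installs, so you don’t have to touch your screen settings:
termux-wake-lock
You can release it afterwards with termux-wake-unlock.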


Depending on your network and device, you may need to wait around five minutes for the installation to complete. After installing the Arch Linux environment, you can log in by simply typing:
pd login archlinux
Remember, every time you want to use Ollama (e.g., after closing Termux or rebooting your phone), you need to log in to the Linux environment first!
3. Install Ollama (check it out here), a fantastic tool for managing and running LLMs locally:
curl -fsSL https://ollama.com/install.sh | sh
This process may also take a few minutes. Make sure that your device is unlocked and fully charged.
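Once the script finishes, you can quickly confirm the binary is on your PATH by printing its version:
ollama --version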

Then start Ollama and download the model:
ollama serve &
ollama pull llama3.2

If you want to check whether the model is ready, or to see which models you’ve already downloaded in Ollama, try this:
ollama list

That’s it! Once the model is ready, just switch back to ConfiChat and start chatting. If new models become available on Ollama in the future, you can easily download them using the same process in step 3.
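For example, if you’d rather grab the lighter 1B variant mentioned earlier, the pull works the same way (assuming the model is published under this tag in the Ollama library):
ollama pull llama3.2:1b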

Now that we’ve completed the above three steps, the next time we start our phone we can head straight to Termux and type the commands below to start Ollama:
pd login archlinux
ollama serve &
After executing these two commands, we can switch back to ConfiChat.
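As a small shortcut (a sketch based on proot-distro’s documented “--” pass-through; double-check it against your installed version), you can log in and launch the server in a single line:
pd login archlinux -- ollama serve &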
Important Considerations
Keep in mind that the LLM’s performance depends heavily on your device’s CPU and RAM. Since all inference is done using CPU (even on iOS devices), don’t expect blazing-fast performance. In my experience, these small models excel at simple tasks like drafting emails and constructing basic sentences — they’re not meant for complex operations. However, with the potential implementation of Apple’s CoreML in the future, local LLM performance could improve dramatically.
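If you’re curious how fast your own device actually is, Ollama can report timing statistics for a single prompt; the --verbose flag on ollama run should print them after the response (the prompt here is just an example, any short text works):
ollama run llama3.2 "Draft a two-sentence email." --verbose
The eval rate shown at the end (tokens per second) is a decent proxy for how responsive chatting will feel.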
The ability to run LLMs locally on our phones marks an exciting step forward in making AI more accessible and private. While these local solutions might not match the power of cloud-based services, they offer a practical alternative for basic tasks while keeping your data completely under your control. As mobile hardware continues to advance and models become more efficient, I expect we’ll see even more impressive capabilities in the near future.