Why Smaller AI Models Are Becoming More Relevant
Even as tech companies continue to scale massive cloud-based AI models, smaller and more efficient systems are quietly reshaping how AI is deployed in everyday devices.
While large AI models like GPT-4.5, Claude 3.5, and Gemini 1.5 Pro continue to set new benchmarks, there’s growing interest in smaller, more efficient models that can run locally—on devices like laptops, smartphones, and even microcontrollers.
This shift reflects practical considerations around cost, privacy, and real-time performance.
What Are “Small” or “Tiny” AI Models?
Small language models (SLMs) are scaled-down versions of larger AI systems. They typically contain under 10 billion parameters and are optimized for speed, efficiency, and local deployment.
Recent examples as of mid-2025 include:
Phi-3 Mini (Microsoft): Compact, high-performing, optimized for inference speed.
Gemini Nano (Google): Built into recent Android releases and Pixel devices for on-device features.
Llama 3 8B (Meta): Open weights, actively used for research and mobile applications.
Mistral 7B and Mixtral 8x7B (Mistral AI): A dense model and a sparse mixture-of-experts variant, both performant on consumer hardware.
Reka Edge (Reka): A compact multimodal model emerging as a strong choice for lightweight edge applications.
These models are often embedded directly into apps and services, removing the need for constant access to cloud infrastructure.
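To make this concrete, here is a minimal sketch of running one of these models locally with Hugging Face Transformers. It assumes a recent transformers release (Phi-3 gained native support around v4.41) plus torch and accelerate installed; the model ID and generation settings are illustrative, not the only choices.

```python
# Minimal local-inference sketch with Hugging Face Transformers.
# Assumes: transformers >= 4.41 (native Phi-3 support), torch, accelerate.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # ~3.8B parameters
    device_map="auto",  # uses a GPU if available, otherwise falls back to CPU
)

prompt = "In one sentence, why does on-device AI matter?"
result = generator(prompt, max_new_tokens=60, do_sample=False)
print(result[0]["generated_text"])
```

Nothing here touches the network after the initial model download, which is exactly the property the deployment scenarios below depend on.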
Why Smaller Models Are Gaining Momentum
Data Privacy
On-device AI keeps user data local, making it suitable for healthcare, finance, and other regulated industries.
Lower Cost of Deployment
Cloud inference, especially with large models, is expensive. Smaller models can run on CPUs or consumer-grade GPUs, and quantization shrinks them further (see the sketch after this list).
Faster Response Times
Local inference avoids round-trip latency. This is especially important for wearables, automotive, and real-time control systems.
Better Energy Efficiency
Smaller models reduce carbon footprint and hardware demands, which aligns with sustainable tech goals and corporate ESG initiatives.
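A large part of the cost and energy story is quantization: storing weights in 4-bit precision cuts weight memory roughly 4x versus fp16, which is often the difference between fitting on a laptop GPU and not. Below is a minimal sketch using the bitsandbytes integration in Transformers; it assumes a CUDA GPU with the bitsandbytes package installed, and the model ID is again illustrative.

```python
# 4-bit quantized loading sketch (assumes a CUDA GPU and bitsandbytes installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-4k-instruct"  # illustrative choice

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16 for quality
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Edge AI is growing because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```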
Key Use Cases in 2025
Mobile Devices: Gemini Nano now powers Android’s smart replies, transcription, and on-device summarization.
Wearables: AI-powered fitness trackers offer real-time coaching using lightweight models.
Education: Offline tutoring apps in emerging markets use small models to provide consistent access.
Customer Support: Companies are fine-tuning small models for brand-specific virtual assistants.
Software Development: Local code completion tools (e.g., using StarCoder2 variants) are widely adopted by privacy-conscious teams; a minimal sketch follows below.
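As an illustration of that last use case, the sketch below runs a small StarCoder2 checkpoint locally, so source code never leaves the machine. The checkpoint name is one published option (transformers gained StarCoder2 support around v4.39); treat this as a sketch rather than a product integration.

```python
# Local code-completion sketch with a small code model.
# Assumes: transformers >= 4.39 (StarCoder2 support), torch, accelerate.
from transformers import pipeline

completer = pipeline(
    "text-generation",
    model="bigcode/starcoder2-3b",  # 3B-parameter code model for consumer hardware
    device_map="auto",
)

snippet = "def fibonacci(n: int) -> int:\n"
completion = completer(snippet, max_new_tokens=48, do_sample=False)
print(completion[0]["generated_text"])
```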
Limitations and Trade-Offs
Lower Accuracy on Complex Tasks: Small models tend to underperform on open-ended reasoning or nuanced dialogue.
Fewer Modalities: Most are still primarily text-based, though multimodal variants are emerging.
Less General Knowledge: Smaller training budgets and parameter counts limit how much factual knowledge these models retain.
Still, the trade-offs are often acceptable—especially in contexts where speed, control, and data protection are more important than creative output.
Tools to Experiment With
Ollama – Runs quantized models locally and can quantize models on import for faster local use; a usage sketch follows after this list.
LM Studio – Desktop interface for managing and testing local LLMs.
Hugging Face Transformers – Continues to expand with fine-tuned small models.
Apple Core ML + Transformers – Apple's Core ML toolchain supports on-device transformer inference for developers.
Reka's SDKs – Focused on high-efficiency multimodal interaction in edge environments.
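As a quick start with the first tool above, Ollama exposes a local REST API on port 11434 once the daemon is running. The sketch below assumes you have already run `ollama pull phi3` (the model tag is illustrative) and uses only the Python standard library.

```python
# Query a locally running Ollama server (default: http://localhost:11434).
# Assumes: the Ollama daemon is running and `ollama pull phi3` has completed.
import json
import urllib.request

payload = {
    "model": "phi3",  # illustrative tag; any pulled model works
    "prompt": "Give one advantage of running AI models on-device.",
    "stream": False,  # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

Because the call never leaves localhost, the same pattern works offline and behind strict firewalls.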
Final Thoughts
As of August 2025, small AI models are no longer niche—they’re a core part of many production systems. Their ability to run privately, quickly, and affordably is helping democratize access to AI-powered tools.
For developers, researchers, and product designers, learning to work with these models—whether for edge deployment, fine-tuning, or hybrid cloud/offline experiences—is becoming a valuable skillset in its own right.
📚 Bibliography
Security Boulevard (June 4, 2025). Google Unveils Gemini Nano for On-Device AI in Android Apps. https://securityboulevard.com/2025/06/google-unveils-gemini-nano-for-on-device-ai-in-android-apps/
Android Developers (May 20, 2025). Gemini Nano | AI. https://developer.android.com/ai/gemini-nano
Microsoft News (April 23, 2024). Tiny but mighty: The Phi‑3 small language models with big potential. https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/
Microsoft Azure Blog (July 25, 2024). Announcing Phi-3 fine-tuning, new generative AI models, and other Azure AI updates to empower organizations to customize and scale AI applications. https://azure.microsoft.com/en-us/blog/announcing-phi-3-fine-tuning-new-generative-ai-models-and-other-azure-ai-updates-to-empower-organizations-to-customize-and-scale-ai-applications/
Hugging Face (May 1, 2025). microsoft/Phi-3-mini-128k-instruct. https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
Meta AI (April 18, 2024). Introducing Meta Llama 3: The most capable openly available LLM to date. https://ai.meta.com/blog/meta-llama-3/
Meta (July 23, 2024). Open Source AI is the Path Forward. https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/
Wikipedia (April 2025 update). Llama (language model). https://en.wikipedia.org/wiki/Llama_(language_model)
Wikipedia (2025). Mistral AI (covering the March 17, 2025 Mistral Small 3.1 release). https://en.wikipedia.org/wiki/Mistral_AI