Open-Source AI Takes the Lead: From Mathematical Reasoning to Multimodal Intelligence
The AI landscape is buzzing with a wave of new open-source model and tool releases, signaling a rapid acceleration in real-world capabilities. From specialized mathematical reasoning to web security and coding assistance, these innovations are expanding access to advanced AI building blocks for developers and teams worldwide.
Key takeaways
- Mathematical prowess: Nous Research’s Nomos 1 reports standout performance on a Putnam-style math evaluation, highlighting how specialized training can outperform sheer scale.
- Web security enhanced: Anubis introduces an open-source “AI firewall” approach that adds computational friction against bot-driven scraping and abuse.
- Coding and reasoning gains: Models like GLM-4.7 and MiMo-V2-Flash are pushing open-source coding + multi-step reasoning forward, with agent-style workflows in mind.
- Multimodal leap: ERNIE-4.5-VL-28B-A3B-Thinking brings efficient multimodal reasoning and “thinking with images” behaviors into the open-source arena.
Mathematical reasoning breakthroughs
Nous Research has unveiled Nomos 1, an open-source AI system that reports a strong score on the 2024 William Lowell Putnam Mathematical Competition — often used as a proxy for deep mathematical reasoning under pressure.
Built on top of Alibaba’s Qwen-family foundation, Nomos 1 underscores a key trend in 2025: specialized post-training and reasoning techniques are becoming as important as model size.
Nomos 1 also highlights a system-level approach to reasoning: a two-phase process — a solving phase with parallel attempts and self-scoring, followed by a finalization phase that consolidates the best outputs. The release of both the model and its evaluation harness lowers the barrier for serious experimentation in mathematical reasoning and complex system modeling.
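To make the two-phase idea concrete, here is a minimal Python sketch of a "solve in parallel, self-score, then finalize" loop. It is not the Nomos 1 harness: `solve_once` is a hypothetical stand-in for a model call, the scores are pseudo-random placeholders, and the finalization step simply keeps the top-scored attempt, where a real system would likely compare the shortlisted candidates more carefully.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Attempt:
    answer: str
    score: float  # self-assigned confidence in [0, 1]


def solve_once(problem: str, seed: int) -> Attempt:
    """Hypothetical stand-in for one model attempt plus its self-score."""
    rng = random.Random(f"{problem}:{seed}")  # deterministic placeholder score
    return Attempt(answer=f"candidate-{seed}", score=rng.random())


def solving_phase(problem: str, n_attempts: int = 8) -> list[Attempt]:
    """Phase 1: run independent attempts in parallel and collect them."""
    with ThreadPoolExecutor(max_workers=n_attempts) as pool:
        return list(pool.map(lambda s: solve_once(problem, s), range(n_attempts)))


def finalization_phase(attempts: list[Attempt], keep: int = 3) -> Attempt:
    """Phase 2: shortlist the best self-scored attempts and return one answer."""
    shortlist = sorted(attempts, key=lambda a: a.score, reverse=True)[:keep]
    return shortlist[0]


if __name__ == "__main__":
    attempts = solving_phase("toy problem", n_attempts=6)
    print(finalization_phase(attempts))
```

The useful property of this shape is that the number of attempts and the size of the shortlist become tunable knobs, so extra compute can be spent on harder problems without changing the model.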
Fortifying the web with AI firewalls
In cybersecurity, TecharoHQ’s Anubis offers a fresh open-source take on defending websites from scraping and abusive traffic.
Anubis works as a reverse proxy that introduces a lightweight computational challenge before granting access. The goal is simple: make automated scraping expensive at scale, while keeping the friction minimal for real users.
It’s configurable, allowing operators to tune difficulty and carve out exceptions for trusted services — a pragmatic approach for anyone balancing protection with usability.
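One common way to build this kind of computational challenge is a hash-based proof of work, sketched below in Python. This illustrates the general technique under stated assumptions, not Anubis’s actual code or configuration format: the function names, the `TRUSTED_AGENTS` allowlist, and the "leading zero hex digits" difficulty rule are all hypothetical choices made for the example.

```python
import hashlib
import secrets
from itertools import count

# Hypothetical allowlist of clients that skip the challenge entirely.
TRUSTED_AGENTS = {"internal-healthcheck"}


def issue_challenge() -> str:
    """Server side: hand the client a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def is_valid(challenge: str, nonce: int, difficulty: int) -> bool:
    """Cheap check: the hash must start with `difficulty` zero hex digits."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce; the cost grows ~16x per difficulty step."""
    for nonce in count():
        if is_valid(challenge, nonce, difficulty):
            return nonce


def allow_request(user_agent: str, challenge: str, nonce, difficulty: int = 3) -> bool:
    """Let trusted agents through; everyone else must present a valid nonce."""
    if user_agent in TRUSTED_AGENTS:
        return True
    return nonce is not None and is_valid(challenge, nonce, difficulty)


if __name__ == "__main__":
    challenge = issue_challenge()
    nonce = solve(challenge, difficulty=3)
    print(allow_request("browser", challenge, nonce))
```

The asymmetry is the point: verifying a solution takes one hash, while finding one takes thousands, which is negligible for a single visitor but adds up quickly for a crawler requesting pages in bulk.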
Elevating coding and general reasoning
Open-source LLMs aren’t just catching up; they’re specializing.
GLM-4.7 (Zhipu AI / Z.AI) is positioned as a foundation model optimized for deeper reasoning, better structured outputs, and stronger coding performance — all qualities that matter when you want models to behave less like “chat” and more like execution-oriented agent brains.
MiMo-V2-Flash (Xiaomi) follows a similar trajectory: efficiency-first design, fast inference, and a focus on coding, reasoning, and agentic scenarios. Xiaomi’s messaging emphasizes “high performance per cost,” suggesting a future where capability isn’t locked behind heavyweight deployments.
Together, these releases reinforce a broader shift: open-source models are increasingly built to be used inside workflows, not just inside chats.
Multimodal AI takes a leap forward
Baidu’s ERNIE-4.5-VL-28B-A3B-Thinking brings multimodal intelligence (text + images) into a more efficient, open release format.
The model uses a Mixture-of-Experts (MoE) design that activates only a subset of its parameters for each token (the “A3B” in the name indicates roughly 3B active parameters out of 28B total), aiming to preserve quality while improving efficiency.
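The routing idea behind a Mixture-of-Experts layer can be shown with a toy Python sketch. It assumes a simple top-k router over scalar "experts"; the function names, the choice of k = 2, and the softmax weighting are illustrative assumptions and say nothing about how ERNIE’s router is actually implemented.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and blend their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


if __name__ == "__main__":
    # Four toy experts; only two of them run for this input.
    experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
    print(moe_forward(1.0, experts, router_logits=[0.1, 2.0, -1.0, 2.0]))
```

Because the unselected experts are never called, compute per input scales with k rather than with the total number of experts, which is where the efficiency gain comes from.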
One particularly interesting capability: “Thinking with Images” — behavior that mimics how humans examine visuals by zooming into details and shifting attention dynamically. That opens up stronger agent workflows for tasks like diagram understanding, screenshot analysis, product research, and visual QA.
So what comes next?
If the past year was about releasing better models, 2025 is about putting them to work.
The challenge isn’t just tracking new open models — it’s connecting models + tools into usable, repeatable systems: agents, workflows, and applications that reliably execute.
The real winners won’t be the teams with the longest model list, but the ones who can turn models + tools into reliable agents.
If you want a practical place to start, try building a small agent workflow in Shinkai (web or desktop) and swap models as you iterate. It’s designed for builders who want to turn model capability into execution, with the option to keep privacy and control as first-class constraints.
🐙 Your AI. Your Rules.