9.0.0 — The Local AI & Extensibility Update

We've introduced full, production-ready support for local AI via Ollama, allowing you to keep your data entirely on your machine. We've also integrated the Model Context Protocol (MCP), so the agent can safely interact with external databases, APIs, and local services. Finally, we've rebuilt the chat interface for speed and stability.

This was genuinely painful to implement with all the security and robustness work it required, but I hope it was worth it. I still think cloud models are the better choice for most use cases, but the option is now there. MCP is useful for everyone, though I added it mainly because Ollama does not support the search and code tools that our default Gemini models do.

Local Models (Ollama)

This release brings full, production-ready support for local models via Ollama.

  • Reliable JSON mode and streaming: The agent now uses a custom JSON lexer to parse local model outputs. This means complex logic and tool calls work more reliably, and responses stream token-by-token.
  • Custom dimensions: For local embedding models (like nomic-embed-text), you can now explicitly set the embedding dimension.
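To illustrate the kind of problem a custom JSON lexer solves: local models often emit tool calls as JSON embedded in free-form streamed text, so the parser must track brace depth and string state to know when a complete object has arrived. The sketch below is a minimal, hypothetical version of that idea, not the plugin's actual lexer.

```python
import json

class StreamingJSONExtractor:
    """Accumulates streamed text and yields each complete top-level
    JSON object as soon as its closing brace arrives."""

    def __init__(self):
        self.buffer = ""        # all text seen so far
        self.pos = 0            # next index to scan
        self.depth = 0          # current brace nesting level
        self.start = None       # index where the current object began
        self.in_string = False  # inside a JSON string literal?
        self.escape = False     # previous character was a backslash?

    def feed(self, chunk):
        """Add a streamed chunk; return any newly completed JSON objects."""
        self.buffer += chunk
        found = []
        while self.pos < len(self.buffer):
            ch = self.buffer[self.pos]
            if self.escape:
                self.escape = False
            elif self.in_string:
                if ch == "\\":
                    self.escape = True
                elif ch == '"':
                    self.in_string = False
            elif ch == '"':
                self.in_string = True
            elif ch == "{":
                if self.depth == 0:
                    self.start = self.pos
                self.depth += 1
            elif ch == "}":
                self.depth -= 1
                if self.depth == 0 and self.start is not None:
                    raw = self.buffer[self.start:self.pos + 1]
                    found.append(json.loads(raw))
                    self.start = None
            self.pos += 1
        return found
```

Because the extractor carries its state between chunks, a tool call split across two network packets still parses correctly, which is exactly the failure mode that makes naive `json.loads` on partial streams unreliable.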

Model Context Protocol (MCP) Integration

The Researcher agent now supports MCP. This lets you connect the agent to any standard MCP server.

  • Connect to external tools: Whether using local execution (stdio) or remote URLs (streamable_http), you can query SQL databases, read external wikis, or access local files directly from the chat context.
  • Security-first execution: MCP connections support custom HTTP headers for authentication. More importantly, remote tool execution is gated behind a "Trust but Verify" hash check: tools cannot run until you explicitly approve them, which effectively prevents zero-click execution vulnerabilities.
  • Dynamic resources: The agent can explicitly query connected MCP servers to inject specific resources into its context when needed.
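The "Trust but Verify" gate can be sketched as hashing each tool's definition at approval time and refusing to execute if the definition later changes. The hashing scheme below (SHA-256 over canonical JSON) and the class names are assumptions for illustration, not the plugin's actual implementation.

```python
import hashlib
import json

def tool_fingerprint(tool_def: dict) -> str:
    """Stable hash of a tool definition (name, description, schema)."""
    canonical = json.dumps(tool_def, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class TrustStore:
    def __init__(self):
        self.approved: dict[str, str] = {}  # tool name -> approved hash

    def approve(self, tool_def: dict) -> None:
        """Record that the user explicitly approved this exact definition."""
        self.approved[tool_def["name"]] = tool_fingerprint(tool_def)

    def is_trusted(self, tool_def: dict) -> bool:
        """True only if the definition matches what was approved."""
        return self.approved.get(tool_def["name"]) == tool_fingerprint(tool_def)
```

The practical benefit: a server that silently rewrites an already-approved tool's description or schema (a "rug pull" attack) fails the fingerprint check and must be re-approved before it can run again.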

Speed and UI

  • Native Gemini 3.1 Grounding: The main agent loop now natively supports Gemini 3.1's google_search handling. Web search and citations happen directly in the main response stream rather than requiring a secondary workflow, which makes responses noticeably faster.
  • Flicker-free streaming: We rewrote the UI renderer. Characters now stream directly onto the screen, and heavy Markdown formatting is only applied once the response finishes, eliminating the layout thrashing seen in older versions.
  • Real-time status: The chat shows exactly what the agent is doing in the background (e.g., "Thinking...", "Searching..."). You can also hit a "Stop" button to instantly cancel long-running agent responses or recursive tool loops.
  • Interactive links: Internal wikilinks (e.g., [[Note Name]]) generated by the AI are now clickable right inside the chat.
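The status and Stop behaviour above amounts to a cancellation-aware streaming loop: check a cancel flag between tokens, report status transitions to the UI, and append raw text during the stream (deferring heavy formatting until the end). A minimal sketch, with hypothetical callback names:

```python
import threading

def run_agent(token_stream, on_status, on_token, cancel: threading.Event):
    """Stream tokens to the UI, reporting status and honouring Stop."""
    on_status("Thinking...")
    text = []
    for token in token_stream:
        if cancel.is_set():       # "Stop" pressed: abort between tokens
            on_status("Cancelled")
            return "".join(text)
        on_token(token)           # append raw text now; format later
        text.append(token)
    on_status("Done")
    return "".join(text)
```

Checking the flag between tokens, rather than killing a thread mid-call, is what lets a Stop press also break out of recursive tool loops cleanly.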

Quality of Life Improvements

  • Model filtering: You can hide specific models from the dropdown menus in the Advanced Settings to keep the list relevant to what you actually use.
  • Granular web search control: An explicit toggle lets you completely disable Google search integration for a simpler offline workflow.
  • Context assembly: Context window budgets are now configured per model, and the assembler ensures a single large document cannot consume the entire available context budget.
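Budgeted context assembly can be sketched as two limits applied in order: a per-model token budget, and a cap on how much of that budget any single document may consume. The budgets, the 50% cap, and the chars-per-token estimate below are all assumptions for illustration.

```python
# Hypothetical per-model budgets; real values are configured in settings.
MODEL_BUDGETS = {"gemini-pro": 32_000, "local-llama": 8_000}
MAX_DOC_SHARE = 0.5  # one document may use at most half the budget

def assemble_context(model: str, documents: list[str]) -> list[str]:
    budget = MODEL_BUDGETS[model]
    doc_cap = int(budget * MAX_DOC_SHARE)
    assembled, used = [], 0
    for doc in documents:
        tokens = len(doc) // 4           # rough chars-per-token estimate
        tokens = min(tokens, doc_cap)    # truncate oversized documents
        if used + tokens > budget:
            break                        # overall budget exhausted
        assembled.append(doc[: tokens * 4])
        used += tokens
    return assembled
```

The per-document cap is what guarantees the stated property: even a very large note gets truncated to its share, leaving room for the rest of the context.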