AI as Coding Assistant
Coding assistants are everywhere—but can they go beyond autocomplete? I put three VS Code extensions to the test to see whether they can harness MCP tools and take a step toward true multi-agent collaboration.
Background
I've been tinkering with a personal AI project that uses a fairly complex graph—a type of state machine built atop LangGraph. I wanted the AI to write tests for it. I knew I could prompt a model to generate a decent starting point and iterate from there, but this time I aimed higher: I wanted to test a multi-agent approach to AI-assisted development. The long tail of refinements I usually make manually could instead be codified into MCP tools.
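For orientation, the module under test looks roughly like this; the sketch below is a heavily simplified, hypothetical stand-in for my actual graph.py (the real graph has more nodes, conditional edges, and external dependencies):

```python
# Hypothetical, stripped-down stand-in for the module under test: a LangGraph
# state machine compiled into a 'graph' object.
from typing import TypedDict

from langgraph.graph import END, START, StateGraph


class State(TypedDict):
    question: str
    answer: str


def plan(state: State) -> dict:
    # The real node calls an LLM; stubbed here for illustration.
    return {"answer": f"plan for: {state['question']}"}


def respond(state: State) -> dict:
    return {"answer": state["answer"].upper()}


builder = StateGraph(State)
builder.add_node("plan", plan)
builder.add_node("respond", respond)
builder.add_edge(START, "plan")
builder.add_edge("plan", "respond")
builder.add_edge("respond", END)

graph = builder.compile()  # the object the tests should exercise
```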
Setup and Constraints
I chose open-source models like Qwen3, Mistral, and Llama.
All models were hosted locally via LM Studio.
Everything had to run on my aging GTX 1070 GPU.
My IDE was VS Code, configured with three extensions: Roo Code, Continue, and Cline.
All three support RAG (retrieval-augmented generation), file-specific referencing (e.g., @graph.py), and MCP configuration. I regularly mix or pair them to observe how their capabilities evolve.
Default Prompt Flow and Its Limitations
While I adjusted prompts iteratively, I consistently ran into issues—some are probably due to model limitations:
The model entered “think aloud” mode: rambling plans without delivering results.
Tests were generated, but most were poor in quality.
Mocks were often incorrect.
Integration tests relied on an imaginary configuration rather than exercising the existing one.
Some unit tests were incomplete or missing entirely.
Imports were often missing or unused.
Here’s the prompt I used:
Write tests for @graph.py and put them into @test_graph.py . Use pytest as the testing framework, and use fixtures and asyncio annotations as appropriate.
Make sure the tests are easy to follow and provide confidence in changing the graph frequently whilst evolving the tests as well.
Mock the dependencies to other modules in this project. Make sure the test cases test meaningful things.
Create tests for each of the graph nodes, for each of the edges, and an integration test that tests the 'graph' object in graph.py.
This usually gets me ~80% of the way. Sometimes, one extension fills the gaps left by another due to differing prompts and heuristics.
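For reference, the kind of test I'm aiming for (and usually finish by hand) looks roughly like this. The module path, node names, and the mocked llm dependency are hypothetical simplifications, and the async test assumes pytest-asyncio is installed:

```python
# Sketch of the tests I want generated: pytest, fixtures, asyncio, mocked
# dependencies, plus an integration-style test over the compiled graph.
# Names and the module-level 'llm' dependency are assumptions.
from unittest.mock import AsyncMock, patch

import pytest

from agent.graph import graph, plan  # assumed module layout


@pytest.fixture
def fake_llm():
    # Patch the (assumed) module-level LLM client so tests stay offline.
    with patch("agent.graph.llm", new=AsyncMock(), create=True) as mock:
        mock.ainvoke.return_value = "stubbed answer"
        yield mock


def test_plan_node_produces_answer():
    # Unit test for a single node: plain input/output, no graph execution.
    update = plan({"question": "what is tested?", "answer": ""})
    assert "answer" in update


@pytest.mark.asyncio
async def test_graph_end_to_end(fake_llm):
    # Integration test: run the compiled 'graph' object on a minimal state.
    result = await graph.ainvoke({"question": "does the graph run?"})
    assert result["answer"]
```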
Extension Showdown
Roo Code - Good real-time Copilot-like experience
Continue - Occasionally impressive planning and decent coverage—including integration tests
Cline - Very slow, reliant on legacy transport (had to run my server with sse).
Despite using the same model and prompt, none of them produced fully functional tests out-of-the-box. I could have gone back and asked them to fix the issues, but my goal here was to evaluate autonomous capability—not hand-hold the process. Of the three, Continue came closest to usable output.
How VS Code Extensions Use MCP
All three extensions used MCP for filesystem access and VS Code interaction. Roo Code and Cline even bundled browsers, though they weren't triggered in my tests. Cline relied on a legacy SSE transport, but it was workable.
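Under the hood, each extension plays the MCP client role: it connects to the configured servers, lists the advertised tools, and decides when to call them. Roughly, and with a placeholder URL and tool name, that client-side flow looks like this:

```python
# Minimal sketch of the MCP client flow an extension performs: connect over
# SSE, discover tools, and invoke one. URL and tool name are placeholders.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main() -> None:
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("advertised tools:", [tool.name for tool in tools.tools])

            result = await session.call_tool(
                "generate_test_cases",
                {"module": "agent.graph", "target": "graph"},
            )
            print(result.content)


asyncio.run(main())
```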
In each case, the extension was the sole agent—responsible for orchestration, prompting, and tool use. That raised a big question:
> What if tools had a “mind” of their own?
Dependency Inversion for MCP Tools
Anthropic’s findings are compelling: "Subagents outperformed single-agent Claude Opus 4 by 90.2%." So why are IDE extensions still monolithic?
I propose flipping the model: decentralize tool intelligence. Instead of bundling all logic into the extension, we could offload responsibility to tool agents. In essence, Dependency Inversion for MCP: tools control execution, while the IDE remains simple and focused on UX.
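As a concrete sketch of the inversion: the tool below owns its own model call (here via LM Studio's OpenAI-compatible endpoint), so the extension only has to delegate. The server name, port, model ID, and prompt are all assumptions, not a reference implementation:

```python
# Hypothetical "tool agent": the MCP tool owns the model call and prompting;
# the IDE extension merely delegates to it. Endpoint, model, names assumed.
from mcp.server.fastmcp import FastMCP
from openai import OpenAI  # LM Studio exposes an OpenAI-compatible API

mcp = FastMCP("test-writer-agent")
llm = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")


@mcp.tool()
def write_tests(module_path: str) -> str:
    """Generate pytest tests for a module using the tool's own model and prompt."""
    with open(module_path, encoding="utf-8") as f:
        source = f.read()
    response = llm.chat.completions.create(
        model="qwen3-4b",  # each tool agent can pin its own model and settings
        messages=[{"role": "user", "content": f"Write pytest tests for:\n{source}"}],
    )
    return response.choices[0].message.content or ""


if __name__ == "__main__":
    mcp.run(transport="sse")  # the transport Cline needed in my setup
```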
Benefits:
Clearer control of flow, state, and dependencies.
Flexibility to use different models/configs per tool (agent).
Opportunity to introduce additional security or proprietary logic.
Cost:
Some rework—features built into extensions may need to be redeveloped as services.
Added complexity in tool orchestration (which arguably should exist anyway).
As the industry shifts toward multi-agent systems for complex workflows—and with the MCP marketplace steadily maturing—the long-term benefits are poised to outweigh the investment overhead.
Experiment: Forcing Tool Delegation
To test this approach, I built a minimal MCP tool that returns a string with generated test cases (implementation details aren’t critical here). I prompted the model:
Generate test cases for "agent.graph" module in file "…\src\agent\graph.py" for the "graph" object. You must use local mcp tools!
Don't make things up.
Don't analyze the request and try to answer it.
Your job is to select the best tool and delegate the task to it. If you don't know or something is unclear, ask the user to clarify.
I picked Qwen3-4b for its reliable tool usage and simplified the prompt to focus on delegation.
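The tool behind this prompt was deliberately trivial; a rough placeholder, not the real implementation, looks something like this:

```python
# Rough placeholder for the minimal MCP tool from the experiment: it returns
# a canned string of test cases, just enough to verify that delegation works.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("test-case-generator")


@mcp.tool()
def generate_test_cases(module: str, target: str) -> str:
    """Return canned pytest test cases for the requested module and object."""
    return (
        f"from {module} import {target}\n\n"
        f"def test_{target}_is_defined():\n"
        f"    assert {target} is not None\n"
    )


if __name__ == "__main__":
    # Served over the legacy SSE transport so Cline could reach it.
    mcp.run(transport="sse")
```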
Experiment: Results
Continue
(Screenshots: configuration and tool view.)
Result: Didn't recognize MCP tools.
Roo Code
(Screenshots: configuration and tool view.)
Result: Lacked context-awareness; it spammed the context window rather than aligning tools to the task.
Cline
(Screenshots: configuration and tool view.)
Result: The only extension that successfully used my MCP tool, despite relying on the outdated transport.
As for the response: it wasn't accepted, but it was naïve by design, meant to test tool usage rather than to provide any real value.
Wrapping Up
Only Cline managed to leverage my MCP tool, albeit through the legacy transport. I spent a solid day exploring how to build testable MCP servers and integrate them with FastAPI, and I learned a lot in the process.
For now, I’ll stick with this setup. Cline might evolve into the ideal assistant—but I’ll keep refining the tools and exploring alternatives. The dream is an IDE agent that delegates effectively and respects tool boundaries.
Plenty of engineering quirks surfaced along the way—enough for a deeper technical dive in a follow-up post, including the code for hosting those MCP tools locally.
Everything I write reflects my own opinions and perspectives and does not represent my past, current, or future employers.