Starting a new personal-AI project
I've been playing with personal AI for a while, and when I saw the quickstart project from Google, I thought it would be a great starting point to build on and explore a few ideas.
Introduction
I’ve been a fan of the Hugging Face ecosystem and have been using it for a while to host and run models on my antiquated GPU (a GTX 1070 Ti). One of the benefits of the Hugging Face SDKs is that you can run most models of up to 8B parameters quite comfortably (but with patience) on commodity hardware. This is not really the case for engines like vLLM or SGLang, which have higher GPU compatibility expectations.
I came across LM Studio about four months ago, and its simplicity and convenience have kept it as my main inference provider since, still sourcing the models (GGUF) from Hugging Face and running them with llama.cpp.
This has been quite effective as my OpenAI-compatible inference server, running locally on my machine. You can also think of it as edge AI, which can be quite interesting for IP-sensitive interactions and workloads.
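For a flavor of what that looks like, here is a minimal sketch of querying the local server, assuming LM Studio's default endpoint (http://localhost:1234/v1) and a placeholder model identifier:

```python
from openai import OpenAI  # pip install openai

# LM Studio exposes an OpenAI-compatible API, by default at http://localhost:1234/v1
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # the key is ignored locally

response = client.chat.completions.create(
    model="qwen3-1.7b",  # placeholder; use the identifier LM Studio shows for your loaded GGUF
    messages=[{"role": "user", "content": "Why does edge AI matter for IP-sensitive workloads?"}],
)
print(response.choices[0].message.content)
```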
When Google released a full-stack quickstart for a basic research agent, I got excited, as it delivered the following value to me:
I wanted to adapt it to run on my local stack, and I preferred it to be more modular and interchangeable.
I wanted to expand it beyond the basic recursive search, and enhance it over time.
The last time I built a web UI was a decade ago, and the quickstart came with a React front end!
Implementation
I forked the work into my repo: https://github.com/eyal-lantzman/personal-ai and made a few tweaks.
This is what I came up with, along with some of the additional work I plan to do:
Replacing Gemini with Qwen3
I swapped out Gemini for Qwen3. Even the 1.7B version performs well in tool usage and reasoning. However, you need to be mindful of the ‘<think>…</think>’ blocks and reformat them from a UX perspective, or remove them entirely, which is what I’ve done for now (a sketch of this follows the next steps). Next steps:
Incorporate the best practices for Qwen configuration to optimize outcomes
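For the removal approach, a minimal sketch (my own, not code from the quickstart) is just a regex over the model output:

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text: str) -> str:
    """Drop Qwen3's <think>...</think> reasoning block so only the answer reaches the UI."""
    return THINK_RE.sub("", text).strip()

print(strip_think("<think>The user greeted me.</think>Hello! How can I help?"))
# -> Hello! How can I help?
```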
Switching Google Search for DuckDuckGo
I replaced Google Search with DuckDuckGo. While DuckDuckGo is a simple alternative, it’s not an apples-to-apples comparison: Google Search integrates more seamlessly with Gemini. However, DuckDuckGo is privacy-friendly, which is an important consideration for me. Next steps (a sketch of the search tool follows this list):
Improve grounding and citation mechanisms for better reliability
Enhance search summarization to extract richer, more contextual summaries (for personal use!)
Adjust and tune the rate limiter when calling DDG, and replace the magic numbers. Unfortunately, DuckDuckGo doesn’t publicly document its limits, so I’ll need to experiment further.
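As a sketch of what that tool looks like with a naive throttle, assuming the duckduckgo_search package and a guessed two-second minimum interval (one of those magic numbers to tune):

```python
import time
from duckduckgo_search import DDGS  # pip install duckduckgo-search
from langchain_core.tools import tool

MIN_INTERVAL_S = 2.0  # magic number: DDG doesn't document its limits, so this is a guess to tune
_last_call = 0.0

@tool
def web_search(query: str) -> str:
    """Search the web with DuckDuckGo and return the top results."""
    global _last_call
    wait = MIN_INTERVAL_S - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)  # naive throttle so parallel research branches don't hammer DDG
    _last_call = time.monotonic()
    results = DDGS().text(query, max_results=5)
    return "\n\n".join(f"{r['title']} ({r['href']})\n{r['body']}" for r in results)
```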
The architecture remained the same despite these swapped dependencies, but it wasn’t easy to get there.
Observations and Learnings
No MCP (Model Context Protocol)
There’s no MCP in this project, and at least for now, there’s no need for it. LangChain and LangGraph provide a simple yet robust interface for defining and integrating tools, which the AI was able to pick up and use effectively.
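To give a flavor of that interface, here is a minimal sketch of defining a tool and binding it to the model, reusing the assumed LM Studio endpoint and placeholder model name from earlier:

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI  # pip install langchain-openai

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# Same assumed local endpoint as before; the model name is a placeholder
llm = ChatOpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio", model="qwen3-1.7b")
msg = llm.bind_tools([add]).invoke("What is 2 + 3? Use the add tool.")
print(msg.tool_calls)  # e.g. [{'name': 'add', 'args': {'a': 2, 'b': 3}, ...}]
```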
LLM Tool Integration Varies
Not all LLMs that are supposedly instruct-tuned for tool use actually engage with tools consistently; Qwen was the exception. I tested models from all major providers on Hugging Face, focusing on releases from the last six months that were 8B parameters or smaller, and in the end I chose Qwen for its balance of size and reliability.
Search-Based Retrieval
Unlike many RAG (Retrieval-Augmented Generation) patterns, this setup doesn’t host a vector database; retrieval comes from a search service instead, but it’s still RAG. The risk profile is quite different: you gain more powerful insight at the cost of exposure to risks from external data.
Hidden Complexity in a Simple Graph Representation
The graph visual hides engineering complexity originating in distributed-systems paradigms, e.g., map-reduce. These complexities have an impact on robustness, maintainability, and total cost of ownership that isn’t intuitive from the visual LangGraph gives us.
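To make the hidden map-reduce concrete, here is a stripped-down LangGraph sketch (my simplification, not the quickstart's actual graph) of fanning out one branch per query and joining once all of them finish:

```python
import operator
from typing import Annotated, TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.types import Send

class State(TypedDict):
    queries: list[str]
    summaries: Annotated[list[str], operator.add]  # reducer: branch results merge instead of overwrite

def fan_out(state: State):
    # Map step: spawn one "search" branch per query, executed in parallel
    return [Send("search", {"query": q}) for q in state["queries"]]

def search(branch: dict):
    return {"summaries": [f"summary for {branch['query']}"]}  # stand-in for the real web search

def join(state: State):
    # Reduce step: only runs after every branch has written its summary
    return {}

builder = StateGraph(State)
builder.add_node("search", search)
builder.add_node("join", join)
builder.add_conditional_edges(START, fan_out, ["search"])
builder.add_edge("search", "join")
builder.add_edge("join", END)
graph = builder.compile()
```

The picture LangGraph draws of this is three boxes and a few arrows; the reducer semantics, parallel scheduling, and join conditions are all invisible.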
Testing Practices
Despite the many potential failure points (race conditions, infinite loops, and map-join failures, e.g., a final node not waiting for all incoming branches), testing strategies aren’t explicitly mentioned.
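A first step could be as small as pinning down the join behavior of the sketch above, e.g. with pytest:

```python
def test_join_waits_for_all_branches():
    # The reducer should have collected one summary per query before "join" ran
    result = graph.invoke({"queries": ["a", "b", "c"], "summaries": []})
    assert len(result["summaries"]) == 3
```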
Dev Loops
The project wasn’t optimized for debugging, e.g., exposing a ‘--debug-port 2025’ option.
Future Work
As mentioned earlier, I plan to improve key areas such as model configuration and search. But beyond technical refinements, a broader question remains: does the current state machine truly represent how you would research a topic? What would you improve?
This is where I plan to experiment further—making the system more effective in connecting the dots between previous research topics, identifying mental models, and refining ideas. The goal is to enhance synergy and create a more intuitive research workflow.
Try the latest version and give me feedback:
https://github.com/eyal-lantzman/personal-ai
Wrap up
With these tools now effectively a commodity—accessible to anyone running models on their own machine—the real question is: How do we adapt, evolve, and leverage these new norms?
The key is mindfulness:
Recognize what is becoming a commodity.
Identify the unique value you bring.
Invest in growing that value rather than competing on what is easily replicable.
This is what a 1.7B model and DuckDuckGo can achieve today on commodity hardware!
Update:
A write-up on the next round of changes: https://eyallantzman.substack.com/publish/post/165703319
Everything I write reflects my own opinions and perspectives and does not represent my past, current, or future employers.