    I Replaced Vector DBs with Google’s Memory Agent Pattern for my notes in Obsidian

    By Awais · April 3, 2026 · 13 Mins Read
    This started because my Obsidian assistant kept getting amnesia. I didn’t want to stand up Pinecone or Redis just so Claude could remember that Alice approved the Q3 budget last week. Turns out, with 200K+ context windows, you might not need any of that.

    I want to share a new mechanism that I’ve started running. It’s a system built on SQLite and direct LLM reasoning, no vector databases, no embedding pipeline. Vector search was mostly a workaround for tiny context windows and keeping prompts from getting messy. With modern context sizes, you can often skip that and just let the model read your memories directly.


    The Setup

    I take detailed notes, both in my personal life and at work. I used to scrawl in notebooks that would get misplaced or get stuck on a shelf and never be referenced again. A few years ago, I moved to Obsidian for everything, and it has been fantastic. In the last year, I’ve started hooking up genAI to my notes. Today I run both Claude Code (for my personal notes) and Kiro-CLI (for my work notes). I can ask questions, get them to do roll-ups for leadership, track my goals, and write my reports. But it’s always had one big Achilles’ heel: memory. When I ask about a meeting, it uses an Obsidian MCP to search my vault. It’s time-consuming, error-prone, and I need it to be better.

    The obvious fix is a vector database. Embed the memories. Store the vectors. Do a similarity search at query time. It works. But it also means a Redis stack, a Pinecone account, or a locally running Chroma instance, plus an embedding API, plus pipeline code to stitch it all together. For a personal tool, that’s a lot, and there is a real risk that it won’t work exactly like I need it to. I need to ask things like “what happened on Feb 1 2026” or “recap the last meeting I had with this person”, queries that embeddings and RAG aren’t great at.

    Then I ran across Google’s always-on-memory agent (https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/agents/always-on-memory-agent). The idea is pretty simple: don’t do a similarity search at all; just give the LLM your recent memories directly and let it reason over them.

    I wanted to know if that held up on AWS Bedrock with Claude Haiku 4.5. So I built it (along with Claude Code, of course) and added in some extra bells and whistles.

    Visit my GitHub repo, but make sure to come back!

    https://github.com/ccrngd1/ProtoGensis/tree/main/memory-agent-bedrock


    An Insight That Changes the Math

    Older models topped out at 4K or 8K tokens. You couldn’t fit more than a few documents in a prompt. Embeddings let you retrieve the relevant documents without loading everything. That was genuinely necessary. Haiku 4.5 offers a 200K-token context window, so what can we do with that?

    A structured memory (summary, entities, topics, importance score) runs about 300 tokens, which means roughly 650 memories fit before hitting the ceiling. In practice, it’s a bit less since the system prompt and query also consume tokens, but for a personal assistant that tracks meetings, notes, and conversations, that’s months of context.
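    A quick back-of-envelope check of that math (the 5K-token overhead figure is my assumption, not a measured value):

```python
# Rough capacity estimate for ~300-token structured memories in a
# 200K-token context window.
CONTEXT_WINDOW = 200_000
TOKENS_PER_MEMORY = 300    # estimated size of one formatted memory
OVERHEAD = 5_000           # assumed budget for system prompt + query

theoretical = CONTEXT_WINDOW // TOKENS_PER_MEMORY          # ceiling with no overhead
practical = (CONTEXT_WINDOW - OVERHEAD) // TOKENS_PER_MEMORY  # with overhead reserved
print(theoretical, practical)  # 666 650
```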

    No embeddings, no vector indexes, no cosine similarity.

    The LLM reasons directly over semantics, and it’s better at that than cosine similarity.


    The Architecture

    The orchestrator isn’t a separate service. It’s a Python class inside the FastAPI process that coordinates the three agents.

    The IngestAgent job is simple: take raw text and ask Haiku what’s worth remembering. It extracts a summary, entities (names, places, things), topics, and an importance score from 0 to 1. That package goes into the `memories` table.
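    For intuition, here is a minimal sketch of that ingest flow. The function names and prompt wording are illustrative, not the repo’s actual API, and the Haiku call is stubbed with a canned JSON reply rather than a live Bedrock request:

```python
import json
import uuid
from datetime import datetime, timezone

def build_ingest_prompt(raw_text: str) -> str:
    """Prompt asking the model what is worth remembering (illustrative wording)."""
    return (
        "Extract what is worth remembering from the text below. "
        "Reply with JSON containing: summary, entities, topics, "
        "and importance (0.0-1.0).\n\nText:\n" + raw_text
    )

def to_memory_record(model_reply: str, source: str) -> dict:
    """Wrap the model's JSON reply into a row for the memories table."""
    fields = json.loads(model_reply)
    return {
        "id": str(uuid.uuid4()),
        "summary": fields["summary"],
        "entities": fields["entities"],
        "topics": fields["topics"],
        "importance": fields["importance"],
        "source": source,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "consolidated": 0,
    }

# A real run would send build_ingest_prompt(...) to Haiku on Bedrock;
# here we use a canned reply shaped like the article's example.
reply = ('{"summary": "Alice confirmed Q3 budget approval of $2.4M", '
         '"entities": ["Alice", "Q3 budget"], '
         '"topics": ["finance", "meetings"], "importance": 0.82}')
record = to_memory_record(reply, source="notes")
```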

    The ConsolidateAgent runs with intelligent scheduling: at startup if any memories exist, when a threshold is reached (5+ memories by default), and daily as a forced pass. When triggered, it batches unconsolidated memories and asks Haiku to find cross-cutting connections and generate insights. Results land in a `consolidations` table. The system tracks the last consolidation timestamp to ensure regular processing even with low memory accumulation.

    The QueryAgent reads recent memories plus consolidation insights into a single prompt and returns a synthesized answer with citation IDs. That’s the whole query path.
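    That query path can be sketched as a single prompt-building step. The field names mirror the records shown later in the article; the exact prompt wording here is my guess, not the repo’s:

```python
def build_query_prompt(query: str, memories: list[dict],
                       consolidations: list[dict]) -> str:
    """Pack recent memories plus consolidation insights into one prompt."""
    lines = [
        "Answer the question using only the records below.",
        "Cite sources as [memory:<id>] or [consolidation:<id>].",
        "",
        "MEMORIES:",
    ]
    for m in memories:
        lines.append(f"[memory:{m['id']}] ({m['timestamp']}) {m['summary']}")
    lines += ["", "INSIGHTS:"]
    for c in consolidations:
        lines.append(f"[consolidation:{c['id']}] {c['insights']}")
    lines += ["", f"QUESTION: {query}"]
    return "\n".join(lines)

prompt = build_query_prompt(
    "What did Alice say about the budget?",
    [{"id": "a3f1c9d2", "timestamp": "2026-03-27",
      "summary": "Alice confirmed Q3 budget approval of $2.4M"}],
    [{"id": "3c765a26",
      "insights": "Budget oversight appears to be a recurring priority"}],
)
```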


    What Actually Gets Stored

    When you ingest text like “Met with Alice today. Q3 budget is approved, $2.4M,” the system doesn’t just dump that raw string into the database. Instead, the IngestAgent sends it to Haiku and asks, “What’s important here?”

    The LLM extracts structured metadata:

    {
      "id": "a3f1c9d2-...",
      "summary": "Alice confirmed Q3 budget approval of $2.4M",
      "entities": ["Alice", "Q3 budget"],
      "topics": ["finance", "meetings"],
      "importance": 0.82,
      "source": "notes",
      "timestamp": "2026-03-27T14:23:15.123456+00:00",
      "consolidated": 0
    }

    The memories table holds these individual records. At ~300 tokens per memory when formatted into a prompt (including the metadata), the theoretical ceiling is around 650 memories in Haiku’s 200K context window. I intentionally set the default to be 50 recent memories, so I am well short of that ceiling.

    When the ConsolidateAgent runs, it doesn’t just summarize memories. It reasons over them. It finds patterns, draws connections, and generates insights about what the memories mean together. Those insights get stored as separate records in the consolidations table:

    {
      "id": "3c765a26-...",
      "memory_ids": ["a3f1c9d2-...", "b7e4f8a1-...", "c9d2e5b3-..."],
      "connections": "All three meetings with Alice mentioned budget concerns...",
      "insights": "Budget oversight appears to be a recurring priority...",
      "timestamp": "2026-03-27T14:28:00.000000+00:00"
    }

    When you query, the system loads both the raw memories *and* the consolidation insights into the same prompt. The LLM reasons over both layers at once, including recent facts plus synthesized patterns. That’s how you get answers like “Alice has raised budget concerns in three separate meetings [memory:a3f1c9d2, memory:b7e4f8a1] and the pattern suggests this is a high priority [consolidation:3c765a26].”

    This two-table design is the entire persistence layer. A single SQLite file. No Redis. No Pinecone. No embedding pipeline. Just structured records that an LLM can reason over directly.
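    A sketch of what that persistence layer could look like. The columns mirror the records shown above; the repo’s actual DDL may differ:

```python
import sqlite3

# In-memory for the sketch; the real system uses a file (e.g. memory.db).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories (
    id TEXT PRIMARY KEY,
    summary TEXT NOT NULL,
    entities TEXT,        -- JSON-encoded list
    topics TEXT,          -- JSON-encoded list
    importance REAL,
    source TEXT,
    timestamp TEXT,
    consolidated INTEGER DEFAULT 0
);
CREATE TABLE consolidations (
    id TEXT PRIMARY KEY,
    memory_ids TEXT,      -- JSON-encoded list of memory ids
    connections TEXT,
    insights TEXT,
    timestamp TEXT
);
""")
```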

    What the Consolidation Agent Actually Does

    Most memory systems are purely retrieval. They store, search, and return similar text. The consolidation agent works differently: it reads a batch of unconsolidated memories and asks, “What connects these? What do they have in common? How do they relate?”

    Those insights get written as a separate consolidations record. When you query, you get both the raw memories and the synthesized insights. The agent isn’t just recalling. It’s reasoning.

    The sleeping brain analogy from the original Google implementation seems pretty accurate. During idle time, the system is processing rather than just waiting. This is something I often struggle with when building agents: how can I make them more autonomous so they can work when I don’t? Consolidation is a good use of that “downtime”.

    For a personal tool, this matters. “You’ve had three meetings with Alice this month, and all of them mentioned budget concerns” is more useful than three individual recall hits.

    The original design used a simple threshold for consolidation: it waited for 5 memories before consolidating. That works for active use. But if you’re only ingesting sporadically, a note here, an image there, you might wait days before hitting the threshold. Meanwhile, those memories sit unprocessed, and queries don’t benefit from the consolidation agent’s pattern recognition.

    So, I decided to add two more triggers. When the server starts, it checks for unconsolidated memories from the previous session and processes them immediately. No waiting. And on a daily timer (configurable), it forces a consolidation pass if anything is waiting, regardless of whether the 5-memory threshold has been met. So even a single note per week still gets consolidated within 24 hours.

    The original threshold-based mode still runs for active use. But now there’s a safety net underneath it. If you’re actively ingesting, the threshold catches it. If you’re not, the daily pass does. And on restart, nothing falls through the cracks.

    File Watching and Change Detection

    I have an Obsidian vault with hundreds of notes, and I don’t want to manually ingest each one. I want to point the watcher at the vault and let it handle the rest. That’s exactly what this does.

    On startup, the watcher scans the directory and ingests everything it hasn’t seen before. It runs two modes in the background: a quick scan every 60 seconds checks for new files (fast, no hash calculation, just “is this path in the database?”), and a full scan every 30 minutes calculates SHA-256 hashes and compares them to stored values. If a file has changed, the system deletes the old memories, cleans up any consolidations that referenced them, re-ingests the new version, and updates the tracking record. No duplicates. No stale data.
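    The full-scan change detection might look like this (helper names are hypothetical; the repo’s implementation may differ):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash used by the full scan to detect edited files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def detect_changes(paths: list[Path], seen: dict[str, str]) -> tuple[list, list]:
    """Compare current hashes against stored ones.

    Returns (new_files, changed_files); `seen` maps path -> last hash
    and is updated in place, standing in for the tracking table."""
    new, changed = [], []
    for p in paths:
        digest = sha256_of(p)
        if str(p) not in seen:
            new.append(p)
        elif seen[str(p)] != digest:
            changed.append(p)
        seen[str(p)] = digest
    return new, changed
```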

    For personal note workflows, the watcher covers what you’d expect:

    • Text files (.txt, .md, .json, .csv, .log, .yaml, .yml)
    • Images (.png, .jpg, .jpeg, .gif, .webp), analyzed via Claude Haiku’s vision capabilities
    • PDFs (.pdf), text extracted via PyPDF2

    Recursive scanning and directory exclusions are configurable. Edit a note in Obsidian, and within 30 minutes, the agent’s memory reflects the change.


    Why No Vector DB

    Whether you need embeddings for your personal notes boils down to two things: how many notes you have and how you want to search them.

    Vector search is genuinely necessary when you have millions of documents and can’t fit the relevant ones in context. It’s a retrieval optimization for large-scale problems.

    At personal scale, you’re working with hundreds of memories, not millions. Vector search means running an embedding pipeline, paying for the API calls, managing the index, and implementing similarity search, all to solve a problem that a 200K context window already solves.

    Here’s how I think about the tradeoffs:

    • Complexity
    • Accuracy
    • Scale

    I couldn’t justify setting up and maintaining a vector database, even FAISS, for the few notes that I generate.

    On top of that, this new method gives me better accuracy for the way I need to search my notes.


    Seeing It In Action

    Here’s what using it actually looks like. Configuration is handled via a .env file with sensible defaults. You can copy the example directly and start using it (assuming you have already run aws configure on your machine).

    cp .env.example .env

    Then, start the server with the file watcher active:

    ./scripts/run-with-watcher.sh

    CURL the /ingest endpoint to test a sample ingestion (the host and port below assume FastAPI’s usual local default; adjust if yours differs). This step is optional, just to demonstrate how it works; you can skip it if you’re setting up for a real use case.

    curl -X POST http://localhost:8000/ingest \
    -H "Content-Type: application/json" \
    -d '{"text": "Met with Alice today. Q3 budget is approved, $2.4M.", "source": "notes"}'

    The response will look like:

    {
      "id": "a3f1c9d2-...",
      "summary": "Alice confirmed Q3 budget approval of $2.4M.",
      "entities": ["Alice", "Q3 budget"],
      "topics": ["finance", "meetings"],
      "importance": 0.82,
      "source": "notes"
    }

    To query it later, CURL the /query endpoint (same local server address):

    curl 'http://localhost:8000/query?q=What+did+Alice+say+about+the+budget'

    Or use the CLI:

    python cli.py ingest "Paris is the capital of France." --source wikipedia
    python cli.py query "What do you know about France?"
    python cli.py consolidate  # trigger manually
    python cli.py status       # see memory count, consolidation state

    Making It Useful Beyond CURL

    curl works, but you’re not going to curl your memory system at 2 am when you have an idea, so the project has two integration paths.

    Claude Code / Kiro-CLI skill. I added a native skill that auto-activates when relevant. Say “remember that Alice approved the Q3 budget” and it stores it without you needing to invoke anything. Ask “what did Alice say about the budget?” next week, and it checks memory before answering. It handles ingestion, queries, file uploads, and status checks through natural conversation. This is how I interact with the memory system most often, since I tend to live in CC/Kiro most of the time.

    CLI. For terminal users or scripting:

    python cli.py ingest "Paris is the capital of France." --source wikipedia
    python cli.py query "What do you know about France?"
    python cli.py consolidate
    python cli.py status
    python cli.py list --limit 10

    The CLI talks to the same SQLite database, so you can mix API, CLI, and skill usage interchangeably. Ingest from a script, query from Claude Code, and check status from the terminal. It all hits the same store.


    What’s Next

    The good news: the system works, and I’m using it today. But here are a few additions it could benefit from.

    Importance-weighted query filtering. Right now, the query agent reads the N most recent memories. That means old but important memories can get pushed out by recent noise. I want to filter by importance score before building the context, but I’m not sure yet how aggressive to be. I don’t want a high-importance memory from two months ago to disappear just because I ingested a bunch of meeting notes this week.

    Metadata filtering. Similarly, since each memory has associated metadata, I could use that metadata to filter out memories that are obviously wrong. If I’m asking questions about Alice, I don’t need any memories that only involve Bob or Charlie. For my use case, this could be based on my note hierarchy, since I keep notes aligned to customers and/or specific projects.
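    A minimal sketch of that entity filter, failing open so a non-matching filter never empties the context (a production version would also normalize names):

```python
def filter_by_entities(memories: list[dict], wanted: set[str]) -> list[dict]:
    """Keep only memories mentioning at least one wanted entity.

    If nothing matches, return everything rather than starve the
    query agent of context (fail open)."""
    hits = [m for m in memories if wanted & set(m["entities"])]
    return hits or memories
```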

    Delete and update endpoints. The store is append-only right now. That’s fine until you ingest something wrong and need to fix it. DELETE /memory/{id} is an obvious gap. I just haven’t needed it badly enough yet to build it.

    MCP integration. Wrapping this as an MCP server would let any Claude-compatible client use it as persistent memory. That’s probably the highest-value item on this list, but it’s also the most work.


    Try It

    The project is up on GitHub as part of an ongoing series I started, where I implement research papers, explore leading-edge ideas, and repurpose handy tools for Bedrock (https://github.com/ccrngd1/ProtoGensis/tree/main/memory-agent-bedrock).

    It’s Python with no exotic dependencies, just boto3, FastAPI, and SQLite.

    The default model is `us.anthropic.claude-haiku-4-5-20251001-v1:0` (Bedrock cross-region inference profile), configurable via .env.

    A note on security: the server has no authentication by default; it’s designed for local use. If you expose it on a network, add auth first. The SQLite database will contain everything you’ve ever ingested, so treat it accordingly (chmod 600 memory.db is a good start).

    If you’re building personal AI tooling and stalling on the memory problem, this pattern is worth a look. Let me know if you decide to try it out, how it works for you, and which project you’re using it on.


    About

    Nicholaus Lawson is a Solution Architect with a background in software engineering and AIML. He has worked across many verticals, including Industrial Automation, Health Care, Financial Services, and Software companies, from start-ups to large enterprises.

    This article and any opinions expressed by Nicholaus are his own and not a reflection of his current, past, or future employers or any of his colleagues or affiliates.

    Feel free to connect with Nicholaus via LinkedIn at https://www.linkedin.com/in/nicholaus-lawson/
