Karpathy Tweeted an Idea. I Spent a Day Putting It in My Obsidian Vault.
Andrej Karpathy posted a gist called 'LLM Wiki' — a pattern where the LLM maintains a compounding markdown wiki between you and your raw documents. I adapted it to my existing Obsidian vault without breaking what was already there. Here's what I changed, what surprised me, and the meta-lesson the wiki taught me about itself.
Series: VIBE.LOG
- 1. The Layout Vocabulary Cheat Sheet: What to Call That Thing on Your Screen
- 2. I Spent 3 Hours Trying to Proxy a Blog Subdomain. Here's My Descent Into Madness.
- 3. The Complete SEO Guide: How to Make Google Actually Notice Your Website
- 4. Why Your Next.js Favicon Isn't Showing (And the Three Ways to Actually Fix It)
- 5. GitHub Keeps Telling Me My Branch Is Fine. And Also Not Fine. At the Same Time.
- 6. Mobile-First Playground: Making an Astrology Grid Actually Work on a Phone (And Go Viral While Doing It)
- 7. Playground Is Live: The Destiny Grid, Real Astrology, and Why I'm Shipping a Toy Every Month
- 8. The Interactive Component Cheat Sheet: What to Call That Clickable Thing
- 9. Google Rejected My Site for 'Low-Value Content.' Here's What I Actually Fixed.
- 10. I Actually Fixed Everything. Here's What That Looked Like.
- 11. I Hired 131 AI Employees Today. Here's How.
- 12. I Let My AI Run 72 Backtests While I Watched. It Picked the Winner.
- 13. I Taught My AI to Stop Asking Questions. It Took Five Rewrites.
- 14. Obsidian Turned My Scattered Notes Into a Second Brain. Here's How to Set It Up.
- 15. The Destiny Grid Gets Its East Wing: I Rebuilt Saju (四柱八字) in TypeScript
- 16. Molecule Me: Your Personality, Encoded in Chemistry
- 17. OpenAI Just Built a Plugin for Their Competitor's Tool. I Installed It.
- 18. I Combined Two Open-Source Repos Into an AI That Plans, Builds, and Reviews Its Own Code
- 19. I Built a Weekly Directory for Claude Code Agents (Because My Brain Couldn't Keep Up)
- 20. Karpathy Tweeted an Idea. I Spent a Day Putting It in My Obsidian Vault. ← you are here
- 21. Keystroke Aura — How I Turned Typing Rhythm into a Personality Test
A few weeks ago Andrej Karpathy dropped a gist called "LLM Wiki" that I read three times before doing anything about it. Then I spent a Saturday adapting it to my Obsidian vault.
The result: I asked the LLM the same question twice. The first time it took 40 seconds. The second time it took 3.
Not because I cached anything. Because the second time, the answer was already on a wiki page — written by the LLM after the first question. That's the whole point of the pattern. Once you see it work, you can't unsee why RAG felt off.
This post is the boring part: what I changed in my vault, what almost broke, and the meta-lesson my own wiki taught me about itself in the first 48 hours.
🧠 The Idea (in One Paragraph)
Most people use LLMs with documents like this: dump files into ChatGPT or NotebookLM, ask a question, get an answer. The LLM searches the docs at query time, finds chunks, generates a reply, and forgets everything. Next question? Same dance from scratch.
Karpathy's pattern: insert a layer between you and the raw documents. That layer is a folder of markdown files — a wiki — written and maintained entirely by the LLM. New source comes in? The LLM updates 10–15 wiki pages, writes a synthesis, flags contradictions. Question comes in? The LLM reads the wiki first (already synthesized), gives you the answer, and files the answer back as a new page.
Knowledge compounds. The wiki gets richer every week whether you show up or not.
"Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase." — Karpathy
That line is the whole pitch. If you've ever managed a codebase with a teammate who never gets bored and never forgets to update cross-references, you already know the upside.
🤔 Why Not Just Use RAG?
I had a NotebookLM-style RAG setup before this. It worked. It also annoyed me, and I couldn't articulate why for months. Karpathy's gist crystallized it:
| RAG | Karpathy's LLM Wiki |
|---|---|
| Stateless — every query starts from zero | Stateful — answers build on previous answers |
| Re-derives synthesis on every question | Synthesis already written, just reads it |
| Cross-references implicit (in your head) | Cross-references explicit ([[wikilinks]]) |
| Contradictions invisible | Contradictions flagged on the page |
| Vector DB infrastructure required | Just markdown files in folders |
| Scales to millions of docs | Sweet spot ~100 sources, hundreds of pages |
The RAG advantage at scale is real. But for "stuff I think about a lot," ~100 sources is more than enough, and markdown asks for zero infrastructure in return.
The boring truth: I had been treating my notes like a search index when I wanted them to behave like a brain.
🏗️ Three Layers (Pick a Domain Before You Build Anything)
Karpathy's gist describes three layers. The architecture is dead simple:
┌─────────────────────────────────────────────────┐
│ YOU │
│ (curate sources, ask questions) │
└─────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ THE WIKI (50-Wiki/<domain>/) ← LLM owns it │
│ │
│ index.md log.md │
│ concepts/ experiments/ patterns/ │
│ (cross-linked markdown — never edited by you) │
└─────────────────┬───────────────────────────────┘
│ reads from
▼
┌─────────────────────────────────────────────────┐
│ RAW SOURCES (40-Sources/<domain>/) ← immutable│
│ │
│ papers, articles, transcripts, │
│ meeting notes, web clips │
└─────────────────────────────────────────────────┘
▲
│
┌─────────────────────────────────────────────────┐
│ THE SCHEMA (CLAUDE.md at vault root) │
│ Tells the LLM how to ingest, query, lint │
└─────────────────────────────────────────────────┘

Key detail: You write to Sources. The LLM writes to Wiki. The schema (CLAUDE.md) is the contract that keeps the LLM disciplined.
Pick one domain before you start. A "Wiki for everything" is too vague to be useful. Mine started with one project — research notes on a single subject — so I could see the pattern actually compound before extending it. Don't build for hypothetical breadth.
📂 What I Added to My Existing Vault (Non-Destructively)
I had an existing PARA + MOC vault from my Obsidian setup post. I did not want to bulldoze it. The whole point of "second brain" is continuity.
So I added two folders and one file. That's it.
Obsidian/
├── HOME.md ← unchanged
├── 00-Active-Context.md ← unchanged (human dashboard)
├── 10-Projects/ ← unchanged (PARA)
├── 20-Areas/ ← unchanged
├── 30-Resources/ ← unchanged
│
├── 40-Sources/ ★ NEW ← raw, immutable
│ └── <domain>/
│ ├── papers/
│ ├── articles/ (Obsidian Web Clipper)
│ └── reports/
│
├── 50-Wiki/ ★ NEW ← LLM-owned, 100%
│ └── <domain>/
│ ├── index.md
│ ├── log.md
│ ├── concepts/
│ ├── experiments/
│ └── patterns/
│
└── CLAUDE.md ★ NEW ← schema (Claude Code reads on every session)

The PARA folders stay as the "human entry point." The new layers are where the LLM lives. They never collide because the LLM has explicit rules in CLAUDE.md about which directories it owns.
Don't confuse: PARA classifies "where to put a thing." LLM Wiki synthesizes "what we've learned." Same vault, different jobs. Trying to merge them is how you ruin both.
📜 The Schema File (CLAUDE.md)
This is the file that turns a generic LLM into a disciplined wiki maintainer. It lives at the vault root, and Claude Code reads it on every session because that's how Claude Code works.
Mine has six sections. Skeleton:
# Vault Operating Rules for Claude Code
## 1. Two Worlds — Don't Confuse Them
| Area | Who writes | Who reads |
|---|---|---|
| Human dashboards | You + Claude (on demand) | You |
| LLM Wiki | Claude (autonomous) | You + Claude |
| Sources | Immutable | Claude only |
## 2. Domain Structure
- `<domain>` directories under both 40-Sources and 50-Wiki
## 3. Modes
### Mode A: Ingest (when you add a new source)
Step 0 (gate): Reality Sync — git log, file mtime check
Step 1: Copy source into 40-Sources with frontmatter
Step 2-7: Update affected wiki pages, append log entry...
### Mode B: Query (when you ask a question)
Step 1: Read index.md first
Step 2: Drill into pages, synthesize answer
Step 3: File good answers back into wiki
### Mode C: Lint (periodic health check)
Contradictions, orphan pages, stale claims, missing cross-refs
## 4. Domain-Specific Rules
...
## 5. Filename Conventions
## 6. Frontmatter Standards

The schema is what differentiates "an Obsidian folder I tell the LLM to look at" from "an LLM that knows how to maintain a knowledge base." Without it, every session re-derives the conventions and they drift.
Tip: Write the schema with the LLM in your first conversation. Don't try to anticipate everything. Ship a 200-line skeleton, expand it whenever you catch the LLM doing something inconsistent.
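To make section 6 concrete, here's roughly what a wiki page's frontmatter looks like under a schema like this. The field names here are illustrative choices for the sketch, not anything Karpathy's gist prescribes:

```markdown
---
type: concept            # concept | experiment | pattern | synthesis
created: 2026-04-18
updated: 2026-04-19
sources: [40-Sources/<domain>/some-paper.md]
status: active           # active | superseded | contradicted
---

# Page Title

Body written and maintained by the LLM.
```

Whatever fields you pick, pin them down in the schema rather than ad hoc per page — consistency is what makes them queryable later.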
🔁 The Three Operations
Once the schema is in place, three operations drive everything.
Ingest
You drop a source. You say "ingest this." The LLM:
┌──────────────────┐
│ source.md │ ← drops into 40-Sources/<domain>/
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Read + extract │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Touch 5–15 │ ← update concepts/, experiments/
│ wiki pages │ create new pages where needed
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Update index.md │
│ Append log.md │
└──────────────────┘

A single source can touch ten or more pages because the LLM is updating cross-references, flagging contradictions with what it already knows, and noting where new data supersedes old.
Key detail: The first ingest in a new domain creates lots of pages. By the third ingest, the ratio flips — you start updating existing pages more than creating new ones. That ratio flip is the signal that the wiki is stabilizing.
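The first step of ingest — filing the source with frontmatter — is mechanical enough to sketch. This is a hypothetical helper for illustration, not part of the gist; in practice the LLM does this itself by following CLAUDE.md:

```python
from datetime import date
from pathlib import Path

def ingest_source(raw_text: str, title: str, vault: Path, domain: str) -> Path:
    """Sketch of Mode A, Step 1: file a raw source under 40-Sources/<domain>/
    with minimal frontmatter. Field names are illustrative choices."""
    target_dir = vault / "40-Sources" / domain
    target_dir.mkdir(parents=True, exist_ok=True)
    slug = title.lower().replace(" ", "-")
    page = target_dir / f"{slug}.md"
    frontmatter = (
        "---\n"
        f"title: {title}\n"
        f"ingested: {date.today().isoformat()}\n"
        "status: raw\n"
        "---\n\n"
    )
    page.write_text(frontmatter + raw_text, encoding="utf-8")
    return page
```

The point of the sketch: sources are append-only files with a tiny bit of metadata. Everything clever happens downstream, in the wiki layer.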
Query
You ask a question. The LLM reads index.md first (the catalog), drills into the relevant pages, synthesizes an answer.
The non-obvious part: good answers get filed back as wiki pages. A comparison you asked for, an insight you stumbled into, a connection nobody else made — these would normally die in chat history. Filing them as pages means future-you (and future-LLM) inherits them.
Lint
Periodically — I do it weekly — you ask for a health check:
Lint the wiki. Check for: contradictions between pages, stale claims newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references.
The LLM produces a lint report. You triage. Most of the time it surfaces things you forgot, which is exactly what you'd want a tireless librarian to do.
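One of the lint checks — orphan pages — is simple enough to sketch as a standalone script. The real lint is just a prompt (and also covers contradictions and stale claims); this hypothetical checker only does the inbound-link pass:

```python
import re
from pathlib import Path

# Matches [[Page]], [[Page|alias]], and [[Page#heading]], capturing the page name.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(wiki_dir: Path) -> list[str]:
    """Report wiki pages that no other page links to.
    index.md and log.md are entry points, so they don't count as orphans."""
    pages = {p.stem: p for p in wiki_dir.rglob("*.md")}
    linked: set[str] = set()
    for p in pages.values():
        for m in WIKILINK.finditer(p.read_text(encoding="utf-8")):
            linked.add(m.group(1).strip())
    return sorted(s for s in pages if s not in linked and s not in {"index", "log"})
```

A page with zero inbound links isn't automatically junk — but it's exactly the kind of thing a tireless librarian should put on your desk once a week.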
📈 Compounding, Empirically
I tracked the first four ingests in one domain. The pattern Karpathy describes ("compounding artifact") shows up immediately in the new-pages-vs-updates ratio:
| Ingest # | Source size | New pages created | Existing pages updated | Ratio (new : updated) |
|---|---|---|---|---|
| 1 | small (2KB) | 4 | 0 | 4 : 0 |
| 2 | large (28KB) | 7 | 2 | 7 : 2 |
| 3 | small (3KB) | 1 | 3 | 1 : 3 ⭐ |
| 4 | medium (4 files) | 4 | 11 | 1 : 2.75 ⭐ |
By ingest #3, the wiki was already doing the thing it's supposed to do — extending what existed instead of starting over. That's the moment the system pays for the schema work.
The boring truth: Most posts about knowledge management would show this curve and stop. The interesting thing is what's inside the updates — pages flagging "this contradicts what we said two ingests ago," pages noting "this hypothesis we made earlier was confirmed by this new source." That's the actual value. The page count is just a proxy.
🪞 The Meta-Lesson the Wiki Taught Me About Itself
Within 48 hours of starting this, the wiki caught a problem with how I was using it. I'm not making this up. Here's what happened:
I ingested a "recommendation document" — basically a list of 18 improvements someone had drafted for a project. The LLM faithfully created wiki pages for each recommendation, marked them as "TBD" (to be done), and registered them in the index.
Then later, when I asked the LLM to start working on those recommendations, the first thing it did was check git log and report back: "All 18 of these are already implemented. The work was done the same day the recommendation document was written."
The wiki had no idea. It assumed "recommendation = future work," because that's what recommendation documents usually mean. Nobody told it to cross-check against the actual repository.
I added one rule to the schema:
Step 0 — Reality Sync Gate: Before ingesting any recommendation/plan/spec document, run
`git log --since=<source date>` and compare against the affected files' mtimes. If the work is already done, mark each wiki page as "✅ Implemented (commit SHA)" instead of leaving it as TBD.
That's the meta-lesson. The wiki has blind spots that only show up when reality is checked against it. You can't see those blind spots until you actually try to act on what the wiki says. So the lint operation isn't enough — you also need a pre-ingest gate that asks "is this thing real, or just proposed?"
I've started thinking of this as the wiki's biggest weakness: it treats every source as new information by default. It's bad at recognizing when a source describes something that has already happened. Once you know to look for it, you can patch it. But you have to know.
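The mtime half of the Reality Sync gate can be sketched as a standalone check. Illustrative only — in my setup the gate is a schema instruction the LLM follows, and it pairs this with `git log --since=<source date>`:

```python
from datetime import datetime, timezone
from pathlib import Path

def reality_sync(source_date: str, files: list[Path]) -> dict[str, bool]:
    """Sketch of Step 0: flag files modified on or after the source's date —
    a hint that the 'recommended' work may already be done, so the wiki page
    should not default to TBD. (Hypothetical helper, not from the gist.)"""
    cutoff = datetime.fromisoformat(source_date).replace(tzinfo=timezone.utc)
    result = {}
    for f in files:
        mtime = datetime.fromtimestamp(f.stat().st_mtime, tz=timezone.utc)
        result[f.name] = mtime >= cutoff
    return result
```

True means "reality may have moved past this source — go check git before filing it as future work."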
🛠️ How to Prompt AI (To Bootstrap Your Own)
If you want to start tonight:
I want to build a Karpathy-style LLM Wiki inside my existing Obsidian vault. My vault uses [PARA / a different system]. I want to add new layers (Sources + Wiki) without modifying my existing notes. Walk me through:
- Picking a single domain to start with (one I think about a lot)
- The minimum schema to put in CLAUDE.md
- The first ingest — give me a source I can drop in tonight
- The success metric: when do I know it's working?
Don't give me a generic guide. Tailor it to me — ask for my domain first.
The "ask for my domain first" line matters. Generic guides about knowledge management are everywhere. The pattern only clicks when it's bound to a topic you already care about.
🧰 Things That Surprised Me
A handful of things I wasn't expecting:
The schema is the project, not the wiki. I spent more time iterating on CLAUDE.md than writing any single wiki page. The schema is what keeps two LLM sessions from drifting away from each other.
Synthesis pages are 70% of the value. Not the entity pages, not the concept pages. The pages where the LLM compares two sources or compiles a hypothesis from five — those are the ones I keep going back to.
Frontmatter pays off after page 30. I almost skipped frontmatter. Then I tried to ask "show me everything updated in the last week" and realized I needed updated: YYYY-MM-DD on every page. Add it from day one even if you don't use it yet.
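The query that forced me to add frontmatter — "show me everything updated in the last week" — becomes trivial once every page carries `updated:`. A sketch, assuming that field exists on every page:

```python
import re
from datetime import date, timedelta
from pathlib import Path

# Matches an 'updated: YYYY-MM-DD' frontmatter line at the start of a line.
UPDATED = re.compile(r"^updated:\s*(\d{4}-\d{2}-\d{2})", re.MULTILINE)

def recently_updated(wiki_dir: Path, days: int = 7) -> list[str]:
    """List wiki pages whose updated: field falls within the last `days` days.
    Pages without the field are silently skipped — which is exactly why
    you add it from day one."""
    cutoff = date.today() - timedelta(days=days)
    hits = []
    for p in wiki_dir.rglob("*.md"):
        m = UPDATED.search(p.read_text(encoding="utf-8"))
        if m and date.fromisoformat(m.group(1)) >= cutoff:
            hits.append(p.stem)
    return sorted(hits)
```

Obsidian's own search and Dataview-style plugins can do this too; the point is that none of it works without the field being there.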
Obsidian's graph view becomes meaningful. Before this, my graph was a hairball of human-made [[wikilinks]]. After this, the wiki cluster forms a separate tight constellation, and you can literally see which concepts are central.
The wiki gives you something to fight with. When the LLM writes a synthesis page that argues a position, you suddenly notice you disagree with parts of it. Disagreement is generative. RAG never gave me that — it just answered, and I either accepted or didn't.
🔑 Quick Reference
| You're thinking... | Action |
|---|---|
| "I have docs scattered across 5 places" | Pick one domain. Put its sources in 40-Sources/<domain>/ first. |
| "I'm not sure what to ingest first" | The most recent thing you read that you wished you remembered better. |
| "My existing notes are a mess — should I clean them first?" | No. Add the new layers next to them. Touch nothing. |
| "Should I use vector embeddings?" | Not until you hit ~200 pages. The index file works at small-to-mid scale. |
| "What if I disagree with the LLM's synthesis?" | File your disagreement as a wiki page. The wiki holds both views and notes the contradiction. |
| "How often do I lint?" | Weekly. Tag a lint-YYYY-MM-DD.md so you can see the trend. |
| "Will this break if I switch LLM models?" | The schema is plain markdown. Any model that can follow instructions can take over. |
🔗 Resources
- Karpathy's original gist — the source. Read it before mine.
- My Obsidian setup post — the PARA + MOC base I built this on top of.
- qmd — local search engine for markdown files (BM25 + vector + LLM rerank). Optional, useful past ~hundreds of pages.
- Obsidian Web Clipper — for getting articles into your Sources directory in one click.
🪶 The Realization
I've been writing markdown notes for years and watching them rot. The folders fill up. The cross-references go stale. The synthesis I made last March is gone because I wrote it in a chat window and the chat window is gone.
The thing Karpathy's pattern actually solves is that last failure mode. Synthesis used to be ephemeral. Now it lives on a page that the LLM will keep current. Every question I ask becomes a page. Every page gets updated when a new source contradicts or extends it. Every week, lint surfaces what's drifted.
It's the first time my notes have felt like an asset that grows by itself. Vannevar Bush sketched this in 1945 (the Memex) and gave up because nobody could solve "who does the maintenance." The LLM does the maintenance. That's the whole story.
I'll write a follow-up once I've used it for a month and seen what breaks. The wiki will probably tell me what to write about.
2026.04.19
Written by
Jay
Licensed Pharmacist · Senior Researcher
Building production-grade AI tools across medicine, finance, and productivity — without a CS degree. Domain expertise first, code second.