Raymond Riter

GBC-AI: a sermon RAG for my church, running entirely on my own hardware

rpr2998@gmail.com (Raymond Riter) — Sun, 18 Jan 2026 00:00:00 GMT

I attend Grace Bible Church. Every Sunday there's a sermon, posted to YouTube on Monday, and an archive going back years that nobody actually queries. The information is there — it's just locked inside ~30-minute videos with no transcripts, no timestamps, and no way to ask "what did Pastor Bryan say about Romans 8 the last time he preached on it?"

GBC-AI is the answer. It's a local-first sermon RAG that ingests every video, transcribes and diarizes it, indexes the chunks, and lets a congregant ask questions with cited answers that link back to the exact timestamp in the source video.

The shape of it

Three entry points, one pipeline behind them:

app.py — Streamlit chat UI, the thing congregants actually use
api/main.py — FastAPI REST surface, the thing future integrations consume
launch.py — unified launcher with venv setup, preflight checks, model warm-up

The data flow:

YouTube / local video
        │
        ▼  yt-dlp
download_manager.py
        │
        ▼  5-tier pipeline
ingest.py
   ├─ faster-whisper        (large-v3-turbo, GPU)
   ├─ pyannote.audio        (speaker diarization)
   ├─ chunker               (semantic + windowed)
   ├─ BGE embeddings        (1024-dim)
   └─ ChromaDB              (persistent vector store)
        │
        ▼
chat_engine.py
   ├─ RAG retrieval         (top-k + hybrid search)
   └─ LM Studio streaming   (currently Gemma-4-26b-a4b)

A central llm_client.py wraps the streaming client with retries and timeouts. SQLite via database.py replaced an earlier JSON store once the dataset grew. Pydantic env config in settings.py keeps all the secrets and tunables in one place.
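A minimal sketch of the retry behaviour a wrapper like this might provide. The names `call_with_retries`, `max_attempts`, and `base_delay` are illustrative assumptions, not the contents of llm_client.py, and the real client presumably wraps a streaming call rather than a plain callable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on timeout/connection errors with exponential backoff.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting.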

Ten phases, all done

The implementation plan ran across ten phases — every one is complete and merged:

Structured logging — Loguru everywhere, JSON sinks for prod
Ruff — locked down style + lint at CI time
Exception hierarchy — IngestError, RetrievalError, etc., with structured fields
LLM client centralization — retries, timeouts, token tracking
DB migration — JSON → SQLite for query-shaped data, ChromaDB stays for vectors
API layer — FastAPI with versioned routes
Auth — token-based, integrates with our existing church accounts
Docker — Compose stack for the API + DB; Streamlit still runs natively
Observability — Langfuse for trace inspection
Pipeline orchestration — Prefect for the ingest jobs

Last meaningful work was the multi-model routing layer and a Redis-backed task queue. There's an IMPROVEMENT_PLAN_V2.md with the next round of work.

What runs where

This whole thing is local-first by design. The 179 GB on disk breaks down roughly as:

sermon-rag/videos/ — raw MP4/MKV, the actual sermons (>100 GB)
sermon-rag/audio/ — extracted audio + voice separation models
sermon-rag/db/ — ChromaDB persistent vector store + SQLite
venv with CUDA torch — ~20–40 GB

The LLM weights live in LM Studio's own directory, not in the repo. The 24 GB on the RTX 3090 is enough to run Gemma-4-26b-a4b for the retrieval-augmented Q&A while keeping ChromaDB warm.

The hard parts

Whisper accuracy on theological vocabulary. "Propitiation" and "soteriology" don't appear in Whisper's training data the way "weather" does. I tried hot-word lists and they hurt more than they helped — Whisper's a sequence model, not a dictionary. The fix was a post-process pass that uses an LLM with a custom Bible-specific vocabulary prompt to rewrite obvious mishearings.

Citations with timestamps. The answer "Pastor Bryan talked about the imputed righteousness of Christ in his Romans 4 series" is useless without a deeplink. Every chunk in ChromaDB carries (video_id, start_seconds, end_seconds). The chat engine surfaces those as YouTube ?t= links in its output, so the user can jump to the moment.
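As a rough sketch of how a (video_id, start_seconds, end_seconds) triple can become a clickable citation: the `Chunk` class and both function names below are hypothetical, and the short youtu.be URL form is one common way to express a ?t= link, not necessarily the one the chat engine emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Metadata carried by each chunk (field names follow the post)."""
    video_id: str
    start_seconds: float
    end_seconds: float


def youtube_deeplink(chunk: Chunk) -> str:
    """Build a link that opens the video at the chunk's start time."""
    # The t parameter takes whole seconds; round down so playback
    # never starts after the first word of the chunk.
    start = max(0, int(chunk.start_seconds))
    return f"https://youtu.be/{chunk.video_id}?t={start}"


def format_timestamp(seconds: float) -> str:
    """Render seconds as H:MM:SS or M:SS for display next to the link."""
    total = max(0, int(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"
```

A citation line can then pair the two, for example the formatted timestamp as link text and the deeplink as its target.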

Cache race conditions. Documented in cache.py:43–46 and IMPROVEMENT_PLAN_V2.md. Multiple tabs hitting the API simultaneously could trigger duplicate work. The fix is a proper double-checked lock with a TTL guard, but I'd rather rip the cache out and front the LLM with Redis instead.
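For reference, a double-checked lock with a TTL guard can be sketched as below. This is an illustration of the pattern under simple assumptions (one process, one global lock), not the code in cache.py; `TTLCache` and `get_or_compute` are made-up names.

```python
import threading
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-process cache using double-checked locking with a TTL guard."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def _fresh(self, key: Hashable) -> Optional[Tuple[float, Any]]:
        """Return the entry if it exists and has not expired."""
        entry = self._entries.get(key)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry
        return None

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        # First check, without the lock: the common cache-hit path.
        entry = self._fresh(key)
        if entry is not None:
            return entry[1]
        with self._lock:
            # Second check, under the lock: another thread may have
            # filled the entry while this one was waiting.
            entry = self._fresh(key)
            if entry is not None:
                return entry[1]
            value = compute()
            self._entries[key] = (self._clock(), value)
            return value
```

The single lock serializes all cache misses, which is the simplest correct version; a per-key lock would let unrelated questions compute in parallel at the cost of more bookkeeping.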

What's next

The roadmap I have written down:

Fix the 26 failing tests in test_ingest.py (heavy module-level imports block clean mocking)
Replace the homegrown cache with Redis-fronted memoization
Add the email-digest feature — weekly auto-summary mailed to opted-in congregants
Sermon comparison — "show me every time this pastor preached on grace, sorted by year"

This was the project that taught me how much you can do with a single 24 GB GPU if you're disciplined about model sizes. It's also the project that's most directly useful to people in my actual life — which is its own kind of vindication.

The code is at github.com/Raymondriter/GBC-AI.

helpmetopray.org: a quiet place to pray, and to be prayed for

rpr2998@gmail.com (Raymond Riter) — Sun, 15 Mar 2026 00:00:00 GMT

"Where two or three gather in my name, there am I with them." — Matthew 18:20

Most prayer apps are dressed-up journals. You open them, type a request into a private feed, and that's it. There's no presence on the other side. That's not what prayer is — it's not even what the model in Matthew 18 describes.

So I built helpmetopray.org. The differentiator is not another journal or devotional feed — it's felt presence. When you post a request, real people pray for it. When you open the app, you can sit with a stranger's burden for two minutes. Jesus-centered by default, extensible later.

The four ways in

Once you're past the landing page, the daily dashboard offers four routes:

/pray/solo — phrase-by-phrase walk through the Lord's Prayer. The original blueprint Jesus gave when his disciples asked how to pray. Walked slowly, with each line as its own screen.
/pray/stranger — you receive one real request submitted by another user. You sit with it. You Amen. The request goes back into the queue for someone else to pray. You don't see the requester's name; they don't see yours.
/pray/request — you share a request, with a visibility scope: private (just you), circle (specific people), or world (anonymous, goes into the stranger pool).
/pray/journal — your open and answered prayers. The only spot where the experience is journal-shaped.

There's also a /pray/alongside placeholder for what I think will be the killer feature: a silent presence room. Open the app, see how many other people are praying right now, sit in the same digital silence. No content, no chat, just presence. That's the v2.

The anonymous-first decision

A surprising number of people who want to pray won't sign up for an account to do it. So the entire app works without one.

Anonymous visitors keep everything in their browser's localStorage. They can use /pray/solo, walk the Lord's Prayer, write private requests, and journal answered prayers — all without an account. Sign-in is only required to post a request strangers can pray for or to receive other strangers' requests, because those two flows require a server-side queue.

That asymmetry is the core UX bet. Most apps lock everything behind sign-up. helpmetopray locks only the social actions. The reflective ones are free.

The stack

Framework        Next.js 16 (App Router) + React 19
Styling          Tailwind v4
State            Zustand with localStorage persistence (anonymous users)
Backend          Supabase (Postgres + Auth + Realtime + Edge Functions)
                 — swapped behind a Backend interface so anonymous mode
                   uses a localStorage adapter, signed-in mode uses Supabase
Deploy           Vercel, custom domain helpmetopray.org

The Backend interface in src/lib/backend.ts is the thing I'm proudest of architecturally. It lets the entire app stay agnostic about whether you're signed in. Want to test the signed-in flow without a Supabase project? Plug in the mock backend. Want to swap Supabase for something else later? It's one file.
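The shape of that pattern, sketched in Python rather than the project's TypeScript. The post does not show the interface's methods, so `save_request`, `list_requests`, and the `LocalBackend` class are hypothetical stand-ins, and the rule that only private requests work offline is an assumption drawn from the sign-in description above.

```python
from typing import Dict, List, Protocol


class Backend(Protocol):
    """What the app needs from storage, independent of where data lives."""

    def save_request(self, text: str, visibility: str) -> str: ...

    def list_requests(self) -> List[Dict[str, str]]: ...


class LocalBackend:
    """Stand-in for the localStorage adapter used by anonymous visitors."""

    def __init__(self) -> None:
        self._rows: List[Dict[str, str]] = []

    def save_request(self, text: str, visibility: str) -> str:
        if visibility != "private":
            # Assumption: anything visible to others needs the server-side queue.
            raise PermissionError("sign in to share a request")
        request_id = f"local-{len(self._rows) + 1}"
        self._rows.append({"id": request_id, "text": text, "visibility": visibility})
        return request_id

    def list_requests(self) -> List[Dict[str, str]]:
        return list(self._rows)


def journal_count(backend: Backend) -> int:
    """App code depends only on the Backend protocol, never on an adapter."""
    return len(backend.list_requests())
```

A Supabase-backed or mock adapter would implement the same two methods, and `journal_count` would not change.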

Why this exists

I'm a Christian, and I noticed that I would intend to pray and not actually do it. The friction wasn't theological — it was UX. By the time I'd opened my Bible app, scrolled past the daily devotional, dismissed the upsell, and found the prayer journal feature buried three menus deep, the moment had passed.

I wanted something that respected the moment. Open, tap one of four cards, do the thing, close.

The other half of why this exists is the stranger pool. The number of times I've gotten a text from a friend saying "please pray for X" and felt the weight of it lift slightly is significant. That's something software can carry at scale. People are willing to do this for strangers; they just need a place where it's normal to.

What's next

The silent presence room is the big one. After that:

Push notifications for circle requests (opt-in, never aggressive)
A "your request was prayed for" trickle that returns to the requester anonymously, closing the loop without identifying the person who prayed
An offline mode that queues requests for sync

The site is live now. If you want to use it, just open it. No app to install, no account required to start.

helpmetopray.org

Job-Shorts: rendering every chapter of the Book of Job as a 60-second AI video

rpr2998@gmail.com (Raymond Riter) — Wed, 22 Apr 2026 00:00:00 GMT

The Bible is public domain. AI video gen is here. Short-form Bible content has a real audience. I wanted to know if I could turn reading Job into a 42-episode YouTube/TikTok series rendered entirely on my desktop.

I'm calling it Job-Shorts. The pipeline is local-first, commercial-safe, and routes every text-generation call through my Claude Code subscription so the per-chapter cost is the electricity it takes the 3090 to render.

The pipeline

chapter_number + KJV text
        │
        ▼
1. SCRIPT GEN     (Claude via subscription, ~10s)
        → 150-word narration (hook / setup / tension / payoff / turn / CTA)
2. BREAKDOWN      (Claude, ~15s)
        → 4–6 visual beats + character lock + style lock
3. KEYFRAMES      (ComfyUI / Flux Dev / Z-Image)
        → 1 preview image per beat
4. NARRATION      (F5-TTS, local)
        → audio + word-level timestamps
5. VIDEO GEN      (ComfyUI / LTX 2.3 or HunyuanVideo)
        → each beat at narration-matched length, 2 takes
6. EVALUATOR      (Claude vision)
        → picks the best take by scoring rendered frames
7. CAPTIONS       (faster-whisper)
        → burned word-by-word captions from narration
8. ASSEMBLE       (FFmpeg)
        → concat + narration + music bed (sidechain-ducked) + verse overlays
9. PUBLISHING     (Claude)
        → title + description + hashtags + thumbnail
        │
        ▼
output/chapter_N/final.mp4 (1080x1920, vertical, ready to upload)

End-to-end on the 3090: roughly 30–60 minutes per chapter, mostly unattended.

The LLM routing trick

There's a job_shorts.llm module that fronts every text call. It picks one of three backends:

Backend        When picked                        Cost
claude_code    default if claude is on PATH       uses your Claude Code subscription
claude_api     only if llm_backend=claude_api     pay-per-token
ollama         fallback                           free, local

claude_code works by invoking claude -p as a subprocess and reading from your existing auth. So all script generation, scene breakdown, evaluator scoring, and publishing metadata go through my Max plan — no per-token spend.

That single decision turns the math on its head. Without it, generating 42 chapters at GPT-4-class quality would cost real money. With it, the only cost is the electricity for ComfyUI to render the video.
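The selection rules in the table above, plus the subprocess call, might look roughly like this. It is a sketch, not the job_shorts.llm module: `pick_backend` and `run_claude_code` are hypothetical names, the 120-second timeout is an assumption, and the table does not say what happens when another backend is requested explicitly, so that case is left out.

```python
import shutil
import subprocess
from typing import Callable, Optional


def pick_backend(
    llm_backend: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Choose a backend name following the table's three rules."""
    if llm_backend == "claude_api":
        return "claude_api"  # pay-per-token, only when asked for explicitly
    if which("claude") is not None:
        return "claude_code"  # CLI found on PATH: use the subscription
    return "ollama"  # free local fallback


def run_claude_code(prompt: str, timeout_s: float = 120.0) -> str:
    """Send one prompt through `claude -p` and return its stdout."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

Injecting `which` makes the routing testable on a machine that does not have the CLI installed.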

Patterns I borrowed (and the ones I had to invent)

Borrowed from other text-to-film projects — these are well-established now:

Character lock — full physical description injected verbatim into every prompt
Style lock — 20–40 word visual style string locked across all prompts
World reconstruction — every prompt fully self-contained, no inter-clip memory assumed
Storyboard-before-video — cheap keyframe preview before expensive video gen
Multiple takes + AI evaluator — generate 2–3 takes, LLM scores PASS/FAIL
Duration calc from word count — narration WPM dictates clip length
Resume-from-crash state file — JSON state after every step
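The duration-from-word-count pattern is simple arithmetic. A sketch, assuming 150 words per minute (which is what a 150-word narration in a 60-second short implies) and a 24 fps render; both defaults and both function names are illustrative.

```python
import math


def clip_seconds(narration: str, words_per_minute: float = 150.0,
                 pad_seconds: float = 0.0) -> float:
    """Estimate how long a beat's clip must run to cover its narration."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    word_count = len(narration.split())
    return word_count * 60.0 / words_per_minute + pad_seconds


def frames_needed(seconds: float, fps: int = 24) -> int:
    """Round up so the clip never ends before the narration does."""
    return math.ceil(seconds * fps)
```

Once the TTS step has produced real word-level timestamps, those would be a better source of clip length than this estimate.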

What I had to add for Bible content specifically:

Whisper caption timing — accurate word-level burned captions for muted viewing (this is non-negotiable on Shorts/TikTok)
Series-wide consistency — series.json keeps Job + style identical across all 42 episodes
KJV auto-fetch — pulls public-domain Bible text from bible-api.com so I never type a verse
Verse chyron overlay — quoted scripture appears on screen with proper formatting
Music bed with sidechain ducking — auto-select + duck under narration
Batch mode — process N chapters overnight unattended

The CLI surface

# See your hardware tier and recommended models
python -m job_shorts.cli info

# Verify the LLM backend works
python -m job_shorts.cli test-llm

# Just write the script — fast, free, no rendering
python -m job_shorts.cli script 1

# Generate one chapter end-to-end (supervised)
python -m job_shorts.cli chapter 1

# Batch chapters 1 through 10
python -m job_shorts.cli batch 1-10

# Fully autonomous overnight: auto-launch services, vision-evaluator, no gates
python -m job_shorts.cli auto-batch 1-42

# Resume a crashed run
python -m job_shorts.cli resume output/chapter_03

The fully-autonomous mode is the one I actually use. Start it and walk away: the rig knows how to relaunch ComfyUI or Ollama if either dies, and it writes a JSON state file after every step, so a power blip doesn't cost a chapter.
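A checkpoint-after-every-step loop can be sketched as follows. The state layout (a JSON object with a single "done" list) and the function names are assumptions for illustration; the real state file's schema is not shown in this post.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def load_done(state_path: Path) -> List[str]:
    """Return the names of steps already completed, or [] on a fresh run."""
    if not state_path.exists():
        return []
    return json.loads(state_path.read_text(encoding="utf-8"))["done"]


def save_done(state_path: Path, done: List[str]) -> None:
    """Write the state atomically so a crash mid-write can't corrupt it."""
    fd, tmp = tempfile.mkstemp(dir=state_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"done": done}, fh)
    os.replace(tmp, state_path)  # atomic rename on the same filesystem


def run_pipeline(steps: List[Step], state_path: Path) -> List[str]:
    """Run steps in order, skipping any recorded as done; return those executed."""
    done = load_done(state_path)
    executed: List[str] = []
    for name, fn in steps:
        if name in done:
            continue  # finished before the crash
        fn()
        done.append(name)
        save_done(state_path, done)  # checkpoint after every step
        executed.append(name)
    return executed
```

Writing to a temporary file and renaming it matters here: a power cut during a plain overwrite could leave a half-written state file, which is worse than no state file.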

Where it's at

Phase 0 (23 modules) is complete. Phase 1 — first end-to-end render — is the next step. The interesting open questions are around the evaluator: how do you teach a vision model to spot when LTX has slipped into uncanny-valley territory before you commit to a take? The current heuristic is naive (luminance variance + character consistency check). I think there's a smarter version that compares each frame back to the keyframe storyboard.

If you want to read the actual code or watch progress, it lives on my GitHub. The whole point of this project is that anyone with a 12 GB+ GPU should be able to fork it and render their own Bible (or any other public-domain text). The pipeline doesn't care that it's Job.

LexiGrow: a clinical-grade tracker for my kids' first words

rpr2998@gmail.com (Raymond Riter) — Thu, 26 Feb 2026 00:00:00 GMT

When you have a baby, you start writing down the words. "Dada" got logged on a sticky note. "More" went into the Notes app. By the time you have a toddler you have words in four places, none of them queryable, none of them comparable to anything clinical.

LexiGrow is the app I built so my wife and I would stop losing track. Then I realized the proper version of this tool already existed in the speech-language pathology world — it's just not consumer-facing. So I rebuilt it from the spec.

MB-CDI as the foundation

The MacArthur-Bates Communicative Development Inventories are the standard parent-report instrument for early language acquisition. Two relevant forms:

Words & Gestures — 8–18 months, 396 vocabulary items + a gestures inventory
Words & Sentences — 16–30 months, expanded vocabulary + early grammar

LexiGrow ships with both vocabulary banks bundled as JSON (assets/data/mbcdi_vocabulary.json, assets/data/mbcdi_gestures.json). You don't track free-text words — you check off MB-CDI items as your child produces them. That means the data is directly comparable to clinical norms, not just a personal log.

The stack

Framework        Flutter 3.27+ / Dart 3.6+
Local DB         Isar (offline-first, fast, type-safe)
State / arch     BLoC, feature-first
Theming          Material 3 light + dark
Future backend   Firestore for sync (Phase 8 of the modernization plan)
Compliance       COPPA/GDPR services in core/compliance/

The project layout is feature-first so each module owns its own state:

lib/
├── main.dart
├── core/
│   ├── compliance/      COPPA/GDPR services
│   └── theme/           Material 3 light + dark
├── data/
│   ├── models/          Isar collections (VocabularyItem,
│   │                                       GestureItem,
│   │                                       ChildProfile)
│   └── repositories/    Isar-backed data access
└── features/            each feature owns its BLoC
    ├── analytics/
    ├── gestures/
    ├── home/
    ├── onboarding/
    ├── reports/
    ├── vocabulary/
    └── quick_add/       voice + photo input

The data model is three Isar collections: VocabularyItem, GestureItem, and ChildProfile. Everything else (frequency-of-use, age-at-first-production, comprehension-vs-production split) derives from those.

Bilingual children

This is where consumer apps fall over. If your kid says "dog" in English and "perro" in Spanish, those are not two vocabulary items — they're one concept. The MB-CDI norms only work if you count conceptual vocabulary, not surface form.

I baked that algorithm in. The doc is at design/bilingual_logic.md in the repo. The short version:

Every MB-CDI item has a concept ID
Each child profile has 1–N spoken languages
A vocabulary entry is (concept_id, language, age_at_first_production)
Comprehension and production are scored at the concept level, not the surface form

That single decision means LexiGrow can give a bilingual kid a fair number against the norms instead of penalizing them for speaking two languages.
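The difference between counting surface forms and counting concepts fits in a few lines. This is a sketch of the idea only, written in Python rather than the app's Dart; the class and function names are hypothetical and the full algorithm in design/bilingual_logic.md may do more.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VocabularyEntry:
    """One (concept_id, language, age_at_first_production) record."""
    concept_id: str
    language: str
    age_at_first_production_months: int


def surface_word_count(entries: Iterable[VocabularyEntry]) -> int:
    """Count every word form: 'dog' and 'perro' count as two."""
    return len({(e.concept_id, e.language) for e in entries})


def conceptual_vocabulary(entries: Iterable[VocabularyEntry]) -> int:
    """Count distinct concepts: 'dog' and 'perro' count once."""
    return len({e.concept_id for e in entries})
```

For a child who says "dog", "perro", and "leche", the surface count is three and the conceptual count is two; the conceptual count is the one compared against the norms.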

What's hard about this

The hardest part is not over-medicalizing it. Parents don't want a clinical assessment tool that makes them anxious about every milestone. The MB-CDI is a percentile system; it's normal for a 14-month-old to be at the 30th percentile and a 22-month-old to be at the 90th. LexiGrow has to communicate that without either downplaying real concerns or generating false alarms.

The current design is:

Show progress against age-band norms, never against a single rigid line
Surface percentile ranges, not a single number
Explicitly flag the "this is a wide range" framing in onboarding
Provide a "share with your pediatrician" report — the only output styled like a clinical artifact

If a parent wants the clinical output, they can ask for it. If they want the encouraging output, that's the default.

What's next

Phase 1 of the modernization plan adds integration tests. Phase 5 renames voice/ to quick_add/ and consolidates the photo + voice inputs. Phase 8 adds Firestore sync for multi-device families. After that, the obvious next thing is a longitudinal report — a year-by-year report card you can keep across siblings.

This was the first Flutter project I'd shipped to a place where it actually had to feel like a native iOS app. Material 3 helps. The remaining iOS-feels-different work is mostly haptics, swipe-back, and the photo picker — and the photo picker is half of why God invented image_picker_ios.

If you're a parent who wants to log first words properly, or a clinician who's tired of recommending tools that don't use the MB-CDI, the repo lives on my GitHub.

Traeger Peppered Beef Jerky

rpr2998@gmail.com (Raymond Riter) — Fri, 22 Nov 2024 00:00:00 GMT

Taken from this Traeger recipe, https://www.traeger.com/recipes/peppered-beef-jerky.

I used eye of round for the meat, hand-sliced from a whole Costco cut. For the beer in the marinade I used Dragon's Milk dark stout; everything else was as listed in the recipe, which we doubled.

Marinated for nearly 24 hours, overnight. Patted the strips dry on paper towels and hit them with extra pepper.

Finally, smoked at 180°F for 4 hours 20 minutes on the Traeger. This was my first time making my own jerky. We ate it while it was still warm, which was new to us, and we loved it.

RayFitnessPal: building the MyFitnessPal that MyFitnessPal refuses to build

rpr2998@gmail.com (Raymond Riter) — Fri, 10 Apr 2026 00:00:00 GMT

I've used MyFitnessPal on and off since 2014. Every couple years the UX gets a little worse, the premium paywall a little more aggressive, and the food database a little more cluttered with phantom entries from people who don't know how a serving size works. So I built my own.

The official-sounding name is "Nutrition Tracking App." The disk-folder name is myfitnesspal-clone. The name I actually call it is RayFitnessPal.

What it does

Food logging — search, scan, or photograph a meal and get detailed nutrition
AI food recognition — Gemini analyzes a photo and names what's on the plate, then macros get pulled from the actual nutrition databases
Barcode scanning — point the phone, get the food, done
Camera integration — meal photos saved against entries, plus AI-recognition source
Real-time calculations — serving-size scaling that doesn't make you do arithmetic
Meal planning — schedule future meals
Water tracking — daily intake with reminders
Recipe builder — combine ingredients into a custom recipe with computed nutrition
Exercise integration — workouts and fitness activity logs
Progress analytics — trends, weekly summaries, goal hit-rate
Smart notifications — reminder cadence based on actual behavior
Data export/import — your data is yours

Crucially: dark mode that actually works, offline support via Dexie + localStorage, and real-time sync via Supabase for authenticated users.

The stack

Frontend     Next.js 15, React 19, TypeScript
Styling      Tailwind CSS
Backend      Supabase (Auth + Postgres + Realtime)
Food DB      USDA FoodData Central (free) + Nutritionix (free tier)
AI           Google Gemini (food recognition + Quick-Log)
Camera       WebRTC getUserMedia
Barcode      ZXing + HTML5 QR Code
Offline      Dexie (IndexedDB) + localStorage for preferences

The interesting choice here is relying on free public APIs for the food database. The USDA FoodData Central is genuinely huge (about 1.9M entries), and it's a US government dataset, so it's not going behind a paywall. Nutritionix's free tier fills in the gaps for branded foods. Between them you get coverage that's competitive with MyFitnessPal Premium without any subscription.

The Gemini Quick-Log trick

The single feature that justifies the whole project is "Quick-Log." You point your phone at your plate, snap a photo, and a few seconds later you have a logged meal with macros within ~10% of correct.

The flow is:

Capture the image via getUserMedia
Send it to Gemini with a prompt that asks for a JSON list of {food_name, estimated_grams} per visible item
For each food_name, query Nutritionix for the closest match
Scale macros by estimated_grams / standard_serving_grams
Sum, show the user, let them edit before saving
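The last three steps of the flow above reduce to a scale-and-sum. A minimal sketch, in Python rather than the app's TypeScript: `Macros`, `scale_to_portion`, and `meal_total` are hypothetical names, and the database lookup is assumed to have already returned per-serving macros and a serving size in grams.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Macros:
    calories: float
    protein_g: float
    carbs_g: float
    fat_g: float

    def scaled(self, factor: float) -> "Macros":
        return Macros(self.calories * factor, self.protein_g * factor,
                      self.carbs_g * factor, self.fat_g * factor)

    def __add__(self, other: "Macros") -> "Macros":
        return Macros(self.calories + other.calories,
                      self.protein_g + other.protein_g,
                      self.carbs_g + other.carbs_g,
                      self.fat_g + other.fat_g)


def scale_to_portion(per_serving: Macros, standard_serving_grams: float,
                     estimated_grams: float) -> Macros:
    """Scale a database entry by estimated_grams / standard_serving_grams."""
    if standard_serving_grams <= 0:
        raise ValueError("standard_serving_grams must be positive")
    return per_serving.scaled(estimated_grams / standard_serving_grams)


def meal_total(items: Iterable[Tuple[Macros, float, float]]) -> Macros:
    """Sum scaled macros over (per_serving, standard_grams, estimated_grams)."""
    total = Macros(0.0, 0.0, 0.0, 0.0)
    for per_serving, standard_grams, estimated_grams in items:
        total = total + scale_to_portion(per_serving, standard_grams, estimated_grams)
    return total
```

Because every macro scales by the same factor, any error in `estimated_grams` carries straight through to the totals, which is why the edit-before-saving step matters.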

The estimated_grams output from Gemini is genuinely accurate enough — within a slice of bread for most everyday meals. I don't trust it for restaurant portions, but for cooking-at-home it's better than my own eyeballing.

What was hard

The Firebase → Supabase migration. The original prototype was Firebase and the data model was wrong for relational queries (you can't really say "show me protein totals per week" cleanly in Firestore without denormalizing aggressively). Supabase + Postgres made every analytics query 3–5 lines instead of a custom Cloud Function. The data types are generated directly from the schema, so the TypeScript layer never drifts from the database.

Offline-first via Dexie was the second-hardest. Every log is staged locally first, then synced. The sync resolver has to handle: same-meal-edited-on-two-devices, item-deleted-on-one-device-but-not-the-other, and the "I logged 3 weeks of meals while on a camping trip with no service" case.

What I'd build next

A coach that actually understands my goals. The current "Quick-Log" tells me what's on the plate; it doesn't tell me whether I should be eating it given that I told it I'm trying to hit 180g protein this week. That's the next feature: a passive nudge layer driven by the same Gemini calls that already see every meal.

The codebase lives on my GitHub. It is enormously over-engineered for a personal nutrition app. That's the point.

rsbot: building an autonomous OSRS bot that actually has a plan

rpr2998@gmail.com (Raymond Riter) — Thu, 04 Dec 2025 00:00:00 GMT

The interesting thing about an Old School RuneScape bot is not the macro that clicks the tree. The macro is solved. What's not solved is what to do next.

Most bots run a single skill forever. Mine has a goal planner. You give it a long-term target — "combat to 70, total wealth to 1M GP, 200+ quest points" — and it picks its own activity based on current stats, inventory, gear, location, and how close it is to each goal. When it finishes one task, it picks the next one without me being there.

It's called rsbot. It's the largest active codebase I have on disk by source-file count (2,073 files). Most of those files are the verbose modular architecture, not bloat.

The shape

Entry:    run_autonomous.py
              │
              ▼
          BotEngine ──→ AutonomousPlayer (script)
              │
              ├─ DetectionBridge ──→ YOLOv8 pipeline
              ├─ AntiBan          ──→ humanlike timing
              ├─ GoalPlanner      ──→ long-term targets
              └─ TaskManager      ──→ short-term actions

Core layout:

bot/
├── core/                state machine, task manager, anti-ban, goal planner
├── modules/
│   ├── autonomous_combat.py
│   ├── autonomous_skilling.py     (woodcutting, mining, fishing,
│   │                               cooking, smithing, crafting)
│   ├── autonomous_banking.py
│   ├── autonomous_travel.py
│   ├── autonomous_questing.py
│   └── gearing_manager.py
├── ai/
│   ├── ai_brain.py                 LLM decision-making
│   └── strategic_advisor.py
├── game/                inventory, player, bank, login
└── input/               RuneLite plugin OR window-constrained mouse

Two input modes: a RuneLite plugin that posts intents over localhost, and a fallback pyautogui mode that constrains itself to a window rect. The latter is what you ship to a friend who won't install RuneLite plugins.

Perception

YOLOv8 does the heavy lifting. I trained it on the obvious classes — trees, ore rocks, fishing spots, monsters, bank booths, doors — using screenshots from my own gameplay. mss grabs frames; OpenCV preprocesses; YOLO outputs bounding boxes; a coordinate transform converts those to game-window pixels.

detection_bridge.py is the layer that turns "raw detections" into "labeled entities I can decide about." It dedupes overlapping boxes, applies confidence thresholds per class, and tags entities with stable IDs across frames so the planner can say "the same chicken I was already attacking, not a different one."
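The thresholding and dedupe steps can be illustrated with a small greedy overlap filter. This is a generic sketch, not the contents of detection_bridge.py: the `Detection` class, the default threshold, and the 0.5 IoU limit are assumptions, and cross-frame ID tracking is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets: List[Detection], thresholds: Dict[str, float],
                      default_threshold: float = 0.5,
                      iou_limit: float = 0.5) -> List[Detection]:
    """Apply per-class confidence thresholds, then drop same-class overlaps."""
    confident = [d for d in dets
                 if d.confidence >= thresholds.get(d.label, default_threshold)]
    kept: List[Detection] = []
    # Greedy pass, highest confidence first: keep a box only if it does not
    # heavily overlap a box of the same class that was already kept.
    for d in sorted(confident, key=lambda det: det.confidence, reverse=True):
        if all(k.label != d.label or iou(k, d) < iou_limit for k in kept):
            kept.append(d)
    return kept
```

Per-class thresholds let a noisy class such as doors demand more confidence than an easy one such as bank booths.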

Strategy

ai_brain.py wraps a local LLM (LM Studio) for higher-level decisions. The planner doesn't ask the LLM what to do on every frame — that would be slow and wasteful. It asks the LLM when a decision is genuinely ambiguous: which of three nearby trees is best when one is closer but another is in safer territory; whether to bank or keep killing when the inventory is at 27/28; whether the current skill grind is the best one toward the goal.

The state machine handles everything that's mechanical. The LLM handles everything that's interesting.

Goals, not scripts

The thing I'm most proud of is the goal planner. You set it like this:

goals:
  - { type: skill,  skill: combat,  target: 70 }
  - { type: wealth, target_gp: 1_000_000 }
  - { type: quest,  target_qp: 200 }

Then goal_planner.py computes a heuristic ranking on every cycle:

score(activity) =
    (expected_progress_per_minute(activity, current_state) /
         cost_of_setup(activity, current_state)) *
        priority_weight(active_goal, activity)

The bot picks the highest-scoring activity, kicks off the appropriate module, and keeps going until the goal is achieved or a higher-scoring activity emerges.
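A runnable reading of that formula, treating it as (progress / setup cost) multiplied by the priority weight. The `Candidate` class, the precomputed inputs, and the `min_cost` clamp are assumptions added for illustration; goal_planner.py derives these values from game state.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    progress_per_minute: float  # expected_progress_per_minute(activity, state)
    setup_cost: float           # cost_of_setup(activity, state)
    priority_weight: float      # priority_weight(active_goal, activity)


def score(c: Candidate, min_cost: float = 1.0) -> float:
    """(progress / setup cost) * priority weight."""
    # Clamp the cost so an activity with no setup doesn't divide by zero.
    return c.progress_per_minute / max(c.setup_cost, min_cost) * c.priority_weight


def pick_activity(candidates: List[Candidate]) -> Candidate:
    """Return the highest-scoring candidate."""
    if not candidates:
        raise ValueError("no candidate activities")
    return max(candidates, key=score)
```

Re-scoring every cycle is what lets a cheaper or faster activity take over once levels, gear, or location change.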

That's why this isn't a fishing bot or a combat bot — it's "the bot for the goals you set." Combat to 70 might mean killing chickens to 30, hill giants to 50, and dust devils to 70. The planner makes those transitions on its own.

State

What works:

Combat (recent successful run vs hill giants — full kill loop, prayer flick, loot pickup, restocks)
Skilling (woodcutting, mining, fishing, cooking, smithing, crafting all working)
Travel (route planner picks the optimal teleport/run path)
Antiban (humanlike timing distributions, mouse movement Bezier curves, idle injection)

What's broken right now:

Banking. Debug logs from April show task timeouts on bank-opening — 40 minutes stuck on a gear-acquisition task that should take 30 seconds. There are five bank_open_fail screenshots saved as evidence. The current retry logic is too crude (15 retries on timeout); the fix is to detect "bank booth is occluded" vs "I clicked the wrong thing" and recover differently.
Questing is skeleton-only. rune_mysteries.py exists but isn't wired into the main loop.

What's next

Three threads I'd pull, in order:

Fix banking. The whole goal-planner falls over when the bot can't reliably bank. Diagnose the failure modes from the saved screenshots, replace the retry loop with state-driven recovery.
Wire questing into the main loop. Once questing is a primitive the planner can pick, the QP goals start working.
Re-enable RL training. There's scaffolding at rl/osrs_env.py and rl/combat_env.py that's disabled. The OpenAI Gym wrapper around the game state would let me train a real RL agent for the combat sub-loop instead of hand-tuning prayer flicks.

There's also a strong consolidation opportunity with The Visual Bridge, which solves the same vision problem from a different angle — it's an MCP server that wraps YOLOv8 + Moondream2 for any OSRS window. Visual Bridge could be the perception layer for rsbot, freeing the bot to focus on planning.

This is a project that taught me how complex an "agent that does one thing autonomously" actually is. Most days I'm not sure if I'm building software or breeding a creature.

The Tesla OSS suite: 85K words of research distilled into 8 repos

rpr2998@gmail.com (Raymond Riter) — Fri, 01 May 2026 00:00:00 GMT

I have a Model Y 2026 AWD on order. While I waited for delivery, I did what any sensible person would do: I built the eight pieces of software I'd want it to come with.

This started as a research dump. ~85K words of curated notes, 1,409 raw ideas pruned to 915, 40 clusters, three waves of buy-vs-build analysis. None of that ships. It's just kindling. What ships is a workspace called tesla with eight repos arranged in dependency order.

The shape

The library that everything else depends on comes first:

tesla-clip-tools — the shared library. Ten source plugins (Tesla, Wyze, Reolink, UniFi, Ring, Nest, Eufy, Frigate, Arlo, Blink), a sampler, generic VLM backends, and SEI primitives. Without this, every downstream repo would reimplement frame sampling and prompt scaffolding. v0.7.0, 135 tests.

Two consumers prove the abstraction:

sentrytriage (v0.14.0, 115 tests) — local AI Sentry triage with a FastAPI dashboard, thumbs feedback, tune-prompt, A/B evaluator with train/test split, and embedded video. pip install sentrytriage && triage demo boots a 30-second walkthrough.
fsd-disengagement-studio (v0.5.0, 106 tests) — catalog and dashboard for FSD disengagements. Three paths: manual save, real-time Home Assistant trigger, and bulk SEI backfill. Per-driver splitter on top.

Five independent repos cover orthogonal surfaces where the clip library wouldn't help:

hey-nabu-climate-concierge (v0.4.0 Python + v0.5.0 PWA, 33 + 62 tests) — DIY voice climate concierge. FastMCP server + Tesla-browser PWA + real Home Assistant WebSocket state subscription + IndexedDB history.
teslakit (v0.1.0, 45 tests) — Docker Compose monorepo bundling Home Assistant + TeslaMate + Tesla HTTP Proxy + BLE bridge. Replaces Tessie's $13–20/month subscription.
deliveryday-companion (v0.2.0, 54 tests) — delivery-day acceptance checklist with photos and a signed PDF report. Model Y 2026 AWD preset baked in (50 steps).
tesla-changelog-diff (v0.3.0, GitHub-only, 61 tests) — per-VIN OTA diff bot. Three VLM backends (OpenAI, Anthropic, Gemini) for HMI screenshot diffs.
guestkey-issuer (v0.1, Go scaffold) — the missing ~80-line primitive for WhitelistOperation.addImpermanentKey (ROLE_GUEST=8). Dry-run only for now.

Why these eight, in this order

The constraint I set was that every repo had to be testable without a vehicle in hand. Synthetic seeds, mock Fleet API clients, deterministic fixtures, plugins for non-Tesla cameras. That constraint is the reason the OSS releases are portable to anyone — Tesla owner or not.

tesla-clip-tools had to come first because everything downstream — Sentry triage, FSD disengagement classification, future plugins — needs the same source/sampler/VLM/SEI primitives. The two consumers exercise the abstraction. The five independents don't share a dependency graph; they just share a Tesla.

The workspace itself

There's a doctor.py at the root that runs every Python repo's test suite and prints a green/yellow/red dashboard. There's an index.html that renders the workspace overview with a Mermaid diagram, hero stats, and live-link cards — no build step, just open it. There's a start-all-demos.{ps1,sh} that boots three dashboards in parallel.

tesla/
├── tesla-clip-tools/         shared library (135 tests)
├── sentrytriage/             AI Sentry triage (115 tests)
├── fsd-disengagement-studio/ FSD disengagement catalog (106 tests)
├── hey-nabu-climate-concierge/ voice concierge (95 tests across Python + PWA)
├── teslakit/                 Tessie replacement (45 tests)
├── deliveryday-companion/    acceptance checklist (54 tests)
├── tesla-changelog-diff/     per-VIN OTA diff (61 tests)
├── guestkey-issuer/          Go scaffold for ROLE_GUEST=8
├── doctor.py                 workspace test runner
└── index.html                browser-rendered overview

What I learned

A few things only became obvious once I had eight things shipping side by side:

The shared library should be the second thing you ship, not the first. I built tesla-clip-tools first as a library, but I didn't understand its real shape until I'd shipped sentrytriage against it. The right move is to ship sentrytriage from a single-file embedded version, then extract the library on the way to fsd-disengagement-studio.

Synthetic seeds save the project. If triage demo didn't boot in 30 seconds with no Tesla and no cloud, nobody would ever try the thing. Determinism is a feature.

Repo count > monorepo for this kind of OSS. Each repo can be pip install'd on its own. Each has its own LICENSE (mostly MIT, AGPL for teslakit because of TeslaMate's contagion). Each can graduate independently.

When the Model Y actually arrives, the delivery checklist gets a real workout. Until then, the synthetic data keeps the repos honest.