<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Mark&#39;s Dev Blog</title>
    <link>https://blog.isquaredsoftware.com/tags/opinions/index.xml</link>
    <description>Recent content on Mark&#39;s Dev Blog</description>
    <generator>Hugo -- gohugo.io</generator>
    <atom:link href="https://blog.isquaredsoftware.com/tags/opinions/index.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>My Thoughts on AI, Part 2: Agent Setup, Workflow, and Tools</title>
      <link>https://blog.isquaredsoftware.com/2026/05/ai-thoughts-part-2-agent-workflow-tools/</link>
      <pubDate>Thu, 07 May 2026 15:00:00 -0500</pubDate>
      
      <guid>https://blog.isquaredsoftware.com/2026/05/ai-thoughts-part-2-agent-workflow-tools/</guid>
      <description>

&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Hopefully you&#39;ve already read &lt;a href=&#34;https://blog.isquaredsoftware.com/2026/05/ai-thoughts-part-1-fears-opinions-journey/&#34;&gt;Part 1: Fears, Opinions, and Mental Journey&lt;/a&gt; so you understand how I ended up with this mindset and approach. Or maybe you saw how long it was and just immediately bailed out :) Welcome either way.&lt;/p&gt;

&lt;p&gt;Part 1 was the braindump. Story, thoughts, feelings.&lt;/p&gt;

&lt;p&gt;Here&#39;s the part that probably more of you are interested in :) What my actual agent setup and dev workflow looks like. How I approach using AI for writing code, what tools I use, how I have them configured.&lt;/p&gt;

&lt;p&gt;You can see a cleaned-up version of my config at &lt;a href=&#34;https://github.com/markerikson/opencode-config-example&#34;&gt;https://github.com/markerikson/opencode-config-example&lt;/a&gt; - follow along in that repo for the pieces as I describe them here.&lt;/p&gt;

&lt;p&gt;As I said in Part 1: I am not trying to sell anything, change anyone&#39;s mind, or say I am an expert. I don&#39;t have the answers.&lt;/p&gt;

&lt;p&gt;BUT I DO HAVE SOME OPINIONS NOW! :) I&#39;m not saying that &lt;em&gt;everybody&lt;/em&gt; should follow them, but this is where I&#39;ve landed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#introduction&#34;&gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#agent-setup-overview&#34;&gt;Agent Setup Overview&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#agent-opencode-and-codenomad&#34;&gt;Agent: OpenCode and CodeNomad&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#model-opus-4-6&#34;&gt;Model: Opus 4.6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#ide-vcs-vs-code-and-fork&#34;&gt;IDE / VCS: VS Code and Fork&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#daily-development-workflow&#34;&gt;Daily Development Workflow&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#parent-orchestrator-session-for-project-management&#34;&gt;Parent Orchestrator Session for Project Management&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#subtasks-for-development&#34;&gt;Subtasks for Development&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#opencode-config&#34;&gt;OpenCode Config&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#permissions-management&#34;&gt;Permissions Management&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#file-reads&#34;&gt;File Reads&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#code-structure-and-search&#34;&gt;Code Structure and Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#context-management&#34;&gt;Context Management&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#session-history-and-search&#34;&gt;Session History and Search&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#dev-plans-repo-and-management-scripts&#34;&gt;Dev Plans Repo and Management Scripts&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#plan-management-issues&#34;&gt;Plan Management Issues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#personal-dev-plans-repo&#34;&gt;Personal Dev Plans Repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#workflow-artifacts&#34;&gt;Workflow Artifacts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#devplans-ts-automation-script&#34;&gt;&lt;code&gt;devplans.ts&lt;/code&gt; Automation Script&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#progress-updates-and-subtask-handoffs&#34;&gt;Progress Updates and Subtask Handoffs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#agents-md&#34;&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#commands-and-skills&#34;&gt;Commands and Skills&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#project-setup&#34;&gt;Project Setup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#progress-and-subtask-management&#34;&gt;Progress and Subtask Management&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#task-tracking&#34;&gt;Task Tracking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#other-commands&#34;&gt;Other Commands&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#other-skills&#34;&gt;Other Skills&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#config-improvement-process&#34;&gt;Config Improvement Process&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#potential-future-workflow-improvements&#34;&gt;Potential Future Workflow Improvements&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#final-thoughts&#34;&gt;Final Thoughts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;agent-setup-overview&#34;&gt;Agent Setup Overview&lt;/h2&gt;

&lt;h3 id=&#34;agent-opencode-and-codenomad&#34;&gt;Agent: OpenCode and CodeNomad&lt;/h3&gt;

&lt;p&gt;I&#39;ve settled on &lt;a href=&#34;https://opencode.ai/&#34;&gt;OpenCode&lt;/a&gt; as my agent of choice.&lt;/p&gt;

&lt;p&gt;I quickly decided I couldn&#39;t use TUIs for any development work. I&#39;ve always been a GUI / IDE user anyway. I briefly &lt;em&gt;tried&lt;/em&gt; both Claude Code and OpenCode TUIs, and bounced off &lt;em&gt;hard&lt;/em&gt;. It&#39;s not that I&#39;m against terminals or CLI tools - I use a lot of them! But I also like having multiple tabs, my choice of coding font, syntax highlighting, and even just straightforward copy/paste abilities. Maybe I gave up too quickly, but I really struggled to get any of the TUIs to do that.&lt;/p&gt;

&lt;p&gt;I know Claude Code is the market leader among coding agents. Maybe if I gave it another shot I&#39;d get some use out of it.&lt;/p&gt;

&lt;p&gt;On the other hand, I&#39;ve always been enough of a tinkerer that I do customize my tech and tool setups. Yeah, I&#39;m a lifelong Windows user, but I&#39;ve got a comprehensive suite of tools and techniques and customizations I&#39;ve built up over the years. I use Android because I want to fiddle and customize and configure the system the way I want it.&lt;/p&gt;

&lt;p&gt;OpenCode fit that bill when I tried it. It works well for me, and I&#39;m happy with it.&lt;/p&gt;

&lt;p&gt;So if I&#39;m not using the TUI, what&#39;s the alternative?&lt;/p&gt;

&lt;p&gt;I found a great third-party OpenCode web UI called &lt;a href=&#34;https://github.com/NeuralNomadsAI/CodeNomad&#34;&gt;CodeNomad&lt;/a&gt;. Tried it, loved it, works &lt;em&gt;great&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;My personal laptop is all Windows, but my day-job development happens in WSL within Windows. I can start CodeNomad&#39;s server from within the WSL environment, browse to the web UI from the Windows side, and not have to deal with any cross-platform filesystem limitations.&lt;/p&gt;

&lt;p&gt;I can have multiple sessions open in one browser tab (with a tabbed UI inside of the page), or have multiple browser tabs talking to the same server. I get full GUI copy and paste, good tool and edit/diff views, and a lot more.&lt;/p&gt;

&lt;p&gt;So I&#39;m now actually doing my primary development there in CodeNomad. Not an IDE.&lt;/p&gt;

&lt;h3 id=&#34;model-opus-4-6&#34;&gt;Model: Opus 4.6&lt;/h3&gt;

&lt;p&gt;We primarily use Anthropic models at work. Honestly I&#39;ve never even tried Gemini / Codex / GPT-whatever / Kimi / etc. I&#39;ve really only used Opus and Sonnet.&lt;/p&gt;

&lt;p&gt;Opus 4.5 and 4.6 have both produced great results for me. Sonnet is decent.&lt;/p&gt;

&lt;p&gt;There&#39;s probably a lot I&#39;m missing. I don&#39;t want to burn a lot of time trying out models and running evals every few days, or switching back and forth just to eke out another 1% gainzzz. That&#39;s fine.&lt;/p&gt;

&lt;p&gt;I just need something that works well enough for me and that I&#39;m comfortable with, and Opus 4.5/4.6 have fit that bill perfectly.&lt;/p&gt;

&lt;p&gt;(As of writing, I haven&#39;t tried out 4.7, and I&#39;m kinda scared to. Too many weird reports about its behavior.)&lt;/p&gt;

&lt;p&gt;Also worth noting that I&#39;m using these models via the API, not Max plans.&lt;/p&gt;

&lt;h3 id=&#34;ide-vcs-vs-code-and-fork&#34;&gt;IDE / VCS: VS Code and Fork&lt;/h3&gt;

&lt;p&gt;I&#39;ve been using VS Code as my editor and IDE of choice for many years. It&#39;s not perfect, but it works.&lt;/p&gt;

&lt;p&gt;Ironically, I now use it more as a file and diff viewer than as an editor :) I mean, I &lt;em&gt;am&lt;/em&gt; writing this blog post itself in VS Code. But for daily dev work, I do the session driving in CodeNomad in the browser, and then I flip over to VS Code to review and commit the diff.&lt;/p&gt;

&lt;p&gt;Honestly I don&#39;t &lt;em&gt;like&lt;/em&gt; VS Code&#39;s Git panels and UI! They feel very awkward. The diff hunk staging view in particular is &lt;em&gt;bad&lt;/em&gt;, especially compared to other IDEs like WebStorm or purpose-built Git GUIs. (Think I saw they maybe made some improvements in the last couple VSC releases - I haven&#39;t updated lately.)&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://git-fork.com/&#34;&gt;Fork&lt;/a&gt; has been my Git GUI client of choice for the last few years. Excellent, absolutely worth it.&lt;/p&gt;

&lt;p&gt;The issue here is that all my work repos are inside the WSL environment. It&#39;s &lt;em&gt;technically&lt;/em&gt; possible to point Fork at a cross-platform WSL share of the repo folder, but the file changes tend to not refresh well, so I&#39;ve mostly fallen back to just doing Git diff operations in VS Code. I think there may be some Fork tweaks that make it work better in WSL - haven&#39;t dug into those. I do know there are some Linux-native Git GUIs out there; I just haven&#39;t spent time evaluating those lately either.&lt;/p&gt;

&lt;h2 id=&#34;daily-development-workflow&#34;&gt;Daily Development Workflow&lt;/h2&gt;

&lt;p&gt;My workflow is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A long-running &amp;quot;Orchestrator&amp;quot;-mode parent session with the overall context for the major development work that I&#39;m doing. This session&#39;s job is solely to spawn child subtasks where I do the real work interactively.&lt;/li&gt;
&lt;li&gt;Child subtasks dedicated to some subset of the actual work. This may be codebase research and exploration, feature planning, or actual code development. These are highly interactive - I spend all my time discussing, directing, and driving these subtasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core principles here are: &lt;em&gt;I&#39;m&lt;/em&gt; in control. I&#39;m the one who knows what I&#39;m working on and what I want to accomplish. &lt;em&gt;I&lt;/em&gt; decide what the tasks are, and how to accomplish them. &lt;em&gt;I&lt;/em&gt; decide when to move from research to implementation, when to keep digging further or pivot a session from the original goal to a side quest, and when a task is actually done. &lt;em&gt;I&lt;/em&gt; need to be mentally engaged, understand what is happening and why, review the code, and actually commit it when I&#39;m satisfied.&lt;/p&gt;

&lt;p&gt;Most of the time I have one active parent session, and 1-3 active subtask sessions. I do bounce between the active subtasks and context-switch. Normally these are all still part of the same actual workstream. I have tried running sessions for 2-3 different workstreams at the same time, and it&#39;s tough, so I mostly stick with one workstream at a time now.&lt;/p&gt;

&lt;p&gt;I&#39;m sure a lot of folks will say &amp;quot;but that&#39;s so slow, you could be moving so much faster!&amp;quot;.&lt;/p&gt;

&lt;p&gt;I know. That&#39;s the point :) I&#39;m intentionally choosing to limit the workflow to what I can manage in my own head, so that I am still fully mentally engaged and building my own understanding of the system.&lt;/p&gt;

&lt;p&gt;I also specifically aim to keep as many of the moving pieces as deterministic and scripted as possible, especially for things that &lt;em&gt;can&lt;/em&gt; be automated like file management tasks.&lt;/p&gt;

&lt;h3 id=&#34;parent-orchestrator-session-for-project-management&#34;&gt;Parent Orchestrator Session for Project Management&lt;/h3&gt;

&lt;p&gt;I start every new parent session by running my &lt;code&gt;/context&lt;/code&gt; command. This tells the agent to read &lt;code&gt;current-focus.md&lt;/code&gt; and the last 2-3 days worth of progress file entries. That way it has baseline knowledge of the current repo, recent work, and what I&#39;m actually working on.&lt;/p&gt;

&lt;p&gt;From there, I give specific instructions for whatever I&#39;m working on. This is usually several paragraphs of &amp;quot;here&#39;s the overall goal I&#39;m working towards today, here&#39;s what I&#39;m specifically trying to accomplish, here&#39;s how I want to get there&amp;quot;. Then I specifically instruct the orchestrator parent session to spawn one or more subtasks to do the actual work. It then sits and waits for those to complete.&lt;/p&gt;

&lt;p&gt;OpenCode subtasks are essentially async functions with a return value, where the subtask can return some message to the parent. However, the child often returns a response with the &lt;em&gt;initial&lt;/em&gt; result, but I then keep driving the subtask much further and do a lot more work, so the initial response doesn&#39;t capture the actual work accomplished. Another problem is that if I don&#39;t like what a subtask is doing and hit the &amp;quot;Stop&amp;quot; button in that session, it cancels the response to the parent and returns an empty value. The orchestrator has a bad habit of seeing that empty response and saying &amp;quot;oh, the child didn&#39;t complete, let me spawn another subtask to pick up where it left off&amp;quot;. I&#39;ve had to specifically instruct it &amp;quot;&lt;em&gt;never spawn more subtasks until explicitly told to do so&lt;/em&gt;!&amp;quot;&lt;/p&gt;

&lt;p&gt;My instructions for spawning a subtask are surprisingly lax in some ways. I&#39;ve done a bunch of work to give the orchestrator context on what I&#39;m doing, why, and what the next goal is. At that point, I often say &amp;quot;spawn a new subtask to....&amp;quot;, and leave most of the details up to the orchestrator. Sometimes I&#39;ll give fairly specific bullet points: &amp;quot;have the child read files A, B, and C for context. Then, come up with an initial plan to build this feature, and pause. I&#39;ll review and confirm implementation.&amp;quot; Other times, especially when the parent session has gone on for a while and the orchestrator has already spun up a half dozen child tasks for earlier steps, I might just say &amp;quot;yep, kick off phase 3&amp;quot;, but by that point I have high confidence that we&#39;re on the right track and there&#39;s an established pattern for what we&#39;re doing.&lt;/p&gt;

&lt;p&gt;I&#39;ve had some parent sessions last up to a couple weeks as I&#39;m focused on an ongoing effort. Other times it might be a fresh parent session each day. Either way, it&#39;s about the overall goal that I&#39;m working towards right now.&lt;/p&gt;

&lt;p&gt;Here&#39;s what a typical orchestrator session looks like - the tail end of the &lt;code&gt;/context&lt;/code&gt; being loaded, and my initial instructions for what I want to do in this session.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://blog.isquaredsoftware.com/images/2026-05-thoughts-on-ai-part-2/CodeNomad-parentOrchestrator.png&#34; alt=&#34;CodeNomad - parent orchestrator&#34; /&gt;&lt;/p&gt;

&lt;h3 id=&#34;subtasks-for-development&#34;&gt;Subtasks for Development&lt;/h3&gt;

&lt;p&gt;Subtasks are where I do all the actual work.&lt;/p&gt;

&lt;p&gt;I let the orchestrator write up the full prompt for child subtasks. This is normally enough to get the task started, read a bunch of files, orient itself, and &amp;quot;get the full picture&amp;quot; :)&lt;/p&gt;

&lt;p&gt;The entire subtask session from there is highly interactive. If it&#39;s a research or planning session, I review the results and provide feedback on the output. If it&#39;s a coding session, I look at the initial plan for the changes and provide specific guidance and direction. I want to be very sure that &lt;em&gt;both&lt;/em&gt; the agent and I understand what we&#39;re trying to do, why, and how. I&#39;ll provide guidance on specific techniques to use, ask questions about edge cases and ideas, and have it expand my own thinking as much as possible. I only tell it &amp;quot;okay, go make these changes&amp;quot; once I&#39;m very sure about the intent.&lt;/p&gt;

&lt;p&gt;I usually try to keep an eye on the agent as it&#39;s making edits. I don&#39;t want to have to explicitly approve operations as it&#39;s running (and we&#39;ll talk about permissions management more later), but I at least want a sense of &lt;em&gt;what&lt;/em&gt; it&#39;s doing and if it looks like it&#39;s heading in the right direction. If not, I&#39;ll mash the &amp;quot;Stop&amp;quot; button and course-correct.&lt;/p&gt;

&lt;p&gt;Once the edits are done, I&#39;ll run / test / etc as needed. I also try to review the code as much as possible. &lt;em&gt;I&lt;/em&gt; then do the actual Git commits myself - staging, messages, commits.&lt;/p&gt;

&lt;p&gt;In a lot of ways this is &amp;quot;one developer workflow, but with extra steps&amp;quot;. Fair :) The speedup comes from the agent&#39;s ability to do the research for me, expand &amp;quot;here&#39;s what I want to do&amp;quot; into a plan for &lt;em&gt;how&lt;/em&gt; to do it, and then turn the &amp;quot;what and how&amp;quot; plan into specific code edits faster than I can. My fingers are fast, but my brain always had to work through the intent first. Here, I get to focus more on the intent and the desired behavior, and let the AI define most of the code-level changes to accomplish that.&lt;/p&gt;

&lt;p&gt;I frequently drive subtasks for extended periods of time. I &lt;em&gt;try&lt;/em&gt; to keep a given subtask on track, but sometimes I end up veering into side quests because it&#39;s easier to keep going here rather than start up a whole new subtask and load context for the freshly discovered problem all over again. Early on, this definitely led to hitting session context limits, which forced me to create commands to reload an entire session&#39;s transcript and run those after the entire session got auto-compacted. Now that I have the OpenCode Dynamic Context Plugin and better tools and instructions for reading files, it&#39;s rare for even an extended session to end up over 100K context. The agent is very proactive about compressing recent tool calls and discussions in order to keep context manageable, and this seems to produce much better results than a big-bang &amp;quot;summarize the entire session so far&amp;quot; compaction. (Granted, sometimes it &lt;em&gt;over&lt;/em&gt;-compresses recent chunks of the discussion and gets a bit confused on what we were doing :) )&lt;/p&gt;

&lt;p&gt;Once I am actually ready to wrap up a given subtask, I specifically manually trigger the two record keeping commands. &lt;code&gt;/progress&lt;/code&gt; tells the subtask to append a new &amp;quot;what did we accomplish?&amp;quot; entry to today&#39;s progress log file. &lt;code&gt;/subtask-complete&lt;/code&gt; writes a standalone version of that update with more details, suitable for the parent orchestrator session to read. I then switch back to the parent session and run &lt;code&gt;/subtask-resume&lt;/code&gt;, which reads all outstanding subtask handoff files so it knows what the N most recent subtasks accomplished. There&#39;s some duplication here, but I view it this way: &lt;code&gt;/progress&lt;/code&gt; is the permanent record of what I did today, while &lt;code&gt;/subtask-complete&lt;/code&gt; is the subtask&#39;s &amp;quot;function return value&amp;quot;, ensuring the parent session knows where things stand.&lt;/p&gt;

&lt;p&gt;Here&#39;s a typical child subtask, in this case for working on Replay MCP&#39;s &lt;code&gt;ReactComponent&lt;/code&gt; MCP tool. First, the initial orchestrator-provided prompt:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://blog.isquaredsoftware.com/images/2026-05-thoughts-on-ai-part-2/CodeNomad-childSubtask-01-prompt.png&#34; alt=&#34;CodeNomad - child subtask - initial prompt&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Then my responses to some of the agent&#39;s analysis and implementation suggestions:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://blog.isquaredsoftware.com/images/2026-05-thoughts-on-ai-part-2/CodeNomad-childSubtask-02-discussion.png&#34; alt=&#34;CodeNomad - child subtask - implementation discussion&#34; /&gt;&lt;/p&gt;

&lt;p&gt;and finally nailing down some design decisions:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://blog.isquaredsoftware.com/images/2026-05-thoughts-on-ai-part-2/CodeNomad-childSubtask-03-designDecisions.png&#34; alt=&#34;CodeNomad - child subtask - design decisions&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;opencode-config&#34;&gt;OpenCode Config&lt;/h2&gt;

&lt;p&gt;I have a &lt;em&gt;very&lt;/em&gt; customized OpenCode config and setup that I&#39;ve built to directly support that workflow.&lt;/p&gt;

&lt;h3 id=&#34;permissions-management&#34;&gt;Permissions Management&lt;/h3&gt;

&lt;p&gt;I don&#39;t YOLO or &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I run my agents directly on my own machines, no sandboxing. That also means that I need &lt;em&gt;some&lt;/em&gt; amount of safety checks on commands that get run.&lt;/p&gt;

&lt;p&gt;I &lt;em&gt;do&lt;/em&gt; want to eliminate as many unnecessary permissions prompts as I can. Even if I&#39;m watching and directly driving an agent session, it does get very annoying to see a bunch of permission prompts in the chat session blocking and waiting for my approval, when they&#39;re pretty clearly harmless.&lt;/p&gt;

&lt;p&gt;So, as I said in Part 1: determinism.&lt;/p&gt;

&lt;p&gt;I generated a custom OpenCode plugin to auto-approve as many Bash commands as reasonable based on their contents. It&#39;s decently sophisticated. There&#39;s a very long regex-based list of known safe commands and subcommands that get auto-approved, as well as a similar list of known &lt;em&gt;dangerous&lt;/em&gt; commands that get blocked. It actually does Bash parsing and tries to deal with heredocs and command substitutions.&lt;/p&gt;
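&lt;p&gt;As a rough sketch of the shape of that classification - the patterns and command names here are illustrative, not my actual lists, and the real plugin does proper Bash parsing before any of this:&lt;/p&gt;

```typescript
// Illustrative sketch of regex-based Bash command classification.
// The pattern lists are made up for this example.
type Verdict = "allow" | "deny" | "ask";

const SAFE_PATTERNS: RegExp[] = [
  /^git (status|log|diff|show)\b/,
  /^(ls|cat|grep|rg|pwd|echo)\b/,
  /^(pnpm|yarn|bun) (install|test|run build)\b/,
];

const DANGEROUS_PATTERNS: RegExp[] = [
  /^rm -rf\b/,
  /^git push --force\b/,
];

function classifyCommand(command: string): Verdict {
  const trimmed = command.trim();
  // Dangerous patterns win over safe ones.
  if (DANGEROUS_PATTERNS.some((re) => re.test(trimmed))) return "deny";
  if (SAFE_PATTERNS.some((re) => re.test(trimmed))) return "allow";
  // Anything unrecognized falls through to a manual permission prompt.
  return "ask";
}
```

&lt;p&gt;Known-safe commands run silently, known-dangerous ones get blocked, and everything else still prompts me - which is exactly the balance I want.&lt;/p&gt;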

&lt;p&gt;The bigger question is how this actually works with OpenCode at all.&lt;/p&gt;

&lt;p&gt;OpenCode refactored their whole permissions and plugin system a few months ago. There &lt;em&gt;was&lt;/em&gt; already a &lt;code&gt;&amp;quot;permission.ask&amp;quot;&lt;/code&gt; event trigger that allowed plugins to actually return a result, but in the refactor that got disabled. I filed an issue, and other people have filed PRs, but those haven&#39;t been merged. So, I&#39;ve been maintaining a small local fork of OpenCode that reimplements that functionality. That allows my plugin to actually manage permissions.&lt;/p&gt;

&lt;p&gt;I saw Claude Code now has a &amp;quot;not quite YOLO but let the agent self-approve commands&amp;quot; mode. That&#39;s nice, but why rely on more agent calls and more tokens when you can just parse and manage the commands deterministically? :)&lt;/p&gt;

&lt;p&gt;The big loophole here is dynamic scripts - &lt;code&gt;bun -e&lt;/code&gt;, &lt;code&gt;node -e&lt;/code&gt;, &lt;code&gt;python3 -c&lt;/code&gt;. Agents love those. They&#39;re really useful! Don&#39;t even bother writing a script file to &lt;code&gt;/tmp&lt;/code&gt;, just &lt;code&gt;bun -e &amp;quot;someCodeHere()&amp;quot;&lt;/code&gt; and get the results. Really useful when introspecting some data files.&lt;/p&gt;

&lt;p&gt;Obviously this means the agent could trivially smuggle through code that starts nuking folders or uploading my secrets to the mothership.&lt;/p&gt;

&lt;p&gt;In practice: I&#39;m not sure I&#39;ve seen my agents actually &lt;em&gt;try&lt;/em&gt; to run a command that would be truly destructive. I&#39;ve got these guardrails in place, and to some extent I haven&#39;t seen evidence that they&#39;re &lt;em&gt;necessary&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The permissions plugin does try to do some additional scanning of inline eval scripts for blatantly obvious calls like &lt;code&gt;unlink()&lt;/code&gt; and flags those entire commands for approval.&lt;/p&gt;
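&lt;p&gt;That scanning pass is conceptually just another pattern check layered on top. A hypothetical sketch - the flag and call patterns here are invented for illustration:&lt;/p&gt;

```typescript
// Hypothetical sketch of the inline-eval scanning pass: if a command runs an
// inline script (bun -e / node -e / python3 -c), check the embedded code for
// obviously destructive calls and force a manual approval prompt if any appear.
const EVAL_FLAGS = /\b(bun -e|node -e|python3? -c)\b/;

const SUSPICIOUS_CALLS: RegExp[] = [
  /\bunlink(Sync)?\s*\(/,
  /\brmSync\s*\(/,
  /\bshutil\.rmtree\b/,
];

function needsApproval(command: string): boolean {
  if (!EVAL_FLAGS.test(command)) return false;
  // Flag the whole command if the inline script mentions anything destructive.
  return SUSPICIOUS_CALLS.some((re) => re.test(command));
}
```

&lt;p&gt;It&#39;s a shallow string-level check, so it&#39;s easy to evade - but it catches the blatantly obvious cases, which matches the overall &amp;quot;guardrails, not sandbox&amp;quot; posture.&lt;/p&gt;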

&lt;p&gt;It&#39;s a tradeoff I&#39;m happy with at this point.&lt;/p&gt;

&lt;h3 id=&#34;file-reads&#34;&gt;File Reads&lt;/h3&gt;

&lt;p&gt;OpenCode has a &lt;code&gt;Read&lt;/code&gt; tool, same as every other harness. It works. But agents default to just reading massive amounts of file text, over and over, both to learn the codebase and to remind themselves of what they&#39;ve seen already.&lt;/p&gt;

&lt;p&gt;I found a file read caching tool called &lt;a href=&#34;https://github.com/glommer/cachebro&#34;&gt;&lt;code&gt;cachebro&lt;/code&gt;&lt;/a&gt; that is available as an MCP. It checks file access times and hashes, and if the file hasn&#39;t changed since the last read, the tool returns a message saying &amp;quot;this hasn&#39;t changed&amp;quot;.&lt;/p&gt;

&lt;p&gt;I&#39;ve told my agent to default to using &lt;code&gt;cachebro&lt;/code&gt; for file reads. Unfortunately, the &lt;code&gt;cachebro&lt;/code&gt; MCP read tool doesn&#39;t interact with OpenCode&#39;s own &amp;quot;file last read&amp;quot; logic. That meant that if my agent tried to &lt;em&gt;write&lt;/em&gt; to a file, OpenCode would return an error saying &amp;quot;you must use the &lt;code&gt;Read&lt;/code&gt; tool first&amp;quot;, and then it would have to actually call &lt;code&gt;Read&lt;/code&gt; and it&#39;s all a waste of time and tool calls and tokens.&lt;/p&gt;

&lt;p&gt;So since I was already maintaining a small running fork of OpenCode to handle the &lt;code&gt;&amp;quot;permission.ask&amp;quot;&lt;/code&gt; event, I just added another commit to the fork that exposed OpenCode&#39;s &amp;quot;file access time&amp;quot; calls into the &lt;code&gt;Plugin&lt;/code&gt; interface. Then I built my own &lt;code&gt;cachebro&lt;/code&gt; plugin that watches for &lt;code&gt;cachebro&lt;/code&gt; MCP tool calls, and gets/sets the file read times so OpenCode knows that file&#39;s been read. Works great.&lt;/p&gt;
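&lt;p&gt;The core caching idea is simple. Here&#39;s a hypothetical sketch of the mtime-plus-hash check - not &lt;code&gt;cachebro&lt;/code&gt;&#39;s actual implementation, just the concept:&lt;/p&gt;

```typescript
import { statSync, readFileSync } from "node:fs";
import { createHash } from "node:crypto";

// Hypothetical sketch of an mtime-plus-hash read cache, in the spirit of
// what cachebro does: avoid re-sending file contents the model already saw.
const cache = new Map();

function readWithCache(path: string): string {
  const mtimeMs = statSync(path).mtimeMs;
  const prior = cache.get(path);
  if (prior?.mtimeMs === mtimeMs) {
    // mtime unchanged since the last read: contents are assumed identical.
    return "[unchanged since last read]";
  }
  const contents = readFileSync(path, "utf8");
  const hash = createHash("sha256").update(contents).digest("hex");
  if (prior?.hash === hash) {
    // mtime bumped (e.g. by touch) but the contents are byte-identical.
    cache.set(path, { mtimeMs, hash });
    return "[unchanged since last read]";
  }
  cache.set(path, { mtimeMs, hash });
  return contents;
}
```

&lt;p&gt;On a cache hit, the agent just sees a short &amp;quot;unchanged&amp;quot; message instead of the full file contents - that&#39;s where the token savings come from.&lt;/p&gt;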

&lt;h3 id=&#34;code-structure-and-search&#34;&gt;Code Structure and Search&lt;/h3&gt;

&lt;p&gt;Per above, agents default to just &lt;code&gt;Read&lt;/code&gt; for everything. This is &lt;em&gt;awful&lt;/em&gt; for codebase exploration. They&#39;ll read 10-15 files just to trace dependencies and imports, with lots of extra wasted code text bloating the context. &lt;em&gt;Don&#39;t&lt;/em&gt; do this!&lt;/p&gt;

&lt;p&gt;I&#39;ve tried out a half-dozen different MCP tools that do full codebase parsing and provide tools to query structure, dependencies, outlines, and relevant chunks of code.&lt;/p&gt;

&lt;p&gt;Currently, I use &lt;a href=&#34;https://github.com/agentika-labs/grepika&#34;&gt;&lt;code&gt;grepika&lt;/code&gt;&lt;/a&gt; and &lt;a href=&#34;https://github.com/jahala/tilth&#34;&gt;&lt;code&gt;tilth&lt;/code&gt;&lt;/a&gt; via MCP, and get great results. Highly recommended. I again have my &lt;code&gt;AGENTS.md&lt;/code&gt; with instructions to always use those as the primary file reading options, and I consistently see my agent using them and intelligently only loading relevant parts of files.&lt;/p&gt;

&lt;p&gt;I&#39;ve also previously tried &lt;a href=&#34;https://github.com/SimplyLiz/CodeMCP&#34;&gt;&lt;code&gt;ckb&lt;/code&gt;&lt;/a&gt; (CodeKnowledgeBase). It&#39;s got a massive set of tools for querying structure and blast radius, and I&#39;ve &lt;em&gt;wanted&lt;/em&gt; to use it, but I&#39;ve run into a few problems with indexing not working or tool calls timing out. Been playing with &lt;a href=&#34;https://github.com/CRJFisher/ariadne&#34;&gt;&lt;code&gt;ariadne&lt;/code&gt;&lt;/a&gt;, and I&#39;m constantly keeping an eye out for other codebase indexing tools.&lt;/p&gt;

&lt;h3 id=&#34;context-management&#34;&gt;Context Management&lt;/h3&gt;

&lt;p&gt;Per above, I do all my work exclusively in subagent sessions that I drive myself. This means my parent orchestrator session doesn&#39;t have any issues with maxing out the context window and needing to compact, but the child tasks do.&lt;/p&gt;

&lt;p&gt;I did pretty frequently hit 160-170K context and run into auto-compaction. I &lt;em&gt;hated&lt;/em&gt; it. The auto-compact summaries were &lt;em&gt;adequate&lt;/em&gt;, but inevitably there&#39;d be a bunch of details I cared about that would get lost.&lt;/p&gt;

&lt;p&gt;At first I built some dev scripts commands that would read the OpenCode session JSON files, filter out tool calls, and export the meaningful message contents out as a complete Markdown file. I built a &lt;code&gt;/session-reload&lt;/code&gt; command and skill that would force the agent to run the export command, read the entire exported transcript, and then pause.&lt;/p&gt;

&lt;p&gt;However, &lt;code&gt;grepika&lt;/code&gt; really reduced the amount of &lt;em&gt;code&lt;/em&gt; that was bloating context. That helped.&lt;/p&gt;

&lt;p&gt;Then I found the &lt;a href=&#34;https://github.com/Opencode-DCP/opencode-dynamic-context-pruning&#34;&gt;OpenCode Dynamic Context Pruning Plugin&lt;/a&gt;. Instead of waiting until your session is almost at full context and then doing a big summary of the &lt;em&gt;entire&lt;/em&gt; session as a replacement, which would be &lt;em&gt;really&lt;/em&gt; lossy, it gives your agent a &lt;code&gt;compress&lt;/code&gt; tool that it can use at will to compress and summarize &lt;em&gt;chunks&lt;/em&gt; of your session (like a bunch of exploration file reads, or earlier messages as you switch focus to a follow-up effort). This means there&#39;s more of the session messages that stay exactly as-is, there&#39;s less time spent compacting, and it&#39;s much rarer that one of my subtask sessions actually nears max context and has to compact everything just to reset.&lt;/p&gt;

&lt;p&gt;Ironically, I really found a great working combo of tools right before Anthropic shipped 1.0M context windows by default for everyone :) But seriously, in the last couple months since landing on this combo of tools and plugins, I find that even a very long-running technical subtask session rarely gets above 100K context, thanks to &lt;code&gt;grepika&lt;/code&gt; and &lt;code&gt;cachebro&lt;/code&gt; minimizing file reads and the agent happily compressing earlier chunks on the fly.&lt;/p&gt;

&lt;p&gt;I know this does mean more prompt cache invalidations, which can bump cost - it&#39;s no longer just the one latest message at the end of the session that hasn&#39;t been seen by the server. But on the flip side, keeping the actual total context size smaller seems to pay off, and it&#39;s not like the agent is calling &lt;code&gt;compress&lt;/code&gt; every other turn.&lt;/p&gt;

&lt;p&gt;I &lt;em&gt;was&lt;/em&gt; using the &lt;a href=&#34;https://github.com/rtk-ai/rtk&#34;&gt;&lt;code&gt;rtk&lt;/code&gt;&lt;/a&gt; Rust CLI tool to automatically compress the output of tools like &lt;code&gt;grep&lt;/code&gt; to save on tokens. Unfortunately, I saw too many cases where the agent got stuck in a loop: it would call &lt;code&gt;grep&lt;/code&gt;, &lt;code&gt;rtk grep&lt;/code&gt; would change the output too much, and the agent would get confused and try again, wasting time and tokens. So I eventually disabled &lt;code&gt;rtk&lt;/code&gt;. Might come back to it at some point.&lt;/p&gt;

&lt;h3 id=&#34;session-history-and-search&#34;&gt;Session History and Search&lt;/h3&gt;

&lt;p&gt;As mentioned, I initially built some custom scripts that would find the right session JSON files and export as separate Markdown transcripts with just the message text, no tool contents.&lt;/p&gt;

&lt;p&gt;OpenCode 1.2 switched to storing sessions internally in SQLite. I held off upgrading to 1.2 for a while because I had all my custom scripts and commands and didn&#39;t want to take the time to rewrite them. But, I finally did, and that actually simplified things quite a bit.&lt;/p&gt;

&lt;p&gt;My previous &lt;code&gt;/session-reload&lt;/code&gt; command told the agent &amp;quot;run the &lt;code&gt;export session&lt;/code&gt;-type command, look at the list of most recent sessions it gives you, find what you &lt;em&gt;think&lt;/em&gt; is the right Markdown file based on your own context and instructions, and read that&amp;quot;. Kind of awkward, left room for error.&lt;/p&gt;

&lt;p&gt;After rewriting my own session history plugin to read from OpenCode&#39;s own plugin/internal &lt;code&gt;client.session&lt;/code&gt; object, that all went away. Instead, I have a &lt;code&gt;reload_session&lt;/code&gt; tool that can just directly grab the list of messages from the plugin API using the session ID, filter, construct the Markdown contents, and directly return it as the tool call result. No file path or &amp;quot;match up the session&amp;quot; steps needed.&lt;/p&gt;

&lt;p&gt;It was also trivial to throw together &lt;code&gt;search_sessions&lt;/code&gt; and &lt;code&gt;read_session&lt;/code&gt; tools. They&#39;re currently pretty dumb and just do basic regex searches on session contents, no vector DBs or similarity scores or anything. But they&#39;re pretty useful even in that form.&lt;/p&gt;
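&lt;p&gt;For reference, &amp;quot;basic regex search&amp;quot; really is about this simple. A sketch of the shape of such a tool (hypothetical names and types; the real tools go through OpenCode&#39;s plugin client):&lt;/p&gt;

```typescript
// Naive regex search over stored session transcripts, returning
// matching session IDs plus a snippet of surrounding context.
// Shapes and names are illustrative, not OpenCode's actual API.
interface StoredSession {
  id: string;
  title: string;
  transcript: string; // flattened Markdown of the session
}

interface SessionHit {
  id: string;
  title: string;
  snippet: string;
}

function searchSessions(sessions: StoredSession[], pattern: string): SessionHit[] {
  const re = new RegExp(pattern, "i");
  const hits: SessionHit[] = [];
  for (const s of sessions) {
    const m = re.exec(s.transcript);
    if (m) {
      // Grab a little context on either side of the match
      const start = Math.max(0, m.index - 40);
      const end = Math.min(s.transcript.length, m.index + m[0].length + 40);
      hits.push({ id: s.id, title: s.title, snippet: s.transcript.slice(start, end) });
    }
  }
  return hits;
}
```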

&lt;h2 id=&#34;dev-plans-repo-and-management-scripts&#34;&gt;Dev Plans Repo and Management Scripts&lt;/h2&gt;

&lt;h3 id=&#34;plan-management-issues&#34;&gt;Plan Management Issues&lt;/h3&gt;

&lt;p&gt;During my initial couple months of using KiloCode, I found myself with the same proliferation of Markdown plan files that we&#39;ve all run into. So where do those go? The agent wants to default to writing them to the root of whatever repo you&#39;re working in. I made an effort to redirect it to write them to a &lt;code&gt;./docs&lt;/code&gt; folder, although in our main backend repo that folder already existed, so I had to resort to &lt;code&gt;./docs/merikson&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Some of these plan files were worth committing or preserving, even if just for my own reference later. Others weren&#39;t.&lt;/p&gt;

&lt;p&gt;So where do the plan files go if this is a shared repo? I really didn&#39;t want to pollute a shared repo and commit history with dozens of my own personal semi-ephemeral plan files, or even the maybe-more-widely-useful architectural writeups.&lt;/p&gt;

&lt;h3 id=&#34;personal-dev-plans-repo&#34;&gt;Personal Dev Plans Repo&lt;/h3&gt;

&lt;p&gt;I decided the best option was to set up an entirely separate personal development doc knowledge base repo, just for myself. I dubbed it &lt;code&gt;dev-plans&lt;/code&gt;. That would give me a consistent place that I could store all my own Markdown artifacts and knowledge, commit it, save those for future reference, and not pollute the shared actual repo Git history.&lt;/p&gt;

&lt;p&gt;My &lt;code&gt;dev-plans&lt;/code&gt; repo structure is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/dev-plans
  /personal
  /redux
    /$PROJECT1
      /architecture
      /features
      /progress-updates
      /research
      /subtask-handoffs
      current-focus.md
      QUIRKS.md
  /replay
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;current-focus.md&lt;/code&gt; is intermittently updated whenever I feel I&#39;m significantly shifting gears or focus on a repo, or have completed a major chunk of work and the &amp;quot;what I&#39;m working on now&amp;quot; section should reflect that.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;QUIRKS.md&lt;/code&gt; is more of a &amp;quot;project structure, known quirks and patterns, things to remember&amp;quot; document.&lt;/p&gt;

&lt;p&gt;Every file other than the two fixed &lt;code&gt;current-focus.md&lt;/code&gt; and &lt;code&gt;QUIRKS.md&lt;/code&gt; files starts with a &lt;code&gt;YYYY-MM-DD&lt;/code&gt; prefix, such as &lt;code&gt;features/2026-05-06-some-feature.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I&#39;ve got some archival scripts set up to move older documents from the flat folder structure into a nested &lt;code&gt;YYYY/MM&lt;/code&gt; structure to keep things a bit more readable, but haven&#39;t done much with that yet. I suspect I&#39;ll need some tools to help index the documents when I do that.&lt;/p&gt;
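&lt;p&gt;Both conventions (the date prefix and the nested archive layout) are trivial to compute. A sketch of the two helpers involved (assumed names, not my literal script):&lt;/p&gt;

```typescript
// Build the standard YYYY-MM-DD-prefixed doc filename, and derive
// the nested archive/YYYY/MM/ path from that prefix. Filenames
// without the prefix (current-focus.md, QUIRKS.md) are left alone.
// Helper names are illustrative.
function datedFilename(slug: string, date: Date): string {
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  const dd = String(date.getDate()).padStart(2, "0");
  return `${yyyy}-${mm}-${dd}-${slug}.md`;
}

function archivePath(filename: string): string | null {
  const m = /^(\d{4})-(\d{2})-\d{2}-/.exec(filename);
  if (!m) return null; // fixed files like QUIRKS.md stay where they are
  return `archive/${m[1]}/${m[2]}/${filename}`;
}
```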

&lt;h3 id=&#34;workflow-artifacts&#34;&gt;Workflow Artifacts&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;architecture&lt;/code&gt; is meant to be longer-lived explorations of the codebase structure and patterns. Full-blown &amp;quot;what do we know about this project&amp;quot;, vertical &amp;quot;trace through this feature or data flow path&amp;quot;, reference material&lt;/li&gt;
&lt;li&gt;&lt;code&gt;features&lt;/code&gt; is WIP development plans. Here&#39;s the next thing we&#39;re going to build, we&#39;ve turned this into a concrete plan, use this as the basis&lt;/li&gt;
&lt;li&gt;&lt;code&gt;research&lt;/code&gt; is more of a grab bag. Sometimes it&#39;s scanning the codebase for concepts, other times it&#39;s research dives through Github and NPM for relevant concepts&lt;/li&gt;
&lt;li&gt;&lt;code&gt;progress-updates&lt;/code&gt; holds the daily dated progress entries, appended throughout each day; &lt;code&gt;subtask-handoffs&lt;/code&gt; holds the individual handoff files that get automatically archived as soon as they&#39;re read&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;devplans-ts-automation-script&#34;&gt;&lt;code&gt;devplans.ts&lt;/code&gt; Automation Script&lt;/h3&gt;

&lt;p&gt;I&#39;ve settled on using Bun with TS scripts for a lot of my personal needs and utilities.&lt;/p&gt;

&lt;p&gt;I&#39;ve tried to automate a lot of the common tasks I saw consistently pop up when interacting with the &lt;code&gt;dev-plans&lt;/code&gt; repo. I created &lt;code&gt;~/.config/opencode/scripts/devplans.ts&lt;/code&gt; and have fleshed it out over time with a variety of commands the agent uses all the time.&lt;/p&gt;

&lt;p&gt;Currently, that provides:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-ts&#34;&gt;/**
 * devplans.ts - Unified dev-plans CLI helper
 *
 * Commands:
 *   devplans info                      Get project mapping info (no side effects)
 *   devplans progress                  Get/create today&#39;s progress file (with late-night date logic)
 *   devplans progress list [n]         List n most recent progress files (default 3)
 *   devplans progress append [file]    Append entry from JSON file (deleted after read) or stdin
 *   devplans handoff create [slug]     Get path for new handoff file
 *   devplans handoff list              List files in pending/
 *   devplans handoff move [file]       Move from pending/ to completed/YYYY/MM/DD/
 *   devplans handoff consume           List, read, and move all pending handoffs
 *   devplans doc create &amp;lt;type&amp;gt; &amp;lt;slug&amp;gt;  Create dated doc (type: feature|arch|analysis|issue)
 *   devplans doc list &amp;lt;type&amp;gt; [n]       List docs of type (default 10)
 *   devplans session extract           Extract recent sessions, list newest 10
 *   devplans archive &amp;lt;folder&amp;gt;          Archive old files to folder/archive/YYYY/MM/DD/
 *   devplans archive --all             Archive all doc folders
 */
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;These deterministic commands save a &lt;em&gt;lot&lt;/em&gt; of steps that the agent would otherwise have had to run itself: figuring out which &lt;code&gt;dev-plans&lt;/code&gt; folder path matches the current repo, creating placeholder handoff files or documents, and updating progress docs. Most of my OpenCode slash commands start with some variation of &amp;quot;Run &lt;code&gt;bun devplans.ts some-command&lt;/code&gt;&amp;quot;.&lt;/p&gt;

&lt;p&gt;I also have a JSON file with folder mappings for my usual repo checkout paths in my personal laptop Windows and work laptop WSL environments. That way the script can just read those mappings and use the known paths.&lt;/p&gt;
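&lt;p&gt;The mapping itself is nothing fancy: a lookup from checkout path to dev-plans folder, with longest-prefix matching so the script works from anywhere inside a repo. A sketch with made-up paths:&lt;/p&gt;

```typescript
// Resolve the dev-plans folder for the repo containing the current
// working directory, via a static path mapping. The paths here are
// hypothetical examples, not my real layout.
const repoMappings: { [checkout: string]: string } = {
  "/home/mark/dev/redux-toolkit": "/home/mark/dev-plans/redux/redux-toolkit",
  "/home/mark/dev/replay-backend": "/home/mark/dev-plans/replay/backend",
};

function resolveDevPlansDir(cwd: string): string | undefined {
  // Longest-prefix match handles being anywhere inside a checkout
  const match = Object.keys(repoMappings)
    .filter((repo) => cwd === repo || cwd.startsWith(repo + "/"))
    .sort((a, b) => b.length - a.length)[0];
  return match ? repoMappings[match] : undefined;
}
```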

&lt;p&gt;Per the Workflow section, I have a &lt;code&gt;/init&lt;/code&gt; command that helps me set up the scaffolding when adding a new project to &lt;code&gt;dev-plans&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&#34;progress-updates-and-subtask-handoffs&#34;&gt;Progress Updates and Subtask Handoffs&lt;/h3&gt;

&lt;p&gt;The most used subcommands are for &lt;code&gt;progress&lt;/code&gt; and &lt;code&gt;handoff&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;My original &lt;code&gt;/progress&lt;/code&gt; command told the agent to actually read today&#39;s progress file completely, then append a new section. This became an O(n^2)-type problem as the progress file kept growing over the course of a day. Just reading the file would bloat context, and this was especially bad when the subtask was already bordering on max session context.&lt;/p&gt;

&lt;p&gt;Now, &lt;code&gt;devplans progress&lt;/code&gt; prints JSON output with today&#39;s progress file path, but also creates a placeholder temp file and provides that path as well. The agent is instructed to overwrite the placeholder with its actual update, and then run &lt;code&gt;devplans progress append $TEMPFILE&lt;/code&gt;. The script then deterministically appends the new section to today&#39;s progress doc, so the agent never has to read that file itself.&lt;/p&gt;

&lt;p&gt;Similarly, &lt;code&gt;/subtask-complete&lt;/code&gt; in the child tells it to run &lt;code&gt;devplans handoff create&lt;/code&gt;, which creates a target file path the agent can write to. Then, when I run &lt;code&gt;/subtask-resume&lt;/code&gt; in the orchestrator parent, it runs &lt;code&gt;devplans handoff consume&lt;/code&gt;, which immediately prints all pending handoff files and then auto-archives them. This saves multiple steps of the agent fumbling around.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;devplans&lt;/code&gt; commands also intelligently handle dates and timestamps - automatically generating file paths with timestamps, figuring out today&#39;s progress file, and even assuming that any updates between midnight and 6 AM really belong on &amp;quot;the previous day&#39;s progress file&amp;quot; because it&#39;s likely I was working late into the night on something.&lt;/p&gt;
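&lt;p&gt;The late-night rule is the fun part. Roughly (an illustrative sketch, not the exact implementation):&lt;/p&gt;

```typescript
// Compute the "effective" date for a progress entry: anything
// logged between midnight and 6 AM counts toward the previous
// day's file, assuming a late-night work session. Sketch only;
// the 6 AM cutoff mirrors what the post describes.
function effectiveProgressDate(now: Date): string {
  const d = new Date(now);
  if (d.getHours() < 6) {
    d.setDate(d.getDate() - 1); // roll back to "yesterday"
  }
  const yyyy = d.getFullYear();
  const mm = String(d.getMonth() + 1).padStart(2, "0");
  const dd = String(d.getDate()).padStart(2, "0");
  return `${yyyy}-${mm}-${dd}`;
}
```

JavaScript's `setDate` handles month and year rollover automatically, so a 1 AM entry on the first of the month lands on the last day of the previous month.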

&lt;p&gt;There&#39;s still a lot of manual trigger behavior in here, and again I&#39;m good with that :)&lt;/p&gt;

&lt;h2 id=&#34;agents-md&#34;&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;My &lt;code&gt;AGENTS.md&lt;/code&gt; file is currently about 250 lines (and writing this post forced me to review it and do some reorganizing and condensation - it had gotten rather crufty).&lt;/p&gt;

&lt;p&gt;It starts with a short personal overview: who I am, day job at Replay, OSS work on Redux.&lt;/p&gt;

&lt;p&gt;Major sections include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interaction patterns: I specifically instruct the agent to keep its responses short and direct, and to avoid sycophancy. That applies both to its descriptions and to how it acknowledges my instructions.&lt;/li&gt;
&lt;li&gt;Thinking and problem solving: use critical thinking and be skeptical about assumptions and correctness; stop and rethink problems if stuck; do research, not trial-and-error; state plans and wait for confirmation to check assumptions and confirm user intent&lt;/li&gt;
&lt;li&gt;Git: &lt;em&gt;never&lt;/em&gt; commit, always stop and wait for me to review changes&lt;/li&gt;
&lt;li&gt;Personal tool environment: personal vs work laptop setups, path management, use Bun for scripting not Python&lt;/li&gt;
&lt;li&gt;Coding standards: TS usage, running tests, minimizing comments&lt;/li&gt;
&lt;li&gt;Code navigation: minimize context loading at all times; use &lt;code&gt;grepika&lt;/code&gt; and &lt;code&gt;tilth&lt;/code&gt; for navigating code, &lt;code&gt;cachebro&lt;/code&gt; for other file reads&lt;/li&gt;
&lt;li&gt;Tasks and behavior: manage todos and progress; be careful counting completion; manage current context&lt;/li&gt;
&lt;li&gt;Dev plans workflow: use &lt;code&gt;devplans.ts&lt;/code&gt; for file management; use of &lt;code&gt;QUIRKS.md&lt;/code&gt; and other rules files&lt;/li&gt;
&lt;li&gt;Subtask spawning and workflow rules: only spawn subtasks when told; verify completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I make no claims that my &lt;code&gt;AGENTS.md&lt;/code&gt; is optimized, correct, or a model to follow :) It&#39;s just what I&#39;ve evolved through my own usage.&lt;/p&gt;

&lt;h2 id=&#34;commands-and-skills&#34;&gt;Commands and Skills&lt;/h2&gt;

&lt;h3 id=&#34;project-setup&#34;&gt;Project Setup&lt;/h3&gt;

&lt;p&gt;I have some commands and skills for initial &lt;code&gt;dev-plans&lt;/code&gt; setup for a new repo - the usual &lt;code&gt;/init&lt;/code&gt; and &lt;code&gt;/architecture&lt;/code&gt;-type commands that scan the codebase, do a writeup of the details, do some initial architecture description docs, etc. Don&#39;t do those often, nothing special there.&lt;/p&gt;

&lt;h3 id=&#34;progress-and-subtask-management&#34;&gt;Progress and Subtask Management&lt;/h3&gt;

&lt;p&gt;As described above: &lt;code&gt;/progress&lt;/code&gt; specifically instructs the current session to create a new entry in today&#39;s progress log. &lt;code&gt;/subtask-complete&lt;/code&gt; tells a child subtask to record a separate handoff file that can be used as a &amp;quot;return value&amp;quot; to update the parent orchestrator session, and &lt;code&gt;/subtask-resume&lt;/code&gt; has the parent session read all outstanding handoff files to know what got done.&lt;/p&gt;

&lt;h3 id=&#34;task-tracking&#34;&gt;Task Tracking&lt;/h3&gt;

&lt;p&gt;For most of my day-to-day development work, I&#39;ve gotten along fine without any explicit external task tracker. I&#39;m the one who knows what I&#39;m working on.&lt;/p&gt;

&lt;p&gt;For a couple projects, I&#39;ve tried using &lt;a href=&#34;https://github.com/dcramer/dex&#34;&gt;&lt;code&gt;dex&lt;/code&gt;&lt;/a&gt; as a lightweight external nested task tracking CLI. (Think &lt;code&gt;beads&lt;/code&gt;, but with way less slop.) It&#39;s worked pretty well! I&#39;ve got a skill that instructs the agent how to use &lt;code&gt;dex&lt;/code&gt; efficiently. I&#39;d have the orchestrator session run &lt;code&gt;dex&lt;/code&gt; commands to see where we stand in terms of task status, then pass &lt;code&gt;dex&lt;/code&gt; task IDs to a subtask and tell it to mark that task as in progress. Similar to avoiding having the agent do any Git commits, I told it not to mark any &lt;code&gt;dex&lt;/code&gt; tasks as complete until I explicitly instructed it to do so, and have a &lt;code&gt;/dex-complete&lt;/code&gt; command to help with that.&lt;/p&gt;

&lt;h3 id=&#34;other-commands&#34;&gt;Other Commands&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;/session-reload&lt;/code&gt; extracts the entire message history of the current session and returns it. Per above, this used to write it all to disk as a Markdown file; now it just constructs the transcript in memory from the DB entries and returns it as the tool result. I use this much less frequently now thanks to the OC Dynamic Context plugin and better file search tools - now that I have those, I rarely get close to 150K+ context even in a long-running subtask session.&lt;/p&gt;

&lt;p&gt;I wrote my own AI-powered code review tool called &lt;a href=&#34;https://github.com/markerikson/diffloupe&#34;&gt;&lt;code&gt;diffloupe&lt;/code&gt;&lt;/a&gt; that tries to compare stated change intent vs inferred change intent, and reviews for both bugs and intent mismatches. I have a &lt;code&gt;/code-review&lt;/code&gt; command that tells the agent to review the current changes by triggering &lt;code&gt;diffloupe&lt;/code&gt; and then report on the results.&lt;/p&gt;

&lt;h3 id=&#34;other-skills&#34;&gt;Other Skills&lt;/h3&gt;

&lt;p&gt;I do have a variety of skills available:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;several for frontend design and UI work&lt;/li&gt;
&lt;li&gt;advanced TS patterns&lt;/li&gt;
&lt;li&gt;Replay CLI and MCP usage&lt;/li&gt;
&lt;li&gt;Architecture design and feature planning&lt;/li&gt;
&lt;li&gt;File / codebase search tools&lt;/li&gt;
&lt;li&gt;Assorted skills explaining how to use &lt;code&gt;diffloupe&lt;/code&gt;, &lt;code&gt;dex&lt;/code&gt;, and &lt;code&gt;gh&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;config-improvement-process&#34;&gt;Config Improvement Process&lt;/h2&gt;

&lt;p&gt;After I did my initial config and workflow development efforts back in December, I&#39;ve alternated between periods of leaving it exactly as-is and focusing on just doing work, and getting annoyed with parts of the workflow and making improvements. Generally it&#39;s noticing that some particular pain point is becoming an issue - that I keep having to repeat a particular set of instructions, or that part of the workflow could be automated via &lt;code&gt;devplans.ts&lt;/code&gt;, or trying out a different codebase indexing plugin that I found. At that point I&#39;ll go spin up a new session inside my OpenCode config repo itself, talk through the problem space, and develop the changes accordingly.&lt;/p&gt;

&lt;p&gt;As a relevant example, just reading &lt;code&gt;AGENTS.md&lt;/code&gt; to write this post showed me how the content had gotten somewhat bloated and disorganized. So, I just started up a new session to review it, described my pain points, and the agent suggested a slimmed-down version that retained the key points, moved some of the detailed file tool usage instructions out to a separate skill, and better organized the &amp;quot;how to communicate&amp;quot; and &amp;quot;daily workflow&amp;quot; sections. Done.&lt;/p&gt;

&lt;h3 id=&#34;potential-future-workflow-improvements&#34;&gt;Potential Future Workflow Improvements&lt;/h3&gt;

&lt;p&gt;I&#39;m pretty happy with the tooling and workflow that I&#39;ve got right now. The one area where I feel I&#39;m lacking is longer-term memory and context. The system I&#39;ve got is great for &amp;quot;what is the repo we&#39;re working in?&amp;quot; and &amp;quot;what&#39;s the current set of tasks?&amp;quot;, but I&#39;m finding I have to do a lot of work to dig up other recent sessions where some decision was made or a research document was generated, and feed those back into the current session. I need something that helps index or scan generated planning and progress files and dynamically feeds relevant results into the current session, or MCP tools that the session can use to scan already-indexed files.&lt;/p&gt;

&lt;p&gt;I also don&#39;t have any kind of automatic &amp;quot;scan recent sessions for common patterns or corrections or learnings, and extract workflow improvements&amp;quot; system. I &lt;em&gt;do&lt;/em&gt; instruct agents to include possible learnings in the progress and handoff artifacts, and sometimes those trickle over to updates to &lt;code&gt;QUIRKS.md&lt;/code&gt;, but I feel there&#39;d be value in the automatic review process.&lt;/p&gt;

&lt;p&gt;I &lt;em&gt;am&lt;/em&gt; very happy with &lt;code&gt;grepika&lt;/code&gt; and &lt;code&gt;tilth&lt;/code&gt; for codebase exploration over just reading files manually. I can imagine it would be nice to somehow preload more of the codebase into a given session so it doesn&#39;t have to re-explore some of the same files each time.&lt;/p&gt;

&lt;p&gt;Code review and ensuring intent are still hard. &lt;code&gt;diffloupe&lt;/code&gt; has been useful for doing some review checks. I intended to add my own full code review UI to it, but got distracted and never got back to that. I&#39;ve bookmarked a bunch of other code review tools and may still investigate some of those.&lt;/p&gt;

&lt;h2 id=&#34;final-thoughts&#34;&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;6000+ words, and this was with me deliberately &lt;em&gt;avoiding&lt;/em&gt; going into full amounts of technical detail :) So there you go.&lt;/p&gt;

&lt;p&gt;See &lt;a href=&#34;https://github.com/markerikson/opencode-config-example&#34;&gt;the example config repo&lt;/a&gt; for the actual commands, scripts, and setup.&lt;/p&gt;

&lt;p&gt;No idea how many people will end up going through here, but as always, hope this info was useful!&lt;/p&gt;

&lt;p&gt;And more than anything else: whether you use AI to write all your code or write it by hand, I hope that you can find a workflow for yourself that is sustainable, maintainable, understandable, and &lt;em&gt;safely&lt;/em&gt; productive.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>My Thoughts on AI, Part 1: Fears, Opinions, and Mental Journey</title>
      <link>https://blog.isquaredsoftware.com/2026/05/ai-thoughts-part-1-fears-opinions-journey/</link>
      <pubDate>Thu, 07 May 2026 14:00:00 -0500</pubDate>
      
      <guid>https://blog.isquaredsoftware.com/2026/05/ai-thoughts-part-1-fears-opinions-journey/</guid>
      <description>

&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;This post will be tough to write. There&#39;s a lot of discourse and arguing about AI everywhere you look. I&#39;ve read it all. I don&#39;t want to get caught up in arguments, get misinterpreted, or be labeled with beliefs that don&#39;t apply.&lt;/p&gt;

&lt;p&gt;I am not e/acc or P(doom). I am a software engineer, I am a person, trying to figure this out same as everyone else.&lt;/p&gt;

&lt;p&gt;I am not trying to sell anything, change anyone&#39;s mind, or say I am an expert. I don&#39;t have the answers.&lt;/p&gt;

&lt;p&gt;I &lt;em&gt;do&lt;/em&gt; have thoughts, opinions, fears, excitement, and concerns. I&#39;ve shared a lot of them in private. Enough people have heard those thoughts and said they want to hear or read more about my opinions that it seems worth my time to write them up publicly. A lot of these aren&#39;t original to me. I don&#39;t claim to be a deep thinker. I &lt;em&gt;have&lt;/em&gt; read a lot, thought a lot, synthesized a lot.&lt;/p&gt;

&lt;p&gt;So, here&#39;s my story and opinions, from the heart, best as I can write them. So many points I could make and articles I could cite as references here, but this one&#39;s just me. My story, told my way. A lot of you are probably going to see the length and yell &amp;quot;TLDR&amp;quot; and nope right out of here, or throw it in an agent to summarize. That&#39;s fine. Take it for what it&#39;s worth. Maybe this helps someone else. (and if you want to skip to &lt;a href=&#34;https://blog.isquaredsoftware.com/2026/05/ai-thoughts-part-2-agent-workflow-tools/&#34;&gt;the tech workflow post&lt;/a&gt; there you go.)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#introduction&#34;&gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#part-1-the-before-times&#34;&gt;Part 1: The Before Times&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#programming-is-life&#34;&gt;Programming Is Life&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#clouds-on-the-horizon&#34;&gt;Clouds on the Horizon&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#ominous-signs&#34;&gt;Ominous Signs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#part-2-fear-doom-and-depression&#34;&gt;Part 2: Fear, Doom, and Depression&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#the-pivot&#34;&gt;The Pivot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#the-wilderness&#34;&gt;The Wilderness&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#the-tidal-wave&#34;&gt;The Tidal Wave&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#p-doom&#34;&gt;P(doom)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#depression&#34;&gt;Depression&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#part-3-reverse-engineering&#34;&gt;Part 3: Reverse Engineering&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#miami-sunshine&#34;&gt;Miami Sunshine&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#flipping-the-bit&#34;&gt;Flipping The Bit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#part-4-taming-the-beast&#34;&gt;Part 4: Taming the Beast&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#architectural-research&#34;&gt;Architectural Research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#mind-blown&#34;&gt;Mind. Blown.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#part-5-diving-in&#34;&gt;Part 5: Diving In&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#ramping-up&#34;&gt;Ramping Up&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#immer&#34;&gt;Immer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#nerd-sniped&#34;&gt;Nerd-Sniped&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#self-reflection&#34;&gt;Self-Reflection&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#part-6-liftoff&#34;&gt;Part 6: Liftoff&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#trust-factor&#34;&gt;Trust Factor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#gearing-up&#34;&gt;Gearing Up&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#the-re-pivot&#34;&gt;The Re-Pivot&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#part-7-warp-speed&#34;&gt;Part 7: Warp Speed&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#pulling-it-all-together&#34;&gt;Pulling It All Together&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#pattern-matching&#34;&gt;Pattern Matching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#shipping-for-agents&#34;&gt;Shipping for Agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#part-8-so-what-do-i-think-today-anyway&#34;&gt;Part 8: So What Do I Think Today, Anyway?&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#most-of-my-fears-are-still-valid-and-so-are-you&#34;&gt;Most Of My Fears Are Still Valid, And So Are You&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#the-tiger-is-out&#34;&gt;The Tiger Is Out&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#gotta-go-fast&#34;&gt;Gotta Go Fast&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#oops-it-s-capitalism&#34;&gt;Oops It&#39;s Capitalism&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#maintainability-is-the-mindset&#34;&gt;Maintainability is the Mindset&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#but-non-determinism&#34;&gt;But Non-Determinism?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#is-this-actually-better&#34;&gt;Is This Actually Better?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#what-about-the-craft&#34;&gt;What About the Craft?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#final-thoughts&#34;&gt;Final Thoughts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;part-1-the-before-times&#34;&gt;Part 1: The Before Times&lt;/h2&gt;

&lt;h3 id=&#34;programming-is-life&#34;&gt;Programming Is Life&lt;/h3&gt;

&lt;p&gt;I love programming. I love the problem solving, the thinking. Getting in a flow state, trance music cranked up, deep in the code, surfacing hours later and seeing that this feature didn&#39;t exist at the start of the day and now it does and it&#39;s all because I figured out how to do it. Debugging deeply. Trying to understand the guts of an unfamiliar system, figuring out where things were going wrong, finally nailing the tweak or architectural change that fixes the problem. Learning new tools, unlocking new capabilities.&lt;/p&gt;

&lt;p&gt;I&#39;ve been programming for 25+ years, over half my life. I cut my teeth in the early days of the Agile Manifesto. I read Joel Spolsky&#39;s argument that rewriting a system from scratch was a bad idea. I read the HN debates about how to interview programmers. I read the 8th Light and Uncle Bob discussions about &amp;quot;Software Craftsmanship&amp;quot; and honing our skills. I never spent time doing code katas, but clearly craft &lt;em&gt;mattered&lt;/em&gt;. Writing code the right way &lt;em&gt;mattered&lt;/em&gt;. It wasn&#39;t just about whether the code &lt;em&gt;ran&lt;/em&gt;. It was whether the code was &lt;em&gt;elegant&lt;/em&gt;, clean, readable, maintainable. &amp;quot;Make It Work, Make It Right, Make It Fast&amp;quot; became my mantra. &amp;quot;Tradeoffs&amp;quot; was my favorite keyword.&lt;/p&gt;

&lt;p&gt;I built my career on deep understanding. I firmly believe that &amp;quot;programming is building a mental model of the system&amp;quot;. That every programmer&#39;s job is really to understand the problem domain and the system they&#39;re working on, that you break down a new feature or bugfix by comparing it to your existing understanding of the system, and that at the end of a 6-hour coding session you&#39;ve not only written working code but you&#39;ve shaped a new and improved understanding of the existing system &lt;em&gt;plus&lt;/em&gt; the changes. That you learn the fundamentals of your tools, go into the next layer of abstraction, dig into unfamiliar code, learn on the fly.&lt;/p&gt;

&lt;p&gt;I believe in determinism. That pure functions aren&#39;t just an esoteric concept, but make code predictable and testable. That you &lt;em&gt;can&lt;/em&gt; understand a system, even if there&#39;s distributed pieces and timing problems and race conditions. That you &lt;em&gt;can&lt;/em&gt; solve a problem by taking the time to understand it, break it down, build the mental model, scientific method debug the solution, document it, maintain it. Redux wasn&#39;t just a way to manage data outside React, but a way to make the data flow predictable. That the Redux DevTools should show a meaningful human readable history of what happened in the application.&lt;/p&gt;

&lt;p&gt;And then came AI.&lt;/p&gt;

&lt;h3 id=&#34;clouds-on-the-horizon&#34;&gt;Clouds on the Horizon&lt;/h3&gt;

&lt;p&gt;Dunno when I first started hearing about using AI to write code. Probably somewhere in the GPT-2ish era. Surely read it somewhere on HN or Reddit.&lt;/p&gt;

&lt;p&gt;I think the first real examples I saw of actual AI-generated code were from Github Copilot as a VS Code extension. I&#39;d always been an IDE user, all the way back to the Visual C++ days. (yes yes I know some of you predate that considerably, same as folks who used Amigas and C64s predate my first 286 and DOS 3.3 usage. My story here.)&lt;/p&gt;

&lt;p&gt;IDE autocomplete is ancient tech. Whether it was VC++, Visual Studio C#, Eclipse and Java, or fine even you Vim/Emacs people with some custom plugin, we&#39;ve always been able to type &lt;code&gt;person.&lt;/code&gt; and see &lt;code&gt;firstName, lastName, age&lt;/code&gt; pop up in the autocomplete overlay. Simple and deterministic. Sometimes we rely on it too heavily, but no one can memorize all the available methods anyway.&lt;/p&gt;

&lt;p&gt;AI-powered &amp;quot;autocomplete&amp;quot;, though? That&#39;s... something different.&lt;/p&gt;

&lt;p&gt;I resisted at first. I didn&#39;t want a non-deterministic tool trying to offer &amp;quot;suggestions&amp;quot;. &lt;em&gt;I&lt;/em&gt; knew what I wanted to write. I didn&#39;t need some Statistical Word Generating Machine trying to somehow guess what I &lt;em&gt;might&lt;/em&gt; want to do next.&lt;/p&gt;

&lt;p&gt;My work journal says I finally gave in and decided to try Github Copilot in March 2023, and concluded: &amp;quot;It&#39;s... not awful? It&#39;s doing a decent job being a smarter Intellisense so far.&amp;quot;&lt;/p&gt;

&lt;p&gt;So I left it on. Got some use out of it. Two thirds of the time the suggestions were completely bogus, but at least it was limited to trying to suggest a couple lines of code at a time, maybe outlining a &lt;code&gt;for&lt;/code&gt; loop or something. Scoped. Readable. I could see the suggestions, I could ignore them, I only had to accept them if they actually &lt;em&gt;were&lt;/em&gt; what I was about to type in. Still under my control, my understanding.&lt;/p&gt;

&lt;p&gt;I do remember one incident where I was working on some perf optimizations for Reselect, and hijacking some of the unit test files to run perf tests. I hit Enter a couple times and paused... and it suddenly &amp;quot;suggested&amp;quot; an entire multi-paragraph comment about doing performance analysis and even referenced some repo that might not have existed yet, but was eerily relevant conceptually. That was kinda funny! ... and also scary.&lt;/p&gt;

&lt;h3 id=&#34;ominous-signs&#34;&gt;Ominous Signs&lt;/h3&gt;

&lt;p&gt;I started hearing about devs using this tech on a larger scale. Just repeatedly auto-accepting suggestions, or doing more. Seemed.... &lt;em&gt;wrong&lt;/em&gt;. It&#39;s non-deterministic. These things &lt;em&gt;hallucinate&lt;/em&gt;. Surely you can&#39;t rely on that for anything &lt;em&gt;real&lt;/em&gt;, right?&lt;/p&gt;

&lt;p&gt;I started seeing more and more evidence of AI-generated code and text popping up online. These frequently seemed to involve hallucinations. A couple incidents particularly stick out in my mind.&lt;/p&gt;

&lt;p&gt;In one case, a dev pinged me on Twitter saying that a link to my blog was producing a 404. The URL was something akin to &lt;code&gt;https://blog.isquaredsoftware.com/YYYY/MM/blogged-answers-selectors-fetching/&lt;/code&gt;. Not quite that, but close enough. I checked, got a 404. Confused, I went to my blog repo, looked for that Markdown file, and it didn&#39;t exi..... waitaminute. I don&#39;t &lt;em&gt;remember&lt;/em&gt; ever having written a post with that kind of title. Where&#39;d they get that link? and sure enough, an LLM had utterly hallucinated a URL to my blog for &lt;em&gt;a post that did not exist&lt;/em&gt;. Gotten the URL format correct and everything. Completely wrong.&lt;/p&gt;

&lt;p&gt;Another time, a user asked in the Redux Discord channels about some Redux APIs, with code snippets... and those APIs also &lt;em&gt;did not exist&lt;/em&gt;. Asked, and sure enough, same thing. They&#39;d asked an LLM, it gave them the snippet, and they never checked: they just believed the answer, and the suggested code was completely bogus.&lt;/p&gt;

&lt;p&gt;Small sample size, but examples of the larger pattern I was seeing. Clearly, LLMs could &lt;em&gt;not&lt;/em&gt; be trusted! Too many hallucinations.&lt;/p&gt;

&lt;p&gt;I think I even tried downloading a local LLM setup once or twice. Tried asking something about &amp;quot;Redux vs Flux&amp;quot;, took a very long time to generate tokens, and came up with a vaguely plausible sounding but completely incorrect answer paragraph.&lt;/p&gt;

&lt;p&gt;At this point every single indicator I had seen was telling me this technology simply could not be trusted. It&#39;s one thing to copy-paste some snippets off Stack Overflow and maybe the answer&#39;s wrong, but at least there it&#39;s a &lt;em&gt;person&lt;/em&gt; that wrote that. AI code would never reach any acceptable level of trustworthiness. It&#39;s non-deterministic! It hallucinates! There is just no possible way you can safely build maintainable systems on this kind of flawed foundation.&lt;/p&gt;

&lt;h2 id=&#34;part-2-fear-doom-and-depression&#34;&gt;Part 2: Fear, Doom, and Depression&lt;/h2&gt;

&lt;h3 id=&#34;the-pivot&#34;&gt;The Pivot&lt;/h3&gt;

&lt;p&gt;I had joined &lt;a href=&#34;https://replay.io&#34;&gt;Replay&lt;/a&gt; in spring 2022. Time travel debugging. It was the perfect fit. It matched everything I believed in in my career, every experience I had. Amazing team, brilliantly talented engineers. An impossible tool, already working. The hard part&#39;s done! We&#39;ve built time travel! We&#39;ve brought determinism to a non-deterministic world! A perfect product, blindingly obvious how useful it is. It&#39;ll save every web developer thousands of hours. I wish I&#39;d had this &lt;em&gt;years&lt;/em&gt; ago in my career. A codebase using React and Redux that I&#39;m already familiar with. No possible way this could go wrong. Won&#39;t be a rocket ship, but steady growth on the way and we&#39;ll succeed.&lt;/p&gt;

&lt;p&gt;We were wrong. &lt;em&gt;I&lt;/em&gt; was wrong.&lt;/p&gt;

&lt;p&gt;Turns out there&#39;s many aspects we never accounted for. Most devs are never taught how to debug. We learn it on the fly, by osmosis, as we go. No one approaches it scientifically. No one uses &lt;em&gt;existing&lt;/em&gt; graphical debugging tools like Chrome DevTools or VS Code&#39;s debugger. It&#39;s just spam some more console logs and reload the page. Trying to sell a more powerful graphical debugger just didn&#39;t work the way we assumed it would.&lt;/p&gt;

&lt;p&gt;I had spent all of 2023 getting to build amazing time-travel powered debugging tools. Our React and Redux DevTools integrations. &amp;quot;Jump to Code&amp;quot;, going from a click event to the React &lt;code&gt;onClick&lt;/code&gt; prop that handled it and pausing when it ran. I was a kid in a candy store, an entire toolbox of amazing time travel APIs at my fingertips. I was having a &lt;em&gt;blast&lt;/em&gt;. I was building incredible tools &lt;em&gt;now&lt;/em&gt;, but surely these were just the foundation for what we&#39;d get to build on top 10 years from now.&lt;/p&gt;

&lt;p&gt;But, turns out that just building the world&#39;s most amazing tech doesn&#39;t automatically bring in sales. We struggled. and in summer 2024, we pivoted.&lt;/p&gt;

&lt;p&gt;The pivot was a brutal blow. I never saw it coming. And not only were we throwing away the dream of time travel debugging, the dream that I&#39;d &lt;em&gt;believed&lt;/em&gt; in, the dream I wanted to spend the rest of my career building...&lt;/p&gt;

&lt;p&gt;we were pivoting to &lt;em&gt;AI&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In that moment I felt, quite literally and physically, like I&#39;d been punched in the gut.&lt;/p&gt;

&lt;p&gt;And I had the most &lt;em&gt;visceral&lt;/em&gt; gut reaction I can ever remember:&lt;/p&gt;

&lt;p&gt;&amp;quot;&lt;em&gt;HELL NO!&lt;/em&gt; I will &lt;em&gt;never&lt;/em&gt; work on &lt;em&gt;anything&lt;/em&gt; AI related!&amp;quot;&lt;/p&gt;

&lt;h3 id=&#34;the-wilderness&#34;&gt;The Wilderness&lt;/h3&gt;

&lt;p&gt;Post-pivot, Replay regrouped and tried to figure out what we could do. I busied myself working on our React analysis layer. Outside work, I spent a few months working on RTK Query&#39;s infinite query support.&lt;/p&gt;

&lt;p&gt;But at least those tasks were things I could do &lt;em&gt;by hand&lt;/em&gt;. With my own brain. &lt;em&gt;I&lt;/em&gt; dug through the ReactDOM artifacts and found the relevant functions to instrument. &lt;em&gt;I&lt;/em&gt; looked at the instrumented minified bundles and found the places where we were inserting callbacks the wrong way. &lt;em&gt;I&lt;/em&gt; went through the source for TanStack Query, Apollo, SWR, and built an understanding of infinite queries as a problem domain. &lt;em&gt;I&lt;/em&gt; took the contributed first draft implementation PR, tested it, worked out what pieces were missing, listed out the test cases we needed to handle. &lt;em&gt;I&lt;/em&gt; did the dozens of hours of grunt work to build the missing pieces, handle the edge cases, wrote the docs. &lt;em&gt;I&lt;/em&gt; shipped the new features.&lt;/p&gt;

&lt;p&gt;Me. My brain. My understanding.&lt;/p&gt;

&lt;p&gt;and it also kept me away from both &lt;em&gt;using&lt;/em&gt; AI to generate code, and from having to build &lt;em&gt;systems&lt;/em&gt; that wrapped AI. I didn&#39;t want to do &amp;quot;prompt engineering&amp;quot;. Didn&#39;t want to have to cajole an AI into &lt;em&gt;maybe&lt;/em&gt; producing valid output, or telling it &amp;quot;you&#39;re the world&#39;s greatest expert in $TOPIC&amp;quot;, or add &amp;quot;make no mistakes&amp;quot; to every instruction, or whatever stupid and ridiculous techniques of the week were getting thrown around on Twitter and HN. And I definitely didn&#39;t want to waste time writing AI infrastructure and plumbing. No SDKs, no API calls, &lt;em&gt;no AI app whatsoever&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&#34;the-tidal-wave&#34;&gt;The Tidal Wave&lt;/h3&gt;

&lt;p&gt;But the onslaught of &amp;quot;AI!&amp;quot; everywhere was inescapable. And my reading habits didn&#39;t help.&lt;/p&gt;

&lt;p&gt;I know I&#39;m too addicted to social media. Lifelong information junkie. I &lt;em&gt;need&lt;/em&gt; to read. &lt;em&gt;Need&lt;/em&gt; to learn, to understand what&#39;s going on. I read anything that looks potentially relevant. I see technologies and concepts, label them, file them away in the back of my brain, and then months or years later I pattern-match and slap some relevant blocks together to build a solution that fits.&lt;/p&gt;

&lt;p&gt;And unfortunately that meant reading Twitter and HN. And every other article was about AI, LLMs, codegen, prompt engineering.&lt;/p&gt;

&lt;p&gt;By this time there was a constant drumbeat of articles insisting that &amp;quot;WRITING CODE WITH AI IS THE ONLY POSSIBLE FUTURE! YOU &lt;em&gt;MUST&lt;/em&gt; ADOPT THIS NOW! IF YOU AREN&#39;T LEARNING AND DOING THIS AND 10XING YOUR OUTPUT, YOU &lt;em&gt;WILL&lt;/em&gt; FALL BEHIND! EVEN IF YOU&#39;RE A SENIOR ENGINEER! YOU&#39;LL BE ECLIPSED BY JUNIORS! NO ONE WILL HIRE YOU! YOUR ENTIRE LIFE&#39;S WORK WILL BE A &lt;em&gt;WASTE&lt;/em&gt;! GET WITH THE PROGRAM &lt;em&gt;NOW&lt;/em&gt; OR YOU&#39;RE &lt;em&gt;DOOOOOOOMED&lt;/em&gt;!&amp;quot;&lt;/p&gt;

&lt;p&gt;Needless to say, this was &lt;em&gt;not&lt;/em&gt; good for my psyche.&lt;/p&gt;

&lt;p&gt;And it was worse when it came from authors who I could tell just from reading &lt;em&gt;were&lt;/em&gt; experienced engineers. It&#39;s one thing to look at a random Medium post and dismiss it three paragraphs in with &amp;quot;oh that person clearly has no clue what they&#39;re talking about&amp;quot;. But seeing apparently clueful senior engineers drinking the koolaid and screaming &amp;quot;ADAPT AND USE AI &lt;em&gt;OR DIE&lt;/em&gt;!&amp;quot; just made the spiral worse.&lt;/p&gt;

&lt;p&gt;I read plenty of counter-arguments. I bookmarked numerous posts from other senior engineers who, like myself, mourned the loss of their craft. That our whole way of life was being shredded, ripped away. I read them, saved them, felt the loss with my whole heart.&lt;/p&gt;

&lt;p&gt;And the drumbeats and onslaught just kept coming.&lt;/p&gt;

&lt;h3 id=&#34;p-doom&#34;&gt;P(doom)&lt;/h3&gt;

&lt;p&gt;By early 2025 I could feel this toxic deluge of info poisoning the thoughts in my brain. I had half-formed thoughts swirling around in the back of my head. I knew they were there. I knew what they were about. I just hadn&#39;t fully articulated them.&lt;/p&gt;

&lt;p&gt;And there were a lot of other things going on as well. Family concerns. Questions about work. Politics. The General State Of The World (&lt;em&gt;gestures wildly&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;And in March 2025, while chatting with a friend, I finally wrote out a whole bunch of bullet points describing my concerns about AI and what it was doing to the world and the software industry and my career.&lt;/p&gt;

&lt;p&gt;And then I stopped. I re-read what I&#39;d just written. Horrified.&lt;/p&gt;

&lt;p&gt;I feared what AI would do to the quality of the code we were collectively writing. How can you trust the output of a tool that randomly hallucinates? How can you build an understanding of a system when everyone&#39;s generating code, and no one&#39;s reading it? Software&#39;s already held together with duct tape and baling wire. Our shiny desktop apps and SAAS sites are badly architected and bug-ridden even in the best run team and project. If we use AI, won&#39;t it all fall apart and collapse? And am I going to have to spend all my time carefully scrutinizing every line of code just to make sure hallucinations and logic errors don&#39;t sneak through?&lt;/p&gt;

&lt;p&gt;I built my career on craftsmanship, on understanding, on careful work. I always felt like I never quite lived up to the ideals, that my code was always more slapped-together than it &lt;em&gt;ought&lt;/em&gt; to be. But at least I&#39;d taken the time to make it run, to figure out the solution, and to find the rough spots and sand them down. AI couldn&#39;t possibly match that understanding.&lt;/p&gt;

&lt;p&gt;I definitely didn&#39;t want to &lt;em&gt;use&lt;/em&gt; AI to write code. I &lt;em&gt;loved programming&lt;/em&gt;! That&#39;s the fun part! I&#39;d already switched back from being a team tech lead to just being an IC, because I had been reduced to only 50% coding time from being stuck in meetings, and that was &lt;em&gt;awful&lt;/em&gt;. I wanted to code! I feared using AI would turn me into some kind of weird PM / code reviewer hybrid, that all my time would be spent mindlessly clicking &amp;quot;LGTM&amp;quot; on my agent&#39;s output, that I wouldn&#39;t get to use my brain, that I wouldn&#39;t &lt;em&gt;learn&lt;/em&gt; anything, that my skills would atrophy. If I wanted to spend all my time managing I&#39;d have become a manager! Don&#39;t take away the task that I &lt;em&gt;love doing&lt;/em&gt;!&lt;/p&gt;

&lt;p&gt;I still didn&#39;t want to work on any kind of AI-related project. If I moved on and looked for a new job, would I even be able to find something that &lt;em&gt;wasn&#39;t&lt;/em&gt; AI? Every single opening and &amp;quot;we&#39;re hiring&amp;quot; post I saw was AI, AI, AI. I didn&#39;t think I could stomach &lt;em&gt;talking&lt;/em&gt; to a company that did AI-based products, much less deal with expectations for &lt;em&gt;using&lt;/em&gt; AI to write code.&lt;/p&gt;

&lt;p&gt;And what happens to the thousands of hours of work I&#39;ve put in building my craft? So many other fields of work were radically transformed by technology. Entire fields and industries changed overnight. Typists and secretaries disappeared with the PC. Lots of other examples. Was my own career about to become obsolete? What does that mean for this industry? What about &lt;em&gt;my&lt;/em&gt; future jobs?&lt;/p&gt;

&lt;p&gt;And I feared for the rest of the industry as a whole. I had so many unique opportunities I was given in my career, from mentors and people who believed in me. I was given those opportunities because I&#39;d already demonstrated my efforts and skills, and was able to take advantage of them because I put in the work. Would AI destroy the pathway for juniors to build a career? Would it prevent them from even being able to build the core skills they &lt;em&gt;actually&lt;/em&gt; needed? Thinking, understanding, learning, reading code, building that mental model of a system.&lt;/p&gt;

&lt;p&gt;And then even further, what happens to our entire society? We&#39;ve seen ripple effects from everyone having phones at all times, social media overloading our brains with context collapse, teens sliding into depression from electronic bullying, students no longer learning to think and just throwing assignments into ChatGPT. What&#39;s the long-term prognosis of this on our whole &lt;em&gt;world&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;I wasn&#39;t worried about Skynet, or AGI, or gray goo wiping out humanity. I &lt;em&gt;was&lt;/em&gt; worried about AI&#39;s effects on society, industry, and my own career.&lt;/p&gt;

&lt;p&gt;And something in my brain snapped.&lt;/p&gt;

&lt;h3 id=&#34;depression&#34;&gt;Depression&lt;/h3&gt;

&lt;p&gt;I am a very mentally stable person. It&#39;s kind of ironic to write that. 3 years ago I would have written it and meant that there was &lt;em&gt;nothing&lt;/em&gt; wrong with me, at all, 100% completely perfectly mentally healthy. Then I had an &amp;quot;emotional awakening&amp;quot; thanks to a good friend asking pointed questions, realized I &lt;em&gt;did&lt;/em&gt; have a lot to work on, and dove head-first into therapy. It&#39;s been wonderful. I&#39;ve spent a massive amount of effort understanding myself, my feelings, my thoughts, my personal quirks, aspects of my brain and personality that caused me problems, building better relationships with those around me. And I&#39;m a much healthier and happier person now than I was earlier.&lt;/p&gt;

&lt;p&gt;But even before that, I&#39;ve always had a very stable emotional baseline. On a +-10 scale, I&#39;m generally 0 to +3 maybe? Very hard to get me upset. I snap back to a baseline &amp;quot;yup, things are fine&amp;quot; quickly. Between realist and optimist outlook. Never had to deal with &amp;quot;true&amp;quot; mental health issues.&lt;/p&gt;

&lt;p&gt;But staring at those bullet points, I suddenly questioned how I was doing. I remember explicitly thinking &amp;quot;Wait. I&#39;ve been &lt;em&gt;saying&lt;/em&gt; &#39;I&#39;m okay, there&#39;s just a lot going on around me right now&#39;. But... &lt;em&gt;am&lt;/em&gt; I &#39;okay&#39;? Have I been fooling myself? And... &lt;em&gt;oh&lt;/em&gt;. I&#39;m also dealing with family concerns, and other stuff going on in the world... oh no. Maybe I&#39;m &lt;em&gt;not&lt;/em&gt; &#39;okay&#39; after all.&amp;quot;&lt;/p&gt;

&lt;p&gt;And I suddenly found myself in this bizarre mini-depressive state. Something I&#39;ve &lt;em&gt;never&lt;/em&gt; experienced before.&lt;/p&gt;

&lt;p&gt;I don&#39;t want to make comparisons to other people&#39;s mental health struggles. I can only speak for myself here. For me, it was like my emotional outlook on life had suddenly gotten dropped to... like a -3 on that +-10 scale? I was functional. I wasn&#39;t curled up in a ball. But instead of going around just focused on life and work, or being excited about things, I was telling myself &amp;quot;I&#39;m.... &lt;em&gt;struggling&lt;/em&gt; right now. I&#39;m &lt;em&gt;trying&lt;/em&gt; to hang in there. Things are not good. They&#39;re &lt;em&gt;bad&lt;/em&gt;. It&#39;s all out of my control. We&#39;re all careening out of control. I&#39;m just trying to make it through atm. Trying to manage.&amp;quot;&lt;/p&gt;

&lt;p&gt;It was &lt;em&gt;not&lt;/em&gt; fun.&lt;/p&gt;

&lt;p&gt;I know many folks deal with much worse. All I can say is this was something I&#39;d never experienced before. And it was essentially all due to these fears about AI.&lt;/p&gt;

&lt;p&gt;And nothing I could do could change the fate of the world.&lt;/p&gt;

&lt;h2 id=&#34;part-3-reverse-engineering&#34;&gt;Part 3: Reverse Engineering&lt;/h2&gt;

&lt;h3 id=&#34;miami-sunshine&#34;&gt;Miami Sunshine&lt;/h3&gt;

&lt;p&gt;React Miami has always been one of my favorite conferences. Michelle, Rebecca, and Gabe have done an amazing job putting on a conf that is at once a non-stop party, an incredible social event, and a great source of technical info.&lt;/p&gt;

&lt;p&gt;Over the last several years, confs have become my own major social outlet. I&#39;ve met so many wonderful people. I get to see other amazing speakers, teachers, and maintainers in the React ecosystem. I&#39;ve built lifelong friendships. I get to meet tons of people from so many industries and walks of life. I get to travel the world, see places I&#39;ve never been, have deep conversations about tech and life and so much more. I look forward to every conf I go to.&lt;/p&gt;

&lt;p&gt;And in early April 2025, I feared going to React Miami.&lt;/p&gt;

&lt;p&gt;I&#39;d just burned a ton of emotional energy early in the year with the effort to kill Create React App. There&#39;d been numerous intense debates over how to handle that and how to update the React docs. I&#39;d had public and private discussions, given feedback, posted PRs, written and thrown away posts. I was already mentally exhausted from that, &lt;em&gt;before&lt;/em&gt; this AI depression hit.&lt;/p&gt;

&lt;p&gt;Now that I was wandering around in a &amp;quot;I&#39;m just struggling to survive&amp;quot; mental daze, I didn&#39;t know if I&#39;d even &lt;em&gt;enjoy&lt;/em&gt; myself at React Miami. Had I burned bridges? Would I have any brain capacity left to think about the talks? Would I have any social energy to be able to be around people? I was genuinely questioning if I should even go.&lt;/p&gt;

&lt;p&gt;I went.&lt;/p&gt;

&lt;p&gt;And I&#39;m thankful I did :)&lt;/p&gt;

&lt;p&gt;React Miami Day 1 was pretty good. Lot of discussions and hangout time. Definitely felt low energy and &amp;quot;lousy&amp;quot;. But I&#39;d socialized, been around people, gotten through the day.&lt;/p&gt;

&lt;p&gt;Day 2, though. Wow.&lt;/p&gt;

&lt;p&gt;I&#39;d gotten enough sleep to have some energy. And Day 2 was nothing but non-stop conversations. So many great conversations.&lt;/p&gt;

&lt;p&gt;I&#39;ve learned to manage my social battery over the years. Usually by 8:30-9:00 I know it&#39;s hitting 0% and I need to go back to my room and chill. This time, I was talking until almost midnight. I was worn out, but it &lt;em&gt;felt good&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I got back to my room and was journaling. And as I was journaling that night, I realized I was tired of being stuck in this semi-depressive &amp;quot;struggling / doom and gloom&amp;quot; state. It sucked! It wasn&#39;t fun! I noted I wasn&#39;t making any forward progress, that I was just going in circles... and I &lt;em&gt;hate&lt;/em&gt; going in circles for anything. Stop &lt;em&gt;talking&lt;/em&gt; about Doing The Thing. Just &lt;em&gt;Do The Thing&lt;/em&gt; and get something done!&lt;/p&gt;

&lt;p&gt;So I decided I was done with whatever the hell this was. Let&#39;s move on.&lt;/p&gt;

&lt;p&gt;And somehow, with essentially a snap of my fingers, I snapped myself out of that semi-depressive mindset.&lt;/p&gt;

&lt;h3 id=&#34;flipping-the-bit&#34;&gt;Flipping The Bit&lt;/h3&gt;

&lt;p&gt;I honestly have no answer for this! I can&#39;t tell you how I did it. I&#39;ve told friends and they&#39;ve said &amp;quot;how &lt;em&gt;did&lt;/em&gt; you do that?&amp;quot; and I can&#39;t explain it.&lt;/p&gt;

&lt;p&gt;It&#39;s some combination of having done therapy, realizing I &lt;em&gt;could&lt;/em&gt; change my attitudes and how I feel about things, not wanting to be permanently stuck in a negative situation, and being ready to get back to moving ahead with life.&lt;/p&gt;

&lt;p&gt;I also realized that yes, there&#39;s an absurd number of things that are out of my control. Weather, politics, tech advancement, society, even many aspects of my job. If I spend all my time thinking about things I can&#39;t control, then no, of &lt;em&gt;course&lt;/em&gt; I&#39;m going to be worried, but also I literally &lt;em&gt;can&#39;t do anything about it&lt;/em&gt;! So, why not refocus back on the things I &lt;em&gt;can&lt;/em&gt; control. My brain, my life, my family, my friends, my hobbies, my career.&lt;/p&gt;

&lt;p&gt;And somehow I just hit the reset button and went right back to my standard emotional baseline and have been there ever since :)&lt;/p&gt;

&lt;p&gt;This didn&#39;t change any of the concerns I had. AI was still destroying the software industry. The world was still chaotic. I still didn&#39;t want to work on any AI project. And I sure absolutely &lt;em&gt;would not ever&lt;/em&gt; use AI to write code for me. Period. Red line. Uncrossable. Ain&#39;t gonna happen.&lt;/p&gt;

&lt;p&gt;I just didn&#39;t want to spend all my time walking around panicking about those things and the impending end of the world. So I didn&#39;t.&lt;/p&gt;

&lt;h2 id=&#34;part-4-taming-the-beast&#34;&gt;Part 4: Taming the Beast&lt;/h2&gt;

&lt;h3 id=&#34;architectural-research&#34;&gt;Architectural Research&lt;/h3&gt;

&lt;p&gt;I&#39;d spent the first few months of 2025 working on Replay&#39;s sophisticated React analysis instrumentation layer, enabling it to successfully instrument React 19 and extract timing data. Around May, I was tasked with refactoring a complex portion of our backend, which managed a tree of forked processes for executing and replaying the program being debugged. I had a decent grasp of the &lt;em&gt;rough&lt;/em&gt; shape of how our time travel processing worked, and had touched &lt;em&gt;some&lt;/em&gt; parts of the system, but not this section.&lt;/p&gt;

&lt;p&gt;The general task was to replace a bunch of module-scoped variables that made this a singleton with equivalent classes so that we could instantiate multiple analysis processes in parallel. Doable, but you definitely need to, yes, &lt;em&gt;understand the system&lt;/em&gt; to make the changes. So that would take time.&lt;/p&gt;
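&lt;p&gt;The shape of that refactor is a common one: hoist the module-scoped state into a class, so each consumer gets its own instance. A minimal hypothetical sketch (none of this is Replay&#39;s actual code, and every name here is made up for illustration):&lt;/p&gt;

```typescript
// Before: module-scoped variables make the whole module a de facto
// singleton -- every importer shares this one piece of state:
//
//   let activeForks = 0;
//   export function spawnFork() { activeForks += 1; }

// After: the same state wrapped in a class, so multiple analysis
// sessions can run in parallel, each tracking its own forked processes.
class ForkManager {
  private activeForks = 0;

  spawnFork(): number {
    this.activeForks += 1;
    return this.activeForks;
  }

  get count(): number {
    return this.activeForks;
  }
}

// Each session instantiates its own manager instead of sharing module state.
const sessionA = new ForkManager();
const sessionB = new ForkManager();
sessionA.spawnFork();
sessionA.spawnFork();
sessionB.spawnFork();
console.log(sessionA.count, sessionB.count); // 2 1
```

&lt;p&gt;Mechanical on paper - but doing it safely still meant understanding which callers were holding onto which pieces of that state.&lt;/p&gt;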

&lt;p&gt;I remember a teammate had actually tried to do some of this refactor via AI and put up a draft PR that was broken. I skimmed it, and saw that a large chunk of the actual code changes were really just threading OpenTelemetry &lt;code&gt;cx&lt;/code&gt; context objects through one function after another to enable proper traces. That was clearly a very mechanical kind of refactoring. Maybe it was something I could just start doing myself, mindlessly, and start to learn this section of the codebase.&lt;/p&gt;

&lt;p&gt;I started doing a lot of that context refactoring myself. And that did teach me a lot about the files and the methods and their connections. More importantly, it got my brain woken up and engaged in the &amp;quot;writing code and focused&amp;quot; part of the work.&lt;/p&gt;

&lt;p&gt;I don&#39;t remember exactly when this happened, but somewhere around here, I took a big mental step:&lt;/p&gt;

&lt;p&gt;I decided I could at least &lt;em&gt;try&lt;/em&gt; using AI, to &lt;em&gt;explain&lt;/em&gt; some of this code to me.&lt;/p&gt;

&lt;p&gt;I was very cautious about it. I didn&#39;t want to install a CLI/TUI agent. I wanted to stay in VS Code. I&#39;d seen the KiloCode extension mentioned a few times on HN by its founder, so I decided to give it a shot.&lt;/p&gt;

&lt;p&gt;So I installed KiloCode (probably Sonnet 3.5 or 4.0 at the time), pointed it at our codebase, and asked it to generate some architectural walkthroughs of this section.&lt;/p&gt;

&lt;p&gt;.... and it did? and... they were actually kinda useful? and didn&#39;t have &lt;em&gt;blatantly&lt;/em&gt; obvious hallucinations? and they actually helped build my own mental diagram of how the pieces fit together?&lt;/p&gt;

&lt;p&gt;Huh.&lt;/p&gt;

&lt;p&gt;Okay. I guess this is something I could keep using. But for explanations only. I still gotta write the code. Just me. Hard red line that I will &lt;em&gt;never&lt;/em&gt; cross.&lt;/p&gt;

&lt;h3 id=&#34;mind-blown&#34;&gt;Mind. Blown.&lt;/h3&gt;

&lt;p&gt;August 26, 2025. Tuesday afternoon.&lt;/p&gt;

&lt;p&gt;I&#39;d originally planned to spend time with family, but plans got canceled. Could have gotten day job work done, but I was in a mood to do some Redux maintenance.&lt;/p&gt;

&lt;p&gt;I was trying to optimize some RTK Query internals. Unfortunately I also realized that we didn&#39;t have any &lt;em&gt;tests&lt;/em&gt; that covered the specific bits of logic I was fiddling with. Tests are important, especially for libraries. I knew I &lt;em&gt;needed&lt;/em&gt; to add tests before I made changes, so I could capture the existing behavior.&lt;/p&gt;

&lt;p&gt;But my brain was tired. Tests are code. I didn&#39;t &lt;em&gt;wanna&lt;/em&gt; write tests. That was more effort.&lt;/p&gt;

&lt;p&gt;And so my brain said &amp;quot;uh... what if we had KiloCode + Claude write those tests for us?&amp;quot;&lt;/p&gt;

&lt;p&gt;I pulled up KiloCode. I described the section of the RTK code I was working on. I said &amp;quot;I need a couple unit tests that cover these cases&amp;quot;. I dictated the logic, the setup, the conditions, and what file they should go into.&lt;/p&gt;

&lt;p&gt;And for the first time, I said &amp;quot;Go write those&amp;quot;, and hit Enter.&lt;/p&gt;

&lt;p&gt;.......... and a few seconds later it spit out 75 lines of test code.&lt;/p&gt;

&lt;p&gt;And I read those tests. Carefully. Every line. Every condition. It was exactly what I asked for.&lt;/p&gt;

&lt;p&gt;And I held my face in my hands and &lt;em&gt;wept&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Okay okay it wasn&#39;t &lt;em&gt;quite&lt;/em&gt; that bad :) But I was, truly and genuinely, shocked and mind-blown.&lt;/p&gt;

&lt;p&gt;Absolutely I had just crossed the line that I swore I&#39;d never cross. I&#39;d used an AI to generate code for me! I even wrote in my journal that &amp;quot;I feel a lot of concern over blurring my own ethical boundaries. but also I just got a bunch of work done I wouldn&#39;t have had the brainpower to do today otherwise. and making progress on RTK maintenance &lt;em&gt;is&lt;/em&gt; good for my psyche.&amp;quot;&lt;/p&gt;

&lt;p&gt;Rubicon. Crossed.&lt;/p&gt;

&lt;p&gt;And then it got worse? better? shocking-er? the next two days.&lt;/p&gt;

&lt;p&gt;Our backend depended on &lt;code&gt;node-lz4&lt;/code&gt; to compress artifacts for S3. Unfortunately that lib was long out of date and unmaintained, and also the only thing keeping us from upgrading to newer Node versions. I&#39;d repeatedly looked for alternatives, always found a Rust-based &lt;code&gt;lz4-napi&lt;/code&gt; package, but &lt;code&gt;lz4-napi&lt;/code&gt; didn&#39;t have Node streaming support and we relied on that. Repeated this cycle every few months.&lt;/p&gt;

&lt;p&gt;Wednesday I looked at it and said: &amp;quot;could.... could &lt;em&gt;AI&lt;/em&gt; implement that streaming feature?&amp;quot;&lt;/p&gt;

&lt;p&gt;I cloned &lt;code&gt;node-lz4&lt;/code&gt; and &lt;code&gt;lz4-napi&lt;/code&gt;. Pointed KiloCode + Claude at the two repos. Fed in the previous issues and discussion. Asked. And it claimed &amp;quot;sure, that&#39;s doable&amp;quot;.&lt;/p&gt;

&lt;p&gt;I made it write out some kind of plan, starting with upgrading some Rust deps, and then building the Rust streaming internals before doing the JS wrappers.&lt;/p&gt;

&lt;p&gt;I hit Enter.&lt;/p&gt;

&lt;p&gt;......... aaaaand I watched with my jaw dropped open as it cranked away on upgrading Rust deps, built, tested, and then started cranking out the streaming code.&lt;/p&gt;

&lt;p&gt;you&#39;ve &lt;em&gt;got&lt;/em&gt; to be kidding me.&lt;/p&gt;

&lt;p&gt;I walked KiloCode through doing some cross-compat tests and other bits. and within a few hours, it had seemingly built the entire missing feature I&#39;d been waiting for. Unreal.&lt;/p&gt;

&lt;p&gt;It would later turn out the &amp;quot;streaming&amp;quot; part didn&#39;t work right, and the &lt;code&gt;lz4-napi&lt;/code&gt; maintainer had to turn down the PR. Totally fair.&lt;/p&gt;

&lt;p&gt;But... I didn&#39;t even &lt;em&gt;know&lt;/em&gt; Rust! I couldn&#39;t have even &lt;em&gt;tried&lt;/em&gt; to do this myself! And here was a (seemingly) working implementation! Just by prompting!&lt;/p&gt;

&lt;p&gt;And sure, I&#39;d &lt;em&gt;read&lt;/em&gt; dozens of articles about people running umpteen agents, and having agents do all this work for you automatically. I &lt;em&gt;thought&lt;/em&gt; I understood the concept. I thought I got it.&lt;/p&gt;

&lt;p&gt;But &lt;em&gt;seeing&lt;/em&gt; an agent just happily talking its way through making dozens of code edits, one after the other? Wow.&lt;/p&gt;

&lt;p&gt;Thursday. I had to add a new lint rule to our own internal custom TS-based linting system to check for invalid &lt;code&gt;data-testid&lt;/code&gt; uses. I understand all these concepts. ASTs, nodes, parsers. TS program loading, visitors. Babel traversals, &lt;code&gt;ts-morph&lt;/code&gt;. I know these ideas already. Okay, fine, I haven&#39;t worked on &lt;em&gt;our&lt;/em&gt; custom linting system, so I don&#39;t know the exact nuances of the AST libs we need to use here, but this is clearly something I &lt;em&gt;can&lt;/em&gt; do, I just gotta take the time to figure out what moving pieces we have available and how they fit together.&lt;/p&gt;

&lt;p&gt;...... or could I have AI do it for me?&lt;/p&gt;

&lt;p&gt;So I had KiloCode generate an architectural doc describing our custom linting setup: where it lived, moving pieces.&lt;/p&gt;

&lt;p&gt;Then I had it generate an empty lint rule skeleton and integrate it into the existing system.&lt;/p&gt;

&lt;p&gt;Then I had it add some checks for JSX syntax patterns and set up some tests. Then I had it add checks for some known internal component types, and some more edge cases and tests.&lt;/p&gt;
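&lt;p&gt;For flavor, here&#39;s a toy version of that kind of rule, written against the public TypeScript compiler API rather than our internal system (which I&#39;m obviously not reproducing here) - the rule, the kebab-case convention, and all of the names below are made up for illustration:&lt;/p&gt;

```typescript
import * as ts from "typescript";

// Toy lint rule: flag data-testid attributes whose string value
// is not kebab-case. (Illustrative only - not the real internal rule.)
function findInvalidTestIds(code: string): string[] {
  const sourceFile = ts.createSourceFile(
    "example.tsx",
    code,
    ts.ScriptTarget.Latest,
    true,
    ts.ScriptKind.TSX
  );
  const invalid: string[] = [];
  const isKebabCase = (id: string) => /^[a-z][a-z0-9]*(-[a-z0-9]+)*$/.test(id);

  function visit(node: ts.Node): void {
    if (ts.isJsxAttribute(node)) {
      if (node.name.getText(sourceFile) === "data-testid") {
        const init = node.initializer;
        if (init !== undefined) {
          if (ts.isStringLiteral(init)) {
            if (!isKebabCase(init.text)) {
              invalid.push(init.text);
            }
          }
        }
      }
    }
    ts.forEachChild(node, visit);
  }
  visit(sourceFile);
  return invalid;
}

// "\u003c" is just an escaped angle bracket, to keep raw JSX out of this post.
const bad = "\u003cdiv data-testid='MyButton' /\u003e";
const good = "\u003cdiv data-testid='my-button' /\u003e";
console.log(findInvalidTestIds(bad));  // [ 'MyButton' ]
console.log(findInvalidTestIds(good)); // []
```

&lt;p&gt;That&#39;s the level of &amp;quot;I know the concepts, I just don&#39;t know &lt;em&gt;this&lt;/em&gt; codebase&amp;quot; work the agent was doing for me, piece by piece.&lt;/p&gt;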

&lt;p&gt;And. It. &lt;em&gt;Worked&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Drat&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I&#39;d gotten more done in an hour and a half, with AI, than I could have possibly accomplished by myself. I&#39;d &lt;em&gt;maybe&lt;/em&gt; have found the relevant files and set up the initial scaffolding. Instead here I had an actually working first pass on the lint rule, with tests.&lt;/p&gt;

&lt;p&gt;Mind. Blown.&lt;/p&gt;

&lt;p&gt;So I regret to inform you that this AI code gen thing is, apparently, &lt;em&gt;useful&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Now&lt;/em&gt; what am I supposed to do with my entire worldview?&lt;/p&gt;

&lt;h2 id=&#34;part-5-diving-in&#34;&gt;Part 5: Diving In&lt;/h2&gt;

&lt;h3 id=&#34;ramping-up&#34;&gt;Ramping Up&lt;/h3&gt;

&lt;p&gt;Programmers love to build systems and tools. We build systems to build systems to build systems. Ask a programmer to write a blog post, and they&#39;ll spend a month building a blogging CMS from scratch before they write the first article. We customize our shell aliases, our IDEs, our toolsets, our workflows.&lt;/p&gt;

&lt;p&gt;I&#39;ve been down that rabbit hole. I know that feeling. I didn&#39;t want to go down that rabbit hole &lt;em&gt;here&lt;/em&gt;. Not yet.&lt;/p&gt;

&lt;p&gt;This blog is still on the same version of Hugo 0.17 that I chose when I started this blog in October 2016. It&#39;s simple. It works. It&#39;s fast. I have entirely too many demands on my time, too many things on my todo list, more things than I will ever get done. The last thing I need is to waste hours and days customizing my blog setup. So I&#39;ve tweaked the templates over the years, but intentionally never let myself get sucked into changing the build setup.&lt;/p&gt;

&lt;p&gt;I tried to apply that principle here. I was seeing dozens of &amp;quot;HOW TO AWESOMEMAXX YOUR AGENT BY FOLLOWING THESE 11 SUPERPROMPTS AND L33T SKILLZ!&amp;quot; articles every day. I was even bookmarking some of them! But I held off on actually &lt;em&gt;changing&lt;/em&gt; anything.&lt;/p&gt;

&lt;p&gt;I stayed with KiloCode and Claude (which I&#39;ll abbreviate as &amp;quot;KCC&amp;quot;). I did not try to mega-optimize my workflow. I allowed it to be inefficient. I needed to &lt;em&gt;discover&lt;/em&gt; how &lt;em&gt;I&lt;/em&gt; wanted to use AI.&lt;/p&gt;

&lt;h3 id=&#34;immer&#34;&gt;Immer&lt;/h3&gt;

&lt;p&gt;By pure coincidence, early September is when I started trying to optimize performance for the Immer immutable update library.&lt;/p&gt;

&lt;p&gt;And again I regret to inform you that AI was &lt;em&gt;really&lt;/em&gt; helpful here.&lt;/p&gt;

&lt;p&gt;I spent over 110 hours of my own free time in Sep-Oct working on Immer perf, and I used AI extensively through that process.&lt;/p&gt;

&lt;p&gt;I&#39;d used Immer since it came out. My first RTK prototypes were built around Immer since day 1. I&#39;m even quoted in the docs saying how much I love it. I knew the principles of how it worked. I&#39;d &lt;em&gt;glanced&lt;/em&gt; at the code once or twice. I definitely didn&#39;t &lt;em&gt;understand&lt;/em&gt; the code, at all.&lt;/p&gt;

&lt;p&gt;I downloaded the repos for Immer, Mutative, Limu, and Structura. I had KCC scan the READMEs and documentation and write up summaries describing what each library &lt;em&gt;claimed&lt;/em&gt; it was good at and how the alternatives compared themselves to Immer. I had it analyze the internal architecture of each library and write architecture docs describing implementation details and highlighting places where the alternatives took different approaches.&lt;/p&gt;

&lt;p&gt;I knew how to do perf profiling, but I&#39;d previously had issues finding good ways to get properly sourcemapped perf traces for Node scripts. After figuring out I couldn&#39;t just have my perf benchmarks script import artifacts directly due to &lt;code&gt;process.env.NODE_ENV&lt;/code&gt; skewing results, I had to pre-bundle my benchmark script, but that made it hard to read results. KiloCode generated scripts for me that would post-process my perf traces and re-combine them with sourcemaps.&lt;/p&gt;

&lt;p&gt;Those initial scripts at least told me which functions were hottest. I pointed KCC at those functions and it identified low-hanging fruit that could be optimized: some simple caching, some bailouts.&lt;/p&gt;
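&lt;p&gt;The actual post-processing scripts are specific to my setup, but the &amp;quot;which functions are hottest&amp;quot; step is easy to sketch: aggregate self time per function from a V8 &lt;code&gt;.cpuprofile&lt;/code&gt; (the JSON that &lt;code&gt;node --cpu-prof&lt;/code&gt; writes), which contains &lt;code&gt;nodes&lt;/code&gt;, &lt;code&gt;samples&lt;/code&gt; (node ids), and &lt;code&gt;timeDeltas&lt;/code&gt; (microseconds).&lt;/p&gt;

```javascript
// Rough sketch (not my actual scripts): sum self time per function name
// from a V8 .cpuprofile object. Each sample attributes its time delta to
// the sampled node; exact delta attribution has an off-by-one subtlety in
// the real format, which this sketch ignores.
export function selfTimes(profile) {
  const byId = new Map(profile.nodes.map((n) => [n.id, n]));
  const totals = new Map();
  profile.samples.forEach((nodeId, i) => {
    const node = byId.get(nodeId);
    if (!node) return;
    const name = node.callFrame.functionName || "(anonymous)";
    totals.set(name, (totals.get(name) ?? 0) + (profile.timeDeltas[i] ?? 0));
  });
  return totals;
}

// Top N functions by self time, as [name, microseconds] pairs.
export function hottest(profile, topN = 10) {
  return [...selfTimes(profile).entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN);
}
```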

&lt;p&gt;My next big task was porting Mutative&#39;s &amp;quot;notification callback&amp;quot; cleanup approach to Immer to replace its recursive finalization logic. I had KCC try that. It failed, three separate times. Just got lost in the changes. So I did it myself. By now I&#39;d at least started to internalize the structure and logic of Immer&#39;s codebase, so I was gaining understanding of the code.&lt;/p&gt;

&lt;p&gt;I got the notification architecture port changes in place, but a large chunk of the unit tests were failing. And here I did something else I had sworn I wouldn&#39;t ever do:&lt;/p&gt;

&lt;p&gt;I turned my brain off.&lt;/p&gt;

&lt;p&gt;I had KCC add a few dozen log lines to Immer&#39;s internals. Then I&#39;d run one test at a time, copy 500 lines of log spew from the console, paste into KCC, and beg &amp;quot;tell me what went wrong here and how do we fix it&amp;quot;.&lt;/p&gt;

&lt;p&gt;Frankly it went against everything I&#39;d always believed in! But this was also in the middle of a bunch of conference travel. And I just wanted to make forward progress, and get to the point where the tests &lt;em&gt;passed&lt;/em&gt;, and then I could see if this architectural change actually improved performance or not.&lt;/p&gt;

&lt;p&gt;So I did. And eventually the tests passed. And it worked! Not as big a perf improvement as I&#39;d hoped, but meaningful.&lt;/p&gt;

&lt;p&gt;And so I turned my brain back on. I looked at the outstanding diffs from the KCC fixes. And I either traced through to understand what the point was for that change, or in some cases, asked KCC to explain the rationale to me. I filled in the gaps in my understanding, completed my mental model of the system.&lt;/p&gt;

&lt;p&gt;After cleaning up a bunch of debug code, I moved on to analyzing why proxied array methods were horribly slow. Turns out even a &lt;code&gt;draft.filter()&lt;/code&gt; requires iterating and that hits all the &lt;code&gt;get&lt;/code&gt; traps, creating new proxies and requiring cleanup even if all you did was read from them. KCC helped port some of the array / patch generation logic from Mutative. It did get stuck a few times, did hallucinate a few fixes. I kept redirecting it, kept it on track, and eventually we got that working too.&lt;/p&gt;
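&lt;p&gt;You can see the cost mechanism with a tiny standalone demo (plain &lt;code&gt;Proxy&lt;/code&gt; code, not Immer&#39;s internals): even a read-only &lt;code&gt;filter()&lt;/code&gt; fires the &lt;code&gt;get&lt;/code&gt; trap for the method lookup, the &lt;code&gt;length&lt;/code&gt; read, and every element.&lt;/p&gt;

```javascript
// Count get-trap hits during a read-only .filter() on a proxied array.
// In an Immer-style draft, each of these hits can also mean creating
// (and later finalizing) a child proxy for the value that was read.
export function countGetTraps(arr) {
  let hits = 0;
  const proxied = new Proxy(arr, {
    get(target, prop, receiver) {
      hits += 1;
      return Reflect.get(target, prop, receiver);
    },
  });
  proxied.filter((x) => x > 1); // result discarded; we only care about trap traffic
  return hits;
}
```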

&lt;p&gt;I filed the Immer PRs. Michel Weststrate released them. AI was instrumental in helping me do that.&lt;/p&gt;

&lt;p&gt;By the time I was done, I &lt;em&gt;had&lt;/em&gt; built a full understanding of Immer&#39;s codebase. I&#39;d say I&#39;m probably the second best expert on Immer&#39;s architecture and functionality, after Michel (and in some ways better now that I pushed all those changes).&lt;/p&gt;

&lt;h3 id=&#34;nerd-sniped&#34;&gt;Nerd-Sniped&lt;/h3&gt;

&lt;p&gt;During 2025, Replay was working on Replay Builder, an AI-powered app builder. I&#39;d looked at some of the RTK code we were generating, and it wasn&#39;t good :) I remember one app that had five separate data fetching files, with five separate patterns for managing the requests. There shouldn&#39;t have been &lt;em&gt;any&lt;/em&gt; of that, because it all should have been using RTK Query for the data fetching :)&lt;/p&gt;

&lt;p&gt;So I looked at that, and thought &amp;quot;wait, I can make it write better RTK code.&amp;quot;&lt;/p&gt;

&lt;p&gt;And I nerd-sniped myself &lt;em&gt;hard&lt;/em&gt; :)&lt;/p&gt;

&lt;p&gt;I spent the next few months building out an entire 100% deterministic custom codegen system. DB tables -&amp;gt; Hono routes -&amp;gt; RTKQ endpoints. Fully type-safe and comprehensive. Modified a couple example apps to prove out the patterns we needed. Built the codegen system. Updated the prebuilt blocks used by the app builder. And I did all that work with AI. Absolutely would have taken 3-5x as much time if I&#39;d had to do it by hand.&lt;/p&gt;
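&lt;p&gt;The codegen system itself is internal, but the core idea - deterministic string generation from a schema, no LLM in the loop - is simple to illustrate. Everything in this sketch (names, shapes, output format) is hypothetical:&lt;/p&gt;

```javascript
// Toy deterministic codegen sketch, not Replay's actual system: given a
// table description, emit source text for a matching RTK Query endpoint
// (with the Hono route noted in a comment). Same input, same output,
// every time.
export function generateEndpointSource(table) {
  const typeName = table.name[0].toUpperCase() + table.name.slice(1);
  return [
    "// GET /api/" + table.name + "/:" + table.idColumn,
    "get" + typeName + "ById: build.query({",
    "  query: (" + table.idColumn + ") => `/" + table.name + "/${" + table.idColumn + "}`,",
    "}),",
  ].join("\n");
}
```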

&lt;p&gt;But, uh... at some point you have to hook this up to the LLM so it knows how to use the new system.&lt;/p&gt;

&lt;p&gt;And I had previously sworn that I would &lt;em&gt;never&lt;/em&gt;, ever, spend time working on anything like &amp;quot;prompt engineering&amp;quot;. Or &amp;quot;AI SDK plumbing&amp;quot;.&lt;/p&gt;

&lt;p&gt;Then again, I&#39;d also sworn I would never use AI to generate code :) and here I was now using AI, so clearly &lt;em&gt;that&lt;/em&gt; line had vanished.&lt;/p&gt;

&lt;p&gt;So yeah, by the time I got around to the last phase of the project, I &lt;em&gt;was&lt;/em&gt; actually excited about hooking up the codegen system to the actual LLM. Updating the prompts, getting the tooling right. I wanted to see it all work together.&lt;/p&gt;

&lt;p&gt;I guess I just needed the right motivation, and a concrete example that wasn&#39;t abstract and wasn&#39;t something I feared.&lt;/p&gt;

&lt;h3 id=&#34;self-reflection&#34;&gt;Self-Reflection&lt;/h3&gt;

&lt;p&gt;I have a strong need to keep records about my life and what&#39;s going on. I&#39;ve journaled a lot over the years. Kept a daily work journal for over a decade. Lots of personal journaling the last few years. Blogging. Various other communications.&lt;/p&gt;

&lt;p&gt;In early December I was going back to re-read some of my old work journal entries to remind myself about some earlier projects. Got the idea that I could use LLMs to help auto-summarize some of the entries to roll up details on what I worked on and accomplished. That turned into a larger project. Had it generate some vertical summaries of things I worked on, growth as an engineer, where I succeeded, where I struggled.&lt;/p&gt;

&lt;p&gt;And then I had it pull together some &amp;quot;insights about myself&amp;quot;. And &lt;em&gt;that&lt;/em&gt; was a &lt;em&gt;fascinating&lt;/em&gt; document.&lt;/p&gt;

&lt;p&gt;I&#39;ve read plenty of articles about AI psychosis. People going crazy, thinking they&#39;ve unlocked the Great Secrets of the Universe. I value my mind. I refuse to fall into that trap.&lt;/p&gt;

&lt;p&gt;LLMs are, hand-waving, Statistical Word Generating Machines. I know, I know, that&#39;s a drastic over-simplification, the tech is far past that, I get it. My post. Work with me here.&lt;/p&gt;

&lt;p&gt;An LLM has no soul. It&#39;s not a person. No matter how many pronouns or personalities or &lt;code&gt;SOUL.md&lt;/code&gt; files you give it, &amp;quot;it&amp;quot; isn&#39;t actually intelligent. A set of words generated by an LLM carries no thought, no intent, no &lt;em&gt;purpose&lt;/em&gt; behind it. We attribute intelligence where there isn&#39;t any.&lt;/p&gt;

&lt;p&gt;And yet. Sometimes we look up, and we see a cloud, or the stars, and the shape has enough recognition that it means something to us.&lt;/p&gt;

&lt;p&gt;And so an LLM generated a bunch of words, about me. Who I am. How I think. My struggles. My quirks. My flaws. My strengths.&lt;/p&gt;

&lt;p&gt;And sometimes just seeing recognition and words &lt;em&gt;about&lt;/em&gt; ourselves, from outside our own brain, helps us realize something about who we are.&lt;/p&gt;

&lt;p&gt;This doc had a lot of very personal insights about myself. The specifics are important, and they&#39;re private. But with some of the ways I&#39;ve grown, and challenges I&#39;ve faced... I actually needed to read those. And yeah, I was bawling that night :)&lt;/p&gt;

&lt;p&gt;Not gonna claim this works for everyone, or even that it&#39;s a good idea. But yeah, throwing a bunch of my writing into a blender and pulling out summaries and insights &lt;em&gt;was&lt;/em&gt; useful for me.&lt;/p&gt;

&lt;h2 id=&#34;part-6-liftoff&#34;&gt;Part 6: Liftoff&lt;/h2&gt;

&lt;h3 id=&#34;trust-factor&#34;&gt;Trust Factor&lt;/h3&gt;

&lt;p&gt;You know the step change that happened in late 2025. Opus 4.5 was really the turning point. Even Sonnet 4.5 was a distinct improvement over earlier models.&lt;/p&gt;

&lt;p&gt;I still very carefully drove all the sessions (and I still do today). But more and more, the code I saw getting generated was... well, &lt;em&gt;valid&lt;/em&gt; at least. Maybe not always &lt;em&gt;good&lt;/em&gt;. Certainly not how &lt;em&gt;I&#39;d&lt;/em&gt; write it. But if I told it to build $THING with $APPROACH... it&#39;d pretty much do that. And sure, I&#39;d have to course-correct some. But not nearly as much as before. And the mistakes were more about intent or details. At some point I realized I wasn&#39;t even thinking about hallucinations any more.&lt;/p&gt;

&lt;p&gt;I&#39;d also been using Claude.ai for some personal Q&amp;amp;A sessions. Still cautiously. Then it occurred to me that because I was starting to trust the &lt;em&gt;code&lt;/em&gt; output of the models, I was also starting to trust the models in &lt;em&gt;other&lt;/em&gt; domains too.&lt;/p&gt;

&lt;p&gt;In a way it seemed like the opposite of the Gell-Mann Amnesia effect. We read news stories about our own domain, realize the reporter and news source have &lt;em&gt;no&lt;/em&gt; clue what they&#39;re talking about in our area of expertise, but still trust them unthinkingly in other areas.&lt;/p&gt;

&lt;p&gt;Here... I didn&#39;t trust &lt;em&gt;any&lt;/em&gt; of it. And then I found myself starting to trust the code. Maybe it wasn&#39;t &lt;em&gt;good&lt;/em&gt;, but it &lt;em&gt;worked&lt;/em&gt; and did what I wanted. .... so did I trust model output for &lt;em&gt;other&lt;/em&gt; areas too? More and more, the answer was yes.&lt;/p&gt;

&lt;h3 id=&#34;gearing-up&#34;&gt;Gearing Up&lt;/h3&gt;

&lt;p&gt;By Christmas 2025, I was finally ready to go customize my workflow and setup. I&#39;d accumulated a few months of experience, figured out how &lt;em&gt;I&lt;/em&gt; wanted to use AI, and found a workflow pattern that worked for me. Now it was time to slap on all the bells and whistles and make this thing &lt;em&gt;scream&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I tried Claude Code. For about a day, tops. The VSC extension didn&#39;t work at all. Tried the TUI, and bounced off it &lt;em&gt;hard&lt;/em&gt;. Why would anyone want to use something stuck in a terminal?&lt;/p&gt;

&lt;p&gt;Tried OpenCode. Also tried the TUI. Pretty much the same result. Don&#39;t get me wrong, I spend a lot of time in terminals, but I&#39;d much rather do my work in a GUI. Nicer fonts, actual copy-paste handling.&lt;/p&gt;

&lt;p&gt;But, OpenCode had a lot more flexibility. I&#39;m a Windows user, but I also do my day job work in WSL. I found a third-party OpenCode web UI called CodeNomad. Figured out I could run the CodeNomad + OpenCode processes inside of WSL, load up the web UI on the Windows side, and not deal with cross-platform filesystem issues.&lt;/p&gt;

&lt;p&gt;I&#39;d bookmarked hundreds of potentially useful AI tools, techniques, approaches. I scoured the list, found several tools that looked like a good fit. I used OpenCode + Claude (henceforth OCC) to start customizing my own OpenCode config. I had it cross-compare plugins and tools. I started generating my own OpenCode plugins to fit my needs. I customized the &lt;code&gt;AGENTS.md&lt;/code&gt; and skill files. I generated custom scripts to deterministically handle parts of my workflow and eliminate steps the agent would have had to do via tool calls.&lt;/p&gt;

&lt;p&gt;By the end of the break, I had a fully loaded new OCC setup, ready to work the way &lt;em&gt;I&lt;/em&gt; wanted to work.&lt;/p&gt;

&lt;h3 id=&#34;the-re-pivot&#34;&gt;The Re-Pivot&lt;/h3&gt;

&lt;p&gt;In late January, Replay re-found our direction.&lt;/p&gt;

&lt;p&gt;We put together &lt;a href=&#34;https://docs.replay.io/basics/replay-mcp/overview&#34;&gt;Replay MCP&lt;/a&gt;, allowing agents to do the hard work of investigating a Replay recording. I identified some existing pieces of infrastructure we&#39;d previously built that we could tie together to form a new product line for automatically debugging failed E2E tests, and that this was a highly promising business direction for us to focus on.&lt;/p&gt;

&lt;p&gt;We went all in.&lt;/p&gt;

&lt;p&gt;Time travel debugging was always the promise, the potential, the future. We&#39;d struggled to sell the vision and the use case, and ran into friction with humans trying to adopt the tools. Now, with agentic development 100Xing PR rates, Replay could fit into the automated QA story in ways no other tool could. And, Replay MCP gives agents the same time travel superpowers that humans always had with the Replay DevTools UI.&lt;/p&gt;

&lt;p&gt;So now I&#39;ve got an agent workflow, a product direction, and a license to go build Awesome Time Travel Superpowers again. This is going to change the world and fulfill the vision.&lt;/p&gt;

&lt;p&gt;Let&#39;s &lt;em&gt;rock&lt;/em&gt;.&lt;/p&gt;

&lt;h2 id=&#34;part-7-warp-speed&#34;&gt;Part 7: Warp Speed&lt;/h2&gt;

&lt;h3 id=&#34;pulling-it-all-together&#34;&gt;Pulling It All Together&lt;/h3&gt;

&lt;p&gt;A few days later, I had an incredible brainstorm.&lt;/p&gt;

&lt;p&gt;In summer 2023, I&#39;d played with a Replay time-travel-based perf analysis tool for React and Redux. There are tools out there to capture React perf, but they&#39;re limited. They mostly only work against dev React builds, the numbers are only useful in relative terms (&amp;quot;CompA is 3x as slow as CompB&amp;quot;), and they don&#39;t tell you anything about how Redux dispatches fit into the render. There&#39;s nothing out there that can break down a dispatch into separate times for reducer, subscriber callbacks, selectors, and ensuing React render time.&lt;/p&gt;
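&lt;p&gt;Without time travel, the simplest version of that breakdown - reducer time vs. subscriber-notification time for a single dispatch - can at least be sketched in ordinary code. This is an illustration of the idea, not the POC itself:&lt;/p&gt;

```javascript
// Minimal sketch: a hand-rolled Redux-like store whose dispatch reports
// how long the reducer and the subscriber notifications each took.
// (The real measurements described here came from instrumenting Redux and
// React internals via Replay's analysis layer, not from wrapper code.)
export function createTimedStore(reducer, initialState) {
  let state = initialState;
  const listeners = [];
  return {
    getState: () => state,
    subscribe: (fn) => { listeners.push(fn); },
    dispatch(action) {
      const t0 = performance.now();
      state = reducer(state, action);
      const t1 = performance.now();
      listeners.forEach((fn) => fn());
      const t2 = performance.now();
      return { reducerMs: t1 - t0, subscribersMs: t2 - t1 };
    },
  };
}
```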

&lt;p&gt;I had whipped up a hacky POC to extract those perf numbers, by digging into the guts of React and instrumenting key points in React itself, and similarly in Redux. It worked! But it was too slow. Took way too long to collect the data. Wasn&#39;t a priority, so we never shipped it or pushed it forward.&lt;/p&gt;

&lt;p&gt;We spent 2024-25 building out a sophisticated React analysis layer, so that we could create an async dependency graph: &amp;quot;DOM update caused by React render caused by setState caused by promise resolve caused by network request caused by useEffect&amp;quot;, etc. I&#39;d done much of that analysis work.&lt;/p&gt;

&lt;p&gt;I suddenly realized that the analysis layer we&#39;d built &lt;em&gt;was&lt;/em&gt; the system I&#39;d tried to build a few years earlier! Or rather, it was a vastly better implementation of the data extraction, and it covered &lt;em&gt;React&lt;/em&gt;. Now all I had to do was repeat the same instrumentation patterns for Redux, and I could recreate the same perf analysis insights!&lt;/p&gt;

&lt;p&gt;So I did! Had my agent read up on the React instrumentation approach, identify the key lines in Redux, look at my previous POC, and crank out the implementation. Bam. And within a couple days, I had the Redux + React perf insights coming together. Even better, I could extract those perf insights from &lt;em&gt;production React apps&lt;/em&gt;, something no other tool could do!&lt;/p&gt;

&lt;p&gt;I went kinda crazy with this newfound power :) I started generating a bunch of React and Redux insight reports, trying to see what I could pull together with the data we had available.&lt;/p&gt;

&lt;h3 id=&#34;pattern-matching&#34;&gt;Pattern Matching&lt;/h3&gt;

&lt;p&gt;I&#39;d spent most of Fall 2024 and Spring 2025 working on that analysis layer. It was painstaking work.&lt;/p&gt;

&lt;p&gt;A former Replay teammate had insisted there was no possible way we could actually instrument the guts of React. It would be impossible to maintain it. React changed too much over time. We&#39;d be eternally trying to keep up with the React team.&lt;/p&gt;

&lt;p&gt;He was right.&lt;/p&gt;

&lt;p&gt;I made it work anyway :)&lt;/p&gt;

&lt;p&gt;The original version had looked for specific source fragments in the one experimental React build from Feb 2024 that we happened to be using in our own DevTools codebase. To make it work with React 18, I&#39;d had to spend weeks carefully looking through different ReactDOM dev and prod artifacts to find the equivalent methods and copy the exact fragments from React 18 so our analysis layer could insert the right callbacks. Then I had to do the same thing for React 19, and that was a &lt;em&gt;massive&lt;/em&gt; headache due to the drastic internal changes in React&#39;s implementation between 18 and 19. Functions had been moved, renamed, migrated. The Closure Compiler had inlined some functions that we cared about.&lt;/p&gt;

&lt;p&gt;My first couple passes at this were intentionally ugly and unmaintainable. I didn&#39;t care about how the code was organized. I just needed it to &lt;em&gt;work&lt;/em&gt;. Once I verified the output, &lt;em&gt;then&lt;/em&gt; I could look for the right abstractions and common patterns, and try to organize that logic properly.&lt;/p&gt;

&lt;p&gt;Our instrumentation layer looks for 40+ unique locations in the React bundle: &lt;code&gt;createElement&lt;/code&gt;, &lt;code&gt;renderWithHooks&lt;/code&gt;, effect creation and execution, and many more. I settled on a factory function pattern. For every single location that we cared about, I created a factory function that accepted a &lt;code&gt;reactVersion&lt;/code&gt; object with details on the major and minor version, dev or prod or experimental, and other distinguishing factors. Then I wrote simple &lt;code&gt;if/else&lt;/code&gt; logic based on the versions to decide which fragments and variables and snippets we cared about for that version.&lt;/p&gt;
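&lt;p&gt;The real fragment factories live in Replay&#39;s codebase, but the shape of the pattern looks roughly like this (every specific name, version branch, and pattern string below is hypothetical):&lt;/p&gt;

```javascript
// Hypothetical sketch of the factory-function pattern: one factory per
// instrumentation point, with deliberately dumb if/else branching on the
// React build details to pick the fragment to search for. The $NAME and
// $$$ placeholders stand in for ast-grep-style structural metavariables.
export function renderWithHooksFragment(reactVersion) {
  if (reactVersion.major >= 19) {
    return {
      description: "renderWithHooks entry (React 19+)",
      pattern: "function renderWithHooks($$$ARGS) { $$$BODY }",
    };
  }
  if (reactVersion.mode === "prod") {
    // Prod builds are minified, so match on structure rather than the name.
    return {
      description: "renderWithHooks entry (React 18 prod)",
      pattern: "function $NAME($$$ARGS) { $$$BODY }",
    };
  }
  return {
    description: "renderWithHooks entry (React 18 dev)",
    pattern: "function renderWithHooks($$$ARGS) { $$$BODY }",
  };
}
```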

&lt;p&gt;It was ugly. It was verbose.&lt;/p&gt;

&lt;p&gt;It was &lt;em&gt;straightforward&lt;/em&gt;. It encapsulated the complexity.&lt;/p&gt;

&lt;p&gt;It &lt;em&gt;worked&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I later went through and hardened it. I replaced literal copy-pasted source code strings with &lt;code&gt;ast-grep&lt;/code&gt; pattern matching, to ignore whitespace and garbage compiler output variable names and focus on the &lt;em&gt;structure&lt;/em&gt; of the code.&lt;/p&gt;
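&lt;p&gt;For reference, an ast-grep rule file that matches on code structure while ignoring whitespace and minifier-renamed variables looks roughly like this (the pattern itself is illustrative, not one of the real fragments):&lt;/p&gt;

```yaml
# Illustrative ast-grep rule: $FN, $A, $B, and $$$BODY are metavariables,
# so this matches any two-argument function declaration regardless of what
# the compiler named things or how the code was formatted.
id: locate-fragment-example
language: javascript
rule:
  pattern: function $FN($A, $B) { $$$BODY }
```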

&lt;p&gt;That sucked up &lt;em&gt;months&lt;/em&gt; in 2024-25. We didn&#39;t even care about React 19 at the time - our app builder specifically used React 18. But React 19 had just come out. And if we ever &lt;em&gt;were&lt;/em&gt; going to make this a production reality in the future, I knew it would &lt;em&gt;have&lt;/em&gt; to handle React 19, and who knows when the React team might start shipping 19.x minors. So better to take the time to do that &lt;em&gt;now&lt;/em&gt;. I really only looked for React artifacts as they&#39;d appear in a simple Vite-built SPA, and intentionally &lt;em&gt;didn&#39;t&lt;/em&gt; handle Next.js and complexity like the App vs Pages Router or canary builds. But I felt confident I&#39;d put the right foundation in place.&lt;/p&gt;

&lt;p&gt;Fast-forward to Feb 2026. I&#39;m back in the analysis layer. Suddenly we&#39;re trying to analyze recordings that aren&#39;t just our own app builder, but real apps in the wild. Next.js 14/15/16, Webpack/Turbopack, some other weird experimental React 19+++ builds with feature flags turned on.&lt;/p&gt;

&lt;p&gt;And I &lt;em&gt;already had the right architecture and patterns and encapsulation in place&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It was &lt;em&gt;dumb&lt;/em&gt; code. &lt;em&gt;ugly&lt;/em&gt; code. &lt;em&gt;Predictable&lt;/em&gt; code.&lt;/p&gt;

&lt;p&gt;And you know what &lt;em&gt;looooooves&lt;/em&gt; dumb, predictable, &lt;em&gt;repetitive&lt;/em&gt; code?&lt;/p&gt;

&lt;p&gt;LLMs :)&lt;/p&gt;

&lt;p&gt;For the first time, I tried telling my agent about the React instrumentation layer and what we needed to add to expand it for these cases. It read the React fragment files and the rest of the instrumentation layer. It slurped up the various ReactDOM bundles I&#39;d collected. And so help me it started comparing 19.0 and 19.1 and 19.2 and Next 19.random.canary and whatever other weird versions I was looking at, and within a few minutes it could tell me where the relevant functions and fragments had migrated between versions and how they&#39;d changed and what we needed to update in the fragment factories to handle those.&lt;/p&gt;

&lt;p&gt;I think I actually literally screamed at one point?&lt;/p&gt;

&lt;p&gt;I mean, I spent &lt;em&gt;months&lt;/em&gt; on this! I was so deep in the weeds and trying to understand what each fragment meant and why it mattered and how our analysis layer turned these points into a graph and what the actual variables were and how to insert the callbacks correctly without breaking the instrumented bundle.&lt;/p&gt;

&lt;p&gt;And now I just say &amp;quot;here&#39;s the pattern and the current task, go figure out the next changes we need to add&amp;quot;, and it just... &lt;em&gt;does it&lt;/em&gt;? Correctly? &lt;em&gt;Faster than I could have done&lt;/em&gt;?!?!?!? ARE YOU &lt;em&gt;KIDDING ME&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;But here&#39;s the thing. An LLM could never have come up with this from scratch. Even if I were rebuilding this from nothing today, &lt;em&gt;I&lt;/em&gt; would have to tell it what I wanted it to accomplish. Even if it came up with the methods, &lt;em&gt;I&lt;/em&gt; would have to provide the intent.&lt;/p&gt;

&lt;p&gt;Looking back, I am &lt;em&gt;so proud&lt;/em&gt; of 2025-me. I was upset, and angsting, and confused. But I still took the time to understand, to investigate. I built the right foundation, the hard way. I built the mental model in my head, I got the fragments working, I decided on the factory function encapsulations and the &lt;code&gt;ast-grep&lt;/code&gt; usage and the file structures.&lt;/p&gt;

&lt;p&gt;Because &lt;em&gt;I&lt;/em&gt; did the hard work the first time, it paid off in 2026. When we needed to update that analysis layer for React 19.x, and handle Next.js variations and React canaries, the foundation was in place. I just had to build on top of it. A couple weeks of expansion, instead of months of building from scratch.&lt;/p&gt;

&lt;p&gt;And with the foundation and those patterns in place, &lt;em&gt;now&lt;/em&gt; I could just point my agent at it and say &amp;quot;here&#39;s the right patterns, here&#39;s the goal, here&#39;s what we&#39;re adding. Read, learn, compare, &lt;em&gt;do it&lt;/em&gt;&amp;quot;.&lt;/p&gt;

&lt;h3 id=&#34;shipping-for-agents&#34;&gt;Shipping for Agents&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&#34;https://docs.replay.io/basics/replay-devtools/overview&#34;&gt;Replay DevTools UI&lt;/a&gt; had started as a fork of the Firefox DevTools. We&#39;d added time travel abilities, and rewritten the whole codebase in the process. That let me as a human investigate Replay recordings and do time-travel debugging, with my own eyes and knowledge.&lt;/p&gt;

&lt;p&gt;Now, we had an MCP. That meant &lt;em&gt;agents&lt;/em&gt; were doing the hard work. And they don&#39;t get tired :) Confused, maybe. Still very easily misled. But they&#39;ll happily just keep going indefinitely.&lt;/p&gt;

&lt;p&gt;I had a few specific examples I was using as personal benchmarks. In one case, I&#39;d pushed a PR to update the Replay DevTools UI to React 19. When I pushed the branch, half our Playwright E2E tests failed. Usually means you broke something badly. We already had our own repo set up to do Replay recordings of all tests, so I just popped open one failing test recording. Scrolled down the list of console messages, and sure enough, last message was an error: &lt;code&gt;findDOMNode is not a function&lt;/code&gt;, from React Transition Group. Yup, sure, need to bump that too or tweak usage. 30s, done.&lt;/p&gt;

&lt;p&gt;But could an &lt;em&gt;agent&lt;/em&gt; find the answer that fast?&lt;/p&gt;

&lt;p&gt;I tried feeding the recording to an agent, and it bumbled around for 10-15min. Didn&#39;t help the &lt;code&gt;ConsoleMessages&lt;/code&gt; tool was broken and didn&#39;t report the error :) Eventually I &amp;quot;suggested&amp;quot; it use the &lt;code&gt;Screenshot&lt;/code&gt; tool. It saw the error overlay, dove into the root error boundary component, added a logpoint, exfiltrated the error, got the right result.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I&lt;/em&gt; found the answer in the recording in 30s. Could I give an &lt;em&gt;agent&lt;/em&gt; that ability?&lt;/p&gt;

&lt;p&gt;So I started rewriting and expanding all our Replay MCP tools. I built a &lt;code&gt;RecordingOverview&lt;/code&gt; tool that surfaced React version, Playwright test results and errors, console or React error boundary errors, React renders, Redux / Zustand / TanStack Query state results, network requests. I built a &lt;code&gt;ReactRenders&lt;/code&gt; tool that lets an agent drill down into render details. A &lt;code&gt;ReduxActions&lt;/code&gt; tool that provides drill down into Redux actions and state, and similar tools for Zustand and TanStack Query.&lt;/p&gt;

&lt;p&gt;By the time I was done, I did see my agent find that same bug in &amp;lt;1min.&lt;/p&gt;

&lt;p&gt;I also tried recreating other cases, like a React bug that Dan Abramov had reported, and my agent found that same bug in &amp;lt;10 min (faster or slower depending on prompt quality).&lt;/p&gt;

&lt;p&gt;Guess what. Turns out that surfacing deterministic info deterministically, and providing more details and the ability to drill down and dive deep, helps people &lt;em&gt;and&lt;/em&gt; agents. And now that I have all this time-travel info at my fingertips, I can build the tools that make solving impossible problems possible.&lt;/p&gt;

&lt;p&gt;So apparently now I&#39;m back to building UI. But for agents more than humans.&lt;/p&gt;

&lt;h2 id=&#34;part-8-so-what-do-i-think-today-anyway&#34;&gt;Part 8: So What Do I Think Today, Anyway?&lt;/h2&gt;

&lt;p&gt;So that&#39;s my story as of today.&lt;/p&gt;

&lt;p&gt;If you&#39;ve gotten this far, congratulations. Thanks for sticking with me. It&#39;s a lot of words (even for me!), and a lot of personal details. I know. And there&#39;s more to go.&lt;/p&gt;

&lt;p&gt;So what do &lt;em&gt;I&lt;/em&gt; think about all this? About agents, and using AI to write code? All the fears and concerns that I wrote about so vividly earlier?&lt;/p&gt;

&lt;p&gt;Well, I do have opinions and thoughts. &lt;em&gt;You&lt;/em&gt; didn&#39;t ask, probably. But some people have. And it&#39;s my blog, and I&#39;m writing this one for me. And as always I hope &lt;em&gt;some&lt;/em&gt; people get value out of this. So slap an MIT license on these thoughts - up for grabs, do what you want with this, not implied for any particular use, not my fault if this doesn&#39;t work for you.&lt;/p&gt;

&lt;h3 id=&#34;most-of-my-fears-are-still-valid-and-so-are-you&#34;&gt;Most Of My Fears Are Still Valid, And So Are You&lt;/h3&gt;

&lt;p&gt;I wrote a whole laundry list of fears up top.&lt;/p&gt;

&lt;p&gt;Some of them turned out to be non-factors.&lt;/p&gt;

&lt;p&gt;I thought I would essentially erase my existence if I ever stooped to asking an AI to write code for me. Well, I did it. I&#39;m still here. I&#39;m still me. So I was wrong on that one.&lt;/p&gt;

&lt;p&gt;I thought I&#39;d utterly hate being a &amp;quot;pseudo-PM/code reviewer&amp;quot; all the time. Uh. Well. Guess what. I&#39;m kinda mostly a pseudo-PM/code reviewer. And it... doesn&#39;t suck? We&#39;ll get back to that in a minute.&lt;/p&gt;

&lt;p&gt;Clearly the generated code quality is &lt;em&gt;better&lt;/em&gt; than it used to be. Not necessarily great! It&#39;s not elegant. AI will happily keep slapping on more layers, not extracting abstractions. But we&#39;re long past the hallucination stage.&lt;/p&gt;

&lt;p&gt;But most of the rest of my fears and concerns are still there. Especially the larger societal ones.&lt;/p&gt;

&lt;p&gt;I know, I know. Every generation worries about &amp;quot;The Kids These Days&amp;quot;. There&#39;s quotes from ancient Rome about how the kids are lazy and don&#39;t do anything. We&#39;ve survived the last few thousand years. Society&#39;s still here. Human condition.&lt;/p&gt;

&lt;p&gt;I do seriously worry about the large-scale impacts of AI on our collective thinking processes. It was hard enough to teach critical thinking skills in school when we had to do the work ourselves. We relied on artifacts like essays and projects as both forcing functions to make people learn the material, and social proof that they &lt;em&gt;had&lt;/em&gt; learned the material and done the work. Now that anyone can copy-paste an assignment into their LLM of choice and paste the results back into a grading system in about 30s, there&#39;s no need to do the learning or thinking, and the artifact is no longer proof of the learning and the work. What happens to our world in the next 20-30 years, when we&#39;ve WALL-E-style obliterated our ability to think or learn or do?&lt;/p&gt;

&lt;p&gt;We rely on photographs and voice recordings and trust that these things are proof that something actually happened. We&#39;ve already seen those can be mimicked and forged instantly. How can we trust any news source or report or proof, and verify that things actually happened as described?&lt;/p&gt;

&lt;p&gt;I&#39;m towards the tail end of my career. I already felt out of touch with what today&#39;s junior devs must be going through. My own career arc has been unique - no one could replicate the specific opportunities I had at the times I had them. I have no idea what it&#39;s like to have to grind l33tcode or algos/DSAs, or go through a bootcamp and try to stand out amongst a sea of otherwise identical FE devs with no experience. What happens to people trying to get into the industry? What does their pathway look like? How do they build a career, gain experience?&lt;/p&gt;

&lt;p&gt;I have no answers. Not that anyone would care if I did anyway :) I worry for the future in a lot of ways. But it&#39;s not mine to build.&lt;/p&gt;

&lt;p&gt;I&#39;ve talked to plenty of people who still refuse to use LLMs in any way shape or form. Who fear for the future of the industry, mourn the loss of their craft.&lt;/p&gt;

&lt;p&gt;I understand. I agree. I was there. I&#39;m &lt;em&gt;still&lt;/em&gt; there. Please don&#39;t take my own change of heart and current excitement as any kind of statement that your own feelings are wrong. You are entirely justified to feel that way, because Feelings Are Valid and Real. I hope you can find mental peace and safety in the midst of this insanity.&lt;/p&gt;

&lt;h3 id=&#34;the-tiger-is-out&#34;&gt;The Tiger Is Out&lt;/h3&gt;

&lt;p&gt;There&#39;s a beautiful little poem apparently written by a 6yo kid a few years ago:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The tiger&lt;br /&gt;
He destroyed his cage&lt;br /&gt;
Yes&lt;br /&gt;
YES&lt;br /&gt;
The tiger is out&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Well, guess what. The LLM is out. And it&#39;s not going away.&lt;/p&gt;

&lt;p&gt;I&#39;ve read a lot of the &amp;quot;THEY CAN&#39;T &lt;em&gt;POSSIBLY&lt;/em&gt; KEEP GETTING AWAY WITH THIS! IT&#39;S A HOUSE OF CARDS! FINANCIAL RACKETEERING! CIRCULAR MONEY LAUNDERING! UNSUSTAINABLE PACE OF GROWTH! FAKE DEALS! NONE OF THESE DATA CENTERS ACTUALLY EXIST!&amp;quot; articles. I think there&#39;s a lot of truth to all that! It&#39;s entirely possible that a few years from now the entire AI ecosystem could collapse because of financial shenanigans, and that it could take down the US and world economies with them.&lt;/p&gt;

&lt;p&gt;But LLMs as a technology exist. The genie is out of the bottle. Pandora&#39;s box is open. Pick your metaphor here.&lt;/p&gt;

&lt;p&gt;And even if we stopped development on LLM model tech &lt;em&gt;right this second&lt;/em&gt;, and we never had a single improvement in quality or memory usage or token speed for the rest of time... what we have &lt;em&gt;now&lt;/em&gt; is already enough to upend many aspects of society as we&#39;ve known it.&lt;/p&gt;

&lt;p&gt;There will be positives from this! I&#39;m not sitting here saying anything involving LLMs is inherently bad. Remember, I&#39;m actually quite happy having LLMs crank out code for me at this point. I&#39;ve talked to friends who are seeing potential medical benefits, and there&#39;s a lot more.&lt;/p&gt;

&lt;p&gt;But I do mean that we have unleashed a technology that will continue to change our society in ways we can and can&#39;t predict going forward. Just the predictions I &lt;em&gt;can&lt;/em&gt; see are concerning, and I&#39;ve got a pretty limited imagination.&lt;/p&gt;

&lt;p&gt;I don&#39;t know what the reality is on any of the ecological aspects (power and water usage). No idea on the big ethical / moral &amp;quot;WHERE DID THEY GET ALL THE TRAINING DATA? CAN ANY OUTPUT FROM AN LLM BE VALID WHEN IT&#39;S ALL JUST COPYRIGHTED INPUT?&amp;quot; questions. And to be frank at this point it&#39;s kind of irrelevant.&lt;/p&gt;

&lt;p&gt;Is this the future I &lt;em&gt;want&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;No, not particularly.&lt;/p&gt;

&lt;p&gt;Do I think everyone should jump wholeheartedly into AI and use it everywhere for everything? Of course not, and I hope you can see that distinction in my explanations here.&lt;/p&gt;

&lt;p&gt;What does this mean &lt;em&gt;outside&lt;/em&gt; the software engineering industry? I have &lt;em&gt;no idea&lt;/em&gt;. I see lots of discussion on AI eating writing and art and music. That seems wrong, in a different way. I dislike AI writing tone, and I detect written slop a mile away, and yet I&#39;ve also used AI to generate summaries of text for myself and for work because there&#39;s still &lt;em&gt;some&lt;/em&gt; value in the content as ugly as it is.&lt;/p&gt;

&lt;p&gt;I suspect LLMs do better at code than in other domains, but I have no experience with, or knowledge of, what LLM use looks like in law or physics or finance. Wouldn&#39;t surprise me if there&#39;s a big difference in results, but man, I can&#39;t even keep up with &lt;em&gt;our&lt;/em&gt; industry or even just the React ecosystem, don&#39;t ask me to keep up with other industries too :)&lt;/p&gt;

&lt;p&gt;As you&#39;ve seen, I &lt;em&gt;am&lt;/em&gt; excited about what I personally can do with AI. I also see every day that it&#39;s causing problems directly and indirectly. So, no, I&#39;m not blind to the damage that AI is doing, and those are the aspects that concern me. But given that it exists, now the question becomes how I&#39;m going to react and use this myself, same as with every other technology that has come out in my lifetime.&lt;/p&gt;

&lt;p&gt;I&#39;m a firm believer in dealing with reality the way it is. Not the way I &lt;em&gt;want&lt;/em&gt; it to be. Sure, I&#39;ll gripe and complain and push if something isn&#39;t the way I want. I don&#39;t blindly accept it or act like things &lt;em&gt;can&#39;t&lt;/em&gt; change.&lt;/p&gt;

&lt;p&gt;But you&#39;ve got to deal with the world as it actually exists. Good, bad, and ugly. And in this case, LLM tech exists, and it&#39;s already this powerful, and this easy to use. It&#39;s now part of our reality. Acknowledge that fact, and plan accordingly.&lt;/p&gt;

&lt;h3 id=&#34;gotta-go-fast&#34;&gt;Gotta Go Fast&lt;/h3&gt;

&lt;p&gt;For software engineering especially: we built our processes around &lt;em&gt;people&lt;/em&gt;. The Agile Manifesto, Scrum, standups, PR reviews, issue trackers, Git commit messages. It took time to plan. Writing code was expensive and time-consuming. We had to make sure we were writing the correct code in the first place. Can&#39;t waste time on irrelevant experiments.&lt;/p&gt;

&lt;p&gt;AI broke that constraint.&lt;/p&gt;

&lt;p&gt;With AI, we can crank out infinite amounts of code, essentially for free, and essentially immediately.&lt;/p&gt;

&lt;p&gt;None of our processes were designed for that.&lt;/p&gt;

&lt;p&gt;I spent a lot of time at AI Engineer and React Miami in April. I had dozens of conversations. And a lot of them were different forms of the same question: &amp;quot;how do we scale our review / test / QA / deploy / planning processes to keep up with the amount of code we&#39;re generating?&amp;quot;.&lt;/p&gt;

&lt;p&gt;I have a &lt;em&gt;sort of&lt;/em&gt; answer to that. And a lot of you aren&#39;t going to like it :)&lt;/p&gt;

&lt;p&gt;On the one hand, having an infinite codegen at your fingertips &lt;em&gt;does&lt;/em&gt; change our processes for how we do things. Why make a napkin-styled wireframe of a UI to sketch out the layout, when you can just prompt it into existence in 30s? Why waste a couple weeks on building one prototype, when you can crank out 10 prototypes simultaneously and compare the results? Why spend time going back and forth with handoffs between designers in Figma and ensuring a dev has the right design tokens in place, when you can just have them both landing updates at the same time? This is legitimately possible, and does actually change so many of our assumptions for building apps.&lt;/p&gt;

&lt;p&gt;But as my mother told us all the time growing up: &amp;quot;humans will take everything to the extreme&amp;quot;.&lt;/p&gt;

&lt;p&gt;Many of us have already experienced or seen AI psychosis. When you start writing code with &lt;em&gt;one&lt;/em&gt; agent, it can edit files faster than you can. We&#39;re programmers - we already know that computers can execute our desires faster than we can do them manually. Write one line of code. Put it in a loop, it runs many times, automatically. Throw in a thread pool, now they&#39;re in parallel. Distribute the system. Scale. GPUs. Cores. Optimize.&lt;/p&gt;

&lt;p&gt;Okay, what happens if I run &lt;em&gt;two&lt;/em&gt; agent sessions at once? Can I ping-pong between tabs? It takes a couple minutes for Agent 1 to do its work, surely I can switch and tell Agent 2 to do some work in the meantime. It&#39;s the xkcd &amp;quot;COMPILING!&amp;quot; comic, except now instead of sword fights in the hallway, we&#39;re just task switching. Context switching. Having to keep multiple tasks in our brain at once.&lt;/p&gt;

&lt;p&gt;Over the years I&#39;ve found that social media already ruined my ability to focus. I used to be able to intently focus on a single task for hours. Now I kick off a command that takes 30s, and my brain wants to go check Twitter and Bluesky and Reddit and HN just in case there were any new posts in the last 5 minutes. It&#39;s &lt;em&gt;bad&lt;/em&gt;. I hate it.&lt;/p&gt;

&lt;p&gt;Well, guess what. Running 2-3 agent sessions at once is just as bad. And that&#39;s just the starting point.&lt;/p&gt;

&lt;p&gt;Okay, if we can run a couple sessions at once, but we&#39;re &amp;quot;manually&amp;quot; driving our agents... what happens if we remove the brakes? Why should I have to tell the agent what to do, or approve the permissions checks? SCALE THE SYSTEM BABY! YOLO MODE FOR LIFE! Why not run 5 agents? 10 agents? 100? 1000? Build a custom multi-pane dashboard just to track all my agents. Don&#39;t need to know what they&#39;re doing or how, just that they&#39;re all busy at the same time and not blocking the merge queue. Markdown&#39;s outdated, too many markdown files. Throw some gas rigs and mayors and polecats in there. We&#39;re tokenmaxxing, there&#39;s a leaderboard, burn more tokens or you&#39;re fired. Build a dark factory. No human ever looks at the code. It&#39;s Steve Ballmer&#39;s &amp;quot;DEVELOPERS! DEVELOPERS! DEVELOPERS!&amp;quot;, except now it&#39;s &amp;quot;AGENTS! AGENTS! AGENTS!&amp;quot;&lt;/p&gt;

&lt;p&gt;The goal is unlimited speed.&lt;/p&gt;

&lt;p&gt;And what happens when you go too fast?&lt;/p&gt;

&lt;p&gt;You crash.&lt;/p&gt;

&lt;h3 id=&#34;oops-it-s-capitalism&#34;&gt;Oops It&#39;s Capitalism&lt;/h3&gt;

&lt;p&gt;This is a pretty ironic section for me to write. I come from a conservative background. My own views and opinions have evolved significantly over time, but I&#39;m hardly what you&#39;d call an activist.&lt;/p&gt;

&lt;p&gt;So here&#39;s my late-night, actual, no-kidding opinion:&lt;/p&gt;

&lt;p&gt;We, as an industry, need to collectively accept that &lt;strong&gt;humans and processes and even agents &lt;em&gt;have limits&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And sooner or later we&#39;re &lt;em&gt;all&lt;/em&gt; going to have to take a step back and figure out what our safe, reasonable, and &lt;em&gt;maintainable&lt;/em&gt; limits are. Because we&#39;re all about to learn those lessons the hard way.&lt;/p&gt;

&lt;p&gt;They&#39;ll vary by company and team. Some teams will set up better automated processes. Maybe we&#39;ll democratize it, whatever the QA / CI version of k8s is so that we can all benefit from auto-scaling systems. But &lt;em&gt;there will fundamentally be limits to how fast we can build and ship code&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Maybe it&#39;s 3x current / historical speed. Maybe it&#39;s 10x. I dunno.&lt;/p&gt;

&lt;p&gt;But right now it seems like &lt;strong&gt;the entire industry is assuming there &lt;em&gt;isn&#39;t&lt;/em&gt; a limit. And that if it &lt;em&gt;is&lt;/em&gt; possible to go faster, then we &lt;em&gt;must&lt;/em&gt; go faster.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Everyone&#39;s scared of how fast everyone else might go. Everyone&#39;s jumping on AI to get ahead of the curve, to try to beat their competition to the punch. Can&#39;t just naturally let adoption happen, have to mandate it top-down. Slap those sparkle icons in every single toolbar and panel, shove those &amp;quot;NOW WITH AI&amp;quot; features down everyone&#39;s throats whether they like it or not.&lt;/p&gt;

&lt;p&gt;Wait, why are we going faster? To ship features? To beat our competitors to market? To deliver shareholder value?&lt;/p&gt;

&lt;p&gt;Oops. It&#39;s Capitalism.&lt;/p&gt;

&lt;p&gt;Don&#39;t get me wrong. I&#39;m the last person who should be delivering a critique of our economic system here :) But as someone who&#39;s read too many online rants... well, that seems to be the final answer for all of this. We Go Faster so that we can Deliver Value in service of the System.&lt;/p&gt;

&lt;p&gt;This is not sustainable.&lt;/p&gt;

&lt;h3 id=&#34;maintainability-is-the-mindset&#34;&gt;Maintainability is the Mindset&lt;/h3&gt;

&lt;p&gt;I personally learned years ago that even in a great job, with a comfortable pace of work, and managers who actually care about you as a person, and tasks that I personally find fascinating and interesting and challenging... &lt;strong&gt;you still have to set boundaries&lt;/strong&gt;. There&#39;s always more work to do. Always another fire to fight, another bug to fix, another feature to get out the door. No one else is going to tell you &amp;quot;you&#39;ve done too much work, throttle back&amp;quot;. You have to enforce that yourself.&lt;/p&gt;

&lt;p&gt;It can wait.&lt;/p&gt;

&lt;p&gt;I love work. I love being productive. I view the world in terms of checkboxes and getting things done, which I&#39;m very well aware is not the healthiest mindset and trust me my therapist knows all about this trait :)&lt;/p&gt;

&lt;p&gt;But &amp;quot;go faster, at all costs&amp;quot;, should not be the goal.&lt;/p&gt;

&lt;p&gt;Michelle Bakels has given an amazing talk on developer health, and one of the key lines was how &lt;strong&gt;&amp;quot;rest isn&#39;t &lt;em&gt;earned&lt;/em&gt;. It&#39;s a critical part of performing at our best.&amp;quot;&lt;/strong&gt; Everyone should watch that and take that mindset to heart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The goal should be a healthy, consistent, &lt;em&gt;sustainable&lt;/em&gt;, &lt;em&gt;maintainable&lt;/em&gt; pace of development and shipping&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I&#39;m not the world&#39;s best poster child for this. I push myself hard, always feeling like I &lt;em&gt;could&lt;/em&gt; and &lt;em&gt;ought&lt;/em&gt; to be doing more than I am. I&#39;m trying :) Work in progress. Do as I say, not always as I do myself.&lt;/p&gt;

&lt;p&gt;Maybe that&#39;s 10x for you, and 3x for me. Whatever. That&#39;s fine.&lt;/p&gt;

&lt;p&gt;But the sooner we all intentionally acknowledge that and adjust our plans and expectations accordingly, the better.&lt;/p&gt;

&lt;h3 id=&#34;but-non-determinism&#34;&gt;But Non-Determinism?&lt;/h3&gt;

&lt;p&gt;I wrote so much at the start of this post about how bad non-determinism is. I &lt;em&gt;want&lt;/em&gt; predictability. I &lt;em&gt;want&lt;/em&gt; to re-run my code and know that it&#39;s going to execute the same way every time given the same inputs. It&#39;s literally the core of how Replay time-travel works.&lt;/p&gt;

&lt;p&gt;I&#39;ve seen folks online hammer on this point. LLMs are inherently non-deterministic. How can you possibly rely on that? Forget even hallucinations, how can you build apps when you can&#39;t guarantee what the output is?&lt;/p&gt;

&lt;p&gt;I&#39;ve also read comparisons to how this is all just virtual slot machines and gambling. In the same way that social media gives us dopamine hits - &amp;quot;ooh look, 3 new posts since I last refreshed, maybe one of these will be interesting!&amp;quot; Gimme something new, something different, &lt;em&gt;maybe&lt;/em&gt; it&#39;s useful. Same thing with LLMs and code. Give it instructions, pull the lever, watch the &lt;del&gt;dials spin&lt;/del&gt; messages and diffs, and then inspect the results and &lt;em&gt;hope&lt;/em&gt; it does what you want.&lt;/p&gt;

&lt;p&gt;There&#39;s a lot of truth to it. I feel it all the time.&lt;/p&gt;

&lt;p&gt;But we&#39;ve always built guardrails and tools to keep the work on track. We use code formatters to make the code style consistent and eliminate the arguments over tabs vs spaces. We write unit and integration and E2E tests to verify the whole system works as intended. We typecheck and lint and squiggle-ify in our editors and in CI.&lt;/p&gt;

&lt;p&gt;Non-determinism &lt;em&gt;can&lt;/em&gt; be made &lt;em&gt;sufficiently&lt;/em&gt; deterministic.&lt;/p&gt;

&lt;p&gt;It takes work. It&#39;s a different &lt;em&gt;kind&lt;/em&gt; of work. The results aren&#39;t always what you want. But with enough boundaries in place, the non-deterministic output &lt;em&gt;should&lt;/em&gt; be close enough to fulfill the intended goal, and can be tweaked if needed. Provide the right context, the right prompts, the right skill files. Don&#39;t cut an agent loose autonomously, review the plan and make sure it&#39;s actually what &lt;em&gt;you&lt;/em&gt; want done. Agents love to type-check. Give them the lint errors, enforce the commit rules, throw all the static analysis in there. Add the AI-powered code review, ralph loop it if you have to until the agent and the judge are both happy.&lt;/p&gt;
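&lt;p&gt;As a concrete illustration of that &amp;quot;loop until the agent and the judge are both happy&amp;quot; idea, here&#39;s a minimal sketch of a check-until-green helper. This is purely my own illustration, not part of any real tool, and the &lt;code&gt;pnpm typecheck&lt;/code&gt; command in the comment is a hypothetical placeholder:&lt;/p&gt;

```shell
# Minimal sketch of a deterministic guardrail loop: re-run a check command
# until it exits cleanly, or give up after a fixed number of attempts.
# Usage: run_until_green CHECK_COMMAND MAX_ATTEMPTS
run_until_green() {
  cmd=$1
  max=$2
  attempt=1
  while [ $attempt -le $max ]; do
    if $cmd; then
      echo green after $attempt attempts
      return 0
    fi
    echo attempt $attempt failed - feed the errors back to the agent and retry
    attempt=$((attempt + 1))
  done
  echo still red after $max attempts - time for a human to look
  return 1
}

# Hypothetical usage: run_until_green 'pnpm typecheck' 5
```

&lt;p&gt;In a real setup you&#39;d capture the failing output and hand it back to the agent as context for the next attempt. The point is that the loop wrapped around the non-deterministic agent is itself boring and deterministic.&lt;/p&gt;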

&lt;p&gt;I still believe in making as much of the system as deterministic as possible. Redux reducers. 100% deterministic codegen. Turn patterns and routines into scripts, encode the knowledge. Minimize how much work the LLM actually has to do. Make it &lt;em&gt;code&lt;/em&gt; anywhere you can. And then you can use an LLM to help with the automation.&lt;/p&gt;

&lt;p&gt;There&#39;s a lot of variation in how people use AI. Agents, harnesses, workflows, models, prompts, plugins, skills. I see some complaints and think &amp;quot;well &lt;em&gt;I&#39;ve&lt;/em&gt; never had that problem myself&amp;quot;, and I don&#39;t know if it is the codebase and domain I&#39;m working in, or the models I&#39;m using, or the prompts and context engineering, or just the whims of the RNG. No guarantees, no promises, this is just me saying what I&#39;ve found works for me.&lt;/p&gt;

&lt;h3 id=&#34;is-this-actually-better&#34;&gt;Is This Actually &lt;em&gt;Better&lt;/em&gt;?&lt;/h3&gt;

&lt;p&gt;.... maybe? Sort of? Partly?&lt;/p&gt;

&lt;p&gt;I&#39;ve seen people throw around stats for how actually writing code is only N% of the work we do, and so even if AI writes code 3-5x faster, it&#39;s only a partial speedup. Lot of truth to that.&lt;/p&gt;

&lt;p&gt;We&#39;re still pretty early in the adoption/hype curve. At some point you have to actually &lt;em&gt;maintain&lt;/em&gt; codebases that were built with The New Shiny Toy, and then we all realize the horrible mistakes we made during the early hype period. Redux is a classic example of that :) During 2015-2016, people put Redux in &lt;em&gt;sooo many places it should never have been used&lt;/em&gt; (hello controlled forms!), and then by 2017 there was a collective &amp;quot;Oops What Have We Done To Ourselves&amp;quot;.&lt;/p&gt;

&lt;p&gt;My own perspective is limited here. I&#39;ve at least used AI to work on a decent variety of codebases at this point: Replay&#39;s backend and frontend, Redux libraries, some one-off side projects, a couple larger side projects. I&#39;ve found value in all of them. But it&#39;s also only been ~6 months for me, maybe a year for most of us. We&#39;re just seeing the start of the ripple effects.&lt;/p&gt;

&lt;p&gt;I&#39;ve used agents to do a bunch of tasks much faster than I &lt;em&gt;know&lt;/em&gt; I would have done them myself. I&#39;ve seen the agent happily implement individual sub-features I&#39;d had it plan out in a side project, and it wrote each of the files individually matching the listed spec, and &lt;em&gt;none of them got hooked up together at all&lt;/em&gt;, and I didn&#39;t even catch that until it had &amp;quot;finished&amp;quot; all the issues for that feature. I&#39;ve been able to think higher-level, and I&#39;ve had to burn time trying to keep the agent on track. I&#39;ve seen the agent expand my brief thoughts into plans that cover edge cases I never would have considered, and I&#39;ve seen it happily proclaim &amp;quot;clean, no issues&amp;quot; after cranking out a bunch of code and yet miss the blindingly obvious problems.&lt;/p&gt;

&lt;p&gt;But it does seem that &lt;code&gt;while (true) { goodEnough() }&lt;/code&gt; has a value of its own.&lt;/p&gt;

&lt;p&gt;I&#39;ve seen plenty of debates about how AI is going to kill OSS. Maintainers are getting flooded with spammy contributions that look plausible enough. I fully believe that flood is happening. We haven&#39;t had to deal with it as much in the Redux repos, but I&#39;ve certainly seen plenty of examples across the ecosystem. I&#39;ve also seen maintainers say they&#39;re finally working through the issue backlogs and making headway.&lt;/p&gt;

&lt;p&gt;I can tell you it sure feels a lot better &lt;em&gt;mentally&lt;/em&gt; to have found a way to use AI for my work and be happy with the results, than it did when I was spending all my time angsting and fearing for both the industry and my own career. I&#39;m not saying my own mental peace is the only factor to consider here :) but honestly it&#39;s a pretty &lt;em&gt;big&lt;/em&gt; factor personally.&lt;/p&gt;

&lt;p&gt;So I don&#39;t know. I&#39;m using it because I myself have seen evidence it can help me finish my tasks faster and augment my ability to understand code (even if that understanding now takes different forms of effort). I &lt;em&gt;think&lt;/em&gt; that&#39;s a good thing. I see plenty of other people saying similar things. But I fully acknowledge there&#39;s tons of competing pros and cons here, for myself and the industry.&lt;/p&gt;

&lt;h3 id=&#34;what-about-the-craft&#34;&gt;What About the Craft?&lt;/h3&gt;

&lt;p&gt;I talked a lot about the craft of programming earlier. Also kind of ironic. I believed in it, and yet I never felt like my own work ever quite measured up to that ideal. Especially in recent years, I found myself gravitating more towards prototyping, especially with time travel debugging, where my focus was just trying to figure out and prove &amp;quot;is this even &lt;em&gt;possible&lt;/em&gt;?&amp;quot; rather than fully productionizing features - and goodness knows &lt;em&gt;that&lt;/em&gt; didn&#39;t involve a lot of &amp;quot;craft&amp;quot; :) (you should see how often &lt;code&gt;// HACK&lt;/code&gt; showed up in parts of my code.)&lt;/p&gt;

&lt;p&gt;So what now? Is there any semblance of a &amp;quot;craft of programming&amp;quot; now that we&#39;re all relying on agents? Soulless Statistical Word Generating Machines, just generating tokens endlessly?&lt;/p&gt;

&lt;p&gt;In some ways, no. That world is gone and it probably isn&#39;t coming back.&lt;/p&gt;

&lt;p&gt;Pick your historical precedent. Cars eliminated horses, etc.&lt;/p&gt;

&lt;p&gt;I talked to a genius-level senior engineer recently. He&#39;s built a reputation as someone who has incredibly deep knowledge and has pulled off some amazing feats. He&#39;s considering leaving the industry soon. He compared it to woodworking. Someone can hand-make a wood table, careful crafting, lots of attention. In some cases, they can command a higher price for a hand-crafted item.&lt;/p&gt;

&lt;p&gt;No one&#39;s going to pay extra just so a human can hand-craft a &lt;code&gt;for&lt;/code&gt; loop.&lt;/p&gt;

&lt;p&gt;This saddens me. I wish it weren&#39;t the case. Please see previous section on Capitalism.&lt;/p&gt;

&lt;p&gt;And yet.&lt;/p&gt;

&lt;p&gt;I personally have found that I am able to use AI tools successfully, &lt;em&gt;because&lt;/em&gt; of my deep experience and knowledge.&lt;/p&gt;

&lt;p&gt;AI has been able to &lt;em&gt;amplify&lt;/em&gt; my intent. It&#39;s a force multiplier. I use it to research portions of a codebase, write up a doc, draw diagrams, help me understand how something works. I feed it several paragraphs of an idea for a feature or a task, it reads a few dozen files and pulls in examples, and it spits out a much more detailed plan that covers a bunch of edge cases I never even would have considered. I modify the plan, I make some adjustments based on my own knowledge, I tell it &amp;quot;implement&amp;quot;, and it cranks out the changes I want far faster than I could have made the edits. I&#39;ve seen drastically different results depending on the quality of the context and the prompt. There is &lt;em&gt;skill&lt;/em&gt; in providing the right context, and yes, &lt;code&gt;SKILL.md&lt;/code&gt; files. It is literally a skill issue :)&lt;/p&gt;

&lt;p&gt;It&#39;s certainly not perfect. Even with today&#39;s models there are mistakes. But they&#39;re more often mistakes of &lt;em&gt;intent&lt;/em&gt; than of syntax or hallucinations.&lt;/p&gt;

&lt;p&gt;And &lt;em&gt;I&#39;m&lt;/em&gt; the one in charge. &lt;em&gt;I&#39;m&lt;/em&gt; driving. I provide the vision, the intent, the directions. And I heavily leverage my expertise to keep the AI on the right track, redirecting when it goes off the rails, asking &amp;quot;wait have we considered this approach?&amp;quot;, insisting that we do things the way &lt;em&gt;I&lt;/em&gt; want.&lt;/p&gt;

&lt;p&gt;I&#39;ve read a bunch of articles lately saying that AI has exposed a split in programmers. There&#39;s people who were in it &lt;em&gt;for the craft itself&lt;/em&gt;, and then there&#39;s people who were in it for the &lt;em&gt;results&lt;/em&gt;, the output, the end product.&lt;/p&gt;

&lt;p&gt;I thought I was in it for both. I got so much joy from saying &amp;quot;this &lt;em&gt;didn&#39;t&lt;/em&gt; exist, and then I thought and I worked, and now it &lt;em&gt;does&lt;/em&gt; exist&amp;quot;. Creation ex nihilo. But also nothing made me happier than those 6-hour uninterrupted deep flow music thumping mind on autopilot deep in the code sessions.&lt;/p&gt;

&lt;p&gt;I still care about the craft. I want to build apps the right way.&lt;/p&gt;

&lt;p&gt;But maybe it was a bit &lt;em&gt;more&lt;/em&gt; about the final result than I thought.&lt;/p&gt;

&lt;p&gt;I really hadn&#39;t worked on any side projects in the 10 years since I started maintaining Redux. My todo list is perpetually overflowing, there are always more self-imposed priorities and responsibilities than I can ever get around to. I didn&#39;t have the time or brain space or priority to work on code &amp;quot;for fun&amp;quot;. Day job, second unpaid job as maintainer, trying to have a life, can&#39;t fit anything else in there.&lt;/p&gt;

&lt;p&gt;In the last few months I&#39;ve actually whipped up a few side projects! There were ideas that popped into my head. And normally I&#39;d have dismissed them and said &amp;quot;I don&#39;t even have brain space to think about these, much less time to write the code&amp;quot;. But now... if I could spare 30+ minutes to spin it up, I could get an agent session fleshing out a project plan, and then maybe babysit a session while I&#39;m on the couch in the evening. Still &lt;em&gt;some&lt;/em&gt; time and mental overhead, but a far lower barrier to entry than before. So yeah, I actually have built several small projects I never would have even &lt;em&gt;tried&lt;/em&gt; to do otherwise. And it&#39;s been fun and exciting to see those random ideas come to life.&lt;/p&gt;

&lt;p&gt;And to my surprise, I&#39;ve actually re-found the joy and fairly deep flow in those long coding sessions where I&#39;m &lt;em&gt;driving agents&lt;/em&gt;! Yeah, I&#39;m shocked too. Never would have predicted this. Maybe it&#39;s not always &lt;em&gt;quite&lt;/em&gt; as deep. Certainly the actual work is different. But with my own hands-on human-in-the-loop agent dev workflow, my brain is &lt;em&gt;engaged&lt;/em&gt;. I&#39;m thinking through implementation approaches, having the agent flesh out my ideas, reviewing the output, deciding what to change and what&#39;s next and when this subtask session is done and it&#39;s time to move on. And I truly have had numerous coding sessions where I found myself deep in that same flow, and woke up at the end thrilled and excited with what I had just built, via an agent. My brain isn&#39;t off. I&#39;m not just mindlessly stamping &amp;quot;LGTM&amp;quot;. I&#39;m &lt;em&gt;thinking&lt;/em&gt;. I&#39;m &lt;em&gt;creating&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So maybe I&#39;m not writing the actual lines of code now. Not crafting the perfect elegant algorithm.&lt;/p&gt;

&lt;p&gt;But I &lt;em&gt;am&lt;/em&gt; still &amp;quot;crafting&amp;quot; the software, in a way. I&#39;m applying my taste, my judgment, my expertise. Not just in &amp;quot;here&#39;s the next ticket in my queue&amp;quot;, but very intentionally deciding what and &lt;em&gt;how&lt;/em&gt; to build the solutions, and what the right results are. I&#39;m supplying the context, figuring out the right prompts, building the guardrails, and providing the vision.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I&lt;/em&gt; can do that, because I have an entire career&#39;s worth of experience and knowledge. I don&#39;t know how juniors today are supposed to build that expertise.&lt;/p&gt;

&lt;p&gt;But hey. That was always the job.&lt;/p&gt;

&lt;h2 id=&#34;final-thoughts&#34;&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;Well. That was long. (Audience: &amp;quot;YES MARK WE &lt;em&gt;KNOW&lt;/em&gt;!&amp;quot;)&lt;/p&gt;

&lt;p&gt;Not sure what&#39;s worse - you sitting here through this whole thing, or me having written ~12K words in one single Saturday :)&lt;/p&gt;

&lt;p&gt;Like I said, I have fears. I don&#39;t have answers to a lot of big questions. It&#39;s always been a big scary world, now it seems like things really are accelerating out of control.&lt;/p&gt;

&lt;p&gt;But I never could control the rest of the world. I can only control me, and my thoughts, and my actions.&lt;/p&gt;

&lt;p&gt;I&#39;m sure some of you disagree with some of the opinions I&#39;ve stated here. That&#39;s fine. I&#39;m not trying to convince anyone. (Well okay maybe the &amp;quot;3x limits&amp;quot; and &amp;quot;rest is core, not earned&amp;quot; things.) Go do whatever works for you.&lt;/p&gt;

&lt;p&gt;A friend pointed out that this is probably just a really long rendition of the &amp;quot;stages of grief&amp;quot;. Haven&#39;t tried to map them to what I wrote, but sure, I buy that.&lt;/p&gt;

&lt;p&gt;Writing this was kinda cathartic :) I knew I had thoughts and opinions. Not sure I knew I had &lt;em&gt;this&lt;/em&gt; many just sitting there waiting to come to life :)&lt;/p&gt;

&lt;p&gt;Usually I fill my posts with dozens of links and cross-references and citations, backing up my arguments. Not gonna bother with that here. This post is about my thoughts, feelings, experiences, and opinions. You can find examples of everything I described in about every other HN and Twitter post. (I mean if you &lt;em&gt;desperately&lt;/em&gt; want actual examples email me and I&#39;ll dump a laundry list of links or something :) )&lt;/p&gt;

&lt;p&gt;If you found any of this useful at all, lemme know. I honestly have no idea how anyone&#39;s going to respond to this post. I &lt;em&gt;suspect&lt;/em&gt; it could strike a chord. We&#39;ll see.&lt;/p&gt;

&lt;p&gt;And now that we&#39;ve gone through all that, on to the part some of you were &lt;em&gt;really&lt;/em&gt; asking about :)&lt;/p&gt;

&lt;p&gt;Pop on over to &lt;a href=&#34;https://blog.isquaredsoftware.com/2026/05/ai-thoughts-part-2-agent-workflow-tools/&#34;&gt;the next post to see details on my own personal OpenCode agent setup and coding workflow&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks, y&#39;all!&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>