Tuesday, September 2, 2025

LLMs to Blaze a Trail

As an engineering executive there are a few ideas and practices which I reinforce via repetition to the team, either explicitly at the start of a recurring meeting or implicitly by bringing it up whenever relevant. The first of these ideas is:

The product is not the code, not the features, not the designs.
    The product is that people can use the service for things that are important to them.
The business is not the code, not the features, not the designs.
    The business is that people can use the service for things that are valuable to them.


But the topic of this post is not that. The topic is the second thing I frequently reinforce via repetition:

Get something, anything, working end to end as quickly as you can.
Not even a minimum viable thing. Any thing.


It has been my experience that, as developers, we tend to focus in on one area of a system to explore its requirements and build it out sufficiently until we feel confident that we understand what else will need to be done before moving on to the next piece. This results in a system where the understanding and the plan for development is grown by accretion, each piece layered atop the previous which is left undisturbed by later developments. We might go back and harmonize all of them later... maybe.

It has also been my experience that everything starts progressing more quickly once the system does something, anything end-to-end.

  1. We gain perspective on how the whole system will work and apply it to everything we do subsequently.
  2. One can make a change and see it function all the way through. Enthusiasm improves productivity.
  3. It is far more effective when multiple people work on a system in parallel if they can all see the impacts of each other's work.

Thus:

Get something, anything, working end to end as quickly as you can.
Not even a minimum viable thing. Any thing.


LLMs to Blaze a Trail

With the maturing capabilities of LLM code generation, I tried an experiment with Claude Code. At Google one of the classes in orientation was to construct a web scraper. I asked Claude Code to build a scraper, but an even simpler one: scrape a metric.

In a new scraper directory, create a go program which will scrape a web page formatted
in prometheus metrics format, and extract a floating point value labeled "example"

Create an SQL schema for a timeseries, with columns for a timestamp and a floating point value.

Have scraper connect to a Postgres database and write each sample it collects to the database.

In a new webui/frontend directory, create a web page using React and typescript which will
poll a backend server for changes in a loop and display rows of timeseries data with timestamp,
sample name, and value.

In a new webui/backend directory, create a go program which will handle queries from
webui/frontend and fetch timeseries data from the postgres database.

A classic hacker stock photo in a darkened room sitting in front of a laptop wearing a hoodie and mask, except the person typing is a robot

It produced a small, functional implementation.

scraper123 linesGo
backend169 linesGo
frontend127 linesTypescript
 100 linesCSS
 43 linesHTML

A few interesting tidbits:

  • It produced no unit tests for the Go code. I didn't tell it to.
  • It did produce unit tests for the TypeScript code, even though I did not tell it to. I think this speaks well for the TypeScript community, the training data is infused wth testing as an expected practice.
  • I wish wish wish that Claude Code would automatically populate a .gitignore for node_modules. Not for the first time, I checked 437 Megabytes of code into git and had to rewrite the history to remove it.

 

Unit Tests

Having no tests at all sets a bad example. I don't actually want to encourage the construction of large system test suites at this stage of a project, as the effort to keep updating a large test as the system evolves is likely to outweigh the value of the test at this stage. Yet I do want to set the example by ensuring there is something.

In the scraper directory, keep the main() function in main.go but move the rest of the code
to a scrape.go file. Write tests for scrape.go with a local prometheus server and in-memory
database. Check that metrics are correctly stored in the database.

Claude Code generated 377 lines of test cases, including scraping one value and several values. Most of the code was to set up an in-memory database using sqlite and to run a local Prometheus server.

The cost of the first prompt to generate the system and the second prompt to add unit tests: 93 cents.


 

Non-trivial example

That example was pretty contrived. How about an example of a more realistic system which:

  1. Implements a protocol connecting to a legacy communications system.
  2. Implements a set of modern protocols connecting to current Internet communications infrastructure, to forward messages to and from the legacy protocol.
  3. Has a management layer watching all of the connections and can stop or restart them as needed.
  4. Has a dashboard and console showing the status and configuration of the system.

Can it produce this? Well... not exactly. I kindof cheated: this is the first thing I attempted, I made up the contrived system later.

The problem is that first step. Claude Code was not much help in producing the first piece, connecting to the legacy system. The tasks there were more like engineering archaeology:

  • Trying variations on the digest hash function until the remote system suddenly returned 200 OK.
  • Figuring out what portions of the the poorly documented header fields were actually implemented.
  • Diagnosing failures when the only indication we get is "Invalid" with no further information about what was invalid.

There just isn't any training data for this, and so trying to rapidly get to a functioning end-to-end system entirely via code generation didn't work. I was able to work on the management layer and the dashboard and so on while still debugging the first piece, but it only started working when that first piece was done.

Could I have set that first piece aside with a mockup, and worked on the rest? Probably, but it was just me not a team and the first piece was the biggest risk. I focussed on eliminating the risk.

In an engineering team, I thnk I would approach this with a small team whose job is to sketch out the overall system. It might be entirely senior engineers or at least led by a quite senior engineer, and tasked to identify and quantify risks and to plan out a system. That team could multiply its efforts using LLMs to help generate the morewell understood portions of the system.