Generative Coding and the Inverse Mechanical Turk

This spun off from a conversation about Generative AI code systems, their impact more broadly, and whether developers are being used as a kind of inverse Mechanical Turk to validate code. The tl;dr of this is: I think generative coding systems accelerate your lowest expertise developers more than your senior developers, because low expertise developers accept solutions that wouldn't meet the quality bar for more senior developers. Fixing this would require systematically re-evaluating the code review process, and new approaches to making sure that the risks created by the novel ways generative code intersects with implicit requirements are addressed.

Using Developers as Inverse Mechanical Turks

At the moment, we're seeing big steps forward in the ability of LLMs to generate functional code - I've seen it with my friend's projects, and I've experimented a little with it myself, although the actual effectiveness / productivity measurements are messy at best. The systems can't be entirely closed loop: they require a human in the loop to validate that the code is as functional as it claims to be.

Functional in this case means "it solves the specific problem that the user had at the time". I had a friend rip out some CANBUS functionality for datalogging on a motorcycle, and it did the job with a little prompting and saved a few hours of their time. This is the core bit of possibility that underpins the hype - it's very compelling to see someone solve a problem they have quickly, based on a set of generalized statements.
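
For flavor, the kind of script involved is small - something roughly like the sketch below, which assumes the python-can library, a Linux socketcan interface, and made-up file names; the actual code my friend had generated will have differed.

```python
import can  # python-can; assumes a configured socketcan interface (e.g. can0)

def log_canbus(channel: str = "can0", outfile: str = "ride_log.csv") -> None:
    """Minimal CAN datalogger: append every received frame to a CSV file."""
    bus = can.interface.Bus(channel=channel, interface="socketcan")
    with open(outfile, "a") as f:
        for msg in bus:  # the Bus object is iterable and blocks waiting for frames
            f.write(f"{msg.timestamp},{msg.arbitration_id:#x},{msg.data.hex()}\n")

if __name__ == "__main__":
    log_canbus()
```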

As this scales and adoption grows, though, the majority of the success stories that I hear are from high expertise folks who say "Well, now my job becomes code review, which is great - I can think in the generalized abstractions that I want to have, instead of having to spend a bunch of time writing code I don't really want to care about". This is great if you have the pre-existing context to define an appropriate Accepted Scope.

Accepted Scope

Accepted Scope is how the developer balances the explicit requirements of a given piece of development work against its implicit requirements. Explicit requirements are "this code must move data from A to B"; implicit requirements are "the code meets security, style, reusability, reliability, or other characteristics beyond the movement from A to B". Having the data move? Explicit requirement. Having it logged in a usable and meaningful fashion? Implicit requirement.
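
To make the distinction concrete, here's a small hypothetical sketch (Python, invented names) of the same explicit requirement met two ways: one version that only moves the data, and one that also meets a couple of implicit requirements around logging and failure behavior.

```python
import logging
import shutil

logger = logging.getLogger("transfer")

def move_data_v1(src: str, dst: str) -> None:
    # Meets only the explicit requirement: the data moves from A to B.
    shutil.copyfile(src, dst)

def move_data_v2(src: str, dst: str) -> None:
    # Same explicit requirement, plus a couple of implicit ones:
    # meaningful logging and a defined failure behavior.
    logger.info("starting transfer from %s to %s", src, dst)
    try:
        shutil.copyfile(src, dst)
    except OSError:
        logger.exception("transfer from %s to %s failed", src, dst)
        raise
    logger.info("transfer from %s to %s complete", src, dst)
```

Both versions "solve the problem"; only one of them would survive review in a shop whose implicit requirements include operability.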

In the best environments, there is a high quality dialog between the people accountable for prioritization and the developers to balance the accepted scope of each project they're on. They might do some fast work to hit a particular milestone, and then refocus in the next round on making sure the implicit requirements they skipped are addressed, to avoid downstream technical debt leading to feature / product slowdown, etc.

In the ideal world, they're looking at the critical implicit requirements and building frameworks, infrastructure, or tooling to make them easy to meet without a lot of extra energy - creating systems where the secure / appropriately logged / well structured way to do things is the easy way.

Side note: This was one of the best parts about Google's infrastructure - when you used the Google tooling, you got the advantage of a bunch of patterns that were safe / secure / made sure you did hygiene properly, as well as a number of other controls in place. It had the downside of often not explaining why you couldn't do the thing that felt obvious, and that problem, ironically, could probably have been usefully solved by LLMs.

Generative Code & Accepted Scope

When you use LLMs to generate code, the implicit requirements stop being defined by your company culture and the previous experience of your existing devs - a decision that can be introspected, trained, and changed - and become the average of the implicit requirements of the training dataset. A good developer will note this and correct it when reviewing code, based on their experience with the codebase they're working in.

There are two major concerns here. The first is that developers aren't generally trained in this as a skillset - the assumptions for reviewing human code and machine code are very different. The second is that doing this, and doing it well, is going to slow down the delivery of the code, even as it improves quality.

But more broadly, your juniors will work faster because they lack context on implicit requirements, pushing progressively more risk into your organization faster. Critically, this also pushes more work onto your senior staff, because they're the only ones with the context to perform reviews at that level of depth. Additionally, it's extremely hard to predict which implicit requirements any specific piece of code will or won't meet, because the generative system has no ability to understand what code quality means in your specific environment.

The Outcomes

This becomes a business scale problem: You have a code review pipeline that is designed to handle human code review problems at a certain scale, and you're pushing into that system a new coding paradigm that has a completely different set of expectations and failure states. You have also changed the scale of the problem: Your developers will now move faster the less they care about implicit requirements. Senior developers might care about this problem, but junior developers won't even really be able to contextualize the problem without significant mentorship and support from senior folks to explain why this code that solves the problem is better than this other code that also solves the problem. 

Massively increasing code review load and scope simultaneously is a good way to increase the number of bugs in code.  

Some Common Arguments / Rebuttals

"Just prompt the implicit requirements into the models!"

  • It's not possible to define the implicit requirements reliably, not only because "define secure code" is an exercise in shifting context, but also because you can and should vary your implicit requirements from project to project.
  • This also assumes you can make the models generate code consistently and reliably in the same fashion based on a given prompt, which is contrary to the design of a system that is supposed to generate novel code. 

"Humans make mistakes too!"

  • A human who makes the same mistake is likely to make it in a reasonably consistent way, so you can add a technical check, linting, or automated scanning for that specific problem (see the sketch below). An AI is likely to make mistakes in a novel way each time, because that's how these systems work, so it's much harder to define the behavior patterns on either the positive or negative side - you have to account for the sum total of the ways humans write code, rather than the specific habits of your developers.
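
To illustrate the asymmetry, here's a hypothetical sketch (Python, invented names) of the kind of narrow technical check you can write once a recurring human mistake is identified - in this case, flagging SQL built with f-strings instead of parameterized queries. It works precisely because the mistake recurs in the same shape; it does nothing for a failure mode it has never seen.

```python
import ast

class FStringSQLCheck(ast.NodeVisitor):
    """Flag f-strings that look like SQL - one specific, recurring mistake."""

    def __init__(self) -> None:
        self.findings: list[int] = []

    def visit_JoinedStr(self, node: ast.JoinedStr) -> None:
        # A JoinedStr node is an f-string; inspect its literal text parts.
        text = "".join(
            part.value for part in node.values
            if isinstance(part, ast.Constant) and isinstance(part.value, str)
        )
        if any(kw in text.upper() for kw in ("SELECT ", "INSERT ", "UPDATE ", "DELETE ")):
            self.findings.append(node.lineno)
        self.generic_visit(node)

def check_source(source: str) -> list[int]:
    """Return line numbers of suspicious f-string SQL in a source string."""
    checker = FStringSQLCheck()
    checker.visit(ast.parse(source))
    return checker.findings
```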


AI and The Eternal Twilight of Code Freeze

This was mostly written as a result of this blog post: https://fly.io/blog/youre-all-nuts/

Some Initial Preamble 

I like technology, and I mostly like technology because of the impact it has on humans. I believe the purpose of a system is what it does, and the drive to understand the impact of technology on people, companies, and society as it changes and grows has been at the root of most of the things I've enjoyed working on in my career.

I don't write much code these days; most of the work I get paid for is in getting people to build the abstractions that drive what code gets written, in order to make sure that amorphous vision thing becomes something resembling "strategy", which is then further decomposed into things that people do.

Caveat: there are a lot of words I could write on the broader scope of AI and society, but this is mostly about AI for software development and somewhat about knowledge-centric work directly. 

Where I'm at on AI 

The first thing is that "AI" is like saying "tech" in the late 90s or early 2000s: it gets constantly retconned to describe the capabilities as they are now, versus what they were when you last used them, regardless of whether you actually have that capability available. We'll touch on this sort of "immaculate arrival" again later.

The second thing is that the capabilities roughly lumped under "AI" are in many cases actually quite useful. What I have discovered is that they're most useful when you can hold them in a way that respects the sharp edges of the system, same as any other system. Lead may make your wine taste sweet, but it also has downstream consequences.

What we appear to be seeing now is a shift where AI can credibly generate useful code at a rate that can accelerate a developer's operations. In the context of code, used as Thomas describes in his blog post, it seems to be something on the level of an order-of-magnitude efficiency improvement, if you have optimized your skillset to review code instead of write it.

I'm mostly concerned about the broader implications of AI usage, in the same way I'm not actually concerned about the specific code that an individual person writes. I'm most interested in the outcomes of that code and how that is expressed against the strategy of a business. The best code isn't "elegant", it's easily parseable to any audience, regardless of context. I want straightforward, simple tools that are obvious and powerful in how they are applied.

The end state that I see from increased adoption of AI is, basically, companies accelerating towards a state that I tend to call "The Eternal Twilight of Code Freeze". We're in the early stages of that right now as an industry, but this feels like the mechanism by which the problem calcifies massively.

The Eternal Twilight of Code Freeze 

The Eternal Twilight of Code Freeze occurs when the conflicting or missing assumptions of systems reach the tipping point into operational stasis: no one can implement a change without clearly associated, catastrophic unintended consequences.

One of the more common patterns for this is duplication of similar service functionality within companies, but without the benefit of the implementation lessons learned that are unique to that company.

This problem is very abstract until the consequences become real, and they usually become real in a way that is extremely high risk to solve. 

A good filter to apply to this problem space is the question of a "source of truth". Can you reasonably abstract storage of a critical piece of information to one responsible person, technical stack, system, or service? Is that abstraction that you have chosen for a source of truth useful when your teams make technical decisions?

An example of this would be if you have a single database that tracks location and location related assumptions. You might store billing address, customer location, and IP. Each of these pieces of information is related to a user account, or perhaps a session. Can you reason about how you should use that piece of information? Does the reasoning your technical teams do related to a piece of location information scale appropriately to how much you've invested in the technical infrastructure for that storage of data? Can you reconcile conflict in what each piece of information implies? Did you pull that information from a trustworthy source? Who controls it?
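
As a minimal illustration (invented names and fields, not any real system), the data model itself is trivial - the hard part is that each field has a different provenance and a different de-facto owner, and someone has to decide which one wins when they disagree.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class UserLocationRecord:
    user_id: str
    billing_address: str    # entered by the user, owned by billing
    customer_location: str  # self-reported / derived, owned by the account team
    last_seen_ip: str       # observed at the edge, owned by security
    last_seen_at: datetime

def effective_location(record: UserLocationRecord) -> str:
    # A naive precedence rule. Whoever writes this function has quietly
    # become the de-facto owner of "where the user is" for every consumer.
    if record.billing_address:
        return record.billing_address
    return record.last_seen_ip  # silently fall back to a very different kind of fact
```

The schema is the easy part; the unresolved question is which of those fields any given team should reason from, and who gets to change that answer.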

This becomes impactful to actual service delivery because a statement like "I would like to update the user's address" goes from having a direct owner who is responsible for reconciling the service dependencies of a single data field, to a complex and undefined ownership question scattered across separate teams who may not understand that they have become de-facto owners of a piece of technical infrastructure that supports the concept of an address. Do you have a method to reason about how changes to that information echo through your product? Does anyone at your company have a reason to care about that problem space?

This occurs pretty organically in technical systems, and usually ends up falling under the category of "technical debt", due to a delta between both explicit and implicit requirements, and implementation. 

How AI intersects with this 

The core risk that I see with AI is in the current idea of "agentic implementation". That model may be completely fine for Thomas, who I believe undersells his own skills - he has the critical evaluation skills, built from a career of coding work, to actually spend the majority of his time doing the architectural evaluation, which appears to be the part of the work he enjoys: making the pieces fit together.

But that skillset of critical evaluation of "how the pieces fit together" exists primarily in a space where he can conceptualize responsibility for it as being the primary owner of the totality of the project space. If you have 30 or 300 developers at his level doing that work, they get to take 50% of that time to debate which abstraction is most appropriate, and you can probably build something that's pretty effective.

But if you don't have a set of entirely senior employees, then what you now have is no coherent architectural design control, at a massive rate of change.

Critically, you don't fix this by adding code. You fix it by refactoring code, deciding on logical compromises between technical reality and product design, and collapsing context where needed in order to make sure the technical reality and the product functionality don't drift to the point that the assumptions your product operates on live entirely in the heads of your users, with no technical structure behind them.

At some point, this becomes a pure scaling math problem. Let's take the AI optimist view of this and say that it's a 10x increase in LoC development, setting aside that no one who takes software development seriously thinks that LoC is a serious metric to measure productivity. Where is the corresponding architectural evaluation process that assesses that 10x increase in volume against making sure that the system you have built is achieving product goals? 
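
As a back-of-the-envelope illustration - every number below is invented purely to show the shape of the asymmetry, not a measurement:

```python
# Invented numbers, purely to illustrate the scaling asymmetry.
authored_loc_per_dev_per_week = 500      # pre-AI volume a dev writes and owns
ai_multiplier = 10                       # the optimist's 10x claim
review_capacity_loc_per_week = 2_000     # what one senior can review carefully

generated = authored_loc_per_dev_per_week * ai_multiplier  # 5,000 LoC/week/dev
reviewers_needed = generated / review_capacity_loc_per_week
print(reviewers_needed)  # 2.5 careful-reviewer equivalents per accelerated dev
```

The specific numbers don't matter; the missing term in the plan is the review and evaluation side of the equation.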

Furthermore, what happens when you do need to develop novel functionality, or when you need to revise your architecture? How do you tell the system that you need to refactor to take into account that the product has pivoted? Surely senior technical leadership intends that the architectural decisions that they make are meaningful, but how do you make those changes meaningful when the semantic weight of the codebase pushes all changes towards your historical patterns?

For people who do not have that experience, the acceleration of AI development means there is basically zero time for that feedback loop to occur and knowledge to build. We've already got a relatively major crisis with no one wanting to hire Junior people, and this is going to make it significantly worse. We see the reality of this now with the education system, where students get 100% on the homework and fail the tests because they don't understand the concepts outside of plugging in the text to the LLM.

Concerns about the Future 

I think there is a world where we could use generative capabilities to rapidly accelerate operational work and counterbalance that with design work towards better systems, but I also do not think that most executive leadership understands how dramatically different their approach needs to be in order to make this happen.

Instead, what I predict will happen is that some high skill developers will rapidly accelerate their code output, and then they will either get crushed under the weight of their self-generated tech debt as companies continue to lay off employees, or they will become accountable for the meta-processing overhead of the work and ownership that used to be distributed amongst those laid-off people, at a 10x pace of development, and eventually crumble under that overhead.

Those negative externalities will get pushed to customers who may or may not be able to do anything about it. There is a chance for exceptional companies to avoid this, by being much more cautious about the adoption of AI systems, and by recognizing that many companies are going to end up failing to be able to execute on basic tasks as AI development means they end up accelerating themselves into the Eternal Twilight of Code Freeze that much faster. 

The problem never has been velocity; the problem has always been the direction you're going. You can spend a lot of effort jumping up and down rather than moving forward, and if you sample at the right (or wrong) times you see a lot of movement, but your location doesn't actually change. For those who can hold the complete context of the system they're developing in their head, the new generative AI capabilities are amazing. For businesses, I think it's going to make technical infrastructure so unimaginably more complex that it might actually cause them to become unprofitable and non-functional. However, if everything backslides at roughly the same rate, then given the levels of regulatory capture that exist, consumers of business services have no choice but to deal with the degraded service quality. It's gonna mean people who have the ability to effectively introspect these systems will probably have work forever, I guess - but also, good luck evaluating for that skill vs finding convincing charlatans.

So we're back to the immaculate arrival - don't look too closely, these things are all problems for the future, right now, the AI future is here, and if it's not here now, it'll be here tomorrow! Just gotta keep believing!


The Mindset of a Long Term Commuter


The commuting mindset is the most conservative, risk averse, and safe of the mindsets I use to approach riding. There's a couple of other ones (spirited street, track, offroad, and supermoto all come to mind), but this one is the broadest and the most useful and accessible to other riders, so I'm starting here. 

But before we get into the details, I think it's important to discuss some of the foundational components of safe riding. The baseline for looking at hazards in MSF that stuck with me was "Search, Evaluate, Execute" (SEE) as the approach to how you look at common situations to identify and react to hazards. There's also the concept of maintaining a "space cushion" that showed up in the CA drivers handbook. These are both good foundational skills - look for upcoming hazards, decide on a plan of action, perform that plan of action, and make sure you have space for emergency maneuvers. These techniques are designed to give you time to identify an incoming situation, and rely on your vision and evaluation of the current state of things to decide on a plan of action. This covers the foundational mental component of safe riding - identify hazards as they are presented to you, and avoid them. Most riders will start to evolve on this organically, looking at things that have the potential to be hazards and reacting to them preemptively. If you haven't started doing this yet, it's a great starting point for safe riding (or driving, or bicycling).

It's also worth noting that any emotional reaction to someone else doing something is likely an outcome of not adequately predicting and internalizing that thing. I don't get angry when people merge into me anymore because I've already accepted that possibility and done my best to keep it from actually impacting me in any way. In general, if you find yourself having strong emotional reactions to other drivers on the road, it's a good time to reflect on why you're actually reacting to that thing - is it because you failed to adequately predict the behavior?

There's also the component of using the controls to avoid road hazards or misbehaving cars. Many people spend a lot of time (as they should!) honing their skillsets for those reaction moments, be it hard application of the brakes, judicious use of the throttle, or a quick swerve to get around an obstacle. People also seek out things like dirt riding, trials, supermoto, etc, to build skills for when the bike starts to move past the limits of traction, so they can handle those situations more confidently. All of these things are very good ideas, and I have spent significant amounts of time working on those skills. However, for my daily commute, I consider needing any of these skills to be a failure in approaching the commuting environment appropriately. As an analogy: I wear full gear nearly always on my commute, but I haven't needed it in the last few years to protect me from a crash, and I consider having to use my riding skills during my commute similar to having to use my gear in a crash - any significant usage of the performance of my motorcycle is exposing me to the risk of screwing it up, losing traction, and crashing. As the old saw goes: superior riders use superior judgment to avoid situations that require superior skill.

When I first started thinking like this, I considered any time I had to use my "track skills" on the street to be a situation I needed to reflect on and evaluate. Now, I consider any time that someone else forces me to use the brake or throttle, or to change the trajectory of my bike, a failure. There are a few unavoidable situations that pop up every couple of months, generally around no-look lane changes during lane sharing, but now they generally happen at low enough speeds that even having to execute the swerve after a long, tiring day at work is not a big deal. With all of that foundation laid out, the commute mindset breaks down to a very simple thing:
How do I get to work each day using as little of my riding skill as possible?

The mental side of the game is wide open - use and stretch those mental muscles! But the needed riding skills to get to work should be no more than gentle countersteering, throttle, and brake application, and the associated shifting. The staples I use to prevent having to do more than that are as follows: 

  • Lane position
  • Making sure that the bike is pointed in a direction of clear road as much as possible (either towards the split, towards the shoulder, or towards a blank spot in traffic)
  • Trying to consistently break free of groups of cars and exist in the space between groups of cars on the freeway
  • In the split, moving slow enough that I don't have to perform massive swerves to avoid unexpected lane changes

I'll lay out a few situations here to hopefully illustrate:
If I'm riding in traffic, I will attempt to position myself so that lane changes or sudden applications of the brakes or gas by the cars around me won't cause an accident, even if I do nothing.

If I'm getting ready to change lanes, and there is a car two lanes over, I will accelerate or brake slightly so that if that car changes lanes at the same time, we won't end up colliding. 

If I'm in traffic, and traffic is slowing, I will gently point the bike towards the split as soon as I'm reasonably sure I will be sharing. That way if the car in front of me slams on the brakes, I'm still likely to make it into the split without an aggressive swerve needed to avoid hitting them. If they start to change lanes when I enter the split, I'm already in position to move with them into the next lane over and split the 2/3 lanes instead of the 1/2 lanes. 

If I'm in the carpool lane and traffic is moving at 50mph, and the next lane over is completely stopped, my reaction is to ride on the fog line, very nearly on the shoulder. This way, if a car pulls out from the stopped traffic into the car pool lane, it requires no movement and no skill from me to avoid the accident in the vast majority of cases. If they are pulling on to the shoulder, I have additional space and time to consider my options and swerve gently on to the shoulder with them or swerve behind their vehicle, depending on the situation. Swerving gently on to the shoulder with the car requires minimal skill to perform as it's only a movement of a few feet to the side, whereas swerving behind them will likely require significant skill to execute and put me on a bad path into stopped traffic. 

Generally, looking at every situation as "how can I position myself so I need the minimum amount of effort and skill to navigate this situation" is going to pretty clearly give you the lowest risk option, and you should train yourself to always go for that option as far in advance of the situation actually developing as possible. Don't let your speedy commute ambitions outweigh your talent!

The caveat: It's possible to switch mindsets multiple times in a single ride. Maintaining the commute mindset for the majority of my ride helps me establish a reasonably safe baseline, but I definitely will swap to the spirited riding mindset when I get a good run at a clear interchange!


Motorcycle Risk Mitigations

I've been meaning to make this post for some time, and finally have the time to sit down and do it. It's going to get into much more of the theory around safe riding as opposed to the hands on practical skills. That said, I think there's stuff of value here for riders of all skill levels. This model is a simplification of reality in pursuit of a way to better help folks avoid getting in accidents, and to hopefully help the conversation along when it comes to discussing what makes more experienced riders safer vs. newer riders. 

Risk
In this post, I'm using the word risk to refer to our chances of ending up on the ground. Easiest to conceptualize as "If I were put into a bad situation 100 times, how many times would I make it out upright?". It's worth noting that risk can never be taken to zero except by not riding a motorcycle. It's also worth noting that someone can ride in a very risky fashion and be lucky for an extended period, and not crash. However, regardless of your skill level, you ride in a high risk fashion for long enough and it will eventually catch up to you.

I generally split risks into two categories: environmental and individual. Environmental risks are the broader risks present in the world out there - everything from pavement quality, the weather, the density of traffic, the chances of road debris, etc. These are the things that determine my "baseline pace". If I'm on a back road in the middle of nowhere with no cross streets, on a sunny day, with good pavement, and I haven't seen a car in 20 minutes, my pace will likely be higher, as the environmental risk is relatively low. If it's raining and I'm in SF with city traffic, pedestrians, etc, my pace is going to be much slower. The amount of environmental risk determines my baseline pace. If you're riding at a pace where nothing that you come across surprises you, chances are you have established an appropriate baseline pace. If a completely unexpected hazard comes up, you should re-evaluate whether that hazard was truly unexpected or whether your pace was too fast to allow you to identify that risk.

Individual risks are specific, tangible things that increase risk right now: a pothole, gravel, banzai pedestrians, an erratic car, and other things that directly increase my risk by potentially knocking me off my bike. Individual risks are also things that contribute to higher risk by robbing me of situational awareness and visibility, or otherwise reducing my chances of identifying a hazard. Individual risk determines whether I'm going to be going faster or slower than my baseline pace on a per-situation basis. If I've got good sight lines and excellent situational awareness and visibility, I might go significantly faster than my baseline pace through a corner, despite high environmental risk. If I can see a couple of potholes on approach to a blind corner, I'm going to be going significantly slower than my baseline pace.

There is also a dedicated skill of identification of risks in the environment. I'm going to just leave that out for the moment, as that's a complete conversation unto itself, and I want to focus this post more on risks and mitigations than I do identification. 

Risk Mitigations
So now we've got a baseline for our mental risk model - environmental risk as a broad, large scale thing, and individual risk as situations that expose us to additional risk. With that baseline, we can move on to the important thing: Risk mitigation. 

The first form of risk mitigation is "reactive mitigation".  Reactive mitigations are when you have to do something or you will have an accident. It's the additional lean angle to avoid running off the road, it's panic braking to avoid colliding with a car, or swerving into the gap to avoid a car doing a sudden lane change while you're splitting. This is the last ditch effort that relies on reaction time and individual skill and motorcycle performance to prevent an accident.

The next way you avoid risk is by what I will call "predictive mitigation". Predictive mitigation is when you have identified a situation in advance and taken the appropriate action to reduce the chances of it requiring a reactive mitigation or causing an accident. These sorts of actions are things like pre-emptively hugging the white line on a blind right hander in case a car comes around the corner, slowing when you see gravel signs, slowing when the vanishing point is closing down, and other actions you take in anticipation of risks, as opposed to in reaction to them.

The final form of avoiding risk is to not ride at all. This is strong risk avoidance - choosing not to ride because you are sick, tired, or under the influence. It's an important tool for when you don't feel like you are capable, or your bike isn't in a functional, safe state. For some people, this is the only acceptable way to deal with the risk of motorcycling, and as a result they do not ride at all. For me, it's a decision I make when I'm not confident I can effectively avoid risks through predictive mitigation.

Putting It All Together
The reason that I categorize risks this way is because it simplifies safe, quick, and fun riding. Thinking through the environmental risk before I even swing a leg over the bike helps me get my baseline pace right. Once I'm riding, that baseline pace gets modified based on the encounters with individual risks I have.  If I'm on a long ride and encountering minimal individual risks, chances are good I can turn up the pace a little bit and be just fine. If I'm on a ride and encountering many individual risks, I can back the pace down to reduce the chances of any of those individual risks knocking me off the bike.

Furthermore, as a new rider progresses in experience, their risk mitigations should be moving from more reactive mitigations to more predictive mitigations. Reactive mitigations are easy to mess up, rely on the performance and skill of the rider to be performed successfully, and if you fail to perform a reactive mitigation, you are highly likely to crash. Predictive mitigations are low risk, low skill maneuvers that are taken in advance of a hazard becoming highly risky. Obviously, this relies on predictive thinking and anticipation of situations in advance, such as predicting the conditions that are going to show up around a corner, or the actions of a car in front of you, and using techniques that allow you to maximize your space cushion and awareness.

The goal is to put you and your motorcycle in such a position on the road that you have pre-emptively taken any needed actions to avoid an accident. Performing predictive mitigations is a continual process that will be constantly changing as you move through traffic or up a road, as you take into account new information, discard old information, and identify new hazards. It's a very good idea to think through your history of reactive risk mitigations, and consider how those could have been predictive mitigations instead.

The other thing that I would strongly encourage riders to consider is to look at each encounter they have not on an individual basis, but as an average for the number of miles they ride, and the number of years they expect to ride. I expect to be riding for at least another 40 years, so that means that if I have a single high skill required reactive event while commuting every 2 months, I will have to successfully navigate 240 high skill events over the next 40 years to avoid crashing while commuting alone. That isn't a particularly acceptable risk level for me, so I have adjusted my riding, speed, and approach to lane splitting and traffic to keep the number of reactive events down. As it is, I've managed to successfully move the vast majority of my risk mitigations from reactive to predictive, and as a result, have a much lower chance of being involved in an accident.

I hope my perspective helps other riders have a new tool for approaching the risks of riding a motorcycle safely - as always, remember, all models are wrong but some models are useful, and I've found this one very useful over the years. I hope others find it useful as well.
