Coding Relic: September 2017

Sunday, September 24, 2017

There is No Feminist Cabal

From the 23-Sep-2017 New York Times:

One of those who said there had been a change is James Altizer, an engineer at the chip maker Nvidia. Mr. Altizer, 52, said he had realized a few years ago that feminists in Silicon Valley had formed a cabal whose goal was to subjugate men. At the time, he said, he was one of the few with that view.

Now Mr. Altizer said he was less alone. "There’s quite a few people going through that in Silicon Valley right now," he said. "It’s exploding. It’s mostly young men, younger than me."

I want to share some experiences, as another white male in Tech for a similar number of years as Mr. Altizer.

A while ago I made a conscious effort to follow more women in Tech on Twitter, to deliberately maintain a ratio of ~50% in those I follow. I wanted to try for more perspective than that provided by my own vantage point in the industry, where the gender ratio is definitely not 50%. It has been illuminating... and often painful. Intermixed with happy and proud events in life and work is the constant level of sexism which women experience. Sometimes it is blatant and vile: intimidation, physical threats. More often it is a grinding, ever present disrespect from men. It is so commonplace that it becomes completely expected, often mentioned in an offhand way. This doesn’t mean it is a minor thing, it means that it never stops, isn’t possible to avoid, and ceases to be surprising.

You likely won’t hear this in person, from women you work with. That doesn’t mean women you work with are experiencing something different. It doesn’t mean that their career and work environment are free of sexism and discrimination. It means that talking about it in person is asking them to relive those events, sometimes extraordinarily painful events. It means it is vastly more difficult to relate horrible experiences in person, in conversation. It is understandable to not want to talk about it.

Yet one thing I haven’t seen, not even a hint of, is the existence of a powerful group of women who are organizing to oppress men. I’ve seen no evidence of any kind of backlash against men in Tech. Jokes about a Cabal started circulating after the NYT story was published. This was irony, not confirmation.

We’re hearing more about sexism in Tech, far more than we did even a year ago. I think, I hope, that is because we are in the early stages of the extinction burst. When a behavior which was formerly rewarded no longer is, that behavior will begin to decline... except for a final gasp, a final burst, in trying to turn back the clock. The process of acknowledging the disparities in Tech has been ongoing for many years, slowly. It has reached a point where the industry is starting to respond, if only a little. That the response may grow stronger will feel like a threat, like a backlash against males. It really isn't. It is about disparity, and doing something to rectify that long-present disparity.

I’m posting this because it is unfair to expect people in disadvantaged groups to carry the entire burden of correcting the disadvantage. In Tech the advantaged group is males, and as I happen to be a male in Tech, that means me. I have not been nearly enough a part of the solution. That needs to change.

Males in Tech have almost certainly witnessed aggressions: a woman being spoken over, or not being invited to a meeting she should be, or not receiving sufficient credit for her work. One thing learned from following women in tech is that no matter how much we think we understand, the aggressions they go through happen orders of magnitude more frequently than we think. For every occurrence we know of there are ten more, a hundred more, a thousand more, which we don’t see and don't grasp the frequency of because we are not female.

We, males in Tech, need to speak out. We need to speak out frequently and firmly. So I am. My voice isn’t powerful, but power can be achieved in numbers too. I’m adding my voice to the multitude. You should too; not just as a blog post, speak out when you see the grinding aggression happening.

Thursday, September 21, 2017

Software Engineering Maxims which May or May Not Be True

This is a series of Software Engineering Maxims Which May or May Not Be True, developed over the last few years of working at Google. Your mileage may vary. Use only as directed. Past performance is not a predictor of future results. Etc.

Maxim #1: Small teams are bigger than large teams

In my mind, the ideal size for a software team is seven engineers. It does not have to be exactly seven: six is fine, eight is fine, but the further the team gets from the ideal the harder it is to get things done. Three people isn’t enough and limits impact, fourteen is too many to effectively coordinate and communicate amongst.

Organizing larger projects becomes an exercise in modularizing the system to allow teams of about seven people to own the delivery of a well-defined piece of the overall product. The largest parts of the system will end up with clusters of teams working on different aspects of the system.

Maxim #2: Enthusiasm improves productivity.

By far the best way to improve engineering productivity is to have people working on something which they are genuinely enthused about. It is beneficial in many ways:

the quality of the product will benefit from the care and attention
people don’t let themselves get blocked by something else when they are eager to see the end result
they’ll come up with ways to make the product even better, by way of their own resourcefulness
people are simply happier, which has numerous benefits in morale and working environment.

There are usually way more tasks on the project wish list than can realistically be delivered. Some of those tasks will be more important than others, but it is rarely the case that there is a strict priority order in the task list. More often we have broad buckets of tasks:

crucial, can’t have the product without it
nice to have
won’t do yet: too much crazy, or get to it eventually, or something

The crucial tasks have to be done, even the ones which no-one particularly wants to do.

In my mind, when selecting from the (lengthy) list of nice-to-have tasks, the enthusiasm of the engineering team should be a factor in the choices. The team will deliver more if they can work on things they are genuinely interested in doing.

Maxim #3: Project plans should have top-down and bottom-up elements

It is possible for a team to work entirely from a task list, where Product Management and the Leadership add stuff and the team marks things off as they are completed. This is not a great way to work, but it is possible.

It is better if the team spend some fraction of their time on tasks which germinated from within the team - not merely 20% time, a portion of regular work should be on tasks which the team itself came up with.

The team tends to generate immediately practical ideas, things which build upon the product as it exists today and provide useful extensions.
It is good for morale.
It is good for careers. Showing initiative and technical leadership is good for a software engineer.

Maxim #4: Bricking the fleet is bad for business

Activities with a risk of irreparable consequences deserve more care. This sounds obvious, like something which no-one would ever disagree with, but in the day-to-day engineering work those tasks won’t look like something which require that extra level of care. Instead they will look like something which has been running for years and never failed, something which fades into the background and can be safely ignored because it is so reliable.

Calls to add this risk will not be phrased as "be cavalier about something which can ruin us." It will be phrased as increasing velocity, or lowering cost, or not being stuck in doing things the old way - all of which might be true, it just needs more care and attention in changing it.

Maxim #5: There is an ideal rate of breakage: no more, no less

Breaking things too often is a sign of trying to do too much too quickly, and either excessively dividing attention or not allowing time for proper care to be taken.

Not breaking things often enough is a sign of the opposite problem: not pushing hard enough.

I’m sure it is theoretically possible for a team to move at an absolutely optimal speed such that they maximize their results without ever breaking anything, but I’ve no idea how to achieve it. The next best thing is to strive for just the right amount of breakage: not too much, not too little.

Maxim #6: It’s a marathon, not a sprint

"Launch and iterate" is a catchy phrase, but often turns into excuses to launch something sub-par and then never getting around to improving it.

Yet there is a real advantage to being in a market for the long term, launching a product and improving it. Customer happiness is earned over time, not all at once with a big launch.

This means structuring teams for sustained effort, not big product pushes.
It means triaging bugs: not everything will get fixed right away, but should at least be looked at to assess relative priority.
It means really thinking about how to support the product in the field.
It means not running projects in a way which burn people out.

Maxim #7: The service is the product

The product is not the code. The product is not the specific feature we launched last week, nor the big thing we’re working on launching next week.

The product is the Service. The Product which customers care about is that they can get on the Internet and do what they need to do, that they can turn on the TV and have it work, that they can make phone calls, whatever it is they set out to do.

Maxim #8: Money is not the only motivator

A monetary bonus is one tool available for managers to reward good work. It is not the only tool, and is not necessarily the best tool for all situations.

For example, to encourage SWEs to write automated system tests we created the Yakthulhu of Testing. It is a tiny Cthulhu made from the hair of shaved yaks (*). A Yakthulhu can be obtained by checking in one’s first automated test to run against the product.

(*) It really is made from yak hair. Yak hair yarn is a thing which one can buy. Disappointingly though, they do not shave the yaks. They comb the yaks.

Maxim #9: Evolve systems as a series of incremental changes

There is substantial value in code which has seen action in the field. It contains a series of small and large decisions, fixes, and responses which made the system better over time. Generally these decisions are not recorded as a list of lessons learned to be applied to a rewrite or to the next system.

Whenever possible, systems should evolve as a series of incremental changes to take it from where it is to where we want it to be. Doing this incrementally has several advantages:

benefits are delivered to customers much earlier, as the earliest pieces to be completed don’t have to wait for the later pieces before deployment.
there is no stagnant period in the field after work on the old system is stopped but before the new system is ready.
once the system is close enough to where we want it to be that other stuff moves higher on the list of priorities, we can stop. We don’t have to push on to finish rewriting all of it.

Maxim #10: Risk is multiplicative

There is a school of thought that when there are multiple large projects going on, and there is some relation between them, that they should be tied together and made dependent upon each other. The arguments for doing so are often:

"We’re going to pay careful attention to those projects, making them one project means we’ll be able to track them more effectively."
"There was going to be duplication of effort, we can share implementation of the common systems."
"We can better manage the team if we have more people available to be redirected to the pieces which need more help."

The trouble with this is that it glosses over the fundamental decision being made: nothing can ship until all of it ships. Combining risks makes a single, bigger risk out of the multiple smaller risks.

Maxim #11: Don’t shoot the monitoring

There is a peculiar dynamic when systems contain a mix of modules with very good monitoring along with modules with very poor monitoring; the modules with good monitoring report all of the errors.

The peculiarity becomes damaging if the result is to have all of the bugs filed against the components with good monitoring. It makes it look like those modules are full of bugs, when the reality is likely the opposite.

Maxim #12: No postmortem prior to mortem

There are going to be emergencies. It happens, despite our best efforts to anticipate risks. When it happens, we go into damage control mode to resolve it.

People not involved in handling the emergency will begin to ask about a postmortem almost immediately, even before the problem is resolved. It is important to not begin writing the postmortem until the problem has been mitigated. Doing so turns a unified crisis response into a hotbed of fingerpointing and intrigue. Even in a culture of blameless postmortems, it is difficult to avoid the harmful effects of the hints of blame while writing that blameless postmortem.

It is fine, even crucial, to save information for later drafting of the postmortem. IRC logs, lists of bugs/CLs/etc, will all be needed eventually. Just don’t start a draft of a postmortem while still antemortem.

Maxim #13: Cadence trumps mechanism

We tend to focus a lot on mechanisms in software engineering as a way to increase velocity or productivity. We reduce the friction of releasing software, or we automate something, and we expect that this will result in more of the activity which we want to optimize.

Inertia is a powerful thing. A product at rest will tend to stay at rest, a product in motion will tend to stay in motion. The best way to release a bunch of software is to release a bunch of software, by setting a cadence and sticking to it. People get used to a cadence and it becomes self-reinforcing. Making something easier may or may not result in better velocity, making it more regular almost always does.

Maxim #14: Churn wastes more than time

Project plans change. It happens.

When plans change too often, or when a crucial plan is suddenly cancelled and destaffed, we burn more than just the time which was spent on the old plan. We burn confidence in the next plan. People don’t commit as readily and don’t put their best effort into it until they’re pretty sure the new plan will stick.

In the worst case, this becomes self-reinforcing. New plans fail because of the lack of effort engendered by the failure of previous plans.

Maxim #15: Sometimes the hard way is the right way

For any given task, there is often some person somewhere who has done it before and knows exactly what to do and how to do it. For things which are likely to be done once in a project and never repeated, relying on that person (either to do it or to provide exactly what to do step by step) can significantly speed things up.

For things which are likely to be repeated, or refined, or iterated upon, it can be better to not rely on that one expert. Learning by doing results in a much deeper understanding than just following directions. For an area which is core to the product or will be extended upon repeatedly, the deeper understanding is valuable, and is worth acquiring even if it takes longer.

Maxim #16: Spreading knowledge makes it thicker

Pigeonholing is rampant in software engineering, engineers who have become experts in a particular area and always end up being assigned tasks in that area.

There are occasions where that is optimal, where becoming a subject matter expert takes substantial time and effort, but these situations are rare. In most cases it is not the expense of becoming an expert that keeps an engineer doing similar work over and over, it is just complacency.

Areas of the product where the team needs to continue to expend effort over a long time period should move around to different members of the team. Multiple people familiar with an area will reinforce each other. Additionally, teaching the next person is a very effective way to get a better understanding for oneself.

Maxim #17: Software Managers must code

When one first transitions from being an individual contributor software engineer to being a manager, there is a decision to be made: whether to stop doing work as an individual contributor and focus entirely on the new role in guiding the team, or to keep doing IC work as well as management.

There are definitely incentives to focus entirely on management: one can have a much bigger impact via effective use of a team than by one’s own effort alone. When a new manager makes that choice, they get a couple of really good years. They have more time to plan, more time to strategize, and the team carries it all out.

The danger in this path comes later: one forgets how hard things really are. One forgets how long things take. The plans and strategies become less effective because they no longer reflect reality.

Software managers need to continue do engineering work, at least a little, to stay grounded in reality.

Maxim #18: Manage without a net

Managers and Tech Leads cannot depend on escalation. We sometimes believe that the layers of management above us exist in order to handle things which the lower layers are having trouble with. In reality, those upper layers have their own goals and priorities, and they generally do not include handling things bubbling up from below.

Do not rely on Deus Ex Magisterio from above, organizations do not work that way.

Maxim #19: Goodwill can be spent. Spend wisely.

Doing good work accumulates goodwill. It is helpful to have a pile of goodwill, it tends to make interactions smoother and generally makes things easier.

Nonetheless, it is possible to spend goodwill on something important: to redirect a path, to right a wrong, etc. Sometimes spending goodwill is the right thing to do. Don’t spend it frivolously.

Maxim #20: Everyone finds their own experience most compelling

"We should do A. I did that on my last project, and it was great."

"No, we should do B. I did that on my last project, and it was great."

Comparing experiences rarely builds consensus, everyone believes their own experiences to be the most convincing. Comparing experiences really only works when there is a power imbalance, when the person advocating A or B also happens to be a Decider.

In most cases, simply being aware of this phenomena is sufficient to avoid damaging disagreements. The team needs to find other ways to pick their path forward, such as shared experiences or quantitative measurements, not just firmly held belief.

Friday, September 15, 2017

On CPE Cost

When it comes to the cost of hardware, volume matters more than anything else. To large extent, volume matters more than everything else put together. A cost efficient hardware design produced in low volume will be considerably more expensive than an inefficient and sloppy design produced in high volume. Plus, for a high volume product, the Contract Manufacturer will have engineering teams to help tighten the design for a moderate fee.

If your own sales volume is sufficient to get deep volume discounting, you can stop reading now (more honestly, you aren't reading this in the first place). Otherwise, if you are building a product for a new market or you are building for a niche, read on.

What does this mean? It means you should work very, very hard to use hardware which is produced in high volume. The compromises you would make in terms of RAM or other capabilities in order to get your own custom design down to a price you can tolerate will cost you far more than you saved in terms of updating the software and capabilities throughout the service lifetime. Using an existing, high volume design may bring other compromises, but it is a good tradeoff to make.

If you want to have your branding on the box: many commercial off the shelf (COTS) devices are available in unbranded white-box versions. It is simple and easy to add silkscreening or design flourishes, often a one-time design fee and a tiny line item on the Bill of Materials.

If you want to add RAM, Flash, moderately faster CPU, etc: most of those white-box products allow customization of specs which do not require changes in the board design. RAM and Flash suppliers offer different capacities in the same pinout, and CPU vendors offer multiple speed-bins of their chips. There will be a sweet spot in the market where the industry is buying the most volume, with a reasonable standard deviation such that you can moderately increase the capability without substantially increasing the cost. The converse is also true: moderate reductions in RAM/Flash/CPU don’t substantially decrease cost and may not be a good tradeoff.

If you want to have a unique industrial design: many ODMs will customize a product for you, including a new casing. It will need to fit the existing board, and will cost a few hundred thousand dollars for design, tooling, and emissions testing, but that is still cheaper than taking it all on in-house as you get the volume pricing for the board and other components.

Corollaries:

Mobile ate the world. You shouldn’t shy away from using mobile chipsets, even if your product will never operate on battery. Volume drives cost down, and mobile has the volume. Also, mobile chipsets with good power management are less in need of active cooling, and fanless is a huge win for consumer products.
RAM does cost money, but RAM is your future proofing. Greatly reducing RAM to lower cost is usually a bad tradeoff. Raspberry Pi Zero has 512 MBytes of RAM and costs US $10. Moderate amounts of RAM do not add much cost.
Many modern CPUs have configurable endianness, but seriously: little endian won. I hate that it won, but it did. If you’re considering a big endian toolchain, think carefully about the life choices that led you to that dark place. You’ll be taking endianness bugs onto your own plate for no benefit.

Wednesday, September 13, 2017

Peer review terminology

peerception [peer-sept-shun] (noun) : I need to come up with some more Strengths about this person to balance out all these Areas for Improvement.

peer annum [peer-ann-uhm] (noun) : Didn't I review this person last year, and the year before that, and the year before that...

peer capita [peer cap-ee-tah] (noun) : I have how many reviews left to write?

peergatory [peer-guh-tohr-ee] (noun) : The set of reviews I might decline for lack of time.

peersuasion [peer-sway-zhun] (noun) : Do your best. They're up for promotion.

peeriodic [peer-ee-od-ik] (adjective) : Maybe I'll just copy some of what I wrote last year.

peerjury [peer-juh-ree] (noun) : This self-assessment seems somewhat... inflated.

peerritation [peer-ree-tay-shun] (noun) Unreasonable hostility felt toward the subject of the last peer review left to be written.

peersuasion [peer-sway-zhun] (noun): Do your best. They're up for promotion.

peerpetuity [peer-peh-too-it-tee] (noun): This one is taking a while to write.

peerdition [peer-dih-shun] (noun): This person really, really shouldn't have asked for my review.

peerfunctory [peer-funk-tor-ee] (adjective): Sorry, I ran out of time.

peerrihelion [peer-ih-heel-ee-un] (noun): The point at which 50% of requested peer reviews have been completed.

peerfectionist [peer-feck-shun-ist] (adjective): I spend a long time obsessing over wording.

peerfumed [peer-fume-d] (adjective): This self-assessment rather obviously glosses over failings.

peerinatal [peer-ee-nate-al] (adjective): A started but not yet completed review.

peeripeteia [peer-ih-pet-ee-ah] (noun): It was a glowing review until I remembered that outage...

peeripheral [peer-ih-fur-all] (noun): Concentrating on their 20% project.

peerquisite [peer-kwi-site] (adjective): TLs have to review their team. It's only fair.

peersecuted [peer-suh-cute-ted] (adjective): How nice. Everyone asked for my review.

peerseverance [peer-suh-veer-unce] (noun): Just keep writing, it will be done soon.

peersnickety [peer-snick-it-tee] (adjective): The self assessment could have been better.

peersona non grata [peer-so-na-non-gra-ta] (noun): Declined.

peerplexed [peer-plex'd] (adjective): What exactly did they work on, anyway?

(building on a list from a few years ago)

Monday, September 11, 2017

Musings On Customer Premises Equipment

I spent most of the period 2011 - 2017 building Customer Premises Equipment (CPE) for an Internet and television service provider. CPE gear is equipment which is installed in a customer’s residence/business/etc in order to provide an endpoint for whatever service the provider offers. For an Internet Service Provider this will be things like cable/DSL modems and Wi-Fi routers. For an electric utility, the remotely accessible meter could be considered the CPE. There are very few services which don’t require any sort of equipment on premises.

CPE devices are built very much like other consumer electronics, using the same components and software techniques. The biggest distinction is in the mindset for its development: the device you are building is not the product. The device you are building is an enabler, the actual product is the service being offered. It might seem a minor distinction but, done right, it shouldn’t be minor. It should influence the entire implementation.

For example, comparing consumer electronics built for sale to end-users to CPE devices used by service providers:

Devices for retail will build several variants, leveraging the same development and codebase but providing multiple price points for different market segments. They may also have slightly differentiated SKUs for large retailers to not compete directly with each other on price. For a service provider CPE, the incentive is the opposite: just one model to avoid inventory cost and CPE replacement for customers who upgrade. Multiple price points are provided by the service plan, which means the CPE will have capabilities which are not enabled for customers on the lower tier plans.
Devices sold to end-users will often cater to power users, as getting good reviews and positive press leads to increased sales in the vastly larger mid-tier market. They will have long lists of features, and try to check as many boxes as possible. For a service provider CPE, obscure features add support costs without attracting much additional market share. The CPE will usually target the 80% point of features which will be widely used. The kind of customer who really wants the power-user features tends to also be the most likely to try to Bring Their Own Device and make minimal use of the provided CPE, anyway.
For a CPE device, remote serviceability is a hugely important feature. Ability to resolve a customer escalation without having to send a technician is important for OpEx (Operational Expense). Even if the service provider charges for technician visits, it is at best defraying costs and not enough to incentivize wanting to send techs.
Though upfront capital cost is important, the service lifetime of CPE in the field is equally important. The value of CPE devices is in the ongoing service revenue they enable. It makes sense to leave headroom in the initial device to be able to support upgraded services in the future, and is financially preferable to having to retire the device earlier. Retirement either means mass replacement of customer devices, or writing off a portion of the customer base for any future offerings and just leaving them as is.

I found the work on CPE devices to be interesting. I hope to write about a few areas of it.