Sunday, June 24, 2018

Carbon Capture: Reforestation

Before industrialization, forests covered approximately 5.9 billion hectares of the planet. Today that figure is about 4 billion hectares, and still dropping. Deforestation has reduced the ability of terrestrial plants to sink carbon through their yearly growth.

The basic idea in reforestation is straightforward: plant trees and other long-lived plants to take up and store carbon from the atmosphere. The difficult part is developing mechanisms to plant trees at a large enough scale, and in a short enough time frame, to be useful in ameliorating climate change. This requires automation, most obviously by the use of flying drones.

Biocarbon Engineering and Droneseed are two firms building technologies for rapid planting of trees. They use largish drones loaded with seed pods. The drones do require pilots, as most jurisdictions now require licensed pilots for drones, but where possible the drones fly in formation so that a single pilot can control many at a time.

The cost efficiency of this automated seeding method is not clear from publicly available information. Each reseeding project is a unique bid, and the bids are mostly not made public. Estimates of the cost of manual planting average $4,940 per hectare. Rough estimates of the cost of a Biocarbon Engineering project to reseed mangrove trees in Myanmar put it at about half of what a manual effort would cost.

Companies in this technology space

  • Propagate Ventures works with farmers and landowners to implement regenerative agriculture, restoring the land while keeping it productive.

  • Dendra Systems (formerly Biocarbon Engineering) builds drones which fly in swarms, numerous drones per pilot, and which fire nutrient-loaded seed pods toward the ground. A good percentage of the seed pods embed in the ground, where the outer packaging rapidly biodegrades and allows the seed to germinate.

  • Droneseed also builds drones to plant trees, though fewer details are available.


 

musings on plants

In real deployments the type of plant life seeded will be chosen by the client to fit the local environment, such as the choice of mangrove trees in Myanmar. If we were only concerned with the rapidity of carbon uptake, and did not care about invasive species, I think there are two species of plants we would focus on:

  • Paulownia trees, which grow extremely rapidly, up to 20 feet in one year. They are native to China and an invasive species elsewhere.
  • Hemp: "Industrial hemp has been scientifically proven to absorb more CO2 per hectare than any forest or commercial crop and is therefore the ideal carbon sink." (source). I find it amusing that hemp may be crucial in saving humanity after all.

Saturday, June 23, 2018

Carbon Capture: Biochar

Biochar is charcoal made from biomass: agricultural waste or other plant material. If left to rot or burned, the carbon trapped in this plant material would return to the atmosphere. By turning it into charcoal, a large percentage of the carbon is fixed into a stable form for decades.

Turning plant material into charcoal is a straightforward process: heat without sufficient oxygen to burn. This process is called pyrolysis (from the Greek pyro meaning fire and lysis meaning separating). In ancient times this was accomplished by burying smoldering wood under a layer of dirt, cutting it off from air. More recently, a kiln provided a more efficient way to produce charcoal by heating wood without burning it. Modern methods generally use sealed heating chambers in order to capture all of the produced gases.

Pyrolysis produces three outputs:

  • the solid char, which has a much higher concentration of carbon than the original plant material.

  • a thick tar referred to as bio-oil, which is much higher in oxygen than petroleum but otherwise similar.

  • a carbon-rich gas called syngas. It is flammable, though it contains only about half the energy density of methane. In earlier times the gas generally just escaped, while modern processes capture and usually burn it as heat to continue the pyrolysis process.

The temperature and duration of pyrolysis determine the relative quantities of char, bio-oil, and syngas. Baking for a longer time at a lower temperature emphasizes char; shorter times at higher temperatures produce more gas and oil.
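To make that tradeoff concrete, here is a minimal sketch in Python. The yield fractions are hypothetical placeholders chosen only to illustrate the direction of the effect; real numbers vary widely with feedstock, temperature, and residence time.

    # Hypothetical, illustrative yield fractions by mass for two pyrolysis regimes.
    # Real values depend heavily on feedstock, temperature, and residence time.
    YIELDS = {
        "slow, lower temperature":  {"char": 0.35, "bio_oil": 0.30, "syngas": 0.35},
        "fast, higher temperature": {"char": 0.12, "bio_oil": 0.60, "syngas": 0.28},
    }

    def char_output_kg(dry_biomass_kg, regime):
        """Kilograms of char produced from a given mass of dry biomass."""
        return dry_biomass_kg * YIELDS[regime]["char"]

    for regime in YIELDS:
        print(f"{regime}: {char_output_kg(1000, regime):.0f} kg char per tonne of dry biomass")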

The idea of biochar for carbon capture is to intercept carbon about to return to the atmosphere, primarily agricultural waste, and turn it into a form which both sequesters carbon and improves the soil into which it is tilled. The very fine char produced from agricultural waste is quite porous and makes soil retain water more effectively. It can also improve the soil health of acidic soils, balancing the pH and making the soil more productive.

Carbon Capture: Temperature Swing Adsorption

Adsorption: the adhesion of atoms, ions or molecules from a gas, liquid or dissolved solid to a surface. This process creates a film of the adsorbate on the surface of the adsorbent.

Temperature Swing Adsorption (TSA) for carbon capture relies on a set of materials, called carbon dioxide sorbents, which attract carbon dioxide molecules at low temperature and release them at a higher temperature. Unlike the Calcium Loop described previously, there is no chemical reaction between the sorbent and the CO2. Adsorption is purely a physical process, where the CO2 sticks to the sorbent due to the slight negative charges of the oxygen atoms and positive charge of the carbon.

There are a relatively large number of materials with this sorbent property for carbon dioxide, enough to merit a dedicated Wikipedia page. These materials are porous, and in the most interesting ones for our purpose the pores are just the right size to hold a CO2 molecule, with a slight charge at the right spot to attract the oppositely charged points on the CO2. To be useful for carbon capture, a sorbent has to attract CO2 molecules but readily release them with a change in temperature, so it can be cycled from cold to hot to repeatedly grab and release carbon dioxide.

Unfortunately most of the known materials have drawbacks which make them unsuitable for real-world use, such as being damaged by water vapor.

The most recent class of sorbents developed are Metal-Organic Frameworks (MOFs), chains of organic molecules bound up into structures with metals. MOFs are interesting because they are much more robust than the previously known sorbents: they are not easily damaged by compounds found in the air and can be cycled in temperature without quickly wearing out.


 

Companies in this technology space

  • Climeworks in Switzerland describes their process as a filter which is then heated to release the carbon dioxide. This is clearly an adsorption process, and almost certainly uses Metal-Organic Frameworks, as the filter is described as being reusable for a large number of cycles.

  • Global Thermostat in New York describes their process as an amine-based sorbent bonded to a porous honeycomb ceramic structure.

  • Inventys in Canada builds a carbon capture system using Temperature Swing Adsorption materials. Their system uses circular plates of a sorbent material, stacked vertically, and rotates the plates within a cylindrical housing. At different parts of the revolution the plates spend 30 seconds adsorbing CO2, 15 seconds being heated to 110 degrees Celsius to release the concentrated CO2, and 15 seconds cooling back down to 40 degrees to do it again (a rough throughput sketch follows this list).

    Inventys goes to some length to explain that their technology is in the whole system, not tied to any particular sorbent material. I suspect this is emphasized because Metal-Organic Frameworks are improving rapidly, and indeed the entire class of MOF materials was developed after Inventys was founded, so they ensure that the system can take advantage of new sorbent materials as they appear.


  • Skytree in the EU is a patent licensing firm which is fairly coy about the technologies it licenses, but says they were developed as part of the Advanced Closed Loop System for the International Space Station. One of the main innovations in the ACLS is a solid resin adsorbent called Astrine, which implies the technology is adsorption-based.

  • Soletair in Finland aims to create an end-to-end process using adsorption and electrolysis to create feedstock for fuels.

  • Carbon Clean Solutions has developed a new carbon dioxide sorbent, amine-promoted buffer salt (APBS). This sorbent is available for licensing.

  • Mosaic Materials has developed a new carbon dioxide sorbent using nitrogen diamines, which requires only half the temperature swing to capture and release CO2. This should result in considerably lower energy cost and higher-volume production.
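Back-of-the-envelope arithmetic on the Inventys cycle described above: 30 seconds adsorbing plus 15 seconds heating plus 15 seconds cooling is a 60-second revolution, or 1,440 revolutions per day. The sketch below turns that into a daily throughput figure; the amount of CO2 released per revolution is a purely hypothetical placeholder, since that depends on the sorbent and the size of the plate stack.

    # Cycle timing from Inventys' public description: 30 s adsorb + 15 s heat + 15 s cool.
    SECONDS_PER_CYCLE = 30 + 15 + 15                  # one full revolution of the plate stack
    CYCLES_PER_DAY = 24 * 3600 // SECONDS_PER_CYCLE   # 1440 revolutions per day

    # Hypothetical placeholder: kilograms of CO2 released per revolution.
    KG_CO2_PER_CYCLE = 1.0

    tonnes_per_day = CYCLES_PER_DAY * KG_CO2_PER_CYCLE / 1000.0
    print(f"{CYCLES_PER_DAY} cycles/day, roughly {tonnes_per_day:.1f} tonnes of CO2/day per unit")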

Tuesday, June 19, 2018

Carbon Capture: Calcium Looping

I am very interested in technologies to ameliorate climate change. The looming, self-inflicted potential extinction of the human species seems important to address.

In this post we’ll examine the steps in Carbon Engineering’s Direct Air Capture process, as published on their website, and explore what each step means. As I am an amateur at carbon capture technologies, anything and everything here may be incorrect. I’m writing this in an attempt to learn more about the space.


 

step 1: wet scrubber

A wet scrubber passes a gas containing pollutants, in this case atmospheric air containing excess carbon dioxide, through a liquid in order to capture the undesired elements. Scrubber designs vary greatly depending on the size of the pollutant being captured, especially whether particles or gaseous. In this case because CO2 molecules are being targeted, the scrubber is likely a tall cylindrical tower filled with finned material to maximize the surface area exposed to the air.

This process step uses hydroxide (OH-), a water molecule with one of its hydrogen atoms stripped off, as the scrubbing liquid. Hydroxide bonds with carbon dioxide to form carbonic acid H2CO3. It is interesting to note that this same chemical process occurs naturally at huge scale in the ocean, where seawater has acidified due to the absorption of carbon dioxide and formation of carbonic acid.


 

step 2: pellet reactor

The diluted carbonic acid is pumped through a pellet reactor, which is filled with very small pellets of calcium hydroxide Ca(OH)2. Calcium hydroxide reacts with the carbonic acid H2CO3 to form calcium carbonate CaCO3, the primary component of both limestone and antacid tablets. The small pellets in the reactor both supply calcium for the reaction and act as seed crystals, allowing larger calcium carbonate crystals to grow. In the process, hydrogen and oxygen atoms are liberated and turn back into water.

As the point of this system is a continuous process to remove carbon dioxide from air, I imagine the pellets are slowly cycled through the reactor as the liquid flows over them. The pellets with their load of newly grown crystal would automatically move on to the next stage of processing.

It is important to dry the pellets of calcium carbonate as they leave the pellet reactor. The next step collects purified carbon dioxide, where water vapor would be a contaminant. The remaining water could be removed by heating the pellets to somewhere above 100 degrees Celsius, where water evaporates, but well below the 550 degrees where the calcium carbonate would begin to break down. Hot air would be sufficient to achieve this.


 

step 3: circulating fluid bed calcinator

A calcinator is a kiln which rotates. The wet pellets loaded with crystals of calcium carbonate CaCO3 slowly move through the kiln, where they are heated to a sufficient temperature for the calcium carbonate to decompose back into calcium oxide CaO and carbon dioxide CO2. A temperature of at least 550 degrees Celsius is needed for this, and the reaction works best at around 840 degrees, which is quite hot. There are catalysts which can encourage this reaction at lower temperatures, notably titanium dioxide TiO2, but they are quite expensive and might not be economical compared with heating the kiln.

The carbon dioxide is released as a hot gas to be collected, while the calcium oxide is left behind as solid grains in the calcinator. The calcium oxide can be used over and over, which is why the process is called calcium looping. Energy is expended on each cycle through the loop to free the carbon dioxide from the calcium carbonate.


 

step 4: slaker

The solid output of the calcinator is calcium oxide CaO, also called quicklime. Quicklime is not stable, and will absorb other molecules from the air which would introduce impurities if put back into the pellet reactor. Therefore the calcium oxide CaO is combined with water to form calcium hydroxide Ca(OH)2.

A slaker adds controlled amounts of water to quicklime. This reaction releases a great deal of heat, so it is controlled by a feedback loop which reduces the inflow of material when the reaction gets too hot. I imagine the waste heat from this process could provide some of the heat needed for the earlier calcinator step, though additional heating would also be needed.
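Taken together, the loop is: CO2 + H2O -> H2CO3 in the scrubber, H2CO3 + Ca(OH)2 -> CaCO3 + 2H2O in the pellet reactor, CaCO3 -> CaO + CO2 in the calcinator, and CaO + H2O -> Ca(OH)2 in the slaker. Since each mole of captured CO2 passes through one mole of each calcium compound, standard molar masses give a rough sense of how much solid material must circulate per tonne of CO2 captured. A minimal sketch (material flows only; it says nothing about the energy required for calcination, which is the dominant cost):

    # Molar masses in g/mol.
    M_CO2, M_CACO3, M_CAO, M_CAOH2 = 44.01, 100.09, 56.08, 74.09

    def solids_per_tonne_co2():
        """Tonnes of each calcium compound cycled per tonne of CO2 captured.

        One mole of CO2 forms one mole of CaCO3 in the pellet reactor, which
        is split back into one mole of CaO (plus the CO2) in the calcinator,
        then slaked into one mole of Ca(OH)2.
        """
        mol_co2 = 1_000_000 / M_CO2  # moles of CO2 in one tonne
        return {
            "CaCO3 calcined":  mol_co2 * M_CACO3 / 1e6,
            "CaO produced":    mol_co2 * M_CAO / 1e6,
            "Ca(OH)2 slaked":  mol_co2 * M_CAOH2 / 1e6,
        }

    for compound, tonnes in solids_per_tonne_co2().items():
        print(f"{compound}: {tonnes:.2f} tonnes per tonne of CO2")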


 

Companies in this technology space

  • Carbon Engineering, which builds large scale operations using the calcium loop process to capture carbon dioxide from air.
  • Calera, which captures CO2 to produce calcium carbonate and magnesium carbonate for industrial use.
  • CleanO2 builds CO2 scrubbers for HVAC systems, allowing cold air from the building to be recirculated after scrubbing carbon dioxide (and likely also scrubbing water vapor and other contaminants). As the systems produce calcium carbonate as an end-product, I'm going to assume it uses the first two steps of the calcium loop as a recovery mechanism.

 

Postscript

At the end of the process we have a highly purified stream of carbon dioxide extracted from ambient air. The long term goal of this kind of technology would be negative carbon emissions, which would mean keeping the CO2 from immediately circulating back into the environment by utilizing it in a long-lived form like various plastics or graphene. The technology also allows carbon neutral fuels to be made for applications where energy density requirements are higher than what battery chemistries are likely to provide, such as airplanes or ocean going vessels. Using carbon which was already in the atmosphere for these applications is much better than digging more carbon out of the ground.

Friday, June 15, 2018

CPE WAN Management Protocol: transaction flow

Technical Report 69 from the Broadband Forum defines a management protocol called the CPE WAN Management Protocol (CWMP). It was first published in 2004, has been revised a number of times since, and was originally aimed at the operation of DSL modems placed in customer homes. Over time it has broadened to support more types of devices which an Internet Service Provider might operate outside of its own facilities, in the residences and businesses of its customers.

There are a few key points about CWMP:

  • It was defined during the peak popularity of the Simple Object Access Protocol (SOAP). CWMP messages are encoded as SOAP XML.
  • Like SNMP and essentially every other network management protocol, it separates definition of the protocol from definition of the variables it manages. SNMP calls them MIBs, CWMP calls them data models.
  • It recognizes that firewalls will be present between the customer premises and the ISP, and that the ISP can expect to control its own firewall but not necessarily other firewalls between it and the customer.
  • It makes a strong distinction between the Customer Premises Equipment (CPE) being managed, and the Auto Configuration Server (ACS) which does the managing. It does not attempt to be a generic protocol which can operate bidirectionally, it exists specifically to allow an ACS to control CPE devices.

A few years ago I helped write an open source tr-69 agent called catawampus. The name was chosen based mainly on its ability to contain the letters C W M P in the proper order. I’d like to write up some of the things learned from working on that project, in one or more blog posts.


Connection Lifecycle

One unusual thing about CWMP is connection management between the ACS and CPE. Connections are initiated by the CPE, but RPC commands are then sent by the ACS. Keeping with the idea that it is not a general purpose bidirectional protocol, all commands are sent by the ACS and responded to by the CPE.

tr-69 runs atop an HTTP (usually HTTPS) connection. The CPE has to know the URL of its ACS. There are mechanisms to tell a CPE device what ACS URL to use, for example via a DHCP option from the DHCP server, but honestly in almost all cases the URL of the ISP’s ACS is simply hard-coded into the firmware of devices supplied by the ISP.

Thus:

  1. The CPE device in the customer premises initiates a TCP connection to the ACS, and starts the SSL/TLS handshake. Once the connection is established, the CPE sends an Inform message to the ACS using an HTTP POST. This is encoded using SOAP XML, and tells the ACS the serial number and other information about the CPE in the <DeviceId> stanza.
    <?xml version="1.0" encoding="utf-8"?>
    <soap:Envelope xmlns:cwmp="urn:dslforum-org:cwmp-1-2"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
        xmlns:soap-enc="http://schemas.xmlsoap.org/soap/encoding/"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <soap:Header>
        <cwmp:ID soap:mustUnderstand="1">catawampus.1529004153.967958</cwmp:ID>
      </soap:Header>
      <soap:Body>
        <cwmp:Inform>
          <DeviceId>
            <Manufacturer>CatawampusDotOrg</Manufacturer>
            <OUI>ABCDEF</OUI>
            <ProductClass>FakeCPE</ProductClass>
            <SerialNumber>0123456789abcdef</SerialNumber>
          </DeviceId>
          <Event soap-enc:arrayType="EventStruct[1]">
            <EventStruct>
              <EventCode>0 BOOTSTRAP</EventCode>
            </EventStruct>
          </Event>
          <CurrentTime>2018-06-14T19:34:47.297063Z</CurrentTime>
          <ParameterList soap-enc:arrayType="cwmp:ParameterValueStruct[1]">
            <ParameterValueStruct>
              <Name>InternetGatewayDevice.ManagementServer.ConnectionRequestURL</Name>
              <Value xsi:type="xsd:string">http://[redacted]:7547/ping/7fd86a7302ec5f</Value>
            </ParameterValueStruct>
          </ParameterList>
        </cwmp:Inform>
      </soap:Body>
    </soap:Envelope>
    Two fields are worth noting. The EventCode tells the ACS why the CPE device is connecting: it might have just booted, it might be a periodic connection at a set interval, or it might be reporting an exceptional condition. The ParameterList is a list of parameters the CPE can include to tell the ACS about exceptional conditions.

  2. The ACS sends back an InformResponse in response to the POST.
    <soapenv:Envelope
        xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
        xmlns:cwmp="urn:dslforum-org:cwmp-1-2"
        xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <soapenv:Header>
        <cwmp:ID soapenv:mustUnderstand="1">catawampus.1529004153.967958</cwmp:ID>
        <cwmp:HoldRequests>0</cwmp:HoldRequests>
      </soapenv:Header>
      <soapenv:Body>
        <cwmp:InformResponse>
          <MaxEnvelopes>1</MaxEnvelopes>
        </cwmp:InformResponse>
      </soapenv:Body>
    </soapenv:Envelope>

  3. If the CPE has other conditions to communicate to the ACS, such as successful completion of a software update, it performs additional POSTs containing those messages. When it has run out of things to send, it does a POST with an empty body. At this point the ACS takes over: the CPE continues sending HTTP POST transactions with an empty body, and the ACS sends a series of RPCs to the CPE in the responses. There are RPC messages to get/set parameters, schedule a reboot or software update, etc. All RPCs are sent by the ACS and the CPE responds. A minimal sketch of this request/response loop appears after this list.
    <soapenv:Envelope
        xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
        xmlns:cwmp="urn:dslforum-org:cwmp-1-2"
        xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <soapenv:Header>
        <cwmp:ID soapenv:mustUnderstand="1">TestCwmpId</cwmp:ID>
      </soapenv:Header>
      <soapenv:Body>
        <cwmp:SetParameterValues>
          <ParameterList>
            <ns2:ParameterValueStruct xmlns:ns2="urn:dslforum-org:cwmp-1-2">
              <Name>StringParameter</Name>
              <Value xmlns:xs="http://www.w3.org/2001/XMLSchema" xsi:type="xs:string">param</Value>
            </ns2:ParameterValueStruct>
          </ParameterList>
          <ParameterKey>myParamKey</ParameterKey>
        </cwmp:SetParameterValues>
      </soapenv:Body>
    </soapenv:Envelope>
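Pulling the whole exchange together, here is a minimal sketch of the CPE side of a session, assuming Python with the requests library. The ACS URL, the Inform body, and the handle_rpc dispatcher are placeholders; a real agent also deals with authentication, retries, HoldRequests, and much more.

    import requests

    ACS_URL = "https://acs.example.net/cwmp"   # placeholder; real CPEs carry this in firmware
    INFORM_XML = "..."                         # the Inform envelope shown above

    def handle_rpc(rpc_xml):
        """Placeholder dispatcher: parse the ACS RPC and return a SOAP response body."""
        raise NotImplementedError

    def run_session():
        # A Session object keeps the cookies the ACS uses to tie the POSTs together.
        session = requests.Session()
        headers = {"Content-Type": "text/xml; charset=utf-8", "SOAPAction": ""}

        # 1. CPE sends Inform; the ACS replies with InformResponse.
        session.post(ACS_URL, data=INFORM_XML, headers=headers)

        # 2. CPE has nothing more to report: an empty POST hands control to the ACS.
        response = session.post(ACS_URL, data="", headers=headers)

        # 3. Each non-empty reply is an RPC (GetParameterValues, SetParameterValues,
        #    Reboot, Download, ...). The CPE answers it and POSTs again, until the
        #    ACS returns an empty body to end the session.
        while response.text.strip():
            response = session.post(ACS_URL, data=handle_rpc(response.text), headers=headers)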

 

The ACS can send multiple RPCs in one session with the CPE. Only one RPC can be outstanding at a time; the ACS has to wait for a response from the CPE before sending the next.

When the session ends, it is up to the CPE to re-establish it. One of the parameters in a management object is the PeriodicInformInterval, the amount of time the CPE should wait between initiating sessions with the ACS. By default it is supposed to be infinite, meaning the CPE will only check in once at boot and the ACS is expected to set the interval to whatever value it wants during that first session. In practice we found that not to work very well and set the default interval to 15 minutes. It was too easy for something to go wrong and result in a CPE which would be out of contact with the ACS until the next power cycle.

There is also a mechanism by which the ACS can connect to the CPE on port 7547 and do an HTTP GET. The CPE responds with an empty payload, but is supposed to immediately initiate an outgoing session to the ACS. In practice, this mechanism doesn't work very well because intervening firewalls, like the ISP's own residential gateway within the home, will often block the connection. This is an area where the rest of the industry has moved on: we now routinely have a billion mobile devices maintaining a persistent connection back to their notification service. CPE devices could do something similar, perhaps even using the same infrastructure.

Wednesday, June 6, 2018

Reading List: High Output Management

High Output Management by Andy Grove was first published in 1983, making it one of the earliest books about management in the technology industry and an influential book about management overall. I recently read the 2nd edition, revised in 2015.

Though the revisions help in updating the material, the book still carries a strong flavor of the 1980s. Some of the examples concern Japanese DRAM manufacturers crowding out US firms, the rise of the PC industry, and the business climate of email beginning to replace telephone calls and memos. Nonetheless, management techniques change much more slowly than technology, and there is quite a bit of useful material in the book.

Some key takeaways for me:


 

Manager output = output of org + output of adjacent orgs under their influence

Grove’s point is that managers should be evaluated based on the performance of their own organization, plus the extent to which they influence the output of those who don’t directly report to them. This is especially important for knowledge leaders who provide technical direction for a large organization in particular areas, but without having large numbers of people reporting to them on the orgchart. The examples Grove uses are typically concerned with manufacturing and production, which was a particular strength and focus of his at Intel.

It is notable that 30+ years later, we’re still not very good at evaluating management performance in influencing adjacent organizations. Manager evaluations focus mostly on their direct reports, because that is more straightforward to judge. The incentives for managers are therefore to grow their org as large as possible, which isn’t always the best thing for the company even if it is the best thing for the manager.


 

Choose indicators carefully, and monitor them closely

It is important to monitor output, not just activity, or you’ll end up emphasizing busywork. An example Grove gives is a metric of the number of invoices processed by an internal team. That metric should be paired with a count of the number of errors produced. Any productivity metric needs to be paired with a quality measurement, to ensure that the team doesn’t feel incentivized to produce as much sloppy work as possible.

Even more importantly, the indicators need to be credible. If you won't act on them by taking big (and possibly expensive) steps, then all the monitoring will produce is anxiety. The business indicators need to be invested with sufficient faith to act when a new trend is clear, even if that trend has yet to percolate up in other, more visible, ways.


 

Management can usually be forecasted and scheduled

Though we will always deal with interruptions or emergencies or unexpected issues, a big portion of a manager’s job is predictable. You know how often you should have career discussions with team members, and when performance appraisals should be done, so put career discussions on the calendar a quarter before performance appraisals. You know when budgeting will be done, put milestone planning on the calendar two months before that.

For lull times between the scheduled activities, Grove recommends a backlog of manager tasks which need to be done but don’t have a hard deadline. This also nicely reduces the temptation to fill the lull periods by meddling in the work of subordinates.

I feel like this is something management as a profession has gotten better at since the book was initially written. Practices may vary across companies, but on the whole I feel like there is perhaps more structure for managers than the book implies from earlier times.


 

Now, a disagreement: technical half-life

Grove makes a point several times that technology changes quickly so the company needs to keep hiring younger workers straight out of university, where they will have learned the latest technology. As engineers become more senior they can move into leadership and management roles and leave the technology to those more recently graduated.

I find this not credible, for several reasons:

  • It assumes that technology work is 100% technical, that communications skills and leadership are entirely separate and can be supplied by those senior engineers who move into management roles.
  • There are far fewer managers than engineers. This idea takes it as given that universities should produce a large number of grads for corporations to chew through, and discard most of them in favor of fresh graduates. It seems like corporations could find a better use for their senior engineers than to discard most of them.
  • It implies that all of this new tech comes from somewhere else, perhaps from Universities themselves, and that senior engineers play no role in developing it.

Wednesday, May 2, 2018

We Edited DNA in our Kitchen. You Can Too!

When our children expressed an interest in DNA and genetic engineering, we wanted to encourage their curiosity and interest. We went looking for books we could read, videos we could watch, etc.

However as we all now live in the future, there is a much more direct way to inspire their interest in genetic engineering: we could engineer some genes, in our kitchen. Of course.

We bought a kit from The Odin, a company which aims to make biological engineering and genetic design accessible and available to everyone. The kit contains all of the supplies and chemicals needed to modify yeast DNA: Genetically Engineer Any Brewing or Baking Yeast to Fluoresce

Altogether the exercise took about a week, most of which was spent allowing the yeast time to grow and multiply. If we had an incubator we could have sped this up, but an incubator is not essential for a successful experiment.

The first step was to create a healthy colony of unmodified yeast. We mixed a yeast growth medium called YPD, rehydrated the dried yeast, and spread everything onto petri dishes. The yellowish gel on the bottom of the dish is the growth medium, the droplets are the rehydrated yeast.

After giving the yeast several days to grow, we took up a bit of it into a small tube. We would be modifying the DNA of the yeast in the tube, and would later be able to compare it to our unmodified yeast.

The next steps are the amazing stuff.

We used a pipette to add a tiny amount of transformation matrix. This mixture prepares the yeast cells to take in new DNA.

We then used the pipette to add the GFP Expression Plasmid. GFP is Green Fluorescent Protein, and is what makes jellyfish glow in blue light. The GFP Expression Plasmid bundles the DNA segment for the jellyfish gene together with CRISPR as the delivery mechanism.

Swirling the yeast together with the plasmid is how we edited DNA in our kitchen. Over several hours, CRISPR transferred the new gene into the yeast cells in the tube. We incubated the tube for a day, then spread it onto a fresh petri dish to spend a few more days growing.


Voila: shining a blue light on the original dish of unmodified yeast versus the dish with our genetically engineered strain, you can see the difference. Our modified yeast glows a soft green. This is the Green Fluorescent Protein which our modified yeast produces.

This wasn’t a difficult experiment to perform, every step was straightforward and the instructions were quite clear. The kids got a great deal out of it, and are enthused about learning more.

We genetically engineered yeast in our kitchen. You can too!
Genetically Engineer Any Brewing or Baking Yeast to Fluoresce

Monday, April 30, 2018

Automated Blackmail at Scale

I received a blackmail letter in the postal mail yesterday. Yes, really. It begins thusly:

Hello Denton, I’m going to cut to the chase. My name is SwiftBreak~15 and I know about the secret you are keeping from your wife and everyone else. More importantly, I have evidence of what you have been hiding. I won’t go into the specifics here in case your wife intercepts this, but you know what I am talking about.

You don’t know me personally and nobody hired me to look into you. Nor did I go out looking to burn you. It is just your bad luck that I stumbled across your misadventures while working on a job around <redacted name of town>. I then put in more time than I probably should have looking into your life. Frankly, I am ready to forget all about you and let you get on with your life. And I am going to give you two options that will accomplish that very thing. Those two options are to either ignore this letter, or simply pay me $8,600. Let’s examine those two options in more detail.

In email this wouldn't be notable. I probably wouldn't even see it as it would be classified as spam. Via postal mail though, it is unusual. Postal spam is usually less interesting than this.

The letter went on to describe the consequences should I ignore it, how going to the police would be useless because the extortionist was so very good at covering their tracks, and gave a bitcoin address to send the payment to.

There are several clues that this was an automated mass mailing:

  • It helpfully included a How To Bitcoin page, which seemed odd for an individual letter (though crucial to make the scam work).
  • It looked like a form letter, inserting my first name and street name at several points.
  • Perhaps most importantly, I don't have any kind of secret which I could be blackmailed over. I don't live that kind of life. Reading the first paragraph was fairly mystifying as I had no idea what secret they were referring to.

I haven't written about bitcoin before as, other than wishing I'd mined a bunch of coins in 2013 or so, I find it farcical. However cryptocurrency is key in enabling things like this automated blackmail at scale, by providing a mostly anonymous way to transfer money online.

I am by no means the first person to be targeted by this scam:

  • Dave Eargle received an early version of the letter, which called out infidelity specifically. The letter I received was completely vague as to the nature of the scandalous secret.
  • Joshua Bernoff received a letter earlier this month which looks very similar to mine.
  • As the scam has grown, various news outlets have covered it: CNBC, Krebs On Security. The news coverage occurred in a burst in January 2018, covering Dave Eargle.

The amount of money demanded has increased over time. The 2016 letter which Dave Eargle received demanded $2,000. The April 2018 letter which Joshua Bernoff received demanded $8,350. My letter demanded $8,600. I imagine the perpetrator(s) are fine-tuning their demand based on response rates from previous campaigns. More sophisticated demographic targeting is possible, I suppose, but the simpler explanation seems more likely.

I'll include the complete text of the letter at the end of this post, to help anyone else targeted by this scam to find it. I'm also trying to figure out if there is somewhere at USPS to send the physical letter to. Using the postal service to deliver extortion letters is a crime, albeit in this case one where it would be difficult to identify the perpetrator.


 
 


 

Hello Denton, I’m going to cut to the chase. My name is SwiftBreak~15 and I know about the secret you are keeping from your wife and everyone else. More importantly, I have evidence of what you have been hiding. I won’t go into the specifics here in case your wife intercepts this, but you know what I am talking about.

You don’t know me personally and nobody hired me to look into you. Nor did I go out looking to burn you. It is just your bad luck that I stumbled across your misadventures while working on a job around <redacted name of town>. I then put in more time than I probably should have looking into your life. Frankly, I am ready to forget all about you and let you get on with your life. And I am going to give you two options that will accomplish that very thing. Those two options are to either ignore this letter, or simply pay me $8,600. Let’s examine those two options in more detail.

Option 1 is to ignore this letter. Let me tell you what will happen if you choose this path. I will take this evidence and send it to your wife. And as insurance against you intercepting it before your wife gets it, I will also send copies to her friends, family, and your neighbors on and around <redacted name of street>. So, Denton, even if you decide to come clean with your wife, it won’t protect her from the humiliation she will feel when her friends and family find out your sordid details from me.

Option 2 is to pay me $8,600. We’ll call this my “confidentiality fee.” Now let me tell you what happens if you choose this path. Your secret remains your secret. You go on with your life as though none of this ever happened. Though you may want to do a better job at keeping your misdeeds secret in the future.

At this point you may be thinking, “I’ll just go to the cops.” Which is why I have taken steps to ensure this letter cannot be traced back to me. So that won’t help, and it won’t stop the evidence from destroying your life. I’m not looking to break your bank. I just want to be compensated for the time I put into investigating you.

Let’s assume you have decided to make all this go away and pay me the confidentiality fee. In keeping with my strategy to not go to jail, we will not meet in person and there will be no physical exchange of cash. You will pay me anonymously using bitcoin. If you want me to keep your secret, then send $8,600 in BITCOIN to the Receiving Bitcoin Address listed below. Payment MUST be received within 10 days of the post marked date on this letter’s envelope. If you are not familiar with bitcoin, attached is a “How-To” guide. You will need the below two pieces of information when referencing the guide.

Required Amount: $8,600
Receiving Bitcoin Address: <redacted>

Tell no one what you will be using the bitcoin for or they may not give it to you. The procedure to obtain bitcoin can take a day or two so do not put it off. Again, payment must be received within 10 days of this letter’s post marked date. If I don’t receive the bitcoin by the deadline, I will go ahead and release the evidence to everyone. If you go that route, then the least you could do is tell your wife so she can come up with an excuse to prepare her friends and family before they find out. The clock is ticking, Denton.

Wednesday, January 24, 2018

I Know What You Are by the Smell of Your Wi-Fi

In July 2017 I gave a talk at DEFCON 25 describing a technique to identify the type of Wi-Fi client connecting to an Access Point. It can be quite specific: it can distinguish an iPhone 5 from an iPhone 5s, a Samsung Galaxy S7 from an S8, etc. Classically in the security literature this type of mechanism would have been called "fingerprinting," but in modern usage that term has evolved to mean identification of a specific individual user. Because this mechanism identifies the species of the device, not the specific individual, we refer to it as Wi-Fi Taxonomy.

The mechanism works by examining Wi-Fi management frames, called MLME frames. It extracts the options present in the client's packets into a signature string, which is quite distinctive to the combination of the Wi-Fi chipset, device driver, and client OS.
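As a rough illustration of the idea (this is not the signature format used by the published database), the sketch below walks the information elements in the tagged-parameters portion of a management frame body and joins their element IDs into a signature string.

    def ie_signature(tagged_params: bytes) -> str:
        """Build a simple signature from the information elements in a Wi-Fi
        management frame body.

        Each information element is encoded as a 1-byte element ID, a 1-byte
        length, and the value. The real taxonomy signatures also incorporate
        selected option contents (HT/VHT capabilities and so on); this sketch
        records only the element IDs and their order.
        """
        ids, i = [], 0
        while i + 2 <= len(tagged_params):
            elem_id, length = tagged_params[i], tagged_params[i + 1]
            ids.append(str(elem_id))
            i += 2 + length
        return ",".join(ids)

    # Example: SSID (ID 0), Supported Rates (ID 1), HT Capabilities (ID 45).
    frame = bytes([0, 4]) + b"home" + bytes([1, 2, 0x82, 0x84]) + bytes([45, 0])
    print(ie_signature(frame))   # prints "0,1,45"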

The video of the talk has been posted by DEF CON:

Additionally:

  • The slides are available in PDF format from the DEFCON media server, and the speaker notes on the slides contain the complete talk.
  • The database of signatures to identify devices is available as open source code with an Apache license as a GitHub repository.
  • There is also a paper which describes the mechanism, and which goes a level of detail deeper into how it works. It is available from arXiv.

Tuesday, January 23, 2018

Yakthulhu

Behold: the Yakthulhu. It is a tiny Cthulhu made from the hair of shaved yaks.

(Really, it is. Yak hair yarn is a thing which one can buy. Disappointingly though, they do not shave the yaks. They comb the yaks.)

Saturday, October 21, 2017

On CPE Release Processes

Datacenter software is deployed frequently. Push daily! Push hourly! Push on green whenever the tests pass! This works even at extremely large scale, new versions of facebook.com are deployed multiple times each day (much of the site functionality is packaged in a single deployable unit).

CPE device software tends to not be deployed so often, not even close. There are several reasons for this:

  • Test practices are different.

    Embedded systems development is one of the oldest niches in software and does not have a strong tradition even of unit testing, let alone the level of automated testing which makes concepts like push-on-green possible. One can definitely get good unit test coverage of code which the team developed, but the typical system includes a much larger amount of open source code which rarely has unit tests and is daunting for the team to try to add tests to. Much of the code in the system is only going to be tested at the system level. With effort and effective incentives one can develop a level of automated system test coverage... but it still won't be close to 95%. System-level testing never is; the combinatorial complexity is too high.

    Additionally, with datacenter software, the build system creating the release is often somewhat similar to the production system which will run the release. It may even be the same, if the development team uses prod systems to farm out builds. A reasonable fraction of the system functionality can be run in tests on the builder.

    With CPE devices, the build system is almost always not a CPE being tasked to compile everything. The build system is an x86 server with a cross-compiler. The build system will likely lack much of the hardware which is key to the CPE device functionality, like network interfaces or DRM keystores or video decoders. Large portions of the system may not be testable on the builder.

  • The scale is different.

    Having a million servers in datacenters is a lot, that is one or more very large computing facilities capable of serving hundreds of millions of customers.

    Having a million CPE devices is not a lot. There are typically multiple devices within the home (modem, router, maybe some set top boxes), so that is a couple hundred thousand customers.

    It can simply take longer to push that amount of software to the much larger number of systems whose network connections will generally be slower than those within the datacenter. Multiple days is typical.

  • The impact of a problem in deployment is different.

    If you have a serious latent bug which is noticed at the 3% point of a rollout within a datacenter, that is probably a survivable event. Customers may be impacted and notice, but you can generally quarantine those 3% of servers from further traffic to end the problem. The servers can be rolled back and restored to service later, even if remediation steps are required, without further impacting customers.

    If you have a serious latent bug which is noticed at the 3% point of a rollout within a CPE fleet, you now have a crisis. 3% of the customer base is impacted by a serious bug, and will feel the impact until you finish all of the remediation steps.

    If the remediation steps in 3% of a datacenter rollout require manual intervention, that will be a significant cost. If the remediation steps in 3% of a CPE Fleet deployment require manual intervention, it will have a material impact on the business.

We’ll jump straight to the punchline: How often should one deploy software updates to a CPE fleet?

In my opinion: exactly as often as it takes to not feel terrified at the prospect of the next release, no more and no less often than that.

  • Releasing infrequently allows requirements and new development to build up, making the release heavier and with more opportunities for accidental breakage. It also results in developer displeasure at having to wait so long for their work to make it to customers, and corresponding rush to get not-quite-baked features in to avoid missing the release.
  • Releasing too frequently can leave not enough time to fully test a release. Though frequent releases have the advantage of having a much smaller set of changes in each, there does still need to be a reasonable confidence in testing.

In the last CPE fleet I was involved in, we tried a number of different cadences: every 6 weeks, then weekly, then quarterly. I believe the 6 week cadence worked best. The weekly cadence resulted in a number of bugs being pushed to the fleet and subsequent rollbacks simply due to the lack of time to test. The quarterly cadence led to developers engaging in bad behavior to avoid missing a release train, by submitting their feature even in terrible shape. The release cadence became even slower, and the quality of the product noticeably lower. I think six weeks was a reasonable compromise, and left enough headroom to do minor releases at the halfway point as needed where a very small number of changes which were already tested for the next release could be delivered to customers early.

One other bit of advice: no matter what the release cadence is, once it has been going on long enough, developers will begin griping about it and the leadership may begin to question it (Maxim #4). Leadership interference is what led to the widely varying release processes in the last CPE fleet I was involved in. My only advice there is to manage upwards: announce every release, and copy your management, to keep it fresh in their minds that the process works and delivers updates regularly.

Wednesday, October 4, 2017

Bad Physics Jokes

Recently I posted a number of truly terrible physics jokes to Twitter, as one does. For your edification and bemusement, here they are:

  • If you accelerate toward the red light fast enough, the blue shift turns it green again.
  • Two people are walking up a frictionless hill...
    (that's it. That's the joke.)
  • Whenever a neutron asks the price of anything: "For you, no charge."
  • Whenever the proton is asked if they are sure: "I'm absolutely positive."
  • The electron isn't invited to the party in the nucleus. The other particles find it boorish, being so constantly negative all the time.
  • The Higgs Boson conveys the gravity of the situation to other particles.
  • Photons make light of EVERYTHING. They can be SO inappropriate sometimes.
  • You might assume that Gravitons would be extroverts attracted to large groups, but no. They're actually really, really shy.
  • Despite the occasional unverified sighting, experts agree that Phlogistons are the Bigfoot of the subatomic particle world.
  • Neutrinos are the tragic poets of the subatomic world. They yearn for interaction, but know that it can never be.
  • As a community, Protons realized that their diet and exercise habits needed to improve.
  • Other particles wish they could help the Pion, but don't know what to do. Even the smallest thing can make them fall to pieces.

Sunday, September 24, 2017

There is No Feminist Cabal

From the 23-Sep-2017 New York Times:

One of those who said there had been a change is James Altizer, an engineer at the chip maker Nvidia. Mr. Altizer, 52, said he had realized a few years ago that feminists in Silicon Valley had formed a cabal whose goal was to subjugate men. At the time, he said, he was one of the few with that view.

Now Mr. Altizer said he was less alone. "There’s quite a few people going through that in Silicon Valley right now," he said. "It’s exploding. It’s mostly young men, younger than me."

I want to share some experiences, as another white male in Tech for a similar number of years as Mr. Altizer.

A while ago I made a conscious effort to follow more women in Tech on Twitter, to deliberately maintain a ratio of ~50% in those I follow. I wanted to try for more perspective than that provided by my own vantage point in the industry, where the gender ratio is definitely not 50%. It has been illuminating... and often painful. Intermixed with happy and proud events in life and work is the constant level of sexism which women experience. Sometimes it is blatant and vile: intimidation, physical threats. More often it is a grinding, ever present disrespect from men. It is so commonplace that it becomes completely expected, often mentioned in an offhand way. This doesn’t mean it is a minor thing, it means that it never stops, isn’t possible to avoid, and ceases to be surprising.

You likely won’t hear this in person, from women you work with. That doesn’t mean women you work with are experiencing something different. It doesn’t mean that their career and work environment are free of sexism and discrimination. It means that talking about it in person is asking them to relive those events, sometimes extraordinarily painful events. It means it is vastly more difficult to relate horrible experiences in person, in conversation. It is understandable to not want to talk about it.


Yet one thing I haven’t seen, not even a hint of, is the existence of a powerful group of women who are organizing to oppress men. I’ve seen no evidence of any kind of backlash against men in Tech. Jokes about a Cabal started circulating after the NYT story was published. This was irony, not confirmation.

We’re hearing more about sexism in Tech, far more than we did even a year ago. I think, I hope, that is because we are in the early stages of the extinction burst. When a behavior which was formerly rewarded no longer is, that behavior will begin to decline... except for a final gasp, a final burst, in trying to turn back the clock. The process of acknowledging the disparities in Tech has been ongoing for many years, slowly. It has reached a point where the industry is starting to respond, if only a little. That the response may grow stronger will feel like a threat, like a backlash against males. It really isn't. It is about disparity, and doing something to rectify that long-present disparity.




I’m posting this because it is unfair to expect people in disadvantaged groups to carry the entire burden of correcting the disadvantage. In Tech the advantaged group is males, and as I happen to be a male in Tech, that means me. I have not been nearly enough a part of the solution. That needs to change.

Males in Tech have almost certainly witnessed aggressions: a woman being spoken over, or not being invited to a meeting she should be, or not receiving sufficient credit for her work. One thing learned from following women in tech is that no matter how much we think we understand, the aggressions they go through happen orders of magnitude more frequently than we think. For every occurrence we know of there are ten more, a hundred more, a thousand more, which we don’t see and don't grasp the frequency of because we are not female.

We, males in Tech, need to speak out. We need to speak out frequently and firmly. So I am. My voice isn’t powerful, but power can be achieved in numbers too. I’m adding my voice to the multitude. You should too; not just as a blog post, speak out when you see the grinding aggression happening.

Thursday, September 21, 2017

Software Engineering Maxims which May or May Not Be True

This is a series of Software Engineering Maxims Which May or May Not Be True, developed over the last few years of working at Google. Your mileage may vary. Use only as directed. Past performance is not a predictor of future results. Etc.




Maxim #1: Small teams are bigger than large teams

In my mind, the ideal size for a software team is seven engineers. It does not have to be exactly seven: six is fine, eight is fine, but the further the team gets from the ideal the harder it is to get things done. Three people isn’t enough and limits impact, fourteen is too many to effectively coordinate and communicate amongst.

Organizing larger projects becomes an exercise in modularizing the system to allow teams of about seven people to own the delivery of a well-defined piece of the overall product. The largest parts of the system will end up with clusters of teams working on different aspects of the system.




Maxim #2: Enthusiasm improves productivity.

By far the best way to improve engineering productivity is to have people working on something which they are genuinely enthused about. It is beneficial in many ways:

  • the quality of the product will benefit from the care and attention
  • people don’t let themselves get blocked by something else when they are eager to see the end result
  • they’ll come up with ways to make the product even better, by way of their own resourcefulness
  • people are simply happier, which has numerous benefits in morale and working environment.

There are usually way more tasks on the project wish list than can realistically be delivered. Some of those tasks will be more important than others, but it is rarely the case that there is a strict priority order in the task list. More often we have broad buckets of tasks:

  • crucial, can’t have the product without it
  • nice to have
  • won’t do yet: too much crazy, or get to it eventually, or something

The crucial tasks have to be done, even the ones which no-one particularly wants to do.

In my mind, when selecting from the (lengthy) list of nice-to-have tasks, the enthusiasm of the engineering team should be a factor in the choices. The team will deliver more if they can work on things they are genuinely interested in doing.




Maxim #3: Project plans should have top-down and bottom-up elements

It is possible for a team to work entirely from a task list, where Product Management and the Leadership add stuff and the team marks things off as they are completed. This is not a great way to work, but it is possible.

It is better if the team spend some fraction of their time on tasks which germinated from within the team - not merely 20% time, a portion of regular work should be on tasks which the team itself came up with.

  • The team tends to generate immediately practical ideas, things which build upon the product as it exists today and provide useful extensions.
  • It is good for morale.
  • It is good for careers. Showing initiative and technical leadership is good for a software engineer.



Maxim #4: Bricking the fleet is bad for business

Activities with a risk of irreparable consequences deserve more care. This sounds obvious, like something which no-one would ever disagree with, but in day-to-day engineering work those tasks won’t look like something which requires that extra level of care. Instead they will look like something which has been running for years and never failed, something which fades into the background and can be safely ignored because it is so reliable.

Calls to add this risk will not be phrased as "be cavalier about something which can ruin us." They will be phrased as increasing velocity, or lowering cost, or not being stuck doing things the old way - all of which might be true; it just means the change needs more care and attention.




Maxim #5: There is an ideal rate of breakage: no more, no less

Breaking things too often is a sign of trying to do too much too quickly, and either excessively dividing attention or not allowing time for proper care to be taken.

Not breaking things often enough is a sign of the opposite problem: not pushing hard enough.

I’m sure it is theoretically possible for a team to move at an absolutely optimal speed such that they maximize their results without ever breaking anything, but I’ve no idea how to achieve it. The next best thing is to strive for just the right amount of breakage: not too much, not too little.




Maxim #6: It’s a marathon, not a sprint

"Launch and iterate" is a catchy phrase, but often turns into excuses to launch something sub-par and then never getting around to improving it.

Yet there is a real advantage to being in a market for the long term, launching a product and improving it. Customer happiness is earned over time, not all at once with a big launch.

  • This means structuring teams for sustained effort, not big product pushes.
  • It means triaging bugs: not everything will get fixed right away, but should at least be looked at to assess relative priority.
  • It means really thinking about how to support the product in the field.
  • It means not running projects in a way which burn people out.



Maxim #7: The service is the product

The product is not the code. The product is not the specific feature we launched last week, nor the big thing we’re working on launching next week.

The product is the Service. The Product which customers care about is that they can get on the Internet and do what they need to do, that they can turn on the TV and have it work, that they can make phone calls, whatever it is they set out to do.




Maxim #8: Money is not the only motivator

A monetary bonus is one tool available for managers to reward good work. It is not the only tool, and is not necessarily the best tool for all situations.

For example, to encourage SWEs to write automated system tests we created the Yakthulhu of Testing. It is a tiny Cthulhu made from the hair of shaved yaks (*). A Yakthulhu can be obtained by checking in one’s first automated test to run against the product.

(*) It really is made from yak hair. Yak hair yarn is a thing which one can buy. Disappointingly though, they do not shave the yaks. They comb the yaks.




Maxim #9: Evolve systems as a series of incremental changes

There is substantial value in code which has seen action in the field. It contains a series of small and large decisions, fixes, and responses which made the system better over time. Generally these decisions are not recorded as a list of lessons learned to be applied to a rewrite or to the next system.

Whenever possible, systems should evolve as a series of incremental changes to take it from where it is to where we want it to be. Doing this incrementally has several advantages:

  • benefits are delivered to customers much earlier, as the earliest pieces to be completed don’t have to wait for the later pieces before deployment.
  • there is no stagnant period in the field after work on the old system is stopped but before the new system is ready.
  • once the system is close enough to where we want it to be that other stuff moves higher on the list of priorities, we can stop. We don’t have to push on to finish rewriting all of it.



Maxim #10: Risk is multiplicative

There is a school of thought that when there are multiple large projects going on, and there is some relation between them, they should be tied together and made dependent upon each other. The arguments for doing so are often:

  • "We’re going to pay careful attention to those projects, making them one project means we’ll be able to track them more effectively."
  • "There was going to be duplication of effort, we can share implementation of the common systems."
  • "We can better manage the team if we have more people available to be redirected to the pieces which need more help."

The trouble with this is that it glosses over the fundamental decision being made: nothing can ship until all of it ships. Combining risks makes a single, bigger risk out of the multiple smaller risks.
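
A toy illustration of the arithmetic (numbers invented, and assuming the projects would otherwise succeed independently): if each of three projects has a 90% chance of hitting its date on its own, tying them together means nothing ships unless all three ship.

    # Toy numbers, purely illustrative: three projects, each with an
    # independent 90% chance of shipping on schedule.
    p_single = 0.9
    projects = 3

    # Tied together, nothing ships unless every one of them ships.
    p_combined = p_single ** projects

    print(f"one project on time: {p_single:.0%}")             # 90%
    print(f"all {projects} tied together: {p_combined:.0%}")  # ~73%

Three projects that would each very likely succeed on their own become a single combined effort with roughly a one-in-four chance of slipping.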




Maxim #11: Don’t shoot the monitoring

There is a peculiar dynamic when systems contain a mix of modules with very good monitoring along with modules with very poor monitoring; the modules with good monitoring report all of the errors.

The peculiarity becomes damaging if the result is to have all of the bugs filed against the components with good monitoring. It makes it look like those modules are full of bugs, when the reality is likely the opposite.




Maxim #12: No postmortem prior to mortem

There are going to be emergencies. It happens, despite our best efforts to anticipate risks. When it happens, we go into damage control mode to resolve it.

People not involved in handling the emergency will begin to ask about a postmortem almost immediately, even before the problem is resolved. It is important not to begin writing the postmortem until the problem has been mitigated. Starting it earlier turns a unified crisis response into a hotbed of fingerpointing and intrigue. Even in a culture of blameless postmortems, it is difficult to avoid the harmful effects of the hints of blame that surface while writing that blameless postmortem.

It is fine, even crucial, to save information for later drafting of the postmortem. IRC logs, lists of bugs/CLs/etc, will all be needed eventually. Just don’t start a draft of a postmortem while still antemortem.




Maxim #13: Cadence trumps mechanism

We tend to focus a lot on mechanisms in software engineering as a way to increase velocity or productivity. We reduce the friction of releasing software, or we automate something, and we expect that this will result in more of the activity which we want to optimize.

Inertia is a powerful thing. A product at rest will tend to stay at rest; a product in motion will tend to stay in motion. The best way to release a bunch of software is to release a bunch of software: set a cadence and stick to it. People get used to a cadence and it becomes self-reinforcing. Making something easier may or may not improve velocity; making it more regular almost always does.




Maxim #14: Churn wastes more than time

Project plans change. It happens.

When plans change too often, or when a crucial plan is suddenly cancelled and destaffed, we burn more than just the time which was spent on the old plan. We burn confidence in the next plan. People don’t commit as readily and don’t put their best effort into it until they’re pretty sure the new plan will stick.

In the worst case, this becomes self-reinforcing. New plans fail because of the lack of effort engendered by the failure of previous plans.




Maxim #15: Sometimes the hard way is the right way

For any given task, there is often some person somewhere who has done it before and knows exactly what to do and how to do it. For things which are likely to be done once in a project and never repeated, relying on that person (either to do it or to provide exactly what to do step by step) can significantly speed things up.

For things which are likely to be repeated, or refined, or iterated upon, it can be better to not rely on that one expert. Learning by doing results in a much deeper understanding than just following directions. For an area which is core to the product or will be extended upon repeatedly, the deeper understanding is valuable, and is worth acquiring even if it takes longer.




Maxim #16: Spreading knowledge makes it thicker

Pigeonholing is rampant in software engineering: engineers who have become experts in a particular area always end up being assigned tasks in that area.

There are occasions where that is optimal, where becoming a subject matter expert takes substantial time and effort, but these situations are rare. In most cases it is not the expense of becoming an expert that keeps an engineer doing similar work over and over, it is just complacency.

Areas of the product where the team needs to keep expending effort over a long period should rotate among different members of the team. Multiple people familiar with an area will reinforce each other. Additionally, teaching the next person is a very effective way to deepen one’s own understanding.




Maxim #17: Software Managers must code

When one first transitions from being an individual contributor software engineer to being a manager, there is a decision to be made: whether to stop doing work as an individual contributor and focus entirely on the new role in guiding the team, or to keep doing IC work as well as management.

There are definitely incentives to focus entirely on management: one can have a much bigger impact via effective use of a team than by one’s own effort alone. When a new manager makes that choice, they get a couple of really good years. They have more time to plan, more time to strategize, and the team carries it all out.

The danger in this path comes later: one forgets how hard things really are. One forgets how long things take. The plans and strategies become less effective because they no longer reflect reality.

Software managers need to continue to do engineering work, at least a little, to stay grounded in reality.




Maxim #18: Manage without a net

Managers and Tech Leads cannot depend on escalation. We sometimes believe that the layers of management above us exist in order to handle things which the lower layers are having trouble with. In reality, those upper layers have their own goals and priorities, and they generally do not include handling things bubbling up from below.

Do not rely on Deus Ex Magisterio from above; organizations do not work that way.




Maxim #19: Goodwill can be spent. Spend wisely.

Doing good work accumulates goodwill. It is helpful to have a pile of goodwill; it tends to make interactions smoother and generally makes things easier.

Nonetheless, it is possible to spend goodwill on something important: to redirect a path, to right a wrong, etc. Sometimes spending goodwill is the right thing to do. Don’t spend it frivolously.




Maxim #20: Everyone finds their own experience most compelling

"We should do A. I did that on my last project, and it was great."

"No, we should do B. I did that on my last project, and it was great."

Comparing experiences rarely builds consensus; everyone believes their own experiences to be the most convincing. Comparing experiences really only works when there is a power imbalance, when the person advocating A or B also happens to be a Decider.

In most cases, simply being aware of this phenomenon is sufficient to avoid damaging disagreements. The team needs to find other ways to pick its path forward, such as shared experiences or quantitative measurements, not just firmly held belief.

Friday, September 15, 2017

On CPE Cost

When it comes to the cost of hardware, volume matters more than anything else. To a large extent, volume matters more than everything else put together. A cost-efficient hardware design produced in low volume will be considerably more expensive than an inefficient and sloppy design produced in high volume. Plus, for a high-volume product, the Contract Manufacturer will have engineering teams to help tighten the design for a moderate fee.

If your own sales volume is sufficient to get deep volume discounting, you can stop reading now (more honestly, you aren't reading this in the first place). Otherwise, if you are building a product for a new market or you are building for a niche, read on.

What does this mean? It means you should work very, very hard to use hardware which is produced in high volume. The compromises you would make in RAM or other capabilities to get your own custom design down to a tolerable price will cost you far more over the service lifetime, in software updates and capabilities forgone, than they saved. Using an existing, high-volume design may bring other compromises, but it is a good tradeoff to make.

If you want your branding on the box: many commercial off-the-shelf (COTS) devices are available in unbranded white-box versions. Adding silkscreening or design flourishes is simple and easy, often just a one-time design fee and a tiny line item on the Bill of Materials.

If you want to add RAM, Flash, moderately faster CPU, etc: most of those white-box products allow customization of specs which do not require changes in the board design. RAM and Flash suppliers offer different capacities in the same pinout, and CPU vendors offer multiple speed-bins of their chips. There will be a sweet spot in the market where the industry is buying the most volume, with a reasonable standard deviation such that you can moderately increase the capability without substantially increasing the cost. The converse is also true: moderate reductions in RAM/Flash/CPU don’t substantially decrease cost and may not be a good tradeoff.

If you want to have a unique industrial design: many ODMs will customize a product for you, including a new casing. It will need to fit the existing board, and will cost a few hundred thousand dollars for design, tooling, and emissions testing, but that is still cheaper than taking it all on in-house as you get the volume pricing for the board and other components.

Corollaries:

  • Mobile ate the world. You shouldn’t shy away from using mobile chipsets, even if your product will never operate on battery. Volume drives cost down, and mobile has the volume. Also, mobile chipsets with good power management are less in need of active cooling, and fanless is a huge win for consumer products.
  • RAM does cost money, but RAM is your future proofing. Greatly reducing RAM to lower cost is usually a bad tradeoff. Raspberry Pi Zero has 512 MBytes of RAM and costs US $10. Moderate amounts of RAM do not add much cost.
  • Many modern CPUs have configurable endianness, but seriously: little endian won. I hate that it won, but it did. If you’re considering a big-endian toolchain, think carefully about the life choices that led you to that dark place. You’ll be taking endianness bugs onto your own plate for no benefit; a small sketch of that kind of bug follows below.
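
To make the endianness point concrete, here is a minimal sketch in Python; the "wire format" is invented purely for illustration (a 4-byte length field sent in little-endian order). Code that decodes fields using the host’s native byte order happens to work on little-endian machines and silently misreads the data on big-endian ones; spelling out the byte order avoids that whole class of bug.

    # Minimal, hypothetical example: a 4-byte length field sent little-endian.
    import struct

    wire = struct.pack("<I", 0x01020304)  # the bytes as the peer sends them

    # Explicit byte order: correct on every host.
    length_portable = struct.unpack("<I", wire)[0]

    # Native byte order: matches by luck on little-endian hosts,
    # decodes as 0x04030201 on a big-endian host.
    length_native = struct.unpack("=I", wire)[0]

    print(hex(length_portable))  # 0x1020304 everywhere
    print(hex(length_native))    # 0x1020304 on little-endian, 0x4030201 on big-endian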