Friday, October 23, 2009

ARM Cortex A5

On October 21 ARM announced a new CPU design at the low end of their range, the Cortex A5. It is intended to replace the earlier ARM7, ARM9, and ARM11 parts. The A5 can have up to 4 cores, but given its positioning as the smallest ARM the single core variant will likely dominate.

To me the most interesting aspect of this announcement is that when the Cortex A5 ships (in 2011!) all ARM processors, regardless of price range, will have an MMU. The older ARM7TDMI relied on purely physical addressing, necessitating uClinux, VxWorks, or a similar RTOS. With the ARM Cortex processor family, any application requiring a 32 bit CPU will have the standard Linux kernel as an option.

Story first noted at Ars Technica

Cortex A5

 
Update

In the comments Dave Cason and Brooks Moses point out that I missed the ARM Cortex-M range of processors, which are considerably smaller than the Cortex-A. The Cortex-Ms do not have a full MMU, though some parts in the range have a memory protection unit. So it is not the case that all ARM processors will now have MMUs. Mea culpa.

Monday, October 19, 2009

Tweetravenous Solutions

Recently Rob Diana wrote a guest post at WorkAwesome, Balancing Work And Social Media Addiction. In it, he said:

"If your day job is sitting in a cube or corporate office somewhere, then you will need to limit your activity in some way. If you want to be like Robert Scoble or Louis Gray, you will have to give up some sleep to stay active on several sites."

Twitter Skin Patch We at Tweetravenous Solutions have an alternative to offer, one which allows our customers to handle a full day's work, a full day of social media, and still get a full night's sleep! We call it the Social Media Patch. Our patented algorithms monitor your social media feeds, collating and condensing updates from those you follow and encoding them in a complex matrix of proteins, amino acids, and caffeine. It's all infused into an easy-to-apply skin patch which can be worn inconspicuously beneath clothing. Even better, it's in Real Time! (note1)


 
Social Media Skin patches for twitter, friendfeed, and RSS.  

Never go without your twitter fix again!

Feed your friendfeed need!

Works with any RSS feed, too! (note2)


 
Wait, there's more! For the true Social Media Expert wanting to get an edge on the competition, we offer additional treatment options administered in our state-of-the-art facility!
Twitter IV drip

Coming soon: two way communication! That's right, you'll be able to retweet or reply using only the power of your mind! (and bodily secretions) (note3)


 

Don't wait, call now!


 

note1: Real Time is defined as 48 hours, to allow for processing and express overnight delivery.

note2: pubsubhubbub and rssCloud support coming soon.

note3: Two way communication will incur additional cost, and add processing time. US Postal Service will not accept bodily secretions for delivery.


 

Author's note: I enjoyed Rob Diana's article, and recommend that people read it. When I saw "Social Media Addiction" in its title, the thought of a nicotine patch came spontaneously to mind. It became a moral imperative to write this up.

Also: sadly, modern browsers no longer render the <blink> tag. Too bad, it would have been sweet.


 

Wednesday, October 14, 2009

AMD IOMMU: Missed Opportunity?

In 2007 AMD implemented an I/O MMU in their system architecture, which translates DMA addresses from peripheral devices to a different address on the system bus. There were several motivations for doing this:

    Direct Virtual Memory Access
  1. Virtualization: DMA can be restricted to memory belonging to a single VM and to use the addresses from that VM, making it safe for a driver in that VM to take direct control of the device. This appears to be the largest motivation for adding the IOMMU.
  2. High Memory support: For I/O buses using 32 bit addressing, system memory above the 4GB mark is inaccessible. This has typically been handled using bounce buffers, where the hardware DMAs into low memory which the software will then copy to its destination. An IOMMU allows devices to directly access any memory in the system, avoiding copies. There are a large number of PCI and PCI-X devices limited to 32 bit DMA addresses. Amazingly, a fair number of PCI Express devices are also limited to 32 bit addressing, probably because they repackage an older PCI design with a new interface.
  3. Enable user space drivers: A user space application has no knowledge of physical addresses, making it impossible to program a DMA device directly. The I/O MMU can remap the DMA addresses to be the same as the user process, allowing direct control of the device. Only interrupts would still require kernel involvement.

 
I/O Device Latency

Multiple levels of bus bridging PCIe has a very high link bandwidth, making it easy to forget that its position in the system imposes several levels of bridging with correspondingly long latency to get to memory. The PCIe transaction first traverses the Northbridge and any internal switching or bus bridging it contains, on its way to the processor interconnect. The interconnect is HyperTransport for AMD CPUs, and QuickPath for Intel. Depending on the platform, the transaction might have to travel through multiple CPUs before it reaches its destination memory controller, where it can finally access its data. A PCIe Read transaction must then wend its way back through the same path to return the requested data.

Graphics device on CPU bus

Much lower latency comes from sitting directly on the processor bus, and there have been systems where I/O devices sit directly beside CPUs. However CPU architectures rev that bus more often than it is practical to redesign a horde of peripherals. Attempts to place I/O devices on the CPU bus generally result in a requirement to maintain the "old" CPU bus as an I/O interface on the side of the next system chipset, to retain the expensive peripherals of the previous generation.


 
The Missed Opportunity: DMA Read pipelining

An IOMMU is not a new concept. Sun SPARC, some SGI MIPS systems, and Intel's Itanium all employ them. Once you have taken the plunge to impose an address lookup between a DMA device and the rest of the system, there are other interesting things you can do in addition to remapping. For example, you can allow the mapping to specify additional attributes for the memory region. Knowing whether it is likely to do long, contiguous bursts or short concise updates allows optimizations to reduce latency by reading ahead, to transfer data faster by pipelining.

Without prefetch (CONSISTENT) With prefetch (STREAMING)
DMA Read with no prefetch DMA Read with prefetch

AMD's IOMMU includes nothing like this. Presumably they wanted to confine the software changes to the hypervisor alone, since choosing STREAMING versus CONSISTENT requires support in the driver of the device initiating DMA. Yet they could have preserved software compatibility by making CONSISTENT the default, with STREAMING used only by drivers which choose to implement it.


 
What About Writes?

The IOMMU in SPARC systems implemented additional support for DMA write operations. Writing less than a cache line is inefficient, as the I/O controller has to fetch the entire line from memory and merge the changes before writing it back. This was a problem for Sun, which had a largish number of existing SBus devices issuing 16 or 32 byte writes while the SPARC cache line had grown to 64 bytes. A STREAMING mapping relaxed the requirement for instantaneous consistency: if a burst wrote the first part of a cache line, the I/O controller was allowed to buffer it in hopes that subsequent DMA operations would fill in the rest of the line. This is an idea whose time has come... and gone. The PCI spec takes great care to emphasize cache line sized writes using MWL or MWM, an emphasis which carries over to PCIe as well. There is little reason now to design coalescing hardware to optimize sub-cacheline writes.

Without buffering (CONSISTENT) With buffering (STREAMING)
DMA Write with no prefetch DMA Write with prefetch

 
Closing Disclaimer

Maybe I'm way off base in lamenting the lack of DMA read pipelining. Maybe all relevant PCIe devices always issue Memory Read Multiple requests for huge chunks of data, and the chipset already pipelines data fetch during such large transactions. Maybe. I doubt it, but maybe...

Tuesday, October 13, 2009

WinXP Network Diagnostic

Please contact your network administrator or helpdesk

Thanks, but I could probably have figured that out without help from the diagnostic utility.

Sunday, October 11, 2009

A Cloud Over the Industry

Unbelievable:

Regrettably, based on Microsoft/Danger's latest recovery assessment of their systems, we must now inform you that personal information stored on your device - such as contacts, calendar entries, to-do lists or photos - that is no longer on your Sidekick almost certainly has been lost as a result of a server failure at Microsoft/Danger. That said, our teams continue to work around-the-clock in hopes of discovering some way to recover this information. However, the likelihood of a successful outcome is extremely low.

This kind of monumental failure casts a cloud over the entire industry (pun intended). How did it happen? I do not believe Danger could have operated without sufficient redundancy and backups for so many years; it simply does not pass the smell test. There must be more to the story.

One possible theory for unrecoverable loss of all customer data is sabotage by a disgruntled insider. This is, itself, pretty unbelievable. Nonetheless according to GigaOM, when Microsoft acquired Danger "...while some of the early investors got modest returns, I am told that the later-stage investors made out like bandits." I wonder if the early employees also made only a modest payout in return for their years of effort, and had to watch the late stage investors take everything. Danger filed for IPO in late 2007, but was suddenly acquired by Microsoft in early 2008. What if the investors determined they would not make as much money in a large IPO as they would in a carefully arranged, smaller acquisition by Microsoft? For example, the term sheet might have included a substantial liquidation preference but specified that all shares revert to common if the exit is above N dollars. For the investors, the best exit is (N-1) dollars; they maximize their return by keeping a larger fraction of a smaller pie. If they had drag along rights, they could force the acquisition over the objections of employees and management. If the long-time employees watched their payday be snatched away... a workforce of disgruntled employees seems quite plausible.

This is all, of course, complete speculation on my part. I simply cannot believe that the company could accidentally lose all data, for all customers, irrevocably. It doesn't make sense.

Danger Sidekick

Other links relating to this story:

Thursday, October 8, 2009

Code Snippet: SO_BINDTODEVICE

In a system with multiple network interfaces, can you constrain a packet to go out one specific interface? If you answered "bind() the socket to an address," you should read on.

Why might one need to strictly control where packets can be routed? The best use case I know is when ethernet is used as a control plane inside a product. Packets intended to go to another card within the chassis must not, under any circumstances, leave the chassis. You don't want bugs or misconfiguration to result in leaking control traffic.

The bind() system call is frequently misunderstood. It is used to bind to a particular IP address. Only packets destined to that IP address will be received, and any transmitted packets will carry that IP address as their source. bind() does not control anything about the routing of transmitted packets. So for example, if you bound to the IP address of eth0 but you send a packet to a destination where the kernel's best route goes out eth1, it will happily send the packet out eth1 with the source IP address of eth0. This is perfectly valid for TCP/IP, where packets can traverse unrelated networks on their way to the destination.

In Linux, to control the physical topology of communication you use the SO_BINDTODEVICE socket option.

#include <netinet/in.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int s;
    struct ifreq ifr;

    if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        /* ... error handling ... */
    }

    memset(&ifr, 0, sizeof(ifr));
    snprintf(ifr.ifr_name, sizeof(ifr.ifr_name), "eth0");
    if (setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE,
                (void *)&ifr, sizeof(ifr)) < 0) {
        /* ... error handling ... */
    }

    /* ... proceed with connect(), send(), etc. ... */
    return 0;
}

SO_BINDTODEVICE forces packets on the socket to only egress the bound interface, regardless of what the IP routing table would normally choose. Similarly, only packets which ingress the bound interface will be received on the socket; packets arriving on other interfaces will not be delivered to it.

There is no particular interaction between bind() and SO_BINDTODEVICE. It is certainly possible to bind to the IP address of the interface to which one will also SO_BINDTODEVICE, as this will ensure that the packets carry the desired source IP address. It is also permissible, albeit weird, to bind to the IP address of one interface but SO_BINDTODEVICE a different interface. It is unlikely that any ingress packets will carry the proper combination of destination IP address and ingress interface, but for very special use cases it could be done.

Monday, October 5, 2009

Thursday, October 1, 2009

Yield to Your Multicore Overlords

ASIC design is all about juggling multiple competing requirements. You want to make the chip competitive by increasing its capabilities, by reducing its price, or both. Today we'll focus on the second half of that tradeoff, reducing the price.

Chip fabrication is a statistical game: of the parts coming off the fab, some percentage simply do not work. The vendor runs test vectors against the chips and throws away the ones which fail. The fraction which work is called the yield, and it is the primary factor determining the cost of the chip. If the yield is bad, so that you have to fab a whole bunch of chips to get one that actually works, you have to charge more for that one working chip.

To illustrate why most chips coming out of the fab do not work, I'd like to walk through part of manufacturing a chip. This information is from 1995 or so, when I was last seriously involved in a chip design, and describes a 0.8 micron process. So it is completely old and busted, but is sufficient for our purposes here.

Begin by placing the silicon wafer in a nitrogen atmosphere. You deposit a photo-resist on the wafer, basically a goo which hardens when exposed to ultraviolet light. You place a shadow mask in front of a light source; the regions exposed to light will harden while those under the shadow mask remain soft. You then chemically etch off the soft regions of the photo-resist, leaving exposed silicon where they were. The hardened regions of photo-resist stay put.

Next you heat the wafer to 400 degrees and pipe phosphorus into the nitrogen atmosphere. As the phosphorus atoms heat they begin moving faster, bouncing off the walls of the chamber. Some of them move fast enough that when they strike the surface of the wafer they break the Si crystal lattice and embed themselves in the silicon. If they strike the hardened photo-resist, they embed themselves in the resist; very, very few are moving fast enough to crash all the way through the photoresist into the silicon underneath.

Electron Microscopy image of a silicon die

Next you use a different chemical process to strip off the hardened photoresist. You are left with a wafer which has phosphorus embedded in the places you wanted. Now you heat the wafer even higher, hot enough that the silicon atoms can move around more freely; they move back into position and reform the crystal lattice, burying the phosphorus atoms embedded within. This is called annealing. Phosphorus is a donor, so you now have the n+ regions of the transistors.

You repeat this process with aluminum ions, an acceptor, to get the p regions. Now you have transistors. Next you connect the transistors together with traces of aluminum (which I won't go into here). You cut the wafer to separate the dies, and place each die in a package. You connect bonding wires from the edge of the die to the pins of the chip. And voila, you're done.


It should be apparent that this is a probabilistic process. Sometimes, based purely on random chance, not enough phosphorus atoms embed themselves in the silicon and your transistors don't turn on. Sometimes too much phosphorus embeds and your transistors won't turn off. Sometimes the Si lattice is too badly damaged and the annealing is ineffective. Sometimes the metal doesn't line up with the vias. Sometimes a dust particle lands on the chip and you deposit metal on top of the dust mote. Sometimes the bonding doesn't line up with the pads. Etc etc.

This is why the larger a chip grows, the more expensive it becomes. It's not because raw silicon wafers are particularly costly; it's that the probability of a defect somewhere grows ever greater as the die becomes larger. The bigger the chip, the lower the yield of functional parts.

Intel Nehalem chip with SRAM highlighted For at least 15 years that I know of, chip designs have improved their yield using redundancy. The earliest such efforts were done in on-chip memory: if your chip is supposed to include N banks of SRAM, put N+1 banks on the die, connected with wires in the topmost layer which can be cut using a laser. The SRAM occupies a large percentage of the total chip area, so statistically it is likely that defects will fall within the SRAM. You can then cut out the defective bank, turning a defective chip into one that can be used. More recent silicon processes use fuses blown by the test fixture instead of lasers.


 
Massively Multicore Processors

Yesterday NVidia announced Fermi, a beast of a chip with 512 Cuda GPU cores. They are arranged in 16 blocks of 32 cores each. At this kind of scale, I suspect it makes sense to include extra cores in the design to improve the yield. For example, perhaps each block actually has 33 cores in the silicon so that a defective core can be tolerated.

In order to avoid weird performance variations in the product, the extra resources are generally inaccessible if not used for yield improvement. That is, even though the chip might have 528 cores physically present, no more than 512 could ever be used.

NVidia Fermi GPU

Monday, September 28, 2009

49.710269618056 Days

Western Digital recently corrected a firmware issue in certain models of VelociRaptor where the drive would erroneously report an error to the host after 49 days of operation. Somewhat inconveniently for RAID arrays, if all drives powered on at the same time they would all report an error at the same time.

Informed speculation: the drive reports an error after exactly 49 days, 17 hours, 2 minutes, 47 seconds, and 294.999 milliseconds of operation. That is the moment where a millisecond timer overflows an unsigned 32 bit integer.

WD VelociRaptor

Tuesday, September 22, 2009

A Pudgier Tux

Tux the Penguin At LinuxCon 2009 a discussion arose about the Linux kernel becoming gradually slower with each new release. "Yes, it's a problem," said Linus Torvalds. "The kernel is huge and bloated, and our icache footprint is scary. I mean, there is no question about that. And whenever we add a new feature, it only gets worse."

I think the addition of new features is a red herring, and the real problem is in letting Tux eat the herring. Just hide the jars, maybe get a treadmill, and everything will go back to the way it was.

Pickled Herring

Story originally noted in The Register.

Thursday, September 17, 2009

Jasper Forest x86

Intel has a long but uneven history in the embedded market. In the early days of the personal computer Intel released the 80286 as a follow-on to the original 8086. There actually was an 80186: it was a more integrated version of the 8086 aimed at embedded applications. Intel's interest in embedded markets has waxed and waned over the years, but it is an area where Intel still has room for significant growth.

I wrote about x86 for embedded use about a year and a half ago, with four main points:

  • Volume Discounts
    PC pricing thresholds at 50,000 units have to be rethought for a less homogeneous market
  • System on Chip (SoC)
    Board space is at a premium, we need fewer components in the system
  • Production lifetime
    These systems are not redesigned every few months, chips have to remain in production longer
  • Power and heat
    Airflow is more constrained, and the system has other heat generating components besides the CPU complex
Nehalem vs Jasper Forest

At the Intel Developer Forum next week Intel is expected to focus on embedded applications for its products. In advance of IDF Intel announced the Jasper Forest CPU, a System on Chip version of Nehalem. It is based on a 1, 2, or 4 core CPU plus an integrated PCIe controller, so it does not need a separate northbridge chip. Intel also committed to a 7 year production lifetime, allowing the part to be designed into products which will remain on the market for a while. I'd speculate that Intel will offer industrial temperature grade parts as well, perhaps at lower frequencies.

Jasper Forest is particularly suited for and aimed at storage applications. It has additional hardware for RAID support (presumably XOR & ECC generation), and a feature to use main memory as a nonvolatile buffer cache. When loss of power is detected the chip will flush any pending writes out to RAM and then set the DRAM to self-refresh before shutting down. By including a battery sufficient to power the DRAM, the system can avoid the need for a separate nonvolatile data buffer like SRAM.

This is a good approach for Intel: target silicon at specific high margin, growing application areas. Go for markets with moderate power consumption requirements, as x86 is clearly not ready for small battery powered applications like phones. Ars Technica discusses Intel's upcoming weapon for getting into mobile and other battery powered markets, a version of their 32nm process which reduces leakage current to almost nothing. An idle x86 would consume essentially no power, which would be huge.

Tuesday, September 15, 2009

Soft Errors Are Hard Problems

"Soft Error" is a euphemism in the semiconductor industry for "the silicon did the wrong thing." Soft errors can occur when a circuit is infused with a sudden burst of energy from an external source, for example when it is hit by a high energy subatomic particle or by radiation.

Alpha particle strike - two protons plus two neutrons (a helium nucleus), emitted when a heavy radioactive element decays into a lighter element. Alpha particles are so large that the chip packaging will normally block them; they are only a problem when something inside the package undergoes radioactive decay.

Cosmic ray strike - a high energy particle, most often a proton or neutron, arriving from the Sun or from outside the solar system. These particles are gradually absorbed by the Earth's atmosphere, so they are more of a problem in orbit and at high altitude. Cosmic rays can directly impact the silicon, or can hit a nearby atom and throw off neutrons which in turn cause a soft error.

Beta particle strike - an electron, emitted when a neutron is converted into a proton + electron + antineutrino. Beta particles rarely hold enough energy to affect current silicon technology, alpha and cosmic ray strikes are more of a problem.

A DRAM bit Soft errors are usually discussed in the context of DRAM, where the problem was initially noticed. DRAM consists of a capacitor to store the bit, with a transistor to keep the capacitor charge stable. A capacitor is an energy storage circuit: it stores charge. A particle strike on the capacitor imparts a large amount of energy, which can spontaneously charge the bit to a 1 or, more rarely, overload its capacity such that the energy quickly escapes into the substrate and leaves the bit as a 0.

Some quick searching will turn up a few facts about soft errors:

  • Soft errors were first noted in the 1970s.
  • The primary cause was the use of slightly radioactive isotopes in the chip packaging, such as lead (Pb-212).
  • Materials in chip packaging are now carefully screened to substantially eliminate radioactivity.
  • Soft errors are now very rare and mostly caused by cosmic rays.

We'll come back to these later.


 
Beyond DRAM A 6 transistor SRAM bit

Soft errors are not confined to DRAM alone. Any circuit will glitch if hit by a sufficiently energetic particle - whether DRAM, SRAM, or a logic element. DRAM began to be affected by soft errors when the energy stored in the capacitor shrank to the order of the energy induced by an alpha particle. SRAM was not initially affected because its cells are actively driven via 6 transistors, whose energy level is considerably higher. Nonetheless as silicon feature sizes have shrunk it is now quite possible for SRAM to suffer a soft error. Soft errors in logic elements are somewhat less noticeable in that they will correct themselves on the next clock cycle, while an error in a storage element will persist until it is rewritten.

Intel Nehalem with SRAM highlighted Modern CPUs include a great deal of SRAM on the die, comprising the caches, TLBs, reorder buffers, and numerous other uses. The image shown here is Intel's quad core Nehalem die, with the SRAM areas highlighted and logic deemphasized (both based on my best guesses). SRAM is a significant fraction of the die. Many, many other ASIC and CPU designs contain similar or higher fractions of SRAM.

What impact can a bitflip in the SRAM have? Consider that there is just one bit difference between the following two instruction opcodes. A bitflip can make the software come up with results which should be impossible.

ADD R1,1
ADD R1,32769

Even more bizarrely a bitflip could change some random instruction into a memory reference, such as a load or store. As the register being dereferenced would likely not contain a valid pointer, the process would segfault for inexplicable reasons.

To prevent this problem Intel and all major CPU vendors protect their caches and other on-chip memories using Error Correcting Codes, but many ASIC designs do not. They might implement parity, but on-chip ASIC memories commonly have no error checking at all. A soft error will simply corrupt whatever was in the SRAM, which will only be noticed if it causes the ASIC to misbehave in some perceptible way.

What happens if a particle strike causes the hardware to misbehave, or to get the wrong answer? Usually, we blame the software. It must be some weird bug.


 
Back To The Future

Returning to the earlier list of common facts about soft errors, let's focus on the last two.

  • Materials in chip packaging are now carefully screened to substantially eliminate radioactivity.
  • Soft errors are now very rare and mostly caused by cosmic rays.

There are many different materials used in a finished chip, beyond the silicon die and the gold wires connecting to its pins. The chip package is plastic or ceramic, composed of a host of different elements including boron. Solder bonds the wires to the pins; solder used to be mostly lead, later tin, and might now be a polymer. There are heat spreading compounds and shock absorption goo, which are often organic polymers. Some of these materials are naturally slightly radioactive. For example, in nature lead contains traces of Polonium (Po-212), which emits an alpha particle as it decays into Pb-208. Similarly boron-10 is more prone to fission than boron-11 - or so they tell me, I've no idea why.

Modern chip packaging uses refined versions of these materials, to reduce the level of undesirable isotopes and leave purified inert material behind. This is an expensive process. Each new generation of silicon imposes more stringent requirements, making the materials even more costly. It is crucial that the correct materials be used... and this is where human error can creep in.

Using the wrong packaging materials increases alpha emissions to the point where the silicon will experience an unacceptable soft error rate. Yet to control costs it is also important not to overshoot the alpha emission requirements by using a more expensive material than necessary. So each chip design may use a different mix of materials depending on its process technology, die size, and the amount of SRAM it contains. The manufacturer will have checks in place to ensure the correct materials are used, but mistakes can occur. Sometimes a batch of chips is produced which suffers an unusually high soft error rate. When this happens the manufacturer is generally loath to admit it, and will quietly replace the chips with a corrected batch. One recent case where the problem was too widespread to cover up involved the cache of the UltraSPARC-II CPU from Sun Microsystems; see point #5 of an Actel FAQ on the topic for more details.


 
The Moral of the Story

The impossible triangle, a tribar If you are working on low level software for a chip and run into bizarre errors, you should suspect a software problem first. Soft errors really are rare, and the manufacturing screwups described above are very uncommon. If the problem is repeatable, even if it has only happened twice, it is not a soft error: particle strikes are too random for that. However if you keep running into different symptoms where you think "but that is impossible..." you should consider the possibility that the problem really is impossible, and was introduced by a bitflip. Start checking whether the problems are confined to a particular batch of parts, or only to units produced in a certain range of dates. If so, that batch of parts may have a problem with its packaging materials.


 
Other resources

While researching this post I came across a few additional sources of information which I found fascinating, but did not have a good place to link to them. They are presented here for your edification and bemusement.

  • An analysis of the high soft error rate of the ASC Q supercomputer at Los Alamos. These errors were cosmic ray induced, not due to a badly packaged batch of chips. The part of the system in question did not implement ECC to correct errors, only parity to detect them and crash.
  • Cypress Semiconductor published a book to explain soft errors to their customers.
  • Fujitsu has a simulator to predict soft error rates, called NISES. The linked PDF is mostly in Japanese, but the images are fascinating and very illustrative.

This post was many years in the making.

Friday, September 11, 2009

Motorola Mobile Meanderings

On September 10th Motorola announced the CLIQ, its first Android phone and a product which a good friend of mine has labored over for quite some time.



For many years I have used my trusty Sony-Ericsson T616, but I admit it might be getting a bit dated. I decided to compare the two devices to see whether it's time for me to take up a new phone.

T616
CLIQ
Cracked LCD screen? Y N
Broken down arrow key? Y N
Wallpaper picture of my Daughter? Y N

As you can see, clearly I cannot replace my T616 until Motorola rectifies these glaring omissions in the CLIQ. The cracked screen and non-functional arrow key could both be achieved via sufficiently rough handling. Frankly I don't know how Motorola would obtain pictures of my daughter to use as wallpaper, though I might be willing to accept this little flaw and add my own wallpaper later.

This chart shows that my T616 is the perfect phone, but I'm in a generous mood. The first person to offer me enough money for me to buy a CLIQ, takes it.

Tuesday, September 8, 2009

Infinite Arrays of Tweeples

Twitter followers list

Twitter is somewhat out of the range of topics I normally cover here, but I promise we'll come around to a software development angle by the end of this post.

When you follow someone on Twitter, you appear in their followers list and they appear in your following list. New entries appear at the top, so your newest follower is the first entry in the list. Recently I noticed an exception to this sort order: someone who had followed me long ago, and later unfollowed, decided to follow again. I received the notification email from Twitter, yet he wasn't at the top of my followers list. Instead he appeared much further down in the listing, off the first page, at the same position he occupied when he first followed me many months ago. His entry had disappeared when he unfollowed, and when he re-followed he ended up back in that same place. Why would that be?


 
Possibility #1: Timestamp

The first possibility, and most likely the correct one, is that Twitter tracks the timestamp of every new follow and chooses not to update it on a subsequent refollow. No matter how many times you have followed and unfollowed, you retain the timestamp of the very first follow and will show up in the followers list at that position.

If an unfollow plus refollow were sufficient to move you back up to the top of the followers list, the bots would do it all the time to make it more likely you'd follow them back. Yet this is a boring root cause, so let's consider a second possibility which is more illustrative from a software development perspective.
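A minimal sketch of this timestamp-keyed approach in Python (all names here are hypothetical illustrations, not Twitter's actual code): the first-follow timestamp is recorded once and never updated, so a refollow sorts back to its original position rather than the top.

```python
import time

class FollowerList:
    """Followers sorted newest-first by their *original* follow time."""

    def __init__(self):
        self._first_followed = {}   # follower id -> timestamp of first follow
        self._active = set()        # follower ids currently following

    def follow(self, follower_id, now=None):
        now = now if now is not None else time.time()
        # setdefault records the timestamp only on the very first follow;
        # a refollow leaves the old timestamp untouched.
        self._first_followed.setdefault(follower_id, now)
        self._active.add(follower_id)

    def unfollow(self, follower_id):
        self._active.discard(follower_id)

    def listing(self):
        # Newest first, keyed by original follow time.
        return sorted(self._active,
                      key=lambda f: self._first_followed[f],
                      reverse=True)

followers = FollowerList()
followers.follow("alice", now=100)
followers.follow("bob",   now=200)
followers.follow("carol", now=300)

followers.unfollow("alice")
followers.follow("alice", now=400)   # refollow much later

# Alice keeps her original timestamp (100), so she stays at the bottom.
print(followers.listing())           # ['carol', 'bob', 'alice']
```

Note that this scheme also defeats the bot trick described above: no amount of unfollowing and refollowing will move an account up the list.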


 
Possibility #2: Array Deletion

Twitter operates at a scale where performance optimization is essential. If they are not cognizant of performance, the wheels fly off and users start to see the Fail Whale. An area of particular importance is the list of followers, as the backend infrastructure has to traverse it for every tweet. It is possible that Twitter implemented the followers list as an array in memory instead of a linked list, presumably to get better locality. The classic drawback of an array is deletion: you cannot delete an element from the middle without moving all subsequent elements into the hole thus created. To avoid this compaction, a "deleted" or "active" bit is commonly kept for each element, allowing deleted entries to be left in place but skipped without processing.

When Scobleizer unfollowed everyone, it would have left holes in the followers lists of 106,000 different accounts: entries with the deleted bit set.

Array with deleted bits

I suspect that Twitter does not immediately compact these arrays, so long as the ratio of holes to filled entries is tolerable. When Scobleizer decided to re-follow me, the Twitter backend located the earlier, deleted entry and flipped the bit back to active.

Array after refollowing

Thus the newly restored entry re-appears in the followers list, but not as the topmost entry: it re-appears at its existing position within the array. This, or a similar implementation choice that retains deleted entries in some form, could be why re-follows do not appear at the top of the list.
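Here is a small Python sketch of this tombstone scheme (again a hypothetical illustration, not Twitter's actual implementation): unfollowing marks the slot deleted rather than compacting the array, and refollowing flips the bit back in place, so the entry reappears at its old position instead of at the top.

```python
class TombstoneArray:
    """Append-only followers array with a per-entry active/deleted bit."""

    def __init__(self):
        self._entries = []   # list of [follower_id, active_bit]
        self._index = {}     # follower_id -> slot in _entries

    def follow(self, follower_id):
        slot = self._index.get(follower_id)
        if slot is not None:
            # Refollow: reactivate the existing entry in place.
            self._entries[slot][1] = True
        else:
            # First follow: append a new active entry at the end.
            self._index[follower_id] = len(self._entries)
            self._entries.append([follower_id, True])

    def unfollow(self, follower_id):
        slot = self._index.get(follower_id)
        if slot is not None:
            self._entries[slot][1] = False   # tombstone, no compaction

    def listing(self):
        # Newest append first; tombstoned slots are skipped, not removed.
        return [fid for fid, active in reversed(self._entries) if active]

arr = TombstoneArray()
for who in ("alice", "bob", "carol"):
    arr.follow(who)

arr.unfollow("alice")    # leaves a hole in the array
arr.follow("dave")       # genuinely new follower, appended at the end
arr.follow("alice")      # refollow: old slot reactivated, not moved

print(arr.listing())     # ['dave', 'carol', 'bob', 'alice']
```

Dave, a true new follower, shows up on top, while Alice's refollow surfaces at her original position further down, which matches the behavior I observed.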


 
The Moral of the Story

Optimization is fine, and absolutely crucial to function at Twitter scale, but one must be careful when an optimization changes user-visible behavior. This is particularly true for social media, where we're explicitly conversing with other humans and ascribe human motivations to their actions. Twitter's handling of deleted and re-added follows can cause considerable consternation, because to the casual observer it appears the person followed but then immediately unfollowed. It can seem judgmental.

Of course, I am most likely completely wrong about Twitter's implementation using arrays. It wouldn't be the first time I've made a complete fool out of myself in a blog post. It's cathartic, in a way. Perhaps I'll do it more often.

Monday, August 24, 2009

Plummeting Down the Chasm

Crossing the Chasm book cover

Crossing the Chasm is a seminal book in technology marketing, whose ideas quickly spread through the industry. Originally written in 1991 by Geoffrey Moore, it showed a new take on the technology adoption lifecycle. The lifecycle starts with tech enthusiasts willing to buy an immature product and runs through the majority buyers who make up the bulk of the market, finally trailing off when market saturation is reached. It had been commonly depicted as a bell curve:


Technology Adoption Life Cycle

Moore's key observation is that while the bell curve implies there is a smooth transition from early market to majority, in reality the buyers in the early market are fundamentally different from the majority that comes later. Technology companies who don't appreciate this gap will stumble and often fail once they saturate the small pool of early buyers. Moore referred to this gap as "the chasm."

Technology Adoption Life Cycle

Early adopters are visionaries. They are willing to look at an immature product and figure out how to use it in their operations to get a competitive advantage. They will sponsor integration work within their IT departments, and generate long lists of product feedback to better fit their needs. They are fundamentally different from the majority buyers in that they will look at an interesting product and figure out what problem it can solve. The majority market comes from the opposite direction with a problem to solve, looking for a solution. Product planning and marketing which works in the early part of a product lifecycle will fail utterly later in the game.


 
Yon Chasm Approacheth

It is quite possible to get stuck in the early market, continuously trying to meet the needs of early adopters and never enjoying the big sales of the majority market. Having spent the last several years in this predicament, I'll offer my version of the lifecycle chart. What it lacks in precision, it makes up for in snark.


Rope bridge with gaps

As a development engineer it can be difficult to tell how well the sales cycle is working as one rarely gets direct visibility, but the indirect evidence is plentiful.

  • If nearly every deal comes with a list of new product features to be implemented, you have not crossed the chasm.
  • If in every medium to large deal it is not clear whether the cost of getting the business is greater than the revenue it would bring, you have not crossed the chasm.
  • If every deal is "high touch," requiring multiple visits by a salesperson and sales engineer and possibly a consultation with the development team, you have not crossed the chasm.
  • If your product cannot be sold via a web site but instead always requires an evaluation period and report, you have not crossed the chasm.
  • If every customer is using a different subset of the product functionality, you have not crossed the chasm.

This is important: if the company does not realize that the real problem is in the approach to the market, all of these things will be blamed on the product. The reasoning will be that "if we just pound out a few more of these deals, we'll have finally implemented everything that everybody wants and sales will take off." There may even be an element of truth in this sentiment, if the product shipped early before its natural feature set was complete. However if the company has not crossed the chasm, the fundamental problem is elsewhere.

You are not getting a list of feature requirements because the product is incomplete, but because you are selling to the type of customer who generates lists of requirements.

You are getting that list of requirements because you are still selling into the visionary and early adopter segments of the market, the people who are willing to think about how best to integrate the product into their operations. If you were selling into the mainstream market there would be no list of requirements because the mainstream won't do an extensive integration on their own. The product either meets their needs or it doesn't, and you'll either get the sale or you won't. In the mainstream there will be no back and forth of what the product could do to win the business. At most, the mainstream buyer might tell you why you lost the business.

Don't get stuck wandering around in the chasm. Trust me, it sucks.