Monday, February 28, 2011

Mechanical Denial of Service

Mechanical Engineers need to worry about denial of service attacks too.

Penny jammed into lock cylinder of a change machine

Tuesday, February 22, 2011

All in Good Time

Current directives:

  1. Learn to dance.
  2. Kill all Humans.

The ordering is important.

(via @mattcutts)

Sunday, February 20, 2011

From a Lawnchair Overlooking the Bot War

In the last several days I've noticed a large increase in the rate of bots following me on twitter. It went from perhaps one a day to dozens. The bots are following 100-200 people, and have always sent zero tweets. They sometimes have an avatar picture, and their bio always sounds convincing. The avatars and bios are probably harvested from real Twitter accounts, and sometimes they get the default avatar. They appear to avoid any bios with a link.

The email from Twitter tells you what client was used to follow you. They have cycled through various third party clients, and then Twitter's mobile web page. Most recently the email contains no mention of the client, and I don't know what that means.

Twitter bot profile

What is interesting is that this influx of follow bots never appear to send any tweets. An entirely different herd of bots has started spamming via @-replies, with a link purporting to offer a free iPhone. The bots which send the spam follow zero people.

I wonder: are the botnets now fielding offensive and defensive teams? By this I mean using the offense to send spam, and watch for twitter users which block them. They can check whether the bot can still see the tweets of the users it has spammed. Users who react by blocking are likely reporting the bot for spam, and can themselves be targeted by the defense. The defense has never sent any spam, and can report legitimate users to try to get their account suspended.

Thursday, February 17, 2011

The Programmer's Paradox

I wrote a guest post today at The Programmer's Paradox, on open sourcing of personal coding projects.

Paul Homer is a great observer on software and engineering topics, I highly recommend subscribing to his blog.

Monday, February 14, 2011

Android Multicore

I have it on good authority that an Android release in late 2011 will support 29 CPUs. Here is how I expect the Q&A after the announcement to go, fielding questions from developers and users. In the interest of brevity I've omitted the questions. Here are the answers.

"No, not 'up to'. We support twenty nine cores."

"We are confident that ARM vendors are up to the challenge."

"Have you ever tried to animate a convincing look of horror on a green pig face? Plus, consider this: realistically dented helmets deformed by the physics, not pre-dented 3d models."

"No, that would be too many. Also, it needs to be a prime number to properly balance the workload."

"This is Google, of course we have data to support that."

"Very sophisticated CPU throttling, thats how."


 

I think I might stop correcting rumors. It's more fun to read the things people just make up.8 Feb 2011 via Twitter for Android

Monday, February 7, 2011

On Self Driving Cars

Were I working on software for self-driving cars, I'd name the steering wheel device as /dev/student. That way the car would be controlled by the student driver.

Thursday, February 3, 2011

Listing Processes with libproc

I recently had to work on functionality to look through /proc/<pid> for information about processes, which would have entailed an annoying amount of file schlepping and string parsing. Fortunately there is procps, a very nice library to makes /proc access work very much like directory access via opendir(). It normalizes the procfs implementations of a number of OSes like Linux and Solaris, so you work with a common data structure and don't have to maintain a bunch of parsing code. However it doesn't abstract /proc very much: you still need to know what it is and what information is in each /proc file to make good use of the facility.

You start with a call to openproc(), which creates a PROCTAB* structure to iterate through running processes.

#include <proc/readproc.h>

int main(int argc, char** argv) {
  PROCTAB* proc = openproc(PROC_FILLMEM | PROC_FILLSTAT | PROC_FILLSTATUS);

A flag argument to openproc() tell it what kind of information you want. The library will skip processing files in /proc if it can.

PROC_FILLMEMread /proc/<pid>/statm
PROC_FILLCOMallocate and populate `cmdline'
PROC_FILLENVallocate and populate `environ'
PROC_FILLUSRlook up user id number, fill in user name
PROC_FILLGRPlook up group id number, fill in group name
PROC_FILLSTATUSread /proc/<pid>/status
PROC_FILLSTATread /proc/<pid>/status
PROC_FILLWCHANread function name from /proc/<pid>/statm
PROC_FILLARGhandled identically to PROC_FILLCOM

The PROCTAB is repeatedly passed to readproc(), which populates a proc_t for each running process.

proc_t proc_info;
memset(&proc_info, 0, sizeof(proc_info));
while (readproc(proc, &proc_info) != NULL) {
  printf("%20s:\t%5ld\t%5lld\t%5lld\n",
         proc_info.cmd, proc_info.resident,
         proc_info.utime, proc_info.stime);
}

When done, call closeproc() to release resources.

closeproc(proc);

Some sample output from my system:

  process:  pages utime   stime
   xinetd:    139       1       0
     sshd:    866      10      21
     bash:   1377      28      16
ssh-agent:    208      11       3
  portmap:    158       1       4
rpc.statd:    208       1       0

 
proc_t

The proc_t contains a great deal of information about the process.

typedef struct proc_t {
  int
    tid,         // (special)     task id, the POSIX thread ID (see also: tgid)
    ppid;        // stat,status   pid of parent process

  unsigned long long
    utime,       // stat          user-mode CPU time accumulated by process
    stime,       // stat          kernel-mode CPU time accumulated by process
    cutime,      // stat          cumulative utime of process and reaped children
    cstime,      // stat          cumulative stime of process and reaped children
    start_time;  // stat          start time of process -- seconds since 1-1-70

  long
    priority,    // stat          kernel scheduling priority
    nice,        // stat          standard unix nice level of process
    rss,         // stat          resident set size from /proc/#/stat (pages)
  ...etc...

The proc_t also contains this maddening little member variable:

  unsigned pcpu; // stat          %CPU usage (is not filled in by readproc)

Instantaneous CPU percentage is commonly desired, but is not tracked by the kernel and is therefore not available anywhere procps can read. Tracking a percentage has to be implemented in the application by taking a snapshot, waiting a little while, and taking another snapshot to learn the utime+stime spent during the interval. This is the reason why top shows all CPU percentages as 0.0% when it starts, and corrects them on the next interval. procps provides a convenient place to store the CPU percentage, but does not implement it in the library.

Monday, January 31, 2011

Updating the Zodiac

Noting the impact of date corrections in stoking interest, Astrologers have announced further changes to the Zodiac for 2012. Instead of just 12 signs roughly corresponding to a month each there will be 8,766 signs, one for each hour of the year. The Hubble space telescope has provided ample stellar data to make this possible. Additionally to keep the Zodiac fresh, periodically old signs will be retired and new ones introduced.

Companies wishing to sponsor their logo as a new Zodiacal symbol should contact the International Union of Astrology and Tarot. One, five, and twenty year plans are available.

Hubble starfield image superimposed with decepticon, tron recognizer, wizard hat

Thursday, January 27, 2011

C++ POD Member Handling

I always mess up the initialization of plain old data fields in C++. Always. Maybe by writing it up, I'll finally get it right.

Plain Old Data is essentially anything a regular C compiler could compile. That is:

  • integer and floating point numbers (including bool, though it isn't in C)
  • enums
  • pointers, including pointers to objects and pointers to functions
  • some aggregate data structures (structs, unions, and classes)

A struct, union, or class is treated as plain old data if it has only the default constructor and destructor, has no protected or private member variables, does not inherit from a base class, and has no virtual functions. I suspect most C++ programmers have an intuitive feel for when a class behaves like an object and when its just a collection of data. That intuition is pretty good in this case.

The default constructor for plain old data leaves it uninitialized. An explicit constructor sets it to zero.

CodeResult
class Foo {
 public:
  Foo() {}
  int a_;
};
Result: a_ is uninitialized.
class Foo {
 public:
  Foo() : a_() {}
  int a_;
};
Result: the member corresponding to a_ is zeroed.
Were it a structure, the entire thing would be zero.

People are often confused by the first point, that member POD fields are left uninitialized unless specifically listed in the initializer list. This is not the same as for member objects, which call the default constructor. Making this even more confusing, when a process starts up any pages it gets from the OS will be zeroed out to prevent information leakage. So if you look at the first few objects allocated, there is a better than average chance that all of the member variables will be zero. Unfortunately once the process has run for a while and dirtied some of its own pages, it will start getting objects where the POD variables contain junk.


 
Struct initializers

Putting a POD struct in the initializer list results in zeroing the struct. If the struct needs to contain non-zero data, C++0x adds a useful capability:

struct bar {
  int y;
  int z;
};

class Foo {
 public:
  Foo() : b_({1, 2}) {}
  struct bar b_;
};

Recent versions of gcc implement this handling, though a warning will be issued unless the -std=c++0x or -std=gnu++0x command line flag is given.

Monday, January 24, 2011

Automotive Easter Eggs

Automotive software has some great opportunities for easter eggs.

OFF ON CAN Richen fuel mixture 20%
VOL ↑ VOL ↑ VOL ↑ VOL ↓ HAZARD Tune audio to Sirius headbanger channel
HORN Frickin' lasers!

Steering wheel control buttons

Saturday, January 22, 2011

IPv4 Addresses Almost Depleted

According to both ipv4depletion.com and potaroo.net, the IANA is within days of allocating its last block of IPv4 addresses to the regional internet registries. In fact it might have already happened, though there has been no announcement of it yet.

This means the global pool of IPv4 addresses has been exhausted, and all remaining IPv4 addresses have been allocated to a particular region of the planet. The regional authorities will start to run out in 2011 and 2012.

IPv6. It's on.

Thursday, January 13, 2011

Microsoft and ARM

In July 2010 Microsoft signed an architecture license with ARM. In January 2011 Microsoft announced that Windows 8 will run on ARM CPUs. So the license was purchased to support the Windows development effort, right?

I really don't think so.

Porting Windows to a new processor architecture is a massive undertaking. To its credit, Microsoft has maintained the discipline to keep the OS from becoming too entangled with the underlying platform. At various points in its history Windows NT has run on Alpha, MIPS, PowerPC, and Itanium. All but Itanium are long discontinued, and given the enormous codebase and inertia adding a new instruction set would take quite a bit of effort. If Microsoft needed an architectural license to proceed, they needed it more than 6 months before a public demonstration of the result.

Additionally an architectural license is not required to port software to ARM, not even for software as extensive as NT. ARM will provide the necessary support via other, less spectacular arrangements. An architectural license allows the licensor to develop their own implementation of the instruction set, either completely independently or as a substantial modification to a core supplied by ARM. Most producers of ARM chips don't have an architectural license; they don't need one to add peripherals and coprocessors around an unmodified ARM core.

Microsoft is engaging with ARM on multiple fronts, and as it involves Windows (and Office) it would be at the CxO level. I think the ARM license is for Xbox, not for Windows Mobile and not for the NT port. In other products Microsoft relies on hardware partners, which would seriously complicate an effort to introduce a custom CPU. Xbox is the one place where Microsoft produces its own platform in volumes large enough to warrant custom ASIC development; they rely on contract manufacturers to build it, but the design and finished product is unique to Microsoft.

XCPU from Xbox360-S

Developing a customized ARM processor isn't easy, but it isn't unapproachably difficult either. The current Xbox relies on a PowerPC processor from IBM, but PPC is increasingly being relegated to the very low and very high ends of the market. Embedded controllers don't have the needed processing power, while supercomputer CPUs are too expensive and too hot. Xbox has already changed CPUs once, from an x86 in the original to the current PPC. Microsoft has to be weighing the alternatives of completely funding a suitable PowerPC core design, or switching to a different architecture with more presence in the midrange. Nowadays that means x86 or ARM. I think they've chosen ARM, and previously speculated what a CPU designed specifically for Xbox might look like.

Wednesday, January 5, 2011

Overlays Not Yet Extinct

In 2010 both Ian Lance Taylor and Dave Miller wrote about STT_GNU_IFUNC, an ELF symbol type supported by the GNU compiler and linker. These symbols are functions, but do not appear at a fixed address. Instead STT_GNU_IFUNC is a function which returns an address to the function you actually want to call. STT_GNU_IFUNC solves a common problem in supporting platform variations which are similar enough to use the same binary, but different enough to benefit from specific optimizations. A good example is block copy operations like memcpy(). Depending on the CPU one might be able to use SIMD instructions like SSE or MMX, a block move engine in a cache controller, or one of a number of different unrolled loops optimized for specific CPU pipelines.

STT_GNU_IFUNC is the new hotness, but I'm going to describe a different technique for accomplishing similar functionality using the program loader, i.e. the code which loads program text into memory before execution. This might be a boot ROM environment like Das U-Boot, or in the case of a datapath CPU it might be a process running on a separate control CPU. My use of this technique was the latter case, where a deeply embedded CPU in the datapath had its code loaded into its memory by control software running elsewhere.

The key technique in this scheme uses overlays. Yes, overlays.


 
Set Wayback Machine to 1974

DEC PDP11/20 illustration of program overlays The typical minicomputer in the 1970s had tens to hundreds of kilobytes of memory. Though programs were dramatically smaller, available RAM was still a significant limitation and practices to maximize RAM efficiency were common. One common technique was overlays: multiple segments of code compiled at the same address to be loaded when needed and replaced later. As only one such code segment could be present in memory at a time, the different overlays could not reference each other and ideally would not need to be swapped very often. A common use was to put initialization-specific code in an overlay and replace it with other code once the init was done.

Note that this was not the same as virtual memory. Some minicomputers of the time had virtual addressing hardware, but it was not universal. Overlays were simply regions of memory which the application would overwrite. Modern CPUs and practices make this use of overlays more difficult, with program text generally mapped read-only and large instruction caches which need to be flushed of stale contents. Overlays as a practice are now practically unknown on any system with virtual memory, we rely on the VM system to kick unneeded code out of RAM.


 
Overlays Today

The GNU linker nonetheless still has support for overlays. A linker script can specify a group of ELF sections to appear at the same address. The linker provides no help in managing the overlay segments in memory, this is left entirely to the developer. We will use this linker support to provide multiple implementations of an API, tuned for different CPUs.

The first step is to implement the code for each CPU. In this example we'll use something trivial, a function which returns a different integer value for each platform. Two attributes are added to each implementation:

  • noinline - we want to choose which version of the function to load at runtime. This cannot work if the compiler inlines the first one it finds.
  • section - each implementation of the function goes in its own ELF section. This will be discussed later.
int foo_v0() __attribute__((noinline,section(".s_foo_v0")));
int foo_v0() {
  return 0;
}

int foo_v1() __attribute__((noinline,section(".s_foo_v1")));
int foo_v1() {
  return 1;
}

int foo_v2() __attribute__((noinline,section(".s_foo_v2")));
int foo_v2() {
  return 2;
}

illustration of multiple functions in a section Each ELF section can contain exactly one function which is to be called from elsewhere in the code. Defining multiple functions in one ELF section doesn't work with this technique: we rely on placing all versions of the function at the same address. If there are multiple functions they can be at different offsets, so code elsewhere in the program won't have the correct address to call. It would be possible to define multiple functions which are only ever called from other routines in the section, this is left as an exercise to the reader.

Similarly, though the implementations can differ substantially the function signature has to be exactly the same for all variants. They must return the same type and take the same arguments, even if that means some of the variants have an argument which they ignore.

If we examine the object file we can see the sections we defined (other sections omitted for brevity):

$ objdump --section-headers foo.o
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000031  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  3 s_foo_v1      0000000b  0000000000000000  0000000000000000  00000074  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  4 s_foo_v2      0000000b  0000000000000000  0000000000000000  0000007f  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  6 s_foo_v0      0000001d  0000000000000000  0000000000000000  0000009b  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE

 
OVERLAY : NOCROSSREFS

The next step is the crucial one, linking the binary. We're going to use a custom linker script which tells ld to arrange those sections as an overlay. The linker can take only one script as input. To enable our overlay, we must also support everything else the linker normally does for binaries on this platform. If you're not already using a linker script, you need to retrieve the default script for your platform using ld --verbose and look for a SECTIONS block in which to add the handling. A snippet of my linker script is shown here, with the added text bolded.

SECTIONS
{
  ... bunch of stuff ...

  .text : {
    ...
  }

  PROVIDE (foo = .);
  OVERLAY : NOCROSSREFS
  {
    .foo_v0 { *(.s_foo_v0) }
    .foo_v1 { *(.s_foo_v1) }
    .foo_v2 { *(.s_foo_v2) }
  }

  .fini : {
    ...
  }

We've defined an overlay with three ELF sections as members. NOCROSSREFS means the linker will flag an error if one of the overlay sections references a symbol in one of the other sections.

This script is passed to the linker using a -T argument. If not using a separate linking step, pass "-Wl,-Tld.script" to gcc instead. If we disassemble the resulting binary we see all three routines are linked at the same address:

$ objdump -d ./a.out
00000000004006e0 :
  4006e0: 31 c0                 xor    %eax,%eax
  4006e2: c3                    retq   

00000000004006e0 :
  4006e0: b8 01 00 00 00        mov    $0x1,%eax
  4006e5: c3                    retq   

00000000004006e0 :
  4006e0: b8 02 00 00 00        mov    $0x2,%eax
  4006e5: c3                    retq   


$ nm --numeric-sort ./a.out
0000000000400594 T main
00000000004006e0 A foo
00000000004006e0 T foo_v0
00000000004006e0 T foo_v1
00000000004006e0 T foo_v2

 
Commence Handwaving

illustration of multiple variants of foo()At this point I have to merely describe what would happen next, as I don't have sample code to show. The target CPU has some mechanism to load its code into memory. It might bootstrap itself using a boot loader, or it might be loaded by an external supervisor CPU. This loader would need to decide which of the overlay sections to load. It might be as simple as a naming convention, for example having platform-specific ELF sections end in "_v#" and loading only those appropriate for the platform.

What we end up with is the platform independent code calling a symbol named foo, at 0x4006e0. That code is not concerned with what will be found at that address. That it contains different code depending on platform has no impact on the callers.


 
Downsides

There are several downsides to this technique.

  1. It is cumbersome to support many such functions. Each one requires a new OVERLAY block in the linker script, and a different set of __attributes__ in the code. My recommendation is to only do this where performance is really crucial. For the typical init code or uncommon API, a switch (platform) will be fine.
  2. The debugger has no idea what is going on. If you ask gdb to disassemble this routine it will show the correct instructions, but any source line numbers it prints will be wrong.
  3. Source code management tools also have no idea what is going on. Asking for the definition of foo() will either fail, or turn up the wrong code. The developer has to know which version of code will be used on a given platform.

Nonetheless this technique proved useful to me in the past, and I hope it will be useful to someone in the future.


 
Closing Thoughts

STT_GNU_IFUNC is a more general solution to this sort of problem, and far less cumbersome to support. There is slightly more overhead to STT_GNU_IFUNC as it involves an extra call to retrieve the address of the function to call, but I suspect even this could someday be alleviated by dynamically rewriting the PLT (Procedure Linkage Table) with the resulting address. If I recall correctly the Solaris linker does rewrite the PLT, it seems a viable technique.