Friday, June 15, 2012

Word-aligned bytecode

Thinking about bytecode interpreters again... specifically variable-size instruction sets, and the detail that bugs me is that full-size words are often not word-aligned: a word-sized operand that follows a one-byte opcode lands on an odd address. In some cases (x86), the CPU will read an unaligned word, no problem. Otherwise the word needs to be read byte-by-byte, and this inefficiency bothers me.
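
For reference, a minimal sketch of that byte-by-byte fallback. It assumes the operand is stored little-endian, and read_u32_le is just a name made up for this illustration. The point is that one logical word costs four loads plus shifts and ors.

```c
#include <stdint.h>

/* Assemble a little-endian 32-bit word from four single-byte loads,
   so no unaligned word access is ever issued.
   (Hypothetical helper; little-endian operand order is an assumption.) */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

With a stream such as { 0x01, 0x78, 0x56, 0x34, 0x12 } (one opcode byte, then the operand at offset 1), read_u32_le(code + 1) yields 0x12345678.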

So here's an idea for a bytecode format that keeps all words aligned, separate from the instructions, but still interleaved with them. Assume a 32-bit word machine, 4 byte-sized instructions per word.


+--+--+--+--+ +-+-+-+-+ +-+-+-+-+ +--+--+--+--+
|i1|i2|i3|i4| | w1    | | w2    | |i5|i6|i7|i8|
+--+--+--+--+ +-+-+-+-+ +-+-+-+-+ +--+--+--+--+

Of the first four instructions, i2 and i4 require a word-sized argument (w1 and w2 in the diagram). Each word argument advances the program counter as it is consumed, so when the word-sized batch of instructions is complete, the pc points to the next batch of instructions. Here's the general idea for the interpreter loop:

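while (...) {
  u32 ins = word[pc++];              // fetch a batch of four byte-sized instructions
  while (ins) {                      // an all-zero remainder means only noops are left
    switch (ins & 0xff) {            // the low byte is the current instruction
      case 0: break;                 // noop
      case i2: arg = word[pc++]; break;  // consume the next aligned word as the argument
      case i4: arg = word[pc++]; break;
      case branch: pc = word[pc]; ins = word[pc++]; continue;  // jump, then load the target batch
      ...
    }
    ins >>= 8;                       // move on to the next instruction in the batch
  }
}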
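
To check that the scheme hangs together, here is a small self-contained version of the loop above. It is only a sketch under assumptions that are not part of the original idea: the opcode names and numbers, the tiny fixed-size stack, the pack4 helper, and the convention that the first instruction of a batch sits in the low byte are all invented for illustration. There is no bounds checking.

```c
#include <stdint.h>

/* Hypothetical opcodes for illustration; 0 must stay the noop so that
   zero padding in a partly filled batch is harmless. */
enum { OP_NOP = 0, OP_PUSH = 1, OP_ADD = 2, OP_JMP = 3, OP_HALT = 4 };

/* Pack four byte-sized instructions into one word; `a` executes first
   because the loop consumes the low byte first. */
static uint32_t pack4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint32_t)a | (uint32_t)b << 8 | (uint32_t)c << 16 | (uint32_t)d << 24;
}

/* Run the program and return the value on top of the stack at OP_HALT. */
static uint32_t run(const uint32_t *word) {
    uint32_t stack[16];
    int sp = 0;
    uint32_t pc = 0;
    for (;;) {
        uint32_t ins = word[pc++];      /* fetch the next batch */
        while (ins) {
            switch (ins & 0xff) {
            case OP_NOP:
                break;
            case OP_PUSH:               /* consumes one aligned word */
                stack[sp++] = word[pc++];
                break;
            case OP_ADD:                /* pop two values, push their sum */
                sp--;
                stack[sp - 1] += stack[sp];
                break;
            case OP_JMP:                /* the rest of this batch is dropped */
                pc = word[pc];
                ins = word[pc++];
                continue;
            case OP_HALT:
                return sp > 0 ? stack[sp - 1] : 0;
            }
            ins >>= 8;                  /* next instruction in the batch */
        }
    }
}
```

As a usage example, take the seven-word program { pack4(OP_PUSH, OP_PUSH, OP_ADD, OP_JMP), 2, 40, 5, pack4(OP_HALT, 0, 0, 0), pack4(OP_PUSH, OP_ADD, OP_HALT, 0), 100 }. The first batch pushes 2 and 40, adds them, then jumps to word 5, skipping the halt at word 4. The batch at word 5 pushes 100, adds, and halts, so run returns 142. Every argument word is read with a plain aligned word[pc++], which is the whole point of the layout.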
