So here's an idea for a bytecode format that keeps word-sized arguments aligned and separate from the instruction bytes, while still interleaving them in a single stream. Assume a machine with 32-bit words and four byte-sized instructions packed into each word.
+--+--+--+--+ +-----------+ +-----------+ +--+--+--+--+
|i1|i2|i3|i4| |    w1     | |    w2     | |i5|i6|i7|i8|
+--+--+--+--+ +-----------+ +-----------+ +--+--+--+--+
Of the first four instructions, i2 and i4 each require a word-sized argument. Reading a word argument advances the program counter past it, so when a word-sized batch of instructions is complete, the pc points to the next batch of instructions. Here's the general idea for the interpreter loop:
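while (...) {
    u32 ins = word[pc++];          // fetch the next batch of four instructions
    while (ins) {                  // zero means only no-ops remain in the batch
        switch (ins & 0xff) {      // the low byte is the current instruction
        case 0: break;             // no-op
        case i2: arg = word[pc++]; break;   // consume a word argument
        case i4: arg = word[pc++]; break;
        case branch: pc = word[pc]; ins = word[pc++]; continue;   // jump: load the target batch, skip the shift
        ...
        }
        ins >>= 8;                 // move on to the next instruction in the batch
    }
}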
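
To make that loop concrete, here is a minimal compilable sketch in C. Everything beyond the loop itself is an assumption made up for illustration: the opcode names and numbers (OP_PUSH, OP_ADD, OP_BRANCH, OP_HALT), the small operand stack, and the pack() and demo() helpers. It also assumes the first instruction of a batch sits in the low byte of the word, since that is the byte the loop examines first. The demo program mirrors the diagram: i2 and i4 each take a word argument, so w1 and w2 follow the first batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes, chosen only for this sketch. 0 must stay the no-op,
   because zero bytes also pad a partially filled batch. */
enum { OP_NOP = 0, OP_PUSH = 1, OP_ADD = 2, OP_BRANCH = 3, OP_HALT = 4 };

/* Pack four opcode bytes into one word. The first opcode goes in the low
   byte, which is the one the interpreter executes first. */
static uint32_t pack(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint32_t)a | ((uint32_t)b << 8) | ((uint32_t)c << 16) | ((uint32_t)d << 24);
}

/* Run until OP_HALT and return the value on top of the stack. */
static uint32_t run(const uint32_t *word, size_t len) {
    uint32_t stack[16];
    size_t sp = 0, pc = 0;
    for (;;) {
        assert(pc < len);
        uint32_t ins = word[pc++];      /* fetch a batch of four opcodes */
        while (ins) {                   /* zero: only no-ops are left */
            switch (ins & 0xff) {
            case OP_NOP:
                break;
            case OP_PUSH:               /* takes one word argument */
                assert(pc < len && sp < 16);
                stack[sp++] = word[pc++];
                break;
            case OP_ADD:
                assert(sp >= 2);
                stack[sp - 2] += stack[sp - 1];
                sp--;
                break;
            case OP_BRANCH:             /* argument is the target word index */
                assert(pc < len);
                pc = word[pc];
                assert(pc < len);
                ins = word[pc++];       /* load the batch at the target */
                continue;               /* skip the shift below */
            case OP_HALT:
                assert(sp >= 1);
                return stack[sp - 1];
            default:                    /* unknown opcodes are ignored here */
                break;
            }
            ins >>= 8;                  /* next opcode in this batch */
        }
    }
}

/* The layout from the diagram: a batch, two word arguments, another batch. */
static uint32_t demo(void) {
    const uint32_t program[] = {
        pack(OP_NOP, OP_PUSH, OP_NOP, OP_PUSH), /* i1..i4: i2 and i4 take a word */
        40,                                     /* w1 */
        2,                                      /* w2 */
        pack(OP_ADD, OP_HALT, OP_NOP, OP_NOP),  /* i5..i8 */
    };
    return run(program, sizeof program / sizeof program[0]);
}
```

Calling demo() pushes 40 and 2, adds them, and halts with 42 on top of the stack. Note that a branch abandons whatever is left of the current batch, so in this sketch anything packed after OP_BRANCH in the same word never runs.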