X86 Instruction decoding
Contents
Overview
X86 instruction encodings have several unique characteristics which make them harder to deal with than the encodings for the other ISAs M5 supports. Despite that, x86 is decoded using the same basic mechanisms as the other ISAs.
At the lowest level, instructions can take any number of bytes (up to a maximum) and can have any alignment. That means that when a CPU brings in bytes of memory to decode, it may contain several instructions, the end of one and the start of the next, the middle of an instruction, etc. It's the predecoder's job to take the incoming stream of bytes and to turn them into a stream of objects which represent the encoding of a single instruction from beginning to end in a unique and uniform way. The predecoder also makes the guarantee that if any two instruction representations are the same, the instruction object that implements it is also the same.
Next, the instruction representations need to be translated into an instruction object which the CPU can execute. This is the job of the x86 decoder which is implemented as ISA description files and automatically generated.
Finally, the instruction object is handed back to the CPU. If the object is a regular instruction it can be executed directly. If the object is a macroop, aka a microcoded instruction, the CPU pulls microops out of it one at a time and actually executes them.
Predecoder
Reset State
Resets internal state to prepare for the next instruction.
Prefix State
Recognizes and records the various prefixes allowed for an instruction. When a byte comes in that is not recognized as a prefix, the machine goes to the Opcode state.
Opcode State
Gathers up all the bytes that represent the instructions opcode. Instructions can have 1, 2 or even 3 byte opcodes. This state also figures out how many bytes of immediate value, if any will be part of this instruction, and what the default address size, operand size, and stack size are. If this instruction uses a modRM byte, the next state is ModRM. Failing that, if there are any immediate bytes to collect, the next state is Imm. If neither of those are true, then the instruction has been gathered and the machine resets.
Decoder generation