Re: slighty off topic -- writing an assembler!

"Leonard Kevin McGuire Jr." <kmcguire3413@hotmail.com>
16 Jul 2006 10:43:48 -0400

          From comp.compilers


From: "Leonard Kevin McGuire Jr." <kmcguire3413@hotmail.com>
Newsgroups: comp.compilers
Date: 16 Jul 2006 10:43:48 -0400
Organization: Compilers Central
References: 98-06-126
Keywords: assembler
Posted-Date: 16 Jul 2006 10:43:48 EDT

[Note that this is a followup to a thread from 1998. -John]


>>From: samuel <SAMIGWE@worldnet.att.net>
>>Date: 24 Jun 1998 00:04:30 -0400
>>Good day all:
>>I am currently working on writing an assembler (intel syntax
>>for the x86 microprocessor) for my operating system project. I haven't
>>yet had any formal training on the design of one and haven't been able
>>to find any "assembler design" books.


I am currently writing an assembler. Unfortunately, I started writing it
before finding this thread and reading the short description of a macro
assembler. The thread is quite dated, but that does not mean it can no
longer be useful: I found it useful in the year 2006.


I ended up creating a table for the MODRM and SIB bytes at compile time,
using macros to generate the rather large table. The table used this format:


struct tmodrmsib_tbl
{
        dword type;
        sbyte *expression;
        bool hasSIB;
        byte modrm;
        byte sib;
};


I generated every single possible addressing mode and its corresponding
table entries. The expression field holds a zero-terminated ASCII string
such as "eax", "eax+ebx", "eax*4", or "eax+ecx+$1".


I used $1, $2, and $3 to represent a dword, word, and byte displacement.
My table ended up looking like:
        {A_PTR, "eax", false, 0x00, 0x00},
        {A_PTR, "ecx", false, 0x01, 0x00},
        {A_PTR, "edx", false, 0x02, 0x00},
        {A_PTR, "ebx", false, 0x03, 0x00},
        MASIB3(A_PTR, , 0x04), // generate all SIB possibilities - no displacement.
        {A_PTR, "$3", false, 0x05, 0x00},
        {A_PTR, "esi", false, 0x06, 0x00},
        {A_PTR, "edi", false, 0x07, 0x00},


That covers the first addressing mode. I used a macro (MASIB3 above) to
generate the SIB entries.
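
The modrm and sib values in a table like this follow the standard x86 field
layout, so they can be computed rather than typed by hand. The following is
only a sketch of that packing; makeModRM, makeSIB, and the Reg enum are names
made up for illustration and are not taken from my assembler. The rows above
leave the reg field at zero, and I am assuming here that it gets filled in
later, when the register operand is written out.

```cpp
#include <cassert>
#include <cstdint>

// Standard x86 register numbers as they appear in the 3-bit ModRM/SIB fields.
enum Reg { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

// ModRM layout: mod in bits 7-6, reg in bits 5-3, rm in bits 2-0.
// (Hypothetical helper, not part of the original table macros.)
inline std::uint8_t makeModRM(unsigned mod, unsigned reg, unsigned rm) {
    assert(mod < 4 && reg < 8 && rm < 8);
    return static_cast<std::uint8_t>((mod << 6) | (reg << 3) | rm);
}

// SIB layout: scale in bits 7-6 (log2 of the multiplier 1, 2, 4 or 8),
// index in bits 5-3, base in bits 2-0.
inline std::uint8_t makeSIB(unsigned scaleLog2, unsigned index, unsigned base) {
    assert(scaleLog2 < 4 && index < 8 && base < 8);
    assert(index != 4u);  // index field 100b means "no index", so ESP cannot be scaled
    return static_cast<std::uint8_t>((scaleLog2 << 6) | (index << 3) | base);
}
```

For example, makeModRM(0, 0, 4) gives 0x04, the "SIB follows" row that the
MASIB3 line fills in, and makeSIB(2, ECX, EAX) gives 0x88 for "eax+ecx*4".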


I store instructions with:
struct tISet
{
        dword memonic;
        dword prefixs;
        word opcode;
        dword operand1;
        dword operand2;
};


So, it looks like this:
tISet ISet[] = {
        {0xFFFFFFFF, 0, 0, 0, 0},
        {ME_MOV,0, 0x0088, A_RM8, A_R8 | X86_O_R},
        {ME_MOV,P_OSO, 0x0089, A_RM16, A_R16 | X86_O_R},
        {ME_MOV,0, 0x0089, A_RM32, A_R32 | X86_O_R},
        {ME_MOV,0, 0x008A, A_R8 | X86_O_R, A_RM8},
        {ME_MOV,P_OSO, 0x008B, A_R16 | X86_O_R, A_RM16},
        {ME_MOV,0, 0x008B, A_R32 | X86_O_R, A_RM32},
};


I define my flags so that A_RM32 = A_R32 | A_DWORDPTR, and so on. A table
entry can therefore list several acceptable types, and an operand of any one
of those types matches when the assembler chooses the correct instruction.
X86_O_R is ignored by the type checking and is handled later by the function
that writes out the arguments for the instruction.
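
A rough sketch of how such a type check could look is below. The bit values
and the operandMatches function are assumptions made for the example (the
real flag values are not shown in this post); only the flag names and the
A_RM32 = A_R32 | A_DWORDPTR relationship come from the description above.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t dword;

// Hypothetical bit assignments, chosen only so the example compiles.
const dword A_R32      = 0x00000001;  // 32-bit register operand
const dword A_DWORDPTR = 0x00000002;  // 32-bit memory operand
const dword A_RM32     = A_R32 | A_DWORDPTR;
const dword X86_O_R    = 0x80000000;  // encoding hint, not an operand type

// Returns true when the parsed operand's type is among the types that the
// instruction table slot accepts. The X86_O_R hint is masked off first, so
// it never takes part in the comparison.
inline bool operandMatches(dword slot, dword parsed) {
    assert((parsed & X86_O_R) == 0);  // parsed operands carry type bits only
    dword accepted = slot & ~X86_O_R;
    return parsed != 0 && (parsed & accepted) == parsed;
}
```

With these values a register operand (A_R32) matches both an A_RM32 slot and
an A_R32 | X86_O_R slot, while a memory operand (A_DWORDPTR) matches only the
A_RM32 slot.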


I pass this struct around between my functions to keep track of the
instruction building process:
struct tipi
{
        bool wrotePrefix;
        dword prefix;
        bool wroteOpcode;
        word opcode;
        bool wroteMODRM;
        byte modrm;
        bool wroteSIB;
        byte sib;
        byte wroteDisplacement;
        union{
                dword displacement;
                sdword sdisplacement;
        };
        byte wroteIntermediate;
        dword intermediate;
};


The final step is reading this struct and writing out the bytes for the
instruction. So I do not think I built a macro assembler at all, but rather
something else. So far this design has worked very well.
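
To illustrate that final step, here is a minimal sketch of an emitter. The
Pending struct is a simplified, hypothetical stand-in for tipi: it assumes a
single prefix byte instead of a dword of prefix flags, and it assumes the
displacement and immediate fields record a size in bytes. The field order in
the output (prefix, opcode, ModRM, SIB, displacement, immediate) is the
standard x86 instruction layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified, hypothetical version of the bookkeeping struct.
struct Pending {
    bool hasPrefix;        std::uint8_t  prefix;        // e.g. 0x66 operand-size override
    std::uint16_t opcode;                               // one- or two-byte opcode
    bool hasModRM;         std::uint8_t  modrm;
    bool hasSIB;           std::uint8_t  sib;
    std::uint8_t dispSize; std::uint32_t displacement;  // size in bytes: 0, 1, 2 or 4
    std::uint8_t immSize;  std::uint32_t immediate;     // size in bytes: 0, 1, 2 or 4
};

// Append the low n bytes of v in little-endian order.
static void putLE(std::vector<std::uint8_t>& out, std::uint32_t v, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
}

// Write the instruction bytes in the order the processor expects them.
std::vector<std::uint8_t> emit(const Pending& p) {
    assert(!p.hasSIB || p.hasModRM);  // a SIB byte only ever follows a ModRM byte
    std::vector<std::uint8_t> out;
    if (p.hasPrefix) out.push_back(p.prefix);
    // Assumption: a two-byte opcode keeps its first byte (e.g. 0x0F) in the high half.
    if (p.opcode > 0xFF) out.push_back(static_cast<std::uint8_t>(p.opcode >> 8));
    out.push_back(static_cast<std::uint8_t>(p.opcode & 0xFF));
    if (p.hasModRM) out.push_back(p.modrm);
    if (p.hasSIB)   out.push_back(p.sib);
    putLE(out, p.displacement, p.dispSize);
    putLE(out, p.immediate, p.immSize);
    return out;
}
```

As a check, "mov [eax+ecx*4+0x10], edx" uses opcode 0x89, ModRM 0x54, SIB
0x88 and a one-byte displacement, so emit should produce 89 54 88 10.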


I am planning on packing just the core of generating the x86
instructions into this layer of the assembler, and putting the rest into a
preprocessor layer, I suppose? =)


http://compilers.iecc.com/comparch/article/98-06-126

