Tuesday, November 19, 2019

A 68000 Disassembler


I have found that a good way to really understand a microprocessor's instruction set is to write a disassembler for it. I have done this for a number of processors including the 6502, 6800, 6809, and Z80. My udis disassembler was implemented on Python and supports a number of processors.

Having recently worked on my 68000 single-board computer, I decided to write a disassembler for the Motorola 68000. This is more challenging than for 8-bit processors due to it's complex instruction set and many addressing modes. I decided to again use Python as it is fast to develop, readable, and cross-platform.

I wanted the design to be at least partially table based. I started with the table on the Motorola MC68000 Programmers Reference Card, but it did not adapt will to a software-based table. I found a very nice and succinct table  written by someone in France under the name of GoldenCrystal that was a better fit. It organized all of the 68000 instructions and decoding of fields in a logical manner. I made a spreadsheet based on the table data.

The 68000 uses 16-bit opcodes, but they are not unique. Various bits in the opcode control the addressing modes and operands, so I needed to use an approach where each instruction has a bit pattern and a mask indicating which bits to examine when looking for a match to a specific opcode. For example, a NOP is $4E71 and all bits are valid but a MOVE instruction has thw two most significant bits as zeroes but the rest vary with the address mode.

I exported the spreadsheet into CSV format, which can easily be read into a data structure from Python. I then use this data for determining what instruction is read. Then I handle the encoding of the instruction and any extension words or operands. Many instructions follow similar encoding and can use the same logic, while others are unique.

It was somewhat tedious and time consuming to work my way through all of the possible instructions. As I proceeded, I wrote a test program with the instructions I was implementing and examples of each addressing mode. An additional good "stress test" of the code is to use random data (such as /dev/urandom on Linux) as input and make sure that it does not crash or produce errors.

After few weeks of occasional evenings (interrupted by a trip to Europe) I had finished support for all instructions. The most complex was the MOVE instruction as it supports almost every addressing mode for both source and destination operands. The final program is just over 1000 lines of Python code including comments and blank lines.

Here is some sample output:

00000000  4E 71                          NOP
00000004  A2 34                          UNIMPLEMENTED
00000006  4A FC                          ILLEGAL
00000008  4E 70                          RESET
00000012  4E 40                          TRAP      #$00
0000001A  00 7C AA 55                    ORI       #$AA55,SR
0000002A  02 7C AA 55                    ANDI      #$AA55,SR
00000032  60 5E                          BRA       $00000092
000000BA  48 C2                          EXT.l     D2
000000BE  4E 69                          MOVE      USP,A1
000000DE  57 CF 00 22                    DBEQ      D7,$00000102
00000112  72 01                          MOVEQ     #$01,D1
00000146  EF 82                          ASL.l     #7,D2
000006E4  08 78 00 08 12 34              BCHG      #$08,$1234
00000C9E  4C FB 55 AA 90 12              MOVEM.l   $12(PC,A1),D1/D3/D5/D7/A0/A2/A4/A6
00000CF4  2C 6D 12 34                    MOVEA.l   $1234(A5),A6
00000D3A  18 3A 12 34                    MOVE.b    $1234(PC),D4
00000F24  55 91                          SUBQ.l    #2,(A1)
00001334  DF B8 12 34                    ADD.l     D7,$1234

With the -n or --nolist option, it only disassembles the instructions. This could be used to feed the output back into an assembler, if you were reverse engineering some code for example. Here is some sample output in this mode:

 NOP
 UNIMPLEMENTED
 ILLEGAL
 RESET
 TRAP      #$00
 ORI       #$AA55,SR
 ANDI      #$AA55,SR
 BRA       $00000092
 EXT.l     D2
 MOVE      USP,A1
 DBEQ      D7,$00000102
 MOVEQ     #$01,D1
 ASL.l     #7,D2
 BCHG      #$08,$1234
 MOVEM.l   $12(PC,A1),D1/D3/D5/D7/A0/A2/A4/A6
 MOVEA.l   $1234(A5),A6
 MOVE.b    $1234(PC),D4
 SUBQ.l    #2,(A1)
 ADD.l     D7,$1234

The source code and test program can be found here.

This process gave me an appreciation for the effort that the Motorola engineers must have gone through to implement the native 68000 dissasembler in the TUTOR firmware which was written in assembly language.

I can also appreciate that significant more work would be needed to extend this to support the 68020 or later processors which have more instructions and addressing modes.

While it was not meant to be a production program, it was fun to write and I now have a much better understanding of the 68000 instruction set and its complexity, quirks and limitations.