I have found that a good way to really understand a microprocessor's instruction set is to write a disassembler for it. I have done this for a number of processors, including the 6502, 6800, 6809, and Z80. My udis disassembler was written in Python and supports several processors.
Having recently worked on my 68000 single-board computer, I decided to write a disassembler for the Motorola 68000. This is more challenging than for an 8-bit processor because of its complex instruction set and many addressing modes. I again chose Python because it is quick to develop in, readable, and cross-platform.
I wanted the design to be at least partially table-based. I started with the table on the Motorola MC68000 Programmer's Reference Card, but it did not adapt well to a software-based table. I found a very nice and succinct table, written by someone in France who goes by the name GoldenCrystal, that was a better fit. It organizes all of the 68000 instructions and the decoding of their fields in a logical manner. I made a spreadsheet based on the table data.
The 68000 uses 16-bit opcode words, but an instruction does not correspond to a single opcode value: various bits in the opcode word select the operand size, addressing modes, and registers. I therefore needed an approach where each instruction has a bit pattern and a mask indicating which bits to examine when checking whether an opcode matches that instruction. For example, NOP is $4E71 and all 16 bits are fixed, but a MOVE instruction has zeroes in its two most significant bits while the remaining bits encode the operand size and the source and destination addressing modes.
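A minimal sketch of pattern-and-mask matching, assuming a small hand-written table (the entries below are illustrative, not the program's actual table). An opcode matches an entry when the bits selected by the mask equal the pattern. Because entries are tried in order, more specific patterns must come first; a real table would, for example, also need MOVEA entries ahead of the MOVE ones, since this sketch reports MOVEA opcodes as plain MOVE.

```python
from typing import NamedTuple, Optional

class OpcodeEntry(NamedTuple):
    mnemonic: str
    pattern: int  # values the fixed bits must have
    mask: int     # 1 = bit is fixed for this instruction, 0 = operand field

# A tiny hypothetical table. Entries are checked in order, so more specific
# patterns (more 1 bits in the mask) should come before more general ones.
TABLE = [
    OpcodeEntry("NOP",     0x4E71, 0xFFFF),
    OpcodeEntry("RESET",   0x4E70, 0xFFFF),
    OpcodeEntry("ILLEGAL", 0x4AFC, 0xFFFF),
    OpcodeEntry("TRAP",    0x4E40, 0xFFF0),  # low 4 bits: vector number
    OpcodeEntry("MOVEQ",   0x7000, 0xF100),  # register and 8-bit data vary
    OpcodeEntry("MOVE.B",  0x1000, 0xF000),  # size in bits 13-12; the rest are modes/registers
    OpcodeEntry("MOVE.L",  0x2000, 0xF000),
    OpcodeEntry("MOVE.W",  0x3000, 0xF000),
]

def match(opcode: int) -> Optional[str]:
    """Return the mnemonic of the first entry whose fixed bits match, or None."""
    for entry in TABLE:
        if (opcode & entry.mask) == entry.pattern:
            return entry.mnemonic
    return None
```

For example, `match(0x4E4F)` returns `"TRAP"` because only the top 12 bits are compared, while `match(0xA234)` returns `None` since no entry covers that opcode.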
I exported the spreadsheet to CSV format, which Python can easily read into a data structure. The program uses this data to determine which instruction an opcode word represents, then decodes the instruction's operands and any extension words that follow it. Many instructions share the same encoding format and can use the same logic, while others are unique and need their own handling.
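The sketch below shows one way to load such a CSV table with Python's `csv` module and dispatch each matched row to a shared formatting function. The column names (`mnemonic`, `pattern`, `mask`, `format`), the format names, and the handler functions are all hypothetical; the real spreadsheet's layout is not described here.

```python
import csv
import io

# Hypothetical CSV layout for illustration; the real spreadsheet's columns may differ.
CSV_TEXT = """mnemonic,pattern,mask,format
NOP,4E71,FFFF,none
TRAP,4E40,FFF0,vector
MOVEQ,7000,F100,moveq
"""

def load_table(f):
    """Read CSV rows into (mnemonic, pattern, mask, format) tuples."""
    return [
        (row["mnemonic"], int(row["pattern"], 16), int(row["mask"], 16), row["format"])
        for row in csv.DictReader(f)
    ]

# One formatting function per operand layout; many instructions can share one.
def fmt_none(mnemonic, opcode):
    return mnemonic

def fmt_vector(mnemonic, opcode):
    # TRAP vector number is in the low 4 bits.
    return f"{mnemonic} #${opcode & 0xF:02X}"

def fmt_moveq(mnemonic, opcode):
    # 8-bit immediate in bits 7-0 (shown raw here, although the CPU sign-extends it);
    # destination data register in bits 11-9.
    return f"{mnemonic} #${opcode & 0xFF:02X},D{(opcode >> 9) & 7}"

HANDLERS = {"none": fmt_none, "vector": fmt_vector, "moveq": fmt_moveq}

def decode(table, opcode):
    """Find the first matching row and format it with that row's handler."""
    for mnemonic, pattern, mask, fmt in table:
        if (opcode & mask) == pattern:
            return HANDLERS[fmt](mnemonic, opcode)
    return None  # no matching entry

table = load_table(io.StringIO(CSV_TEXT))
```

With a real file, the same `load_table` would be called on `open("opcodes.csv", newline="")` (a hypothetical filename). Keeping the format name in the table means adding an instruction that reuses an existing operand layout is a one-line spreadsheet change.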
It was somewhat tedious and time-consuming to work my way through all of the possible instructions. As I proceeded, I added each instruction I implemented to a test program, with examples of each addressing mode. Another good "stress test" is to feed the disassembler random data (such as from /dev/urandom on Linux) and make sure that it does not crash or produce errors.
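A random-input test harness can be sketched as follows. `decode_one` here is a hypothetical stand-in that recognizes only two instructions and emits anything else as a `DC.W` data word; in practice it would be replaced by the real decoder. The harness walks blocks of random bytes from `os.urandom` (the OS random source; on Linux, the same kernel pool behind /dev/urandom), and any uncaught exception ends the run with a traceback, which is exactly the kind of crash the test is looking for.

```python
import os

def decode_one(data: bytes, offset: int) -> tuple[str, int]:
    """Stand-in decoder (hypothetical): return (text, length in bytes).

    A real disassembler would consult its opcode table here and may
    also consume extension words after the opcode.
    """
    opcode = int.from_bytes(data[offset:offset + 2], "big")
    if opcode == 0x4E71:
        return "NOP", 2
    if (opcode & 0xF100) == 0x7000:
        return f"MOVEQ #${opcode & 0xFF:02X},D{(opcode >> 9) & 7}", 2
    return f"DC.W ${opcode:04X}", 2  # anything unrecognized, emitted as data

def fuzz(iterations: int = 100, size: int = 1024) -> int:
    """Disassemble blocks of random bytes; return how many items were decoded."""
    decoded = 0
    for _ in range(iterations):
        data = os.urandom(size)
        offset = 0
        while offset + 2 <= len(data):
            text, length = decode_one(data, offset)
            # Sanity checks on every result, not just "did it crash".
            assert text, f"empty output at offset {offset:#x}"
            assert length >= 2 and length % 2 == 0, "instructions are whole 16-bit words"
            offset += length
            decoded += 1
    return decoded
```

One case random input tends to exercise well is an opcode near the end of the data whose extension words are missing, which a decoder must handle without reading past the buffer.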
After a few weeks of occasional evenings (interrupted by a trip to Europe), I had finished support for all instructions. The most complex was MOVE, as it supports almost every addressing mode for both its source and destination operands. The final program is just over 1000 lines of Python code, including comments and blank lines.
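To illustrate why MOVE needs special care, here is a minimal sketch that splits a MOVE opcode word into its fields and formats only the addressing modes that need no extension words. One quirk it shows is that the destination effective-address field is stored register-first, then mode, the reverse of the source field. The function names are hypothetical, and the sketch does no validation of illegal combinations (such as MOVE.b to an address register).

```python
SIZES = {0b01: "b", 0b11: "w", 0b10: "l"}  # size encoding for MOVE, not in numeric order

def split_move(opcode: int) -> dict:
    """Split a MOVE/MOVEA opcode word into its fields."""
    size_bits = (opcode >> 12) & 0b11
    if (opcode >> 14) != 0 or size_bits not in SIZES:
        raise ValueError(f"${opcode:04X} is not a MOVE opcode")
    return {
        "size": SIZES[size_bits],
        "src_mode": (opcode >> 3) & 7,
        "src_reg": opcode & 7,
        # The destination field is stored register first, then mode,
        # the reverse of the source field's order.
        "dst_reg": (opcode >> 9) & 7,
        "dst_mode": (opcode >> 6) & 7,
    }

def ea_simple(mode: int, reg: int):
    """Format the addressing modes that need no extension words, else None."""
    forms = {0: f"D{reg}", 1: f"A{reg}", 2: f"(A{reg})", 3: f"(A{reg})+", 4: f"-(A{reg})"}
    return forms.get(mode)  # modes 5-7 need extension words; not handled here

def format_move(opcode: int):
    fields = split_move(opcode)
    src = ea_simple(fields["src_mode"], fields["src_reg"])
    dst = ea_simple(fields["dst_mode"], fields["dst_reg"])
    if src is None or dst is None:
        return None
    name = "MOVEA" if fields["dst_mode"] == 1 else "MOVE"  # An destination means MOVEA
    return f"{name}.{fields['size']} {src},{dst}"
```

For instance, $2318 formats as `MOVE.l (A0)+,-(A1)`, while $183A (source mode 7, register 2, meaning d16(PC)) returns `None` because this sketch does not read extension words.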
Here is some sample output:
00000000 4E 71 NOP
00000004 A2 34 UNIMPLEMENTED
00000006 4A FC ILLEGAL
00000008 4E 70 RESET
00000012 4E 40 TRAP #$00
0000001A 00 7C AA 55 ORI #$AA55,SR
0000002A 02 7C AA 55 ANDI #$AA55,SR
00000032 60 5E BRA $00000092
000000BA 48 C2 EXT.l D2
000000BE 4E 69 MOVE USP,A1
000000DE 57 CF 00 22 DBEQ D7,$00000102
00000112 72 01 MOVEQ #$01,D1
00000146 EF 82 ASL.l #7,D2
000006E4 08 78 00 08 12 34 BCHG #$08,$1234
00000C9E 4C FB 55 AA 90 12 MOVEM.l $12(PC,A1),D1/D3/D5/D7/A0/A2/A4/A6
00000CF4 2C 6D 12 34 MOVEA.l $1234(A5),A6
00000D3A 18 3A 12 34 MOVE.b $1234(PC),D4
00000F24 55 91 SUBQ.l #2,(A1)
00001334 DF B8 12 34 ADD.l D7,$1234
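Two small calculations behind entries in the listing can be checked by hand: branch targets and MOVEM register lists. A branch displacement is relative to the opcode's address plus 2, so BRA at $32 with displacement $5E lands at $32 + 2 + $5E = $92, and DBEQ at $DE with extension word $0022 lands at $102. A MOVEM mask such as $55AA maps bit 0 to D0 through bit 15 to A7, except that for the -(An) mode the mask is bit-reversed. The function names below are hypothetical, not taken from the program.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

def branch_target(address: int, displacement: int, bits: int) -> int:
    """Bcc/BRA/DBcc targets are relative to the opcode word's address plus 2."""
    return address + 2 + sign_extend(displacement, bits)

def movem_registers(mask: int, predecrement: bool = False) -> str:
    """Turn a MOVEM register mask into a list such as D1/D3/A0."""
    if predecrement:
        # For -(An) the mask is bit-reversed: bit 0 is A7 and bit 15 is D0.
        mask = int(f"{mask:016b}"[::-1], 2)
    # After any reversal, bit 0 is D0 and bit 15 is A7.
    names = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]
    return "/".join(names[i] for i in range(16) if mask & (1 << i))
```

`movem_registers(0x55AA)` reproduces the `D1/D3/D5/D7/A0/A2/A4/A6` list shown above for the MOVEM.l line.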
With the -n or --nolist option, it outputs only the disassembled instructions, omitting the addresses and opcode bytes. This output could be fed back into an assembler, for example if you were reverse engineering some code. Here is some sample output in this mode:
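A sketch of how a listing/no-listing switch like this might be wired up with `argparse`. Only the `-n`/`--nolist` flag comes from the program's description; the positional `filename` argument, the function names, and the exact column spacing are assumptions (the real program's output may pad its columns differently).

```python
import argparse

def format_line(address: int, raw: bytes, text: str, nolist: bool) -> str:
    """Return a listing line, or just the instruction text in nolist mode."""
    if nolist:
        return text
    hex_bytes = " ".join(f"{b:02X}" for b in raw)
    return f"{address:08X} {hex_bytes} {text}"

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="68000 disassembler (sketch)")
    parser.add_argument("filename")  # assumed input-file argument
    parser.add_argument("-n", "--nolist", action="store_true",
                        help="print only the instructions, without addresses or bytes")
    return parser.parse_args(argv)
```

Keeping the formatting decision in one function means the decoder itself does not need to know which output mode was requested.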
NOP
UNIMPLEMENTED
ILLEGAL
RESET
TRAP #$00
ORI #$AA55,SR
ANDI #$AA55,SR
BRA $00000092
EXT.l D2
MOVE USP,A1
DBEQ D7,$00000102
MOVEQ #$01,D1
ASL.l #7,D2
BCHG #$08,$1234
MOVEM.l $12(PC,A1),D1/D3/D5/D7/A0/A2/A4/A6
MOVEA.l $1234(A5),A6
MOVE.b $1234(PC),D4
SUBQ.l #2,(A1)
ADD.l D7,$1234
The source code and test program can be found here.
This process gave me an appreciation for the effort the Motorola engineers must have put into implementing the 68000 disassembler built into the TUTOR firmware, which was written in assembly language.
I can also appreciate that significantly more work would be needed to extend this to support the 68020 or later processors, which have more instructions and addressing modes.
While it was not meant to be a production program, it was fun to write, and I now have a much better understanding of the 68000 instruction set and its complexity, quirks, and limitations.