68070

This is a preliminary design draft for the 68070 CPU project.
The 68070 is not available yet.
This is not an announcement.


The 68K-family - some background

The 68k Family is one of the main CISC CPU families. It features


The 68k instruction set can be divided into the following broad categories:


The wonderful and luxurious world of 68K addressing modes
The 68K CPUs are blessed with very flexible addressing modes, which allow data to be manipulated in very elegant ways.

Rn = can be any address register or data register.
*x = scale factor. Can be *1, *2, *4 or *8. Very useful for working with arrays.
For read operations the d(An) and d(An,Rn*x) modes can use PC instead of An.
This is great for reading local constants without having to dedicate an address register as a pointer to the local data.
For simplicity, the indirect modes are not included in this overview.
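The address arithmetic behind the d(An) and d(An,Rn*x) modes can be sketched as follows. This is a minimal illustration with hypothetical helper names. It assumes the index register is used as a full 32-bit value (no Rn.W sign extension) and that the PC value passed in is already the base the CPU uses for PC-relative modes.

```python
# Hypothetical helpers (not part of any 68K toolchain) showing the arithmetic
# behind the addressing modes above. All results wrap to 32 bits.
MASK32 = 0xFFFFFFFF
SCALES = (1, 2, 4, 8)

def ea_disp(d: int, an: int) -> int:
    """d(An): displacement + address register."""
    return (d + an) & MASK32

def ea_indexed(d: int, base: int, rn: int, scale: int = 1) -> int:
    """d(An,Rn*x): displacement + base + index * scale.

    Assumption: rn is used as a full 32-bit value (no Rn.W sign extension).
    """
    if scale not in SCALES:
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (d + base + rn * scale) & MASK32

def ea_pc_indexed(d: int, pc: int, rn: int, scale: int = 1) -> int:
    """d(PC,Rn*x): same arithmetic with PC as the base (read accesses only).

    Assumption: pc is already the base value used for PC-relative modes.
    """
    return ea_indexed(d, pc, rn, scale)

# Element 3 of a long-word array starting at A1 = $1000:
# ea_indexed(0, 0x1000, 3, 4) -> 0x100C
```

The scale factor is why these modes suit arrays: the element index can stay in a register while the hardware multiplies it by the element size.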


Looking at the 68K-CISC instruction set
The 68K CPU uses a CISC instruction set. This means that single instructions often do more than one thing.

Example 1) MOVE.L (A0)+,(A1)+
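This single instruction does 4 things internally:
1) Load a long word from the address in A0
2) Store it to the address in A1
3) Increment A0 by 4
4) Increment A1 by 4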

In comparison, a RISC CPU usually needs 4 instructions and 1 extra register to do the same work.
Both the 68K and the RISC CPU need to do the same amount of work.
The 68K code is 1 instruction, 2 bytes long.
The RISC code is 4 instructions, 16 bytes long in total.


Example 2) ADD.L D0,100(A1,A2*4)
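This single instruction does 4 things internally:
1) Calculate the address 100 + A1 + A2*4
2) Load the long word from that address
3) Add D0 to it
4) Store the result back to the same address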

In comparison, a RISC CPU needs 5 instructions and 2 extra registers to do the same work.
Both the 68K and the RISC CPU need to do the same amount of work.
The 68K code is 1 instruction, 4 bytes long.
The RISC code is 5 instructions, 20 bytes long in total.
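The two examples can be written out as explicit work items on a toy machine, which also reproduces the byte counts above. This is a sketch under stated assumptions: registers and memory are plain dicts of aligned 32-bit longs, and the RISC sequences use a generic, hypothetical instruction syntax with fixed 4-byte instructions and t0/t1 as the extra scratch registers.

```python
# Toy machine: registers in a dict, memory as a dict of aligned 32-bit longs.
# Assumption: no bus, cache or alignment modelling; only the work items.
MASK32 = 0xFFFFFFFF

def move_l_postinc(regs: dict, mem: dict) -> None:
    """MOVE.L (A0)+,(A1)+ as four internal work items."""
    value = mem[regs["A0"]]                 # 1) load long from (A0)
    mem[regs["A1"]] = value                 # 2) store long to (A1)
    regs["A0"] = (regs["A0"] + 4) & MASK32  # 3) A0 += 4
    regs["A1"] = (regs["A1"] + 4) & MASK32  # 4) A1 += 4

def add_l_indexed(regs: dict, mem: dict) -> None:
    """ADD.L D0,100(A1,A2*4) as four internal work items."""
    ea = (100 + regs["A1"] + regs["A2"] * 4) & MASK32  # 1) address calculation
    value = mem[ea]                                    # 2) load
    value = (value + regs["D0"]) & MASK32              # 3) add
    mem[ea] = value                                    # 4) store back

# Hypothetical generic RISC equivalents (fixed 4-byte instructions);
# t0/t1 are the extra scratch registers mentioned in the text.
RISC_EXAMPLE_1 = ["load t0,0(a0)", "store t0,0(a1)", "addi a0,a0,4", "addi a1,a1,4"]
RISC_EXAMPLE_2 = ["slli t0,a2,2", "add t0,t0,a1", "load t1,100(t0)",
                  "add t1,t1,d0", "store t1,100(t0)"]

def risc_code_bytes(sequence: list) -> int:
    return 4 * len(sequence)  # every RISC instruction is 4 bytes in this model

# 68K code sizes as given in the text.
M68K_CODE_BYTES = {"MOVE.L (A0)+,(A1)+": 2, "ADD.L D0,100(A1,A2*4)": 4}
```

Both sides perform the same four work items; only the encoding differs, which is the point the comparison makes.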

Let's remember: both the 68K and the RISC CPU do the same amount of work.
The advantage of the 68K instruction set is more compact code, which means smaller programs.
The advantage of the RISC instruction set is simple decoding.
The RISC CPU always does exactly one work item per instruction; this makes the decoder of the RISC CPU very simple.


It's easy to see that the CISC CPU has the advantage of much more compact code.
Does the RISC CPU have some advantages as well?

Yes, the execution of RISC instructions is very simple. One instruction is always one work item. In CPU design, simple usually means fast. Because of its fast pipeline, a RISC engine can be clocked very high.
Today RISC CPUs achieve clock rates of up to 5 GHz.

Is there a way to combine the CISC and RISC advantages?
Yes, there is.
All you need to do is combine the CISC instruction set with a RISC execution engine.
You have a decoder that translates your CISC instructions into a RISC instruction stream. Your execution engine is a simple RISC design that can be clocked very high.
The Motorola 68060 CPU was designed that way. It's a CISC decoder in front of a RISC execution pipeline.
The Coldfire CPUs work the same way.
And the Intel and AMD x86 CPUs are designed that way too.
As the AMD x86 Athlon shows, combining the CISC instruction set with a RISC execution engine makes clock rates of 3 GHz achievable.

The 68K instruction set is actually easier to decode than the somewhat obscure x86 instruction set. Here, easier means faster.


So we want a CISC decoder and a RISC execution engine

A CISC decoder will give us the compact 68K instruction set.
And the pipelined RISC execution engine will allow us to achieve high clock rates.


How can we increase the performance of our engine?
Each memory access of the 68k instruction set can include a powerful addressing mode.
An addressing mode can be something like (100 + Register + Register*Scale).
To calculate this address we have to do two additions and one small multiplication (a shift).
An effective way to greatly improve the speed of our RISC execution engine
is adding a second CALCULATION UNIT that is specialized to do exactly
ONE SHIFT AND THREE ADDITIONS per clock in parallel to the normal instructions.

Let's go back to our example
68K instruction: MOVE.L (A0)+,(A1)+

Our 68K decoder will translate this for the execution engine into:
  load (A0),dummy     addi 4,A0   # runs in parallel
  store dummy,(A1)    addi 4,A1   # runs in parallel
As you can see the 68070 needs 2 internal instructions = 2 clocks for this instruction.


68K instruction: ADD.L D0,100(A1,A2*4)
Our 68K decoder will translate this for the execution engine into:
  calc 100(A1,A2*4),ea    load (ea),dummy
  add D0,dummy            store dummy,(ea)
As you can see, the 68070 needs 4 internal instructions, which, thanks to the parallel units and pipelining, appear as 2 clocks for this instruction.

The above examples assume a memory latency of 0, which is unrealistic but keeps the examples simple. As you can see, our 68K-optimized RISC engine can handle the workload of 68K instructions in fewer clocks than a real RISC CPU needs for the equivalent instruction sequence.
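The clock counts of both examples can be reproduced with a small packing model. This is a toy sketch, not the 68070 issue logic: it assumes micro-ops are grouped in program order, at most one micro-op per unit (EA, ALU, LSU) and at most two micro-ops per clock (matching the two-column layout of the translations), zero memory latency, and dependencies hidden by the pipeline.

```python
from typing import List, Tuple

MicroOp = Tuple[str, str]  # (unit, text); unit is "EA", "ALU" or "LSU"

def issue_clocks(uops: List[MicroOp], width: int = 2) -> List[List[str]]:
    """Pack micro-ops, in order, into issue groups (one group per clock)."""
    groups: List[List[MicroOp]] = []
    for uop in uops:
        unit = uop[0]
        current = groups[-1] if groups else None
        if (current is None or len(current) == width
                or any(u == unit for u, _ in current)):
            groups.append([uop])   # unit busy or group full: start a new clock
        else:
            current.append(uop)    # runs in parallel with the previous micro-op
    return [[text for _, text in group] for group in groups]

# The two translations from the text.
MOVE_L_POSTINC = [("LSU", "load (A0),dummy"), ("EA", "addi 4,A0"),
                  ("LSU", "store dummy,(A1)"), ("EA", "addi 4,A1")]
ADD_L_INDEXED = [("EA", "calc 100(A1,A2*4),ea"), ("LSU", "load (ea),dummy"),
                 ("ALU", "add D0,dummy"), ("LSU", "store dummy,(ea)")]
```

With `width=2` both instructions take 2 clocks; with `width=1` (one work item per clock, like a plain RISC CPU) each takes 4.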


Simplified pipeline model of the 68070
Executing a computer instruction involves several steps.

ARITHMETIC Instructions
Register load
Execution
Write back

LOAD Instructions
EA Register load
EA Calculation
Memory cycle for operand fetch

STORE Instructions
EA Register load
EA Calculation
Memory cycle for store

The Execution UNIT consists of THREE subunits.
1) The EA-CALCULATION UNIT
2) The ALU UNIT for addition/multiplication/divide/and/or
3) The LOAD-STORE-UNIT

The three Units can operate in parallel.
An EA can be calculated in parallel to the ALU execution.
And one load can be executed in parallel to the other units.

In front of the EXECUTION PIPELINE are the instruction FETCH UNIT and the DECODER UNIT.

The purpose of the Instruction Fetch Unit is to read instructions from memory in advance of their execution. The Instruction Fetch Unit uses the instruction cache to increase memory throughput, and it uses cache line bursts/double cache line bursts from memory to improve memory read performance.

The DECODER UNIT translates the 68K instructions into sub-opcodes for the EA-RISC.
Part of the DECODER UNIT is the BRANCH PREDICTION UNIT, which can remove predicted branches completely from the execution stream.
Many 68K instructions translate into 1 set of parallel RISC instructions.
Other 68K instructions translate into 2 or 3 sets of parallel RISC instructions.
A few complicated 68K instructions like DIVISION or MOVEM translate into more sets of RISC instructions.


Let's summarize our design idea:
We will use the same design as the 68060: a CISC decoder and a RISC backend.
Intel, AMD, and VIA showed that it's possible to create multi-GHz chips this way.
We even have the advantage that a 68K decoder is easier to create than an x86 decoder.

A RISC pipeline can run at several hundred MHz, even in an FPGA.
This design has a great speed potential.

So are there any possible speed limitations?


General factors limiting CPU speed, and their solutions
Let's look at a list of speed drains that affect all CPUs in general, and think about how much they will affect us and how to avoid them.

Memory latency
Memory latency is the speed-limiting factor number one.
The Natami has relatively fast memory, so this is less of a limit for us.
Memory latency limits both instruction fetching and data processing.
Instruction fetching can be sped up by implementing a reasonable instruction cache and prefetch buffers in the pipeline.
The 68070 has those.
Data latency is not easy to avoid. The best cure for data latency is to allow the CPU to fetch data in parallel with the execution of independent instructions. Including this feature will give a huge speed improvement to the 68070.
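The effect of letting independent instructions run while a load is outstanding can be estimated with a toy timing model. This is an illustrative sketch, not a model of the 68070: it assumes one instruction issues per clock, a load result arrives `load_latency` clocks after issue, all other operations take one clock, and a blocking CPU stalls everything until the load completes.

```python
from typing import List, Optional, Tuple

# (kind, destination register or None, source registers)
Instr = Tuple[str, Optional[str], List[str]]

def total_clocks(program: List[Instr], load_latency: int, blocking: bool) -> int:
    ready_at = {}   # register -> clock at which its value becomes available
    clock = 0       # clock at which the next instruction may issue
    for kind, dest, srcs in program:
        # an instruction waits until all of its source operands are available
        start = max([clock] + [ready_at.get(r, 0) for r in srcs])
        if kind == "load":
            done = start + load_latency
            # blocking CPU: nothing else issues until the data has arrived
            clock = done if blocking else start + 1
        else:
            done = start + 1
            clock = start + 1
        if dest is not None:
            ready_at[dest] = done
    return max([clock] + list(ready_at.values()))

PROGRAM = [
    ("load", "D1", ["A0"]),        # move.l (A0),D1
    ("alu",  "D2", ["D2"]),        # addq.l #1,D2   (independent)
    ("alu",  "D3", ["D3"]),        # addq.l #1,D3   (independent)
    ("alu",  "D4", ["D4"]),        # addq.l #1,D4   (independent)
    ("alu",  "D5", ["D1", "D5"]),  # add.l D1,D5    (needs the loaded value)
]
```

With a 4-clock load latency the blocking CPU needs 8 clocks for this sequence, while the non-blocking one needs 5, because the three independent adds hide most of the latency.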


Pipeline stalls due to branching
'The branch instruction was invented by the Devil.'
At least, that is the impression you get when you look at a pipelined CPU and the problems branches cause for today's CPUs.
A pipelined CPU needs to load instructions ahead of their execution.
Depending on your pipeline length, your instruction fetcher and decoder need to work far ahead. Common values are 8 to 30 instructions ahead of the program counter.
The mean thing about a branch is that it changes your program flow.
Good branches are those of the unconditional "GOTO Label" type.
These are no problem to predict, and your branch unit can eliminate them completely from the instruction stream, making these instructions basically free.
More difficult are branches that are conditional.

If (some condition) then
  some instruction
endif
Here is an assembly example of a typical use case:
a routine that lowers a value, but with clipping to zero
(we do not want to underflow).
 sub #5,D0        ; D0 = D0 - 5
 bge.s label      ; This branch is extremely hard to predict
 moveq #0,D0      ; result went negative: clip to zero
label:
This is the typical example of a construct that creates a branch that is hard to predict. A misprediction costs roughly as many clocks as your decoder and execution pipeline are long. For Intel chips this could be as expensive as 30 clocks. For the 68070 a misprediction will be less expensive, around 8 clocks, but still not nice.
But there is a solution to this problem.
The ARM chip has a neat feature: it can execute conditional instructions, for example conditional moves or adds.
If we are clever and borrow this feature, we can eliminate small forward branches from the instruction flow for free, with a guarantee that we will never mispredict them.
The decoder of the 68070 could rewrite the asm example to this:
 sub #5,D0        ; D0 = D0 - 5
 moveq.lt #0,D0   ; The branch is gone
While the old branch could need up to 9 cycles,
this branch-free code will always need just 2 cycles.
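The two versions of the clipping routine can be checked against each other. In this sketch D0 is modelled as a signed integer, and `diff < 0` plays the role of the LT condition (N xor V) after the subtraction. The predicated version replaces the branch with a mask select, which is one common way to express what a conditional move such as the internal `moveq.lt` does.

```python
def clip_sub_branch(d0: int) -> int:
    """Branching version: sub #5,D0 / bge.s label / moveq #0,D0."""
    diff = d0 - 5            # exact difference; diff < 0 matches the LT test
    if diff >= 0:            # bge.s label
        return diff
    return 0                 # moveq #0,D0

def clip_sub_predicated(d0: int) -> int:
    """Branch-free version: sub #5,D0 / moveq.lt #0,D0."""
    diff = d0 - 5
    lt_mask = -int(diff < 0)  # -1 (all ones) when LT is true, else 0
    return diff & ~lt_mask    # zero the result only when the predicate holds
```

Both functions return the same value for every input, but the second never changes control flow, so there is nothing left to mispredict.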


Pipeline stalls due to subroutine calls
The instructions that are used to call a subroutine are predictable on 68K.
This means we can build a 68K CPU that can call a subroutine (GOSUB) (ASM: BSR) for free.
Returning from a subroutine call is more complicated, because the return needs to fetch the return address back
from the stack (doing a memory read) and then jump to this loaded address.
This is harder to predict. So normally a CPU like the 68060 needs a memory read + 10 clocks for this.
Therefore a (C: RETURN) or (ASM: RTS) is unpredictable and costs a lot on the 68K.
But this can be improved: by adding an on-chip return stack cache, the 68070 can handle returns in 1 clock.
This will allow it to make subroutine calls and operating system calls many times faster than previous 68K CPUs could.
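A return stack cache can be sketched as a small LIFO in the branch unit: every BSR/JSR pushes its return address, and every RTS pops the predicted target before the real return address has been loaded from (SP)+. The class and function names and the default depth of 8 entries are assumptions for illustration only.

```python
class ReturnStackBuffer:
    """Small LIFO of predicted return addresses (depth is an assumption)."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.entries = []

    def on_call(self, return_address: int) -> None:
        """BSR/JSR: remember where the matching RTS should go."""
        if len(self.entries) == self.depth:
            self.entries.pop(0)              # full: forget the oldest entry
        self.entries.append(return_address)

    def predict_return(self):
        """RTS: predicted target, or None if the buffer is empty."""
        return self.entries.pop() if self.entries else None

def count_predictions(trace, depth: int = 8):
    """trace: ("call", return_address) and ("ret", actual_target) events.

    Returns (correct, mispredicted). A mispredicted RTS has to wait for the
    real return address loaded from (SP)+, as on a CPU without this buffer.
    """
    rsb = ReturnStackBuffer(depth)
    correct = mispredicted = 0
    for kind, address in trace:
        if kind == "call":
            rsb.on_call(address)
        elif rsb.predict_return() == address:
            correct += 1
        else:
            mispredicted += 1
    return correct, mispredicted

NESTED = [("call", 0x100), ("call", 0x200), ("call", 0x300),
          ("ret", 0x300), ("ret", 0x200), ("ret", 0x100)]
```

Properly nested calls are always predicted correctly as long as the nesting fits in the buffer; deeper nesting, or code that manipulates the return address on the stack, falls back to the slow path.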


Enhancing the CPU to control the data cache efficiently
Every program works with a small amount of local variables.
Programs use the local variable segment and the stack to hold often used variables.
Programs tend to use these variables a lot. Because of this, data caches greatly improve performance.
But there is a hitch: programs often also work with "streaming data". Streaming data is data that the CPU will "read once", process, and then never touch again.
The problem caused by this type of data is that the CPU does not know that it will never use these values again, so it puts them into the data cache.
This effect is called data cache pollution.
Normally a 32KB data cache is more than enough to hold all the local variables of a program.
But if the program streams through some data, these values will push all the valuable local variables out of the data cache.
A poor but simple way to minimize this pollution effect is to build huge caches. A CPU with several MB of cache has a higher chance of keeping at least some valuable data.
Many CPUs have no good way to solve this issue. Some CPUs are more clever and allow hinting that certain data is "valuable" for caching and other data is not. We can easily enhance the 68K family to allow user-defined and automatic cache hinting. On the 68K we know that PC-relative memory access is always access to local variables; this data could be given a higher caching priority. Memory access to the stack is another very good indication that the data should stay in the cache.
By implementing the above, and by adding a new addressing mode that allows the programmer to 'hint' a streaming (non-caching) memory access, the efficiency of the CPU data cache can be greatly improved.
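Cache pollution and the benefit of a streaming hint can be shown with a tiny fully associative LRU cache. Everything here is an illustrative assumption rather than a 68070 parameter: an 8-line cache, 4 lines of hot local variables, and 8 fresh streaming lines per loop iteration. With the hint, streaming accesses are served without being allocated into the cache.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, lines: int):
        self.lines = lines
        self.data = OrderedDict()   # line address -> None, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, line: int, allocate: bool = True) -> None:
        if line in self.data:
            self.hits += 1
            self.data.move_to_end(line)      # mark as most recently used
            return
        self.misses += 1
        if not allocate:
            return                           # streaming hint: bypass the cache
        if len(self.data) == self.lines:
            self.data.popitem(last=False)    # evict the least recently used line
        self.data[line] = None

def local_hits(use_hint: bool, cache_lines: int = 8, iterations: int = 100) -> int:
    cache = LRUCache(cache_lines)
    local_lines = range(4)                   # local variables and stack
    stream_base = 1000                       # first line of the streamed buffer
    for i in range(iterations):
        for line in local_lines:
            cache.access(line)               # hot data, always allocated
        for j in range(8):                   # 8 new streaming lines per pass
            cache.access(stream_base + i * 8 + j, allocate=not use_hint)
    return cache.hits
```

Without the hint every pass of streaming data evicts the local variables, so they never hit; with the hint they stay resident after the first pass (396 hits out of 400 local accesses).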


Pre-recording condition codes that will later be needed
This is a neat feature that the PowerPC has.
A good programmer can use it to calculate conditions in advance that will be needed later.
This helps to get branches for free in work loops.
The 68K family was originally not designed for this.
Nevertheless, with a little thought this can be integrated for free into the 68K CPU as well.
Using this technique, a good programmer or clever compiler can make many branches free.


Summarizing: features to create an improved 68K design

Allowing Parallel Loads
A typical usage case for a CPU is to process data.
Processing data usually includes

There are CPUs that "block" when doing a memory load.
And there are CPUs which allow other "independent" instructions to continue.
A huge performance improvement can be achieved by allowing a CPU to process independent instructions while a parallel load is executed.


Forward Branch Elimination
Converting short forward branches to conditional instructions.
The 68K family does not officially support this type of instruction.
But we could add these instructions to the internal RISC execution units.
This completely prevents misprediction of these branches.



Adding instruction cache coherency
The 68k family does snoop for data cache coherency but not for instruction cache coherency.
By adding instruction cache coherency snooping, most problems of self-modifying code will be solved.


Adding superscalarity
The 68060 already featured two integer ALUs.



68070 RISC UNITS

THE EA-UNIT
The purpose of the EA unit is to calculate EAs (effective addresses).
The EA-UNIT has 3 sources (immediate, A-register, any-register*scale).
The sum of the above inputs is calculated once per clock.
The EA Unit produces one ea-result per clock.
The EA-Result can be used to update one register.
And the EA-Result can be used as the address for a memory access (see FETCH UNIT).

THE ALU-UNIT
The purpose of the ALU Unit is to do the majority of arithmetic and logical operations.
Some of the arithmetic operations can alternatively be handled by the EA-UNIT.
The ALU has as possible sources: (immediate, any register, any register, value from the fetch unit).
If the result from the memory unit is selected as an input, the ALU will block until the memory unit sets the mem-result-ready flag.
The ALU can do the typical arithmetic and logical operations (+)(-)(&)(|)(^)(*)(%)(>>)(<<).
The ALU unit produces one result and can [optionally] update the Condition-Codes.
The result can [optionally] be stored back into one register.
And the result can [optionally] be stored to memory at the address pointed to by the EA-result.

THE FETCH/MEMORY-UNIT
The Memory UNIT can load one result or store one result back to memory.
Stores are posted.
Loads are blocking for the memory unit (i.e. for other memory operations). Operations of the other units (ALU/EA) are not affected by a blocked load.

THE BRANCH+PREDICTION-UNIT
The branch and branch prediction unit is part of the early decoder stage, ahead of the execution pipeline.
The condition codes are logically part of the branch unit.
The ALU can write to the condition codes.
The decoder keeps a condition-code validity counter per flag. Its purpose is to keep track of whether there are instructions in the pipe that may still change the CC flags. This allows branches to be decided early.
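One possible reading of the per-flag validity counter is sketched below; the class name and method names are hypothetical. Each instruction that will write a flag increments that flag's counter when it enters the pipeline and decrements it when it retires. A conditional branch can then be decided in the decoder as soon as every flag its condition reads has a counter of zero.

```python
FLAGS = "XNZVC"

# Flags read by a few 68K conditions: EQ/NE test Z, LT/GE test N and V,
# CS/CC test C.
COND_READS = {"EQ": "Z", "NE": "Z", "LT": "NV", "GE": "NV", "CS": "C", "CC": "C"}

class CCTracker:
    """Hypothetical per-flag counters of in-flight flag writers."""

    def __init__(self):
        self.pending = {flag: 0 for flag in FLAGS}

    def issue(self, writes: str) -> None:
        """An instruction that writes these flags enters the pipeline."""
        for flag in writes:
            self.pending[flag] += 1

    def retire(self, writes: str) -> None:
        """That instruction has written its flags."""
        for flag in writes:
            self.pending[flag] -= 1

    def can_resolve_early(self, reads: str) -> bool:
        """True when no in-flight instruction can still change these flags."""
        return all(self.pending[flag] == 0 for flag in reads)
```

Instructions such as MOVEA or LEA write no flags, so they do not hold back an early branch decision even while they are still in the pipe.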

68K Decoding, Pipelining Examples
ADD
ADD Rn,Dn
  1A  ALU LOAD
  1B  ALU EX
  1C  ALU STORE
ADD (ea),Dn
  1A  EA LOAD
  1B  EA EX
  1C  EA RESULT    FETCH (can take n cycles)
  2A  ALU LOAD     *wait until the memory FETCH result is ready
  2B  ALU EX
  2C  ALU STORE
ADD Dn,(ea)
  1A  EA LOAD
  1B  EA EX
  1C  EA RESULT    FETCH (can take n cycles)
  2A  ALU LOAD     *wait until the memory FETCH result is ready
  2B  ALU EX
  2C  ALU STORE    STORE accepted (posted)

LEA (ea),An
  1A  EA LOAD
  1B  EA EX
  1C  EA RESULT STORE

ADD Rn,An
  1A  EA LOAD
  1B  EA EX
  1C  EA RESULT STORE
As you can see, the units are not always equally utilized.
Instructions can be folded over the units for parallel execution.


ADD , SUB
ADD Rn,Dn
 ALU 
XNZVC
*****
ADD #im,Dn
 ALU 
XNZVC
*****
ADD ea,Dn
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 ALU   EA   FETCH 
 ALU 
XNZVC
*****
ADD Dn,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU   STORE 
XNZVC
*****
ADDI #im,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU   STORE 
XNZVC
*****
ADDA Rn,An
 EA 
XNZVC
-----
ADDA #im,An
 EA 
XNZVC
-----
ADDA ea,An
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 EA   FETCH 
 EA 
XNZVC
-----
ADDQ is a special form of ADDI. Unlike ADDI, ADDQ can also be used on An, so it works on Dn, An, and ea.


LEA
LEA ea,An
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 EA 
XNZVC
-----


PEA
PEA ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 EA   STORE 
XNZVC
-----
PEA is implemented as:
LEA
MOVE result,-(SP)


CMP
CMP Rn,Dn
 ALU 
XNZVC
-****
CMP #im,Dn
 ALU 
XNZVC
-****
CMP ea,Dn
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 ALU   EA   FETCH 
 ALU 
XNZVC
-****
CMP Dn,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU 
XNZVC
-****
CMPI #im,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU 
XNZVC
-****
CMPA Rn,An
 EA 
XNZVC
-****
CMPA #im,An
 EA 
XNZVC
-****
CMPA ea,An
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 EA   FETCH 
 EA 
XNZVC
-****
CMPM (an)+,(an)+
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
x
 EA   FETCH 
 EA   FETCH 
 ALU 
XNZVC
-****
TST is a special form of CMPI.


AND , OR
AND Dn,Dn
 ALU 
XNZVC
-**00
ANDI #im,Dn
 ALU 
XNZVC
-**00
AND ea,Dn
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 ALU   EA   FETCH 
 ALU 
XNZVC
-**00
AND Dn,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU   STORE 
XNZVC
-**00
ANDI #im,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU   STORE 
XNZVC
-**00
ANDI #im,An
 ALU 
XNZVC
-----


EOR
EOR Dn,Dn
 ALU 
XNZVC
-**00
EORI #im,Dn
 ALU 
XNZVC
-**00
EOR Dn,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU   STORE 
XNZVC
-**00
EORI #im,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU   STORE 
XNZVC
-**00
EORI #im,An
 ALU 
XNZVC
-----


CLR
CLR Dn
 ALU 
XNZVC
-0100
CLR ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 ALU   EA   STORE 
XNZVC
-0100


MUL , DIV
MUL Dn,Dn
 ALU 
XNZVC
-***0
MUL #im,Dn
 ALU 
XNZVC
-***0
MUL ea,Dn
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 ALU   EA   FETCH 
 ALU 
XNZVC
-***0


EXG
EXG Rn,Rn
 ALU 
XNZVC
-----


EXT
EXT Dn
 ALU 
XNZVC
-**00


BCC , BRA
BCC #
 PC 
XNZVC
-----
PC += distance


BSR
BSR #
 EA   STORE 
 EA   PC 
XNZVC
-----
move PC,-(SP)
PC += #


JMP
JMP ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
x xx xx xx
 EA   PC 
XNZVC
-----


JSR
JSR ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
x xx xx xx
 EA   STORE 
 EA   PC 
XNZVC
-----


RTS
RTS
 EA   FETCH 
 EA   PC 
XNZVC
-----
move (SP)+,PC


RTR/RTE
RTR
 EA   FETCH 
 EA   FETCH 
 EA   PC 
XNZVC
-----
move (SP)+,CCR
move (SP)+,PC


LINK An,#
move An,-(A7)
move A7,An
lea (d16,A7),A7


UNLK An
lea (An),A7
move (A7)+,An
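The LINK/UNLK expansions can be checked on a toy stack. This sketch assumes the standard 68K semantics (LINK pushes An, copies A7 into An, then adds the displacement to A7; UNLK reverses this) and models registers and memory as plain dicts of 32-bit values. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def link(regs: dict, mem: dict, an: str, d16: int) -> None:
    """LINK An,#d16 expanded into simple internal moves."""
    regs["A7"] = (regs["A7"] - 4) & MASK32    # move An,-(A7): pre-decrement ...
    mem[regs["A7"]] = regs[an]                # ... and push the old frame pointer
    regs[an] = regs["A7"]                     # move A7,An: new frame pointer
    regs["A7"] = (regs["A7"] + d16) & MASK32  # lea (d16,A7),A7: reserve locals (d16 < 0)

def unlk(regs: dict, mem: dict, an: str) -> None:
    """UNLK An expanded into simple internal moves."""
    regs["A7"] = regs[an]                     # lea (An),A7: drop the locals
    regs[an] = mem[regs["A7"]]                # move (A7)+,An: restore old frame pointer ...
    regs["A7"] = (regs["A7"] + 4) & MASK32    # ... and post-increment A7
```

A LINK followed by the matching UNLK leaves both registers exactly as they were before the call frame was built.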


MOVE
MOVE Rn,Dn
 ALU 
XNZVC
-**00
MOVE #im,Dn
 ALU 
XNZVC
-**00
MOVE ea,Dn
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 ALU   EA   FETCH 
XNZVC
-**00
MOVE Dn,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 ALU   EA   STORE 
XNZVC
-**00
MOVE #im,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 ALU   EA   STORE 
XNZVC
-**00
MOVE ea,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 EA   STORE 
XNZVC
-**00
MOVEQ is a special form of MOVE #im,Dn


MOVEA
MOVEA Rn,An
 EA 
XNZVC
-----
MOVEA #im,An
 EA 
XNZVC
-----
MOVEA ea,An
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 EA   FETCH 
XNZVC
-----


MOVEM
MOVEM list,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
x xxx xx
 ALU   EA   FETCH 
XNZVC
-----
MOVEM ea,list
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xx xx xx xx
 ALU   EA   FETCH 
XNZVC
-----


NEG
NEG Dn
 ALU 
XNZVC
*****
NEG ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU   STORE 
XNZVC
*****
NEG is implemented as a SUB: ea = 0 - ea


NOT
NOT Dn
 ALU 
XNZVC
-**00
NOT ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU   STORE 
XNZVC
-**00
NOT is implemented as an EOR with all ones: ea = ea EOR #-1


SCC
SCC Dn
 ALU 
XNZVC
-----


SWAP
SWAP Dn
 ALU 
XNZVC
-**00


LSR, LSL, ASR, ASL, ROL, ROR
LSR #1-8,Dn
 ALU 
XNZVC
***0*
LSR Dn,Dn
 ALU 
XNZVC
***0*


BSET , BCLR , BCHG, BTST
BSET #,Dn
 ALU 
XNZVC
--*--
BSET #,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU   STORE 
XNZVC
--*--
BSET Dn,Dn
 ALU 
XNZVC
--*--
BSET Dn,ea
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx nn xx
 EA   FETCH 
 ALU   STORE 
XNZVC
--*--
BSET is a special case of ORI
BCLR is a special case of ANDI
BCHG is a special case of EORI
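How these aliases map onto plain ALU operations can be sketched for the 32-bit data-register forms. The function names are hypothetical. Bit numbers are taken modulo 32, as for Dn destinations, and the Z flag that all four bit instructions set reflects the tested bit before it is changed.

```python
MASK32 = 0xFFFFFFFF

def neg(x: int) -> int:              # NEG  -> SUB: 0 - x
    return (0 - x) & MASK32

def not_(x: int) -> int:             # NOT  -> EOR with all ones
    return x ^ MASK32

def bset(x: int, bit: int) -> int:   # BSET -> OR with a one-bit mask
    return x | (1 << (bit % 32))

def bclr(x: int, bit: int) -> int:   # BCLR -> AND with the inverted mask
    return x & ~(1 << (bit % 32)) & MASK32

def bchg(x: int, bit: int) -> int:   # BCHG -> EOR with a one-bit mask
    return x ^ (1 << (bit % 32))

def bit_z(x: int, bit: int) -> bool:
    """Z flag of BTST/BSET/BCLR/BCHG: set when the tested bit was 0."""
    return ((x >> (bit % 32)) & 1) == 0
```

Because each alias is a single logical operation with a constant mask, the decoder can route all of them to the existing ALU without extra hardware.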


NEW INSTRUCTIONS

MVS
MVS Rn,Dn
 ALU 
XNZVC
-**00
MVS #im,Dn
 ALU 
XNZVC
-**00
MVS ea,Dn
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 ALU   EA   FETCH 
XNZVC
-**00


MVZ
MVZ Rn,Dn
 ALU 
XNZVC
-0*00
MVZ #im,Dn
 ALU 
XNZVC
-0*00
MVZ ea,Dn
EA= (An)(An)+-(An)(d16,An)(d#,An,Xn) (d16,Pc)(d#,Pc,Xn) (xxx).w(xxx).l
xxxxx xx xx
 ALU   EA   FETCH 
XNZVC
-0*00
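MVS and MVZ (as found on ColdFire) move a byte or word into a full data register with sign or zero extension. This sketch returns the 32-bit result together with the N and Z flags; V and C are always cleared, matching the flag rows above. The function names are hypothetical.

```python
def mvs(value: int, size: str):
    """MVS.B / MVS.W: sign-extend to 32 bits. Returns (result, N, Z)."""
    bits = {"B": 8, "W": 16}[size]
    v = value & ((1 << bits) - 1)
    if v & (1 << (bits - 1)):
        v -= 1 << bits                     # negative source: extend the sign bit
    result = v & 0xFFFFFFFF
    return result, bool(result & 0x80000000), result == 0

def mvz(value: int, size: str):
    """MVZ.B / MVZ.W: zero-extend to 32 bits. Returns (result, N, Z)."""
    bits = {"B": 8, "W": 16}[size]
    result = value & ((1 << bits) - 1)     # upper bits become zero
    return result, False, result == 0      # N is always cleared (-0*00)
```

On a classic 68K the same effect needs two instructions (for example MOVE.B plus EXTB.L, or MOVEQ #0 plus MOVE.B), which is why these instructions are attractive additions.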


EA              MODE (5-3)  REG (2-0)  EXTENSION LENGTH
Dn              000         reg no.    +0
An              001         reg no.    +0
(An)            010         reg no.    +0
(An)+           011         reg no.    +0
-(An)           100         reg no.    +0
(d16,An)        101         reg no.    +2
(d8,An,Xi)      110         reg no.    +2 | +4 | +6
(xxx).W         111         000        +2
(xxx).L         111         001        +4
(d16,PC)        111         010        +2
(d8,PC,Xi)      111         011        +2 | +4 | +6
#data           111         100        +2 | +4
Unused          111         101
Unused          111         110
Unused          111         111
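The 6-bit EA field can be decoded directly from the mode/register table above. In this sketch only the brief index format (+2) is modelled for the indexed modes, and `#data` is assumed to take +2 bytes for .B/.W and +4 bytes for .L. The function and table names are hypothetical.

```python
MODE_NAMES = ["D{}", "A{}", "(A{})", "(A{})+", "-(A{})", "(d16,A{})", "(d8,A{},Xi)"]
MODE_EXT = [0, 0, 0, 0, 0, 2, 2]     # extension bytes for modes 000-110

def decode_ea(field: int, size: str = "W"):
    """Return (assembler text, extension bytes) for a 6-bit EA field."""
    mode, reg = (field >> 3) & 7, field & 7
    if mode < 7:
        return MODE_NAMES[mode].format(reg), MODE_EXT[mode]
    # mode 111: the register field selects the addressing mode
    mode7 = {0: ("(xxx).W", 2), 1: ("(xxx).L", 4), 2: ("(d16,PC)", 2),
             3: ("(d8,PC,Xi)", 2), 4: ("#data", 4 if size == "L" else 2)}
    if reg not in mode7:
        raise ValueError(f"unused EA encoding 111 {reg:03b}")
    return mode7[reg]
```

The decoder can compute the instruction length from this field alone, before any operand is read, which is what allows it to run ahead of execution.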


OPCODE HEX LENGTH 15-12 11-8 7-6 (SIZE) 5-0 EXTENSION
ORI to CCR 00xx 4 0000 0000 00 EA=111100 (immediate) 8-bit imm
ORI to SR 00xx 4 0000 0000 01 EA=111100 (immediate) 16-bit imm
ORI.B ea 00xx 4 0000 0000 00 EA 16-bit imm
ORI.W ea 00xx 4 0000 0000 01 EA 16-bit imm
ORI.L ea 00xx 6 0000 0000 10 EA 32-bit imm
ANDI to CCR 02xx 4 0000 0010 00 EA=111100 (immediate) 8-bit imm
ANDI to SR 02xx 4 0000 0010 01 EA=111100 (immediate) 16-bit imm
ANDI.B ea 02xx 4 0000 0010 00 EA 16-bit imm
ANDI.W ea 02xx 4 0000 0010 01 EA 16-bit imm
ANDI.L ea 02xx 6 0000 0010 10 EA 32-bit imm
SUBI.B ea 04xx 4 0000 0100 00 EA 16-bit imm
SUBI.W ea 04xx 4 0000 0100 01 EA 16-bit imm
SUBI.L ea 04xx 6 0000 0100 10 EA 32 bit imm
ADDI.B ea 06xx 4 0000 0110 00 EA 16-bit imm
ADDI.W ea 06xx 4 0000 0110 01 EA 16-bit imm
ADDI.L ea 06xx 6 0000 0110 10 EA 32 bit imm
CMP2 0xxx 4 0000 0SIZE0 11 EA 16-bit opcode
CHK2 0xxx 4 0000 0SIZE0 11 EA 16-bit opcode
EORI to CCR 0Axx 4 0000 1010 00 EA=111100 (immediate) 8-bit imm
EORI to SR 0Axx 4 0000 1010 01 EA=111100 (immediate) 16-bit imm
EORI.B ea 0Axx 4 0000 1010 00 EA 16-bit imm
EORI.W ea 0Axx 4 0000 1010 01 EA 16-bit imm
EORI.L ea 0Axx 6 0000 1010 10 EA 32-bit imm
CMPI.B ea 0Cxx 4 0000 1100 00 EA 16-bit imm
CMPI.W ea 0Cxx 4 0000 1100 01 EA 16-bit imm
CMPI.L ea 0Cxx 6 0000 1100 10 EA 32 bit imm
BTST ea 08xx 4 0000 1000 00 EA 16-bit opcode
BCHG ea 08xx 4 0000 1000 01 EA 16-bit opcode
BCLR ea 08xx 4 0000 1000 10 EA 16-bit opcode
BSET ea 08xx 4 0000 1000 11 EA 16-bit opcode
FREE 0Exx - 0000 1110 00 EA
FREE 0Exx - 0000 1110 01 EA
FREE 0Exx - 0000 1110 10 EA
FREE 0Exx - 0000 1110 11 EA
MOVES 0Exx 4 0000 1110 SIZE EA 16-bit opcode move to address space (not needed)
CAS2 0xxx 6 0000 1SIZE0 11 111111 32-bit opcode (not needed)
CAS 0Exx 4 0000 1110 11 EA 16-bit opcode (not needed)
BTST Dn 0Xxx 2 0000 Reg1 00 EA
BCHG Dn 0Xxx 2 0000 Reg1 01 EA
BCLR Dn 0Xxx 2 0000 Reg1 10 EA
BSET Dn 0Xxx 2 0000 Reg1 11 EA
MOVEP 0xxx 2 0000 Reg Opmode 001 AREG Illegal on AMIGA
Conflicts with AND,OR Decoding
MOVEA.W 3xxx 2 00 11 Reg 001 SRC EA
MOVEA.L 2xxx 2 00 10 Reg 001 SRC EA
MOVE.B 1xxx 2 00 01 Reg Mode SRC EA
MOVE.W 3xxx 2 00 11 Reg Mode SRC EA
MOVE.L 2xxx 2 00 10 Reg Mode SRC EA
NEGX.B ea 40xx 2 0100 0000 00 EA
NEGX.W ea 40xx 2 0100 0000 01 EA
NEGX.L ea 40xx 2 0100 0000 10 EA
MOVE from SR 40xx 2 0100 0000 11 EA
CLR.B ea 42xx 2 0100 0010 00 EA
CLR.W ea 42xx 2 0100 0010 01 EA
CLR.L ea 42xx 2 0100 0010 10 EA
MOVE from CCR 42xx 2 0100 0010 11 EA
NEG.B ea 44xx 2 0100 0100 00 EA
NEG.W ea 44xx 2 0100 0100 01 EA
NEG.L ea 44xx 2 0100 0100 10 EA
MOVE to CCR 44xx 2 0100 0100 11 EA
NOT.B ea 46xx 2 0100 0110 00 EA
NOT.W ea 46xx 2 0100 0110 01 EA
NOT.L ea 46xx 2 0100 0110 10 EA
MOVE to SR 46xx 2 0100 0110 11 EA
EXTB.W 48xx 2 0100 1000 10 000 Reg
EXTW.L 48xx 2 0100 1000 11 000 Reg
EXTB.L 49xx 2 0100 1001 11 000 Reg
LINK.L 48xx 6 0100 1000 00 001 Reg 32-bit displacement
NBCD 48xx 2 0100 1000 00 EA
SWAP 484x 2 0100 1000 01 000 Reg
BKPT 484x 2 0100 1000 01 001 Reg
PEA 48xx 2 0100 1000 01 EA
TST.B 4Axx 2 0100 1010 00 EA
TST.W 4Axx 2 0100 1010 01 EA
TST.L 4Axx 2 0100 1010 10 EA
TAS 4Axx 2 0100 1010 11 EA
ILLEGAL 4AFC 2 0100 1010 11 111 100
MULU.L 4Cxx 4 0100 1100 00 EA 16-bit opcode
MULS.L 4Cxx 4 0100 1100 00 EA 16-bit opcode
DIVU.L 4Cxx 4 0100 1100 01 EA 16-bit opcode
DIVS.L 4Cxx 4 0100 1100 01 EA 16-bit opcode
TRAP 4E4x 2 0100 1110 0100 Vec
LINK 4E5x 4 0100 1110 0101 0Reg 16-bit displacement
UNLINK 4E5x 2 0100 1110 0101 1Reg
MOVE USP 4E6x 2 0100 1110 0110 drReg
RESET 4E70 2 0100 1110 0111 0000
NOP 4E71 2 0100 1110 0111 0001
STOP 4E72 4 0100 1110 0111 0010
RTE 4E73 2 0100 1110 0111 0011
RTD 4E74 4 0100 1110 0111 0100 16-bit displacement
RTS 4E75 2 0100 1110 0111 0101
TRAPV 4E76 2 0100 1110 0111 0110
RTR 4E77 2 0100 1110 0111 0111
MOVEC 4E7x 4 0100 1110 0111 101dr 16-bit extension (register + control register)
JMP 4Exx 2 0100 1110 11 EA
JSR 4Exx 2 0100 1110 10 EA
MOVEM.W #,ea 48xx 4 0100 1000 10 EA Register Mask
MOVEM.L #,ea 48xx 4 0100 1000 11 EA Register Mask
MOVEM.W ea,# 4Cxx 4 0100 1100 10 EA Register Mask
MOVEM.L ea,# 4Cxx 4 0100 1100 11 EA Register Mask
LEA 4xxx 2 0100 Reg 111 EA
CHK.W 4xxx 2 0100 Reg 110 EA
CHK.L 4xxx 2 0100 Reg 100 EA
ADDQ.B 5xxx 2 0101 Data 000 EA
ADDQ.W 5xxx 2 0101 Data 001 EA
ADDQ.L 5xxx 2 0101 Data 010 EA
SUBQ.B 5xxx 2 0101 Data 100 EA
SUBQ.W 5xxx 2 0101 Data 101 EA
SUBQ.L 5xxx 2 0101 Data 110 EA
DBcc 5xxx 4 0101 Cond 11 001 Reg 16-Bit Displacement
TRAPcc 5xxx 4 0101 Cond 11 111 Mode 16-Bit Displacement
SCC 5xxx 2 0101 Cond 11 EA
BRA.S 60xx 2 0110 0000 8-Bit Disp
BRA.W 60xx 4 0110 0000 00 16-Bit Displacement
BRA.L 60xx 6 0110 0000 FF 32-Bit Displacement
BSR.S 61xx 2 0110 0001 8-Bit Disp
BSR.W 61xx 4 0110 0001 00 16-Bit Displacement
BSR.L 61xx 6 0110 0001 FF 32-Bit Displacement
Bcc.S 6xxx 2 0110 Condition 8-Bit Disp
Bcc.W 6xxx 4 0110 Condition 00 16-Bit Displacement
Bcc.L 6xxx 6 0110 Condition FF 32-Bit Displacement
MOVEQ.L 7xxx 2 0111 Reg 0 8-Bit Data
MVS.B 7xxx 2 0111 Reg 100 EA
MVS.W 7xxx 2 0111 Reg 101 EA
MVZ.B 7xxx 2 0111 Reg 110 EA
MVZ.W 7xxx 2 0111 Reg 111 EA
DIVU 8xxx 2 1000 Reg 0 1 1 EA
SBCD 8xxx 2 1000 Reg 1 0 0 0 0 R Reg
PACK 8xxx 4 1000 Reg 1 0 1 0 0 R Reg 16-Bit Adjustment
UNPACK 8xxx 4 1000 Reg 1 1 0 0 0 R Reg 16-Bit Adjustment
DIVS 8xxx 2 1000 Reg 1 1 1 EA
OR.B EA,Dn 8xxx 2 1000 Reg 0 00 EA
OR.W EA,Dn 8xxx 2 1000 Reg 0 01 EA
OR.L EA,Dn 8xxx 2 1000 Reg 0 10 EA
OR.B Dn,EA 8xxx 2 1000 Reg 1 00 EA
OR.W Dn,EA 8xxx 2 1000 Reg 1 01 EA
OR.L Dn,EA 8xxx 2 1000 Reg 1 10 EA
SUBX.B Dx,Dy 9xxx 2 1001 Reg 1 00 000 Reg
SUBX.W Dx,Dy 9xxx 2 1001 Reg 1 01 000 Reg
SUBX.L Dx,Dy 9xxx 2 1001 Reg 1 10 000 Reg
SUBX.B -(Ax),-(Ay) 9xxx 2 1001 Reg 1 00 001 Reg
SUBX.W -(Ax),-(Ay) 9xxx 2 1001 Reg 1 01 001 Reg
SUBX.L -(Ax),-(Ay) 9xxx 2 1001 Reg 1 10 001 Reg
SUB.B EA,Dn 9xxx 2 1001 Reg 0 00 EA
SUB.W EA,Dn 9xxx 2 1001 Reg 0 01 EA
SUB.L EA,Dn 9xxx 2 1001 Reg 0 10 EA
SUBA.W EA,An 9xxx 2 1001 Reg 0 11 EA
SUBA.L EA,An 9xxx 2 1001 Reg 1 11 EA
SUB.B Dn,EA 9xxx 2 1001 Reg 1 00 EA
SUB.W Dn,EA 9xxx 2 1001 Reg 1 01 EA
SUB.L Dn,EA 9xxx 2 1001 Reg 1 10 EA
CMPM Bxxx 2 1011 Reg 1 Size 001 Reg
CMP.B ea,Dn Bxxx 2 1011 Reg 0 0 0 EA
CMP.W ea,Dn Bxxx 2 1011 Reg 0 0 1 EA
CMP.L ea,Dn Bxxx 2 1011 Reg 0 1 0 EA
CMPA.W ea,An Bxxx 2 1011 Reg 0 1 1 EA
CMPA.L ea,An Bxxx 2 1011 Reg 1 1 1 EA
EOR.B Dn,ea Bxxx 2 1011 Reg 1 0 0 EA
EOR.W Dn,ea Bxxx 2 1011 Reg 1 0 1 EA
EOR.L Dn,ea Bxxx 2 1011 Reg 1 1 0 EA
ABCD Cxxx 2 1100 Reg 1 0 0 0 0 R Reg
EXG Dn,Dn Cxxx 2 1100 Reg 1 0 1 0 0 0 Reg
EXG An,An Cxxx 2 1100 Reg 1 0 1 0 0 1 Reg
EXG Dn,An Cxxx 2 1100 Reg 1 1 0 0 0 1 Reg
AND.B ea,Dn Cxxx 2 1100 Reg 0 00 EA
AND.W ea,Dn Cxxx 2 1100 Reg 0 01 EA
AND.L ea,Dn Cxxx 2 1100 Reg 0 10 EA
MULU Cxxx 2 1100 Reg 0 1 1 EA
MULS Cxxx 2 1100 Reg 1 1 1 EA
AND.B Dn,ea Cxxx 2 1100 Reg 1 00 EA
AND.W Dn,ea Cxxx 2 1100 Reg 1 01 EA
AND.L Dn,ea Cxxx 2 1100 Reg 1 10 EA
ADDX.B Rx,Ry Dxxx 2 1101 Reg 1 00 000 Reg
ADDX.W Rx,Ry Dxxx 2 1101 Reg 1 01 000 Reg
ADDX.L Rx,Ry Dxxx 2 1101 Reg 1 10 000 Reg
ADDX.B -(Ax),-(Ay) Dxxx 2 1101 Reg 1 00 001 Reg
ADDX.W -(Ax),-(Ay) Dxxx 2 1101 Reg 1 01 001 Reg
ADDX.L -(Ax),-(Ay) Dxxx 2 1101 Reg 1 10 001 Reg
ADD.B ea,Dn Dxxx 2 1101 Reg 000 EA
ADD.W ea,Dn Dxxx 2 1101 Reg 001 EA
ADD.L ea,Dn Dxxx 2 1101 Reg 010 EA
ADDA.W ea,An Dxxx 2 1101 Reg 011 EA
ADDA.L ea,An Dxxx 2 1101 Reg 111 EA
ADD.B Dn,ea Dxxx 2 1101 Reg 100 EA
ADD.W Dn,ea Dxxx 2 1101 Reg 101 EA
ADD.L Dn,ea Dxxx 2 1101 Reg 110 EA
ASR ea Exxx 2 1110 0000 11 EA
ASL ea Exxx 2 1110 0001 11 EA
LSR ea Exxx 2 1110 0010 11 EA
LSL ea Exxx 2 1110 0011 11 EA
ROXR ea Exxx 2 1110 010 0 11 EA
ROXL ea Exxx 2 1110 010 1 11 EA
ROR ea Exxx 2 1110 011 0 11 EA
ROL ea Exxx 2 1110 011 1 11 EA
ASR.B #,Dn Exxx 2 1110 Imm0 00 100 Reg
ASR.W #,Dn Exxx 2 1110 Imm0 01 100 Reg
ASR.L #,Dn Exxx 2 1110 Imm0 10 100 Reg
ASL.B #,Dn Exxx 2 1110 Imm0 00 000 Reg
ASL.W #,Dn Exxx 2 1110 Imm0 01 000 Reg
ASL.L #,Dn Exxx 2 1110 Imm0 10 000 Reg
ASR.B Dx,Dn Exxx 2 1110 Reg1 00 100 Reg
ASR.W Dx,Dn Exxx 2 1110 Reg1 01 100 Reg
ASR.L Dx,Dn Exxx 2 1110 Reg1 10 100 Reg
ASL.B Dx,Dn Exxx 2 1110 Reg1 00 000 Reg
ASL.W Dx,Dn Exxx 2 1110 Reg1 01 000 Reg
ASL.L Dx,Dn Exxx 2 1110 Reg1 10 000 Reg
LSR.B #,Dn Exxx 2 1110 Imm0 00 101 Reg
LSR.W #,Dn Exxx 2 1110 Imm0 01 101 Reg
LSR.L #,Dn Exxx 2 1110 Imm0 10 101 Reg
LSL.B #,Dn Exxx 2 1110 Imm0 00 001 Reg
LSL.W #,Dn Exxx 2 1110 Imm0 01 001 Reg
LSL.L #,Dn Exxx 2 1110 Imm0 10 001 Reg
LSR.B Dx,Dn Exxx 2 1110 Reg1 00 101 Reg
LSR.W Dx,Dn Exxx 2 1110 Reg1 01 101 Reg
LSR.L Dx,Dn Exxx 2 1110 Reg1 10 101 Reg
LSL.B Dx,Dn Exxx 2 1110 Reg1 00 001 Reg
LSL.W Dx,Dn Exxx 2 1110 Reg1 01 001 Reg
LSL.L Dx,Dn Exxx 2 1110 Reg1 10 001 Reg
ROXR.B #,Dn Exxx 2 1110 Imm0 00 110 Reg
ROXR.W #,Dn Exxx 2 1110 Imm0 01 110 Reg
ROXR.L #,Dn Exxx 2 1110 Imm0 10 110 Reg
ROXL.B #,Dn Exxx 2 1110 Imm0 00 010 Reg
ROXL.W #,Dn Exxx 2 1110 Imm0 01 010 Reg
ROXL.L #,Dn Exxx 2 1110 Imm0 10 010 Reg
ROXR.B Dx,Dn Exxx 2 1110 Reg1 00 110 Reg
ROXR.W Dx,Dn Exxx 2 1110 Reg1 01 110 Reg
ROXR.L Dx,Dn Exxx 2 1110 Reg1 10 110 Reg
ROXL.B Dx,Dn Exxx 2 1110 Reg1 00 010 Reg
ROXL.W Dx,Dn Exxx 2 1110 Reg1 01 010 Reg
ROXL.L Dx,Dn Exxx 2 1110 Reg1 10 010 Reg
ROR.B #,Dn Exxx 2 1110 Imm0 00 111 Reg
ROR.W #,Dn Exxx 2 1110 Imm0 01 111 Reg
ROR.L #,Dn Exxx 2 1110 Imm0 10 111 Reg
ROL.B #,Dn Exxx 2 1110 Imm0 00 011 Reg
ROL.W #,Dn Exxx 2 1110 Imm0 01 011 Reg
ROL.L #,Dn Exxx 2 1110 Imm0 10 011 Reg
ROR.B Dx,Dn Exxx 2 1110 Reg1 00 111 Reg
ROR.W Dx,Dn Exxx 2 1110 Reg1 01 111 Reg
ROR.L Dx,Dn Exxx 2 1110 Reg1 10 111 Reg
ROL.B Dx,Dn Exxx 2 1110 Reg1 00 011 Reg
ROL.W Dx,Dn Exxx 2 1110 Reg1 01 011 Reg
ROL.L Dx,Dn Exxx 2 1110 Reg1 10 011 Reg
BFTST Exxx 4 1110 1000 11 EA 16 bit Opcode
BFEXTU Exxx 4 1110 100 1 11 EA 16 bit Opcode
BFCHG Exxx 4 1110 101 0 11 EA 16 bit Opcode
BFEXTS Exxx 4 1110 101 1 11 EA 16 bit Opcode
BFCLR Exxx 4 1110 110 0 11 EA 16 bit Opcode
BFFFO Exxx 4 1110 110 1 11 EA 16 bit Opcode
BFSET Exxx 4 1110 111 0 11 EA 16 bit Opcode
BFINS Exxx 4 1110 111 1 11 EA 16 bit Opcode
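The register shift/rotate group can be decoded with a few bit-field extractions. This sketch uses the standard 68000 field layout for this group, `1110 ccc d ss i tt rrr`: bits 11-9 hold the count (0 means 8) or the count register, bit 8 is the direction (0 right, 1 left), bits 7-6 the size, bit 5 selects an immediate (0) or register (1) count, bits 4-3 the type, and bits 2-0 the destination Dn. The function name is hypothetical.

```python
TYPES = ("AS", "LS", "ROX", "RO")   # bits 4-3
SIZES = ("B", "W", "L")             # bits 7-6 (11 is the memory form)

def decode_shift(op: int) -> str:
    """Decode a 68K register shift/rotate opcode into assembler text."""
    if (op >> 12) != 0xE or ((op >> 6) & 3) == 3:
        raise ValueError("not a register shift/rotate opcode")
    count_or_reg = (op >> 9) & 7          # bits 11-9
    left = (op >> 8) & 1                  # bit 8: 0 = right, 1 = left
    size = SIZES[(op >> 6) & 3]           # bits 7-6
    by_register = (op >> 5) & 1           # bit 5: 0 = immediate, 1 = register
    kind = TYPES[(op >> 3) & 3]           # bits 4-3
    dn = op & 7                           # bits 2-0
    if by_register:
        source = f"D{count_or_reg}"
    else:
        source = f"#{count_or_reg or 8}"  # an immediate count of 0 means 8
    return f"{kind}{'L' if left else 'R'}.{size} {source},D{dn}"
```

For example, `$E388` decodes to `LSL.L #1,D0`, which is a quick way to cross-check individual rows of the shift table.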