
Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.



Welcome to the Natami lounge.
Meet new AMIGA friends here and enjoy having a friendly chit chat.

68K Ideas for the Future
Gunnar von Boehn
Germany
(Moderator)
Posts 5775
17 Sep 2010 12:58


I think it's always nice to have some goals for the future.

The discussion with "Amiga Believer" about the pros and cons of the Motorola 96000 DSP made me think a bit.
Judging by its ASM dialect, the 96000 looks a lot like the 68K, but there are some differences. The 96000 supports some instructions which the 68K does not have.

But all of those missing instructions would be easy to add to the 68K.
So here is a list of things that the 96000 has and we do not.

In general we could add any of those instructions to the 68050.
But I would prefer that we not add new features hastily, and only add features that we are really sure bring a significant benefit.

Maybe we could brainstorm about the value of these instructions:

ABS    Absolute integer Value
SHIFT  Immediate Shift allowing > 8 bits
FF1    Find First 1
BScc    Conditional BSR
Jcc    Conditional JMP
JScc    Conditional JSR
JOIN    Src.LW -> Dst.HW
JOINB  Src.LB -> Dst.HB
ORC    Or with Complement
SPLIT  Src.HW  -> Dst.LW (sign extended)
SPLITB  Src.HB  -> Dst.LB

The detailed description of each instruction is in the 96000 user manual (see the Freescale website).
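
To make the one-line descriptions of JOIN and SPLIT concrete, here is a minimal Python sketch of what they say. It assumes 32-bit registers, assumes SPLIT writes the whole destination register, and does not model condition codes; the function names `join` and `split` are made up for illustration, and the 96000 manual remains the reference for the exact behaviour.

```python
MASK32 = 0xFFFFFFFF

def join(src: int, dst: int) -> int:
    """JOIN (as listed): low word of src goes to the high word of dst.

    Assumption: the low word of dst is left unchanged.
    """
    return ((src & 0xFFFF) << 16) | (dst & 0xFFFF)

def split(src: int) -> int:
    """SPLIT (as listed): high word of src goes to the low word, sign-extended.

    Assumption: the full 32-bit destination is written.
    """
    word = (src >> 16) & 0xFFFF
    if word & 0x8000:            # negative 16-bit value: fill the upper word with ones
        word |= 0xFFFF0000
    return word & MASK32

print(hex(join(0x1234ABCD, 0x5678EF01)))
print(hex(split(0x8001FFFF)))
```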

So what are your thoughts?

André Jernung
Sweden
(MX-Board Owner)
Posts 988
17 Sep 2010 13:13


Gunnar von Boehn wrote:

BScc    Conditional BSR
Jcc    Conditional JMP
JScc    Conditional JSR

I think that especially these three would be very useful. They would make life easier for the 68k programmer.

SID Hervé
France

Posts 663
17 Sep 2010 13:49


Personal opinion of a future user.
 
If a simple extension of the instruction set allows the 68k to work as a DSP, then why not?

That would save integrating a hypothetical dedicated component on the motherboard, and it would save the individual user the cost of an expansion card.
 
Thank you.

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
17 Sep 2010 14:03


SID Hervé wrote:

  If a simple extension of the instruction set allows the 68k to work as a DSP, then why not?
 

 
  Actually this has nothing to do with "working as a DSP".
  We are just talking about potentially useful CPU instructions.

SID Hervé
France

Posts 663
17 Sep 2010 15:19


The term "Motorola 96000 DSP" misled me.

Thank you

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
17 Sep 2010 16:14


A DSP mostly needs instructions for implementing digital filters and performing simple arithmetic operations on large vectors. This includes the typical SIMD instructions, multiply-add instructions, fixed-point arithmetic, and table lookup instructions like in the CPU32.

Greetings,
Thomas


Gunnar von Boehn
Germany
(Moderator)
Posts 5775
17 Sep 2010 16:29


We could also think about whether there are ways to enhance the 68K instruction set without even adding new instructions.

For example:
Let's say we want to add this instruction:
LSL.L #9,Dn

Let's say that because of encoding space limitations we find no 16bit encoding for this, only a 32bit encoding.

If that were the case, we could also do the same work using 2 instructions:
E.g.
LSL.L #8,Dn
LSL.L #1,Dn

Of course the two instructions would take 2 clocks instead of 1.
But if we "enhance" the CPU decoder, it could merge those 2 instructions into 1, thereby executing them in 1 cycle.

The net effect would be the same as adding a new 32bit encoding, but without needing to add a new encoding.
This means our CPU would run faster on both old and new code, without any need to change the existing 68k compilers. :-)

Another merge example:

MOVE.L D0,D1
LSL.L #8,D1
LSL.L #1,D1

These 3 instructions could in theory be all merged into 1.
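
The merging idea can be sketched as a small peephole pass over an instruction stream. This is only a toy model in Python, not how the Natami decoder is implemented: the tuple format `(mnemonic, source, dest)`, the function name `fuse` and the internal `MOVE+LSL.L` operation are all assumptions made for illustration, and condition codes are not modelled.

```python
def fuse(stream):
    """Merge adjacent instructions that target the same register.

    stream: list of (mnemonic, source, dest) tuples, where source is either a
    register name (str) or an immediate shift count (int). Hypothetical format.
    """
    out = []
    for op, src, dst in stream:
        prev = out[-1] if out else None
        if op == "LSL.L" and isinstance(src, int) and prev and prev[2] == dst:
            # LSL.L #a,Dn + LSL.L #b,Dn  ->  LSL.L #(a+b),Dn
            if prev[0] == "LSL.L" and isinstance(prev[1], int) and prev[1] + src <= 31:
                out[-1] = ("LSL.L", prev[1] + src, dst)
                continue
            # MOVE.L Dm,Dn + LSL.L #a,Dn  ->  one internal "Dn = Dm << a" op
            if prev[0] == "MOVE.L":
                out[-1] = ("MOVE+LSL.L", (prev[1], src), dst)
                continue
            # extend an already fused move+shift with a further shift
            if prev[0] == "MOVE+LSL.L" and prev[1][1] + src <= 31:
                out[-1] = ("MOVE+LSL.L", (prev[1][0], prev[1][1] + src), dst)
                continue
        out.append((op, src, dst))
    return out

print(fuse([("MOVE.L", "D0", "D1"), ("LSL.L", 8, "D1"), ("LSL.L", 1, "D1")]))
```

The fused result still reads at most two registers and updates only one, which is why these particular pairs look like plausible candidates.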

Another example:
It would be great if we could do BSRcc in 1 cycle.
Only the encoding is challenging.

But maybe we could encode it like this:
Bcc
BSR

Then it would be backward compatible with the old 68K CPUs
and if the Decoder is smart enough still be executed in 1 cycle.

What do you think?



Samuel D Crow
USA
(Natami Team)
Posts 1295
17 Sep 2010 16:43


Since we've already got some opcode fusion planned, let's continue with that in mind.  The following should be possible also:

This one is the equivalent of JScc but could be a predicated jump or some such similar thing:


    Bcc.b label
    JSR label2
label:

Jcc could be implemented with this:


    Bcc.b label
    JMP label2
label:

All we'd need is a way to reverse the condition code in the macro assembler.  Otherwise we'd have to make a macro for every condition code for each new instruction.
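
Reversing the condition is mechanical: in the 68k condition-code encoding the conditions come in complementary pairs, so the inverse is found by flipping the lowest bit of the 4-bit condition field. A minimal Python sketch of what an assembler macro facility would need to do, with `inverse_cc` and `conditional_call` as made-up helper names:

```python
# 68k condition codes in encoding order (4-bit field, values 0..15).
CONDITIONS = ["T", "F", "HI", "LS", "CC", "CS", "NE", "EQ",
              "VC", "VS", "PL", "MI", "GE", "LT", "GT", "LE"]

def inverse_cc(cc: str) -> str:
    """Return the complementary condition by flipping bit 0 of the encoding."""
    return CONDITIONS[CONDITIONS.index(cc.upper()) ^ 1]

def conditional_call(cc: str, target: str, call: str = "jsr") -> list:
    """Emit the Bcc + JSR/JMP/BSR pair that stands in for JScc/Jcc/BScc."""
    if cc.upper() in ("T", "F"):
        raise ValueError("Bcc has no T/F forms (those encodings are BRA and BSR)")
    skip = inverse_cc(cc).lower()
    return [f"    b{skip}.b .skip", f"    {call} {target}", ".skip:"]

print("\n".join(conditional_call("eq", "label2")))
```

So a "JSEQ label2" would expand to a `bne.b` over the `jsr`, and one table covers every condition instead of one macro per condition code.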

Matt Hey
USA

Posts 727
18 Sep 2010 08:37


@Gunnar
      I think you were thinking about how the N68050+ ColdFire extensions already add much of the power of the 96000 DSP when you wrote FF1 instead of BFIND ;). The N68050+ is going to have the CF MAC instruction, right? The versatility and power of the 68k + CF instruction set already is more powerful than the 96000 DSP IMHO (easier to use too). Instruction fusion and predication will speed up commonly used instruction combinations as already mentioned. It would be interesting to have a tool that would track commonly used instruction combinations and report which ones save the most cycles in real world use.
     
      My instruction review...
     
      -------
     
      ABS
     
      tst dx
      bpl.b .skip
      neg dx
    .skip:
     
      Trivial and fast already with predication. The tst can be avoided sometimes if the cc is set from another operation.
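
      For reference, a small Python model of what the tst/bpl/neg sequence computes on a 32-bit register. The name `abs32` is made up and flags are not modelled. Note the one edge case: the most negative value has no positive counterpart and comes back unchanged.

```python
MASK32 = 0xFFFFFFFF

def abs32(dx: int) -> int:
    """Model of: tst.l dx / bpl.b .skip / neg.l dx on a 32-bit register."""
    dx &= MASK32
    if dx & 0x80000000:        # sign bit set: bpl is not taken, so neg runs
        dx = (-dx) & MASK32    # two's complement negate, wrapped to 32 bits
    return dx

print(abs32(0xFFFFFFFB))       # -5 as a 32-bit pattern
print(hex(abs32(0x80000000)))  # most negative value stays as it is
```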
     
      -------
     
      SHIFT  Immediate Shift allowing GT 8 bits
      (GT = greater than, LE = less than or equal)
   
    for n GT 8 and n LE 16
      lsl #m,dx
      lsl #(n-m),dx
   
    or
   
    for n GT 16
      moveq #n,dy
      lsl dy,dx
     
      Shifts greater than 8 are common. It would be nice to have an immediate word size shift of up to 31 but it's probably not worth it now. Code fusion is just as fast (but not as small) and will speed up old 68k code.
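
      The case split above can be written down as a tiny code generator. This is a sketch only: `emit_lsl` is a made-up helper, it assumes long-sized shifts (the snippets above leave the size off), it picks m = 8 for the two-shift case, and it assumes a free scratch register dy.

```python
def emit_lsl(n: int, dx: str = "d0", dy: str = "d1") -> list:
    """Return 68k instructions that shift dx left by n bits (1..31)."""
    if not 1 <= n <= 31:
        raise ValueError("shift count must be in 1..31")
    if n <= 8:                       # fits the existing immediate shift
        return [f"lsl.l #{n},{dx}"]
    if n <= 16:                      # two immediate shifts, candidates for fusion
        return [f"lsl.l #8,{dx}", f"lsl.l #{n - 8},{dx}"]
    # larger counts: load the count into a scratch register
    return [f"moveq #{n},{dy}", f"lsl.l {dy},{dx}"]

for count in (5, 9, 20):
    print(count, emit_lsl(count))
```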
     
      -------
     
      BFIND
     
      ff1
     
      Already in CF additions and 68k bfffo is available too.
     
      -------
     
      BScc, JScc
     
      bcc.b .skip
      bsr label
    .skip:
     
      Passing variables complicates subroutine calls, which limits the usefulness of a conditional subroutine call. Predication speeds up this combination already.
     
      -------
     
      Jcc
     
      Would not be used enough.
     
      -------
     
      JOIN
     
      swap dx
      move.w dy,dx
      swap dx
     
      JOINB
     
      rol.l #8,dx
      move.b dy,dx
      ror.l #8,dx
     
      SPLIT
     
      swap dy
      mvs.w dy,dx ; ColdFire instruction
      swap dy
     
      SPLITB
     
      rol.l #8,dy
      mvs.b dy,dx ; ColdFire instruction
      ror.l #8,dy
     
      I don't think JOINB and SPLITB would be common enough to add. JOIN and SPLIT are fairly common but I believe it's better to instruction fuse at least the swap + move.w (and maybe swap + move.w) as I mentioned in the instruction fusion thread. This way legacy 68k code benefits too. Using mvs and mvz CF instructions adds flexibility as signed AND unsigned work. It's too bad these 2 CF instructions weren't in the 68k from the beginning. They are work horses and reduce code size a lot. They can replace most of the ext instructions and allow to clear the upper contents of a register allowing for better register utilization...
     
      mvs.b dx,dx ;extb.l dx
      mvs.w dx,dx ;ext.l dx
      mvz.b dx,dx ;clear the upper 24 bits of a register
      mvz.w dx,dx ;clear the upper word of a register
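
      What mvs and mvz do to a register value can be modelled in a few lines of Python. The functions below are an illustrative sketch (condition codes are ignored), with `size` given in bits:

```python
MASK32 = 0xFFFFFFFF

def mvz(value: int, size: int) -> int:
    """Zero-extend the low `size` bits (8 or 16) to 32 bits."""
    return value & ((1 << size) - 1)

def mvs(value: int, size: int) -> int:
    """Sign-extend the low `size` bits (8 or 16) to 32 bits."""
    low = value & ((1 << size) - 1)
    if low & (1 << (size - 1)):              # sign bit of the byte/word is set
        low |= MASK32 & ~((1 << size) - 1)   # fill the upper bits with ones
    return low

print(hex(mvs(0x123456F0, 8)), hex(mvz(0x123456F0, 8)))
```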
     
      -------
     
      ORC (NOR)
     
      not dy
      or dy,dx
     
      ANDC (NAND)
     
      not dy
      and dy,dx
     
      More instruction fusion possible?
     
      -------
     
      I think some of the 96000 fpu instructions are interesting but I'll save that for a fpu thread later ;).
   

Denis Markovic
Germany
(Natami Team)
Posts 41
18 Sep 2010 08:49


Hi Natami Team,

I would like to apply to become member of the Team.

Hope this is the right place to ask?

I think I could mainly contribute in the areas signal processing
implementations (work with DSPs since 99, lately with vector
processors), instruction set discussions (started with Amiga 87,
68k, ARM, ... experience) and some floating point issues (started to implement some 16bit floating point stuff in Verilog in a friend's open source DSP project, http://code.google.com/p/ajardsp/).

/Br

Denis


Gunnar von Boehn
Germany
(Moderator)
Posts 5775
18 Sep 2010 12:23


Denis Markovic wrote:

Hi Natami Team,
 
I would like to apply to become member of the Team.
 
Hope this is the right place to ask?

People sharing our AMIGA ambition and our mind set are always welcome. It would be best to have a quick call to discuss all possibilities in detail. Can you email me your phone number to gunnar@greyhound-data.com ?



Amiga Believer
Canada

Posts 282
19 Sep 2010 03:56


The absolute integer value instruction would be useful with stuff like video or audio compression and decompression. For this reason, it can be good to add it to the SIMD unit too.



Matt Hey
USA

Posts 727
19 Sep 2010 05:17


Amiga Believer wrote:

    The absolute integer value instruction would be useful with stuff like video or audio compression and decompression. For this reason, it can be good to add it to the SIMD unit too.
   

   
    Yes, an integer absolute value instruction would be useful but it's not needed. The 68k is fast and flexible enough to handle this functionality without wasting instruction space on an ABS instruction. The N68050+ will be able to do the functionality in 1-3 cycles using 6 bytes or less. This is very good and faster than the 68060. If you have to have an instruction, I would recommend you use a macro...

    abs MACRO
        tst.\0 \1
        bpl.b .skip\@
        neg.\0 \1
      .skip\@:
        ENDM
 
  Now you have your instruction. Use like...
   
      abs.l d0
   
  Problem solved. Now please go write us your optimized audio software in assembler!

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
19 Sep 2010 07:37


Hi Matt,

Many thanks for the good ideas.

Matt Hey wrote:

for n GT 16
moveq #n,dy
lsl dy,dx

ORC (NOR)
not dy
or dy,dx


Unfortunately those are difficult to merge.

I'll explain why, maybe we can then together create a table of good "fusion pairs" and instructions which we better add new.

The 68K ALU is in general designed like this:
2 register read (each 32bit) - operation - 1 register update (32 bit).

This means the following:

MOVE.L D0,D1 is implemented as
SrcA= D0
SrcB= D1
Dst = (SrcA & $ffffffff)

But MOVE.B D0,D1 is internally implemented like this:
SrcA = D0
SrcB = D1
Dst  = (SrcA & $ff) OR (SrcB & $ffffff00)
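
The two-reads, one-update scheme can be written as a single merge function. This Python sketch only models the data path described above; the name `alu_move` and the size letters are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF
SIZE_MASK = {"b": 0x000000FF, "w": 0x0000FFFF, "l": 0xFFFFFFFF}

def alu_move(src_a: int, src_b: int, size: str) -> int:
    """MOVE.<size> Da,Db: take the low part from SrcA, keep the rest of SrcB."""
    mask = SIZE_MASK[size]
    return (src_a & mask) | (src_b & ~mask & MASK32)

d0, d1 = 0x11223344, 0xAABBCCDD
print(hex(alu_move(d0, d1, "b")))   # only the low byte of D1 is replaced
print(hex(alu_move(d0, d1, "l")))   # the whole register is replaced
```

Both source registers are read even for MOVE.B, which is why a fused pair has to fit into two register reads and one register write.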

Because of this we have some limitations when we fuse pairs.

Rule: each ALU can only update one register per clock.
The below would update 2 registers therefore we can NOT merge them.

not dy
or dy,dx


SPLITB
   
rol.l #8,dy
mvs.b dy,dx ; ColdFire instruction
ror.l #8,dy

How about this:
MOVE.L Dy,Dx
LSR.L  #8,Dx
EXTB.L Dx

This version could have the same flag settings as the SPLIT instruction.

I'm not sure if we should go for 6 byte encodings.
If we can encode a new instruction using two old instructions in 4byte total - then this is good IMHO. But using 6 Byte might be on the edge.

What do you think?

Matt Hey wrote:

I think some of the 96000 fpu instructions are interesting but I'll save that for a fpu thread later ;).   

Please feel free to share your thoughts about them!

Claudio Wieland
Germany
(Natami Team)
Posts 703
19 Sep 2010 07:49


We also have to mind limited cache sizes. Smaller code is better.

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
19 Sep 2010 08:40


Claudio Wieland wrote:

We also have to mind limited cache sizes. Smaller code is better.

This is true. But our free encoding space is also very limited.

In the A-range the 68000 has room for 2 new full instructions using the form EA,DN and having B/W/L support.

This means we can not add many instructions that encode in 16bit.
But we can add many instructions that encode in 32bit.

Simple instructions that only use 1 operand like ABS Dn need less encoding space. We can certainly add a few of those in 16bit still.



Denis Markovic
Germany
(Natami Team)
Posts 41
19 Sep 2010 09:37


A few quick thoughts about the instructions:
 
 
Gunnar von Boehn wrote:

 
  ABS    Absolute integer Value
 

 
  Very useful for DSP stuff, e.g. power density spectrum,
  L1 norm as long as there is no sum of absolute differences instruction (can be used in e.g. speech recognition, video
  coding, ...)
 
 
Gunnar von Boehn wrote:

 
  SHIFT  Immediate Shift allowing > 8 bits
 

 
  Obviously a good idea :)
 
 
Gunnar von Boehn wrote:

  FF1    Find First 1
 

 
  I guess something like count leading zeros (i.e. count number of most significant 0 bits in a variable)? Very important
  for block floating point implementation, i.e. normalization of a block for higher precision/higher dynamic range;
 
  it would also be very useful, to have the instruction for
  signed numbers, i.e. if the number is positive return number
  of most significant 0 bits (minus 1), if it is negative return number
  of most significant 1 bits (minus 1)
  If you want to save space for op code, one instruction could
  do both and put the unsigned result (# leading 0) in the low
  part and the signed result in the high part of the result
  ...
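
  A minimal Python sketch of the two counts described above, for 32-bit values. The function names are made up, and packing both results into one register (low part / high part) is left out.

```python
MASK32 = 0xFFFFFFFF

def leading_zeros(x: int) -> int:
    """Number of most significant 0 bits in a 32-bit value (32 for x == 0)."""
    return 32 - (x & MASK32).bit_length()

def sign_bits_minus_one(x: int) -> int:
    """Signed variant: redundant sign bits, i.e. how far the value can be
    shifted left without overflow (block floating point normalisation)."""
    x &= MASK32
    if x & 0x80000000:       # negative: count leading ones by inverting first
        x = ~x & MASK32
    return leading_zeros(x) - 1

print(leading_zeros(0x00010000))
print(sign_bits_minus_one(0xC0000000))
```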
 
  Important in this context:
  array minimum/maximum search, e.g. pseudocode for
  findmax.w var0,cntvar,var2:
  if(var0[31:16] > var2[31:16]) {
    var2[31:16] = var0[31:16];
    var2[15:0] = cntvar[15:0];
  }
  cntvar++;
  if(var0[15:0] > var2[31:16]) {
    var2[31:16] = var0[15:0];
    var2[15:0] = cntvar[15:0];
  }
  cntvar++;
 
  This would be extremely useful to find the range of an array
  and to do block floating point.
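
  As a rough executable model of the proposed findmax.w, here is a Python sketch. It rests on several assumptions that the pseudocode leaves open: elements are unsigned 16-bit values packed two per long word, the running maximum lives in the high word of var2 and its index in the low word, and the counter advances once per element examined, whether or not a new maximum was found.

```python
def findmax_w(var0: int, cntvar: int, var2: int):
    """One step of the hypothetical findmax.w: examine two packed 16-bit elements.

    Returns the updated (cntvar, var2).
    """
    for element in ((var0 >> 16) & 0xFFFF, var0 & 0xFFFF):
        if element > (var2 >> 16) & 0xFFFF:            # new maximum found
            var2 = (element << 16) | (cntvar & 0xFFFF)  # store value and index
        cntvar += 1
    return cntvar, var2

# Scan the array [3, 9, 4, 7], packed two elements per long word.
cnt, best = 0, 0
for packed in (0x00030009, 0x00040007):
    cnt, best = findmax_w(packed, cnt, best)
print(cnt, hex(best >> 16), best & 0xFFFF)   # elements seen, max value, index of max
```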
 
  even better to make a minmax instruction (for .l, .w, .b) that
  does a parallel search on min/max.
 
  What do you think about that idea?

One drawback is the 3 operands; if we don't want that, we could
have special DSP registers to e.g. store the minimum/maximum value and the minimum/maximum index; after looping with this instruction you could simply read the values from that memory mapped register; while this is not very 68k like it would be very powerful for signal processing and save a lot of opcode space.

We could have 3 modes (similar to ARM):
68k basic version, 68k basic plus DSP instructions (with memory mapped or special registers or registers could be readable/writable with an extra move), 68k basic plus DSP plus SIMD?
 
 
Gunnar von Boehn wrote:

  BScc    Conditional BSR
  Jcc    Conditional JMP
  JScc    Conditional JSR
  JOIN    Src.LW -> Dst.HW
  JOINB  Src.LB -> Dst.HB
  ORC    Or with Complement
  SPLIT  Src.HW  -> Dst.LW (sign extended)
  SPLITB  Src.HB  -> Dst.LB
 
  The detailed description of the instruction is in the 96000 user manual (see freescale website)
 
  So what are your thoughts?
 

 
  Hm, not so many thoughts on the rest of the instructions.
 
 

Matt Hey
USA

Posts 727
19 Sep 2010 10:37


First off, I should mention from my previous post...

swap + move.w (and maybe swap + move.w)

should read...

swap + move.w (and maybe move.w + swap)

I couldn't edit after someone else posted. I also had to use GT and LE instead of the greater than and less than signs because they are interpreted as html codes by the forum.

Gunnar von Boehn wrote:
 
 
Matt Hey wrote:

  for n GT 16
  moveq #n,dy
  lsl dy,dx
 
  ORC (NOR)
  not dy
  or dy,dx
 

  Unfortunately those are difficult to merge.
 

I see. I don't think it's such a big loss in the case of ORC and ANDC. The blitter or other gfx hardware can be used if there is a lot of these logical operations. The shifting greater than 8 is more of a problem. As much as I want to limit adding new instructions, I think this is a prime candidate. A single word encoding would save code space in the case of 2 consecutive shifts too. I'm thinking the RTM + CALLM or A_ instruction space looks like it might have room. The asl and lsl could have the same encoding. Immediate 16 (swap) and 32 (useless) would not be needed. That would leave 9-15 and 17-31. I don't like how leaving rotate out would be unbalanced and how the encoding would be ad hoc. Maybe someone will have a different idea.


 
  SPLITB
     
  rol.l #8,dy
  mvs.b dy,dx ; ColdFire instruction
  ror.l #8,dy
 

 
  How about this:
  MOVE.L Dy,Dx
  LSR.L  #8,Dx
  EXTB.L Dx
 
  This version could have the same flag settings as the SPLIT instruction.

There are 2 "upper" bytes so there could be a SPLITB.L and a SPLITB.W. My code was for the former and yours the latter. Either one could be encoded differently...

SPLITBS.W (signed)

  move.l dy,dx
  lsr.l  #8,dx
  extb.l dx

SPLITBZ.W (unsigned)

  move.l dy,dx
  lsr.l  #8,dx
  mvz.b dx,dx

SPLITBS.L (signed)

  move.l dy,dx
  rol.l #8,dx
  extb.l dx

SPLITBZ.L (unsigned)

  move.l dy,dx
  rol.l #8,dx
  mvz.b dx,dx
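
The difference between the four variants is just which byte is picked and how it is extended. A small Python model of the sequences above (SPLITBS/SPLITBZ are the names proposed in this post, not existing instructions; flags are not modelled):

```python
MASK32 = 0xFFFFFFFF

def extb(x: int) -> int:
    """extb.l: sign-extend the low byte to 32 bits."""
    x &= 0xFF
    return x | 0xFFFFFF00 if x & 0x80 else x

def rol8(x: int) -> int:
    """rol.l #8: rotate a 32-bit value left by 8 bits."""
    return ((x << 8) | (x >> 24)) & MASK32

def splitbs_w(dy: int) -> int:   # move.l / lsr.l #8 / extb.l  -> bits 15:8, signed
    return extb(dy >> 8)

def splitbz_w(dy: int) -> int:   # move.l / lsr.l #8 / mvz.b   -> bits 15:8, unsigned
    return (dy >> 8) & 0xFF

def splitbs_l(dy: int) -> int:   # move.l / rol.l #8 / extb.l  -> bits 31:24, signed
    return extb(rol8(dy))

def splitbz_l(dy: int) -> int:   # move.l / rol.l #8 / mvz.b   -> bits 31:24, unsigned
    return rol8(dy) & 0xFF

dy = 0x8123F456
print([hex(f(dy)) for f in (splitbs_w, splitbz_w, splitbs_l, splitbz_l)])
```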

I don't think the byte versions of SPLIT and JOIN are common enough to fuse let alone create a new instruction. The CF instructions mvz.b and mvs.b already make dealing with the byte easier and using a few instructions is more flexible and less confusing.

Another possibility to look at is the bit field instructions.  BFEXTU and BFEXTS would already be smaller than 3 instructions.
How fast will the bit field instructions be on the N68050 to a register?


  I'm not sure if we should go for 6 byte encodings.
  If we can encode a new instruction using two old instructions in 4byte total - then this is good IMHO. But using 6 Byte might be on the edge.

 
I never suggested a 6 byte encoded instruction. 4 bytes is quite enough for an instruction. I was stating that a macro would take at most 6 bytes and opposed creating an ABS instruction. Anything besides a 2 byte ABS instruction would not make sense.


Phil "meynaf" G.
France
(Natami Team)
Posts 393
19 Sep 2010 12:18


ORC (Or with Complement) does not seem very useful to me.
ORC d1,d0 equals (if you want to change only 1 reg) :
  not d0
  and d1,d0
  not d0
And is basically the "implication" logical operator (unless I'm mistaken).
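
The equivalence is easy to check with a few lines of Python. The sketch assumes ORC d1,d0 means "d0 = d0 OR (NOT d1)" on 32-bit registers, which is OR with the complemented source and not the same thing as NOR; the function names are made up.

```python
MASK32 = 0xFFFFFFFF

def orc(d1: int, d0: int) -> int:
    """Assumed ORC d1,d0: d0 = d0 | ~d1."""
    return (d0 | ~d1) & MASK32

def orc_single_register(d1: int, d0: int) -> int:
    """The three-instruction version that only modifies d0."""
    d0 = ~d0 & MASK32    # not d0
    d0 &= d1             # and d1,d0
    d0 = ~d0 & MASK32    # not d0
    return d0

samples = [0, 1, 0xFFFFFFFF, 0x12345678, 0xF0F0F0F0]
print(all(orc(a, b) == orc_single_register(a, b) for a in samples for b in samples))

# Per bit this is the implication d1 -> d0: false only when d1 = 1 and d0 = 0.
print([(a, b, orc(a, b) & 1) for a in (0, 1) for b in (0, 1)])
```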

Gunnar von Boehn wrote:

  In the A-range the 68000 has room for 2 new full instructions using the form EA,DN and having B/W/L support.

You really want to break 68k Mac emulators, don't you ? ;-)

Gunnar von Boehn wrote:

  Simple instructions that only use 1 operand like ABS Dn need less encoding space. We can certainly add a few of those in 16bit still.

Area used by things such as ff1, bitrev, byterev have a few places left. So abs could fit in here. I'd also recommend adding a new BitCnt (bit counter), PopCnt if you prefer to name it like that.


Gunnar von Boehn
Germany
(Moderator)
Posts 5775
19 Sep 2010 12:40


Phil G. wrote:

 
Gunnar von Boehn wrote:

    In the A-range the 68000 has room for 2 new full instructions using the form EA,DN and having B/W/L support.
 

  You really want to break 68k Mac emulators, don't you ? ;-)
 

 
I think this is solvable. :-)

The new A-line instructions could be enabled or disabled with a special bit in the SR register. This way each task could decide to be in AMIGA or MAC mode. The AMIGA OS could default to enabled, and the MAC-Emulator would disable them. Should be simple and fully compatible.
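
As a toy illustration of that mode switch: the decoder only has to look at the top four bits of the opcode word and at the enable bit. Everything here is hypothetical, including the function name and the idea of passing the bit as a flag. The only fixed fact is that opcodes of the form $Axxx raise the line-A emulator exception on a classic 68k, which is what Mac software relies on.

```python
def decode(opcode: int, aline_enabled: bool) -> str:
    """Classify a 16-bit opcode word under the proposed per-task enable bit."""
    if (opcode >> 12) & 0xF == 0xA:          # $Axxx: the A-line range
        if aline_enabled:
            return "new instruction"          # AMIGA mode: decode as an extension
        return "line-A exception"             # MAC mode: classic 68k behaviour
    return "regular 68k decode"

print(decode(0xA123, aline_enabled=True))
print(decode(0xA123, aline_enabled=False))
print(decode(0x4E75, aline_enabled=True))     # RTS is unaffected either way
```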
 
 
Phil G. wrote:

 
Gunnar von Boehn wrote:

    Simple instructions that only use 1 operand like ABS Dn need less encoding space. We can certainly add a few of those in 16bit still.
 

  Area used by things such as ff1, bitrev, byterev have a few places left. So abs could fit in here. I'd also recommend adding a new BitCnt (bit counter), PopCnt if you prefer to name it like that.
 

 
Yes, this makes perfect sense to me.
Popcount sounds very good.
So which encoding would you propose for POPCOUNT and ABS?

Regarding the Shift.
I wonder if adding an "extend" opcode would make sense.
Let's say we identify a 16bit opcode which is so far unused and has some free bits.

This opcode could be used to extend existing instructions in a context-sensitive way:
Adding this opcode to the existing SHIFT #1,(ea) instruction would turn it into a SHIFT #n,(ea)
 
And other instructions could be extended like this also ... Just a thought.
