Welcome to the Natami / Amiga Forum. This forum is for AMIGA fans interested in the new NATAMI platform.
Welcome to the Natami lounge. Meet new AMIGA friends here and enjoy a friendly chit-chat.

68K Ideas for the Future
Gunnar von Boehn (Germany), Moderator, 5775 posts, 17 Sep 2010 12:58
I think it's always nice to have some goals for the future. The discussion with "Amiga Believer" about the pros and cons of the Motorola 96000 DSP made me think a bit. Judging by its assembly dialect, the 96000 looks a lot like the 68K, but there are some differences: the 96000 supports some instructions which the 68K does not have. All of those missing instructions would be easy to add to the 68K.

So here is a list of things that the 96000 has and we do not. In general we could add any of those instructions to the 68050. But I would prefer not to add new features hastily, and to only add features which we are really sure are of significant benefit. Maybe we could brainstorm about the value of these instructions:

    ABS     Absolute integer value
    SHIFT   Immediate shift allowing > 8 bits
    FF1     Find first 1
    BScc    Conditional BSR
    Jcc     Conditional JMP
    JScc    Conditional JSR
    JOIN    Src.LW -> Dst.HW
    JOINB   Src.LB -> Dst.HB
    ORC     OR with complement
    SPLIT   Src.HW -> Dst.LW (sign-extended)
    SPLITB  Src.HB -> Dst.LB

The detailed description of each instruction is in the 96000 user manual (see the Freescale website). So what are your thoughts?
André Jernung (Sweden), MX-Board Owner, 988 posts, 17 Sep 2010 13:13
Gunnar von Boehn wrote:
> BScc Conditional BSR
> Jcc Conditional JMP
> JScc Conditional JSR
I think that especially these three would be very useful. They would make life easier for the 68k programmer.
SID Hervé (France), 663 posts, 17 Sep 2010 13:49
Personal opinion of a future user: if a simple extension of the instruction set allows the 68k to work as a DSP, why not? It would save the cost of integrating a hypothetical dedicated component on the motherboard and, for an individual user, the cost of an expansion card. Thank you.
Gunnar von Boehn (Germany), Moderator, 5775 posts, 17 Sep 2010 14:03
SID Hervé wrote:
> If a simple extension of the instruction set allows the 68k to work as a DSP, so why not?
Actually this has nothing to do with "working as a DSP". We are just talking about potentially useful CPU instructions.
SID Hervé (France), 663 posts, 17 Sep 2010 15:19
The term "Motorola 96000 DSP" misled me. Thank you.
Thomas Richter (Germany), MX-Board Owner, 1425 posts, 17 Sep 2010 16:14
A DSP mostly needs instructions for implementing digital filters and for performing simple arithmetic operations on large vectors. This includes the typical SIMD instructions, multiply-add instructions, fixed-point arithmetic, and table lookup instructions like those in the CPU32. Greetings, Thomas
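To make Thomas's list concrete, here is a small fixed-point digital filter. It is a direct-form FIR filter in Q15 format (Q15 is our assumption; the post names no format) where each tap is one multiply-accumulate, the operation a multiply-add or MAC instruction performs in a single step. It is a plain Python model for illustration, not N68050 code.

```python
def fir_q15(samples, coeffs):
    """Direct-form FIR filter on Q15 samples and Q15 coefficients.

    y[n] = sum_k coeffs[k] * x[n - k]. The products are accumulated at full
    precision (Python ints never overflow, unlike a real accumulator), then
    shifted back to Q15 and saturated to the signed 16-bit range.
    """
    history = [0] * len(coeffs)          # delay line, newest sample first
    out = []
    for x in samples:
        history = [x] + history[:-1]     # shift the delay line by one sample
        acc = 0
        for c, h in zip(coeffs, history):
            acc += c * h                 # one multiply-accumulate per tap
        y = acc >> 15                    # Q30 sum of products -> Q15 (floor)
        y = max(-32768, min(32767, y))   # saturate, as a DSP typically would
        out.append(y)
    return out


# Two-tap moving average: both coefficients are 0.5 in Q15
print(fir_q15([32767, 32767, 0], [16384, 16384]))  # [16383, 32767, 16383]
```

The saturation step matters in practice: (-1.0) * (-1.0) in Q15 is +1.0, which is not representable and would otherwise wrap to -1.0.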
Gunnar von Boehn (Germany), Moderator, 5775 posts, 17 Sep 2010 16:29
We could also think about ways to enhance the 68K instruction set without even adding new instructions. For example, let's say we want to add this instruction:

    LSL.L #9,Dn

Let's say that, because of encoding space limitations, we find no 16-bit encoding for it but only a 32-bit one. In that case we could also do the same work with 2 instructions, e.g.:

    LSL.L #8,Dn
    LSL.L #1,Dn

Of course the two instructions would take 2 clocks instead of 1. But if we "enhance" the CPU decoder, it could merge those 2 instructions into 1, thereby executing both in 1 cycle. The net effect would be the same as adding a new 32-bit encoding, but without needing a new encoding at all. This means our CPU would run both old and new code faster, without any need to change the existing 68k compilers. :-)

Another merge example:

    MOVE.L D0,D1
    LSL.L  #8,D1
    LSL.L  #1,D1

In theory these 3 instructions could all be merged into 1.

Another example: it would be great if we could do BSRcc in 1 cycle. Only the encoding is challenging. But maybe we could encode it as a short Bcc with the inverted condition that branches over a BSR:

        Bcc  .skip
        BSR  label
    .skip:

Then it would be backward compatible with the old 68K CPUs and, if the decoder is smart enough, still execute in 1 cycle. What do you think?
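Gunnar's fusion of LSL.L #8 + LSL.L #1 into LSL.L #9 is exact for the condition codes too, not just for the result: LSL writes every one of X, N, Z, V and C (for a nonzero count), so the second shift fully overwrites the first shift's flags, and its carry is the same bit that LSL.L #9 would shift out last. The Python model below is a simplified LSL.L for counts 1 to 31, written for this illustration rather than taken from any emulator, and it checks the equivalence on a few values.

```python
MASK32 = 0xFFFFFFFF


def lsl_l(value, count):
    """Model of LSL.L #count,Dn for 1 <= count <= 31.

    Returns (result, flags). X and C receive the last bit shifted out,
    N and Z reflect the result, and V is always cleared.
    """
    assert 1 <= count <= 31
    value &= MASK32
    result = (value << count) & MASK32
    last_out = (value >> (32 - count)) & 1
    flags = {"X": last_out, "N": result >> 31, "Z": int(result == 0),
             "V": 0, "C": last_out}
    return result, flags


def fused_lsl(value, first, second):
    """Two consecutive shifts, as the decoder sees them before fusion."""
    intermediate, _ = lsl_l(value, first)   # flags of the first shift are dead
    return lsl_l(intermediate, second)


for v in (0, 1, 0x00800000, 0x007FFFFF, 0x12345678, 0x80000000, MASK32):
    assert fused_lsl(v, 8, 1) == lsl_l(v, 9)

# MOVE.L D0,D1 followed by both shifts also leaves D1 = D0 << 9, and the
# shifts overwrite every flag MOVE had set.
print(lsl_l(0x00800000, 9))  # bit 23 is shifted out last: result 0, X = C = 1
```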
Samuel D Crow (USA), Natami Team, 1295 posts, 17 Sep 2010 16:43
Since we've already got some opcode fusion planned, let's continue with that in mind. The following should also be possible. This one is the equivalent of JScc, but could be a predicated jump or something similar:

        Bcc.b label
        JSR   label2
    label:

Jcc could be implemented with this:

        Bcc.b label
        JMP   label2
    label:

All we'd need is a way to reverse the condition code in the macro assembler. Otherwise we'd have to write a macro for every condition code for each new instruction.
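Reversing the condition is cheap to automate, because the 68k condition field (bits 11-8 of a Bcc opcode) lists the conditions in complementary pairs: HI/LS, CC/CS, NE/EQ and so on differ only in bit 0. Below is a minimal sketch of a helper a macro preprocessor could use, assuming a hypothetical "JScc" or "Jcc" pseudo-instruction that expands into the Bcc.b + JSR/JMP pattern above with the inverted condition. The function names are made up for illustration.

```python
# 68k condition codes in encoding order (values 0-15). Each even/odd pair
# is a condition and its complement, so inverting a condition flips bit 0.
CONDITIONS = ["t", "f", "hi", "ls", "cc", "cs", "ne", "eq",
              "vc", "vs", "pl", "mi", "ge", "lt", "gt", "le"]
ALIASES = {"hs": "cc", "lo": "cs"}  # common assembler synonyms


def invert_condition(cc):
    cc = ALIASES.get(cc.lower(), cc.lower())
    return CONDITIONS[CONDITIONS.index(cc) ^ 1]


def expand_conditional(cc, target, skip_label, op="jsr"):
    """Expand a hypothetical 'JScc target' (or 'Jcc target' with op='jmp')
    into a short branch with the inverted condition around the call/jump."""
    inverted = invert_condition(cc)
    if inverted in ("t", "f"):
        # in the Bcc format, condition values 0 and 1 encode BRA and BSR
        raise ValueError("Bcc cannot use the T or F condition")
    return [f"\tb{inverted}.b\t{skip_label}",
            f"\t{op}\t{target}",
            f"{skip_label}:"]


print("\n".join(expand_conditional("eq", "label2", "label")))
```

With cc = "eq" this prints a bne.b over the jsr, i.e. the call happens only when Z is set; passing op="jmp" gives the Jcc form.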
Matt Hey (USA), 726 posts, 18 Sep 2010 08:37
@Gunnar I think you were thinking about how the N68050+ ColdFire extensions already add much of the power of the 96000 DSP when you wrote FF1 instead of BFIND ;). The N68050+ is going to have the CF MAC instruction, right? IMHO the 68k + CF instruction set is already more versatile and powerful than the 96000 DSP (and easier to use, too). As already mentioned, instruction fusion and predication will speed up commonly used instruction combinations. It would be interesting to have a tool that tracks commonly used instruction combinations and reports which ones save the most cycles in real-world use.

My instruction review...

-------
ABS

        tst   dx
        bpl.b .skip
        neg   dx
    .skip:

Trivial, and already fast with predication. The tst can sometimes be avoided if the condition codes were already set by a previous operation.

-------
SHIFT: immediate shift allowing GT 8 bits (GT = greater than, LE = less than or equal)

For n GT 8 and n LE 16 (with m and n-m each between 1 and 8):

        lsl #m,dx
        lsl #(n-m),dx

or for n GT 16:

        moveq #n,dy
        lsl   dy,dx

Shifts greater than 8 are common. It would be nice to have an immediate word-size shift of up to 31, but it's probably not worth it now. Code fusion is just as fast (though not as small) and will speed up old 68k code.

-------
BFIND

        ff1

Already in the CF additions, and the 68k bfffo is available too.

-------
BScc, JScc

        bcc.b .skip
        bsr   label
    .skip:

Passing variables complicates subroutine calls, which limits the use of a conditional subroutine call. Predication already speeds up this combination.

-------
Jcc

Would not be used enough.

-------
JOIN

        swap   dx
        move.w dy,dx
        swap   dx

JOINB

        rol.l  #8,dx
        move.b dy,dx
        ror.l  #8,dx

SPLIT

        swap   dy
        mvs.w  dy,dx   ; ColdFire instruction
        swap   dy

SPLITB

        rol.l  #8,dy
        mvs.b  dy,dx   ; ColdFire instruction
        ror.l  #8,dy

I don't think JOINB and SPLITB would be common enough to add. JOIN and SPLIT are fairly common, but I believe it's better to use instruction fusion for at least swap + move.w (and maybe swap + move.w), as I mentioned in the instruction fusion thread. This way legacy 68k code benefits too.
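Matt's four replacement sequences can be checked mechanically against the one-line definitions in Gunnar's list. The Python sketch below models only the instructions involved (SWAP, ROL.L/ROR.L #8, MOVE.W/MOVE.B into a data register, ColdFire MVS) on 32-bit values and compares each sequence with a direct definition. For SPLITB the direct definition assumes sign extension, as the mvs.b version implies; the first post only says Src.HB -> Dst.LB.

```python
MASK32 = 0xFFFFFFFF


def swap(x):
    """SWAP Dn: exchange the upper and lower words."""
    return ((x << 16) | (x >> 16)) & MASK32


def rol_l(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


def ror_l(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def move_w(src, dst):
    """MOVE.W src,Dn: only the low word of the destination changes."""
    return (dst & 0xFFFF0000) | (src & 0xFFFF)


def move_b(src, dst):
    """MOVE.B src,Dn: only the low byte of the destination changes."""
    return (dst & 0xFFFFFF00) | (src & 0xFF)


def sext(value, bits):
    """Sign-extend the low `bits` bits to 32 bits (MVS.B / MVS.W)."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK32


# Matt's sequences; each returns (dx, dy) after the last instruction.
def join_seq(dx, dy):      # swap dx / move.w dy,dx / swap dx
    return swap(move_w(dy, swap(dx))), dy


def joinb_seq(dx, dy):     # rol.l #8,dx / move.b dy,dx / ror.l #8,dx
    return ror_l(move_b(dy, rol_l(dx, 8)), 8), dy


def split_seq(dx, dy):     # swap dy / mvs.w dy,dx / swap dy
    dy = swap(dy)
    return sext(dy, 16), swap(dy)


def splitb_seq(dx, dy):    # rol.l #8,dy / mvs.b dy,dx / ror.l #8,dy
    dy = rol_l(dy, 8)
    return sext(dy, 8), ror_l(dy, 8)


# Direct definitions from the list in the first post.
def join_ref(dx, dy):      # Src.LW -> Dst.HW
    return ((dy & 0xFFFF) << 16) | (dx & 0xFFFF), dy


def joinb_ref(dx, dy):     # Src.LB -> Dst.HB
    return ((dy & 0xFF) << 24) | (dx & 0x00FFFFFF), dy


def split_ref(dx, dy):     # Src.HW -> Dst.LW, sign-extended
    return sext(dy >> 16, 16), dy


def splitb_ref(dx, dy):    # Src.HB -> Dst.LB, sign extension assumed
    return sext(dy >> 24, 8), dy


for dx, dy in [(0x11223344, 0x8899AABB), (0, MASK32), (0xDEADBEEF, 0x7F80017F)]:
    assert join_seq(dx, dy) == join_ref(dx, dy)
    assert joinb_seq(dx, dy) == joinb_ref(dx, dy)
    assert split_seq(dx, dy) == split_ref(dx, dy)
    assert splitb_seq(dx, dy) == splitb_ref(dx, dy)
```

Note that the SPLIT and SPLITB sequences temporarily modify dy and restore it, so they touch two registers even though dy ends up unchanged.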
Using the CF instructions mvs and mvz adds flexibility, since both signed AND unsigned variants work. It's too bad these 2 CF instructions weren't in the 68k from the beginning. They are workhorses and reduce code size a lot. They can replace most ext instructions, and they can clear the upper part of a register, allowing better register utilization...

        mvs.b dx,dx   ; extb.l dx
        mvs.w dx,dx   ; ext.l dx
        mvz.b dx,dx   ; clear the upper 24 bits of a register
        mvz.w dx,dx   ; clear the upper word of a register

-------
ORC (OR with complement)

        not dy
        or  dy,dx

ANDC (AND with complement)

        not dy
        and dy,dx

More instruction fusion possible?

-------
I think some of the 96000 FPU instructions are interesting, but I'll save that for an FPU thread later ;).
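A quick model of the two-instruction ORC and ANDC pairs makes two things explicit: the pairs compute dx OR (NOT dy) and dx AND (NOT dy), which are different operations from NOR and NAND, and both pairs leave dy complemented, so two registers change. The sketch is plain Python written for illustration.

```python
MASK32 = 0xFFFFFFFF


def orc_pair(dx, dy):
    """not dy / or dy,dx: returns (dx, dy) after both instructions."""
    dy = ~dy & MASK32        # NOT writes dy
    dx = dx | dy             # OR writes dx
    return dx, dy


def andc_pair(dx, dy):
    """not dy / and dy,dx: returns (dx, dy) after both instructions."""
    dy = ~dy & MASK32
    dx = dx & dy
    return dx, dy


def nor(a, b):
    return ~(a | b) & MASK32


def nand(a, b):
    return ~(a & b) & MASK32


dx, dy = 0xF0F0F0F0, 0xFF00FF00
print([hex(v) for v in orc_pair(dx, dy)])   # ['0xf0fff0ff', '0xff00ff']
print(hex(nor(dx, dy)))                     # 0xf000f
```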
Denis Markovic (Germany), Natami Team, 41 posts, 18 Sep 2010 08:49
Hi Natami Team, I would like to apply to become a member of the team. I hope this is the right place to ask? I think I could mainly contribute in the areas of signal processing implementations (I have worked with DSPs since '99, lately with vector processors), instruction set discussions (started with the Amiga in '87; 68k, ARM, ... experience) and some floating-point issues (I started implementing some 16-bit floating-point stuff in Verilog in a friend's open-source DSP project, http://code.google.com/p/ajardsp/). /Br Denis
Gunnar von Boehn (Germany), Moderator, 5775 posts, 18 Sep 2010 12:23
Denis Markovic wrote:
> Hi Natami Team, I would like to apply to become member of the Team. Hope this is the right place to ask?
People who share our AMIGA ambition and our mindset are always welcome. It would be best to have a quick call to discuss all possibilities in detail. Can you email me your phone number at gunnar@greyhound-data.com?
Amiga Believer (Canada), 282 posts, 19 Sep 2010 03:56
The absolute integer value instruction would be useful for things like video or audio compression and decompression. For this reason, it would be good to add it to the SIMD unit too.
Matt Hey (USA), 726 posts, 19 Sep 2010 05:17
Amiga Believer wrote:
> The absolute integer value instruction would be useful with stuff like video or audio compression and decompression. For this reason, it can be good to add it to the SIMD unit too.
Yes, an integer absolute value instruction would be useful, but it's not needed. The 68k is fast and flexible enough to handle this without wasting instruction space on an ABS instruction. The N68050+ will be able to do it in 1-3 cycles using 6 bytes or less. This is very good, and faster than the 68060. If you must have an instruction, I would recommend using a macro...

    abs     MACRO
            tst.\0  \1
            bpl.b   .skip\@
            neg.\0  \1
    .skip\@:
            ENDM

Now you have your instruction. Use it like...

            abs.l   d0

Problem solved. Now please go write us your optimized audio software in assembler!
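What the abs macro does to a register can be modelled in Python for each operand size. The model below is our own illustration; it also shows the one edge case any ABS instruction would share with this macro: NEG of the most negative value wraps back to itself, so abs.b of $80 stays $80. For .b and .w only the low byte or word of the register changes, as with any sized 68k operation.

```python
SIZE_BITS = {"b": 8, "w": 16, "l": 32}


def abs_macro(reg, size):
    """Model of: tst.<size> reg / bpl.b .skip / neg.<size> reg / .skip:"""
    bits = SIZE_BITS[size]
    mask = (1 << bits) - 1
    operand = reg & mask
    if operand >> (bits - 1):           # N set by tst, so bpl falls through
        operand = (-operand) & mask     # neg: two's complement, wraps around
    return (reg & ~mask & 0xFFFFFFFF) | operand


print(hex(abs_macro(0xFFFFFFFF, "l")))  # 0x1
print(hex(abs_macro(0xAAAAFFFE, "w")))  # 0xaaaa0002
print(hex(abs_macro(0x12345680, "b")))  # 0x12345680: -128 has no positive byte
```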
Gunnar von Boehn (Germany), Moderator, 5775 posts, 19 Sep 2010 07:37
Hi Matt, many thanks for the good ideas.
Matt Hey wrote:
> for n GT 16
>     moveq #n,dy
>     lsl   dy,dx
> ORC (NOR)
>     not dy
>     or  dy,dx
Unfortunately those are difficult to merge. I'll explain why; maybe we can then create together a table of good "fusion pairs" and of instructions which we had better add as new ones. In general, the 68K ALU is designed like this: 2 register reads (32 bits each), the operation, and 1 register update (32 bits). This means the following:

MOVE.L D0,D1 is implemented as

    SrcA = D0
    SrcB = D1
    Dst  = (SrcA & $ffffffff)

But MOVE.B D0,D1 is internally implemented like this:

    SrcA = D0
    SrcB = D1
    Dst  = (SrcA & $ff) OR (SrcB & $ffffff00)

Because of this we have some limitations when we fuse pairs. Rule: each ALU can update only one register per clock. The sequences below would update 2 registers, therefore we can NOT merge them:
    not dy
    or  dy,dx

SPLITB

    rol.l #8,dy
    mvs.b dy,dx   ; ColdFire instruction
    ror.l #8,dy
How about this:

    MOVE.L Dy,Dx
    LSR.L  #8,Dx
    EXTB.L Dx

This version could have the same flag settings as the SPLIT instruction. I'm not sure if we should go for 6-byte encodings. If we can encode a new instruction using two old instructions in 4 bytes total, then this is good IMHO. But using 6 bytes might be on the edge. What do you think?
Matt Hey wrote:
> I think some of the 96000 fpu instructions are interesting but I'll save that for a fpu thread later ;).
Please feel free to share your thoughts about them!
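Gunnar's ALU description (two 32-bit register reads, one 32-bit register write) and the resulting fusion rule can be sketched as follows. alu_move follows his MOVE.B formula for all three sizes; can_fuse is a hypothetical helper that checks only the one-write-per-clock rule, not read-port limits or any other decoder constraint.

```python
MASK32 = 0xFFFFFFFF
SIZE_MASK = {"b": 0x000000FF, "w": 0x0000FFFF, "l": 0xFFFFFFFF}


def alu_move(src_a, src_b, size):
    """MOVE.<size> SrcA,Dst with SrcB = old Dst: the ALU always writes
    32 bits, merging in the untouched upper part of the old destination."""
    keep = SIZE_MASK[size]
    return (src_a & keep) | (src_b & ~keep & MASK32)


def can_fuse(sequence):
    """One-write rule: a fused group may update only one register.
    `sequence` is a list of (instruction text, destination register)."""
    return len({dest for _, dest in sequence}) == 1


print(hex(alu_move(0x11223344, 0xAABBCCDD, "b")))            # 0xaabbcc44
print(can_fuse([("lsl.l #8,d1", "d1"), ("lsl.l #1,d1", "d1")]))  # True
print(can_fuse([("not.l d1", "d1"), ("or.l d1,d0", "d0")]))      # False
print(can_fuse([("move.l d1,d0", "d0"), ("lsr.l #8,d0", "d0"),
                ("extb.l d0", "d0")]))                         # True
```

By this rule the MOVE.L/LSR.L/EXTB.L version of SPLITB is a fusion candidate, while the rol/mvs/ror version and the not/or pair are not.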
Claudio Wieland (Germany), Natami Team, 703 posts, 19 Sep 2010 07:49
We also have to keep limited cache sizes in mind. Smaller code is better.
Gunnar von Boehn (Germany), Moderator, 5775 posts, 19 Sep 2010 08:40
Claudio Wieland wrote:
> We also have to mind limited cache sizes. Smaller code is better.
This is true. But our free encoding space is also very limited. In the A-line range of the 68000 there is room for 2 new full instructions of the form EA,Dn with B/W/L support. This means we cannot add many instructions that encode in 16 bits, but we can add many instructions that encode in 32 bits. Simple instructions that use only 1 operand, like ABS Dn, need less encoding space, so we can certainly still add a few of those in 16 bits.
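The "room for 2" figure follows from simple counting, assuming the new instructions use the same field layout as ADD, AND and OR: a 3-bit register, a 3-bit opmode that selects size and direction, and a 6-bit effective address. The sketch below counts codes in the 4096-opcode A-line; it ignores that some EA codes are invalid, which does not change how much space an instruction reserves.

```python
OPCODE_BITS = 16
LINE_BITS = 4                                      # $A in the top nibble = A-line
codes_per_line = 1 << (OPCODE_BITS - LINE_BITS)    # opcodes $A000-$AFFF

# Two-operand EA,Dn layout, as in ADD, AND or OR:
# 4-bit line | 3-bit register | 3-bit opmode | 6-bit effective address
REGISTERS = 1 << 3
OPMODES = 1 << 3
EA_CODES = 1 << 6
SIZES = 3                                          # .b, .w and .l each use one opmode

codes_per_full_op = REGISTERS * SIZES * EA_CODES
full_ops = OPMODES // SIZES                        # 8 opmodes / 3 sizes
leftover_codes = codes_per_line - full_ops * codes_per_full_op

# A one-operand instruction on a data register (like ABS Dn) needs only 8 codes,
# so the leftover space could in principle hold this many of them:
one_operand_ops = leftover_codes // REGISTERS

print(codes_per_line, codes_per_full_op, full_ops, leftover_codes, one_operand_ops)
# 4096 1536 2 1024 128
```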
Denis Markovic (Germany), Natami Team, 41 posts, 19 Sep 2010 09:37
A few quick thoughts about the instructions:
Gunnar von Boehn wrote:
> ABS Absolute integer Value
Very useful for DSP work, e.g. the power density spectrum or the L1 norm, as long as there is no sum-of-absolute-differences instruction (which can be used e.g. in speech recognition, video coding, ...).
Gunnar von Boehn wrote:
> SHIFT Immidiate Shift allowing > 8 bits
Obviously a good idea :)
Gunnar von Boehn wrote:
> FF1 Find First 1
I guess this is something like count leading zeros (i.e. counting the most significant 0 bits in a variable)? That is very important for block floating-point implementations, i.e. normalizing a block for higher precision/higher dynamic range. It would also be very useful to have the instruction for signed numbers, i.e. if the number is positive, return the number of most significant 0 bits (minus 1); if it is negative, return the number of most significant 1 bits (minus 1). If you want to save opcode space, one instruction could do both and put the unsigned result (number of leading 0s) in the low part and the signed result in the high part of the result...

Important in this context: array minimum/maximum search. Example pseudocode for findmax.w var0,cntvar,var2:

    if (var0[31:16] > var2[31:16]) { var2[31:16] = var0[31:16]; var2[15:0] = cntvar[15:0]; }
    cntvar++;
    if (var0[15:0] > var2[31:16]) { var2[31:16] = var0[15:0]; var2[15:0] = cntvar[15:0]; }
    cntvar++;

This would be extremely useful for finding the range of an array and for block floating point. Even better would be a minmax instruction (for .l, .w, .b) that searches for the minimum and the maximum in parallel. What do you think about that idea? One drawback is the 3 operands. If we don't want that, we could have special DSP registers to store e.g. the minimum/maximum value and the minimum/maximum index; after looping with this instruction you could simply read the values from that memory-mapped register. While this is not very 68k-like, it would be very powerful for signal processing and save a lot of opcode space. We could have 3 modes (similar to ARM): 68k basic version; 68k basic plus DSP instructions (with memory-mapped or special registers, or registers readable/writable with an extra move); and 68k basic plus DSP plus SIMD?
Gunnar von Boehn wrote:
> BScc Conditional BSR Jcc Conditional JMP JScc Conditional JSR JOIN Src.LW -> Dst.HW JOINB Src.LB -> Dst.HB ORC Or with Complement SPLIT Src.HW -> Dst.LW (sign extented) SPLITB Src.HB -> Dst.LB The detailed description of the instruction is in the 96000 user manual (see freescale website) So what are your thoughts?
Hm, not so many thoughts on the rest of the instructions.
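Denis's three suggestions can be modelled in a few lines of Python: a sum of absolute differences (what ABS speeds up when there is no SAD instruction), count-leading-zeros plus its signed "redundant sign bits" variant used to pick a block floating-point shift, and the findmax.w step. The findmax.w model rests on two assumptions of ours: the comparison is signed, and cntvar advances once per element so the stored index always matches the element's position. All names are for illustration only.

```python
MASK32 = 0xFFFFFFFF


def sad(block_a, block_b):
    """Sum of absolute differences: one SUB, ABS and ADD per element."""
    total = 0
    for a, b in zip(block_a, block_b):
        total += abs(a - b)
    return total


def clz32(x):
    """Count leading zeros of a 32-bit value (32 for zero)."""
    return 32 - (x & MASK32).bit_length()


def cls32(x):
    """Leading bits equal to the sign bit, minus 1 (redundant sign bits)."""
    x &= MASK32
    if x & 0x80000000:
        x = ~x & MASK32          # leading 1s of x = leading 0s of ~x
    return clz32(x) - 1


def block_shift(samples):
    """Largest common left shift that normalizes a block without overflow."""
    return min(cls32(s) for s in samples)


def s16(x):
    """Interpret the low 16 bits as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def findmax_w(var0, cntvar, var2):
    """One findmax.w step. var0 = two 16-bit samples (high word first),
    var2 = current maximum (high word) and its index (low word)."""
    for sample in (var0 >> 16, var0 & 0xFFFF):
        if s16(sample) > s16(var2 >> 16):
            var2 = (sample << 16) | (cntvar & 0xFFFF)
        cntvar += 1              # advance once per element
    return cntvar, var2


def array_max(samples):
    """Run findmax.w over an even-length list of signed 16-bit samples."""
    cnt, acc = 0, 0x80000000     # start with maximum -32768 at index 0
    for i in range(0, len(samples), 2):
        var0 = ((samples[i] & 0xFFFF) << 16) | (samples[i + 1] & 0xFFFF)
        cnt, acc = findmax_w(var0, cnt, acc)
    return s16(acc >> 16), acc & 0xFFFF


print(sad([10, 20, 30, 40], [12, 18, 30, 45]))     # 9
print(block_shift([0x00001000, 0xFFFFF000]))       # 18
print(array_max([3, -7, 12, 5, 12, -1]))           # (12, 2)
```

Because the comparison is strict, a repeated maximum keeps the index of its first occurrence.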
Matt Hey (USA), 726 posts, 19 Sep 2010 10:37
First off, a correction to my previous post: "swap + move.w (and maybe swap + move.w)" should read "swap + move.w (and maybe move.w + swap)". I couldn't edit after someone else had posted. I also had to use GT and LE instead of the greater-than and less-than signs because the forum interprets them as HTML.
Gunnar von Boehn wrote:
> Matt Hey wrote:
> > for n GT 16
> >     moveq #n,dy
> >     lsl   dy,dx
> > ORC (NOR)
> >     not dy
> >     or  dy,dx
> Unfortunately those are difficult to merge.
I see. I don't think it's such a big loss in the case of ORC and ANDC. The blitter or other graphics hardware can be used if there are a lot of these logical operations. Shifting by more than 8 is more of a problem. As much as I want to limit adding new instructions, I think this is a prime candidate. A single-word encoding would also save code space compared with 2 consecutive shifts. I'm thinking the RTM + CALLM or A-line instruction space might have room. ASL and LSL could share the same encoding. Immediate 16 (swap) and 32 (useless) would not be needed. That would leave 9-15 and 17-31. I don't like how leaving rotates out would be unbalanced, or how ad hoc the encoding would be. Maybe someone will have a different idea.
Gunnar von Boehn wrote:
> SPLITB
>     rol.l #8,dy
>     mvs.b dy,dx   ; ColdFire instruction
>     ror.l #8,dy
> How about this:
>     MOVE.L Dy,Dx
>     LSR.L  #8,Dx
>     EXTB.L Dx
> This version could have the same flag settings as the SPLIT instruction.
There are 2 "upper" bytes, so there could be a SPLITB.L and a SPLITB.W. My code was for the former and yours for the latter. Either one could be encoded differently...

SPLITBS.W (signed)

        move.l dy,dx
        lsr.l  #8,dx
        extb.l dx

SPLITBZ.W (unsigned)

        move.l dy,dx
        lsr.l  #8,dx
        mvz.b  dx,dx

SPLITBS.L (signed)

        move.l dy,dx
        rol.l  #8,dx
        extb.l dx

SPLITBZ.L (unsigned)

        move.l dy,dx
        rol.l  #8,dx
        mvz.b  dx,dx

I don't think the byte versions of SPLIT and JOIN are common enough to fuse, let alone to justify a new instruction. The CF instructions mvz.b and mvs.b already make dealing with bytes easier, and using a few instructions is more flexible and less confusing. Another possibility to look at is the bit field instructions: BFEXTU and BFEXTS would already be smaller than 3 instructions. How fast will the bit field instructions be on the N68050 with a register operand?
Gunnar von Boehn wrote:
> I'm not sure if we should go for 6 byte encodings. If we can encode a new instruction using two old instructions in 4byte total - then this is good IMHO. But using 6 Byte might be on the edge.
I never suggested a 6-byte instruction encoding. 4 bytes is quite enough for an instruction. I was stating that a macro would take at most 6 bytes, and I opposed creating an ABS instruction. Anything other than a 2-byte ABS instruction would not make sense.
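The four SPLITB variants, and Matt's point that one bit-field extract could replace each of them, can be checked with the Python model below. bfext models BFEXTU/BFEXTS on a data register using the 68k convention that bit-field offset 0 is the most significant bit; for brevity it ignores the wrap-around real register bit fields allow when offset + width exceeds 32. On size: each three-instruction sequence is 6 bytes, while a bit-field extract with immediate offset and width is an opcode word plus an extension word, 4 bytes.

```python
MASK32 = 0xFFFFFFFF


def sext8(x):
    """EXTB.L: sign-extend the low byte to 32 bits."""
    x &= 0xFF
    return (x - 0x100 if x & 0x80 else x) & MASK32


def rol_l(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32


# Each variant starts with move.l dy,dx, so only dx is ever written.
def splitbs_w(dy):
    return sext8(dy >> 8)           # lsr.l #8,dx / extb.l dx


def splitbz_w(dy):
    return (dy >> 8) & 0xFF         # lsr.l #8,dx / mvz.b dx,dx


def splitbs_l(dy):
    return sext8(rol_l(dy, 8))      # rol.l #8,dx / extb.l dx


def splitbz_l(dy):
    return rol_l(dy, 8) & 0xFF      # rol.l #8,dx / mvz.b dx,dx


def bfext(value, offset, width, signed):
    """BFEXTU/BFEXTS on a data register, offset 0 = bit 31 (no wrap-around)."""
    field = (value >> (32 - offset - width)) & ((1 << width) - 1)
    if signed and field & (1 << (width - 1)):
        field -= 1 << width
    return field & MASK32


for dy in (0x80FF7F01, 0x12348056, 0x7F00FF00):
    assert splitbs_l(dy) == bfext(dy, 0, 8, True)    # bfexts dy{0:8},dx
    assert splitbz_l(dy) == bfext(dy, 0, 8, False)   # bfextu dy{0:8},dx
    assert splitbs_w(dy) == bfext(dy, 16, 8, True)   # bfexts dy{16:8},dx
    assert splitbz_w(dy) == bfext(dy, 16, 8, False)  # bfextu dy{16:8},dx
```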
Phil "meynaf" G. (France), Natami Team, 393 posts, 19 Sep 2010 12:18
ORC (OR with complement) does not seem very useful to me. If you want to change only 1 register, ORC d1,d0 is equivalent to:

        not d0
        and d1,d0
        not d0

And it is basically the logical "implication" operator, d1 implies d0 (unless I'm mistaken).
Gunnar von Boehn wrote:
> In the A-range the 68000 room for 2 new full instructions using the form EA,DN and having B/W/L support.
You really want to break 68k Mac emulators, don't you? ;-)
Gunnar von Boehn wrote:
> Simple instructions that only use 1 operant like ABS Dn need less encoding space. We can certainly add a few of those in 16bit still.
The area used by instructions such as ff1, bitrev and byterev has a few free slots left, so abs could fit there. I'd also recommend adding a new BitCnt (bit counter) instruction, or PopCnt if you prefer that name.
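Both of Phil's points can be checked with a short Python model: his three-instruction ORC writes only d0 and matches d0 OR (NOT d1), which bit by bit is the implication d1 -> d0; and a reference population count shows what the proposed BitCnt/PopCnt would return. The function names are ours, for illustration only.

```python
MASK32 = 0xFFFFFFFF


def popcount32(x):
    """Number of set bits in a 32-bit value (the proposed BitCnt/PopCnt)."""
    x &= MASK32
    count = 0
    while x:
        x &= x - 1           # clear the lowest set bit
        count += 1
    return count


def orc(src, dst):
    """ORC src,dst in Phil's reading: dst OR (NOT src)."""
    return (dst | ~src) & MASK32


def orc_phil(d1, d0):
    """not d0 / and d1,d0 / not d0: every step writes only d0."""
    d0 = ~d0 & MASK32
    d0 &= d1
    d0 = ~d0 & MASK32
    return d0


for d1, d0 in [(0, 0), (MASK32, 0), (0x12345678, 0x9ABCDEF0), (0xF0F0F0F0, 0xFF00FF00)]:
    assert orc_phil(d1, d0) == orc(d1, d0)

# Per bit, ORC is the implication d1 -> d0: false only when d1 = 1 and d0 = 0
for a in (0, 1):
    for b in (0, 1):
        assert (orc(a, b) & 1) == int((not a) or b)

print(popcount32(0x12345678))   # 13
```

Since Phil's version updates only one register, it would also satisfy the one-write rule for fusion that Gunnar described earlier in the thread.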
Gunnar von Boehn (Germany), Moderator, 5775 posts, 19 Sep 2010 12:40
Phil G. wrote:
> Gunnar von Boehn wrote:
> > In the A-range the 68000 room for 2 new full instructions using the form EA,DN and having B/W/L support.
> You really want to break 68k Mac emulators, don't you ? ;-)
I think this is solvable. :-) The new A-line instructions could be enabled or disabled with a special bit in the SR register. This way each task could decide whether to run in AMIGA or MAC mode. AmigaOS could default to enabled, and the Mac emulator would disable them. That should be simple and fully compatible.
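Gunnar's per-task switch could look roughly like this in a decoder model. Everything here except the exception vector is hypothetical: the enable bit's position (bit 11, which reads as zero in the 68020-68060 status register) is chosen only for illustration. The one fixed fact is that today any $Axxx opcode raises the Line 1010 emulator exception (vector 10), which is what Mac emulators rely on. Because the SR is saved and restored on every task switch, such a bit would naturally be per task.

```python
LINE_A_VECTOR = 10            # Line 1010 emulator exception vector
SR_ALINE_ENABLE = 1 << 11     # HYPOTHETICAL enable bit in the SR


def dispatch(opcode, sr):
    """Decide what a 16-bit opcode does under the proposed scheme."""
    if (opcode >> 12) != 0xA:
        return "normal"
    if sr & SR_ALINE_ENABLE:
        return "new-instruction"           # AMIGA mode: decode the new ops
    return ("exception", LINE_A_VECTOR)    # MAC mode: trap as on existing CPUs


amiga_task_sr = SR_ALINE_ENABLE            # AmigaOS default: enabled
mac_emulator_sr = 0                        # Mac emulator clears the bit
print(dispatch(0xA9F0, amiga_task_sr))     # new-instruction
print(dispatch(0xA9F0, mac_emulator_sr))   # ('exception', 10)
```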
Phil G. wrote:
> Gunnar von Boehn wrote:
> > Simple instructions that only use 1 operant like ABS Dn need less encoding space. We can certainly add a few of those in 16bit still.
> Area used by things such as ff1, bitrev, byterev have a few places left. So abs could fit in here. I'd also recommend adding a new BitCnt (bit counter), PopCnt if you prefer to name it like that.
Yes, this makes perfect sense to me. Popcount sounds very good. So which encodings would you propose for POPCOUNT and ABS?

Regarding the shift: I wonder if adding an "extend" opcode would make sense. Let's say we identify a 16-bit opcode which is so far unused and has some free bits. This opcode could be used to extend existing instructions context-sensitively: for example, adding this opcode to the existing SHIFT #1,(ea) instruction would turn it into SHIFT #n,(ea). Other instructions could be extended like this as well... Just a thought.