
Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.



Do you have ideas and feature wishes? Post them here and discuss your ideas.

New 68k ISA
SID Hervé
France

Posts 663
20 Jun 2012 16:33


Matt Hey wrote:

SID Hervé wrote:

 
Marcel Verdaasdonk wrote:

  A clean way of adding extra instructions is to see them as coprocessor instructions IMHO.
 

  I'm not sure that's a good idea because I remember that this way has a significant impact (it was a topic dealing with extension)
 

MOVE16 is implemented a lot like a separate coprocessor so maybe that makes it ok?

I do not know. Does this imply an interrupt?

Marcel Verdaasdonk
Netherlands

Posts 3976
20 Jun 2012 17:56


SID, does this generate an interrupt on a 68060 with onboard FPU?

Matt Hey
USA

Posts 734
21 Jun 2012 13:00


SID Hervé wrote:

Matt Hey wrote:

  MOVE16 is implemented a lot like a separate coprocessor so maybe that makes it ok?
 

  I do not know. Does this imply an interrupt?

No. There is no interrupt for MOVE16. It works much like the FPU or MMU coprocessors, where the instruction fetch and pipeline are shared with the main processor but at a certain point the work is handed over to the coprocessor to finish. The main processor may or may not be able to do some work in parallel before the coprocessor returns with a result. A coprocessor in this sense is not much different from an execution/functional unit (i.e. integer unit, branch unit). I don't know the exact technical difference, but a coprocessor can consist of several execution units. This method of starting and stopping the coprocessor is simpler and faster for short memory copies. Using the blitter would become much more interesting for large memory copies if it could also access fast memory, as recently discussed in another thread you started ;).


Marcel Verdaasdonk
Netherlands

Posts 3976
22 Jun 2012 11:05


Matt, do you think you can enforce in the ISA that FPU and SIMD style instructions should use the coprocessor extension? (Keeps the instruction space clean.)
If the CPU is designed in a manner that handles them externally, they can be outside the main pipeline and thus asynchronous to the CPU itself. (Read: DMA.)

Keeping as much as possible in the 16 bit instruction space has an advantage for fetching those instructions.

Note: The CpID of 110 and 111 should remain open for user-defined coprocessors.



Matt Hey
USA

Posts 734
23 Jun 2012 10:13


Marcel Verdaasdonk wrote:

Matt, do you think you can enforce in the ISA that FPU and SIMD style instructions should use the coprocessor extension? (Keeps the instruction space clean.)

I hope enforcement is possible. I would like to see agreement among the leaders who help create the ISA and the first implementations; followers will then want to be compatible. As for the FPU, it's pretty much already set as 68060 FPU compatible with coprocessor ID 001. I think it would be good to add 1/2 IEEE FP and at least unsigned longword <-> FP but, besides that, the 68060 FPU is pretty good. I could document the FP instructions for the ISA if or when someone becomes interested :/. More advanced single precision FP processing could eventually be done in a SIMD. The first SIMD would probably be much simpler (integer only and small sizes). I don't know enough about SIMD to design a SIMD coprocessor for the ISA. A SIMD unit would be tricky to specify because the design changes as the unit gets bigger. A small SIMD doing byte color component calculations is very different from a big SIMD doing 3D vectors with FP. Even the instructions used change. They are more specialized than the CPU or FPU. It might be best not to include a SIMD specification in the ISA at first.

Marcel Verdaasdonk wrote:

  If the CPU is designed in a manner that handles them externally, they can be outside the main pipeline and thus asynchronous to the CPU itself. (Read: DMA.)

That is true. The main CPU pipeline is shared at first but then the coprocessor instructions can execute asynchronously (in parallel). There can be a change/use delay when results are brought back to the CPU pipeline in some cases.

Marcel Verdaasdonk wrote:

  Keeping as much as possible in the 16 bit instruction space has an advantage for fetching those instructions.

I think it's more about having 16 bit encodings for the commonly used instructions. There is not enough encoding space to have 16 bit instructions in all cases.

Marcel Verdaasdonk wrote:

  Note: The CpID of 110 and 111 should remain open for user-defined coprocessors.

000 MMU
001 FPU
011 68040-68060 MOVE16 instruction
100 CPU32 table instructions
110 ???
111 ???

Why should those coprocessor codes be reserved?


Marcel Verdaasdonk
Netherlands

Posts 3976
24 Jun 2012 02:06


CpID 000 should raise an error when the type field has a non-zero value. So :S
  001 is to be considered the FPU by default; this is the only CpID set in stone. (Legacy compilers.)
  Besides that, 110 and 111 should be user defined and stay out of the defined space, which means the CPU ISA should make no assumptions about what those CpIDs are used for. (Software could define them as something specific, but that should be left to whoever implements the ISA.)
  All the ISA should define is what happens when a CpID is given and what happens when a specific type is given. (A universal coprocessor interface, so to say.)
 
  On the 68K one could use a 74LS138 to decode the FC0, FC1, FC2 lines external to the CPU, so the ISA should make no direct assumptions about what is where, apart from what is legacy.
  We could make a list and advise using it, but if the implementer uses external F-lines plus a decoder, all bets are off on which unit sits on which CpID.
  Motorola was smart to put it the way they did, which gives us 001 to 101 for our use. (5 slots)
  We can define them all now and regret it later. What I am saying is that for now we don't care which CpID it is; however, each coprocessor is expected to behave in a defined way when we give a certain value in the type field.
  000 in the type field means a general instruction, for instance.
  So we should focus more on what is in the type field and what the coprocessor's response should be.
  Motorola was on the right track, but if we wanted to implement a SIMD, for instance, we would see some shortcomings in the current scheme.

And Matt, you're right about SIMD, but this should not be visible in the ISA itself; it should be left to the implementer of that unit. All we should give them is an interface to exchange information between the coprocessor and the CPU efficiently, without too much hassle and overhead on both ends. (This is the tricky part.)

To sum up, IMHO the CpID is something the ISA designer should give very little thought to.
Because there are too many possible coprocessors, we should not care what is on which CpID; that is a worry for whoever writes the software. (I am sure this comment is going to byte me in the future.)
What we should care about more is that all coprocessors use the same uniform interface to communicate with the CPU, no matter what they actually do.

Marcel Verdaasdonk
Netherlands

Posts 3976
24 Jun 2012 18:52


Okay, I am gonna expand on what I said this morning. I have slept on it and I know why we should keep the last two coprocessor slots open for user-defined purposes.
  I shall list some of the possible coprocessor options.
  DSP
  FPU
  MMU
  SIMD
  Instruction Set Extension.
  I am sure I could go on and on, but this also means that there are more possible coprocessors than there are slots.
 
  What should be done is to give a clear protocol for how to communicate.
  This would, for instance, make it possible to interface with the Amiga chipset as a coprocessor in a later revision, if implemented on both ends. (Entertain yourselves with this idea. :P)

The CpID is not a coprocessor ID, it's an address. We should use identifier keywords to know which specific coprocessor is located in which slot. (This should be done by software.)

Matt Hey
USA

Posts 734
25 Jun 2012 14:00


@Marcel Verdaasdonk
I think reserving the upper CpIDs makes sense. User coprocessors should start allocating CpIDs from the top (%111) down. I don't know that dynamically allocating a CpID or CpID type could work. I'll have to think about it some more. The idea of interfacing with the custom chips is interesting. It could work for the blitter maybe. Doing any kind of wait or interrupt would be tricky though.
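
To make the "CpID as an address" idea concrete, here is a minimal sketch that combines the two suggestions above: user coprocessors take CpIDs from the top (%111) down, and software finds a unit by identifier keyword rather than by a fixed CpID. The slot table, the keywords and both function names are hypothetical; nothing like this is defined in the ISA.

```python
# Hypothetical slot table: CpID -> identifier keyword (names are illustrative only).
LEGACY_SLOTS = {0b000: "MMU", 0b001: "FPU"}
USER_CPIDS = (0b111, 0b110)  # user-defined slots, tried from the top down


def allocate_user_cpid(slots, keyword):
    """Bind keyword to the highest free user CpID; return it, or None if full."""
    for cpid in USER_CPIDS:
        if cpid not in slots:
            slots[cpid] = keyword
            return cpid
    return None


def find_cpid(slots, keyword):
    """Look a coprocessor up by identifier keyword instead of by fixed CpID."""
    for cpid, ident in slots.items():
        if ident == keyword:
            return cpid
    return None
```

With only two user slots the table fills up quickly, which is the same "more coprocessors than slots" problem raised earlier in the thread.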

I don't think a separate DSP coprocessor would be necessary. It's easier to add DSP like functionality into the integer (main) processor. We tried to do that with instructions like ABS, PERM, SBcc, SELcc, SATS, SATU, MULADD, MULSUB. These are often used for codecs and gfx/sound processing as well which most of us like ;). An SIMD would add even more capabilities in this realm.

I think a way to query processor (including coprocessor) capabilities should probably be defined in the ISA. Most modern processors have this now. Perhaps it should be accessible (read only) without going into supervisor mode. I'm open to suggestions as I have not planned such an implementation.


Marcel Verdaasdonk
Netherlands

Posts 3976
25 Jun 2012 16:48


Matt, there are too many coprocessor options to fit in the coding scheme, hence me advocating a good, universally defined coprocessor interface that all coprocessors could follow. Then it wouldn't matter what their CpID is, since it is just an address.

You could add an identifier field to the general instruction type.

F-line
  1111

CpID
  000 - MMU, no external cycle!!!
  001 - Assemblers would use 001 for the FPU by default; could be an internal or external cycle
  010 to 101 - free to assign
  110 and 111 - user defined

Instruction Type
  000 - Coprocessor General Instruction
  Coprocessor Conditional Instructions, handled by the CPU most of the time:
  010 - Branch on Coprocessor Condition Instruction cpBcc
  001 - Set on Coprocessor Condition Instruction cpScc, cpDBcc, cpTRAPcc
  100 - Coprocessor Context Save Instruction Format cpSAVE
  101 - Coprocessor Context Restore Instruction Format cpRESTORE

Type Dependent
  XXXXXX - Depends on the instruction type

There is still plenty of room, yet I wonder what 011 in the type field would mean.
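
The field layout above can be sketched as a small decoder. This assumes the usual 68020-style bit positions (bits 15-12 = 1111, bits 11-9 = CpID, bits 8-6 = type, bits 5-0 = type dependent); the function names and the `TYPE_NAMES` table are made up for illustration.

```python
# Type field values as listed in the post above; anything else is "unlisted".
TYPE_NAMES = {
    0b000: "general",
    0b001: "cpScc/cpDBcc/cpTRAPcc",
    0b010: "cpBcc",
    0b100: "cpSAVE",
    0b101: "cpRESTORE",
}


def decode_fline(opword):
    """Return (cpid, type, type_dependent), or None if not an F-line opword."""
    if (opword >> 12) & 0xF != 0xF:
        return None
    cpid = (opword >> 9) & 0x7        # bits 11-9
    itype = (opword >> 6) & 0x7       # bits 8-6
    dependent = opword & 0x3F         # bits 5-0
    return cpid, itype, dependent


def describe(opword):
    """Human-readable summary of the F-line fields of a 16 bit opword."""
    fields = decode_fline(opword)
    if fields is None:
        return "not F-line"
    cpid, itype, dependent = fields
    name = TYPE_NAMES.get(itype, "unlisted")
    return f"CpID={cpid:03b} type={itype:03b} ({name}) dependent={dependent:06b}"
```

For example, `describe(0xF200)` reports CpID 001 with a general instruction type. On the open question of 011: as far as I remember the 68020 documentation, bit 6 of cpBcc selects the displacement size, which would make 011 the long-displacement form of cpBcc, but that should be checked against the manual.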

Ceti 331
United Kingdom

Posts 282
27 Jun 2012 15:40


Has 64bit been discussed?
I would imagine that's beyond the scope of FPGA.

I remember you guys talking about DSP but then 68k helper cores... what's the thinking on SIMD... perhaps that is also beyond the scope of FPGA?

Megol .

Posts 676
27 Jun 2012 17:13


ceti 331 wrote:

Has 64bit been discussed?
  I would imagine that's beyond the scope of FPGA.
 
  I remember you guys talking about DSP but then 68k helper cores... what's the thinking on SIMD... perhaps that is also beyond the scope of FPGA?

Most 64 bit operations are fast enough even for an FPGA softcore. Shifts, multiplication and division require more effort though. The real problem is making 32 bit operations as fast as possible...

Marcel Verdaasdonk
Netherlands

Posts 3976
01 Jul 2012 06:50


Hm, Matt, can you also release a stripped-down version of the ISA with the bare minimum of instructions, so that an implementer can use a simplified decoder?

Matt Hey
USA

Posts 734
01 Jul 2012 18:54


@Ceti 331
  Megol is right about 64 bit. Operations where each bit can be computed independently in parallel, like logic operations, work great. They are 2x as fast as 32 bits using 2x the logic elements. Shifts, multiplication, division and muxes get expensive very fast. I don't think it's a good idea to do 64 bit in the main integer CPU. It would not be as compatible, as easy to encode or have as good code density, and 64 bit is not commonly used except for addressing, which would cause the biggest problem. Maybe it would be possible to have a 64 bit supervisor mode and give each task its own 32 bit address space, but that would require an MMU (for mapping in resources like libraries), a lot of planning and some big changes to the OS.
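
A small sketch of the difference between operations that split cleanly into independent halves and ones that do not. Each 64 bit value is modelled as a (high, low) pair of 32 bit halves. The add is included only as the simplest example of a dependency between the halves (the carry, much like an ADD followed by ADDX on the 68k); the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def and64(a_hi, a_lo, b_hi, b_lo):
    """Logic op: the two 32 bit halves never interact, so they can run in parallel."""
    return a_hi & b_hi, a_lo & b_lo


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add: the high half has to wait for the carry out of the low half."""
    lo = a_lo + b_lo
    carry = lo >> 32                      # 0 or 1
    hi = (a_hi + b_hi + carry) & MASK32   # carry out of bit 63 is dropped
    return hi, lo & MASK32
```

Shifts and multiplies are worse than the add shown here, because every result bit can depend on many input bits from both halves.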
 
  Gunnar and Rune Stensland worked on a SIMD coprocessor using 64 bit registers and allowing only 8 and 16 bit integer sizes. These are the sizes that would be fast and not too big. Some operations that need a lot of logic were also limited. A SIMD is more likely to use muxes. Rune could probably tell you more about the limitations of a SIMD in an FPGA. Note that there are generally no condition codes or branches, unlike other coprocessors. So much for Marcel's coprocessor standardization ;).
 
 
Marcel Verdaasdonk wrote:

  Hm, Matt, can you also release a stripped-down version of the ISA with the bare minimum of instructions, so that an implementer can use a simplified decoder?
 

 
  That is the eventual plan with the 68kF1 ISA, but which instructions should be left out? I was hoping that someone developing a simple core (i.e. TG68) would provide some input on what they want defined and what is simple enough to implement. Do you have ideas on what should be in a simplified ISA, or a specific implementation that should be targeted? Should we try to contact Tobias Gubener (TG68) and/or Jens Schönfeld (Clone-A)? Then again, we haven't received any input from anyone involved with the N68050 on the Natami Team :/.


Marcel Verdaasdonk
Netherlands

Posts 3976
02 Jul 2012 17:22


About the F-line codes, I only really stated what was already there.
Without standardization, a lot of the CpID addresses will get wasted.
With a standard this would be less of a problem, because the interface between the units is described and software can figure out what each unit actually is. (An ID field of 3 bits is too short for all the modern options available.)

Hm, let's use a knife on the 68K addressing modes.
Make BCD optional.
And remove the TAS and CAS instructions.
And maybe remove the F-line codes too.



Matt Hey
USA

Posts 734
03 Jul 2012 00:52


Marcel Verdaasdonk wrote:

  Hm, let's use a knife on the 68K addressing modes.

Pre and post indexed memory indirect could be removed from the ISA but the encoding reserved for compatibility. They could be trapped or implemented in hardware only where 68k compatibility is needed. For the complexity, they rarely save cycles or code size. They are useful when low on registers, especially address registers. Many of the ISA changes allow for better register utilization already. Allowing a single prefix to add 3 op instructions similar to Megol's idea could also save registers. On the other hand, the 68060 did implement these address modes but maybe because Unix/Linux used them so much. It's really easy to delete the lines for these address modes in the documentation but not as easy to add them back. All the other addressing modes I think are useful enough. Addressing modes are very powerful as they make most of the instructions more powerful. I don't think we need to "cut" much here. We do need to be careful that new addressing modes don't slow down the effective address calculation.

Marcel Verdaasdonk wrote:

  Make BCD optional.

That's a good point. The BCD instructions are pretty weak and need not be in hardware. Again, the encoding space could be reserved for compatibility whether trapped or implemented in hardware.

Marcel Verdaasdonk wrote:

  And remove the TAS and CAS instructions.

I didn't document either of these instructions in the ISA but it would be good to reserve CAS.L, CAS2.L, and TAS for future use. They could be made optional as well.

Marcel Verdaasdonk wrote:

  And maybe remove the F-line codes too.

I didn't document anything in F-line or A-line. It could be argued that something like MOVE16 should be up to the implementation and its requirements.

I'm thinking we should limit address register destinations to a longword size. This would simplify address register forwarding (no combining the old register with new) and effective address calculation without hurting the code density much with our word extended to long immediates. This would remove EXT.W An (extend byte to a word) at least.
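
A minimal sketch of what "address register destinations are always longword" would mean in practice, assuming word sources are sign-extended to 32 bits as MOVEA.W already does on the 68k. The register file is just a Python list and the function names are illustrative.

```python
def sign_extend_word(value16):
    """Sign-extend a 16 bit value to 32 bits."""
    value16 &= 0xFFFF
    return (value16 | 0xFFFF0000) if (value16 & 0x8000) else value16


def write_address_register(regs, n, value, size):
    """Write An so that the whole 32 bit register is always replaced.

    No merge with the old register contents is ever needed, which is the
    forwarding simplification argued for above.
    """
    if size == "w":
        regs[n] = sign_extend_word(value)
    elif size == "l":
        regs[n] = value & 0xFFFFFFFF
    else:
        raise ValueError("address register destinations are word or long only")
```

Writing the word $8000 to an address register that held $12345678 leaves $FFFF8000, with nothing of the old upper word surviving.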

There are several new instructions that need to be evaluated for usefulness. Some would not be common but are powerful or remove branches which is good.


Marcel Verdaasdonk
Netherlands

Posts 3976
03 Jul 2012 14:47


I was thinking more along the lines of Address Register Indirect with Index when I said to put a knife to the addressing modes.

Predecrement and postincrement have their uses, like user space stacks.
But basically what I mean is that some of these modes could stall a pipeline, which is a performance hit.

My personal goal for something like the 68kF1 would be speed and a lightweight implementation of the 68K ISA.

I am gonna make some foul proposals here.
Remove memory to memory operations in favor of memory to register or register to memory operations, to reduce bus load.

IMHO, MOVEP has reduced usage outside the 6800 compatibility area.
And I think I shall continue at a later time.



Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
03 Jul 2012 15:32


Marcel Verdaasdonk wrote:

  I am gonna make some foul proposals here.
  Remove memory to memory operations in favor of memory to register or register to memory operations, to reduce bus load.

The question here is: how much of the existing software base do you want to break? Is this for Natami? Otherwise, one could argue that Mot did just that for the ColdFire, though in a slightly different direction (Mot removed the .w and .b sizes, but I believe memory to memory move is still available).

Marcel Verdaasdonk wrote:

IMHO, MOVEP has reduced usage outside the 6800 compatibility area.

Some drivers used it, actually.

So long,
Thomas


Marcel Verdaasdonk
Netherlands

Posts 3976
03 Jul 2012 17:17


ThoR, you know me, I like playing devil's advocate.
Besides, I said reduced usage, I didn't say useless instruction. You know there is a subtle difference. ;)

Okay, let me explain my eagerness to disallow mem to mem instructions. (Move is the only thing I would allow in this form, to save registers.)
It is very simple: memory access latency is a nightmare!!!
I hope that clears that part up.
Seriously, a destination operand in memory requires a read and a write. Caches help alleviate this problem to some extent by hiding it.
This is not a real solution, thus I would rather see something cheaper on the gate count.

My personal goal for this would be speed and simplicity; if something offers neither, dump it.
And no, I am not advising this for use in the Natami. Hm, at best for a Maid core perhaps.

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
04 Jul 2012 09:32


Marcel Verdaasdonk wrote:

  Okay, let me explain my eagerness to disallow mem to mem instructions. (Move is the only thing I would allow in this form, to save registers.)
  It is very simple: memory access latency is a nightmare!!!

Well, I know. There is just another nightmare, namely supporting correct access error handling. A mem to mem move has two failure points, and you may not be able to recover without restarting the instruction completely, causing a double read from the source, which might or might not be sensible. In the same vein, double indirection modes are incredibly complex to handle and should then be removed as well. So should movem, because it *also* requires a restart and you do not know (at least for the 040 and 060) *which* of the register loads or saves failed.
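
The double read can be shown with a toy model. This is only a sketch under simple assumptions: the source is a FIFO-like device where every read has a side effect, the destination faults once until a handler "maps" it, and the instruction is restarted from scratch after the fault. All class and function names are made up for the illustration.

```python
class AccessFault(Exception):
    """Stands in for an access error raised by the destination write."""


class Fifo:
    """Source with a read side effect: every read consumes one value."""

    def __init__(self, values):
        self.values = list(values)
        self.reads = 0

    def read(self):
        self.reads += 1
        return self.values.pop(0)


class FaultOnceDest:
    """Destination that faults until the handler has mapped it."""

    def __init__(self):
        self.mapped = False
        self.value = None

    def write(self, value):
        if not self.mapped:
            raise AccessFault()
        self.value = value


def move_mem_to_mem(src, dst):
    """One mem to mem move: the source read happens before the destination write."""
    dst.write(src.read())


def run_with_restart(src, dst):
    """Execute the move; on a fault, fix it up and restart the whole instruction."""
    try:
        move_mem_to_mem(src, dst)
    except AccessFault:
        dst.mapped = True          # what a fault handler would do
        move_mem_to_mem(src, dst)  # full restart: the source is read again
```

With a source holding 1, 2, 3, the destination ends up with 2 and the FIFO has been read twice, so the first value is lost. For plain RAM the second read is harmless, which is the "might or might not be sensible" part.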

However, from a purely pragmatic point of view, this is just not possible. At least mem to mem and movem are too popular to be removed. And at this point I'm not entirely sure about double indirection. Though rarely used, I believe these modes *might* be used even down in the Mot FPSP and ISP code, so the system emulation itself might depend on them.

If the goal would be to build an entirely new system that does not need to depend on an existing software library, that might work. But then using any other microprocessor would also work, and there are a lot more powerful processors on the market for a cheaper price than an FPGA, so I probably don't see the point.

Greetings from Bilbao,

Thomas


Marcel Verdaasdonk
Netherlands

Posts 3976
04 Jul 2012 17:49


Thomas, what could be considered the bare minimum for 68k compatibility?

For the 68kF1 I would say: use the 68000 instructions and remove stuff like MOVEP, TAS, ABCD, NBCD, SBCD and then some more.
No new instructions; spartan, as the word implies.

This should be the simplest implementation of the 68K, just above ColdFire perhaps.

No coprocessor support, so there should be no cc codes, no F-line etc.
