| 68060 + 68050/70 N Core |
SID Hervé France
| | Posts 666 31 Aug 2010 18:50
| Are you going to include new 68k instructions, and can you advise one or more methods to exploit parallelism and make it easy to use in a multitasking environment? Thanks.
| |
Marcel Verdaasdonk Netherlands
| | Posts 3991 31 Aug 2010 21:16
| There is also a PDF listing the existing instructions.
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 01 Sep 2010 06:49
| SID Hervé wrote:
| Are you going to include new 68k instructions ... |
So far we have come up with the following enhancements:
1) Many instructions that previously could only operate on DATA registers can now also operate on ADDRESS registers.
2) The CPU can now also write to PC-relative addresses. This makes the 68K easier to program, as all addressing modes now behave the same, and it allows GCC to use PC-relative addressing modes properly.
3) DBRA.L updates the whole longword of the count register. This allows writing generic code with INT loop counters.
4) Extended range of BRA.B, BSR.B and BCC.B. These instructions can now jump twice as far. This allows more compact code, as the .W version is often not needed.
5) FF1 (find first 1). This is a ColdFire instruction.
6) MVZ and MVS: move with zero extend and move with sign extend. These are ColdFire instructions.
7) BYTEREV and BITREV. Useful instructions, and not only for endian conversion. These are ColdFire instructions.
8) For the 68070 the following pipeline enhancement is planned: two consecutive instructions, where the first is a MOVE or MOVEQ and both use the same destination, will be merged and executed as one instruction. Example: MOVE.L D0,D7 followed by ADD.L D1,D7. Another example: MOVEQ #-3,D2 followed by AND.L D0,D2.
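The ColdFire-derived instructions in items 5 to 7 are easiest to pin down with a reference model. The C sketch below follows the ColdFire definitions as I understand them (FF1 counts the offset from bit 31 and returns 32 for a zero operand). The MVS/MVZ helper names and the fusion check for item 8 are hypothetical illustrations, not the actual core's decoder; the fusion check only covers the two examples given (MOVE/MOVEQ followed by ADD or AND to the same data register) and simply declines to fuse when the second instruction also reads the shared destination.

```c
#include <stdbool.h>
#include <stdint.h>

/* FF1 Dx: offset of the first set bit, scanning from bit 31 down to bit 0.
   A zero operand yields 32. */
uint32_t ff1(uint32_t d)
{
    uint32_t n = 0;
    while (n < 32 && !(d & (0x80000000u >> n)))
        n++;
    return n;
}

/* BYTEREV Dx: reverse the byte order (big-endian <-> little-endian). */
uint32_t byterev(uint32_t d)
{
    return (d >> 24) | ((d >> 8) & 0x0000FF00u) |
           ((d << 8) & 0x00FF0000u) | (d << 24);
}

/* BITREV Dx: reverse the bit order, so bit 0 becomes bit 31. */
uint32_t bitrev(uint32_t d)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (d & 1u);   /* shift the lowest source bit into r */
        d >>= 1;
    }
    return r;
}

/* MVS / MVZ: move a byte or word into a long register with sign or zero extension. */
uint32_t mvs_b(uint8_t b)  { return (uint32_t)(int32_t)(int8_t)b; }
uint32_t mvs_w(uint16_t w) { return (uint32_t)(int32_t)(int16_t)w; }
uint32_t mvz_b(uint8_t b)  { return b; }
uint32_t mvz_w(uint16_t w) { return w; }

/* Item 8, hypothetical decoder model: fuse MOVE/MOVEQ + ALU op into dst = a OP b. */
typedef enum { OP_MOVE, OP_MOVEQ, OP_ADD, OP_AND, OP_OTHER } Op;

typedef struct {
    Op  op;
    int src;   /* source data register number, or the immediate for MOVEQ */
    int dst;   /* destination data register number */
} Insn;

typedef struct {
    Op   op;       /* the ALU operation of the second instruction */
    int  a;        /* first operand: register number or immediate */
    bool a_is_imm; /* true when a came from MOVEQ */
    int  b;        /* second operand register number */
    int  dst;
} FusedOp;

bool try_fuse(Insn first, Insn second, FusedOp *out)
{
    if (first.op != OP_MOVE && first.op != OP_MOVEQ)
        return false;
    if (second.op != OP_ADD && second.op != OP_AND)
        return false;
    if (first.dst != second.dst || second.src == first.dst)
        return false;   /* must share the destination; skip e.g. ADD.L D7,D7 */
    out->op = second.op;
    out->a = first.src;
    out->a_is_imm = (first.op == OP_MOVEQ);
    out->b = second.src;
    out->dst = second.dst;
    return true;
}
```

With this model, MOVE.L D0,D7 plus ADD.L D1,D7 becomes a single three-operand D7 = D0 + D1, which is the effect item 8 describes.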
| |
Marcel Verdaasdonk Netherlands
| | Posts 3991 01 Sep 2010 07:17
| Gunnar, what about these instructions? CMP2.B, CMP2.W, CMP2.L, CHK2.B, CHK2.W, CHK2.L, RTM, CALLM, CAS2.B, CAS2.W, CAS2.L, MULU 32*32=>64, MULS 32*32=>64, DIVU 64*32=>32:32, DIVS 64*32=>32:32, BRA.B2, BSR.B2, BCC.B2. These are as listed in the decoder document; I only listed instructions that had not been flagged as available in previous iterations of the design.
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 01 Sep 2010 07:20
| The following other enhancements were discussed and are on the to-do list for the future:
1) Doubling the FPU registers to 16. Having 16 FPU registers greatly improves performance in typical 3D MATRIX operations.
2) Enhancing the FPU to generally support 3-operand instructions. This makes many FMOVE instructions in work loops unnecessary and allows more compact code.
3) Super-pipelining the FPU to triple throughput.
4) Maybe adding FMADD and FMSUB instructions to the FPU.
5) Maybe adding a lower-precision FSQRT instruction to the FPU to improve performance.
6) Parallel execution of the DIV instruction. DIV is the slowest integer instruction; this enhancement could allow the CPU to execute it in parallel with other instructions.
7) Parallel execution of MOVE instructions that miss the cache. This would allow algorithms to continue on cache misses and could be used to write much faster algorithms.
8) Major enhancements to the MOVE16 instruction, allowing memory throughput to be doubled.
9) Maybe adding an extension word which would allow doubling the integer registers and extending almost all instructions to a 3-operand form. This enhancement has to be discussed more.
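Items 1, 2 and 4 are easiest to see in the inner loop of a 3D transform. In this C sketch (the Mat4/Vec4 types are illustrative, not from the thread), each output component is one multiply followed by three multiply-add steps. With the two-operand 68881/68882-style FPU, each step typically needs an FMOVE into a scratch register followed by FMUL and FADD, because FMUL overwrites its destination; a 3-operand form removes the FMOVE, a fused FMADD would make the whole step one instruction, and more FPU registers let more of the matrix and the accumulators stay in registers while many vertices are transformed. C99's fma() in <math.h> would express the fused step explicitly; the plain form is used here so that no particular compiler contraction is assumed.

```c
/* Row-major 4x4 matrix and 4-component vector (illustrative types). */
typedef struct { double m[4][4]; } Mat4;
typedef struct { double v[4]; } Vec4;

/* y = M * x. Each row is one multiply followed by three multiply-add steps;
   each "acc = acc + a * b" line is a candidate for a single FMADD. */
Vec4 mat4_mul_vec4(const Mat4 *M, Vec4 x)
{
    Vec4 y;
    for (int r = 0; r < 4; r++) {
        double acc = M->m[r][0] * x.v[0];
        acc = acc + M->m[r][1] * x.v[1];
        acc = acc + M->m[r][2] * x.v[2];
        acc = acc + M->m[r][3] * x.v[3];
        y.v[r] = acc;
    }
    return y;
}
```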
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 01 Sep 2010 07:27
| Marcel Verdaasdonk wrote:
| Gunnar, what about these instructions? |
RTM CALLM
These are useless. These instructions were "accidentally" added to the 68020 and removed in every later 68K CPU. Like the 030/040/060, we are not going to support them. No compiler uses these instructions.
CMP2.B CMP2.W CMP2.L CHK2.B CHK2.W CHK2.L CAS2.B CAS2.W CAS2.L
These instructions were considered obsolete by Motorola and removed in the 68060 and later CPUs. Like the 68060, we are currently not planning to support them directly in HW. We might support them with millicode or a library.
MULU 32*32=>64 MULS 32*32=>64 DIVU 64*32=>32:32 DIVS 64*32=>32:32
These instructions are very good and we are supporting them directly in HW.
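For reference, this is what the 64-bit forms compute, modelled in C (the 68060 itself traps on these and leaves them to its ISP emulation). MULU.L <ea>,Dh:Dl multiplies Dl by the source and leaves the 64-bit product split across Dh (high) and Dl (low); DIVU.L <ea>,Dr:Dq divides the 64-bit value Dr:Dq by the source, leaving the quotient in Dq and the remainder in Dr. If the quotient does not fit in 32 bits the CPU sets the V flag and leaves the operands unchanged, and a zero divisor raises a divide-by-zero exception. The DivResult enum and function names are hypothetical stand-ins for those flags and the trap; the signed MULS/DIVS variants follow the same pattern with signed types.

```c
#include <stdint.h>

/* MULU.L <ea>,Dh:Dl  :  Dl * src -> Dh:Dl (unsigned 32x32 -> 64). */
void mulu_l_64(uint32_t src, uint32_t *dh, uint32_t *dl)
{
    uint64_t p = (uint64_t)src * (uint64_t)*dl;
    *dh = (uint32_t)(p >> 32);   /* high longword */
    *dl = (uint32_t)p;           /* low longword */
}

/* Hypothetical stand-in for the V flag and the divide-by-zero trap. */
typedef enum { DIV_OK, DIV_OVERFLOW, DIV_BY_ZERO } DivResult;

/* DIVU.L <ea>,Dr:Dq  :  Dr:Dq / src -> remainder in Dr, quotient in Dq. */
DivResult divu_l_64(uint32_t src, uint32_t *dr, uint32_t *dq)
{
    if (src == 0)
        return DIV_BY_ZERO;                    /* the CPU takes a trap here */
    uint64_t dividend = ((uint64_t)*dr << 32) | *dq;
    uint64_t q = dividend / src;
    if (q > UINT32_MAX)
        return DIV_OVERFLOW;                   /* V set, Dr and Dq untouched */
    *dr = (uint32_t)(dividend % src);
    *dq = (uint32_t)q;
    return DIV_OK;
}
```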
BRA.B2 BSR.B2 BCC.B2
These variations are new and are our own invention. Cheers
| |
Thomas Richter Germany
| | (MX-Board Owner) Posts 1425 01 Sep 2010 07:31
| Gunnar, if you plan to add multiprocessing to the Amiga, i.e. run the 060 and 050 in parallel, CAS and TAS would be extremely important for synchronizing the CPUs. It also means that your bus must support read-modify-write cycles and keep the caches of the CPUs coherent, at least for these cycles. So long, Thomas
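A C11 model of the two instructions Thomas mentions, assuming the hardware makes each one a single indivisible read-modify-write as seen by all CPUs (which is exactly the bus and cache-coherence requirement he raises). CAS Dc,Du,<ea> stores Du only if <ea> still equals Dc, and otherwise loads the current value into Dc; atomic_compare_exchange_strong has the same shape. TAS sets bit 7 of a byte and reports its old state in the N flag, which is enough for a simple spinlock. The function names are illustrative only.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* CAS.L Dc,Du,<ea>: if *ea == Dc then *ea = Du (Z set), else Dc = *ea (Z clear).
   atomic_compare_exchange_strong writes the current value into *dc on failure,
   which matches CAS reloading Dc. (Illustrative name.) */
bool cas_l(_Atomic uint32_t *ea, uint32_t *dc, uint32_t du)
{
    return atomic_compare_exchange_strong(ea, dc, du);
}

/* TAS <ea>: atomically set bit 7 of the byte and return its old value;
   (old & 0x80) is what the N flag reports. */
uint8_t tas_b(_Atomic uint8_t *ea)
{
    return atomic_fetch_or(ea, (uint8_t)0x80);
}

/* Typical 68k idiom:  .wait: TAS lock / BMI .wait */
void spin_lock(_Atomic uint8_t *lock)
{
    while (tas_b(lock) & 0x80)
        ;   /* another CPU owns the lock: try again */
}

void spin_unlock(_Atomic uint8_t *lock)
{
    atomic_store(lock, 0);
}

/* Lock-free add built on CAS: retry until no other CPU changed the value in between. */
void cas_add(_Atomic uint32_t *ea, uint32_t n)
{
    uint32_t old = atomic_load(ea);
    while (!cas_l(ea, &old, old + n))
        ;   /* old now holds the fresh value */
}
```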
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 01 Sep 2010 07:52
| Thomas Richter wrote:
| Gunnar, if you plan to add multiprocessing to the Amiga, i.e. run the 060 and 050 in parallel, CAS and TAS would be extremely important for synchronizing the CPUs. |
Yes, CAS and TAS are useful for this. And like the 68060, I would leave out CAS2.
Thomas Richter wrote:
| It also means that your bus must support read-modify-write cycles and keep the caches of the CPUs coherent at least for these cycles.
|
This is actually the tricky part, and we have to think about it. As you know, the original AMIGA chipset did not support this. A read-modify-write cycle somewhat contradicts both the concept of a DMA system and the concept of a pipelined memory controller. I wonder what the most sensible way to solve this is.
| |
Thomas Richter Germany
| | (MX-Board Owner) Posts 1425 01 Sep 2010 08:30
| Gunnar von Boehn wrote:
|
Thomas Richter wrote:
| Gunnar, if you plan to add multiprocessing to the Amiga, i.e. run the 060 and 050 in parallel, CAS and TAS would be extremely important for synchronizing the CPUs. |
Yes, CAS and TAS are useful for this. And like the 68060, I would leave out CAS2.
|
Indeed, the 060 unfortunately doesn't have CAS2, even though it would be pretty d*mn useful for fully synchronized exec lists. Too bad. The 060 ISP has an emulation for it, but it requires explicit bus locking (which the 060 can control from software), and the emulation is so slow that it is highly undesirable for the purpose at hand, namely fast non-blocking multiprocessing. )-:
Gunnar von Boehn wrote:
| This is actually the tricky part, and we have to think about it. As you know, the original AMIGA chipset did not support this.
|
Yes, I know. But I am afraid that for multiprocessing, at least for efficient multiprocessing, there is no other good way.
Gunnar von Boehn wrote:
| A read-modify-write cycle somewhat contradicts both the concept of a DMA system and the concept of a pipelined memory controller.
|
Yup. I know... )-: No good idea here yet either. So long, Thomas
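Why CAS2 matters for exec lists: unlinking a node from a doubly linked list (exec's MinNode has mln_Succ and mln_Pred) means changing two pointers in two different nodes, and CAS2 can compare and update both in one step. The sketch below emulates CAS2 by taking a global lock around both compares and stores, roughly in the spirit of the 060 ISP's bus-locking emulation, with a C11 atomic_flag standing in for the locked bus. The Node type and function names are hypothetical, and this shows the two-word update only; it is not a complete lock-free list algorithm.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node, shaped like exec's MinNode: successor and predecessor. */
typedef struct Node { struct Node *succ, *pred; } Node;

static atomic_flag bus_lock = ATOMIC_FLAG_INIT;   /* stands in for a locked bus */

/* CAS2: if (*p1 == *c1 && *p2 == *c2) { *p1 = u1; *p2 = u2; return true; }
   otherwise load both current values into *c1 and *c2 and return false. */
bool cas2_ptr(Node **p1, Node **p2, Node **c1, Node **c2, Node *u1, Node *u2)
{
    bool ok;
    while (atomic_flag_test_and_set(&bus_lock))
        ;                                          /* acquire the "bus" */
    if (*p1 == *c1 && *p2 == *c2) {
        *p1 = u1;
        *p2 = u2;
        ok = true;
    } else {
        *c1 = *p1;
        *c2 = *p2;
        ok = false;
    }
    atomic_flag_clear(&bus_lock);                  /* release the "bus" */
    return ok;
}

/* Unlink n: succeeds only if both neighbours still point at n.
   Assumes n->pred and n->succ are valid nodes (exec lists use header sentinels). */
bool unlink_node(Node *n)
{
    Node *c1 = n, *c2 = n;
    return cas2_ptr(&n->pred->succ, &n->succ->pred, &c1, &c2, n->succ, n->pred);
}
```

A native CAS2 would do the same compare-both, write-both step in one bus transaction instead of going through a software lock, which is why the slow emulation hurts.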
| |
Deep Sub Micron Germany
| | (MX-Board Owner) Posts 567 01 Sep 2010 09:54
| Thomas Richter wrote:
| Gunnar von Boehn wrote:
| A read-modify-write cycle somewhat contradicts both the concept of a DMA system and the concept of a pipelined memory controller. |
Yup. I know... )-: No good idea here yet either.
|
In a cache-coherent system it could perhaps be done with some kind of cache-line protection. The processor doing the read-modify-write snoops the bus, and if another processor accesses the same cache line as the read, it adds wait cycles to that access. This protection stays active until the write has happened. So the bus is not blocked, except for this single cache line.
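A small C model of this idea, assuming 16-byte cache lines (the 68040/68060 line size) and one reservation per core; the names and the two-core setup are hypothetical. rmw_read() opens the protection window on a line, must_wait() is the snoop check another core's access goes through, and rmw_write() closes the window. Accesses to other lines are never delayed, which is the point of the proposal: only one cache line is blocked, not the whole bus.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 16u   /* assumption: 16-byte cache lines */
#define NUM_CORES  2     /* hypothetical two-core system */

typedef struct {
    bool     active;     /* is a read-modify-write in progress? */
    uint32_t line;       /* which cache line it protects */
} Reservation;

static Reservation res[NUM_CORES];

static uint32_t line_of(uint32_t addr) { return addr / LINE_BYTES; }

/* Read half of a read-modify-write: start protecting the line. */
void rmw_read(int core, uint32_t addr)
{
    res[core].active = true;
    res[core].line = line_of(addr);
}

/* Write half: the protection ends once the write has happened. */
void rmw_write(int core)
{
    res[core].active = false;
}

/* Snoop check: does an access by 'core' to addr need wait cycles? */
bool must_wait(int core, uint32_t addr)
{
    for (int other = 0; other < NUM_CORES; other++)
        if (other != core && res[other].active && res[other].line == line_of(addr))
            return true;
    return false;
}
```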
| |
SID Hervé France
| | Posts 666 01 Sep 2010 11:59
| I still think I misstated my question. Historically, multitasking runs on one CPU with one core, but the arrival of the 68050 will complicate things a bit. Suppose I consider the FPGA as a hardware driver that behaves as follows:
A) For any Amiga program:
a) by default, the FPGA acts as a 680x0 with one core enabled (mainly for the OS);
b) each new program is automatically executed on a vacant core.
B) Any NatAmi program must request one or more additional cores from the FPGA.
C) For any program compatible with both Amiga and NatAmi, the solution is based on the previous proposals.
For multitasking and its context switching, I assume that saving the processor (FPGA) state (PC, registers...) involves every core. It seems so simple that I prefer to ask for advice before proceeding.
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 01 Sep 2010 12:25
| To be honest, I'm not sure what has to be done to make AMIGA OS SMP-compliant. A "simple" solution that is proven to work is to dispatch dedicated jobs to a slave core, as was done on the PowerUp cards. I think one of the challenges in making AMIGA OS work with SMP is that internal structures are sometimes protected by using FORBID and PERMIT. In an FPGA-based system like the NATAMI, one could solve this by simply freezing all cores except the one calling FORBID. Then FORBID would be SMP-safe. This could be done in the FPGA in a backward-compatible way, without needing to patch the OS. Maybe there are other things that need to be handled. Maybe someone else with full knowledge of the OS can chime in here? Thomas, what else would be needed?
| |
SID Hervé France
| | Posts 666 01 Sep 2010 12:57
| What I understood: the OS is not affected by the internals of the FPGA. You have inserted an interface in front of the cores which, by default (i.e. in the absence of an explicit request), sends the code to one core. Is that right?
| |
SID Hervé France
| | Posts 666 01 Sep 2010 14:18
| Gunnar von Boehn wrote:
| I think one of the challenges in making AMIGA OS work with SMP is that internal structures are sometimes protected by using FORBID and PERMIT. In an FPGA-based system like the NATAMI, one could solve this by simply freezing all cores except the one calling FORBID. Then FORBID would be SMP-safe. This could be done in the FPGA in a backward-compatible way, without needing to patch the OS.
|
A radical solution, but it would not touch the OS (at least at first... Thomas?). And instead of freezing the other cores, saving their state would be more appropriate. That way, the routine between FORBID and PERMIT could have all the power available. Perhaps hard-wired logic would be more appropriate.
| |
Thomas Richter Germany
| | (MX-Board Owner) Posts 1425 01 Sep 2010 16:17
| SID Hervé wrote:
|
Gunnar von Boehn wrote:
| I think one of the challenges in making AMIGA OS work with SMP is that internal structures are sometimes protected by using FORBID and PERMIT. In an FPGA-based system like the NATAMI, one could solve this by simply freezing all cores except the one calling FORBID. Then FORBID would be SMP-safe. This could be done in the FPGA in a backward-compatible way, without needing to patch the OS. |
A radical solution, but it would not touch the OS (at least at first... Thomas?). And instead of freezing the other cores, saving their state would be more appropriate. That way, the routine between FORBID and PERMIT could have all the power available. Perhaps hard-wired logic would be more appropriate.
|
| The problem with Forbid() is that it is not a function. It is a macro (addq.b #1,TDNestCount(a6)). Thus, even if you patched over Forbid() with massively magic code, the problem wouldn't go away, because a good deal of OS functions do not even call it to implement a Forbid(). They just increment a counter in SysBase. Basically, it boils down to this: only a single core can use AmigaOS, and that is it.
Second problem: Interrupts. Who is handling an interrupt, please? Nothing is worse than having two cores trying to serve interrupts on related resources. For example, which core resets the Paula interrupt request word?
Alternative solution one: Write a new OS, emulate AmigaOS within a virtual machine, and run two instances. No, wait, UAE already exists. Too bad.
Alternative solution two: Live with the restriction that any secondary core has to stay away from the system libraries (possible exception: math stuff that doesn't call anything else), and provide a very simple library or device interface that allows offloading purely computational tasks to a second core. Fire off the core, let it run, collect results later: SendIO() with io->io_Request = CMD_COMPUTE. (-;
Greetings, Thomas
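A user-space sketch of alternative two, with a POSIX thread standing in for the second core. The request block, the CMD_COMPUTE value and the function names are hypothetical: they only mimic the SendIO()/WaitIO() pattern and are not exec's API. The key restriction is visible in the kernel: it only reads its input buffer and writes its own result field, and never calls into the OS.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_COMPUTE 100   /* hypothetical command number */

/* Hypothetical request block, loosely modelled on an exec IORequest. */
typedef struct {
    int             command;
    uint64_t      (*kernel)(const uint32_t *data, size_t n);   /* pure code, no OS calls */
    const uint32_t *data;
    size_t          length;
    uint64_t        result;
    pthread_t       core;                                       /* the "second core" */
} ComputeRequest;

static void *slave_core(void *arg)
{
    ComputeRequest *io = arg;
    io->result = io->kernel(io->data, io->length);   /* touches only its own buffers */
    return NULL;
}

/* Like SendIO(): start the job and return immediately. Returns 0 on success. */
int send_compute(ComputeRequest *io)
{
    io->command = CMD_COMPUTE;
    return pthread_create(&io->core, NULL, slave_core, io);
}

/* Like WaitIO(): block until the job is done; the result is then valid. */
int wait_compute(ComputeRequest *io)
{
    return pthread_join(io->core, NULL);
}

/* Example kernel: a purely computational task. */
uint64_t sum_kernel(const uint32_t *data, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += data[i];
    return s;
}
```

The main core stays free between send_compute() and wait_compute(), which is the "fire off the core, let it run, collect results later" pattern.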
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 01 Sep 2010 16:35
| Thomas Richter wrote:
| The problem with Forbid() is that it is not a function. It is a macro (addq.b #1,TDNestCount(a6)). Thus, even if you patched over Forbid() with massively magic code, the problem wouldn't go away, because a good deal of OS functions do not even call it to implement a Forbid(). |
I would not solve this in software but in HW. The beauty of having our design in an FPGA is that we can easily add special features to solve issues like this. I think there are several simple ways to solve this: we can implement this one memory address as an internal FPGA register with some logic behind it, so that a write to it freezes the other cores. This could be done in HW in different ways:
1) By using the snoop logic that we want to give the softcores.
2) As a special feature of the memory bus.
3) As a special register inside the CPU which is mirrored.
I think this could be solved relatively easily one way or another. We only need a magic register which stores the memory address of the semaphore; once "armed", this will work.
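One way to picture the "armed" magic register, as a hypothetical C model: the OS (or a boot patch) arms it with the address of the Forbid nest counter in SysBase, and every byte write the snoop logic sees to that address is checked. Exec keeps that counter at -1 while task switching is enabled, so the first Forbid() increment makes it 0; in this model any value of 0 or more freezes every core except the writer, and dropping back to -1 releases them. A real design would also have to make the addq.b itself indivisible, since two cores could otherwise read -1 at the same moment.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CORES 4   /* hypothetical core count */

/* Hypothetical snoop register state. */
typedef struct {
    bool     armed;               /* has the watch address been programmed? */
    uint32_t watch_addr;          /* address of the Forbid nest counter */
    bool     frozen[NUM_CORES];   /* cores currently held off the bus */
} ForbidSnoop;

void snoop_arm(ForbidSnoop *s, uint32_t addr)
{
    s->armed = true;
    s->watch_addr = addr;
}

/* Called for every byte write seen on the bus. */
void snoop_write(ForbidSnoop *s, int core, uint32_t addr, int8_t new_value)
{
    if (!s->armed || addr != s->watch_addr)
        return;
    bool forbidden = new_value >= 0;        /* -1 = task switching enabled */
    for (int c = 0; c < NUM_CORES; c++)
        s->frozen[c] = forbidden && c != core;
}
```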
Thomas Richter wrote:
| Basically, it boils down to this: only a single core can use AmigaOS, and that is it. |
If we only bought CPUs from Moto, yes. But as we have full control of the HW, we can try to work around this. What do you think?
Thomas Richter wrote:
| Second problem: Interrupts. Who is handling an interrupt, please? |
The first CPU core, and only the first.
Thomas Richter wrote:
| Alternative solution two: Live with the restriction that any secondary core has to stay away from the system libraries (possible exception: math stuff that doesn't call anything else), and provide a very simple library or device interface that allows offloading purely computational tasks to a second core. Fire off the core, let it run, collect results later: SendIO() with io->io_Request = CMD_COMPUTE. (-; |
Yes, I agree that this is a way forward. That is basically what I had in mind for a simple solution. Do you see any stumbling blocks other than FORBID?
| |
SID Hervé France
| | Posts 666 01 Sep 2010 17:27
| Third proposal: as suggested by Gunnar, a hardware solution with the help of Thomas (thank you). Every application (including the OS) is executed by one core. The use of additional cores is left to the discretion of the application, not the OS (this will be fun). For the critical cases Thomas reported earlier (thanks again), an interface, layer or other hardware (call it what you want) is developed accordingly. Too easy... What is (are) the problem(s)? (I'll make coffee, who wants coffee?)
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 01 Sep 2010 17:31
| SID Hervé wrote:
| Third proposal: as suggested by Gunnar, a hardware solution with the help of Thomas (thank you). Every application (including the OS) is executed by one core. The use of additional cores is left to the discretion of the application, not the OS (this will be fun). For the critical cases Thomas reported earlier (thanks again), an interface, layer or other hardware (call it what you want) is developed accordingly. Too easy... What is (are) the problem(s)?
|
Can you rephrase your post, please? I do not understand what you are asking.
| |
SID Hervé France
| | Posts 666 01 Sep 2010 17:43
| Gunnar von Boehn wrote:
| Can you rephrase your post, please? I do not understand what you are asking.
|
What I am trying to say is that how the hardware works should not affect the OS. Anyway, a kind of filter (placed between the OS and the 68k) should manage critical situations. Thanks.
| |
Marcel Verdaasdonk Netherlands
| | Posts 3991 01 Sep 2010 18:23
| n cores would add n interrupts, which would halt the main CPU several times. Then another question: which one would have priority over the others? The slave idea is bad. If you used a clone core for loop and repetitive-task acceleration, you would cause less trouble (provided it is independent). IMHO it is more important to actually finish the 68050 core and then start work on the 68070, as this would be a clean and clear solution.
| |
|