| 68060 + 68050/70 N Core |
|
|---|
|
|---|
Cesare Di Mauro Italy
| | Posts 526 04 Sep 2010 20:07
| Gunnar von Boehn wrote:
| | The FPGA design gives us immense power and incredible flexibility. Problems that are impossible to solve in systems built from 'stock-parts' will suddenly be just weekend tasks to solve in an FPGA design. |
It works only if you treat the FPGA as a SoC. You have a single chip, so you can address problems in simpler ways because you have it ALL under your control. A multichip Natami is a different beast and will share the SAME problems as SMP systems. So keep in mind that your solution is just a hack, something that works only because you have a single-chip system.
| |
Claudio Wieland Germany
| | (Natami Team) Posts 703 04 Sep 2010 20:24
| You just stated what Natami is: a SOC. An N-core Natami is still a SOC. So your objection does not make sense, because we will 100% NOT make a multi-FPGA Natami. Cheers
| |
Cesare Di Mauro Italy
| | Posts 526 05 Sep 2010 05:55
| You are talking about now, but FPGAs are limited, and if you want to scale beyond (after the first Natami version was released) you need to think about multichip systems.
| |
Claudio Wieland Germany
| | (Natami Team) Posts 703 05 Sep 2010 06:38
| We have now reached a point in FPGA development where the exponential growth of their capabilities and size is already much faster than what we will be able to put into them. Using the words "multichip" and "FPGA" together defeats the purpose of using FPGAs at all, because their strength and main purpose is to put into one single chip what took multiple chips before -> SOC. The FPGAs we are using have to route and connect logic in "2D", because they are essentially flat. The inevitable, more advanced stacked "3D" chips (virtual or real) will lead not just to squared growth, but to *cubic* growth of size and capabilities. Thus, your worries are unfounded. Please understand this :) Cheers
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 05 Sep 2010 06:40
| Cesare Di Mauro wrote:
| You are talking about now, but FPGAs are limited, and if you want to scale beyond (after the first Natami version was released) you need to think about multichip systems.
|
| No, the opposite is true. FPGAs are huge! FPGA size doubles every 1.5 years. If you take one big FPGA today, you can already put 32 CPU cores in it. The fact that FPGAs are getting bigger all the time is the reason why we are talking about MAID-CPU-Cores: FPGAs will soon be so big that we will need to add a couple of extra CPU cores to fill them. Let me give you some examples. There are 3 main groups of FPGAs:
* The entry-level families are called Cyclone and Spartan. These FPGA chips range from 10,000 to 150,000 LE.
* The mid-range family is called Arria, and Xilinx will bring out a competitor in this range too. These FPGA chips range from 30,000 to 300,000 LE.
* The top-range families are Stratix and Virtex. These FPGAs range up to 2,000,000 LE.
Now, for comparison: a complete TG68 CPU has a size of less than 4000 LE. The 68050 has a size of about 8000 LE. The logic element size of the FPGA is getting smaller each generation. Today the entry-level FPGAs are produced in 60nm, and the mid-range ones in 40nm. FPGAs in 28nm are already announced. According to the FPGA vendors' roadmaps, 20nm FPGAs will follow quite soon. What does this mean? As of today, you need one of the bigger FPGAs of the entry range to fit the SAGA chipset, the 3D core and the CPU in it. If you take the biggest FPGA of the entry range, you could as of today add up to 4 MAID cores. A big FPGA of the high-end range is big enough for 30 MAID cores. In three years the smallest available FPGA will probably have enough room for the whole NATAMI, and a slightly bigger FPGA will be big enough to hold a couple of MAID cores. Is this clear now?
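The LE figures above can be turned into a quick back-of-the-envelope check. The Python sketch below uses only the numbers quoted in the post and is deliberately naive: it treats each core as a fixed LE cost and ignores the SAGA chipset, the 3D core, memory blocks and routing overhead, so its results are upper bounds that sit well above the post's estimates of 4 and 30 MAID cores. The names `naive_core_count` and `projected_le` are purely illustrative.

```python
# Rough logic-element (LE) budget arithmetic using the figures quoted above.
# Assumption: each CPU core costs a fixed number of LE and nothing else
# (chipset, 3D core, routing slack) competes for space, so these are
# naive upper bounds, not placement estimates.

CORE_LE = {"TG68": 4_000, "68050": 8_000}  # approximate sizes from the post

FPGA_MAX_LE = {
    "entry (Cyclone/Spartan)": 150_000,
    "mid (Arria)": 300_000,
    "high end (Stratix/Virtex)": 2_000_000,
}


def naive_core_count(fpga_le: int, core_le: int) -> int:
    """How many whole cores fit if the FPGA held nothing but cores."""
    return fpga_le // core_le


def projected_le(le_today: int, years: float, doubling_years: float = 1.5) -> int:
    """Capacity after `years`, assuming size doubles every `doubling_years`."""
    return int(le_today * 2 ** (years / doubling_years))


if __name__ == "__main__":
    for family, le in FPGA_MAX_LE.items():
        cores = naive_core_count(le, CORE_LE["68050"])
        print(f"{family}: {le:,} LE -> at most {cores} 68050-sized cores")
    # Three years at one doubling per 1.5 years is two doublings (4x).
    print(projected_le(150_000, 3))
```

The gap between the naive bound (250 68050-sized cores on a 2,000,000 LE part) and the post's estimate reflects everything the sketch leaves out.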
| |
Cesare Di Mauro Italy
| | Posts 526 05 Sep 2010 08:09
| I know how FPGAs are evolving, so let me be clearer. Think about a "Robin" coprocessor with 32 512-bit registers configurable as 16 FP32 / Int32 or 8 FP64 / Int64 elements to do some SIMD-like work, with FMAs (which are better than FMADDs), and 4 threads. Think about an updated 3D core with DirectX 11 / OpenGL 4.1 / OpenCL 1.1-like capabilities (unified shaders). These are state-of-the-art technologies that I think Natami could integrate too. Question: how much of this is possible on current high-end FPGAs, and on next-generation 20nm ones?
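The register configuration in this proposal is simple arithmetic: the lane count is the register width divided by the element width, so 16 lanes implies 32-bit elements and 8 lanes implies 64-bit elements. The sketch below works that out and also estimates raw architectural register storage. It assumes, hypothetically, that each of the 4 threads gets its own full copy of the 32 registers; "Robin" is only a proposal in this thread, so nothing here describes an actual design.

```python
# Lane arithmetic for a hypothetical 512-bit SIMD register file
# ("Robin" as proposed in the post; nothing here is an actual design).

REG_BITS = 512
NUM_REGS = 32


def lanes(element_bits: int, reg_bits: int = REG_BITS) -> int:
    """Number of elements of `element_bits` that fit in one register."""
    if reg_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return reg_bits // element_bits


def register_file_bytes(num_regs: int = NUM_REGS, reg_bits: int = REG_BITS,
                        threads: int = 1) -> int:
    """Architectural register storage, assuming one full copy per thread."""
    return num_regs * reg_bits // 8 * threads


if __name__ == "__main__":
    for name, bits in [("FP32/Int32", 32), ("FP64/Int64", 64), ("Int16", 16)]:
        print(f"{name}: {lanes(bits)} lanes")
    print(register_file_bytes(), register_file_bytes(threads=4))
```

Under these assumptions one register set is 2 KiB and four per-thread copies are 8 KiB, before counting any rename or pipeline storage.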
| |
Claudio Wieland Germany
| | (Natami Team) Posts 703 05 Sep 2010 08:32
| I don't think it is possible to give any meaningful answer to your question. Natami is an evolving system, and this evolution takes time. Making reliable projections about what Natami will be several years from now is just not possible. However, we know some things for sure:
o) Natami is an evolving and adapting system
o) FPGA technology gets exponentially more powerful over time
So we will proceed slowly but steadily and try to give Natami the best and most powerful capabilities that fit into an FPGA. Everything else is pure conjecture. Cheers
| |
Richard GATINEAU France
| | Posts 107 05 Sep 2010 08:53
| Are FPGAs in the same family pin compatible? If so, perhaps it would be interesting to put the FPGA in a socket so it could be upgraded?
| |
Claudio Wieland Germany
| | (Natami Team) Posts 703 05 Sep 2010 09:12
| I fear this is technically not possible. The FPGAs needed for Natami have 700+ pins in (fine) ball grid array packages (BGA). Those are intended for being precision soldered onto a PCB.
| |
Marcel Verdaasdonk Netherlands
| | Posts 3974 05 Sep 2010 09:31
| Richard Maudsley wrote:
|
Marcel Verdaasdonk wrote:
| Richard that is what drivers are for. |
Marcel how can a driver allow SMP?
|
SMP isn't the goal, IMHO; it's a bridge too far (Operation Market Garden).
| |
Richard GATINEAU France
| | Posts 107 05 Sep 2010 09:56
| So, with a daughterboard? That's what Natami is doing with the 68060 CPU, if I'm not mistaken?
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 05 Sep 2010 10:21
| Richard GATINEAU wrote:
| So, with a daughterboard? |
FPGA family members are in general only partially pin compatible. A socket is not a feasible solution for two reasons: A) A socket reduces signal quality, which means the system clock rate and performance would suffer. B) A good socket for such an FPGA is very expensive and would make the product significantly more expensive. Different FPGA families, or different generations of the same family, are also not electrically compatible. What would be possible is producing different batches of NATAMI with compatible FPGAs. E.g., one could solder 100 units with a 40K FPGA and 100 units with a 60K FPGA. But this would divide the user community and lead to unnecessary issues: the bigger NATAMI types could host more CPU units, and people would write software that requires them. Dividing the community is not what we want. We want a stable platform which is 100% compatible. Therefore our plan is to produce only one type of NATAMI for a very long time. All those NATAMI units will be compatible with each other. Only after a couple of years do we want to bring out a new NATAMI model, and then we will make a bigger technology jump. This means people buying the NATAMI this year should have fun with their board for the next 2-3 years. :-D
| |
Richard GATINEAU France
| | Posts 107 05 Sep 2010 11:44
| Thank you very much for your answer. :)
| |
Børge Nøst Norway
| | Posts 53 17 Jul 2011 23:12
| Just a tech question: Is the design pipelined? (I kinda expect so.) If so, have you thought about stealing some ideas from Intel and the P4? Instead of storing the "raw" bits from memory in the instruction cache, could it be possible to "shorten" the pipeline depth by having one of the pipeline steps in-between the memory interface and the cache by doing a part of the decoding and storing a decoded format in the instruction cache? The memory used by the instruction cache would probably grow a good deal compared to the usual way.
| |
Megol .
| | Posts 671 18 Jul 2011 12:07
| Børge Nøst wrote:
| Just a tech question: Is the design pipelined? (I kinda expect so.) If so, have you thought about stealing some ideas from Intel and the P4? Instead of storing the "raw" bits from memory in the instruction cache, could it be possible to "shorten" the pipeline depth by having one of the pipeline steps in-between the memory interface and the cache by doing a part of the decoding and storing a decoded format in the instruction cache? The memory used by the instruction cache would probably grow a good deal compared to the usual way.
|
It's pipelined and designed to support RMW instructions without splitting them. IIRC, the design uses a predecoded instruction cache; going all-out with a trace cache isn't really beneficial for narrow scalar/superscalar designs, IMHO.
| |
Samuel D Crow USA
| | (Natami Team) Posts 1295 18 Jul 2011 14:53
| Keep in mind, when "stealing ideas", that we are subject to patent laws just like everyone else. The project is a "go" as long as we are stealing ideas from old, expired Amiga patents. Anything newer than that is a "no go".
| |
Amiga Blitter Italy
| | Posts 34 19 Jul 2011 13:26
| Cesare Di Mauro wrote:
| I know how FPGAs are evolving, so let me be clearer. Think about a "Robin" coprocessor with 32 512-bit registers configurable as 16 FP32 / Int32 or 8 FP64 / Int64 elements to do some SIMD-like work, with FMAs (which are better than FMADDs), and 4 threads. Think about an updated 3D core with DirectX 11 / OpenGL 4.1 / OpenCL 1.1-like capabilities (unified shaders). These are state-of-the-art technologies that I think Natami could integrate too. Question: how much of this is possible on current high-end FPGAs, and on next-generation 20nm ones?
|
Go search YouTube for "fpga raytracing". Four Virtex-5 FPGAs are needed for real-time raytracing at 640x480.
| |
Børge Nøst Norway
| | Posts 53 19 Jul 2011 17:14
| Megol . wrote:
| | The design is IIRC using a predecoded instruction cache |
Can you elaborate a bit more what you mean with that?
| |
Deep Sub Micron Germany
| | (MX-Board Owner) Posts 566 19 Jul 2011 21:01
| Børge Nøst wrote:
|
Megol . wrote:
| | The design is IIRC using a predecoded instruction cache |
Can you elaborate a bit more what you mean with that?
|
There is a little predecoder (or rather 4 of them) that calculates some hint bits for every 16-bit word put into the instruction cache. These bits are used to calculate the next address to fetch opcodes from. The rest of the opcode is stored as is. The idea of making an instruction cache that contains some kind of translated micro-opcodes is more difficult. One problem is mapping an instruction address to a micro-op instruction address. Another problem is that an opcode with an extension word can later be interpreted differently. That happens when the program flow somehow branches to an extension word and treats it as an instruction. So no, it is not a trace cache, which is what would be needed to handle cases like that.
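A toy Python model may help picture the hint bits described above. It is a sketch under stated assumptions: the post does not say what the hint bits actually encode, so this version simply stores, next to each raw 16-bit word, the length of an instruction that would start at that word. Only a handful of real 68k opcodes are decoded; everything else is counted as one word. Because a hint exists for every word, a branch that lands on an extension word still finds a usable hint. The function names are hypothetical.

```python
# Toy model of an instruction cache that keeps each raw 16-bit word and
# attaches a predecoded hint to it on fill. Hypothetical: the post does not
# describe the real hint-bit format, so here the hint is simply the length
# (in words) of an instruction starting at that word.


def insn_words(op: int) -> int:
    """Length in 16-bit words for a handful of 68k opcodes (toy subset)."""
    if op in (0x4EB9, 0x4EF9):        # JSR (xxx).L / JMP (xxx).L
        return 3
    if op == 0x203C:                  # MOVE.L #imm,D0
        return 3
    if (op & 0xFFF8) == 0x0C80:       # CMPI.L #imm,Dn
        return 3
    if (op & 0xF000) == 0x6000:       # Bcc / BRA / BSR
        disp = op & 0x00FF
        if disp == 0x00:
            return 2                  # 16-bit displacement word follows
        if disp == 0xFF:
            return 3                  # 32-bit displacement (68020+)
        return 1                      # 8-bit displacement inside the opcode
    return 1                          # NOP, RTS, MOVEQ, ... (simplified)


def fill_line(words: list[int]) -> list[tuple[int, int]]:
    """Cache fill: store every raw word together with its length hint."""
    return [(w, insn_words(w)) for w in words]


def walk(line: list[tuple[int, int]], start: int) -> list[int]:
    """Instruction start indices reached by following hints from `start`."""
    starts, pc = [], start
    while pc < len(line):
        starts.append(pc)
        pc += line[pc][1]             # next fetch address = pc + hinted length
    return starts


if __name__ == "__main__":
    # CMPI.L #$70014E71,D0 followed by RTS. Words 1-2 are the immediate, but
    # they also decode as MOVEQ #1,D0 and NOP if a branch lands on word 1.
    line = fill_line([0x0C80, 0x7001, 0x4E71, 0x4E75])
    print(walk(line, 0))
    print(walk(line, 1))
```

Entering at word 0 visits starts 0 and 3; entering at word 1 visits 1, 2 and 3, using hints that were computed when the line was filled.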
| |
Børge Nøst Norway
| | Posts 53 20 Jul 2011 17:01
| deep sub micron wrote:
| There is a little predecoder (or rather 4 of them) that calculates some hint bits for every 16-bit word put into the instruction cache. These bits are used to calculate the next address to fetch opcodes from. The rest of the opcode is stored as is.
|
That is halfway to what I had in mind!
deep sub micron wrote:
| Another problem is that an opcode with an extension word can later be interpreted differently.
|
I know, which is why I said the memory requirement would grow. You would need to store a decoded instruction for every 16 bits, even if it is part of a previous instruction. (SAS/C does this code trick with branch/cmpi.l.)
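The memory growth described here can be put in rough numbers. The sketch below compares a plain cache, a cache with a few predecode hint bits per word, and a cache holding a fixed-width decoded entry for every 16-bit word position (since any word might become an instruction start). The 8 KiB cache size, 4 hint bits and 64-bit decoded entry width are all hypothetical values chosen for illustration; none of them comes from the thread.

```python
# Storage comparison for an instruction cache of hypothetical size.
# "Predecoded": raw 16-bit word plus a few hint bits per word.
# "Decoded": a fixed-width decoded entry stored for every 16-bit word,
# since any word might become an instruction start. Widths are assumptions.


def predecoded_bits(cache_bytes: int, hint_bits: int) -> int:
    """Raw words plus `hint_bits` of predecode information per word."""
    words = cache_bytes // 2
    return words * (16 + hint_bits)


def decoded_bits(cache_bytes: int, entry_bits: int) -> int:
    """One decoded entry of `entry_bits` for every 16-bit word position."""
    words = cache_bytes // 2
    return words * entry_bits


if __name__ == "__main__":
    raw = predecoded_bits(8192, 0)
    for label, bits in [("raw only", raw),
                        ("raw + 4 hint bits", predecoded_bits(8192, 4)),
                        ("64-bit decoded entries", decoded_bits(8192, 64))]:
        print(f"{label}: {bits // 8} bytes ({bits / raw:.2f}x)")
```

With these assumed widths the hint bits cost 25% extra storage, while a decoded entry per word quadruples it; the ratio scales linearly with whatever entry width a real design would need.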
| |
|