Hi Bernd, nice to see you.
bernd afa wrote:
| That sounds good. It's also possible to change GCC to avoid slow instructions.
|
Yes, this is true. But this should not be needed, as the 68050 does not really have slow instructions. :-)

Maybe my previous post was a bit misleading? Let me try to clarify: the design of the 68050 is optimized so that EVERY one of the address modes listed above is free, and every "normal" instruction needs only 1 clock.

Example:
    add.l 64(PC,D1*8),D0
= 1 clock total; the address mode is free. For comparison, the Motorola 68040 needs 5 clocks for this instruction.

If you compare the 68040 with the 68050: the 68040 has many instructions which take 1 clock, which is fast. The 68040 also has many instructions which need 2 - 6 clocks. The majority of these instructions now take only 1 clock on the 68050. So the 68050 is designed to be a lot faster clock for clock. Of course the 68050 can also be clocked a lot higher.

The only address modes which are slow on the 68050 are the memory indirect modes, which do not use the brief extension word format. But these modes were always very slow, and IMHO they were very rarely used anyway. I'm not sure if GCC uses them at all.

Code generation for the 68050 should be easier than for any previous 68K CPU. What we could do is remove a few of the performance hacks which are in GCC, as they are unneeded now. For example: on the 68050, MUL is as fast as it can get. On older 68K CPUs like the 68040 it makes sense to replace MUL #3,D0 with something like:
    move.l D0,D1
    add.l D0,D0
    add.l D1,D0
The 68k backend of GCC has a few of these hacks. But none of them is needed for the 68050 any more.

The timing calculation for the 68050 is very simple:
- Every normal instruction needs 1 clock.
- Instructions that do 2 memory accesses to different locations need 2 clocks (e.g. move mem,mem).
- There are a few instructions which need 2 clocks (e.g. PEA).
- Divide needs more than 1 clock of course. :-) The timing for div is currently not final; our goal is to be in the range of 10 clocks.
- There are a few instructions which we regard as obsolete, e.g. TAS and the BCD instructions. The timing of these is not settled at the moment.

bernd afa wrote:
| I think an FPU is very important. A stripped-down FPU that supports only the Coldfire addressing modes is maybe easier to add, is useful to start with, and could maybe be added to the GCC 68k backend.
|
I agree... My opinion on the 68K FPU is the following: I think the way 68K FPUs are designed, it is difficult to improve their speed hugely. Therefore, if we really need huge FPU performance, the best way is to go for a new, separate SIMD-FPU.

For running legacy FPU code, executing it with the integer unit is IMHO good enough. If we provide a slightly HW-accelerated integer emulation of the FPU instructions, then this is equivalent to the "microcoded" 68881 FPU design. So basically what we get can/will be about the same as before.

I think the question boils down to what we want.
A) If we need a solution to process huge amounts of float matrix code, e.g. for a 3D game, then a dedicated SIMD-FPU unit is the way to go. Such a unit will do the job 10 times faster than the fastest 68060 ever did.
B) If we just have to execute some light 68K-FPU workload, typical for a normal C program that mixes integer with some float calculations, then a mix between microcode and SW emulation should be fully fast enough.

What is your opinion on this? Do you see any other requirement? The question simply is: is there code that uses huge amounts of float and is performance critical? This would be code where we need a lot more power than the 68040 had, and which can NOT be off-loaded to a dedicated unit or re-compiled to go through the integer pipe.

The beauty is that our design is FPGA based. If we ever need a real, fully pipelined HW-FPU, we could still work on this later.

bernd afa wrote:
| If your CPU in FPGA is ready, there is also a good chance that there is a market outside the Amiga. FPGAs are mass-produced, so they are cheap. There are still many who use m68k for embedded and would like more speed. HP, for example, has licensed the Coldfire V5 for many printers. I don't see the Coldfire V5 on the Freescale page, but HP uses the Coldfire V5 at over 500 MHz.
|
I know the Coldfire V5, as I took part in the V5 developer program. The V5 is quite a beast: there were V5s with 2nd level cache, and they outran the smaller PowerPCs used in the EFIKA and SAM. As long as our core is inside an FPGA, we play in a league below the V5.

The target for the 68050 in a lower cost FPGA is 133 MHz. The 050 should perform a lot better than a 68040 would at 133 MHz, simply because of: big cache, faster instructions (fewer clocks), faster memory. The 68050 will be very nice for an AMIGA. The 68050 will also outrun a Coldfire V5 on 68k legacy code, on which the Coldfire would need to use emulation.

bernd afa wrote:
| If your CPU is cheaper for HP, why should HP not use your CPU design?
|
Yes .... but ... Actually the Coldfires are quite cheap if you buy them in volume as HP did. Of course, if you "baked" an ASIC of the 68050, then in volume the chip would be very cheap too, and in an ASIC we could reach clock rates comparable to the V5. But baking ASICs does cost serious money....

bernd afa wrote:
| I don't know how many HP must buy to use the Coldfire technology.
|
The V5 is only available to big customers which buy chips that are exclusively produced for them.

bernd afa wrote:
| Most important for a CPU is that it is cheap and fast. And if you can add to the FPGA core the most needed stuff such as DCT, YUV, H264 deblocking or whatever else is needed to decode Blu-ray and DVD fast, then there is no need for a high clock rate.
|
I fully agree. I think the 68050 will be cheaper for us than an external CPU would be. Compared to legacy 68K CPUs, or Coldfires in 68K emulation mode, the 050 is fast. :-) Playing Blu-ray is maybe a bit high a target; playing regular DVD is reasonable to achieve IMHO.

Regarding our Milestone timeline: IMHO we are about halfway through the testcases for Milestone 1. Both Jens and I still have bugs to fix for the first release. If no other issues show up, Milestone 1 is very close. It's difficult to say if Milestone 1 will be in 1 or 2 weeks or in 3-4. Also, we of course do not work on this full time but only in our spare time.

I have to say I'm very surprised how quickly our development has gone so far. The CPU is fully "architectured"; we are more or less only implementing already designed stuff now, or fixing small issues. Maybe it was luck, or maybe it's because Jens is a pro, but I had expected that, for a weekend project, designing a top notch CISC CPU would have taken us much longer.

Cheers
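
P.S. The timing rules earlier in this post can be sketched in a few lines of Python. This is only an illustration under assumptions: the category names, the CLOCKS table and both helper functions are made up for this sketch, the classification of each instruction is my own reading of the rules, and the divide figure is just the "range of 10 clocks" goal, not a final number.

```python
# Hypothetical timing sketch for the 68050 rules listed in this post.
# All names here are invented for illustration; nothing is taken from
# real 68050 documentation or from the GCC 68k backend.

CLOCKS = {
    "normal": 1,      # every normal instruction, address mode included
    "mem_to_mem": 2,  # two memory accesses to different locations
    "two_clock": 2,   # the few 2-clock instructions, e.g. PEA
    "divide": 10,     # NOT final; only the stated goal "range of 10"
}


def estimate_clocks(sequence):
    """Sum the clock estimate for a list of (mnemonic, category) pairs."""
    return sum(CLOCKS[category] for _, category in sequence)


def times3_shift_add(d0):
    """Model the move/add/add replacement for a multiply by 3 (32-bit)."""
    mask = 0xFFFFFFFF
    d1 = d0 & mask          # move.l D0,D1
    d0 = (d0 + d0) & mask   # add.l  D0,D0
    d0 = (d0 + d1) & mask   # add.l  D1,D0
    return d0


# The three-instruction replacement for a multiply by 3.
add_sequence = [
    ("move.l D0,D1", "normal"),
    ("add.l D0,D0", "normal"),
    ("add.l D1,D0", "normal"),
]

# The indexed add from the example: the address mode costs nothing extra.
indexed_add = [("add.l 64(PC,D1*8),D0", "normal")]

if __name__ == "__main__":
    print("replacement sequence:", estimate_clocks(add_sequence), "clocks")
    print("indexed add:", estimate_clocks(indexed_add), "clock")
    print("7 * 3 via adds:", times3_shift_add(7))
```

Under these rules the three-instruction replacement adds up to 3 clocks, which is the number a plain MUL would have to beat for the hack to be worth dropping.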
|