Comparing the 68060 FPU to the 68050 FPU
Angel of Paradise Germany
| | Posts 61 29 Mar 2011 12:21
| Hi, you posted about the new 68050 FPU. How does it compare to the 68060 FPU? Does it include all instructions of the 68060? What precision does the 68050 FPU support? Thanks in advance.
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 29 Mar 2011 15:17
| Angel of Paradise wrote:
| Hi, you posted about the new 68050 FPU. How does it compare to the 68060 FPU?
|
Both are similar. First of all, the 68060 FPU is done, while the 68050 FPU is nearly done. It will certainly take a few more weeks to get it 100% finished.
| Angel of Paradise wrote:
| Does it include all instructions of the 68060?
|
Yes. In fact, we plan to include even the instructions that the 68060 only supported in software.
| Angel of Paradise wrote:
| What precision does the 68050 FPU support?
|
Currently the FPU supports full 80-bit precision. We are considering offering a "light" version with 64-bit precision. Today 64-bit FPU precision is common: PowerPC and many other systems have "only" 64-bit. This means that for "normal" software this precision is fully sufficient. The advantage of 64-bit precision would be the saving in chip real estate, which is at least interesting. I wonder what experienced FPU coders think about this. Feedback?
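To make the 80-bit versus 64-bit trade-off concrete: the "64-bit" option corresponds to an IEEE 754 double with a 53-bit significand, while the 68881/68882-style 80-bit extended format carries a 64-bit significand. The minimal Python sketch below (an illustration only, not part of the FPU design; the function names are hypothetical) rounds exact rational values to either width with round-half-to-even, ignoring exponent range and special values, so you can see which results survive in each format.

```python
from fractions import Fraction

# Significand widths, counting the leading bit:
#   IEEE 754 double (the "64-bit" option): 53 bits
#   68881/68882-style 80-bit extended:     64 bits (explicit integer bit)
DOUBLE_BITS = 53
EXTENDED_BITS = 64


def round_to_bits(x: Fraction, bits: int) -> Fraction:
    """Round a positive rational to `bits` significant binary digits using
    round-half-to-even. Exponent range, zero, NaN, etc. are ignored."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    scale = Fraction(2) ** (bits - 1 - e)
    scaled = x * scale  # lies in [2**(bits - 1), 2**bits)
    q, r = divmod(scaled.numerator, scaled.denominator)
    # Round to nearest; on an exact tie pick the even significand.
    if 2 * r > scaled.denominator or (2 * r == scaled.denominator and q % 2 == 1):
        q += 1
    return q / scale


def unit_roundoff(bits: int) -> Fraction:
    """Maximum relative error of round-to-nearest with `bits` bits: 2**-bits."""
    return Fraction(1, 2 ** bits)


big = Fraction(2 ** 53 + 1)
print("double keeps 2**53 + 1:  ", round_to_bits(big, DOUBLE_BITS) == big)
print("extended keeps 2**53 + 1:", round_to_bits(big, EXTENDED_BITS) == big)

third = Fraction(1, 3)
for name, bits in (("double", DOUBLE_BITS), ("extended", EXTENDED_BITS)):
    rel_err = abs(round_to_bits(third, bits) - third) / third
    print(f"{name}: relative error of 1/3 = {float(rel_err):.3e}")
```

In decimal terms, 53 bits is roughly 16 significant digits and 64 bits roughly 19; whether those extra three digits matter is exactly the question of "normal" versus numerically demanding software.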
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 31 Mar 2011 09:56
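| Looking at our current preliminary performance numbers, you can also start estimating a comparison with the very popular 68882 FPU:

| Instruction | 68882 (cycles) | 68050 (cycles) | 68050 latency (cycles) |
|---|---|---|---|
| FMOVE mem,reg | 40 | 1 | |
| FADD | 75 | 1 | 8 |
| FSUB | 75 | 1 | 8 |
| FMUL | 95 | 1 | 8 |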
Because different code has different sequential dependencies, the FPU performance ratio between a 68050 and a 68882 will vary. With fully sequential code, the 68050 is only about 10 times faster than a 68030 with a 68882. This means the 100 MHz 68050 then only reaches performance comparable to a 68882 clocked at 1 GHz. For code with few sequential dependencies the performance ratio goes much higher. The peak performance of the 68050 would be equal to a 68882 running at 13 GHz: as the 68050 can fuse an FMOVE and an FMUL into a single cycle, it can execute this combination in 1 cycle, while these two instructions need 135 cycles on the 68882. Interesting?
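The arithmetic behind these estimates can be redone from the cycle counts in the post. Below is a small Python sketch, assuming that the 68050 issues one instruction (or one fused FMOVE+FMUL pair) per cycle and that a 68882's speed scales linearly with its clock; the names are illustrative only.

```python
# Preliminary cycle counts quoted in the post (68882 vs. 68050).
M68882_CYCLES = {"FMOVE": 40, "FADD": 75, "FSUB": 75, "FMUL": 95}
M68050_LATENCY = {"FADD": 8, "FSUB": 8, "FMUL": 8}  # cycles until the result is usable
M68050_ISSUE = 1  # assumed: one instruction (or fused pair) issued per cycle
CLOCK_MHZ = 100   # 68050 clock used in the post


def sequential_ratio(op: str) -> float:
    """Fully dependent code: each instruction waits for the previous
    result, so the 68050 pays its full latency every time."""
    return M68882_CYCLES[op] / M68050_LATENCY[op]


def independent_ratio(op: str) -> float:
    """No dependencies: the pipelined 68050 issues one instruction per cycle."""
    return M68882_CYCLES[op] / M68050_ISSUE


def fused_pair_ratio(first: str, second: str) -> float:
    """Two instructions fused into one 68050 cycle vs. both on the 68882."""
    return (M68882_CYCLES[first] + M68882_CYCLES[second]) / M68050_ISSUE


def equivalent_68882_mhz(ratio: float) -> float:
    """Clock a 68882 would need to match a CLOCK_MHZ 68050, assuming
    its performance scales linearly with clock speed."""
    return ratio * CLOCK_MHZ


for op in ("FADD", "FMUL"):
    r = sequential_ratio(op)
    print(f"{op} sequential: {r:.1f}x, like a 68882 at {equivalent_68882_mhz(r):.0f} MHz")

r = fused_pair_ratio("FMOVE", "FMUL")
print(f"FMOVE+FMUL fused: {r:.0f}x, like a 68882 at {equivalent_68882_mhz(r):.0f} MHz")
```

The fully sequential ratios come out at 75/8 ≈ 9.4 for FADD and 95/8 ≈ 11.9 for FMUL, which is where "about 10 times" and roughly 1 GHz come from; the fused pair gives 135, i.e. 13.5 GHz at 100 MHz.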
| |
Przemek Tkaczyk Poland
| | Posts 54 31 Mar 2011 10:19
| O_o JUST WOW
| |
Sergio Gabbiani Italy
| | Posts 18 31 Mar 2011 10:40
| Woww... Great!! Simply great!! :)
| |
Wawa Tk Germany
| | Posts 581 31 Mar 2011 10:44
| A comparison with the FPU of the 040 or 060 might be more telling. External FPUs had enormous extra lag (something like 30 clocks, IIRC). Has that been taken into account?
| |
Megol .
| | Posts 671 31 Mar 2011 11:01
| wawa tk wrote:
| A comparison with the FPU of the 040 or 060 might be more telling. External FPUs had enormous extra lag (something like 30 clocks, IIRC). Has that been taken into account?
|
This^
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 31 Mar 2011 11:23
| wawa tk wrote:
| A comparison with the FPU of the 040 or 060 might be more telling. External FPUs had enormous extra lag (something like 30 clocks, IIRC). Has that been taken into account?
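All figures are in clock cycles:

| Instruction | 68040 | 68040 latency | 68050 | 68050 latency | 80486 |
|---|---|---|---|---|---|
| FMOVE mem,reg | 3 | | 1 | | 3 |
| FADD | 3 | 7 | 1 | 8 | 8-20 |
| FSUB | 3 | 7 | 1 | 8 | 8-20 |
| FMUL | 5 | 9 | 1 | 8 | 14 |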
As you can see, the latencies of the 68050 and the 68040 are quite similar. In maximum throughput, the 68050 can fuse two instructions into one cycle that take 8 cycles on the 68040 (FMOVE 3 + FMUL 5). With typical FPU code the 68050 could probably score in the range of a 68040 @ 300 MHz; at peak, a 68050 @ 100 MHz equals a 68040 @ 800 MHz. For typical matrix operations the 68050 could score roughly like a 1.5 GHz 80486. Not yet a PS3 killer, but not bad either. :-D
A proper test case like the Mandelbrot from SP or a matrix-multiply test case would IMHO make the most sense. This will give realistic numbers. Cheers
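As an illustration of the kind of small matrix-multiply test case suggested here, a hedged Python sketch (the real test would be compiled 68k code; Python only shows the structure). It counts floating-point operations and contrasts a single running sum, whose FADDs form one dependent chain and are therefore latency-bound, with a dot product split over several independent accumulators. The default of 8 lanes is an assumption derived from the preliminary 8-cycle latency and one-per-cycle issue rate (latency × issue rate ≈ number of independent chains needed to keep the adder busy); the function names are hypothetical.

```python
def matmul(a, b):
    """Naive n x n matrix multiply; returns (product, floating-point op count).
    Each inner step is one FMUL plus one FADD that depends on the previous FADD."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    flops = 0
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # acc depends on the previous iteration's acc: a serial FADD chain
                acc += a[i][k] * b[k][j]
                flops += 2
            c[i][j] = acc
    return c, flops


def dot_split(x, y, lanes=8):
    """Dot product with `lanes` independent partial sums. Consecutive FADDs
    go to different accumulators, so a pipelined FPU need not wait for the
    previous result before starting the next addition."""
    accs = [0.0] * lanes
    for k, (xi, yi) in enumerate(zip(x, y)):
        accs[k % lanes] += xi * yi
    return sum(accs)


n = 4
ident = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
m = [[float(i * n + j) for j in range(n)] for i in range(n)]
product, ops = matmul(m, ident)
print(product == m, ops)  # multiplying by the identity must give m back
print(dot_split(range(10), range(10)))
```

Splitting the sum changes the order of additions, so results can differ in the last bits from the single-accumulator version; a test case comparing the two should allow for that.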
| |
Loïc Dupuy France
| | Posts 253 31 Mar 2011 14:03
| Gunnar von Boehn wrote:
| For typical matrix operations the 68050 could score roughly like a 1.5 GHz 80486.
|
It means that we could have Quake I running at VBL rate on the NATAMI (software rendering or GL rendering; GPL code EXTERNAL LINK). Quake I was the first game to make massive use of the x86 FPU for software rendering. Every gamer who had a Cyrix 166 switched to a Pentium 133, because its FPU was twice as fast. I'm not fond of Quake, but it would be a good mixed benchmark (integer/memory/FPU) to see what the Natami has in its guts compared to a 1996 PC without a 3D card (Quake II also has a software renderer). Amiga port: http://planetquake.gamespy.com/View.php?view=Quake.Detail&id=326#Files ClickBoom's commercial Amiga port: www.lemonamiga.com: EXTERNAL LINK
Video on YouTube: EXTERNAL LINK. Amigas equipped with a full 68060@50, a fast graphics card and an AHI-compatible sound card would have seen a good 8-10 fps. My own setup (AGA + 68040/25) gets around 4 fps with a postage-stamp-size window and 2x2 pixel mode ;) The game uses CD audio for music (not recorded).
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 31 Mar 2011 14:06
| Loïc Dupuy wrote:
| I'm not fond of Quake, but it would be a good mixed benchmark (integer/memory/FPU) to see what the Natami has in its guts compared to a 1996 PC without a 3D card (Quake II also has a software renderer).
|
But it's too big a test case to learn anything from. With such a big test case you will get a "score", but you will have no clue why your system achieved that score. A smaller test case allows you to analyse both the code and how the CPU reacts to it. Only by analysing can you learn from it and improve or evolve your CPU. This means that, for me as a CPU developer, such a big test case is of little value.
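A small, fully countable kernel is what makes this kind of analysis possible. As a hypothetical example in the spirit of the Mandelbrot test case mentioned earlier in the thread (not that exact program), this Python sketch iterates one pixel and tallies each floating-point operation as written in the source. The operation names are just labels; a compiler may emit a different mix, for instance fusing or reordering operations.

```python
from collections import Counter


def mandel_escape(cr: float, ci: float, max_iter: int, ops: Counter) -> int:
    """Iterate z = z*z + c for one pixel and tally the floating-point
    operations the loop body performs, so cycle counts can be predicted by hand.
    Returns the iteration at which |z|^2 exceeded 4, or max_iter."""
    zr = zi = 0.0
    for n in range(max_iter):
        zr2 = zr * zr  # FMUL
        zi2 = zi * zi  # FMUL
        ops["FMUL"] += 2
        ops["FADD"] += 1  # zr2 + zi2 in the escape test
        ops["FCMP"] += 1
        if zr2 + zi2 > 4.0:
            return n
        t = zr * zi          # FMUL
        zi = t + t + ci      # 2x FADD (2*zr*zi + ci)
        zr = zr2 - zi2 + cr  # FSUB + FADD
        ops["FMUL"] += 1
        ops["FADD"] += 3
        ops["FSUB"] += 1
    return max_iter


ops = Counter()
escape = mandel_escape(-0.5, 0.5, 100, ops)
print(escape, dict(ops))
```

Each non-escaping iteration costs 3 FMUL, 4 FADD, 1 FSUB and 1 compare, so you can predict cycles from a latency/throughput table and compare against measurements; a mismatch points at something specific to investigate, which a whole-game benchmark cannot give you.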
| |
Loïc Dupuy France
| | Posts 253 31 Mar 2011 17:25
| @Gunnar von Boehn You are 100% right from a designer's point of view. But from a user's point of view, and for "penis enlargement": knowing that 133 MHz PCs at the time were doing 30-45 fps at 640x480 EXTERNAL LINK, by which margin would we beat and stomp over them? :-D It's neither urgent nor necessary, beyond knowing the qualitative "enlargement" gain :-D
My point was that Quake I's software renderer is FPU-bound: it will not help to design the FPU, but it will help to get an appreciation of its relative efficiency in an FPU-bound application.
| |