
Welcome to the Natami / Amiga Forum





Performance of the 68070 Pipeline
Gunnar von Boehn
Germany
(Moderator)
Posts 5775
28 Mar 2009 21:04


People have asked for some status updates.

Here are some very preliminary performance numbers:
I ran some tests of the CPU benchmark through the chip simulator of the 68070 CPU core.

Please keep in mind that the 68070 CPU is not finished. Not everything is implemented yet, and only parts are tested. The numbers below are a snapshot of the current pipeline behavior. Our long-term goal is to increase this performance (the goal is to double it). Before the CPU can be enhanced we of course first need to finish it, and this will take quite some time. Our realistic near-term goal is to finish the 68070 core and bring it out with performance in the range of the results below. If the improved next version of the core is ever finished, it will probably get another name, such as 68080.
 
 

 
The chart compares the 68030, 68040, 68060, and 68070 core on 3 tests of the minibench CPU-Mark. These tests execute 4 different types of immediate operations operating on registers. The tests run 100% inside the CPU cache of each chip.

The tests show the peak performance of part of the integer pipeline.
For real-life performance the size of the CPU cache is also important. With each generation (030, 040, 060, 070) the cache size was increased, and application performance increased with it.

I hope you like this. Please feel free to ask questions.

Cheers
Gunnar

One Thousand
USA

Posts 832
28 Mar 2009 22:02


Thanks, this is good info.  The 070 is looking good so far.  A steady 1 instruction a clock is great for a 1-way CPU.

My questions:
Is it steady with 1 instruction a clock on dependent instructions?
What are the stages in the pipeline? 


Fabian Nunez
USA

Posts 312
28 Mar 2009 23:39


It's interesting that ADDQ is a lot slower on a 070 than it is on an 060 (assuming performance scales linearly with clock speed, a 50MHz 070 would score around 42, roughly on a par with 030 performance).  Does this reveal some implementation bug, or is the design simply optimized for the other addressing modes?
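
As a rough sketch of the scaling arithmetic used above: if a score is assumed to be proportional to clock rate, it can be rescaled to another clock as shown here. The function name and the numbers are placeholders for illustration only; the chart values are not reproduced in this thread.

```python
def scale_score(score, measured_mhz, target_mhz):
    """Rescale a benchmark score, assuming it is proportional to clock rate."""
    return score * target_mhz / measured_mhz


# Placeholder numbers only, not values read from the chart.
example = scale_score(score=100.0, measured_mhz=100.0, target_mhz=50.0)
```

The linear assumption only holds for tests that run entirely from cache, as these do; anything that touches memory will not scale this cleanly.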

Samuel D Crow
USA
(Natami Team)
Posts 1295
28 Mar 2009 23:57


I'd suspect it would have more to do with being made in an FPGA rather than a full-fledged custom die-layout.

One Thousand
USA

Posts 832
29 Mar 2009 00:05


It looks like the 070@50MHz should score closer to 50, I think.

The 060 is faster per clock because it is superscalar and can do 2 instructions at a time, while the 070 is only 1-way.  A bottleneck of the 060 is how much it can fetch, so it takes a hit when the instruction is longer.  He said this test is on immediate adds, so the addq is 16 bits, the addw is 32, the addl is 48.  But the 070 (and 040) is steady through it all and does not have that problem.  The 070 is doing great on this test.
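
The fetch argument above can be put into a toy model. This is only a sketch: the fetch budget of 4 bytes per clock for the 2-way core and 8 bytes per clock for the 1-way core are assumptions chosen for illustration, not measured figures for any chip. Only the instruction lengths (16, 32 and 48 bits) come from the post.

```python
# Instruction lengths in bytes, as stated in the post.
LENGTHS = {"addq": 2, "add.w #imm": 4, "add.l #imm": 6}


def instructions_per_clock(instr_bytes, issue_width, fetch_bytes_per_clock):
    """Sustained rate for a stream of identical instructions.

    The rate is capped both by the issue width and by how many
    instruction bytes the fetch unit can deliver per clock.
    """
    return min(issue_width, fetch_bytes_per_clock / instr_bytes)


def rates(issue_width, fetch_bytes_per_clock):
    """Rate per instruction type for one (hypothetical) core configuration."""
    return {
        name: instructions_per_clock(size, issue_width, fetch_bytes_per_clock)
        for name, size in LENGTHS.items()
    }


two_way = rates(issue_width=2, fetch_bytes_per_clock=4)   # assumed fetch budget
one_way = rates(issue_width=1, fetch_bytes_per_clock=8)   # assumed fetch budget
```

In this model the 2-way core drops from 2 instructions per clock on addq to less than 1 on the 48-bit form, while the 1-way core stays flat at 1 throughout.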

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
29 Mar 2009 08:25


One Thousand wrote:

    My questions:
    Is it steady with 1 instruction a clock on dependent instructions?
    What are the stages in the pipeline? 
   

   
Here is the result for dependent instructions, e.g.:
  ADDQ.L #1,A0
  ADDQ.L #1,A0
  ADDQ.L #1,A0
  ADDQ.L #1,A0


The 68070 result shows how it will behave next week, when I've finished the forwarding work item that Jens gave me. As of today the 070 does not forward fully.

The 68000, 68020, 68030, 68040, and 68070 are unaffected by dependent instructions. But the 68060, as a superscalar CPU, takes a performance hit when instructions depend on each other.
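
A minimal sketch of why dependencies hurt only the superscalar core. It models a generic in-order machine with full forwarding. The pairing rule is a simplification invented for illustration, not the real 68060 pairing rules.

```python
def issue_cycles(instrs, issue_width):
    """Count issue clocks for an in-order core.

    instrs is a list of (dest, sources) tuples. An instruction cannot
    issue in the same clock as an earlier instruction whose destination
    it reads or writes; it waits for the next clock instead.
    """
    clocks = 0
    i = 0
    while i < len(instrs):
        written = set()          # registers written in this clock
        issued = 0
        while i < len(instrs) and issued < issue_width:
            dest, sources = instrs[i]
            if written & (set(sources) | {dest}):
                break            # depends on an instruction issued this clock
            written.add(dest)
            issued += 1
            i += 1
        clocks += 1
    return clocks


# ADDQ.L #1,A0 four times: each one reads and writes A0.
dependent = [("a0", ("a0",))] * 4
# The same add on four different registers.
independent = [(r, (r,)) for r in ("a0", "a1", "a2", "a3")]
```

With these inputs a 1-way core needs 4 clocks in both cases, while a 2-way core needs 2 clocks for the independent stream and 4 for the dependent one.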
 

The pipeline looks like this:
 
      FETCH
  0) DECODE
  1) Reg-Load-ALU1
  2) EA-CALC (ALU1)
  3) Reg-Writeback-ALU1 / MEMLOAD / Reg-Load-ALU2
  4) ALU (ALU2)
  5) Reg-Writeback-ALU2 / MEMSTORE
 

 
The 070 has two ALUs and a LOAD unit arranged one behind the other.
The 070 can do at most 2 ALU operations with 2 register updates plus a memory load per clock.
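
The stage list above can be read as a simple occupancy table. The sketch below assumes an ideal flow with no stalls, where one instruction enters FETCH per clock. The stage names are shortened from the list and the helper names are made up for illustration.

```python
STAGES = [
    "FETCH",
    "DECODE",
    "Reg-Load-ALU1",
    "EA-CALC (ALU1)",
    "WB-ALU1 / MEMLOAD / Reg-Load-ALU2",
    "ALU (ALU2)",
    "WB-ALU2 / MEMSTORE",
]


def stage_at(instr_index, clock):
    """Index of the stage that instruction `instr_index` occupies at `clock`.

    Instruction n enters FETCH at clock n. Returns None if the
    instruction has not entered the pipeline yet or has already left it.
    """
    stage = clock - instr_index
    return stage if 0 <= stage < len(STAGES) else None


def total_clocks(n_instructions):
    """Clocks until the last of n back-to-back instructions has left the pipeline."""
    if n_instructions == 0:
        return 0
    return len(STAGES) + n_instructions - 1
```

For example, `STAGES[stage_at(0, 3)]` names the stage the first instruction is in at clock 3, and four back-to-back instructions drain after `total_clocks(4)` clocks.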



Ayodele Stephenson
USA

Posts 83
29 Mar 2009 17:55


Thank You for the Update.  Even non-technical people like myself have enjoyed reading the posts about the new N070 You have in development... This project continues to show the renewed "spirit" that this community needs.  Thanks Again!!

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
29 Mar 2009 19:55


Ayodele Stephenson wrote:

This project continues to show the renewed "spirit" that this community needs.  Thanks Again!!

Our pleasure :-)

The 68070 is a very nice project.
The 68070 is not finished yet, but we are making good progress and I really enjoy the work on it. Jens and I have learned quite a lot while working on the CPU design.

Like all of us, I started programming 68K CPUs over 20 years ago. Some things about the 68K CPU internals I just knew, but I never understood why certain behaviors were designed the way they are. Now, after rebuilding a very fast 68K, I have learned a lot of the reasons behind them. I now understand why certain things inside the 68040 or 68060 were designed the way they were.
It kind of all makes sense now. :-)

Some more info regarding the pipeline of the 68070:

We tried to achieve a few design goals with the 68070.

- A 100% real 68K CPU.
  Support for all needed 68K integer Instructions and all addressing modes.
  We wanted a real 68K CPU for maximum AMIGA-OS performance.
  A 100% 68K CPU and not a reduced Coldfire.

- Dropping MMU and 68K-Floating point for now.
  The reason was to speed up time to market and to speed up the integer pipeline. Our 68070 behaves like a Motorola 68EC0x0 CPU.
  The EC-68K CPUs were integer only and were used not only in the A1200 but in many other AMIGAs too.

- Optimized for high clockrates.
  Our design goal was to reach more than 100 MHz.
  Some people believed that this is impossible, but we proved that if you know what you are doing, you can create a high-clocked 68K CPU. Our 68070 currently runs at up to 133 MHz, and we might even reach a higher clock rate.

- The pipeline of the 68070 is designed to execute the majority of instructions in just 1 clock. With every new 68K CPU the needed clocks per instruction went down. The 68070 continues this and is so far the 68K CPU with the lowest number of clocks per instruction.

- Like the other top 68K CPUs the 68070 is designed to do several "operations" in one instruction.
The 68070 is designed to do
1 Address calculation and
1 Data-Cache access for free per instruction in addition to
1 ALU operation.

This separates the 68K and the 68070 from RISC CPUs.
What the 68070 can do in just 1 instruction often takes other CPUs (like PowerPC) 2 or 3 instructions.

- The 68070 executes a single 68K instruction per clock.

For the next iteration (called 68070B or 68080) we target multiple 68k instructions per clock.
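
To illustrate the point above about one 68K instruction combining an address calculation, a data-cache access and an ALU operation, here is a small sketch that expands such an instruction into the steps a generic load/store machine would need. The mnemonics in the output are invented for illustration and are not real PowerPC syntax.

```python
def load_store_sequence(op, memory_source=False, memory_dest=False):
    """Expand one register/memory operation into generic load/store steps.

    op is an ALU mnemonic such as "add". At most one operand is in memory,
    as in 68K instructions like ADD.L (8,A0),D0 or ADD.L D0,(8,A0).
    """
    steps = []
    if memory_source or memory_dest:
        steps.append("load  tmp, [base+disp]")    # address calc + cache read
    if memory_dest:
        steps.append(f"{op}   tmp, tmp, reg")     # ALU operation
        steps.append("store tmp, [base+disp]")    # write the result back
    elif memory_source:
        steps.append(f"{op}   reg, reg, tmp")     # ALU operation
    else:
        steps.append(f"{op}   reg, reg, reg2")    # register-only form
    return steps


from_memory = load_store_sequence("add", memory_source=True)  # like ADD.L (8,A0),D0
to_memory = load_store_sequence("add", memory_dest=True)      # like ADD.L D0,(8,A0)
```

The memory-source form expands to 2 steps and the memory-destination form to 3, which matches the "2 or 3 instructions" mentioned above.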

Cheers

Marcel Verdaasdonk
Netherlands

Posts 3976
29 Mar 2009 22:54


Those results are quite impressive for an unfinished product. :P

Mr. Derp
USA

Posts 41
30 Mar 2009 10:03


Two questions -
What is the size of the cache of each CPU (030, 040, 060, 070)?

Forgive my ignorance - is the 070 a new CPU you are developing? Is the intent for the Team to design the chip and then forward that design to a chip manufacturer - like IBM or UMC or RealTek or something?

Gio G.
Germany

Posts 24
30 Mar 2009 10:23


No, it's a "Soft CPU" which lives happily inside the FPGA. :)

Bartek "Banter" K.
Poland
(Natami Team)
Posts 2277
30 Mar 2009 10:51


Yes, and the best part is, it's probably one of the very first CPUs you can actually DOWNLOAD:)

Take care.

Team Chaos Leader
USA
(Moderator)
Posts 2094
30 Mar 2009 14:53


68020
256 bytes instruction cache
0 bytes data cache

68030
256 bytes instruction cache
256 bytes data cache

68040
4096 bytes instruction cache
4096 bytes data cache

68060
8192 bytes instruction cache
8192 bytes data cache



One Thousand
USA

Posts 832
30 Mar 2009 15:48


Thanks for the answers.  Things are looking good.  I am also glad to see that the reckoning of the ALUs is only one stage.
 
This CPU work does look fun.
 
On the next version, you are looking to have multiple instructions?  Nice.  I take it that is by adding ss/ooo?


Wawa Tk
Germany

Posts 581
30 Mar 2009 17:21


@bartek:
 
Bartek wrote:

  Yes, and the best part is, it's probably one of the very first CPUs you can actually DOWNLOAD:)
 

  as far as i know there are already softcores available to download.

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
30 Mar 2009 20:12


Gunnar von Boehn wrote:

  The 68070 result shows how it will behave next week - when I've finished the forwarding work item that Jens gave me.
 

 
  UPDATE: Forwarding is working now!
 
  The 68070 now executes the above test cases (the dependent instructions) just as shown on the bar chart!
   
 

  Kudos go to Jens for his "overnight" forwarding work.
 
  We are making some progress :-)

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
30 Mar 2009 20:14


wawa tk wrote:

  as far as i know there are already softcores available to download.
 

 
This is true, there are about half a dozen different 68K softcores.
 
But the 68070 is designed to become by far the fastest.

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
30 Mar 2009 20:44


One Thousand wrote:

Thanks, this is good info.  The 070 is looking good so far.  A steady 1 instruction a clock is great for a 1-way CPU.

Yes I agree.

Actually, the ADD instruction shown in the bar chart is not a good indicator of CPU performance.

ADD is a relatively simple instruction.
Even the 68030 could execute the ADD instruction fast.
Much more meaningful for performance will be instructions like SHIFT or MUL, which were slow and took many cycles on the 68020/68030/68040.


One Thousand
USA

Posts 832
30 Mar 2009 22:39


That is great that forwarding was put in so swiftly.  Good work, Jens.  And thanks for the little update. 

I am tempted to join the team because of this excitement.

Marcel Verdaasdonk
Netherlands

Posts 3976
31 Mar 2009 09:48


I am also willing to help but being literate isn't such a great help. ;)
Ah, Schematics, and code, life as a tester was easy, it worked or it didn't.
Too bad I did the fixing part too; perhaps in that manner I could help.
But that's because i lack knowledge in some areas of the electronics.
Such is life, for one can't know it all, Marcel.
