Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.
68050 Core Milestone Overview
Gunnar von Boehn
Germany
(Moderator)
Posts 5775
20 May 2009 15:58


For the development of the 68050 CPU core we have set a few milestone goals.

For those interested, here is a rough overview of the milestone targets:

Milestone 1

This Milestone provides verified readiness for basic 68000 user code execution.

Verified fully working address modes
Dn
An
(An)
(An)+
-(An)
d16(An)
d8(An,Rn*X)
d16(PC)
d8(PC,Rn*X)
xxx.w
xxx.l

To be verified fully working Instructions

CLR
MOVE
MOVEA
MOVEQ

ADD
ADDA
ADDI
ADDQ
ADDX
SUB
SUBA
SUBI
SUBQ
SUBX

AND
ANDI
EOR
EORI
NEG
NEGX
NOT
OR
ORI

CMP
CMPA
CMPI
CMPM
TST

BCC
BRA
BSR
JMP
JSR
RTS

BCHG
BCLR
BSET
BTST

LEA
PEA
NOP

ANDI to CCR
ORI to CCR
EORI to CCR
MOVE from CCR
MOVE to CCR
MOVE from SR
MOVE to SR

Milestone 1 also includes:
- working I-Cache
- working Branch-Acceleration

Milestone 2

Milestone 2 provides verified readiness for full 68000 user code execution - and basic 68060 user code execution.

To be verified fully working Instructions

MOVEM
DIVS
DIVSL
DIVU
DIVUL
MULS
MULU

ASL
ASR
LSL
LSR
ROL
ROR
ROXL
ROXR

DBCC
DBRA

RTD
RTR

EXG
EXT
EXTB
SWAP

LINK
UNLK
SCC

CHK

Milestone 3

Milestone 3 provides verified readiness for full 68020 user code execution.

To be verified fully working address modes
All memory indirect modes.

To be verified fully working Instructions

BFCHG
BFCLR
BFEXTS
BFEXTU
BFFFO
BFINS
BFSET
BFTST

Stack frames and other Supervisor mode instructions.

I hope this helps

Michael Ward
USA

Posts 234
20 May 2009 18:33


Gunnar,

Perfect!

Michael

Jens Drößler
Germany

Posts 137
21 May 2009 04:01


This sounds very good. Will the full 68060 command set be supported? Or 68040 only? Just 68030?

Also, you stated that an MMU would slow things down... How about implementing it, but making it bypassable? We won't need it in the beginning, but I think it would be a nice feature for the future.

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
21 May 2009 06:26


Hi Jens,
 
 
Jens Drößler wrote:

  This sounds very good. Will the full 68060 command set be supported? Or 68040 only? Just 68030?
 

 
What is the "full 68060" command set?
If you compare the 68060 with the 68040, you see that the 68060 dropped a few unneeded integer instructions, so the 68040 had more native integer instructions than the 68060.
 

The 68050 copies many of the good design points of the 68040 and 68060 CPUs.
On the original 68000 CPU, all instructions were executed using microcode.
This means a single instruction took several clocks.
A very simple instruction took 4 clocks.
A complex instruction took 20 clocks.
Very complex instructions like MUL took 40 or more clocks.

The 68060 was optimized to hardwire all important instructions so that they execute as fast as possible.

The 68050 does the very same.
On the 68050 nearly every instruction needs just 1 clock.
There are a few rare exceptions which take a bit longer.

I think our design goal for the 68050 is quite clever.
The 68050 is optimized for executing all instructions in 1 clock.
The 68050 supports all important and commonly used instructions of the 68K family natively (without microcode) for the best possible performance.

Instructions which are fluff and mostly useless are supported so that they work fully, but to reduce chip size they are not optimized for speed. This reduction in chip size allows us to increase the performance and speed of the core, which in turn increases the overall performance of the chip.

Jens Drößler wrote:

  Also, you stated that a MMU would slow things down... How about implementing it, but make it bypassable? We won't need it in the beginning, but I think it would be a nice feature for the future.
 

 
As you know, our design is FPGA based.
This means that anyone can add an MMU to it later, whenever they want.

Personally, I regard an MMU as useless on the AMIGA.
It's clear that a good, complete MMU needs quite large chip resources. As we know, the features that an MMU provides are mutually exclusive with the AMIGA design. Therefore, on the AMIGA only a fraction of the full MMU feature set was ever used.
If you want these "AMIGA only MMU features", there are better and much more cost-effective solutions than implementing an expensive full MMU.

Does this make sense to you?

Cheers


Marcel Verdaasdonk
Netherlands

Posts 4043
21 May 2009 11:38


Personally, I find having an FPU like the 68882, or an integrated one as in the 68040, more important than an MMU.
But that is just IMHO. First things first: a fully functional NatAmi, the soft-core CPU, and the expansion of the original concept.
And then start at square one again.

Bernd Afa
Germany

Posts 161
21 May 2009 12:43


That sounds good. It is also possible to change GCC to avoid slow instructions. I think an FPU is very important. A stripped-down FPU that supports only the ColdFire addressing modes is maybe easier to add, would be useful at first, and support for it could perhaps be added to the GCC 68k backend.

If the 68050 has no 32-bit * 32-bit -> 64-bit multiply, see also the 64-bit problems with GCC: code needs to be added in longlong.h.
If somebody knows fast code for this, let us know.
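For readers wondering what such replacement code has to do: below is a minimal Python sketch of the usual schoolbook approach, which builds the 64-bit product from four 16-bit * 16-bit -> 32-bit partial products (the kind of multiply that MULU.W provides on every 68k). The function name `umul32x32_64` is made up for this illustration. It is not taken from longlong.h and says nothing about what the 68050 will actually implement.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def umul32x32_64(a, b):
    """Unsigned 32x32 -> 64 multiply built from four 16x16 -> 32 products.

    Returns (high32, low32). The name is hypothetical; the structure
    mirrors what a CPU without a native 64-bit result would have to do.
    """
    a &= MASK32
    b &= MASK32
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16

    # Four partial products, each fits in 32 bits.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Bits 16..31 of the result collect three 16-bit terms; the overflow
    # of this column (at most 2) is the carry into the high word.
    mid = (ll >> 16) + (lh & MASK16) + (hl & MASK16)

    low = (ll & MASK16) | ((mid & MASK16) << 16)
    high = (hh + (lh >> 16) + (hl >> 16) + (mid >> 16)) & MASK32
    return high, low
```

The sketch costs four multiplies plus a handful of adds and shifts, which is why a native long multiply with a 64-bit result is attractive when the hardware offers it.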

EXTERNAL LINK 
The 68k is a nice architecture. It does not have bloated asm code like RISC CPUs, so program or compiler bugs can be found more easily by looking at the asm output. The only problem is that nobody builds real CPUs with fast clock rates.

If your CPU in the FPGA is ready, there is also a good chance that there is a market outside the Amiga. FPGAs are produced in large numbers, so they are cheap.

There are still many who use m68k for embedded systems and would like more speed. HP, for example, has licensed the ColdFire V5 for many printers.
I don't see the ColdFire V5 on the Freescale page, but HP uses the ColdFire V5 at over 500 MHz.

EXTERNAL LINK 
If your CPU is cheaper for HP, why should HP not use your CPU design?

I don't know how many HP must buy to use the ColdFire technology.

The most important thing for a CPU is that it is cheap and fast. And if you can add the most needed functions to the FPGA core, such as DCT, YUV, H.264 deblocking, or whatever else is needed to decode Blu-ray and DVD fast, then no high clock rate is needed.

If you also offer a way to add custom instructions to your CPU, so that the FPGA can be programmed directly for fast custom functions, that is also a good feature for the future.

Rf Tx
Finland

Posts 15
21 May 2009 13:22


EXTERNAL LINK 
300 - 366 MHz, 549 - 670 Dhrystone

Samuel D Crow
USA
(Natami Team)
Posts 1304
21 May 2009 14:34


@Bernd AFA

The floating point support will be built into the new vector processing unit.  As long as the standard AmigaOS libraries are available, they can be patched to send messages to and from the vector processing unit although software floating point may be faster than the communication with the VPU.

Channel Z

Posts 227
21 May 2009 14:57


@Bernd Afa

Welcome, Bernd! I want to express my gratitude for what you have accomplished with AfaOS so far, and I hope that you will continue the good work!

Bernd Afa
Germany

Posts 161
21 May 2009 16:15


> EXTERNAL LINK
    >300 - 366 MHz, 549 - 670 Dhrystone
   
But look in the left menu, under ColdFire. There you find only entries for the CPU types V4, V3, and V2. I see no V5 CPU type that I can buy there.

Or here in the roadmap: when you click on MPU, only V2, V3, and V4 are listed.
   
    EXTERNAL LINK 
   
No V5, but HP has used the V5 ColdFire in many printers for several years. You can find this by searching Google for "coldfire V5".

I find having no 68k-compatible FPU not so good, because compiling is slow, and building two versions, with and without FPU, costs additional time that would otherwise be available for adding more features to a program.

Or maybe somebody writes a tool such as OxyPatcher or CyberPatcher that replaces all FPU code with jumps to replacement functions. GCC could also be changed so that it uses only the ColdFire FPU addressing modes, which would make this easy to do.

This could also be faster than using the Amiga math libraries, and no additional non-FPU functions would ever be needed.

Now we have the milestones, but the most important thing to know is when they will be reached, and when a workable 68050 can be seen.

An MMU is not needed at first, but at least something that reports illegal memory addresses outside Amiga RAM, and accesses to the zero page, so that an Enforcer is possible.
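To make the request concrete, here is a small Python sketch of the kind of check such a minimal unit would perform: flag accesses to low memory and accesses that fall outside known RAM. All addresses and names here (`ZERO_PAGE_END`, `RAM_RANGES`, `classify_access`) are assumptions picked for illustration, not the Natami memory map. A real implementation would also have to whitelist ROM and custom-chip register space.

```python
# Hypothetical limits, chosen only for this illustration.
ZERO_PAGE_END = 0x400                      # assumed size of the protected low area
RAM_RANGES = [
    (0x00000000, 0x00200000),              # assumed 2 MB chip RAM
    (0x08000000, 0x10000000),              # assumed 128 MB fast RAM
]


def classify_access(addr, size=4):
    """Classify one memory access the way an Enforcer-like checker might.

    Returns "zero-page hit", "outside RAM", or "ok".
    """
    last = addr + size - 1
    if addr < ZERO_PAGE_END:
        return "zero-page hit"
    for start, end in RAM_RANGES:
        # The whole access must lie inside one RAM range.
        if start <= addr and last < end:
            return "ok"
    return "outside RAM"
```

A check like this is just a few address comparators per access, far smaller than a full MMU with page tables and address translation.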

Marcel Verdaasdonk
Netherlands

Posts 4043
21 May 2009 17:13


Samuel D. Crow wrote:

@Bernd AFA
 
  The floating point support will be built into the new vector processing unit.

@Samuel

That is why I mentioned the 68882, since it is an external unit to the CPU. :P

@Bernd

Having those two is indeed a sane thing to do, but I would ask one thing: make them on-call only.

As for the first 3 milestones, none of us should expect any FPU or MMU.
And this I consider a good thing. ;)

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
21 May 2009 19:40


Hi Bernd,

nice to see you.

bernd afa wrote:

That sounds good. It is also possible to change GCC to avoid slow instructions.

Yes, this is true.
But this should not be needed, as the 68050 does not really have slow instructions. :-)

Maybe my previous post was a bit misleading? Let me try to clarify this:

The design of the 68050 is optimized so that EVERY one of the address modes listed above is free, and so that every "normal" instruction needs only 1 clock.

Example:
add.l 64(PC,D1*8),D0 = 1 clock total
- the address mode is free.

For comparison, the Motorola 68040 CPU needs 5 clocks for the above instruction.
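As a reminder of how much work hides in that one address mode, here is a Python sketch of the effective-address calculation for d8(PC,Rn*X): the sign-extended 8-bit displacement plus the scaled index register is added to the PC. The function names are invented for this sketch, a long-sized index register is assumed, and `pc` stands for the PC value the CPU uses as the base (the address of the extension word).

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def ea_pc_indexed(pc, d8, index, scale):
    """Effective address of d8(PC,Rn.L*scale), wrapped to 32 bits.

    Hypothetical helper: pc is the base PC value, d8 the 8-bit
    displacement, index the 32-bit register content, scale 1/2/4/8.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (pc + sign_extend(d8, 8) + sign_extend(index, 32) * scale) & MASK32
```

For the example instruction with D1 = 3 and a base PC of 0x1000, the operand is fetched from 0x1000 + 64 + 3*8 = 0x1058. The add, the scaling shift, and the memory read all have to fit into the same pipeline slot for the mode to be "free".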

If you compare the 68040 with the 68050, the 68040 has many instructions which take 1 clock, which is fast.
The 68040 also has many instructions which need 2 - 6 clocks.
The majority of these instructions now take only 1 clock on the 68050.

If you compare the 68040 with the 68050, the 68050 is designed to be a lot faster clock for clock.
Of course the 68050 can also be clocked a lot higher.

The only address modes which are slow on the 68050 are the memory indirect modes that do not use the brief extension word format. But these modes were always very slow.
And these modes were IMHO very rarely used anyway.
I'm not sure if GCC uses them at all.

Code generation for the 68050 should be easier than for any previous 68K CPU. What we could do is remove a few of the performance hacks which are in GCC - as they are unneeded now.

For example: on the 68050, MUL is as fast as it can get.
On older 68K CPUs like the 68040 it makes sense to replace
MULU.L #3,D0 with something like:
move.l D0,D1
add.l D0,D0
add.l D1,D0

The 68k backend of GCC has a few of these hacks.
But none of them is needed for the 68050 any more.
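For anyone who wants to convince themselves that the replacement sequence really computes a multiply by 3, here is a tiny Python model of it, treating all three instructions as long-sized operations with 32-bit wraparound. The function name is made up for this sketch.

```python
MASK32 = 0xFFFFFFFF


def mul3_shift_add(d0):
    """Model of the three-instruction replacement for a multiply by 3."""
    d0 &= MASK32
    d1 = d0                      # move.l D0,D1  (keep a copy of x)
    d0 = (d0 + d0) & MASK32      # add.l  D0,D0  (D0 = 2x)
    d0 = (d0 + d1) & MASK32      # add.l  D1,D0  (D0 = 2x + x = 3x)
    return d0
```

The result matches a 32-bit multiply by 3 for every input, including values that overflow. The price is a scratch register and three instructions instead of one, which only pays off when the multiplier itself is slow.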

The timing calculation for the 68050 is very simple:
- Every normal instruction needs 1 clock.
- Instructions that do 2 memory accesses to different locations need 2 clocks (e.g. move mem,mem).
- There are a few instructions which need 2 clocks (e.g. PEA).
- Divide needs more than 1 clock, of course. :-) The timing for div is currently not final; our goal is to be in the range of 10 clocks.

- There are a few instructions which we regard as obsolete,
  e.g. TAS and the BCD instructions.
  The timing of these is undecided at the moment.
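The rules above are simple enough to write down as a small estimator. The following Python sketch encodes them as stated. The 10-clock divide figure is only the stated goal, the instruction sets are illustrative rather than complete, and all names (`clocks_for`, `estimate_clocks`) are invented here. It ignores cache misses and pipeline effects entirely.

```python
TWO_CLOCK = {"PEA"}                               # example given in the post
DIVIDES = {"DIVS", "DIVU", "DIVSL", "DIVUL"}
UNDECIDED = {"TAS", "ABCD", "SBCD", "NBCD"}       # TAS and BCD instructions
DIV_CLOCKS_GOAL = 10                              # stated goal, not a final number


def clocks_for(mnemonic, mem_to_mem=False):
    """Clock count for one instruction under the rules above, or None if undecided."""
    m = mnemonic.upper()
    if m in UNDECIDED:
        return None
    if m in DIVIDES:
        return DIV_CLOCKS_GOAL
    if mem_to_mem or m in TWO_CLOCK:
        return 2
    return 1


def estimate_clocks(sequence):
    """Sum clocks over (mnemonic, mem_to_mem) pairs.

    Returns (total, undecided), where undecided lists the mnemonics
    whose timing is not fixed yet and which are left out of the total.
    """
    total = 0
    undecided = []
    for mnemonic, mem_to_mem in sequence:
        c = clocks_for(mnemonic, mem_to_mem)
        if c is None:
            undecided.append(mnemonic)
        else:
            total += c
    return total, undecided
```

For example, `estimate_clocks([("ADD", False), ("MOVE", True), ("PEA", False), ("DIVU", False)])` adds 1 + 2 + 2 + 10 clocks.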

bernd afa wrote:

I think an FPU is very important. A stripped-down FPU that supports only the ColdFire addressing modes is maybe easier to add, would be useful at first, and support for it could perhaps be added to the GCC 68k backend.

I agree... My opinion on the 68K FPU is the following:
I think that, the way 68K FPUs are designed, it is difficult to improve their speed hugely.

Therefore, if we really need huge FPU performance, the best way is to go for a new, separate SIMD-FPU.

For running legacy FPU code, executing it on the integer unit is IMHO good enough. If we provide a slightly HW-accelerated integer emulation of the FPU opcodes, then this is equivalent to the "microcoded" 68881 FPU designs. So basically what we get can/will be about the same as before.

I think the question boils down to what we want.

A) If we need a solution to transform huge amounts of float matrix code, e.g. as needed for a 3D game, then a dedicated SIMD-FPU unit is the way to go.
Such a unit will do the job 10 times faster than the fastest 68060 ever did.

B) If we just have to execute some light 68K-FPU workload, typical of a normal C program that mixes integer with some float calculations, then a mix of microcode and SW emulation should be fully fast enough for this.

What is your opinion on this?
Do you see any other requirement?
The question simply is:
Is there FLOAT code that uses huge amounts of float and is performance critical? This would be code where we need a lot more power than the 68040 had, and which can NOT be off-loaded to a dedicated unit or re-compiled to go through the integer pipe.

The beauty is that our design is FPGA based.
If we ever need a real, fully pipelined HW-FPU, we could still work on this later.

bernd afa wrote:
 
  If your CPU in the FPGA is ready, there is also a good chance that there is a market outside the Amiga. FPGAs are produced in large numbers, so they are cheap.
 
  There are still many who use m68k for embedded systems and would like more speed. HP, for example, has licensed the ColdFire V5 for many printers.
  I don't see the ColdFire V5 on the Freescale page, but HP uses the ColdFire V5 at over 500 MHz.

I know the ColdFire V5, as I took part in the V5 developer program.
The V5 is quite a beast: there were V5s with 2nd-level cache, and they outran the smaller PowerPCs used in EFIKA and SAM.

As long as our core is inside an FPGA, we play in a league below the V5.
The target for the 68050 in a lower-cost FPGA is 133 MHz.
The 050 should perform a lot better than a 68040 would at 133 MHz,
simply because of its big cache, faster instructions (fewer clocks), and faster memory.

The 68050 will be very nice for an AMIGA.
The 68050 will also outrun a ColdFire V5 on 68k legacy code, on which the ColdFire would need to use emulation.

bernd afa wrote:
 
  If your CPU is cheaper for HP, why should HP not use your CPU design?

Yes .... but ...
Actually, the ColdFires are quite cheap if you buy them in volume as HP did.
Of course, if you were to "bake" an ASIC of the 68050, then in volume the chip would be very cheap as well, and in an ASIC we could reach clock rates comparable to the V5.
But baking ASICs costs serious money....

bernd afa wrote:
 
  I don't know how many HP must buy to use the ColdFire technology.

The V5 is only available to big customers which buy chips that are exclusively produced for them.

bernd afa wrote:
 
  The most important thing for a CPU is that it is cheap and fast. And if you can add the most needed functions to the FPGA core, such as DCT, YUV, H.264 deblocking, or whatever else is needed to decode Blu-ray and DVD fast, then no high clock rate is needed.

 
I fully agree.
I think the 68050 will be cheaper for us than an external CPU would be.
Compared to legacy 68K CPUs or ColdFires in 68K emulation mode, the 050 is fast. :-)
Playing Blu-ray is maybe a bit too high a target.
Playing regular DVDs is reasonable to achieve IMHO.

Regarding our milestone timeline:

IMHO we are about halfway through the test cases for Milestone 1.
Both Jens and I still have bugs to fix for the first release.
If no other issues show up, Milestone 1 is very close.
It's difficult to say whether Milestone 1 will be reached in 1 or 2 weeks or in 3-4. Also, of course, we do not work on this full time but only in our spare time.

I have to say I'm very surprised how quickly our development has gone so far. The CPU is fully "architectured"; we are more or less only implementing already designed parts now, or fixing small issues. Maybe it was luck, or maybe it's because Jens is a pro, but I had expected that, for a weekend project, designing a top-notch CISC CPU would have taken us much longer.

Cheers

Bernd Afa
Germany

Posts 161
21 May 2009 20:08


>B) If we just have to execute some light 68K-FPU workload, typical of a normal C program that mixes integer with some float calculations, then a mix of microcode and SW emulation should be fully fast enough for this.

Yes, that is what I mean. Mostly, no extremely fast FPU is needed, and when a fast FPU is needed, it is better to use SIMD.

AmiBlitz has an emu10k1 DSP emulator that does this on the FPU, so SB Live effects can be used easily.

But changing this to use another syntax is easy, and it can be enhanced so that everything works in stereo and SIMD processes 2 channels at once.

Only if there is no software emulation, so that all FPU programs crash and a non-FPU version is needed, would that be not so good.

Jens Drößler
Germany

Posts 137
22 May 2009 03:43


My basic question was: Will every Amiga software written for the 68060/68040/680x0 run on the 68050? I should have asked that way :)
 
  Also, of course only the MMU functions used on Amiga are important. Any solution which will allow the typical Amiga MMU-using software to run will be ok. Of course I know that the FPGA is "firmware upgradeable", but I think having those MMU functions should be on the "to do list", after everything important is done and the system is running.

Also, of course FPU-using software should run on the NatAmi. On that matter, would someone rewrite some libraries, like the math libs, to use the extended features of NatAmi? That would be really nice :D

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
22 May 2009 06:48


Jens Drößler wrote:

  My basic question was: Will every Amiga software written for the 68060/68040/680x0 run on the 68050? I should have asked that way :)
 

 
YES!
The 68050 design includes full software compatibility with all 030/040/060 software.
We believe that the 68050 will also be more compatible with old 68000 software than the 68020/68030/68040/68060 used to be.
 
 
 
 
Jens Drößler wrote:
 
    Also, of course only the MMU functions used on Amiga are important. Any solution which will allow the typical Amiga MMU-using software to run will be ok. Of course I know that the FPGA is "firmware upgradeable", but I think having those MMU functions should be on the "to do list", after everything important is done and the system is running.
 

 
YES
We want to support those functions that were useful on the AMIGA.
But we don't aim for a full, fat MMU which you could only use under Linux. Our design aims to be the best CPU for an AMIGA system. We don't aim to be the best CPU for Linux.
 
 
 
 
Jens Drößler wrote:
 
  Also, of course FPU-using software should run on the NatAmi. On that matter, would someone rewrite some libraries, like the math libs, to use the extended features of NatAmi? That would be really nice :D
 

 
YES!
Our goal is for the 68050 to be floating-point software compatible with the 68040.
This means all AMIGA software that uses the FPU should run well on Natami.
 
 
 
Cheers


George Mystiloglou

Posts 295
22 May 2009 08:27


Hi Gunnar.
I know that you have a lot to do, and I don't want to be "extra trouble", but no MMU means that famous programs like ClariSSA Pro won't run on NatAmi. So there must be at least "something like a virtual MMU" to trick those programs into running on NatAmi.
I remember ClariSSA running a 768x512 HAM8 animation very smoothly at 25 fps on my 040 AGA machine. I really can't wait to see the possibilities on SuperAGA...

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
22 May 2009 08:47


George Mystiloglou wrote:

  Hi Gunnar.
  I know that you have a lot to do, and I don't want to be "extra trouble", but no MMU means that famous programs like ClariSSA Pro won't run on NatAmi.
 

 
Are you sure that ClariSSA needs an MMU?
Do you know what ClariSSA would use the MMU for?
To be honest, I cannot think why ClariSSA would need one. :-)

Did ClariSSA not run on a stock A4000/30 or stock A1200?

George Mystiloglou

Posts 295
22 May 2009 09:55


ClariSSA needs an MMU because it uses virtual memory (it has to convert the animations to its own .ssa format, so it needs lots of RAM).
I tried in the past to run it on a standard 1200, but no luck. The program started with "no mmu detected" and closed. It ran fine on the Blizzard 1230IV, Blizzard 1240 and Blizzard 1260.

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
22 May 2009 10:07


George Mystiloglou wrote:

ClariSSA needs an MMU because it uses virtual memory (it has to convert the animations to its own .ssa format, so it needs lots of RAM).
  I tried in the past to run it on a standard 1200, but no luck. The program started with "no mmu detected" and closed. It ran fine on the Blizzard 1230IV, Blizzard 1240 and Blizzard 1260.

Using virtual memory only makes sense if you lack Fast memory.
I would assume that ClariSSA does not use virtual memory if you have enough Fast memory. Using Fast memory will be a lot faster than taking a virtual memory detour. :-)

How does ClariSSA behave on an AMIGA with enough memory and no MMU?
E.g. how does it run on a 68030 without MMU but with 128 MB RAM or so?

If you have enough main memory, there should really be no reason to use an MMU at all.

Cheers

George Mystiloglou

Posts 295
22 May 2009 10:26


I know that. The point is: IF the program searches for an MMU, then what? :-)

I don't have a CPU without an MMU to test it with. If someone has a CPU card with enough RAM and NO MMU to test it, we will know.
I will try to test it today with a Blizzard 1230IV running a program that disables the MMU...
