
Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.
Please read the forum usage manual.



The team will post updates and news here

NatAmi LX Evaluation Baseboard Bringup
Ayodele Stephenson
USA

Posts 83
02 Jun 2010 10:28


2 meg chipram was a huge limitation... nice to see that it is already a past issue for natami.

Evil Igel
Germany

Posts 154
02 Jun 2010 13:55


THANKS to the NatAmi team for (re-)writing new chapters of Amiga history every week, sometimes every DAY!

The latest miracle: There is an AMIGA with more than 2 MB of Chip-Mem out there!! Really!! I read about it! ;-)

AWESOME!

Forget Chuck Norris, Spongebob doing BBQ UNDER WATER! :-D

But forget Spongebob too! Thomas doing more than 2 Megs in an AMIGA!! :-D

It's sooo exciting to see progress here, keep up that fantastic work!

Wojtek P
Poland

Posts 1597
02 Jun 2010 17:24


@christian

Of this 256MB of memory, 128MB is fast-RAM-to-be.
Thomas has not made the fastram controller yet, so for now the fastram is the static RAM
on the CPU board that is supposed to be a cache.

So for now it's 4MB of VERY fast RAM :)

Claudio Wieland
Germany
(Natami Team)
Posts 703
02 Jun 2010 17:26


Actually, no one forces you to have a 128MB CHIP + 128MB FAST configuration. It could be anything in between... For example, 2MB CHIP and 254MB FAST ;-)

Wojtek P
Poland

Posts 1597
02 Jun 2010 22:23


AFAIK the board design has separate memory chips for FAST and CHIP memory, so you can't.

Fahed Al Daye
Canada

Posts 282
02 Jun 2010 22:29


Claudio Wieland wrote:

Actually, no one forces you to have a 128MB CHIP + 128MB FAST configuration. It could be anything in between... For example, 2MB CHIP and 254MB FAST ;-)

* giggles to himself * Can you imagine 2 MB chip RAM SAGA? hehehe

I would like to have 128 MB CHIP RAM + 128 MB FAST RAM, that to me sounds like a perfect configuration!

Marcel Verdaasdonk
Netherlands

Posts 3974
02 Jun 2010 22:44


Wojtek P wrote:

AFAIK the board design has separate memory chips for FAST and CHIP memory, so you can't.

Actually, you could if you interleave the memory, but then you need some sort of MMU to do the chip/fast mem split. :(

Wojtek P
Poland

Posts 1597
02 Jun 2010 22:54


From the user/programmer point of view, the best would be a fully shared 256MB with no chip & fast separation.

But it would make the hardware MUCH more complex and possibly slower.

The CPU cache would need to be write-through, and extra snooping logic would be required to watch what the rest of the hardware writes and update the cache.

For a CPU on a separate board it would also mean lots of traffic on the S-zorro bus.

The fast and chip memory separation is a great idea in the Amiga - it simply removes these problems ENTIRELY by making chipram uncacheable, while fastram is cacheable and separate.

Not only is there no need for all this extra logic, but when the CPU runs in fastram there are no slowdowns at all.

In PCs (and not only x86 PCs or servers) there is a REALLY huge amount of logic just to keep caches coherent with one-memory-for-everything. CPUs are not slowed down much today only because I/O traffic is usually small compared to the available memory bandwidth.

In multicore CPUs, each write from an external device must go to EVERY CORE's L1 and L2 cache!

GFX card memory is the equivalent of Amiga CHIP, but it's for graphics only.

And because of PCI/PCIe bus latency it's actually slower to access than Natami chipram - while the CPU can be 20-30 times faster!

On a PC, the graphics card is actually a separate computer connected by a high-latency link.
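
To make the chip/fast argument above concrete, here is a minimal toy sketch in Python. It is not Natami logic: the class names, the address constants and the `chip_is_cacheable` switch are all made up for illustration. It only shows why a CPU cache in front of memory that other hardware can write needs either snooping or an "uncacheable" rule.

```python
# Toy model only: all names and addresses here are illustrative assumptions.
CHIP_BASE = 0x000000   # hypothetical start of chip RAM
FAST_BASE = 0x200000   # hypothetical start of fast RAM


class Memory:
    """Flat toy memory: address -> value."""

    def __init__(self):
        self.cells = {}

    def read(self, addr):
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.cells[addr] = value


class Cpu:
    """Toy CPU with a write-through cache in front of cacheable addresses."""

    def __init__(self, memory, is_cacheable):
        self.memory = memory
        self.is_cacheable = is_cacheable   # predicate: addr -> bool
        self.cache = {}

    def read(self, addr):
        if not self.is_cacheable(addr):
            return self.memory.read(addr)  # uncached: always ask RAM
        if addr not in self.cache:
            self.cache[addr] = self.memory.read(addr)
        return self.cache[addr]            # may be stale, nothing snoops

    def write(self, addr, value):
        if self.is_cacheable(addr):
            self.cache[addr] = value
        self.memory.write(addr, value)     # write-through


def dma_write(memory, addr, value):
    """Blitter/disk/audio style write that bypasses the CPU cache."""
    memory.write(addr, value)


def demo(chip_is_cacheable):
    """Return what the CPU reads from chip RAM after a DMA write."""
    mem = Memory()
    cpu = Cpu(mem, lambda a: a >= FAST_BASE or chip_is_cacheable)
    addr = CHIP_BASE + 0x100
    cpu.write(addr, 1)
    cpu.read(addr)            # would pull the value into the cache
    dma_write(mem, addr, 2)   # hardware changes RAM behind the CPU's back
    return cpu.read(addr)


if __name__ == "__main__":
    print("chip uncacheable:", demo(False))
    print("chip cacheable, no snooping:", demo(True))
```

With chip RAM marked uncacheable the CPU sees the value written by the DMA; with chip RAM cached and no snooping it keeps reading the old value. That stale read is exactly what the extra coherency logic would have to prevent.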


Marcel Verdaasdonk
Netherlands

Posts 3974
03 Jun 2010 00:04


Wojtek, that is the theory, and in practice it has held up too.
But nobody besides technical people looks at latency; that is why DDR SDRAM caught on so well.
Cheap DDR2 RAM needs to run twice as fast as its predecessor just to spend the same time on latency, sad fact. :(

And down goes the credo "it's faster because it has more MHz!"
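
A quick bit of arithmetic illustrates the latency point. The two parts below (DDR-400 at CL3 and DDR2-800 at CL6) are typical example timings picked for illustration, not numbers taken from this thread or from the Natami board.

```python
def cas_latency_ns(io_clock_mhz, cas_cycles):
    """CAS latency in nanoseconds for a given I/O clock and CL value."""
    return cas_cycles * 1000.0 / io_clock_mhz


# Assumed example parts, not measurements:
ddr_400 = cas_latency_ns(io_clock_mhz=200, cas_cycles=3)    # DDR-400, CL3
ddr2_800 = cas_latency_ns(io_clock_mhz=400, cas_cycles=6)   # DDR2-800, CL6

if __name__ == "__main__":
    print(f"DDR-400  CL3: {ddr_400:.1f} ns")
    print(f"DDR2-800 CL6: {ddr2_800:.1f} ns")
```

Under these assumptions the clock doubles and so does the cycle count, so the time spent waiting for the first word stays the same.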

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
03 Jun 2010 05:24


Wojtek P wrote:

From the user/programmer point of view, the best would be ... no chip & fast separation.

But it would make the hardware MUCH more complex and possibly slower.

Yes, from a programming point of view, having only one type of memory is of course simplest, and simple equals best in this case.

And you are also right that when you look at it more closely, it becomes visible that the opposite is the case. The implementation of a fully coherent system is not only very complex - by design it also makes the system slow.

If you look at the "other world", where cache coherency is needed for "modern" multicore systems, you see that you do not want to have this.
The cache coherence protocol of a modern 2GHz multicore PPC system takes about 120 CPU cycles. This means the protocol adds a tremendous extra latency. Such a latency overhead will at the end of the day cripple the system performance.

Cache coherence implies that the CPU will have to snoop all traffic on the memory bus - and for the 68060 it implies that the CPU cache will need to run in write-through mode.

What I would like to do is add snooping inside the CPU.
Then the 68050 will automatically "invalidate" an ICache line if it writes to it. This will make the 68050 relatively robust against self-modifying code. And as it is CPU-internal, it can be implemented without any performance impact.
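
The idea of the CPU snooping its own writes can be sketched with a toy model. This is only an illustration in Python, not the 68050 design: the class `ToyCpu`, the 16-byte line size and the `snoop_own_writes` flag are assumptions made up for the example.

```python
LINE_SIZE = 16  # bytes per ICache line; an assumed value for this sketch


class ToyCpu:
    """Toy CPU with RAM and an instruction cache filled one line at a time."""

    def __init__(self, snoop_own_writes):
        self.ram = {}
        self.icache = {}   # line number -> list of cached values
        self.snoop_own_writes = snoop_own_writes

    def fetch(self, addr):
        """Instruction fetch: served from the ICache, filling on a miss."""
        line = addr // LINE_SIZE
        if line not in self.icache:
            base = line * LINE_SIZE
            self.icache[line] = [self.ram.get(base + i, 0)
                                 for i in range(LINE_SIZE)]
        return self.icache[line][addr % LINE_SIZE]

    def store(self, addr, value):
        """Data write: with snooping, drop the ICache line it hits."""
        self.ram[addr] = value
        if self.snoop_own_writes:
            self.icache.pop(addr // LINE_SIZE, None)


def self_modifying_demo(snoop_own_writes):
    """Patch an instruction after it was fetched once, then fetch again."""
    cpu = ToyCpu(snoop_own_writes)
    cpu.store(0x40, 0x4E71)   # 68k NOP opcode
    cpu.fetch(0x40)           # the line is now in the ICache
    cpu.store(0x40, 0x4E75)   # program patches itself: 68k RTS opcode
    return cpu.fetch(0x40)


if __name__ == "__main__":
    print(hex(self_modifying_demo(True)))
    print(hex(self_modifying_demo(False)))
```

With snooping the second fetch sees the patched opcode; without it the CPU keeps executing the stale one until the cache is flushed by software.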


Wojtek P
Poland

Posts 1597
06 Jun 2010 13:01


DDR RAM generations all have roughly the same access time, just with lots of (8 per chip) parallel banks.

That's very inefficient with a single superfast CPU like in a PC, it's much more efficient with multicore CPUs, and it's REALLY efficient with lots of slower threads, like Sun did in the UltraSPARC T1/T2 CPUs.
These chips have 8 cores, each running several threads in parallel (4 on the T1, 8 on the T2). Every time a thread waits for memory, another one executes. Under multitasking unix loads such chips are close to 100% utilized in spite of DDR2 memory and a smaller cache than current Intel CPUs.

AmigaOS is single-CPU, and good Amiga software isn't CPU bound.
Most of the work is done by hardware accelerators.
That's why Natami will already be good at utilizing the bandwidth of DDR2 RAM, but it could be much better if the blitter could be multithreaded. And the 3D core too.
In a similar way to Sun's processor: when one job is stalled, run another.
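
The "when one job is stalled, run another" idea can be put into a very rough formula. This is a back-of-the-envelope model, not a description of how Sun's chips or the Natami blitter work: it assumes every thread alternates between a fixed number of compute cycles and a fixed memory stall, and that switching threads is free. The cycle counts in the usage lines are invented.

```python
def core_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles one core spends doing useful work.

    Simplified model: each thread computes for `compute_cycles`, then
    waits `stall_cycles` for memory; while it waits, other threads run.
    """
    if threads < 1 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("need threads >= 1, compute > 0, stall >= 0")
    busy = threads * compute_cycles
    period = compute_cycles + stall_cycles
    return min(1.0, busy / period)


if __name__ == "__main__":
    # Assumed numbers: 20 cycles of work per 100-cycle memory stall.
    for n in (1, 2, 4, 8):
        print(n, "threads:", round(core_utilization(n, 20, 100), 2))
```

In this model a single thread keeps the core busy only about a sixth of the time, while enough threads hide the memory stalls completely. The same reasoning would apply to a blitter that can switch to another queued job while one is waiting for DRAM.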

Wojtek P
Poland

Posts 1597
06 Jun 2010 13:14


@Gunnar

I know that today's method of fully simulating common memory with cache coherence is plain stupid when done this way. It's good that Natami doesn't try this.

Although your multicore PPC system example is about unix servers, it's a very good one.

It's dumb software that forces them to do it.

99% of the memory mapped to unix processes DOES NOT need cache coherency.
Program code is shared but read-only; software and the MMU should be used to wipe cached data when programs are removed from RAM and new ones are fetched from disk into the same place.

Most of the accessed data memory is private too - stacks, per-process/thread data.

What needs to be shared is the main data that is R/W for programs - like, say, a database table.

The MMU could be used to mark it, and then this slow cache coherency would be used only on it. Or even just make it uncached, with only prefetch/write buffer logic.
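
As a rough sketch of the rule described above, the per-page decision could look like this. The policy names and the `cache_policy` function are hypothetical, made up for this illustration; no real MMU or operating system interface is being described.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical per-page cache policies for this sketch."""
    CACHED_NO_COHERENCY = "cached, no snooping needed"
    UNCACHED_BUFFERED = "uncached, prefetch/write buffer only"


def cache_policy(writable, shared):
    """Pick a policy for one mapped page.

    Only pages that are both shared and writable can be changed by
    someone else while this CPU caches them, so only they need the
    slow path (here: uncached, as the simpler of the two options).
    """
    if shared and writable:
        return Policy.UNCACHED_BUFFERED
    return Policy.CACHED_NO_COHERENCY


if __name__ == "__main__":
    pages = {
        "program code (shared, read-only)": (False, True),
        "stack (private, R/W)": (True, False),
        "database table (shared, R/W)": (True, True),
    }
    for name, (writable, shared) in pages.items():
        print(name, "->", cache_policy(writable, shared).value)
```

Read-only code still needs its cached copies wiped by software when the page is reused for a different program, as described above; the sketch leaves that step out.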

This would be simpler, but if you know today's "modern" unix software.... well, no comment.

I am a unix user, BTW.

For Natami, separate chip/fast RAM is the best solution,
but an extra prefetch and write buffer (one DRAM burst in size each) would be advantageous.
What is excellent compared to the PC is that chipram is accessible to ALL hardware. On a PC there is graphics card memory, but it's only for the graphics card, and CPU access to any part of this RAM is just incredibly slow. I would guess something like 1000 cycles.

That's why ALL PC graphics card drivers are optimized for batch mode - move lots of data to/from the GFX card, then execute something on it.

Fast RAM should be on the CPU board, not the mainboard. But I understand that the Natami LX is temporary - the final model will have the CPU on the mainboard.

As for PCs - as multicore processors become more and more common, the cache coherency bottleneck is already serious, and soon it will stop progress altogether.

Until the software is rewritten, it's a dead end.

Marcel T/Freshman
Germany

Posts 12
11 Jun 2010 19:17


Wow superb.

Thomas Hirsch
Germany
(MX-Board Owner)
Posts 647
20 Jun 2010 18:22


Paula UART
The Paula serial port is now working! This allows the NatAmi to establish full communication with the outside world for the first time!

This picture shows AWeb browsing through a PPP connection over the serial port and writing forum posts.

      Frame generation .......... ECS, fixed 28MHz pixel clock
      SyncZorro Interface ....... preliminary version
      Copper .................... fully implemented, with buffered data fetch
      Video DMA ................. fully implemented
      256 color registers ....... fully implemented
      Sprites ................... 16bit linebuffer
      blitter ................... basic implementation. Block and fill mode only, line to come
      Video priority ............ half implemented
      Scandoubler ............... fully implemented
      Interrupts ................ fully implemented
      Paula DMA control ......... fully implemented
      Audio out ................. fully implemented
      VGA out ................... working
      DVI out ................... o
      PCI ....................... o
      IDE ....................... fully implemented
      CIAs ...................... fully implemented
      Disk DMA .................. 880k and 1760k, read only
(new) Serial Port Paula UART .... fully implemented
      Slow peripheral I/O ....... fully implemented
        (Joy/Mouse/Keyb/PRT/DSK/SER)
      PC mouse and kbd support .. o
      Fast RAM controller ....... o
      Kickstart flash logic ..... o
      Battery-backed up clock.... o
      15k Video out ............. o
      15k Video in .............. o
      Audio in .................. o



Marcel Verdaasdonk
Netherlands

Posts 3974
20 Jun 2010 19:09


Progress this is cool!

Wojtek P
Poland

Posts 1597
20 Jun 2010 19:23


@Thomas

If you don't already have a plan for what to work on next, my proposal is:

blitter - complete
Fast RAM
frame generation - all video modes
DVI out
PC mouse and kbd support
RTC
audio in
TV Out
kickstart flash logic

:)

Fahed Al Daye
Canada

Posts 282
20 Jun 2010 19:29


Wojtek P wrote:

@Thomas
 
  If you don't already have a plan for what to work on next, my proposal is:
 
  blitter - complete
  Fast RAM
  frame generation - all video modes
  DVI out
  PC mouse and kbd support
  RTC
  audio in
  TV Out
  kickstart flash logic
 
  :)

TV out! TV out! You don't understand how important the TV out feature is for ME! I am running my entire NatAmi hooked up to the TV. My TV is my monitor for the NatAmi, like it is for my A1200.


Fahed Al Daye
Canada

Posts 282
21 Jun 2010 00:07


Thomas Hirsch wrote:

Paula UART
  The paula serial port is now working! This allows the NatAmi for the first time to establish full communication to the outside world!
 
 
 
  This picture shows AWeb browsing through a ppp connection over the serial port and writing forum posts.
 
 
       Frame generation .......... ECS, fixed 28MHz pixel clock
        SyncZorro Interface ....... preliminary version
        Copper .................... fully implemented, with buffered data fetch
        Video DMA ................. fully implemented
        256 color registers ....... fully implemented
        Sprites ................... 16bit linebuffer
        blitter ................... basic implementation. Block and fill mode only, line to come
        Video priority ............ half implemented
        Scandoubler ............... fully implemented
        Interrupts ................ fully implemented
        Paula DMA control ......... fully implemented
        Audio out ................. fully implemented
        VGA out ................... working
        DVI out ................... o
        PCI ....................... o
        IDE ....................... fully implemented
        CIAs ...................... fully implemented
        Disk DMA .................. 880k and 1760k, read only
  (new) Serial Port Paula UART .... fully implemented
        Slow peripheral I/O ....... fully implemented
        (Joy/Mouse/Keyb/PRT/DSK/SER)
        PC mouse and kbd support .. o
        Fast RAM controller ....... o
        Kickstart flash logic ..... o
        Battery-backed up clock.... o
        15k Video out ............. o
        15k Video in .............. o
        Audio in .................. o

 

Curious question about "Video priority ............ half implemented". What is special about it? What if it is never fully implemented while the rest of NatAmi is implemented 100%: would that break performance and compatibility? What does it do?

Thomas Hirsch
Germany
(MX-Board Owner)
Posts 647
21 Jun 2010 01:33


Fahed Al Daye wrote:

 
Curious question about "Video priority ............ half implemented". What is special about it? What if it is never fully implemented while the rest of NatAmi is implemented 100%: would that break performance and compatibility? What does it do?

 
There is nothing special about it. It will be fully implemented in time. It will not break performance or compatibility when completed.
 
Fahed Al Daye wrote:

What does it do?

 
It generates the CLUT values. For further information please see the Hardware Reference Manual chapters about playfield hardware, dual playfield, sprites and collision detection.
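
As a very rough illustration of what "generating the CLUT values" means, here is a simplified sketch in Python. It is not the NatAmi implementation and it covers only a single playfield with unattached sprites; dual playfield, attached sprites, HAM and collision detection are left out. The function name and the `pairs_in_front` parameter are made up for the example, loosely following the sprite/playfield priority scheme in the Hardware Reference Manual.

```python
def clut_index(playfield_pixel, sprite_pixels, pairs_in_front):
    """Pick the colour-table index for one pixel (simplified model).

    playfield_pixel: bitplane value, 0 means transparent/background.
    sprite_pixels:   8 values (0..3), index = sprite number, 0 = transparent.
    pairs_in_front:  how many sprite pairs (0..4) lie in front of the
                     playfield; sprites 0/1 form the frontmost pair.
    """
    for sprite, pixel in enumerate(sprite_pixels):  # sprite 0 wins first
        if pixel == 0:
            continue                                # transparent here
        pair = sprite // 2
        if pair >= pairs_in_front and playfield_pixel != 0:
            return playfield_pixel                  # playfield covers it
        return 16 + pair * 4 + pixel                # sprite colours 16..31
    return playfield_pixel                          # no sprite visible


if __name__ == "__main__":
    no_sprites = [0] * 8
    sprite0 = [2, 0, 0, 0, 0, 0, 0, 0]
    print(clut_index(5, no_sprites, 0))   # playfield only
    print(clut_index(5, sprite0, 1))      # sprite 0 in front of playfield
    print(clut_index(5, sprite0, 0))      # playfield in front of sprite 0
```

For every pixel the hardware has to decide which of the overlapping sources is visible and turn that into one colour-table index, which is why a half-finished priority stage shows up as wrong sprite/playfield ordering rather than as a speed problem.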

Fahed Al Daye
Canada

Posts 282
21 Jun 2010 01:43


So this is important for collision detection, sprites and dual playfield? So without it fully implemented games will not function correctly?
