|
|---|
Ayodele Stephenson USA
| | Posts 83 02 Jun 2010 10:28
| 2 MB of Chip RAM was a huge limitation... nice to see that it is already a thing of the past for NatAmi.
| |
Evil Igel Germany
| | Posts 154 02 Jun 2010 13:55
| THANKS to the NatAmi team for (re)writing new chapters of Amiga history every week, sometimes every DAY! The latest miracle: there is an AMIGA with more than 2 MB of Chip RAM out there!! Really!! I read about it! ;-) AWESOME! Forget Chuck Norris, think of SpongeBob doing BBQ UNDER WATER! :-D But forget SpongeBob too! Thomas is doing more than 2 megs in an AMIGA!! :-D It's sooo exciting to see progress here, keep up the fantastic work!
| |
Wojtek P Poland
| | Posts 1597 02 Jun 2010 17:24
| @Christian Of this 256 MB of memory, 128 MB is the future Fast RAM. Thomas hasn't made the Fast RAM controller yet, so for now the Fast RAM is the static RAM on the CPU board that is supposed to be a cache. So for now it's 4 MB of VERY fast RAM :)
| |
Claudio Wieland Germany
| | (Natami Team) Posts 706 02 Jun 2010 17:26
| Actually, no one forces you to have a 128 MB CHIP + 128 MB FAST configuration. It could be anything in between. For example, 2 MB CHIP and 254 MB FAST ;-)
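A configurable split like this can be pictured with a minimal sketch. It assumes, hypothetically, that the 256 MB is a single pool whose Chip/Fast boundary is one configurable value; `MemoryMap` and its methods are illustrative names, not NatAmi's real address map, and the sketch says nothing about whether the actual board uses separate chips for each kind of memory.

```python
MB = 1024 * 1024
TOTAL_RAM = 256 * MB  # total RAM figure from the thread


class MemoryMap:
    """Hypothetical single RAM pool: low part is Chip RAM, the rest is Fast RAM."""

    def __init__(self, chip_size: int, total: int = TOTAL_RAM) -> None:
        if not 0 < chip_size <= total:
            raise ValueError("chip_size must be between 1 byte and total")
        self.chip_size = chip_size
        self.total = total

    def region(self, offset: int) -> str:
        """Classify an offset into the pool as 'chip' or 'fast'."""
        if not 0 <= offset < self.total:
            raise ValueError("offset outside the RAM pool")
        return "chip" if offset < self.chip_size else "fast"

    def split_mb(self) -> tuple[int, int]:
        """Return (Chip MB, Fast MB) for this configuration."""
        return self.chip_size // MB, (self.total - self.chip_size) // MB


for chip_mb in (2, 128):
    m = MemoryMap(chip_mb * MB)
    print(m.split_mb(), m.region(1 * MB), m.region(200 * MB))
```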
| |
Wojtek P Poland
| | Posts 1597 02 Jun 2010 22:23
| AFAIK the board design has separate memory chips for FAST and CHIP memory, so you can't.
| |
Fahed Al Daye Canada
| | Posts 282 02 Jun 2010 22:29
| Claudio Wieland wrote:
| Actually, no one forces you to have a 128 MB CHIP + 128 MB FAST configuration. It could be anything in between. For example, 2 MB CHIP and 254 MB FAST ;-)
|
*giggles to himself* Can you imagine SAGA with only 2 MB of Chip RAM? Hehehe. I would like to have 128 MB CHIP RAM + 128 MB FAST RAM; that sounds like the perfect configuration to me!
| |
Marcel Verdaasdonk Netherlands
| | Posts 3991 02 Jun 2010 22:44
| Wojtek P wrote:
| AFAIK the board design has separate memory chips for FAST and CHIP memory, so you can't.
|
Actually you could, if you interleave the memory, but then you would need some sort of MMU to do the Chip/Fast memory split. :(
| |
Wojtek P Poland
| | Posts 1597 02 Jun 2010 22:54
| From the user's and programmer's point of view, the best would be a fully shared 256 MB, with no Chip/Fast separation. But that would make the hardware MUCH more complex and possibly slower. The CPU cache would need to be write-through, and extra snooping logic would be required to watch what the rest of the hardware writes and update the cache. For a CPU on a separate board, it would also mean lots of traffic on the S-Zorro bus. Separating Fast and Chip memory is a great idea in the Amiga: it removes these problems ENTIRELY by making Chip RAM uncacheable, while Fast RAM is cacheable and separate. Not only is there no need for all this extra logic, but when the CPU runs from Fast RAM there are no slowdowns at all. In PCs (and not only x86 PCs or servers) there is a REALLY huge amount of logic just to keep caches coherent with one memory for everything. CPUs are currently not slowed down much only because I/O traffic is usually small compared to the available memory bandwidth. In multicore CPUs, each write from an external device must go to EVERY CORE's L1 and L2 cache! Graphics card memory is the equivalent of Amiga Chip RAM, but it's for graphics only, and because of PCI/PCIe bus latency it's actually slower to access than NatAmi Chip RAM, even though the CPU can be 20-30 times faster! On a PC, the graphics card is effectively a separate computer connected over a high-latency link.
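To make the argument concrete, here is a toy model (all names hypothetical) of a CPU with a read cache that does no bus snooping. It shows why a DMA write into cached memory leaves the CPU reading a stale value, and why marking Chip RAM uncacheable, as the Amiga does, avoids the problem with no snooping logic at all. CPU writes, cache lines and write-back behavior are deliberately left out; the 2 MB Chip RAM boundary is an assumption for the example.

```python
from collections.abc import Callable

CHIP_SIZE = 2 * 1024 * 1024  # hypothetical: Chip RAM occupies addresses below 2 MB


class CpuModel:
    """Toy CPU with a read cache that is NOT kept coherent (no bus snooping)."""

    def __init__(self, ram: dict[int, int], cacheable: Callable[[int], bool]) -> None:
        self.ram = ram
        self.cacheable = cacheable
        self.cache: dict[int, int] = {}

    def read(self, addr: int) -> int:
        if not self.cacheable(addr):
            return self.ram[addr]  # uncached: always go to RAM
        if addr not in self.cache:
            self.cache[addr] = self.ram[addr]  # fill on miss
        return self.cache[addr]  # hit: may be stale


def dma_write(ram: dict[int, int], addr: int, value: int) -> None:
    """A custom-chip DMA write (e.g. blitter): updates RAM, never the CPU cache."""
    ram[addr] = value


def read_before_and_after_dma(cacheable: Callable[[int], bool]) -> tuple[int, int]:
    addr = 0x1000  # an address inside Chip RAM
    ram = {addr: 1}
    cpu = CpuModel(ram, cacheable)
    before = cpu.read(addr)
    dma_write(ram, addr, 2)
    return before, cpu.read(addr)


def everything_cached(addr: int) -> bool:
    return True


def amiga_split(addr: int) -> bool:
    return addr >= CHIP_SIZE  # only Fast RAM is cacheable


print(read_before_and_after_dma(everything_cached))  # (1, 1): second read is stale
print(read_before_and_after_dma(amiga_split))        # (1, 2): second read is fresh
```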
| |
Marcel Verdaasdonk Netherlands
| | Posts 3991 03 Jun 2010 00:04
| Wojtek, that is the theory, and in practice it holds up too. But nobody besides technical people looks at latency; that is why DDR SDRAM caught on so well. Cheap DDR2 RAM needs to run twice as fast as its predecessor just to spend the same time on latency, sad fact. :( And down goes the credo "it's faster because it has more MHz!"
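The "twice as fast for the same latency" point is simple arithmetic: absolute CAS latency is the CL cycle count divided by the I/O clock, which is half the DDR data rate. The module grades below (DDR-400 CL3, DDR2-800 CL6) are typical examples picked for illustration, not figures from the thread.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: float) -> float:
    """Absolute CAS latency: CL cycles at the I/O clock (half the DDR data rate)."""
    clock_mhz = data_rate_mts / 2  # DDR transfers data on both clock edges
    return cas_cycles * 1000.0 / clock_mhz


def peak_bandwidth_mb_s(data_rate_mts: float, bus_bytes: int = 8) -> float:
    """Peak transfer rate of one 64-bit (8-byte) channel in MB/s."""
    return data_rate_mts * bus_bytes


for name, rate, cl in [("DDR-400 CL3", 400, 3), ("DDR2-800 CL6", 800, 6)]:
    print(f"{name}: {cas_latency_ns(rate, cl):.1f} ns latency, "
          f"{peak_bandwidth_mb_s(rate):.0f} MB/s peak")
```

Both grades land at 15 ns: the peak bandwidth doubles, but the time to the first data word does not improve.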
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 03 Jun 2010 05:24
| Wojtek P wrote:
| the best in user/programmer point of view would be ... no chip&fast separation. But it will make hardware MUCH more complex and possibly slower.
|
Yes, from a programming point of view, having only one type of memory is of course simplest, and simple equals best in this case. And you are also right that when you look at it more closely, it becomes visible that the opposite is the case. The implementation of a fully coherent system is not only very complex, it also makes the system slow by design. If you look at the "other world", where cache coherency of "modern" multicore systems is needed, you see that you do not want to have this. The cache coherence protocol of a modern 2 GHz multicore PPC system takes about 120 CPU cycles. This means the protocol adds a tremendous extra latency, and such a latency overhead will, at the end of the day, cripple system performance. Cache coherence implies that the CPU has to snoop all traffic on the memory bus, and for the 68060 it implies that the CPU cache will need to run in write-through mode. What I would like to do is add snooping inside the CPU, so that the 68050 will automatically invalidate an ICache line if it writes to it. This will make the 68050 relatively robust against self-modifying code. And as it's CPU-internal, this can be implemented without any performance impact.
| |
Wojtek P Poland
| | Posts 1597 06 Jun 2010 13:01
| DDR RAMs simply always have the same access time, but lots of parallel blocks (8 per chip). That's very inefficient with a superfast CPU like in a PC, much more efficient with multicore CPUs, and ACTUALLY efficient with lots of slower threads, as Sun did in the UltraSPARC T1/T2 CPUs. The T2 has 8 cores, each running 8 threads (the T1 had 4 threads per core). Every time a thread waits for memory, another one executes. Under multitasking Unix loads, such chips are close to 100% utilized in spite of DDR2 memory and smaller caches than current Intel CPUs. AmigaOS is single-CPU, and good Amiga software isn't CPU-bound: most work is done by hardware accelerators. That's why NatAmi will already be good at utilizing the bandwidth of DDR2 RAM, but it could be much better if the blitter could be multithreaded, and the 3D core too, in a similar way to Sun's processor: when one job is stalled, run another.
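The latency-hiding argument can be put into numbers with a simple, idealized model: each hardware thread computes for `compute` cycles, then stalls for `stall` cycles on memory, and the core switches to another ready thread at zero cost. One thread keeps the core busy compute / (compute + stall) of the time, so N threads reach N times that, capped at 100%. The cycle counts are made up for illustration, not UltraSPARC measurements.

```python
def core_utilization(threads: int, compute: int, stall: int) -> float:
    """Fraction of cycles doing useful work, assuming zero-cost thread switching."""
    if threads < 1 or compute <= 0 or stall < 0:
        raise ValueError("need threads >= 1, compute > 0, stall >= 0")
    single = compute / (compute + stall)  # utilization with one thread alone
    return min(1.0, threads * single)


# Example: 10 cycles of work, then a 70-cycle wait on DRAM.
for n in (1, 4, 8):
    print(n, core_utilization(n, compute=10, stall=70))
```

The same reasoning is behind the wish for a multithreaded blitter: a second job could use the DRAM while the first one waits.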
| |
Wojtek P Poland
| | Posts 1597 06 Jun 2010 13:14
| @Gunnar I know. Today's method of fully simulating a common memory with cache coherence is plain stupid, done this way. It's good NatAmi doesn't try this. Your multicore PPC system example is about Unix servers, but it's a very good one. It's dumb software that forces them to do it. 99% of the memory mapped into Unix processes DOES NOT need cache coherency. Program code is shared but read-only; software and the MMU should be used to wipe cached data when programs are removed from RAM and new ones are loaded from disk into the same place. Most accessed data memory is private too: stacks, per-process/per-thread data. What needs to be shared is the main data that is R/W for programs, say a database table. The MMU could be used to mark it, and then the slow cache coherency would be used only on it. Or even just make it uncached, with only prefetch/write buffer logic. This would be simpler, but if you know today's "modern" Unix software... well, no comment. I am a Unix user, BTW. For NatAmi, separate Chip/Fast RAM is the best solution, but an extra prefetch buffer and write buffer (each one DRAM burst in size) would be advantageous. What is excellent compared to the PC is that Chip RAM is accessible to ALL hardware. On a PC there is graphics card memory, but it's only for the graphics card, and CPU access to any part of this RAM is just incredibly slow; I would say something like 1000 cycles. That's why ALL PC graphics card drivers are optimized for batch mode: move lots of data to/from the GFX card, then execute something on it. Fast RAM should be on the CPU board, not the mainboard. But I understand that the NatAmi LX is temporary; the final model will have the CPU on the mainboard. As for PCs: as multicore processors become more and more common, the cache coherency bottleneck is already serious, and soon it will bring them to a halt. Until the software is rewritten, it's a dead end.
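This proposal amounts to choosing a caching policy per page through the MMU. A minimal sketch with hypothetical names, assuming only two page attributes (shared, writable): private or read-only pages stay cached with no coherence traffic, and only shared read/write pages pay for coherence or are left uncached behind a prefetch/write buffer. The cache flush needed when read-only code is unloaded, mentioned above, is not modeled.

```python
from enum import Enum


class CachePolicy(Enum):
    CACHED = "cached, no coherence traffic"
    COHERENT = "cached, kept coherent by snooping"
    UNCACHED_BUFFERED = "uncached, prefetch/write buffer only"


def page_policy(shared: bool, writable: bool, avoid_snooping: bool = False) -> CachePolicy:
    """Pick a caching policy for one page from its (hypothetical) MMU attributes."""
    if not (shared and writable):
        # Private data (stacks, per-thread data) or read-only code:
        # no other CPU writes to it while mapped, so no snooping is needed.
        return CachePolicy.CACHED
    # Shared read/write data, e.g. a database table used by several programs.
    return CachePolicy.UNCACHED_BUFFERED if avoid_snooping else CachePolicy.COHERENT


pages = {
    "program code": (True, False),
    "thread stack": (False, True),
    "database table": (True, True),
}
for name, (shared, writable) in pages.items():
    print(f"{name}: {page_policy(shared, writable).value}")
```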
| |
Marcel T/Freshman Germany
| | Posts 12 11 Jun 2010 19:17
| Wow, superb.
| |
Thomas Hirsch Germany
| | (MX-Board Owner) Posts 647 20 Jun 2010 18:22
| Paula UART: The Paula serial port is now working! This allows the NatAmi, for the first time, to establish full communication with the outside world!
This picture shows AWeb browsing through a PPP connection over the serial port and writing forum posts.
Frame generation .......... ECS, fixed 28MHz pixel clock
SyncZorro Interface ....... preliminary version
Copper .................... fully implemented, with buffered data fetch
Video DMA ................. fully implemented
256 color registers ....... fully implemented
Sprites ................... 16bit linebuffer
Blitter ................... basic implementation. Block and fill mode only, line to come
Video priority ............ half implemented
Scandoubler ............... fully implemented
Interrupts ................ fully implemented
Paula DMA control ......... fully implemented
Audio out ................. fully implemented
VGA out ................... working
DVI out ................... o
PCI ....................... o
IDE ....................... fully implemented
CIAs ...................... fully implemented
Disk DMA .................. 880k and 1760k, read only (new)
Serial Port Paula UART .... fully implemented
Slow peripheral I/O ....... fully implemented (Joy/Mouse/Keyb/PRT/DSK/SER)
PC mouse and kbd support .. o
Fast RAM controller ....... o
Kickstart flash logic ..... o
Battery-backed up clock.... o
15k Video out ............. o
15k Video in .............. o
Audio in .................. o
| |
Marcel Verdaasdonk Netherlands
| | Posts 3991 20 Jun 2010 19:09
| Progress this is cool!
| |
Wojtek P Poland
| | Posts 1597 20 Jun 2010 19:23
| @Thomas If you don't already have a plan for what to work on next, my proposal is: blitter (complete), Fast RAM, frame generation (all video modes), DVI out, PC mouse and kbd support, RTC, audio in, TV out, Kickstart flash logic :)
| |
Fahed Al Daye Canada
| | Posts 282 20 Jun 2010 19:29
| Wojtek P wrote:
| @Thomas If you don't already have a plan for what to work on next, my proposal is: blitter (complete), Fast RAM, frame generation (all video modes), DVI out, PC mouse and kbd support, RTC, audio in, TV out, Kickstart flash logic :)
|
TV out! TV out! You don't know how important the TV out feature is for ME! I am running my entire NatAmi hooked up to the TV. My TV is my monitor for the NatAmi, like it is for my A1200.
| |
Fahed Al Daye Canada
| | Posts 282 21 Jun 2010 00:07
| Thomas Hirsch wrote:
| [...] Video priority ............ half implemented [...]
|
Curious question about "Video priority ............ half implemented". What is special about it? What if it is never fully implemented while the rest of NatAmi is implemented 100%: would it break performance and compatibility? What does it do?
| |
Thomas Hirsch Germany
| | (MX-Board Owner) Posts 647 21 Jun 2010 01:33
| Fahed Al Daye wrote:
| Curious question about "Video priority ............ half implemented". What is special about it? What if it is never fully implemented while the rest of NatAmi is implemented 100%: would it break performance and compatibility? What does it do?
|
There is nothing special about it. It will be fully implemented in time. It will not break performance or compatibility when completed.
Fahed Al Daye wrote:
| What does it do?
|
It generates the CLUT values. For further information please see the Hardware Reference Manual chapters about playfield hardware, dual playfield, sprites and collision detection.
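To illustrate what "generating the CLUT values" means, here is a simplified sketch of OCS/ECS dual-playfield priority, based on the Hardware Reference Manual's description rather than on NatAmi's implementation. Assumptions, worth checking against the HRM: playfield 1 uses color registers 0-7 and playfield 2 uses 8-15; a pixel value of 0 is transparent; sprite n uses registers 16 + 4 * (n // 2) + 1..3, and lower-numbered sprites are in front; each playfield's priority code (PF1P/PF2P in BPLCON2) is the number of sprite pairs in front of it; and, as a simplification, PF1 vs PF2 is resolved first using PF2PRI, then the winner is compared against the sprites using its own code. Attached sprites, single-playfield mode, AGA color offsets and collision detection are left out; `clut_index` is a hypothetical name.

```python
def clut_index(pf1: int, pf2: int, sprites: list[int],
               pf1p: int, pf2p: int, pf2pri: bool) -> int:
    """Return the color register (CLUT index) shown for one pixel.

    pf1, pf2:   3-bit playfield pixel values (0 = transparent)
    sprites:    eight 2-bit sprite pixel values, sprite 0 first (0 = transparent)
    pf1p, pf2p: 0..4, number of sprite pairs in front of that playfield
    pf2pri:     True if playfield 2 is in front of playfield 1
    """
    # 1. Pick the front-most non-transparent playfield pixel.
    layers = [(pf1, pf1, pf1p), (pf2, 8 + pf2, pf2p)]
    if pf2pri:
        layers.reverse()
    pf_color, pf_code = None, 0
    for value, color, code in layers:
        if value:
            pf_color, pf_code = color, code
            break

    # 2. Pick the front-most sprite pixel (lowest sprite number wins).
    for number, value in enumerate(sprites):
        if value:
            pair = number // 2
            # The sprite shows if no playfield pixel is set,
            # or if its pair sits in front of the winning playfield.
            if pf_color is None or pair < pf_code:
                return 16 + 4 * pair + value
            break  # higher-numbered sprites are behind this one, so also behind the playfield

    # 3. Playfield pixel, or the background color (register 0).
    return pf_color if pf_color is not None else 0
```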
| |
Fahed Al Daye Canada
| | Posts 282 21 Jun 2010 01:43
| So this is important for collision detection, sprites and dual playfield? So without it being fully implemented, games will not function correctly?
| |
|