|FIFO Stacks and Other Mechanics|
24 Feb 2012 13:24
|This is to prevent another topic from going, well, WAY OFF topic. ;)|
Okay, this is not to talk about mechanics that are actually found in the Natami.
At least not in a direct way.
We are going to discuss some mechanics that could be used in arbitration.
And "could" is the magic word here.
I have my own ideas on this, but I'll let others talk about theirs first. ;)
24 Feb 2012 13:30
|I already made some proposals in the other thread. I'll add another one: CPU gets undelayed memory access when processing an interrupt.|
24 Feb 2012 14:28
|Is all Natami memory shared? If so, how about a MK II with separate fastmem? (Would that need two separate FPGAs preferably?)|
24 Feb 2012 15:09
|Hold your horses. Let's complete the first one first. :-)|
24 Feb 2012 15:29
|Børge Nøst wrote:|
| Is all Natami memory shared? If so, how about a MK II with separate fastmem?|
I believe all of the memory can be configured to be "chipmem" or separated into "chipmem" and "fastmem". Physically all devices can access everything because there is only one memory bus.
Having two fast memory buses will probably increase overall cost and power consumption quite a bit. As long as the processor clock speeds are so low in comparison to the memory speed, I think dedicated fastmem is less attractive.
24 Feb 2012 15:51
|Even worse: we can make a logical software split, but if we really have a single memory bus, then we really do have one physical memory type.|
It would make sense to make this split in the bus arbiter, so priorities can be assigned correctly.
Like Nixus already said, you wouldn't notice this if the memory is faster than the CPU.
The reality is that it needs to be about 2 to 3 times faster than the CPU.
And that's not taking the latency into account.
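A back-of-the-envelope sketch of that "2 to 3 times faster" claim, in Python. All the bandwidth figures below are made up for illustration, not real Natami numbers; the point is only that on a single shared bus the memory must cover the *sum* of all demands:

```python
# Rough bandwidth budget for a single shared memory bus.
# All numbers are illustrative placeholders, not measured values.

cpu_demand_mb_s = 200      # hypothetical CPU instruction + data traffic
display_dma_mb_s = 100     # hypothetical display fetch
blitter_mb_s = 150         # hypothetical blitter traffic
audio_io_mb_s = 10         # low-bandwidth DMA (audio, floppy, ...)

# With one physical bus, the memory has to service everyone at once,
# so its bandwidth must exceed the sum of the demands.
total_demand = cpu_demand_mb_s + display_dma_mb_s + blitter_mb_s + audio_io_mb_s

ratio = total_demand / cpu_demand_mb_s
print(f"total demand: {total_demand} MB/s, {ratio:.1f}x the CPU alone")
```

With these example numbers the bus already needs 2.3x the CPU's own bandwidth, before any latency is accounted for.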
Okay, you had your chance, now it's mine. ;)
Here is my idea:
A priority schedule, like the OS has.
Address sorting into a sequence (burst operations).
The latter has the higher priority, since the setup time of the memory latency can then be hidden using burst access.
PS: Another trick for RAS refresh is to have all RAS signals active at the same time, but all except the needed RAS signals should drop before CAS goes active.
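Purely as an illustration (this is not the Natami arbiter; the function name, the merge policy, and the data layout are my own invention), the two ideas above, priority scheduling plus address sorting into bursts, could be sketched in Python like this:

```python
# Toy arbiter sketch: each request carries (priority, address).
# We first sort by address so consecutive addresses become adjacent
# ("address sorting into a sequence"), merge runs into bursts, then
# issue the bursts in priority order.

def arbitrate(requests):
    """requests: list of (priority, address); lower number = more urgent.
    Returns a list of bursts, each a list of consecutive addresses."""
    by_addr = sorted(requests, key=lambda r: r[1])
    bursts = []
    for prio, addr in by_addr:
        if bursts and addr == bursts[-1]["addrs"][-1] + 1:
            # Consecutive address: extend the current burst.
            last = bursts[-1]
            last["addrs"].append(addr)
            last["prio"] = min(last["prio"], prio)  # burst inherits best priority
        else:
            bursts.append({"prio": prio, "addrs": [addr]})
    # Issue bursts highest priority first.
    return [b["addrs"] for b in sorted(bursts, key=lambda b: b["prio"])]

reqs = [(2, 0x100), (0, 0x200), (2, 0x101), (1, 0x102)]
print(arbitrate(reqs))  # 0x100-0x102 merge into one burst; 0x200 goes first
```

The design choice here is that a burst inherits the best priority of its members, so grouping never starves an urgent access that happens to sit inside a run.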
24 Feb 2012 16:43
|Marcel Verdaasdonk wrote:|
| Here is my idea:|
| A priority schedule, like the OS has.|
How would you determine the priority of each access? Obviously the accesses have to be prioritised, arbitration wouldn't make sense without this. So the interesting stuff starts here.
| Address sorting into a sequence.(burst operations)|
How would you sort addresses without adding a lot of latency? And do you really think that the probability is high that you will get random accesses in the same so-and-so many bytes? Especially when they come from different instances like blitter and CPU?
I would expect that if there is a burst, it will usually be a data fetch for display, a data operation by the blitter, or the loading/writing of cache lines by the CPU.
If you have random accesses in a small field of data (a common case would be into a stack frame), this field will be loaded into the cache in a burst anyway, and all subsequent accesses will be serviced by the cache. And sorting these accesses will stall program flow, because you will have to wait for the first data to arrive in the pipeline.
In the case that some code accesses data at incrementing addresses, the cache will also perform burst reads to fill the corresponding cache line. So as far as I understand, apart from I/O, low-bandwidth DMA and the Copper, everything will be burst reads and writes anyway.
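The stack-frame argument above can be made concrete with a tiny cache model. The 16-byte line size and the access pattern are assumptions for illustration; the point is that one burst fill services all later accesses into the same small field:

```python
# Minimal cache model: the first access to a line costs one burst fill
# from memory; every later access to the same line is a cache hit.
LINE_BYTES = 16  # assumed line size, illustrative only

filled_lines = set()
bursts = hits = 0

def access(addr):
    global bursts, hits
    line = addr // LINE_BYTES
    if line in filled_lines:
        hits += 1
    else:
        filled_lines.add(line)
        bursts += 1  # one burst loads the whole line

# Random-looking accesses into one small stack frame:
for a in (0x1004, 0x100C, 0x1000, 0x1008, 0x1006):
    access(a)

print(bursts, hits)  # a single burst fill, then nothing but hits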
24 Feb 2012 23:49
|Each register has an address; we can use this as its ID.|
Another method could be per unit, but this would group the accesses mentioned above.
DDR2 SDRAM has sufficient latency at the start that this could be attained; however, this latency is also lost when a burst access is done, which is the goal:
reducing latency and delays.
This is a mechanism that would make itself unemployed if it operates correctly.
Besides, I forgot to mention this earlier: the first two upcoming commands for memory should be in a FIFO which we shouldn't be able to touch.
This would give us a worst case of about 28 cycles if we had 4-4-4-16 timings. (These are example figures, not real values.)
We only need to be sure the third address is sequential to the second address (but this need not be the case).
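For what it's worth, the ~28 cycles above falls out of the example timings as a straight sum. Reading the 4-4-4-16 figures as CL-tRCD-tRP-tRAS is my assumption; the numbers are the post's stated example values, not real parts:

```python
# Example DDR-style timing parameters from the post (illustrative only),
# interpreted here as CL-tRCD-tRP-tRAS = 4-4-4-16 clock cycles.
CL, tRCD, tRP, tRAS = 4, 4, 4, 16

# A worst-case access that must let the open row finish, precharge,
# activate a new row, and then pay the CAS latency ends up summing
# all four figures, which is where the "about 28 cycles" comes from.
worst_case = CL + tRCD + tRP + tRAS
print(worst_case)  # 28
```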
Okay, a recap:
A FIFO for the next two look-ups.
The rest goes on a FIFO-like stack, which gets sorted on priority when the request is received.
And then a hardware bubble sort will do the address sequencing. (This would overrule the priority when the sequence is more favourable in that manner.)
This only works because of the latency that RAM has, since most of it would be hidden in the setup time.
Remember, this is only read access so far; I doubt using this for write access would be of interest.
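A minimal sketch of that recap in Python, with invented names throughout: a 2-deep "locked" FIFO that is never reordered, a pending queue sorted by priority on arrival, and a bubble-sort pass that swaps neighbours whenever that makes their addresses sequential, overruling priority when a burst is more favourable:

```python
# Sketch of the recap scheduler. All names and the swap rule are my
# own invention; this is a software model of the idea, not hardware.

def schedule(requests):
    """requests: list of (priority, address); lower number = more urgent.
    The first two requests stay locked in arrival order (the untouchable
    FIFO); the rest are reordered."""
    locked, pending = requests[:2], list(requests[2:])
    # 1. Priority sort when the requests are received.
    pending.sort(key=lambda r: r[0])
    # 2. Hardware-style bubble sort: swap neighbours whenever the swap
    #    makes the pair's addresses sequential (ascending), which lets
    #    them share one burst even against priority order.
    changed = True
    while changed:
        changed = False
        for i in range(len(pending) - 1):
            a, b = pending[i], pending[i + 1]
            if b[1] + 1 == a[1]:  # swapping yields a sequential pair
                pending[i], pending[i + 1] = b, a
                changed = True
    return locked + pending

reqs = [(1, 0x50), (0, 0x80), (2, 0x11), (2, 0x13), (1, 0x12)]
print(schedule(reqs))  # the 0x11..0x13 run ends up adjacent after the FIFO
```

As in the recap, the scheme "makes itself unemployed" when it works: once requests stream out as bursts, there is little left in the pending queue to sort.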
25 Feb 2012 00:03
|Err, about your last few lines: go have some fun and disassemble a few pieces of Amiga software.|
Then you will see some PIC programs; burst access with the CPU, nope. ;)
However, one thing does hold true for data chunks: audio samples, copper lists, display data.
But this kite won't fly for the CPU, since we need to keep libraries in mind.