
Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.
Please read the forum usage manual.



Do you have questions about the Natami?
Post it here and we will answer it!

How Will Robin Communicate With 68050?
Samuel D Crow
USA
(Natami Team)
Posts 1295
02 Nov 2009 21:19


One question I tried to bring up in the FPU thread was the title of this thread.  How will Robin threads communicate with the main thread of the 68050?  And while we're on the subject, how will robin threads communicate with each other?

In the FPU thread Gunnar brought up the possibility of using one of the threads of the Robin core as an FPU but I'm not convinced that is a general enough solution.  If the compiler is already capable of trapping FPU calls and rerouting them through a library then that is good enough for most software, assuming that there is good communication between the IEEESingBas.library and the Robin core through some sort of interprocess or intercore communications link.

What should that link be, however?

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
03 Nov 2009 07:04


But isn't the situation like this:
If the library "Puts an instruction into Robin" then there are two options:
 
a) For zero latency and best performance - there will need to be a free thread awaiting this instruction. Effectively one thread needs to be allocated to this.

 
b) If there is no allocated free thread then this could take infinitely long. It could be that all 4 threads are busy with big jobs. A single FPU instruction would then have to wait for Robin to become free - which could take forever if Robin is decoding a video.

 
I like the zero latency option much better.
What do you say?

One Thousand
USA

Posts 832
03 Nov 2009 13:14


I think you can see this as two separate matters.  And I think you can do both at the same time.

As Gunnar described, allocate a Robin thread like a CIA timer or whatever.  You have claimed it for a certain task and it is claimed until released.  I think this is the best answer for this.

Crow wants an easy and efficient method in hardware of communicating between threads.  I think perhaps this would be helpful to make multi-threaded coding easier, and could be combined with the allocation scheme above.

So if we were using a Robin thread for FPU things within the regular 68k instruction stream, we would allocate a thread for that task.  Then have that communication mechanism send the required operation over to Robin for accelerated emulation.

Or perhaps one thread in Robin is doing some type of physics, and needs to send a flag to the audio mixer thread also in Robin that something has collided and to mix in the appropriate sound. 

So I do not think it is just trying to get a little job picked up by Robin when a thread is free, but to talk between threads that are already running.

Samuel D Crow
USA
(Natami Team)
Posts 1295
03 Nov 2009 15:34


@Gunnar

I was thinking how this scenario seems like the sprites versus bobs discussion we had earlier.  A thread hardwired to do nothing but floating point costs something for every program while a thread allocated by the IEEESingBas.library would only be allocated if floating point is used by the calling program.  If a Robin thread couldn't be allocated, then IEEESingBas.library would fail to open and the host program would report an error.  This would not cause a deadlock.
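
The allocate-on-demand scheme described above could look roughly like the following C sketch. It assumes four hardware threads (the number mentioned earlier in the thread); `robin_alloc_thread`, `robin_free_thread` and the bitmask are made-up names for illustration, not an existing Natami or AmigaOS interface.

```c
#include <assert.h>
#include <stdint.h>

#define ROBIN_THREADS 4          /* assumption: four hardware threads */

/* One bit per hardware thread; a set bit means "claimed". Hypothetical. */
static uint8_t robin_claimed = 0;

/* Claim the first free thread; returns its index, or -1 if all are busy. */
int robin_alloc_thread(void)
{
    for (int i = 0; i < ROBIN_THREADS; i++) {
        if (!(robin_claimed & (1u << i))) {
            robin_claimed |= (uint8_t)(1u << i);
            return i;
        }
    }
    return -1;   /* the caller (e.g. a library's open routine) fails cleanly */
}

/* Release a previously claimed thread. */
void robin_free_thread(int i)
{
    assert(i >= 0 && i < ROBIN_THREADS);
    robin_claimed &= (uint8_t)~(1u << i);
}
```

A library's open routine would call `robin_alloc_thread()` once and refuse to open on -1, so a program never waits on a busy Robin. In a multitasking system the claim itself would have to be made atomic, which this sketch does not attempt.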

@OneThousand

Indeed, I see this as a prerequisite decision to the other FPU issue.  If we can have a channel of communication between the 68050 and each of the threads used by the program mapped into the FPU instruction set of the 68050, then we can make the libraries efficient enough that a hardwired solution wouldn't be necessary.

If we can't figure out a "mailbox" of sorts between threads, then we'll be forced to communicate using main memory as the intermediate step which will be slow.

One Thousand
USA

Posts 832
03 Nov 2009 15:42


Yes, this is how I see it too.  I also think it is an important matter to consider, even before having answers to the FPU implementation.  I think Robin should be a priority.

Samuel D Crow
USA
(Natami Team)
Posts 1295
04 Nov 2009 02:38


How about this:  Robin will have control registers mapped into memory.  68050 will communicate with Robin by passing parameters using a MOVEM.L to or from an absolute address in Robin's control register area.  This way we won't even need to add any instructions to the 68050 instruction set and the Natami will work with the 68060 plugged in also.

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
04 Nov 2009 07:01


Regarding the communication of Robin-threads.
My concept is not finished and there is room for improvements.

The main idea is that Robin has multiple program counters, one for each thread. These program counters will sit somewhere in the DFFxxx area. The 68K can read and write these PCs.

Robin will do its own DMA to fetch data into registers and can do its own DMA to store registers in memory. Robin can also store registers in the CHIPSET DFFxxx area. Thereby Robin can poke AUDIO-output, control the Blitter and can control the TAMI.

There are several options for controlling the starting and finishing of Robin threads.
- To start a ROBIN thread the 68K will need to set the PC to the start of the Robin routine.
- If a Robin thread is finished it could shut itself down by jumping to an illegal address e.g. FFFFFFFF. An ODD PC address will disable ROBIN.
- The 68K could set the PC of Robin pointing to a Robin routine which saves all Robin registers to memory or restores them from it. When the registers are saved the PC could be set to the new Robin routine.
This way you can in theory even run an infinite number of tasks on Robin.
- In theory a Robin HW-thread could poke the PC of another Robin HW-thread; this way Robin could start/stop and control itself. ("It's alive, Igor!" *evil laugh* muhahahaa)
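
As a rough illustration of the start/stop rules in this list, here is a C sketch that models the per-thread PC registers as a plain array. The thread count, the array and all function names are assumptions made for the example; on real hardware the PCs would live at fixed addresses in the DFFxxx area, which the post does not specify.

```c
#include <assert.h>
#include <stdint.h>

#define ROBIN_THREADS 4                 /* assumption: four hardware threads */
#define ROBIN_PC_HALT 0xFFFFFFFFu       /* odd address = thread disabled */

/* Stand-in for the memory-mapped per-thread PC registers (hypothetical). */
static volatile uint32_t robin_pc[ROBIN_THREADS] = {
    ROBIN_PC_HALT, ROBIN_PC_HALT, ROBIN_PC_HALT, ROBIN_PC_HALT
};

/* A thread runs only while its PC is even. */
int robin_is_running(int t)
{
    assert(t >= 0 && t < ROBIN_THREADS);
    return (robin_pc[t] & 1u) == 0;
}

/* 68K side: start a thread by writing the routine's (even) address. */
void robin_start(int t, uint32_t routine)
{
    assert(t >= 0 && t < ROBIN_THREADS);
    assert((routine & 1u) == 0);
    robin_pc[t] = routine;
}

/* Robin side (simulated): a finished thread jumps to an odd address. */
void robin_halt(int t)
{
    assert(t >= 0 && t < ROBIN_THREADS);
    robin_pc[t] = ROBIN_PC_HALT;
}
```

Because starting, stopping and status checks are all just reads and writes of one register, the same three operations would serve the 68K and, as the last bullet suggests, another Robin thread.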

Each Robin HW-thread has its own set of 64-Registers.
We could slightly change this so that 1-2 registers are physically the same register for all threads. This way Robin threads could set semaphores for each other using their register set.
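
A small C model of that register layout may make the idea more concrete. It assumes four threads and that the top two of the 64 register numbers are the shared ones; both choices, and the names `robin_read` / `robin_write`, are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define ROBIN_THREADS   4    /* assumption: four hardware threads */
#define ROBIN_REGS      64   /* per-thread register count from the post */
#define ROBIN_SHARED    2    /* assumption: the top 2 registers are shared */

static uint32_t private_regs[ROBIN_THREADS][ROBIN_REGS - ROBIN_SHARED];
static uint32_t shared_regs[ROBIN_SHARED];

/* Map (thread, register number) to storage: shared numbers alias one cell. */
static uint32_t *robin_reg(int thread, int r)
{
    assert(thread >= 0 && thread < ROBIN_THREADS);
    assert(r >= 0 && r < ROBIN_REGS);
    if (r >= ROBIN_REGS - ROBIN_SHARED)
        return &shared_regs[r - (ROBIN_REGS - ROBIN_SHARED)];
    return &private_regs[thread][r];
}

void robin_write(int thread, int r, uint32_t v) { *robin_reg(thread, r) = v; }
uint32_t robin_read(int thread, int r)          { return *robin_reg(thread, r); }
```

With this mapping, a physics thread could write a flag to register 63 and an audio mixer thread would see it on its next read of its own register 63, without going through main memory.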

I think the above is simple but gives us a lot of freedom in designing complex solutions.

In some ways, Robin works very similar to the Copper.

There is one challenge that I see so far:
The Copper did not need a cache, but to get good performance ROBIN will need an instruction cache.
Currently I'm unsure what the best way of designing the cache or buffer will be.
We talked about creating a ZERO-Page local-store of, for example, 4KB.
Such a zero-page will work fine but it will limit the size of the ROBIN programs - unless ROBIN itself reloads more routines into the buffer via DMA.
We have to find out what size is enough for most tasks and whether a real dynamic cache, which allows more instructions to be reloaded, would not be better. If we have a dynamic cache then the best might be to split it from the beginning into an individual cache for each thread. This way we can guarantee that one thread will not push the others out of the cache.
A simple solution might be to have a direct-mapped 1KB cache per thread. The cache will reload new content automatically via DMA. This way a ROBIN program can be of unlimited size.
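
To show what "direct mapped" means in practice, the following C sketch models the hit/miss behaviour of one such 1KB per-thread cache. The 16-byte line size is an assumption (the post gives only the total size), and the type and function names are hypothetical. Only tags are tracked, not the cached instructions themselves.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_BYTES 1024u                      /* 1KB per thread, from the post */
#define LINE_BYTES  16u                        /* assumption: 16-byte lines */
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES) /* 64 lines */

typedef struct {
    uint32_t tag[NUM_LINES];
    uint8_t  valid[NUM_LINES];
} robin_icache;

/* Returns 1 on a hit, 0 on a miss. A miss models the automatic DMA reload
   by installing the new tag, evicting whatever used that line before. */
int icache_access(robin_icache *c, uint32_t addr)
{
    uint32_t index = (addr / LINE_BYTES) % NUM_LINES;  /* which line */
    uint32_t tag   = addr / CACHE_BYTES;               /* which 1KB block */

    if (c->valid[index] && c->tag[index] == tag)
        return 1;
    c->valid[index] = 1;
    c->tag[index]   = tag;
    return 0;
}
```

Under these assumptions a loop that fits within 1KB misses only on its first pass, while two routines whose addresses differ by a multiple of 1KB map to the same lines and evict each other. That conflict is the usual price of a direct-mapped design.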

With a small individual cache each Robin thread will behave a little bit like a Super-Powerful 68020 CPU.

Any comments, ideas?

Cheers

Claudio Wieland
Germany
(Natami Team)
Posts 703
04 Nov 2009 11:16


Awesome ideas. Robin DMA-feeding itself sounds very nifty. If I get it right, Robin will multiply the system's performance manifold this way ^^ . This is going to be a hell of a little machine :-))

Marcel Verdaasdonk
Netherlands

Posts 3975
04 Nov 2009 13:43


Why have two registers common to all threads?
If you make the threads aware of each other, wouldn't it be wise to just add bits to the register addressing to select a register of another thread?

And yes, 1KB per thread could suffice when updated regularly.

But how would we do DMA: one channel per thread, or one for all of Robin?

I think a DMA channel per thread would be cool. :P

Samuel D Crow
USA
(Natami Team)
Posts 1295
04 Nov 2009 16:14


As for the idea of local-store vs. cache, local store is only useful if you need to burst in large quantities of data sequentially.  For code this would be bothersome but for data it is sometimes useful.  Since most of the parameter passing to the chipset registers does not support burst fetches or stashes anyway, the main advantage to local-store is lost.  A direct-mapped cache for each thread is the best way to go.

Marcel Verdaasdonk
Netherlands

Posts 3975
06 Nov 2009 02:18


Hm, let's see where I see possible trouble.

DMA: 4 channels or 1? (This is important for the addressing bits used.)
Cache: if you have one 4KB cache, one thread could well claim it all, pushing the others out of it.
Having it divided between the threads could solve this.
The cache would have to be flushed because of self-modifying code, at least less without a bcc.

Well, communication between the bus and the unit is clearer now, but how about between the threads?

Samuel D Crow
USA
(Natami Team)
Posts 1295
06 Nov 2009 02:34


@Marcel

Cache snooping might be preferred over a full cache dump but that takes a lot of LU, I think.  Especially with 4 caches separate from each other.

Marcel Verdaasdonk
Netherlands

Posts 3975
06 Nov 2009 02:40


Samuel, to be honest I would be amazed if it really were more than an over-glorified buffer.

One Thousand
USA

Posts 832
06 Nov 2009 02:58


I favor having buffers over caches right now.  If a nice mechanism is set up to do it, I think that would alleviate some of the pain.  The explicit loading allows Robin to be proactive (hopefully ready for when you want to move on) instead of reactive (waiting when something misses).  Robin is intended for small media loops, not big programs.

I think a good way to look at this is with the small kernel idea that many GPUs or media processors use.  Load up a kernel or tool, work on it.  Load up another that is needed soon.  Move to that, eject what you are not using.  And so on.

Perhaps there is a good way to encapsulate them as objects where addressing is taken care of more easily.

Samuel D Crow
USA
(Natami Team)
Posts 1295
07 Nov 2009 01:47


Ok One Thousand.

If that is the case then we'll use a local-store type of idea.  It will be for code only though in this version.  Who needs a data storage area when you have 62 general purpose registers and a program counter addressable by the instruction set?  Not many people!  Especially when there is a way to pull in 4 registers at a time with a burst fetch.  The only time you'd need more than that burst fetch is if you have fully interleaved memory like RAMBus memory.  Other than that?  Hardly ever!

Now the question is:  Will the registers that control the local store be memory mapped or will they be mapped into the registers of the Robin itself?  I think they should be memory mapped so that the local store can be prefetching and then execute the program counter when memory is finished fetching.  Otherwise there would be too many things tying up the bus at once.

One Thousand
USA

Posts 832
07 Nov 2009 02:16


I agree that the local store should be for code only.  As you said, 62 GPRs with a 4 register burst is rather nice.  The only thing that does bug me a bit with having so many GPRs is that all the loops may be bigger because you must explicitly address each register.  Having a level of indirection might be handy for that.  Although, without loop acceleration, the smaller loops take more maintenance.  *Sigh*

Memory mapped control registers sound good to me.

Were there any more thoughts on thread communication?  I guess this could include how to manage "tools"/RobinObjects in a nice and easy fashion. 

In the other thread, it sounded like Gunnar was getting geared up for a go on a small first version, and this is critical stuff.

Samuel D Crow
USA
(Natami Team)
Posts 1295
07 Nov 2009 02:33


I was thinking that maybe one bank of 3 or 4 GPRs in addition to the Program Counter could be memory addressable so that the 68050/060 could read and write some of the values to the Robin thread before the program counter was activated.  This would also allow other threads to write to each other as well.

My only concern with this method is how do we handshake with the '050/060?  A halt instruction in the Robin core would probably be implemented by an ORI #1,PC to set it to an odd address.  Perhaps a better halt would be to set the PC to 0 using an OR R0,R0,PC.  That way the completion of the calculation by the Robin core could be indicated to the CPU with a single TST.L on the address of the Program Counter register of the Robin thread.

What do you think of that idea?

One Thousand
USA

Posts 832
07 Nov 2009 02:50


Yeah, handshaking is also a concern.  I think the biggest question in my mind right now is "WHAT exactly is being communicated?"

Samuel D Crow
USA
(Natami Team)
Posts 1295
07 Nov 2009 02:57


Well, I'm thinking that for the IEEESingBas.library we're going to have to pass two floats in and get a float out for most operations.  The trick is going to be knowing when the operation is done since the Robin and CPU have different clock speeds.  Being able to read the value of the Robin thread's program counter is going to be key.  If the program counter is forced to NULL whenever an illegal value is fed into it, that will make it easy to know when the subroutine in the Robin thread is complete.
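
The two-floats-in, one-float-out handshake could be sketched in C as follows. Everything here is an assumption for illustration: the `robin_mailbox` layout, the routine address, and the `robin_sim_step` function that stands in for the Robin thread, which on real hardware would run by itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory-mapped window of one Robin thread. */
typedef struct {
    volatile uint32_t pc;      /* 0 (NULL) = idle or finished */
    volatile float    arg0;
    volatile float    arg1;
    volatile float    result;
} robin_mailbox;

#define ROBIN_FADD_ROUTINE 0x00020000u   /* made-up routine address */

/* Simulated Robin side: run the routine, then force the PC to NULL. */
static void robin_sim_step(robin_mailbox *m)
{
    if (m->pc == ROBIN_FADD_ROUTINE)
        m->result = m->arg0 + m->arg1;
    m->pc = 0;
}

/* CPU side: pass two floats in, start the thread, poll the PC, read back. */
float robin_fadd(robin_mailbox *m, float a, float b)
{
    m->arg0 = a;
    m->arg1 = b;
    m->pc   = ROBIN_FADD_ROUTINE;     /* writing the PC starts the thread */
    while (m->pc != 0)                /* the poll: is the PC NULL yet? */
        robin_sim_step(m);            /* simulation only */
    return m->result;
}
```

The poll loop is where the different clock speeds are absorbed: the CPU makes no timing assumption and simply waits until the PC reads back as NULL before fetching the result.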

Marcel Verdaasdonk
Netherlands

Posts 3975
07 Nov 2009 04:07


Uhem, only one would be used by the CPU.
Wouldn't the others be DMA driven like the rest of the chipset?
