Welcome to the Natami / Amiga Forum. This forum is for AMIGA fans interested in the new NATAMI platform.
Welcome to the Natami lounge. Meet new AMIGA friends here and enjoy a friendly chit-chat.
(Non-68K) Supervisor Entry Models?
Marcel Verdaasdonk Netherlands
Posts 3991 02 May 2012 23:50

Okay Megol, you've set it to supervisor mode, now what? Oh wait, you need instructions. ;) TAS is required for semaphores; you want it to be atomic so you can prevent a potential lockup.
Thomas Richter Germany
(MX-Board Owner) Posts 1425 03 May 2012 08:28

deep sub micron wrote:
"But now back to cache pollution: I wonder what the difference is between a function call and a syscall? Both should pollute the cache. If so, then the same tricks for reducing syscalls should also speed up library calls, which also massively pollute the cache."

This is where I believe the article is not quite fair. Of course this will cause cache pollution, no matter whether it is a syscall or a function call. There are a couple of noteworthy points, though.

First, it *might* make a difference if address spaces are switched (by means of the MMU). This means that at least the MMU address translation cache needs to be refilled (probably a few hundred cycles, not much); it *could* also mean that user code/data and OS code/data share the same addresses and hence occupy the same cache lines. But this is not a problem of the calling mechanism at all.

What the authors propose will, on a non-unified cache, pollute the cache of another processor instead (namely the one executing the OS call), and in that sense prevent the cache pollution. However, all the x86 CPUs have a unified cache, except for the L1 cache as I remember, so I wonder whether that actually makes much of a difference. At least the L2 and L3 caches will be polluted, and the data deposited by the "OS core" will still have to find its way from the L1 cache of the "OS core" to the L1 cache of the "user core", so why that is any better than polluting the L2 cache, I do not know.

deep sub micron wrote:
"And on AmigaOS, where no mode switch is required (since everything runs in user mode), does a syscall cause a similar overhead, only due to cache pollution?"

The same type of overhead, at least. It is unavoidable if you execute a function, and entering supervisor mode itself is *not* the costly operation. However, caches are much smaller on the 68K, and the gap between core clock and memory clock is not as extreme as it is for the x86 cores. Hence, it really doesn't matter that much.
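On AmigaOS a library call is an ordinary subroutine call: the caller loads the library base into A6 and does a JSR to a negative offset from it, where the library's jump table lives, all without leaving user mode. As a rough illustration of why this costs no more than an indirect function call, here is a minimal C model. The `ToyLibrary` structure, the vector indices and the functions are hypothetical stand-ins, not the real AmigaOS definitions.

```c
#include <assert.h>

/* Hypothetical model of an AmigaOS-style library. On real hardware the
   jump table sits at negative offsets from the base address in A6 and the
   caller does "JSR offset(A6)"; here it is an array of function pointers. */
typedef struct ToyLibrary ToyLibrary;
typedef int (*LibVector)(ToyLibrary *base, int arg);

enum { LVO_DOUBLE, LVO_NEGATE, LVO_COUNT }; /* hypothetical vector slots */

struct ToyLibrary {
    LibVector vectors[LVO_COUNT]; /* stand-in for the jump table */
    int call_count;               /* library-private state kept in the base */
};

static int lib_double(ToyLibrary *base, int arg) {
    base->call_count++;
    return arg * 2;
}

static int lib_negate(ToyLibrary *base, int arg) {
    base->call_count++;
    return -arg;
}

static void toylib_init(ToyLibrary *lib) {
    lib->vectors[LVO_DOUBLE] = lib_double;
    lib->vectors[LVO_NEGATE] = lib_negate;
    lib->call_count = 0;
}

/* The "system call": one indirect call through the base pointer.
   No trap and no privilege change take place. */
static int toylib_call(ToyLibrary *lib, int lvo, int arg) {
    assert(lvo >= 0 && lvo < LVO_COUNT);
    return lib->vectors[lvo](lib, arg);
}
```

Seen this way, the only thing a library call adds over a direct call is the indirection; any cache pollution comes from the code and data the called function touches, which is the same for any function call.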
Thomas Richter Germany
(MX-Board Owner) Posts 1425 03 May 2012 08:31

deep sub micron wrote:
"If a mode switch is caused by an exception, the processor usually assumes the exception is unlikely and puts the code after the syscall into the pipeline. In the syscall case the probability of an exception is 100%, and the pipeline needs to be flushed."

I don't quite see the point. Of course the processor knows where execution continues after a line-A trap: in the line-A exception handler, where else? The question is rather whether the CPU designer was smart enough to actually implement this prefetch. Whether you call it "syscall" or "line-A" doesn't really make any difference.

deep sub micron wrote:
"That is a good reason to introduce a syscall opcode, so that a processor can predict the switch to system mode and fill the pipeline with the correct instructions. In that case there should be no pipeline flush anymore."

There shouldn't be one in general. But leaving that aside, the 68020 had the "syscall opcode" you mentioned. It was called "callm", though nobody ever used it.
Thomas Richter Germany
(MX-Board Owner) Posts 1425 03 May 2012 08:34

Marcel Verdaasdonk wrote:
"TAS is required for semaphores; you want it to be atomic so you can prevent a potential lockup."

In a single-core system (such as the Amiga) TAS is completely superfluous. What do you need atomicity for if there is only one CPU in the system in the first place? (-; Otherwise, TAS is a bit limited. Arguably, there is CAS (which is more useful) and, except on the 060, even CAS2 (definitely useful for lock-free queues), though something like an atomic add or atomic subtract is missing. The x86 CPUs have these (or rather, they have the LOCK prefix, which achieves much the same). They do come in handy at times.
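The point about CAS can be made concrete: with only compare-and-swap available, an atomic add has to be written as a read, compute, CAS retry loop. The following is a minimal C11 sketch using `<stdatomic.h>`; `cas_fetch_add` is a hypothetical helper name, and the loop stands in for what 68K code would do with CAS. In portable C you would normally just call `atomic_fetch_add`, which on x86 typically compiles to a LOCK-prefixed instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: atomic add built only from compare-and-swap,
   the way 68K code has to do it with CAS since there is no locked ADD. */
static long cas_fetch_add(_Atomic long *target, long delta) {
    assert(target != NULL);
    long old = atomic_load(target);
    /* If another bus master changed *target after we read it, the CAS
       fails, 'old' is reloaded with the current value, and we retry. */
    while (!atomic_compare_exchange_weak(target, &old, old + delta)) {
        /* retry with the refreshed value in 'old' */
    }
    return old; /* value before the addition, like x86 LOCK XADD */
}
```

Without contention the loop normally succeeds on the first pass; the retry only matters once another processor can write the same word between the load and the CAS.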
Nixus Minimax Germany
Posts 275 03 May 2012 09:57

Thomas Richter wrote:
"it *could* also mean that user code/data and OS code/data share the same addresses and hence occupy the same cache lines."

In my opinion this is quite likely. In a four-way associative cache the probability should be 1/4 if the user code fits in one cache frame. If the user code or the supervisor code is larger than that, the probability rises correspondingly.

"But this is not a problem of the calling mechanism at all."

I think that was the original point.

"What the authors propose will, on a non-unified cache, instead pollute the cache of another processor (namely the one executing the OS call) and in that sense prevent the cache pollution. However, all the x86 have a unified cache, except for the L1 cache as I remember, so I wonder whether that actually makes much of a difference."

Well, a polluted L1 will hurt more than a polluted L2, because read latencies grow as you go down the hierarchy. I think the idea of executing OS/library calls on a dedicated core is quite good. But generally this will only bring some benefit while processor load is relatively low. With higher loads, the probability that the process resumes with its code still in the cache, without any intermediate processes having polluted that same cache, is too low.
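The "dedicated OS core" idea can be sketched as a shared request slot: instead of trapping, the caller writes its request to memory, and a thread standing in for the OS core picks it up, runs it and posts the result back. This is a minimal sketch assuming POSIX threads and C11 atomics; `SyscallSlot`, `os_core`, `post_call`, `shutdown_os_core` and the squaring "service" are all hypothetical, and a real design would batch requests and switch to another user thread rather than spin while waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical request slot shared by an application core and an OS core. */
enum { SLOT_FREE, SLOT_SUBMITTED, SLOT_DONE, SLOT_SHUTDOWN };

typedef struct {
    _Atomic int state; /* one of the SLOT_* values */
    int request;       /* argument, written by the caller */
    int result;        /* return value, written by the OS core */
} SyscallSlot;

/* Stand-in for whatever service the OS core performs. */
static int os_service(int request) {
    return request * request;
}

/* Thread body playing the dedicated OS core: poll the slot,
   service submitted requests, exit on shutdown. */
static void *os_core(void *arg) {
    SyscallSlot *slot = arg;
    for (;;) {
        int s = atomic_load_explicit(&slot->state, memory_order_acquire);
        if (s == SLOT_SHUTDOWN)
            return NULL;
        if (s == SLOT_SUBMITTED) {
            slot->result = os_service(slot->request);
            atomic_store_explicit(&slot->state, SLOT_DONE, memory_order_release);
        }
    }
}

/* Caller side: no trap, just publish the request and wait for the reply. */
static int post_call(SyscallSlot *slot, int request) {
    slot->request = request;
    atomic_store_explicit(&slot->state, SLOT_SUBMITTED, memory_order_release);
    while (atomic_load_explicit(&slot->state, memory_order_acquire) != SLOT_DONE) {
        /* a real system would run another user thread here */
    }
    int result = slot->result;
    atomic_store_explicit(&slot->state, SLOT_FREE, memory_order_relaxed);
    return result;
}

/* Stop the OS-core thread; call only when no request is outstanding. */
static void shutdown_os_core(SyscallSlot *slot) {
    atomic_store_explicit(&slot->state, SLOT_SHUTDOWN, memory_order_release);
}
```

The release/acquire pairs on `state` are what make `request` and `result` visible across cores without a lock; whether this beats a trap in practice depends on exactly the cache and load effects discussed in the thread.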
Marcel Verdaasdonk Netherlands
Posts 3991 03 May 2012 15:20

Thomas Richter wrote:
"In a single-core system (such as the Amiga) TAS is completely superfluous. What do you need atomicity for if there is only one CPU in the system in the first place? (-; [...]"

That is your misconception, Thomas: you know the chipset is a bunch of coprocessors that can access memory. Therefore you must know that TAS breaks. I only brought it up because it is one of the few atomic instructions the 68K has. It's not my fault the chipset doesn't play fair and share. ;)
Megol .
Posts 690 04 May 2012 13:55

Marcel Verdaasdonk wrote:
"That is your misconception, Thomas: you know the chipset is a bunch of coprocessors that can access memory. Therefore you must know that TAS breaks. [...]"

In what way do you synchronize with the coprocessors such that you need TAS? TAS/CAS or other synchronization primitives have nothing to do with supervisor entry, and neither do interactions with coprocessors.
Marcel Verdaasdonk Netherlands
Posts 3991 04 May 2012 19:59

That is not what I am saying, Megol. I am saying that when using a semaphore, TAS can be used for it! That is all I am saying, and the reason is that it's an atomic instruction. This also means no interrupt could stop a flag status change, no matter when it takes place; hence no faulty data because of a missed flag. Why would I need a second CPU before I can use this instruction when it already breaks once the chipset is added to the equation?
Thomas Richter Germany
(MX-Board Owner) Posts 1425 04 May 2012 20:59

Marcel Verdaasdonk wrote:
"That is your misconception, Thomas: you know the chipset is a bunch of coprocessors that can access memory."

And your point is...? The blitter couldn't respect TAS in the first place, not only because none of the buses in the Amiga support the read-modify-write (RMW) cycle, but also because it is simply not smart enough to deduce any suitable action from it. Even in a multi-core system TAS is of only limited usefulness, which is what I wanted to express. But all of that is moot for the current Amiga hardware anyhow.
Thomas Richter Germany
(MX-Board Owner) Posts 1425 04 May 2012 21:05

Marcel Verdaasdonk wrote:
"That is not what I am saying, Megol. I am saying that when using a semaphore, TAS can be used for it!"

In a multi-core system, yes. In a multi-core system whose buses respect the RMW cycle, to be more precise. None of which holds for the Amiga hardware.

Marcel Verdaasdonk wrote:
"This also means no interrupt could stop a flag status change, no matter when it takes place."

That is also true for bset #7,(an). It is not atomic, but an interrupt cannot split the instruction in the middle of its execution, i.e. bset #7,(an) does exactly what TAS does if there is only a single core in the system. The important aspect is that *no other processor* can modify the memory while the first processor is executing the TAS. This is *not* the case for bset: bset performs a read and a write; TAS performs a locked read-modify-write cycle.

Marcel Verdaasdonk wrote:
"Why would I need a second CPU before I can use this instruction when it already breaks once the chipset is added to the equation?"

It breaks for a different reason, namely that the bus cycle is not supported by the chipset. Besides, nothing in the chipset is actually clever enough to do anything useful with a flagged bit. In other words, there are simpler means to synchronize the blitter with the CPU.
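The bset/TAS distinction maps fairly directly onto C11: a plain load followed by a store behaves like bset (two separate accesses that another bus master can slip between), while `atomic_fetch_or` is a single indivisible read-modify-write like TAS. This is a minimal sketch; `bset7_like`, `tas_like` and the spinlock helpers are hypothetical names, and whether the compiler emits a genuinely locked bus cycle depends on the target (on the Amiga, as Thomas points out, the buses would not honour it anyway).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Like "bset #7,(an)": a read followed by a separate write. Another bus
   master may modify *flag between the two accesses. (In C, unlike the
   single 68K instruction, even an interrupt could land in between.) */
static bool bset7_like(volatile unsigned char *flag) {
    unsigned char old = *flag;            /* read access  */
    *flag = (unsigned char)(old | 0x80u); /* write access */
    return (old & 0x80u) != 0;            /* true if bit 7 was already set */
}

/* Like "tas (an)": one indivisible read-modify-write on bit 7. */
static bool tas_like(_Atomic unsigned char *flag) {
    unsigned char old = atomic_fetch_or(flag, 0x80u);
    return (old & 0x80u) != 0;
}

/* Minimal spinlock on top of the TAS analogue: keep spinning while the
   bit was already set, i.e. while someone else holds the lock. */
static void lock_acquire(_Atomic unsigned char *flag) {
    while (tas_like(flag)) {
        /* busy-wait */
    }
}

static void lock_release(_Atomic unsigned char *flag) {
    atomic_store(flag, 0);
}
```

With a single thread both functions give identical results; the difference only shows up when a second processor touches the same byte between the read and the write.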
Marcel Verdaasdonk Netherlands
Posts 3991 04 May 2012 21:59

Thomas Richter wrote:
"That is also true for bset #7,(an). [...] bset performs a read and a write; TAS performs a locked read-modify-write cycle."

ThoR, you're a great software developer; I am sure you could sell another revision when the rest of the team gets it into their heads to add those extra cores they've been talking about. Of course this is just conjecture, assuming (1) that you sell software and (2) that the other members want to add more processing power.
SID Hervé France
Posts 666 04 May 2012 22:55

Can we imagine that the CPU and the chipset are just clients of the memory subsystem? Maybe this could help solve the problem?
Thomas Richter Germany
(MX-Board Owner) Posts 1425 05 May 2012 19:35

Marcel Verdaasdonk wrote:
"ThoR, you're a great software developer; I am sure you could sell another revision when the rest of the team gets it into their heads to add those extra cores they've been talking about. [...]"

Note that I'm not suggesting any changes to the architecture. I'm just pointing out the implications of TAS on the (current) Amiga infrastructure.
Marcel Verdaasdonk Netherlands
Posts 3991 13 May 2012 05:58

Børge Nøst wrote:
"Does anyone have some good links on different models for going from user to supervisor mode (or, more generally, from a lower to a higher privilege level)? I am thinking of something like a bit in the MMU tables saying that code in that page has the right to change the privilege mode. That would probably need some kind of 'COMEFROM'/'PREAMBLE' instruction that marks the start of a function, so other code can't just jump into the code stream and hijack the escalation. Or are there other architectures that have cheap escalation models instead of the typical interrupt-like type?"

Børge, you're on track to repeating something like the Mach kernel. You could read up on the L4 kernel, which has very little overhead.
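Børge's proposal (an MMU page bit that permits privilege escalation, plus a marker instruction that every legitimate entry point must start with) can be modelled in a few lines of C to make the two checks explicit. Everything here is hypothetical: the `may_escalate` bit, the `ENTRY_MARKER` byte, the 4 KiB page size and the toy `Machine` are assumptions for illustration, not any real MMU format. Loosely related real mechanisms are x86 call gates, which fix the entry point in the gate descriptor, and the ENDBR landing-pad instructions of Intel CET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12    /* hypothetical 4 KiB pages */
#define NUM_PAGES    16
#define ENTRY_MARKER 0xE1u /* hypothetical "PREAMBLE" opcode byte */

typedef struct {
    bool may_escalate; /* hypothetical MMU bit: code here may enter supervisor */
} PageEntry;

typedef struct {
    PageEntry pages[NUM_PAGES];
    uint8_t memory[NUM_PAGES << PAGE_SHIFT]; /* toy code memory */
    bool supervisor;                         /* current privilege level */
} Machine;

/* A jump that requests escalation succeeds only if the target page carries
   the escalate bit AND the target address starts with the entry marker, so
   code cannot jump into the middle of a privileged routine. */
static bool escalating_jump(Machine *m, uint32_t target) {
    uint32_t page = target >> PAGE_SHIFT;
    if (page >= NUM_PAGES)
        return false;                       /* outside mapped memory */
    if (!m->pages[page].may_escalate)
        return false;                       /* page not allowed to escalate */
    if (m->memory[target] != ENTRY_MARKER)
        return false;                       /* not a declared entry point */
    m->supervisor = true;
    return true;
}
```

Both checks happen on the jump itself, so no exception frame is needed; the open question, as the thread goes on to argue, is whether this buys much over a well-implemented trap.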
Megol .
Posts 690 13 May 2012 12:34

Marcel Verdaasdonk wrote:
"Børge, you're on track to repeating something like the Mach kernel. You could read up on the L4 kernel, which has very little overhead."

Again, your post doesn't make any sense. What does the Mach kernel have to do with another type of supervisor entry? What does the L4 kernel have to do with it?
Nixus Minimax Germany
Posts 275 13 May 2012 14:01

Megol . wrote:
"Again, your post doesn't make any sense. What does the Mach kernel have to do with another type of supervisor entry? What does the L4 kernel have to do with it?"

If there were a PM system on this site, he could explain it to you...
Marcel Verdaasdonk Netherlands
Posts 3991 13 May 2012 15:37

The overhead of IPC in the Mach kernel is what makes it so slow (security inside the kernel made IPC costly). You can read exactly which overhead I meant in the Wikipedia article, which is quite complete IMHO. L4Ka is a minimalistic microkernel without this overhead. Again, Google is your friend. ;) And if it still makes no sense, you're not looking in the right places.
Megol .
Posts 690 13 May 2012 21:46

Marcel Verdaasdonk wrote:
"The overhead of IPC in the Mach kernel is what makes it so slow (security inside the kernel made IPC costly). [...] L4Ka is a minimalistic microkernel without this overhead. [...]"

I'm very familiar with L4, the work of Liedtke, and current work in the area, including that of OKL. It still doesn't make any sense, as the same entry model could be used for L4, which (on x86) has used interrupts and later the faster sysenter instruction. In fact, newer versions of L4 have abstracted the kernel entry so as to make it transparently changeable. However, this is very OT, so I'll just stop replying. Edit: forgot the t in Liedtke :/