 |
Welcome to the Natami / Amiga Forum. This forum is for AMIGA fans interested in the new NATAMI platform.
|
Do you have questions about the Natami? Post it here and we will answer it!
|
| So, MMU, What's the Story? |
|
|---|
|
|---|
Fabian Nunez USA
| | Posts 312 31 Aug 2009 19:43
| @Gunnar The idea of having two versions of the softcore CPU, one for debugging and one for performance, is IMHO a great way to resolve this dilemma. If one can choose which FPGA config file gets loaded at boot time (like one can with the C-One), that would in my mind completely solve the problem.
| |
Matt Hey USA
| | Posts 734 31 Aug 2009 21:16
| Thomas Richter wrote:
| The problem is not getting the sources of exec. The problem is getting the rights on exec. There isn't much source, actually, exec is pretty tiny, and it doesn't do much.
|
The first problem is finding out who owns the rights to exec. Obtaining the source would be very helpful. It would be commented and would show the tricky inner workings and initialization. Reworking the functions would be the easy part. I believe Piru's exec.library optimized the functions but used an existing exec.library core. This would still violate copyright laws. Using any substantial amount of existing code would require licensing. AROS exec.library is an option but it would probably be substantially slower than the existing one.
| |
Thierry Atheist Canada
| | Posts 1828 31 Aug 2009 21:32
| bernd afa wrote:
| yes, exec library is so small and thin that also AROS exec is very compatible.
|
Considering that the exec.library is such, and with all the crazy fast optimizations that the 68K is getting, are we close to RTOS speeds??? Can context switching be fast enough in this new SUPER Amiga to achieve this? I just love the Amiga. :-)
| |
Wawa Tk Germany
| | Posts 581 31 Aug 2009 22:53
| Matt Hey wrote:
| | I believe Piru's exec.library optimized the functions but used an existing exec.library core. This would still violate copyright laws. |
i dont think pirus exec violates anything. if i recall right it is distributed as a patch that has to be applied via spatch to original exec.library. i think im using pirus exec as resident from flashrom, it identifies itself as 45.20, right?
Matt Hey wrote:
| Using any substantial amount of existing code would require licensing. AROS exec.library is an option but it would probably be substantially slower than the existing one.
|
hmm. to be honest i didnt think much about that up till now, but since bernd says his afaos exec_lib.exe is an almost complete replacement of exec.library functions one could measure the speed differences. since i have pirus exec in rom, the only thing to compare it to aros exec would be to activate and deactivate afaos in the s-s. if only anyone would put together a benchmark.
| |
Matt Hey USA
| | Posts 734 01 Sep 2009 00:03
| wawa tk wrote:
| i dont think pirus exec violates anything. if i recall right it is distributed as a patch that has to be applied via spatch to original exec.library. i think im using pirus exec as resident from flashrom, it identifies itself as 45.20, right?
|
Yes, I believe you are correct and I was wrong. Distributing the patch would probably be legal. exec.library 45.20 is the latest from AmigaOS 3.9 BB2. However, Piru's exec.library 44.1 could have its version string changed.
wawa tk wrote:
| hmm. to be honest i didnt think much about that up till now, but since bernd says his afaos exec_lib.exe is an almost complete replacement of exec.library functions one could measure the speed differences. since i have pirus exec in rom, the only thing to compare it to aros exec would be to activate and deactivate afaos in the s-s. if only anyone would put together a benchmark.
|
I wonder what functions he replaced. Two of the most time consuming and common are memory allocation and copying. There are already benchmarks for that. I believe there may be a benchmark program in the TLSF package for memory allocation, and I state where to find a benchmark and how to test CopyMem() & CopyMemQuick() in the readme of my CopyMem patch... EXTERNAL LINK
| |
Golem X
| | Posts 46 01 Sep 2009 01:42
| Thierry Atheist wrote:
|
bernd afa wrote:
| yes, exec library is so small and thin that also AROS exec is very compatible. |
Considering that the exec.library is such, and with all the crazy fast optimizations that the 68K is getting, are we close to RTOS speeds??? Can context switching be fast enough in this new SUPER Amiga to achieve this? I just love the Amiga. :-)
|
Real-time systems are not about speed, they are about predictability. AmigaOS is not an RTOS.
| |
Wawa Tk Germany
| | Posts 581 01 Sep 2009 02:39
| @matthey: arrgh! one day people have to make some agreements about who patches what and when.. now when you say that i start to suspect your mem copy patch has no effect on my system at all. it is applied before afa_os runs its patches so the functions are very likely patched again. matthey, dont you think it would maybe be a good idea to incorporate your work into afa, or even to set up some kind of cooperation with bernd?? also a good thing would perhaps be to get a kind of chart of the functions provided by the amiga system libraries, including existing replacements and optimisations? edit: uhm, found a nice overview of fd files on this very page (although it refers to os2 i think), but running scout i might identify patches. CLICK HERE
| |
Matt Hey USA
| | Posts 734 01 Sep 2009 05:31
| wawa tk wrote:
| @matthey: arrgh! one day people have to make some appointments, who patches what and when.. now when you say that i start to suspect your mem copy patch has no effect on my system at all. it is applied before afa_os runs its patches so the functions are very likely patched again. matthey, dont you think it would maybe be good idea to incorporate your work into afa, or even to pull some kind of cooperation with bernd?? also a good thing would be perhaps to get kind of chart of functions provided by amiga system libraries, including existing replacements and optimisations? |
That is one of the problems with SetFunction(). The last SetFunction() of a function is the one installed. Bernd's AFA code, AROS, Piru's code and my code are free as far as I know. It's just a matter of figuring out which parts are the best and putting them together in one package. Even that is not so simple. I try to get programmers to use my functions so my "patches" disappear but many don't trust me or my patches. Dieterg (MCP) and Thomas Richter (mu 68060.library) were not very receptive to using my code even though it's faster and the source is freely available. I would like to see a clean assemble-able open source exec.library that incorporates TLSFmem (Chris Hodges), some mmu awareness from Thomas Richter's code, and some speed enhancements from Piru's code and mine. I think I could assemble the functions but I don't know if I could figure out the inner workings in a reasonable amount of time. You should probably leave my CopyMem patch out of the startup-sequence and just run it by double clicking on the icon after boot-up when testing. Speed results should then be AFA if it patches CopyMem() and CopyMemQuick(). You would then turn off AFA and reboot to test original AmigaOS. Just make sure you don't have MCP or similar in your WBStartup or elsewhere. To verify that my code is being used, look at the CopyMem() function in Scout. The first instruction should be a subq.l #$4,d0 if it's mine. It will be a moveq.l #$c,d1 if it's the real exec.library 45.20.
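The "last SetFunction() wins" behaviour described above can be sketched as a small, portable C model. This is only an illustration under stated assumptions: `set_function`, `copymem_vector` and the three copy routines are hypothetical names invented for the sketch, not the exec.library API (the real SetFunction() also takes a library base and a function offset).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical one-slot "jump table": the entry callers reach for CopyMem. */
typedef void (*copy_fn)(const void *src, void *dst, size_t n);

copy_fn copymem_vector;      /* currently installed routine */
int last_called;             /* id of the routine that ran most recently */

/* Model of a SetFunction()-style call: install new_fn, return the old entry. */
copy_fn set_function(copy_fn new_fn)
{
    copy_fn old = copymem_vector;
    copymem_vector = new_fn;
    return old;
}

/* Three stand-in routines; all copy correctly, they only differ in their id. */
void rom_copy(const void *s, void *d, size_t n)     { last_called = 0; memcpy(d, s, n); }
void patch_a_copy(const void *s, void *d, size_t n) { last_called = 1; memcpy(d, s, n); }
void patch_b_copy(const void *s, void *d, size_t n) { last_called = 2; memcpy(d, s, n); }

/* Install two patches in sequence, call through the vector, report who ran. */
int install_and_probe(void)
{
    char src[8] = "exec";
    char dst[8] = {0};

    copymem_vector = rom_copy;
    set_function(patch_a_copy);   /* e.g. a patch run early in the startup-sequence */
    set_function(patch_b_copy);   /* a later patcher replaces it without any warning */

    copymem_vector(src, dst, sizeof src);
    return last_called;           /* the later patch is the one that served the call */
}
```

In this model `install_and_probe()` reports the second patch, which is why the order of patchers in the startup-sequence decides whose code actually runs, and why inspecting the installed function (as with Scout) is the only reliable check.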
| |
Wawa Tk Germany
| | Posts 581 01 Sep 2009 06:19
| @matthey, bernd: your posts motivated me to run some tests tonight. im too dumb to run the snoopy script matthey proposes for testing so i used amigamark (http://natmeg.stamey.at/downloads/amigamark/). i tried different loop lengths and of course turned off the cache for most of the time. ive put mattheys patches behind the afaos init in the s-s so that they are executed last. i could verify via scout that it is loaded. but anyway i do not see that afaos patches any copymem functions. as far as exec.library is concerned, the following is patched by exec.lib.exe: RemHead, CreateMsgPort, DeleteMsgPort, CreatePool, DeletePool, AllocPooled, FreePooled. the rest is mostly patched by ramlib.
well, now to the results: matthey seems to be the winner! with the cache turned off i get the following results for fast2fast copy:
original exec: 9.62 mb/s
afaos: 9.62-20.70 mb/s depending on the loop length
afaos with matthey on top: 18.57-20.70 mb/s
btw: here is the most recent ndk3.9, which also contains fd files, autodocs and such. EXTERNAL LINK
| |
Gunnar von Boehn Germany
| | (Moderator) Posts 5775 01 Sep 2009 06:29
| @wawa: Turning off the caches could give strange results. Don't you agree that it's unrealistic to measure the performance with caches turned off? Cheers
| |
Wawa Tk Germany
| | Posts 581 01 Sep 2009 06:31
| @matthey: i dont know how to look at a function in scout, but it is positively your patch that is loaded. i am all for assembling a common optimized exec as well as the rest of the kickstart modules. looking at how much of the original commodore code is already replaced by different patches, it seems we are already halfway to an open source replacement.
| |
Wawa Tk Germany
| | Posts 581 01 Sep 2009 06:52
| @gunnar: the average results with caches turned on were still best for mattheys patch together with afa_os (loop length 64):
exec (45.20): 35.~
afaos: 35.~
matthey: 37.~
afaos/matthey: 39.~
(all mb/s) i left out the digits after the decimal point, they vary too much.
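For anyone who wants to reproduce this kind of copy measurement, here is a minimal portable C sketch of a throughput test. It is not how amigamark works (its internals are not described in this thread); `copy_throughput_mbs`, `block` and `loops` are hypothetical names, and it times the standard memcpy() rather than CopyMem().

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Copy `block` bytes `loops` times with memcpy() and return the throughput in
 * MB/s (1 MB = 1024 * 1024 bytes). Returns 0.0 if nothing could be measured.
 */
double copy_throughput_mbs(size_t block, unsigned long loops)
{
    unsigned char *src, *dst;
    volatile unsigned char sink = 0;   /* read back so the copy is not optimised away */
    clock_t start, stop;
    double seconds, megabytes;
    unsigned long i;

    if (block == 0 || loops == 0)
        return 0.0;

    src = malloc(block);
    dst = malloc(block);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return 0.0;
    }
    memset(src, 0xA5, block);

    start = clock();
    for (i = 0; i < loops; i++) {
        src[0] = (unsigned char)i;     /* vary the data between iterations */
        memcpy(dst, src, block);
        sink = (unsigned char)(sink + dst[0] + dst[block - 1]);
    }
    stop = clock();

    free(src);
    free(dst);

    seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;                    /* run too short for clock() to resolve */

    megabytes = (double)block * (double)loops / (1024.0 * 1024.0);
    return megabytes / seconds;
}
```

Since the digits after the decimal point vary between runs, it is probably worth calling this several times per block size and comparing the best or median value rather than a single run.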
| |
Thomas Richter Germany
| | (MX-Board Owner) Posts 1425 01 Sep 2009 07:35
| Thierry Atheist wrote:
| Considering that the exec.library is such, and with all the crazy fast optimizations that the 68K is getting, are we close to RTOS speeds??? Can context switching be fast enough in this new SUPER Amiga to achieve this? I just love the Amiga. :-)
|
No. An RTOS doesn't require any specific "speed" to be obtained. It rather requires an operating system that can guarantee a reaction within a given time span. AmigaOS cannot do that, simply because Forbid() and Disable() exist. If a program decides to call Forbid(), no other task can run, and thus one cannot guarantee that the machine reacts to a given input within a given time, because it is up to the programs whether they call Forbid(), and for how long they hold it.
Greetings, Thomas
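The argument can be put into a toy C model: the worst-case reaction time is set by the longest section during which task switching is disabled, not by how fast the switch itself is. Everything here is an assumption made for illustration (the function names, the microsecond figures, and the simplification that each non-preemptible section is followed by a point where a switch can happen).

```c
#include <stddef.h>

/*
 * Worst-case wait before a task that has just become ready can run, in a toy
 * model where other tasks may hold non-preemptible sections of the given
 * lengths (microseconds) and the switch itself costs switch_us.
 */
unsigned long worst_case_latency_us(const unsigned long *section_us, size_t count,
                                    unsigned long switch_us)
{
    unsigned long longest = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (section_us[i] > longest)
            longest = section_us[i];
    }
    return longest + switch_us;
}

/* A deadline can only be promised if the worst case, not the average, fits. */
int meets_deadline(unsigned long latency_us, unsigned long deadline_us)
{
    return latency_us <= deadline_us;
}
```

With hypothetical sections of 50, 2000 and 10 microseconds and a 5 microsecond switch, the bound is 2005 microseconds no matter how much faster the CPU makes the switch. If any program may hold such a section for an unknown time, no bound can be stated at all.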
| |
Thomas Richter Germany
| | (MX-Board Owner) Posts 1425 01 Sep 2009 07:44
| Matt Hey wrote:
| Bernd's AFA code, AROS, Piru's code and my code are free as far as I know. It's just a matter of figuring out which parts are the best and putting them together in one package. Even that is not so simple. I try to get programmers to use my functions so my "patches" disappear but many don't trust me or my patches. Dieterg (MCP) and Thomas Richter (mu 68060.library) were not very receptive to using my code even though it's faster and the source is freely available. I would like to see a clean assemble-able open source exec.library that incorporates TLSFmem (Chris Hodges), some mmu awareness from Thomas Richter's code, and some speed enhancements from Piru's code and mine. I think I could assemble the functions but I don't know if I could figure out the inner workings in a reasonable amount of time. You should probably leave my CopyMem patch out of the startup-sequence and just run it by double clicking on the icon after boot-up when testing. Speed results should then be AFA if it patches CopyMem() and CopyMemQuick(). You would then turn off AFA and reboot to test original AmigaOS. Just make sure you don't have MCP or similar in your WBStartup or elsewhere. To verify that my code is being used, look at the CopyMem() function in Scout. The first instruction should be a subq.l #$4,d0 if it's mine. It will be a moveq.l #$c,d1 if it's the real exec.library 45.20.
|
Folks, what is this about speed? Did anyone ever measure a specific speed impact by micro-optimizing exec? The majority of programs do not even call CopyMem resp. CopyMemQuick, but rather use what the compiler generated for memcpy() instead. Even that would only make a minor difference, unless you're moving MBs of data around. And if you do that, you more likely have a broken algorithm requiring you to shuffle data than a slow exec.
"Premature optimization is the root of all evil". If the machine isn't fast enough, use profiling to identify the bottleneck, then fix that. In *this* order. Unless somebody can convince me that any particular function of exec is a specific bottleneck, just leave it alone. The current implementation has a very important advantage: *IT WORKS*, and it did so for years. Heinz for example replaced the memory pool functions of exec, with some using a smarter algorithm. Nice move, except that it made the RAM-disk slower due to its "dumb" usage of such pools. I'm against risking the stability of the machine by making changes that haven't been tested out in the wild.
So long, Thomas
| |
Matt Hey USA
| | Posts 734 01 Sep 2009 08:09
| wawa tk wrote:
| @matthey: i dont know how to look at a function in scout, but its positively your patch loaded. im am all for assembling common optimized exec as well as the rest of kickstart modules. looking at how much original commo code is already replaced by different patches it seems its like already half a way to an open source replacement.
|
Run Scout, click on Libraries, click on exec.library, click on Functions, click on the function, click Disassemble. The function names will only show if you have the FD files in the proper assignments. For Snoopy 2.0, drop the CopyMem.script in the directory with other scripts and double click on it. Read ram:Snoopy.txt for output. The output can go to a CON: (change tooltypes) but it might not be able to keep up. The output is good for debugging and profiling but not benchmarking. Much of the work of writing a new exec.library is already done and tested. Some is not very well documented or commented though.
| |
Matt Hey USA
| | Posts 734 01 Sep 2009 09:26
| Thomas Richter wrote:
| Folks, what is this about speed? Did anyone ever measure a specific speed impact by micro-optimizing exec? The majority of programs do not even call CopyMem resp. CopyMemQuick, but rather use what the compiler generated for memcpy() instead. Even that would only make a minor difference, unless you're moving MBs of data around. And if you do that, you have more likely a broken algorithm requiring you to shuffle data than a slow exec.
|
Bernd measured a difference of several frames per second with SDL RedAlert when CopyMem() is optimized. AWeb uses CopyMem() extensively and I can tell a difference when my CopyMem() patch is active. Many programs use these functions a lot, including AmigaOS itself. Use my Snoopy script and log them. I hope you have a lot of memory. My patch gives a speedup of several hundred percent in some small copies. It's never slower. Memory copying is CPU intensive and all those cycles add up. Take a look at the old MacOS for example. It had the same processor but more overhead in the OS and it was SLOW and unresponsive. The extra overhead of going through a CPU trap (A-line?) and passing arguments on the stack instead of in registers did make a difference.
Thomas Richter wrote:
| If the machine isn't fast enough, use profiling to identify the bottle neck, then fix that. In *this* order.
|
What if the bottleneck is in the OS? Then everybody rolls their own routines, which causes bloat, and bangs the hardware to get maximum speed. If the OS routines are efficient, then people will use them. The concept of shared libraries and the OS is useless if programmers don't use it.
Thomas Richter wrote:
| Unless somebody can convince me that any particular function of exec is a specific bottle neck, just leave it alone. The current implementation has a very important advantage: *IT WORKS*, and it did so for years.
|
What's a bottleneck for one program is not for another. But when the OS becomes the bottleneck this creates a problem. The memory allocation and copying routines DO create bottlenecks for some programs. Should we forget about using TLSFmem for memory allocation? It doesn't matter if our memory allocations are slow and grow slower with fragmentation? Without even trying it you will say it doesn't make a difference?
Thomas Richter wrote:
| Heinz for example replaced the memory pool functions of exec, with some using a smarter algorithm. Nice move, except that it made the RAM-disk slower due to its "dump" usage of such pools. I'm against risking the stability of the machine by making changes that haven't been tested out in the wild.
|
But it doesn't matter that it's slower because it's not a bottleneck, right? If TLSFmem were faster in all cases we should not use it either? Why did anyone bother creating the wheel if we could carry things on our back? Why does your 68060.library patch the utility.library functions at all if correct results are obtained going through the CPU trap? Why didn't you just stay with AmigaOS 1.3 and develop for it? Don't get me wrong. I am very conservative but I do take risks and I'm open minded. With no risk there is only laziness, stagnation and little return. I didn't just release some untested, bug-infested code. Efficient 68k code is like art to me and wasting it is like throwing a Picasso in the trash.
| |
Christian Kummerow Germany
| | Posts 314 01 Sep 2009 09:51
| Matt Hey wrote:
| If TLSFmem were faster in all cases we should not use it either?
|
Please don't include TLSFmem in exec, or at least make it an option. It causes incompatibility with, for example, one program I use: cmp.020. TLSFmem increases its speed a lot, but the results are wrong. That may not be a problem with cmp itself; it could also be the CD-ROM or HD drivers. To be safe I stopped using TLSFmem; maybe there are other programs that have a problem with it too. Btw, I use CopyMem a lot in my programs.
| |
Wawa Tk Germany
| | Posts 581 01 Sep 2009 10:04
| @thomas richter: i must strongly support matthey. i have already sacrificed myself as a lab rat a lot of times and never regretted it. if something works wrong one has to roll back. one of the important principles is not to make too many changes at once.
1) for instance im running afa_os which is completely stable except for a few known cases mostly related to warpos, which im running too but to a limited extent. for instance i gave up on wos datatypes for the time being, but since we are mostly trying to improve the 68k i do not care much.
2) i was running tlsf of chris hodges and it never gave me any problem in daily practice, except that it doesnt work with wos. i didnt use your debug tools at that time so i didnt care. but i switched back for the time being.
3) im running mattheys patch now on startup and it works without problem.
otherwise i run a lot of patches in the background without which i could not live anymore, like powerwindows and such. for me the system remains highly stable, except that sometimes something is screwed up with new software i test, like the issues with nowined and the two mui classes atm. but it will be fixed so why worry?
| |
Wawa Tk Germany
| | Posts 581 01 Sep 2009 10:07
| @christian: maybe it would be better just to fix cmp or tlsf rather than give up on it? but it being an option is fine and well. its done like that on morphos.
| |
Team Chaos Leader USA
| | (Moderator) Posts 2094 01 Sep 2009 10:23
| Thomas Richter wrote:
| | Folks, what is this about speed? Did anyone ever measure a specific speed impact by micro-optimizing exec? The majority of programs do not even call CopyMem resp. CopyMemQuick, but rather use what the compiler generated for memcpy() instead.
|
Blech! What self-respecting gamecoder would do that? All my games use mostly CopyMemQuick().
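For readers wondering why a coder would pick CopyMemQuick() over CopyMem(): the quick variant is documented as requiring longword-aligned pointers and a size that is a multiple of four, which lets it skip the alignment handling a general copy needs. A portable C sketch of the two contracts follows; `copy_bytes` and `copy_longs` are hypothetical stand-ins written for this illustration, not the exec routines or anyone's patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* General contract (CopyMem()-style): any alignment, any size, byte by byte. */
void copy_bytes(const void *src, void *dst, size_t n)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    while (n--)
        *d++ = *s++;
}

/*
 * Restricted contract (CopyMemQuick()-style): both pointers 4-byte aligned and
 * n a multiple of 4, so the loop can move whole 32-bit words with no tail
 * handling. The buffers are assumed to really hold 32-bit data.
 */
void copy_longs(const void *src, void *dst, size_t n)
{
    const uint32_t *s = src;
    uint32_t *d = dst;

    assert((uintptr_t)src % 4 == 0);
    assert((uintptr_t)dst % 4 == 0);
    assert(n % 4 == 0);

    for (n /= 4; n > 0; n--)
        *d++ = *s++;
}
```

The restricted version does a quarter of the loop iterations for the same data, which is the kind of saving that matters when a game moves graphics buffers every frame. Real implementations go further (unrolled loops, multi-register moves), which this sketch does not attempt to show.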
| |