| N68k C / C++ Compilers. |
|
|---|
|
|---|
Krystian Bacławski Poland
| | Posts 7 01 May 2011 06:07
| Hi, I can see a lot of people on this forum proposing ISA extensions for N68k. While having cool new instructions is great, it raises my concern whether any compiler is able to produce code utilizing these instructions. After all, why spend any effort to implement something that only a skilled assembly coder can make use of? IMHO being conservative on ISA extensions is good: 1) makes soft-core verification simpler (obviously - less instructions = less bugs), 2) probably saves logic gates, 3) leaves space for identifying mostly used instructions and optimize their execution, 4) by having feedback from microarchitecture design evolution one can provide better ISA extensions than just blindly guessing, 5) leaves time to optimize control/register/memory flow optimizations (dynamic/hybrid predictors, hardware prefetchers, etc.) 6) helps saving precious bits in ISA for really useful extensions which in most cases cannot be easily added (ie. transition to 64-bit architecture - 64-bit address space, quad-words). Anyway the post is not about making arguments about ISA. So the question is: Does NatAmi team plan to: 1) extend GCC 4.x backend and port 2.95.3 amiga specific changes to new frontend, or: 2) implement LLVM amiga-m68k backend and modify Clang? AFAIK last usable (and considered to be stable) compiler for Amiga M68k is SAS/C 6.58 (obviously discontinued) and GCC 2.95.3 (heavily outdated - 1999-2001). PS. I would be really interested to see N68070 sources or detailed design docs ;) Kind regards Krystian Baclawski
| |
Rune Stensland Norway
| | (MX-Board Owner) Posts 871 01 May 2011 07:59
| Samuel Crow from the Natami team is currently working on a LLVM port to AROS. There is a bounty for this project here: EXTERNAL LINK The fastest c-compiler for amiga is VBCC. It generates code in average 2 times faster code than gcc 2.9 and sasc. The latest release is from 2009: EXTERNAL LINK
| |
André Jernung Sweden
| | (MX-Board Owner) Posts 988 01 May 2011 09:07
| IIRC the VBCC maintainers offered to add N050 support.
| |
Matt Hey USA
| | Posts 737 01 May 2011 17:54
| Krystian Baclawski wrote:
| After all, why spend any effort to implement something that only a skilled assembly coder can make use of? |
Although instruction enhancements that are easily used by a compiler are more important, it's useful to add enhancements that only experienced assembler programmers can take advantage of. Assembler inlines and linked assembler functions can be used in C programs where speed is needed. This is especially viable with the ease of use of the 68k and has been used a lot on the Amiga. These kinds of additions should be limited to low overhead enhancements though. Most of the N68k changes could be used by a compiler. Krystian Baclawski wrote:
| IMHO being conservative on ISA extensions is good: 1) makes soft-core verification simpler (obviously - less instructions = less bugs), |
This is true but it's mostly the core/backward compatible instructions that need to be bug free at first and changes can be made in an fpga. Krystian Baclawski wrote:
| 2) probably saves logic gates, |
I don't think this is so important any more. The FPGAs are large enough that a few logic gates don't matter much. Many logic gates in an FPGA can be used in parallel with no slowdown. It's still important not to be wasteful, but in many cases, the N68k changes will result in smaller code, which saves logic gates in caches and memory for example. Krystian Baclawski wrote:
| 3) leaves space for identifying mostly used instructions and optimize their execution, |
There is still room for more instruction enhancements. Motorola/Freescale did extensive tests for the 68060 and ColdFire to identify what changes were most useful. The majority of the N68k changes come from the ColdFire. Other ideas could be changed before release. I would like to see tests of competing ideas for changes in the fpga. Some changes made to the N68k did come from evaluation. Keeping and accelerating the bit field instructions came from looking at GCC code that makes extensive use of them for example. Krystian Baclawski wrote:
| 4) by having feedback from microarchitecture design evolution one can provide better ISA extensions than just blindly guessing, |
Yes, this is important. We have looked at some of the newer designs. If you have any ideas you think are especially good and suited to the 68k then please post your ideas. Krystian Baclawski wrote:
| 5) leaves time to optimize control/register/memory flow optimizations (dynamic/hybrid predictors, hardware prefetchers, etc.) |
Prioritizing is important. From what Gunnar and Jens have said, the cache and memory controllers are coming along well. The branch prediction scheme seems to be less thought out at this point. I tried to start a discussion about it at one point but no one replied. Maybe you can try. I would be happy to discuss it although I'm no expert and not a hardware guy. Krystian Baclawski wrote:
| 6) helps saving precious bits in ISA for really useful extensions which in most cases cannot be easily added (ie. transition to 64-bit architecture - 64-bit address space, quad-words). |
I don't think 64 bit will be supported beyond very basic support that already exists in 68k. This was discussed and most thought that 64 bit support was unnecessary because... 1) 32 bit support is faster because... A) 64 bit multiplication and division are slower. B) The 68k is pipelined and optimized for 32 bit. 2) 64 bit addressing is not needed with efficient memory use. 3) The 68k can use instructions that use the extend flag and bit field instructions to do 64 bit integer math nearly as fast. 64 bit floating point math is supported in the FPU already. 4) The instruction encoding for 64 bit would be inconsistent and waste valuable encoding space for something slower. Krystian Baclawski wrote:
| AFAIK last usable (and considered to be stable) compiler for Amiga M68k is SAS/C 6.58 (obviously discontinued) and GCC 2.95.3 (heavily outdated - 1999-2001). |
The last versions of SAS/C and GCC 2.95.3 are still the best compilers on the Amiga IMHO. In my experience, GCC 3.4.0 is quite usable even though it has more bugs than 2.95.3 and doesn't optimize as well. VBCC is the most current Amiga compiler and shows great promise but needs more work. It is currently developed. The assembler for VBCC is vasm which is developed by Frank Wille. He has offered to add the new N68k support. This assembler already supports the ColdFire so enabling this support should be easier. GCC and VBCC both support the ColdFire also so only minor changes would be necessary to enable in these compilers. Vasm will likely be used for other compilers as well. Samuel Crow had talked about using it for LLVM 68k support eventually. Vasm could be used to replace the GCC compiler as it was designed to be mostly compatible. Krystian Baclawski wrote:
| PS. I would be really interested to see N68070 sources or detailed design docs ;) |
The following link is not current but was intended to give a basic overview... CLICK HERE
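As an illustration of point 3 in the 64 bit discussion above (64 bit integer math built from 32 bit operations plus the extend flag), here is a minimal C sketch. It is not N68k or compiler source, and the names `u64_pair`, `add64` and `add64_check` are made up for this example. The carry out of the low halves plays the role the X flag plays when a 68k `add.l` is followed by `addx.l`.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as a 32-bit CPU sees it.
 * Hypothetical type, for illustration only. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} u64_pair;

/* Add two 64-bit values using only 32-bit operations.
 * The carry out of the low halves stands in for the extend (X) flag. */
u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;                 /* low halves, wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);     /* 1 if the low add overflowed   */
    r.hi = a.hi + b.hi + carry;         /* high halves plus carry-in     */
    return r;
}

/* Cross-check against the compiler's native 64-bit arithmetic. */
void add64_check(uint64_t x, uint64_t y)
{
    u64_pair a = { (uint32_t)(x >> 32), (uint32_t)x };
    u64_pair b = { (uint32_t)(y >> 32), (uint32_t)y };
    u64_pair r = add64(a, b);
    assert((((uint64_t)r.hi << 32) | r.lo) == x + y);
}
```

The whole 64 bit add costs two 32 bit adds, which is the basis for the "nearly as fast" argument. Multiplication and division do not decompose this cheaply.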
| |
André Jernung Sweden
| | (MX-Board Owner) Posts 988 01 May 2011 18:45
| Here is a more recent decoder description: EXTERNAL LINK
| |
Krystian Bacławski Poland
| | Posts 7 01 May 2011 19:31
| Matt Hey wrote:
| Although instruction enhancements that are easily used by a compiler are more important, it's useful to add enhancements that only experienced assembler programmers can take advantage of.
|
I agree. Complex instructions similar to Intel SSE/AVX or IBM Altivec are important to implement in the long term. But simple scalar instructions? I doubt it. Usually a good C compiler is much better at utilizing simple instructions than any human being. It does all the dirty work related to register coloring, instruction scheduling, handling complex pipelines and superscalar features. Matt Hey wrote:
| Assembler inlines and linked assembler functions can be used in C programs where speed is needed. This is especially viable with the ease of use of the 68k and has been used a lot on the Amiga.
|
Seriously, do you want Amiga developers to remain in '70/'80 mentality model? Assembly was used extensively on Amiga because for a long time there was no good C compiler and ABI itself does not integrate very well with C. Matt Hey wrote:
| This is true but it's mostly the core/backward compatible instructions that need to be bug free at first and changes can be made in an fpga.
|
Anyway... I perceive it as something that distracts softcore developers from focusing on real issues. Matt Hey wrote:
| I don't think this is so important any more. The FPGAs are large enough that a few logic gates don't matter much. Many logic gates in an FPGA can be used in parallel with no slowdown. It's still important not to be wasteful, but in many cases, the N68k changes will result in smaller code, which saves logic gates in caches and memory for example.
|
Being aware of resource usage is always desired ;-) I doubt the number of additional logic gates is small in this case. Still there's no compiler nor software which can utilize these instructions, so why bother? Because it's easy and fun? Matt Hey wrote:
| There is still room for more instruction enhancements. Motorola/Freescale did extensive tests for the 68060 and ColdFire to identify what changes were most useful. The majority of the N68k changes come from the ColdFire.
|
I agree. Motorola dumped m68k prematurely and room for exploration is still there, although professionals think differently. Allow me to quote from the book "Modern Processor Design: Fundamentals of Superscalar Processors": "Although it was the fad of past decades, instruction set architecture (ISA) design is no longer a very interesting topic. We have learned a great deal about how to design an elegant and scalable ISA. However, code compatibility and the software installed base are more crucial in determining the longevity of an ISA. It has been amply shown that any ISA deficiency can be overcome by microarchitecture techniques."
|
Matt Hey wrote:
| Other ideas could be changed before release. I would like to see tests of competing ideas for changes in the fpga. Some changes made to the N68k did come from evaluation. Keeping and accelerating the bit field instructions came from looking at GCC code that makes extensive use of them for example.
|
This makes me even more concerned about N68k. I think that NatAmi project is missing a good compiler to help exploring N68k enhancement. If I understand you correctly evaluation was based on 2.95.3 compiler output. GCC has intermediate code representation called RTL which is mapped onto target's ISA. How can you know that there're no more efficient mappings (RTL->ISA) if you didn't provision compiler with additional transformation rules? Matt Hey wrote:
| Yes, this is important. We have looked at some of the newer designs. If you have any ideas you think are especially good and suited to the 68k then please post your ideas.
|
The road to hell is paved with good intentions ;-) I'd only post my idea if I was convinced I had a solid proof it brings benefits. Matt Hey wrote:
| The branch prediction scheme seems to be less thought out at this point. I tried to start a discussion about it at one point but no one replied. Maybe you can try. I would be happy to discuss it although I'm no expert and not a hardware guy.
|
I'm not a hardware guy either. If you're interested in branch predictors I suggest reading chapters 5.1 and 9 of the "Modern Processor Design" book. Matt Hey wrote:
| I don't think 64 bit will be supported beyond very basic support that already exists in 68k. This was discussed and most thought that 64 bit support was unnecessary because...
|
I think different but I'll not argue with you about that. For me it's obvious that if NatAmi is going to be successful then at some point N68k will have to provision for 64-bit address space. Eventually VAX was limited to 32-bit address space which effectively killed that architecture. Matt Hey wrote:
| VBCC is the most current Amiga compiler and shows great promise but needs more work.
|
Knowing maturity of GCC optimization techniques, VBCC is not a real competitor. Correct me if I'm wrong. Matt Hey wrote:
| Vasm will likely be used for other compilers as well. Samuel Crow had talked about using it for LLVM 68k support eventually. Vasm could be used to replace the GCC compiler as it was designed to be mostly compatible.
|
Ouch. I think you messed everything up. GCC + binutils still have support for m68k (but no AmigaOS ABI or link file format) - no assembler needed there. If there were an m68k backend for LLVM it would need no assembler either.
| |
Krystian Bacławski Poland
| | Posts 7 01 May 2011 19:46
| S P wrote:
| Samuel Crow from the Natami team is currently working on a LLVM port to AROS.
|
This is irrelevant as it touches x86 backend. S P wrote:
| The fastest c-compiler for amiga is VBCC. It generates code in average 2 times faster code than gcc 2.9 and sasc.
|
Do you have any hard proof for that? Real-life / synthetic benchmarks, code output comparisons? I have quite a contrary experience with VBCC. André Jernung wrote:
| Here is a more recent decoder description:
|
I'm looking for microarchitecture description. The link you sent me is actually Googleable so I saw it before ;-)
| |
Rune Stensland Norway
| | (MX-Board Owner) Posts 871 01 May 2011 20:48
| Krystian Bacławski wrote:
| S P wrote:
| Samuel Crow from the Natami team is currently working on a LLVM port to AROS. |
| This is irrelevant as it touches x86 backend. |
Then we need to make a new N050 backend. Krystian Bacławski wrote:
| Do you have any hard proof for that? Real-life / synthetic benchmarks, code output comparisons? |
EXTERNAL LINK Krystian Bacławski wrote:
| I have quite a contrary experience with VBCC. |
Me too. I tried it with a recursive qsort: my asm implementation was 2.4 times faster than the code VBCC generated with -cpu=060 -speed. You can find my benchmark, with source, in the sysinfo speed thread. CLICK HERE
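For readers who want to try a comparison of this kind themselves, a recursive quicksort in C might look like the sketch below. This is not the benchmark source from the sysinfo speed thread. The function names and the Lomuto partition scheme are assumptions chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two ints in place. */
void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Recursive quicksort over a[lo..hi] inclusive, Lomuto partition.
 * Signed indices so that the range lo..lo-1 is simply empty. */
void quicksort(int *a, long lo, long hi)
{
    assert(a != NULL);
    if (lo >= hi)
        return;                      /* zero or one element: done */
    int pivot = a[hi];
    long i = lo;                     /* next slot for a "small" element */
    for (long j = lo; j < hi; j++) {
        if (a[j] < pivot) {
            swap_int(&a[i], &a[j]);
            i++;
        }
    }
    swap_int(&a[i], &a[hi]);         /* pivot into its final position */
    quicksort(a, lo, i - 1);
    quicksort(a, i + 1, hi);
}

/* Returns 1 if a[0..n-1] is in non-decreasing order, else 0. */
int is_sorted(const int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}
```

Compiling the same source with each compiler and timing it on a large array gives an execution-time comparison. It says nothing about compile time.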
| |
Matt Hey USA
| | Posts 737 02 May 2011 02:09
| Krystian Baclawski wrote:
| I agree. Complex instructions similar to Intel SSE/AVX or IBM Altivec are important to implement in the long term. But simple scalar instructions? I doubt it. Usually a good C compiler is much better at utilizing simple instructions than any human being. It does all the dirty work related to register coloring, instruction scheduling, handling complex pipelines and superscalar features. |
SIMD is not the only way to achieve parallelism. In many cases, SIMD is unable to be used and it is one of the most costly in terms of resources to implement. Many of the units on the 68070 run in parallel and Superscalarity will add parallelism "cheaper" and in a more commonly used way. This will provide a speed up for most existing 68k and ColdFire code while the SIMD would need new code to take advantage. SIMD may eventually be added in some form but the softcore developers should focus on the longevity of the existing API which isn't as outdated as some might think. As far as the argument that assembler shouldn't be needed any more with efficient compilers, that's kind of like the search for the Holy Grail. Even on x86 with the most mature compilers ever and more support money than the GDP of some small countries, there is room for assembler optimization. Krystian Baclawski wrote:
| Seriously, do you want Amiga developers to remain in '70/'80 mentality model? Assembly was used extensively on Amiga because for a long time there was no good C compiler and ABI itself does not integrate very well with C. |
I agree that 68k compilers need to be improved so assembler is used less. However, assembler inlining still has advantages as it increases speed and reduces bloat even today. Also, a programmer that looks at and understands the code generated is ahead of the programmer that blindly generates bloat with C inlining optimization. I look at a lot of disassembled code as I update a 68k disassembler. Krystian Baclawski wrote:
| Being aware of resource usage is always desired ;-) I doubt the number of additional logic gates is small in this case. Still there's no compiler nor software which can utilize these instructions, so why bother? Because it's easy and fun? |
As mentioned, several compilers will be able to use the ColdFire instructions with minor changes (turning them on). Some of the changes will be done at the assembler level where new optimizations and encodings will be used transparently. Only a recompile or reassemble would be needed. For the very few instructions that do not fit the above 2 categories, lack of current compiler support does not justify ignoring good additions. If that were true, we would ignore SIMD support forever because 68k/CF has no API for it. Krystian Baclawski wrote:
| I agree. Motorola dumped m68k prematurely and room for exploration is still there, although professionals think differently. Allow me to quote from the book "Modern Processor Design: Fundamentals of Superscalar Processors": |
The 68060 was one of the best processors of its time and was killed for marketing reasons. An improved 68k makes much more sense than RISC processors with 16 bit variable length instructions added like the ARM with thumb(2). CISC instructions with RISC core was the right direction for the lower 75% of the CPU market from the beginning. The N68k does leverage the existing code base for 68k and CF. This is not true for variable length RISC. The N68k changes made so far are conservative and mostly attempt to modernize the API while making compiler support easier. Krystian Baclawski wrote:
| This makes me even more concerned about N68k. I think that NatAmi project is missing a good compiler to help exploring N68k enhancement. If I understand you correctly evaluation was based on 2.95.3 compiler output. GCC has intermediate code representation called RTL which is mapped onto target's ISA. How can you know that there're no more efficient mappings (RTL->ISA) if you didn't provision compiler with additional transformation rules? |
Actually, GCC 4.x (cross compiler with 68k target) uses the bit field instructions more than 2.95.3 which used them more intelligently and sparingly. GCC 4.x fails to properly calculate the performance cost of using them (vs multiple other instructions) and instead seems to use them any time possible. Before examining this, the first notion was to trap the bit field instructions which would have been very slow. After examining how much they are used by existing compilers, looking at how useful they are and considering the cost in hardware, it looks like very fast bit field instructions will be an awesome addition to the N68k. Most of the bit field instructions will be 1 cycle so there will be no possibility for faster instruction combinations. Krystian Baclawski wrote:
| The road to hell is paved with good intentions ;-) I'd only post my idea if I was convinced I had a solid proof it brings benefits. |
To hell the well traveled "easy" path may lead. Only Micky$oft and the dark side you will find. Overcome your worst fears you must. Mistakes we all make 8). Krystian Baclawski wrote:
| I'm not a hardware guy either. If you're interested in branch predictors I suggest reading chapters 5.1 and 9 of the "Modern Processor Design" book. |
I have read some very good information on branch prediction but lack of practical experience and hardware leaves me with just enough knowledge to get me in trouble ;). Krystian Baclawski wrote:
| I think different but I'll not argue with you about that. For me it's obvious that if NatAmi is going to be successful then at some point N68k will have to provision for 64-bit address space. Eventually VAX was limited to 32-bit address space which effectively killed that architecture. |
I think 64 bit was another fad in computers. It does have its place in high end processors but most processors don't need it, especially a CPU with compact code and an efficient OS like the Amiga. Many of the CPUs for modern electronic devices and low end computers have reversed the trend and are going back to or staying with 32 bit. Krystian Baclawski wrote:
| Knowing maturity of GCC optimization techniques, VBCC is not a real competitor. Correct me if I'm wrong. |
VBCC does some very sophisticated optimizations and takes a very different approach from GCC. Some work well and some do not work as well as planned yet. Vasm takes care of the peephole optimizations which works quite well. Vasm 68k is a mature optimizing assembler with many enhancements in the last couple of years. VBCC has progressed more slowly. I would suggest you take a look at the optimizations before making judgment. The documentation is easier to read than GCC docs although not as complete... http://mail.pb-owl.de/~frank/vbcc/docs/vbcc.pdf Many of GCC's optimizations don't work very well on the 68k/Amiga either. The current maintainers don't make much of an effort to improve 68k support as they see the CPU as dying. We have the chicken and the egg problem (no new hardware = no compiler support). Getting new hardware out like the Natami could help change their mind. Krystian Baclawski wrote:
| Ouch. I think you messed everything up. GCC + binutils still have support for m68k (but no AmigaOS ABI or link file format) - no assembler needed there. If there were an m68k backend for LLVM it would need no assembler either. |
Hmm. Several Amiga developers use GCC 4.x. Compiled assembler files (.asm) could be output, assembled and linked with a N68k aware assembler with some gain. As far as LLVM, I have no experience but I do remember Samuel talking about using vasm for some purpose. Maybe he will chime in. Perhaps you could help with one of the compiler projects. I would love to have newer versions of any of them for the Amiga. I have helped with Vasm by the way. It's more geared toward my skills. Compilers are very sophisticated in comparison.
| |
Samuel D Crow USA
| | (Natami Team) Posts 1295 02 May 2011 03:26
| AROS 68k is currently compiled with GCC 4.4 or so. It's getting pretty inflexible though. I might be able to convince the AROS 68k crew to help me port LLVM to 68k though. Jason McMullan, in particular, would have us switch over to LLVM for 68k. LLVM may someday have a binutils substitute but currently most platforms use binutils as their backend for the host OS. I thought about using VAsm and VLink for an early version of LLVM so that we wouldn't need to add our own peephole optimizer to the backend right away. If you've got any better ideas, let me know and I can relay them to the appropriate mailing list. Currently I've got my hands kind of full with the x86 version of LLVM. I'm hoping somebody will help me with that too, so I can split the bounty.
| |
Krystian Bacławski Poland
| | Posts 7 02 May 2011 03:31
| S P wrote:
| Then we need to make a new N050 backend. |
| Unfortunately this is more complicated than it may appear. Firstly one needs to implement an m68k backend, then a sub-backend for N050 / N070. Secondly one should convince the LLVM community to accept the new backend into mainstream, which may be even harder than the first step. Ugh... be reasonable ;-) This "benchmark" compares performance of code compiled for PPC and therefore is irrelevant :P Just to add - the benchmark compares vbcc from 2005 and gcc from 1999 (to be clear 2.95.x is from 1999; 2.95.3 was released in 2001, but it was a bugfix release and hadn't added any new features). The CPU mentioned on the page was released somewhere between 1999 and 2000, so 2.95.x at the time was probably not so good at PowerPC 740/750 targeted optimizations. And obviously, PovRay is mostly about benchmarking the FPU and shouldn't be used as a standalone benchmark.
| |
Samuel D Crow USA
| | (Natami Team) Posts 1295 02 May 2011 03:34
| Patches are welcomed at the LLVM repository. Ours wouldn't be the first softcore to have their backend hosted in the LLVM repository either. The trouble I have with writing backends for LLVM is that they all use the TableGen utility to save coding time but there's hardly any documentation for that utility.
| |
Krystian Bacławski Poland
| | Posts 7 02 May 2011 03:50
| Samuel D Crow wrote:
| Patches are welcomed at the LLVM repository. Ours wouldn't be the first softcore to have their backend hosted in the LLVM repository either.
|
Haven't worked with them, so I don't know how open they are. If you say they're open indeed then it's good news. Samuel D Crow wrote:
| The trouble I have with writing backends for LLVM is that they all use the TableGen utility to save coding time but there's hardly any documentation for that utility.
|
As far as I understand the role of TableGen, it covers three tasks: translating IR -> Asm, Asm -> Bin, Bin -> Asm. I know of only one document describing TableGen on the LLVM page. But you know - there's always the LLVM mailing list with a bunch of friendly people readily answering questions, and a few already written backends you can look at. I wish I had time to get my hands on it -_-
| |
Samuel D Crow USA
| | (Natami Team) Posts 1295 02 May 2011 04:00
| The most similar supported processor to the 68k is the i386. Since the Intel targets use the same backend code for all x86 and AMD64, with all subtargets included, the code is nearly undecipherable for learning TableGen, IMHO.
| |
Team Chaos Leader USA
| | (Moderator) Posts 2094 02 May 2011 12:34
| Krystian Bacławski wrote:
| S P wrote:
| The fastest c-compiler for amiga is VBCC. It generates code in average 2 times faster code than gcc 2.9 and sasc. |
Do you have any hard proof for that? Real-life / synthetic benchmarks, code output comparisons? I have quite a contrary experience with VBCC.
|
He meant compile-time, not execution time. At least that is what I thought he meant. The way he wrote that can't be right because a properly configured SASC is 2x as fast (or more) as gcc so that would mean VBCC would have to be 4x faster than gcc. I have never raced them against each other. But I have my doubts that vbcc is 2x as fast as SASC when using precompiled symbol tables for the OS (which is how you are supposed to do it). Some ppl do not use symbol tables in SASC. This makes SASC compilation 2x slower than it needs to be. I hope vbcc really is 2x as fast. And since it is written by a proper megac0der it has the potential to reach that.
| |
Matt Hey USA
| | Posts 737 02 May 2011 13:34
| Team Chaos Leader wrote:
| S P wrote:
| The fastest c-compiler for amiga is VBCC. It generates code in average 2 times faster code than gcc 2.9 and sasc. |
He meant compile-time, not execution time. At least that is what I thought he meant.
|
The English makes it sound like S P meant compile time but the comparison and reality points to execute time. VBCC is very slow at compiling on the 68k. The execution speed of programs compiled with VBCC on the other hand is better than most Amiga 68k compilers. Team Chaos Leader wrote:
| I hope vbcc really is 2x as fast. And since it is written by a proper megac0der it has the potential to reach that.
|
Frank is great but he didn't write VBCC. He only wrote vasm and even there multiplatform multi-endian support takes its toll on compiler speed. There is room for a lot of improvement again. SAS/C still reigns supreme in speed of compiling and probably will for a while. I doubt you were ready to throw it away ;).
| |
Krystian Bacławski Poland
| | Posts 7 04 May 2011 03:10
| Matt Hey wrote:
| | SIMD is not the only way to achieve parallelism. In many cases, SIMD is unable to be used and it is one of the most costly in terms of resources to implement. |
Yes, SIMD is not the only way, but it's relatively easy to add and can boost numeric applications. Moreover it's been proven useful on other platforms.
Matt Hey wrote:
| | Many of the units on the 68070 run in parallel and Superscalarity will add parallelism "cheaper" and in a more commonly used way. |
I doubt it. After studying "Modern Processor Design" book I can only say that superscalarity is not a cheaper way at all. Think of handling data dependencies between instructions, multiple issue, reordering / completion buffers and keeping consistent state of whole machine in case of an interrupt. Complexity of superscalar processor makes my brain melt ;) Intel has been working on superscalarity since Pentium processor and they still gain improvements with each design. Good superscalar design is humongously complex and is certainly not cheap (in terms of logic gates and development time)!
Matt Hey wrote:
| | As far as the argument that assembler shouldn't be needed any more with efficient compilers, that's kind of like the search for the Holy Grail. |
Ideally, 99% of developers should not need it. Quoting Donald Knuth: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil".
Matt Hey wrote:
| | Even on x86 with the most mature compilers ever and more support money than the GDP of some small countries, there is room for assembler optimization. |
The MAO project (http://code.google.com/p/mao) proved not to be a real win. I talked to a few experienced guys about that and finally got convinced that it's better to look at high-level optimizations rather than low-level ones. This is essentially the same lesson as with algorithms and data structures - would you go for reducing time complexity from O(n^2) to O(n) or reducing the constant in O(n^2)? Assembly is still needed as such, mostly in low-level programming - operating systems, etc. But having a good C / C++ compiler for a platform is CRUCIAL.
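To make the O(n^2) versus O(n) point concrete, here is a small C sketch of one task solved both ways: checking whether a byte array contains a duplicate. The task and the function names are assumptions picked for illustration, not something from the MAO project. Hand-tuning the inner loop of the first version in assembly only shrinks its constant, whereas the second version changes the growth rate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* O(n^2): compare every pair of elements. */
bool has_duplicate_quadratic(const unsigned char *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (v[i] == v[j])
                return true;
    return false;
}

/* O(n): one pass with a 256-entry "seen" table, one slot per byte value. */
bool has_duplicate_linear(const unsigned char *v, size_t n)
{
    assert(v != NULL || n == 0);
    bool seen[256] = { false };
    for (size_t i = 0; i < n; i++) {
        if (seen[v[i]])
            return true;             /* value was already encountered */
        seen[v[i]] = true;
    }
    return false;
}
```

The linear version trades 256 bytes of table for the speedup, which is the kind of high-level decision a compiler will not make on the programmer's behalf.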
Matt Hey wrote:
| | *snip* I look at a lot of disassembled code as I update a 68k disassembler. |
Every serious programmer should know assembly. I look into GCC generated assembly a lot, just to verify compiler is not doing stupid things.
Matt Hey wrote:
| | *snip* For the very few instructions that do not fit the above 2 categories, lack of current compiler support does not justify ignoring good additions. If that were true, we would ignore SIMD support forever because 68k/CF has no API for it. |
We can add as many new instructions to gcc backend as we want. The question is whether a compiler can make use of those new instructions effectively. Is it worth adding ~100 new instructions if they constitute less than 1% of generated code on average? Look at CHK, CHK2, CLR, Scc, ABCD, NBCD, SBCD, PACK, UNPK instructions, and more advanced addressing modes (chapter 2.2.{9,10,14,15} from 68k programmer's reference manual).
Matt Hey wrote:
| | The 68060 was one of the best processors of its time and was killed for marketing reasons. |
Though FPU was not pipelined and therefore much less efficient than in Intel Pentium.
Matt Hey wrote:
| | The N68k does leverage the existing code base for 68k and CF. This is not true for variable length RISC. The N68k changes made so far are conservative and mostly attempt to modernize the API while making compiler support easier. |
Sounds fair, if you don't try to add anything more than is present in 68k/CF ISA. BTW. Please be consistent: API != ABI != ISA ;-)
Matt Hey wrote:
| | Actually, GCC 4.x (cross compiler with 68k target) uses the bit field instructions more than 2.95.3 which used them more intelligently and sparingly. *snip* After examining how much they are used by existing compilers, looking at how useful they are and considering the cost in hardware, it looks like very fast bit field instructions will be an awesome addition to the N68k. *snip* |
Your way of thinking in this case is exactly what I'm concerned about. If you're trying to make architectural decisions being biased by compiler deficiencies, then please stop! It's easier to learn GCC's RTL and fix a bug in compiler than design functional unit in CPU! Putting it in simpler words: To redesign CPU because compiler sucks is insane! But anyway I consider 1-cycle BF* instructions to be nice :>
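For readers unfamiliar with the BF* instructions under discussion, the C sketch below models what an unsigned bit field extract computes on a 32 bit register, next to the two-shift sequence that does the same job without a dedicated instruction. It is a simplified model under stated assumptions. The offset counts from the most significant bit as in the 68020+ bit field instructions, the field is assumed not to wrap past the least significant bit, and the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an unsigned bit field extract on a 32-bit value.
 * offset counts from the most significant bit (offset 0 = bit 31).
 * Simplification: the field must fit, i.e. offset + width <= 32. */
uint32_t bfextu_model(uint32_t value, unsigned offset, unsigned width)
{
    assert(width >= 1 && width <= 32 && offset + width <= 32);
    unsigned shift = 32 - offset - width;          /* distance to bit 0 */
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu
                                  : (((uint32_t)1 << width) - 1u);
    return (value >> shift) & mask;
}

/* The same result from two shifts, the kind of multi-instruction
 * sequence a single bit field instruction replaces. */
uint32_t extract_by_shifts(uint32_t value, unsigned offset, unsigned width)
{
    assert(width >= 1 && width <= 32 && offset + width <= 32);
    value <<= offset;                /* drop the bits above the field */
    return value >> (32 - width);    /* right-align the field         */
}
```

Whether a compiler should pick the single instruction or the shift pair depends on their relative cycle cost, which is the cost calculation this exchange is arguing about.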
Matt Hey wrote:
| | To hell the well traveled "easy" path may lead. Only Micky$oft and the dark side you will find. Overcome your worst fears you must. Mistakes we all make 8). |
Two (or more) issues are bugging me. Will post them in another topic.
Matt Hey wrote:
| | VBCC does some very sophisticated optimizations and takes a very different approach from GCC. *snip* I would suggest you take a look at the optimizations before making judgment. The documentation is easier to read than GCC docs although not as complete... |
Did the homework, still think the same. I won't argue. The only way to verify this is to run a benchmark suite on code produced by both compilers.
Matt Hey wrote:
| | Many of GCC's optimizations don't work very well on the 68k/Amiga either. The current maintainers don't make much of an effort to improve 68k support as they see the CPU as dying. *snip* |
True. But I can always try to improve m68k backend in GCC.
Matt Hey wrote:
| | Several Amiga developers use GCC 4.x. Compiled assembler files (.asm) could be output, assembled and linked with a N68k aware assembler with some gain. |
Interesting. Because you still need support in m68k gcc backend for small-code, small-data code generation, and a few directives like __saveds.
Matt Hey wrote:
| | Perhaps you could help with one of the compiler projects. I would love to have newer versions of any of them for the Amiga. |
Certainly I could, but I don't want to commit prematurely.
| |
Wojtek P Poland
| | Posts 1597 04 May 2011 19:04
| Krystian Bacławski wrote:
| Hi, I can see a lot of people on this forum proposing ISA extensions for N68k. While having cool new instructions is great, it raises my concern whether any compiler is able to produce code utilizing these instructions. After all, why spend any effort to implement something that only a skilled assembly coder can make use of?
|
Even without ANY new instructions current compilers WILL be inefficient. Sometimes it was more desirable to use more instructions on 68040/60 and actually execute code faster. It was also often desirable to unroll some loops somehow. With N68k the rule of less instructions=faster program will work 99.9% of the time, and loop unrolling is not needed either IMHO with "zero cycle" branching and looping. I do not expect anything good to happen to GCC soon. Not only on 68k, but in general. I checked some time ago how code produced by various old and new versions of gcc looks for x86 using standard -O2 optimization. The older gcc ALWAYS produced tighter code, while the new gcc sometimes produced faster code, but more often a bit slower. And under real load just being smaller means a performance gain because of fewer DRAM stalls; caches are not infinite. gcc is an acceptable compiler for x86, acceptable for a few other platforms, and quite bad for others. And never really good. Anyway, gcc is not that bad using -Os optimization (optimize for size). It does produce much smaller output which is not really much slower. For N68k the best approach may be -Os plus time-critical parts in assembly. No idea how LLVM stuff performs.
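As a reminder of what loop unrolling means at the source level, here is a minimal C sketch of the same array sum written plainly and unrolled by four. The function names and the unroll factor are arbitrary choices for the example. The unrolled form pays the loop branch once per four elements at the price of larger code, which is exactly the trade that becomes pointless if branching and looping cost "zero cycles".

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one add and one loop branch per element. */
long sum_simple(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Manually unrolled by 4: one loop branch per four elements. */
long sum_unrolled4(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)               /* leftover 0..3 elements */
        s += v[i];
    return s;
}
```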
| |
Wojtek P Poland
| | Posts 1597 04 May 2011 19:05
| André Jernung wrote:
| IIRC the VBCC maintainers offered to add N050 support.
|
If they do, it will be a really great gift!
| |
Wojtek P Poland
| | Posts 1597 04 May 2011 19:09
| Krystian Bacławski wrote:
| I agree. Complex instructions similar to Intel SSE/AVX or IBM Altivec are important to implement in the long term. But simple scalar instructions? I doubt it. Usually a good C compiler is much better at
|
Both humans and compilers deal better with simple instruction set :)
| |