Welcome to the Natami / Amiga Forum

Alignment of Functions
Jakob Eriksson
Sweden
(Moderator)
Posts 1097
04 Dec 2011 22:58


Hello all!

I am trying to port a program, "miniPicoLisp" to Amiga m68k.

But it seems all symbols and functions need to be aligned. The code contains:

if (addr & 3) {
    error();
}

where "addr" is the address of a function

But no matter what options I give GCC, the addresses still land on a 2-byte offset or so. Is this because of how the HUNK loader works?
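
For readers following along, here is a minimal sketch of the kind of check being described. The names `builtin_example`, `is_aligned4` and `function_address` are made up for illustration and are not taken from miniPicoLisp. Converting a function pointer to an integer is implementation-defined in ISO C, although it behaves as expected on the usual compilers.

```c
#include <stdint.h>

/* Hypothetical stand-in for one of the interpreter's built-in C functions. */
static int builtin_example(int x) { return x + 1; }

/* Nonzero when the two low address bits are clear, i.e. the address
   is a multiple of 4. This is the inverse of the "addr & 3" test. */
static int is_aligned4(uintptr_t addr) { return (addr & 3u) == 0; }

/* Turn a function pointer into an integer address so it can be tested.
   The result of this conversion is implementation-defined. */
static uintptr_t function_address(int (*fn)(int)) { return (uintptr_t)fn; }
```

Calling `is_aligned4(function_address(builtin_example))` after each change of compiler flags shows directly whether a flag had any effect on that function.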


Samuel D Crow
USA
(Natami Team)
Posts 1295
05 Dec 2011 00:55


The alignment for a 68k processor or any subsequent model in the series should be a multiple of 2 bytes.  The code should read as follows:

if (addr & 1) {
  error();
}


Matt Hey
USA

Posts 737
05 Dec 2011 01:17


Jakob Eriksson wrote:

  But no matter what options I give GCC, the addresses still land on a 2-byte offset or so. Is this because of how the HUNK loader works?

It doesn't have anything to do with the hunk loader. It's GCC that decides this alignment. Normal alignment for code (and functions) on the 68k is word (16 bit) alignment as that is the minimum instruction size. Branching to an odd address will cause a guru. The 68020+ can benefit in speed from long word (32 bit) alignment of functions and branch targets and may be the default if you compile for the 68020+. It should be possible to force this long word (32 bit) alignment of functions with a command line option like...

-falign-functions=4

I don't know what the Lisp concept of a symbol is. Maybe it's a type of variable or a name associated with a variable? I believe variables in GCC are aligned depending on the size unless they are in a structure. A 64 bit variable is aligned to 64 bits (8 bytes), a 32 bit variable is aligned to 32 bits (4 bytes), a 16 bit variable is aligned to 16 bits (2 bytes) and an 8 bit variable is always aligned as is. This is pretty standard for any 32 bit or 64 bit computer. The 68000 being 16 bit does not benefit from 32 bit aligned variables. This is also why some Amiga structures have 32 bit variables that are not aligned to 32 bits. It may be necessary to force 32 bit alignment of 32 bit variables for the 68000 with the GCC command line option...

-malign-int

I don't recommend it though, as it may affect structures in a way that is incompatible with the Amiga. Compiling for the 68020+ should give 4 byte alignment of int and long type variables. Using 32 bit int for short variables will give the long word alignment and is often more efficient on a 32 bit processor. Characters will still be accessed as bytes though. You might also consider changing the "if (addr & 3)" to "if (addr & 1)" for the 68k. It shouldn't be a problem for an executable but may be for a cross platform save or data file.

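A quick way to see what a given compiler and flag set really does with data alignment is to ask it. The sketch below uses C11 `_Alignof` together with `offsetof`, so it assumes a newer toolchain than the GCC versions discussed in this thread. `struct mixed` and the helper names are invented for the example. GCC's m68k documentation describes -malign-int as choosing a 32-bit rather than a 16-bit boundary for int, long and similar types, so on that target the reported offset would be expected to change with the flag.

```c
#include <stddef.h>

/* Hypothetical structure: a byte followed by a long, so any padding
   the compiler inserts before "value" shows up in its offset. */
struct mixed {
    char tag;
    long value;
};

/* Alignment the compiler requires for a long on this target. */
static size_t long_alignment(void) { return _Alignof(long); }

/* Byte offset of "value" inside struct mixed, padding included. */
static size_t value_offset(void) { return offsetof(struct mixed, value); }
```

Printing both values from builds with and without the flag is a cheap way to confirm whether structure layouts would still match what the Amiga headers expect.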

Team Chaos Leader
USA
(Moderator)
Posts 2094
05 Dec 2011 03:54


Compiler Guru Matt Hey wrote:

The 68020+ can benefit in speed from long word (32 bit) alignment of functions and branch targets and may be the default if you compile for the 68020+. It should be possible to force this long word (32 bit) alignment of functions with a command line option like...

Does anyone know of any sneaky way that I can force SASC to always align functions on 32bit boundaries?

Does 060 get speedboost from calling functions that are 32bit aligned?

Does WinUAE JIT get speedboost from calling functions that are 32bit aligned?



Samuel D Crow
USA
(Natami Team)
Posts 1295
05 Dec 2011 04:05


@TCL

The speedboost comes from cache row alignment.  Remember that far call trick I had you use for the image rotator on Total Chaos?  That should do it because the alignment of a hunk start should be allocated by AllocMem() which aligns at sizeof(long) as a minimum.  Oftentimes it aligns as sizeof(long long) on OS 3.

Team Chaos Leader
USA
(Moderator)
Posts 2094
05 Dec 2011 04:40


Samuel D Crow wrote:

  The speedboost comes from cache row alignment.  Remember that far call trick I had you use for the image rotator on Total Chaos?


Uhmmm..... *scratching head* .... nope.  I don't remember :(


Jakob Eriksson
Sweden
(Moderator)
Posts 1097
05 Dec 2011 08:00


Matt, TCL, Sam:
 
  What a Lisp symbol is depends, but in this Lisp, all data is represented in 32 bit words. Those must be aligned. There are also functions built in to the language, coded in C, and the addresses of those need to be 32 bit aligned too. So (addr & 3) must be false or the interpreter will not work.
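
One common reason an interpreter insists on this: when every valid address has its two low bits clear, those bits can be borrowed as type tags. The thread does not say that this is how miniPicoLisp works, so the tag values and helper names below are a purely hypothetical layout, shown only to illustrate why a misaligned function address would break such a scheme.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word;

/* Hypothetical tag assignment, not taken from miniPicoLisp. */
enum { TAG_MASK = 3, TAG_POINTER = 0, TAG_NUMBER = 1 };

/* Store a small unsigned number in the upper bits, tag in the low bits. */
static word box_number(uintptr_t n) { return (n << 2) | TAG_NUMBER; }
static uintptr_t unbox_number(word w) { return w >> 2; }

/* A pointer is stored unchanged, which only works if its low bits are 0.
   A misaligned address would be mistaken for a tagged value. */
static word box_pointer(const void *p) {
    word w = (word)p;
    assert((w & TAG_MASK) == 0);
    return w;
}

static int is_pointer(word w) { return (w & TAG_MASK) == TAG_POINTER; }
static int is_number(word w) { return (w & TAG_MASK) == TAG_NUMBER; }
```

With a layout like this, a built-in function whose address ends in binary 10 could not be stored as a plain pointer, which matches the kind of failure the alignment check guards against.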
 
  I will try -falign-functions=4 again, but I tried all sorts of values and the address of the function did not change. (I also need malloced data to be 32 bit aligned, but I think it is already and if it was not I could fix that.)
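
If malloc() on a given system turned out not to return 32 bit aligned memory, one possible fix is to over-allocate and round the address up. This is only a sketch; `align_up` and `malloc_aligned4` are hypothetical helper names, and the caller has to keep the raw pointer around for free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round addr up to the next multiple of align (align must be a power of 2). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}

/* Allocate size bytes starting on a 4-byte boundary.
   *raw receives the pointer that must later be passed to free(). */
static void *malloc_aligned4(size_t size, void **raw) {
    *raw = malloc(size + 3);          /* 3 spare bytes cover the worst case */
    if (*raw == NULL) {
        return NULL;
    }
    return (void *)align_up((uintptr_t)*raw, 4);
}
```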
  I also tried -m68020 and it made no difference. But maybe that option only optimizes the scheduler for 68020?  Are there more relevant options to GCC?
 
 
  TCL: do you think your beloved SAS/C could align like this?
 
  Sam: did you have some trick to align functions?  Maybe in asm?
 
  I guess I could recode the functions in asm, put a NOP in front, and then add +2 to the pointer if it's not aligned...
 
  Thanks for your great response. I will try some more maybe tonight or this weekend.
 
Edit: saw this EXTERNAL LINK; it mentions .balign, interesting.
 

Phil "meynaf" G.
France
(Natami Team)
Posts 393
05 Dec 2011 08:35


I have made some tests about function entry point alignment on 030, some time ago.
Result was : no effect on speed.

I remember reading (disassembled) some code. The compiler did "nicely" align to 32 bit boundaries but the linker did not follow. Result: 100% misaligned functions (and the programmer didn't notice)!

So if I were you I'd just leave things the way they are. Function alignment will only waste bytes; at best the gain is small and not worth the trouble.


Jakob Eriksson
Sweden
(Moderator)
Posts 1097
05 Dec 2011 09:16


But this is not about speed, it's about working at all. The interpreter works only when the 4 byte words it interprets are aligned. Also, it wastes a few bytes, but only for the "built in" functions of the language, i.e. a handful, maybe 20 functions. (The rest of them are implemented in lisp.)
But you are saying that perhaps the functions are aligned in the object files but are later misaligned by the linker?

Matt Hey
USA

Posts 737
05 Dec 2011 14:42


Team Chaos Leader wrote:

  Does anyone know of any sneaky way that I can force SASC to always align functions on 32bit boundaries?

I don't know SASC well enough to say. GCC has the global setting for aligning functions and an attribute that can be used to align data (variables and structures) that is specified with each definition. The attribute does NOT work on GCC for the Amiga up to 3.4.0...

__attribute__ ((aligned(alignment)));

C should have had a standard way of forcing alignment.
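
For what it is worth, the C11 standard added one: `_Alignas` (spelled `alignas` via <stdalign.h>) and `_Alignof`. It applies to objects, not to functions, and the Amiga compilers discussed in this thread predate it, so the sketch below assumes a C11-capable toolchain. The name `cell_pool` is invented for the example.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical buffer of 32-bit cells, forced onto a 16-byte boundary. */
static alignas(16) uint32_t cell_pool[64];

/* Nonzero when the buffer really starts on a 16-byte boundary. */
static int pool_is_aligned(void) {
    return ((uintptr_t)cell_pool & 15u) == 0;
}
```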

Team Chaos Leader wrote:

  Does 060 get speedboost from calling functions that are 32bit aligned?

The 060 and 040 handle misaligned data as well as possible and better than the 020 and 030. 32 bit aligned or even better, cache aligned (16 byte) functions should offer a small speed boost. It's small enough that I couldn't measure it though ;).

Team Chaos Leader wrote:

  Does WinUAE JIT get speedboost from calling functions that are 32bit aligned?

The speed boost should be larger on WinUAE for aligning data than on the 060. Much depends on the actual implementation and CPU though.

Jakob Eriksson wrote:

  I also tried -m68020 and it made no difference. But maybe that option only optimizes the scheduler for 68020?  Are there more relevant options to GCC?

The CPU option should affect much more than the scheduler. It may not affect the alignment of the data as the 68020+ is quite tolerant of misaligned data and the speed advantage of alignment may be deemed not to be worth the increase in code size. I have seen executables from some compilers that pad functions and even byte sized text to 32 bits (4 byte alignment).
 
Jakob Eriksson wrote:

  Sam: did you have some trick to align functions?  Maybe in asm?

Assembler, now we're talking. That's consistent unlike that portable C stuff :P. Use CNOP 0,4 before functions or data to get 32 bit (4 byte) alignment :).
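
As a rough model of what CNOP does: CNOP offset,alignment advances the location counter to the next address that lies `offset` bytes past a multiple of `alignment`. The C function below only computes how many pad bytes that takes. The name `cnop_padding` is made up, and what the assembler fills the gap with is not modelled here.

```c
#include <assert.h>

/* Number of pad bytes needed so that (pc + pad) % align == offset % align. */
static unsigned long cnop_padding(unsigned long pc,
                                  unsigned long offset,
                                  unsigned long align) {
    assert(align != 0);
    return (align + offset % align - pc % align) % align;
}
```

For CNOP 0,4 at address 0x1002 this gives 2, so the following label lands on 0x1004.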

Phil "meynaf" G. wrote:

I have made some tests about function entry point alignment on 030, some time ago.
  Result was : no effect on speed.

The Cape 68k manual by Wesley Howe mentions a measured 5% increase in speed for a 1-2% increase in program size on a 68020. It doesn't mention the tests used but it sounds like it was code alignment only. I was unable to measure any speed gain on the 68060 by aligning code and data, but there is less penalty when the data is in the cache. Misaligned data can straddle 2 cache lines and so need 2 cache lines (and 2 reads) instead of 1. It's best to align speed critical data (and maybe code) to its size if practical. Too much padding can result in the cache being full of useless pad data, and of course larger code is slightly slower on the 68k also. Each 68k CPU varies somewhat too.
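
The two-cache-line case can be checked with a little arithmetic. This sketch assumes the 16 byte line size mentioned earlier in this post and uses a made-up helper name, `straddles_line`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero when the object at [addr, addr + size) touches more than one
   cache line of "line" bytes, i.e. its first and last byte fall in
   different lines. */
static int straddles_line(uintptr_t addr, size_t size, uintptr_t line) {
    assert(size != 0 && line != 0);
    return addr / line != (addr + size - 1) / line;
}
```

A 4 byte value at 0x100e crosses from one 16 byte line into the next, while the same value at 0x100c does not.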


Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
05 Dec 2011 17:02


Team Chaos Leader wrote:

  Does anyone know of any sneaky way that I can force SASC to always align functions on 32bit boundaries?

You can't.

Team Chaos Leader wrote:

  Does 060 get speedboost from calling functions that are 32bit aligned?

Not much. If any speed impact is measurable, it is because a function aligned to a cache line boundary spares the CPU from fetching data (or rather code) it doesn't need in the first place, i.e. data that is part of the same cache line but not the same function.

Given the Amiga memory allocator, the best you can ensure is four-byte alignment anyhow as LoadSeg() uses a regular AllocMem().

Greetings from Dana Point, CA.

Thomas


Team Chaos Leader
USA
(Moderator)
Posts 2094
05 Dec 2011 22:39


I spent hours doing an RTFM.  I only have v6 instruction manuals.  I wandered around the net and could only find 1 fake v6.5 manual (it's the same as v6.0).

SASC has the __aligned keyword which works on variables and structs.  It aligns them to a 32-bit boundary.

But the v6.0 manual indicates that it mysteriously does not align functions. (!)  When __aligned is applied to a function it hogs a register for the life of the function and it aligns the stack.  When writing normal C code the stack is already aligned.  So this keyword should always be avoided with functions unless your SASC code is being called from another language and you are an expert and know what you are getting into.



Team Chaos Leader
USA
(Moderator)
Posts 2094
05 Dec 2011 22:45


Lattice C v5 has compiler options to force all externs to be 32-bit aligned.

Ok but I donno what the definition of an "extern" is.  Does that cover functions?

There is another option in v5 that forces all "objects" to be 32-bit aligned.  Great.  What is an "object"?

It is possible that these compiler options still work in v6.58. 



Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
06 Dec 2011 00:59


Team Chaos Leader wrote:

  SASC has the __aligned keyword which works on variables and structs.  It aligns them to a 32-bit boundary.
 
  But the v6.0 manual indicates that it mysteriously does not align functions. (!)  When __aligned is applied to a function it hogs a register for the life of the function and it aligns the stack.  When writing normal C code the stack is already aligned.  So this keyword should always be avoided with functions unless your SASC code is being called from another language and you are an expert and know what you are getting into.
 

  No, this makes perfect sense, actually. The main reason this feature exists is to allow BCPL objects on the stack, i.e. objects that you want to be pointed at by a BPTR. These must obviously be aligned to 4-byte boundaries. If you align the stack, you don't need to align the individual objects, as long as all objects are BCPL objects, of course.
 
 

Samuel D Crow
USA
(Natami Team)
Posts 1295
06 Dec 2011 02:16


Team Chaos Leader wrote:

Samuel D Crow wrote:

The speedboost comes from cache row alignment.  Remember that far call trick I had you use for the image rotator on Total Chaos?

Uhmmm..... *scratching head* .... nope.  I don't remember :(

On SAS/C when using the large code model, if you want a function to be cache row aligned, make the subroutine a far-call.  This makes it into its own hunk allocated separately by AllocMem().  AllocMem() aligns all allocations to sizeof(long) at the minimum.  In practice it allocates to a wider alignment than 32-bit on OS 3.  Usually it allocates the hunk to an 8-byte boundary making it perfectly cache-row aligned.

@TCL
Re:Total Chaos
You had optimized the image rotator down to the size of the '060 code cache but it didn't always run as fast as you wanted it to.  I told you that to get maximum speed all the time, you needed to align the start of the main loop of the rotator to a cache row, otherwise it wouldn't always have 100% cache hits.  That's when I told you about the far call trick.

Matt Hey
USA

Posts 737
06 Dec 2011 03:13


Samuel D Crow wrote:

  On SAS/C when using the large code model, if you want a function to be cache row aligned, make the subroutine a far-call.  This makes it into its own hunk allocated separately by AllocMem().  AllocMem() aligns all allocations to sizeof(long) at the minimum.  In practice it allocates to a wider alignment than 32-bit on OS 3.  Usually it allocates the hunk to an 8-byte boundary making it perfectly cache-row aligned.

This method may be slower *if* a jsr is needed to reach the code instead of the faster (implicitly PC relative) bsr. It's easy to do more harm than good with such methods.

I believe TCL uses TLSFMem which allocates memory that is 8 byte aligned. Other users may not have this alignment so it's not good to assume anything beyond 4 byte alignment.


Jakob Eriksson
Sweden
(Moderator)
Posts 1097
06 Dec 2011 07:23


Thank you all!  It worked with -malign-functions=4 :-)

Now it went into an infinite loop instead, but it runs. Something for this weekend perhaps. :)

Phil "meynaf" G.
France
(Natami Team)
Posts 393
08 Dec 2011 08:07


Jakob Eriksson wrote:

  But you are saying that perhaps the functions are aligned in the object files but are later misaligned by the linker?

Yes. Apparently the startup code wasn't compiled with any alignment, and the result was that anything linked after it ended up with the wrong alignment.


Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
08 Dec 2011 15:47


Phil "meynaf" G. wrote:

Jakob Eriksson wrote:

  But you are saying that perhaps the functions are aligned in the object files but are later misaligned by the linker?
 

  Yes. Apparently the startup code wasn't compiled with any alignment, and the result was that anything linked after it ended up with the wrong alignment.
 

Amiga object files cannot ensure any alignment except alignment to long-word boundaries. Even if the startup code is a multiple of 16 bytes long, you do not know where LoadSeg() will put it, where AllocMem() takes memory from (all it ensures is 4 byte alignment), and whether the linker inserts ALVs to resolve long jumps.

The speed difference is usually minor, so it isn't worth bothering with, and if the jump into a function does make a noticeable difference, it is worth considering giving the compiler a hint to inline it in the first place. That will typically make more of a difference than alignment alone.

Greetings from Orange County,

Thomas

