
Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.



Welcome to the Natami lounge.
Meet new AMIGA friends here and enjoy having a friendly chit chat.

OK Teamers, Could Someone Show Us the Progress?
Comp Arch

Posts 33
19 Mar 2012 17:13


Nixus Minimax wrote:

  All instructions using address modes of this type ([...]) need to read a pointer from memory and then use it to read the actual data. The second read will always depend on the result of the first.

Well, I guess it's me getting confused with the new syntax. What I meant was, when you write
  JSR d16(Ax)
you merely create a pointer to this location. There is no read involved here. Only when using an instruction like
  JSR [d16(Ax)]
the value found at address 'd16 + Ax' will be read and used as the jump target (as in 'CALL [esi + offset]' on the x86). So there is only one read, not two. This read has to be done anyway when fetching the operand of the following instruction, 'JMP abs.l'. The new instruction combines these two and would be very useful, since this sequence occurs so often in programs. IOW:
  JSR -offset(Ax)
  JMP abs.l
turns into
  JSR [-offset + 2(Ax)]
(+2 because you would have to skip the 'JMP' instruction for compatibility)
You would not gain any bytes here (if anything the memory-indirect form is longer, since it needs a full extension word); however, there *might* be an advantage because you skip one jump. I'm afraid I have no idea whether JMP instructions on the 68050 cause any problems with the pipeline. They often do on other processors. If they don't, then of course a special instruction like this would be redundant.
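A small Python model may make the equivalence concrete. It is a sketch that only covers address arithmetic, not timing, and it assumes the usual Amiga library convention of 6-byte 'JMP abs.l' vectors at negative offsets from the library base. The function names, the memory layout and the offset of 6 are hypothetical.

```python
JMP_ABS_L = 0x4EF9  # 68k opcode word for JMP (xxx).L

def read_long(mem, addr):
    """Read a big-endian 32-bit value, as the 68k stores it."""
    return int.from_bytes(mem[addr:addr + 4], "big")

def write_jump_slot(mem, slot, target):
    """Store one 6-byte library vector: JMP abs.l <target>."""
    mem[slot:slot + 2] = JMP_ABS_L.to_bytes(2, "big")
    mem[slot + 2:slot + 6] = target.to_bytes(4, "big")

def target_via_jsr_jmp(mem, base, offset):
    """JSR -offset(Ax), then the JMP abs.l found in the slot."""
    slot = base - offset                    # JSR d16(Ax) only computes this address
    if int.from_bytes(mem[slot:slot + 2], "big") != JMP_ABS_L:
        raise ValueError("slot does not hold JMP abs.l")
    return read_long(mem, slot + 2)         # the single data read: the JMP's operand

def target_via_memory_indirect(mem, base, offset):
    """JSR [-offset+2(Ax)]: read the pointer directly, skipping the JMP opcode word."""
    return read_long(mem, base - offset + 2)

mem = bytearray(0x100)
write_jump_slot(mem, 0x80 - 6, 0x00F81234)  # hypothetical vector at base-6
print(hex(target_via_jsr_jmp(mem, 0x80, 6)), hex(target_via_memory_indirect(mem, 0x80, 6)))
```

Both paths read the same four bytes; the memory-indirect form simply skips executing the JMP itself, which is the whole argument above.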

Nixus Minimax wrote:

The disadvantage is that you lock everything for the time between two read time slots. In case each read operation triggers a burst read, this can be a significant amount of time.

Regarding such addressing modes you are perfectly right, of course.


Nixus Minimax
Germany

Posts 273
19 Mar 2012 19:49



So basically you are only concerned about branch-type instructions, not moves?

Comp Arch

Posts 33
19 Mar 2012 20:45


Nixus Minimax wrote:

  So basically you are only concerned about branch-type instructions, not moves?

Mainly yes. There is just one addressing mode I miss occasionally which is
    MOVE.s  <ea>, d32(Ax) (or vice versa)
IOW, a 32-bit displacement instead of a 16-bit one (for bigger objects, or for accessing global data via a global pointer). Otherwise, just like you, I haven't found any use for further addressing modes. In fact, when I designed the ISA for a virtual processor (just for fun), I reduced the standard addressing modes to just 4: reg, (reg), (reg)+, and offset(reg) (#imm ==> (pc)+, and -(reg) only as a target of move and movem). As mentioned before, I prefer writing OO code, so the addressing mode I use most of the time is simply offset(Rx). I don't use absolute addressing at all (I don't see why I would), and code like
    EOR -(Ax), d8(pc, Dx)
is totally out of the question. (As you can see, I'm not in favour of a VAX design as Gunnar once proposed.)
However I always left JSR [off(Rx)] in as it is used so often in OOP.
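For illustration, operand fetch in such a reduced ISA could look like the following toy model. It is not Comp Arch's actual virtual processor; the mode names, the dict-based registers and memory, and the 4-byte step for (reg)+ are all assumptions. Immediates are treated as (pc)+, as described above.

```python
def read_operand(regs, mem, mode, reg, offset=0, size=4):
    """Fetch an operand in a hypothetical ISA with four addressing modes.

    regs: dict of register name -> value; mem: dict of address -> value.
    An immediate (#imm) is modelled as mode "postinc" on register "pc".
    """
    if mode == "reg":                     # reg
        return regs[reg]
    if mode == "ind":                     # (reg)
        return mem.get(regs[reg], 0)
    if mode == "postinc":                 # (reg)+, also used for #imm via (pc)+
        value = mem.get(regs[reg], 0)
        regs[reg] += size
        return value
    if mode == "disp":                    # offset(reg), the workhorse for OO code
        return mem.get(regs[reg] + offset, 0)
    raise ValueError(f"unsupported addressing mode: {mode}")

regs = {"r1": 100, "pc": 200}
mem = {100: 5, 104: 6, 112: 9, 200: 42}
print(read_operand(regs, mem, "postinc", "pc"))   # immediate fetched via (pc)+
print(read_operand(regs, mem, "disp", "r1", 12))  # field at offset 12 of an object
```

The -(reg) destination mode is left out because this sketch only models reads.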


Marcel Verdaasdonk
Netherlands

Posts 3979
19 Mar 2012 21:14


Comp Arch, you have good PIC programming habits.

Matt Hey
USA

Posts 735
20 Mar 2012 01:04


Rune Stensland wrote:

Matt Hey wrote:

  2) How many parallel processing units (i.e. 2xinteger, branch, fp)?
 

 
  We were discussing up to 4 integer units, but 2 is more realistic.

If it's easy enough to replicate the integer units, I think it would be interesting to see some test results that show the diminishing returns of additional integer units.


 

  3) How many average instructions per cycle and MHz in simulation or expected?
 

 
  The core is still too unfinished to publish any numbers, but the peak numbers are excellent.

Alright, my expectation of excellent is 4-5 instructions/cycle peak and 2.5 average :D.


 

  4) What instructions are in the core (missing 68k, CF, custom)?
 

 
  The main core will support everything. The rest of the cores will be reduced. This is the same thing Motorola did with the MC68060. Remember P1/P2? Some instructions will pipeline, others will not.

You mean pOEP and sOEP? Yes, I remember. I hope more common instructions are included in all the integer cores. The 68060 sOEP is missing some common instructions like swap, exg, bit and bitfield instructions, pea and muls/mulu. Many instructions, such as outdated and supervisor instructions, are fine being pOEP-only.
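To show why a missing instruction in the secondary pipe matters, here is a deliberately simplified dual-issue check. It is not the real 68060 dispatch logic (which also considers instruction length, forwarding and more); the pOEP-only set merely mirrors the instructions listed above, and the tuple format is an assumption.

```python
# Hypothetical pOEP-only set, taken from the list of instructions above.
POEP_ONLY = {"swap", "exg", "btst", "bset", "bclr", "bchg",
             "bfextu", "bfins", "pea", "muls", "mulu"}

def can_dual_issue(first, second):
    """first/second: (mnemonic, registers_read, registers_written).

    The first instruction always goes to the pOEP; the second may use the
    sOEP only if the sOEP supports it and it does not depend on the first.
    """
    _, _, writes1 = first
    mnemonic2, reads2, writes2 = second
    if mnemonic2 in POEP_ONLY:
        return False                     # sOEP cannot execute it: issue alone
    if set(reads2) & set(writes1):
        return False                     # needs a result the first is producing
    if set(writes2) & set(writes1):
        return False                     # both write the same register
    return True

print(can_dual_issue(("add", ["d0", "d1"], ["d1"]), ("move", ["d2"], ["d3"])))
print(can_dual_issue(("add", ["d0", "d1"], ["d1"]), ("swap", ["d2"], ["d2"])))
```

Every instruction in the pOEP-only set costs a dual-issue slot whenever it lands second in a pair, which is why common ones hurt most.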

 
 

  6) What is needed to be able to insert the core into the current Natami's fpga?
 

 
  Gunnar is already testing on the Cyclone IV, not only in the simulator.
 

Is he testing on the Cyclone IV in his Natami?

Rune Stensland wrote:

  If you search for Gunnar's old posts on the forum, you can see some of the ideas we discussed some years ago. Some of these features have been included, and some have been left out. 020++ addressing modes that are rarely used: should we support them?

Memory indirect preindexed and postindexed addressing modes are slow on all 68k processors, including (relatively) the 68060, which is why they aren't used much. The superscalar 68060 is able to do them without trapping. Locking the memory bus from other instructions is probably not much different from movem instructions, but it is not optimal in a superscalar environment. They can avoid change/use stalls, save registers and offer better code density if they are at least as fast as the simpler instructions that replace them. If they cause more problems and complexity in a superscalar environment than they are worth, then trap them. Don't judge them by how little they are used where they are slow (like the bitfield instructions). The timings that Gunnar gave in the old threads were promising if they can be used in all integer units. It's certainly alright to trap them for now to get the soft core on the road.
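The difference between the two memory-indirect forms is only where the index is added, before or after the pointer is fetched. This sketch follows the 68020 effective-address definitions; the function names and the small test memory are made up for illustration.

```python
MASK = 0xFFFFFFFF

def read_long(mem, addr):
    """Big-endian 32-bit read, as on the 68k."""
    return int.from_bytes(mem[addr:addr + 4], "big")

def ea_preindexed(mem, an, xn, scale, bd, od):
    """([bd,An,Xn*scale],od): the index is added BEFORE the pointer is fetched."""
    pointer = read_long(mem, (bd + an + xn * scale) & MASK)   # first memory read
    return (pointer + od) & MASK

def ea_postindexed(mem, an, xn, scale, bd, od):
    """([bd,An],Xn*scale,od): the index is added AFTER the pointer is fetched."""
    pointer = read_long(mem, (bd + an) & MASK)                # first memory read
    return (pointer + xn * scale + od) & MASK

# The operand itself is then read at the returned address: a second read that
# cannot start until the first one has delivered the pointer.
mem = bytearray(64)
mem[16:20] = (1000).to_bytes(4, "big")
mem[24:28] = (2000).to_bytes(4, "big")
print(ea_preindexed(mem, an=8, xn=2, scale=4, bd=8, od=4))
print(ea_postindexed(mem, an=8, xn=2, scale=4, bd=8, od=4))
```

That dependent second read is what makes these modes awkward in a superscalar pipeline.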


Comp Arch

Posts 33
21 Mar 2012 16:44


Sorry for posting again (Normally I'm just lurking...)
There is just one thing I'd like to add to my last post (a bit OT though). There might be a reason why the designers of the 68020 integrated more complex addressing modes. Think about passing a struct by address (&s) in C, or a RECORD as a VAR parameter in Oberon, like
  PROCEDURE p(VAR r : t_record);
Accessing an element of this struct/record now in the form of
  r.element := value; (or r->element = value; in C)
will usually be translated into
  movea.l offset_r_address_on_stack(sp), a0
  move.s <value>, offset_element(a0)
Maybe the designers tried to squeeze these common instructions into one. (You often see byte codes for interpreters working like this.) Of course, this only applies for code that does not store the base address in an address register for further use. Thus for a statement sequence like
  r.element1 := value1;
  r.element2 := value2;
bad compilers might produce
  movea.l offset_r_address_on_stack(sp), a0
  move.s <value1>, offset_element1(a0)
  movea.l offset_r_address_on_stack(sp), a0  ; reloading!
  move.s <value2>, offset_element2(a0)
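A peephole pass that removes such reloads does not need the '[]' modes at all. The following is a rough sketch over assembler text, not how any particular compiler does it; it assumes that stores through the pointer never alias the stack slot it was loaded from (plausible for a compiler's own parameter slots), and it conservatively forgets everything at labels, a few jump mnemonics and stack-pointer changes.

```python
import re

# A pointer load as a simple code generator emits it: "movea.l <src>, aN"
LOAD = re.compile(r"movea\.l\s+(\S+)\s*,\s*(a[0-7])$", re.IGNORECASE)
# A few control-flow mnemonics after which nothing is assumed about registers
CONTROL = {"bra", "bsr", "beq", "bne", "jmp", "jsr", "rts", "dbra"}

def _invalidate(known, written):
    """Forget facts that a write to `written` (register or memory operand) may break."""
    if not written:
        return
    known.pop(written, None)
    for reg in [r for r, s in known.items() if written in s.lower()]:
        del known[reg]

def drop_redundant_reloads(lines):
    """Drop 'movea.l src, aN' when aN is known to hold src already."""
    known = {}                                   # register -> operand it was loaded from
    out = []
    for line in lines:
        code = line.split(";")[0].strip()        # ignore the comment
        load = LOAD.match(code)
        if load:
            src, reg = load.group(1), load.group(2).lower()
            if known.get(reg) == src:
                continue                         # register still holds that value
            _invalidate(known, reg)              # reg gets a new value
            if reg not in src.lower():           # movea.l (a0),a0 leaves nothing reusable
                known[reg] = src
        elif code:
            mnemonic = code.split()[0].lower().split(".")[0]
            dest = code.rsplit(",", 1)[-1].strip().lower() if "," in code else ""
            if (code.endswith(":") or mnemonic in CONTROL or dest in ("sp", "a7")
                    or "(sp)+" in code or "-(sp)" in code):
                known.clear()                    # labels, jumps, stack changes: forget all
            else:
                _invalidate(known, dest)         # destination register or slot overwritten
        out.append(line)
    return out

for line in drop_redundant_reloads([
        "movea.l 8(sp), a0",
        "move.l d1, 4(a0)",
        "movea.l 8(sp), a0  ; reloading!",
        "move.l d2, 12(a0)"]):
    print(line)
```

The third line disappears; any write to a0 (or to 8(sp)) in between would keep it.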
The indirect '[]' modes might have 'helped' bad compilers in the '80s here, since they yield short code even without any further analysis of register usage. However, in his essay 'Good Ideas, Through the Looking Glass' (2005), Niklaus Wirth comments:
'Several years after our Oberon compiler had been built and released, new, faster versions of the processor appeared. They went with the trend to implement frequent, simple instructions directly by hardware, and to let the complex ones be interpreted by an internal microcode. As a result, those language-oriented instructions became rather slow compared to the simple operations. So I decided to program a new version of the compiler which refrained from using the sophisticated instructions. The result was astonishing! The new code was considerably faster than the old one. It seems that the computer architect and we as compiler designers had "optimized" in the wrong place.'
Since in my special case I am the (dilettantish) author of the (primitive) compiler I use, I can rewrite the (bad) backend to whatever is needed. So, if you decide not to support these modes directly, I will happily change the code generator accordingly. And talking about gcc and LLVM: there are a lot of excellent developers working on optimal register allocation, which makes the '[]' modes even more redundant. However, I still consider my warning valid, since you never know what other people might come up with. (Think of the ABCD instruction: who would have thought that game programmers would use it for counting high scores, and others for emulating the ADC instruction in 6502 emulators?)
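For those who never used it: ABCD adds two packed-BCD bytes plus the X (extend) flag, which is why it suits decimal score counters. This sketch models only the result and the carry for valid BCD inputs (the Z flag and the behaviour on invalid digits are ignored); add_score is a hypothetical helper mimicking an ABCD -(a0),-(a1) loop.

```python
def abcd(dst, src, x=0):
    """Add two packed-BCD bytes plus the extend flag, like 68k ABCD.

    Returns (result_byte, carry). Assumes both inputs are valid BCD.
    """
    low = (dst & 0x0F) + (src & 0x0F) + x
    carry_low = 0
    if low > 9:
        low -= 10
        carry_low = 1
    high = (dst >> 4) + (src >> 4) + carry_low
    carry = 0
    if high > 9:
        high -= 10
        carry = 1
    return (high << 4) | low, carry

def add_score(score, points):
    """score/points: packed-BCD byte lists, most significant byte first."""
    result = list(score)
    x = 0
    for i in range(len(result) - 1, -1, -1):     # least significant byte first
        result[i], x = abcd(result[i], points[i], x)
    return result, x

print([hex(b) for b in add_score([0x00, 0x09, 0x95], [0x00, 0x00, 0x10])[0]])  # 000995 + 10
```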

Matt Hey wrote:

It's certainly alright to trap them for now to get the soft core on the road.

Yep, this is certainly of highest priority as always. :)


Nixus Minimax
Germany

Posts 273
21 Mar 2012 17:39


Your comments are very interesting, no need to apologize for posting!

I strongly believe that the 68020 indeed was an attempt to make instruction patterns commonly produced by compilers become individual processor instructions. The 020 was developed at the beginning of the 80s, a time in which even things like "string compare" would be turned into a processor instruction. To me it seems very likely that Motorola looked at what new instructions people building compilers would want. I believe that around the same time all the object-oriented stuff became big which probably meant much more pointer handling on the machine-level. The quote you cite is very interesting because it illustrates how RISC totally changed the direction of development within a few years. Eventually the hardware engineers stopped listening too attentively to the software guys... :)

RISC may well have been a reaction to the folly that was the 020 from a circuit designer's point of view. Until RISC came around, progress in processors was often equated with offering more processor instructions. BTW, with the stalling clock rates of recent years there seems to be a new trend toward adding instructions which help certain code, e.g. encryption.


Rune Stensland
Norway
(MX-Board Owner)
Posts 871
21 Mar 2012 19:24


Matt Hey wrote:

  If it's easy enough to replicate the integer units, I think it would be interesting to see some test results that show the diminishing returns of additional integer units.

 
  Too early to publish any results, but the older N050 results are still on the site if you search.
 
 

    4) What instructions are in the core (missing 68k, CF, custom)?
 

 
  CF instructions are left out, but they can be added. 020+ indirect addressing modes might be moved out to emulation/trapped.
   
 

  Is he testing on the Cyclone IV in his Natami?
 

 
  I don't think so.
 
 
 

  Memory indirect preindexed and postindexed addressing modes are slow on all 68k processors, including (relatively) the 68060, which is why they aren't used much. The superscalar 68060 is able to do them without trapping. Locking the memory bus from other instructions is probably not much different from movem instructions, but it is not optimal in a superscalar environment. They can avoid change/use stalls, save registers and offer better code density if they are at least as fast as the simpler instructions that replace them. If they cause more problems and complexity in a superscalar environment than they are worth, then trap them. Don't judge them by how little they are used where they are slow (like the bitfield instructions). The timings that Gunnar gave in the old threads were promising if they can be used in all integer units. It's certainly alright to trap them for now to get the soft core on the road.
 

 
  Remember that the soft core can be upgraded by flashing it, so instructions can be added later. Moving them to the illegal instruction exception is just a quick way to make it 100% compatible.
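The trap-and-emulate idea can be sketched in a few lines: the core executes what it decodes natively, anything else raises an illegal-instruction condition, and a software handler performs the operation and resumes. The split between HARDWARE and EMULATED below is made up (mulu is certainly not a real candidate for trapping); the point is that programs give identical results either way, only slower, and that moving an entry from one table to the other is the "upgrade by flash".

```python
MASK = 0xFFFFFFFF

def op_moveq(regs, dst, imm):
    regs[dst] = imm & MASK               # sign-extended immediate, kept to 32 bits

def op_add(regs, dst, src):
    regs[dst] = (regs[dst] + regs[src]) & MASK

def op_mulu(regs, dst, src):
    regs[dst] = (regs[dst] & 0xFFFF) * (regs[src] & 0xFFFF)   # 16x16 -> 32 bit

HARDWARE = {"moveq": op_moveq, "add": op_add}   # hypothetical: decoded by the core
EMULATED = {"mulu": op_mulu}                    # hypothetical: left to the trap handler

class IllegalInstruction(Exception):
    pass

def core_execute(regs, instr):
    name, *args = instr
    if name not in HARDWARE:
        raise IllegalInstruction(name)
    HARDWARE[name](regs, *args)

def run(program, regs):
    """Execute a program; trapped instructions are emulated, then execution resumes."""
    traps = 0
    for instr in program:
        try:
            core_execute(regs, instr)
        except IllegalInstruction:
            traps += 1
            EMULATED[instr[0]](regs, *instr[1:])   # software handler
    return traps

regs = {"d0": 0, "d1": 0}
print(run([("moveq", "d0", 100), ("moveq", "d1", 7),
           ("mulu", "d0", "d1"), ("add", "d0", "d1")], regs), regs)
```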

Rune Stensland
Norway
(MX-Board Owner)
Posts 871
21 Mar 2012 19:27


More news about the chipset: Thomas has added the possibility to set the mouse's DPI. This means that more PS/2 mice work. He has also implemented the Akiko C2P in AHDL.
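For readers unfamiliar with C2P: it converts "chunky" pixels (one byte per pixel) into Amiga "planar" data (one bit per pixel in each bitplane). This generic sketch converts 8 pixels at a time; it does not model Akiko's actual register interface or batch size, and the function name is made up.

```python
def c2p8(pixels):
    """Convert 8 chunky 8-bit pixels into 8 planar bytes, one per bitplane.

    Bit 7 of each plane byte is the leftmost pixel, as in Amiga bitplanes.
    """
    if len(pixels) != 8:
        raise ValueError("expects exactly 8 pixels")
    planes = []
    for plane in range(8):
        byte = 0
        for x, p in enumerate(pixels):
            byte |= ((p >> plane) & 1) << (7 - x)   # bit `plane` of pixel x
        planes.append(byte)
    return planes

print([hex(b) for b in c2p8([0, 1, 2, 3, 4, 5, 6, 7])])
```

Doing this bit shuffling in software costs many instructions per pixel, which is why a hardware C2P unit is attractive.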

Evil Igel
Germany

Posts 154
21 Mar 2012 19:41


Rune Stensland wrote:

More news about the chipset: Thomas has added the possibility to set the mouse's DPI. This means that more PS/2 mice work. He has also implemented the Akiko C2P in AHDL.

Yeah, nice new features! :-)

Rune Stensland
Norway
(MX-Board Owner)
Posts 871
21 Mar 2012 20:18


Here is a new printout.
 
  The next step is to add statistics for superscalar performance and fusion possibilities. Then modify the WinUAE source code to dump binary/asm streams. Then all of you can help to gather statistics :))
 
 

 

 
  Mc680x0 Asm source analyzer made by Rune Stensland 2012
  Loading and parsing file:d:\amiga\workbench3.5\code\asm-pro\asmpro-src\asmpro.s
  Loading and parsing file:d:\amiga\workbench3.5\code\asm-pro\asmpro-src\disasm.data2.s
  Loading and parsing file:d:\amiga\workbench3.5\code\asm-pro\asmpro-src\include\replay\player6.1.s
 
  Statistics:
  Total sourcecode files:  3
  Total Instructions Count:  40061
  Total codelines Count:  62362
  Total register Count:  53333
 
  RegToReg:  2662 (6)%
  MemToReg:  4414 (11)%
  RegToMem:  2505 (6)%
  Branch/Lea #:  6877 (17)%
  One reg:    848 (2%)
  MemToMem:  1364 (3)%
  No reg:      1151 (2%)
 
  Registers:
 
  dx  18666
  #const  12062
  label  7418
  xxxx(ax) 5402
  ax  3336
  (ax)+  2430
  d0/d1/d2.. 986
  (ax)  883
  (sp)+  771
  -(sp)  708
  xx(ax,dx) 220
  -(ax)  155
  fpx  110
  xxxx(pc) 107
  (sp)  70
  fp0/fp1.. 8
  ([020++]) 1
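The printed percentages are consistent with truncating integer division over the total instruction count of 40061 (rounding would give 7% for RegToReg and 3% for No reg). The sketch below reproduces that and adds a toy operand classifier for the "Registers" categories. The regular expressions are guesses at how such an analyzer might bucket operands, not the analyzer's actual rules.

```python
import re

# (category, pattern) pairs, checked in order; the first full match wins.
PATTERNS = [
    ("([020++])",  r"\(\[.*\].*\)"),
    ("fp0/fp1..",  r"fp[0-7]([-/]fp[0-7])+"),
    ("fpx",        r"fp[0-7]"),
    ("d0/d1/d2..", r"[ad][0-7]([-/][ad][0-7])+"),
    ("dx",         r"d[0-7]"),
    ("ax",         r"(a[0-7]|sp)"),
    ("(sp)+",      r"\(sp\)\+"),
    ("-(sp)",      r"-\(sp\)"),
    ("(sp)",       r"\(sp\)"),
    ("(ax)+",      r"\(a[0-7]\)\+"),
    ("-(ax)",      r"-\(a[0-7]\)"),
    ("(ax)",       r"\(a[0-7]\)"),
    ("xxxx(pc)",   r".+\(pc\)"),
    ("xx(ax,dx)",  r".*\((a[0-7]|sp),[ad][0-7](\.[wl])?(\*[1248])?\)"),
    ("xxxx(ax)",   r".+\((a[0-7]|sp)\)"),
    ("#const",     r"#.+"),
]

def classify(operand):
    op = operand.strip().lower()
    for name, pattern in PATTERNS:
        if re.fullmatch(pattern, op):
            return name
    return "label"                       # fallback: anything else counts as a symbol

def percent(count, total):
    return count * 100 // total          # truncating, as the printout appears to do

print(percent(2662, 40061), classify("4(a0,d1.w)"), classify("([4,a0],d1)"))
```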
 

 
 

Wawa Tk
Germany

Posts 581
21 Mar 2012 20:20


Rune Stensland wrote:

 
  Too early to publish any results, but the older N050 results are still on the site if you search.
 

Were there any? I don't even know how to search this site. Can't you provide a link?

 

  CF instructions are left out, but they can be added. 020+ indirect addressing modes might be moved out to emulation/trapped.

 


 
I'm fine with that so far; this would only introduce incompatibility with existing hardware.

Louis Dias
USA

Posts 217
21 Mar 2012 20:21


Rune Stensland wrote:

More news about the chipset: Thomas has added the possibility to set the mouse's DPI. This means that more PS/2 mice work. He has also implemented the Akiko C2P in AHDL.

Good news! However, the Akiko was more than just a C2P chip... wasn't it essentially a "northbridge" chip for the CD-ROM and AUX port as well? Perhaps that aspect of it is redundant...? That's why I suggested that, depending on how it's handled, the "super" Akiko could perhaps add other "nice-to-have" functionality like DSP functions...

Rune Stensland
Norway
(MX-Board Owner)
Posts 871
21 Mar 2012 20:38


wawa tk wrote:

  Were there any? I don't even know how to search this site. Can't you provide a link?

 

I did a search on Google and didn't find it, so I think it has been published in the team section only. Sorry about that.

Rune Stensland
Norway
(MX-Board Owner)
Posts 871
21 Mar 2012 20:42


Maybe you can find some info here (old thread):

CLICK HERE

Comp Arch

Posts 33
21 Mar 2012 20:47


Nixus Minimax wrote:

I strongly believe that the 68020 indeed was an attempt to make instruction patterns commonly produced by compilers become individual processor instructions.

Yes, you're right again. Anyone remember the CALLM or RTM instructions? Has anybody ever used them? Or at least know how they work? :)
Nixus Minimax wrote:

Eventually the hardware engineers stopped listening too attentively to the software guys... :)

Yes, but I've got the feeling these software guys all went to Intel then where the engineers just couldn't stop listening...
Nixus Minimax wrote:

BTW, with the stalling clock rates of recent years there seems to be a new trend to adding instructions which help certain code, e.g. encryption.

Unfortunately, yes. MMX, SSE, SSE2, SSE3, SSE4, AVX, AES, FMA and so on... you name it.  I just hope that the NatAmi team does not follow this path. I'd rather have a second 680x0 core (or more) than one with a highly-specialized instruction set, since a second core does speed up *all* programs *all* the time, whereas special instructions are only useful in some cases. Which brings back the question: what are the current plans of the team? Will there be some kind of (reduced) second core? I know there has been a lot of discussion going on, but what is the latest official opinion?
And another heretical question: when will software developers be given information about the new hardware, e.g. how to detect the 68050, memory or the new SAGA chipset? How are the ports (USB, IDE, PCI, memory card) integrated into the system? The problem is that I need to know this to fool around with it, like writing drivers etc. And yes, I understand that lots of this stuff will change over time, but I can change the code as well. It's just that I would like to start somehow...

André Jernung
Sweden
(MX-Board Owner)
Posts 988
21 Mar 2012 21:05


@Comp Arch

You can look at Thomas's rudimentary docs here:
CLICK HERE

Samuel D Crow
USA
(Natami Team)
Posts 1295
21 Mar 2012 21:12


Re:Specialized instructions.

This may be a related idea to your last comment Comp Arch:

To add 3-operand encoding and predication, Intel has added addressing modes and prefixes, and otherwise added bits to the instruction set. This has the distinct disadvantage that software needs to be recompiled to take advantage of the new features.

The NatAmi processor core is planned to do some magic in the decoder called "opcode fusion". This means that if a branch's offset equals the length of the next instruction (so it just skips that one instruction), the branch will automatically be promoted to a predicated instruction fetch. Also, if a two-instruction sequence moves from one register to another and then modifies it with a math function, the decoder will automatically convert it to the equivalent 3-operand opcode, so that the original two instructions can be executed in one clock instead of two.
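The two fusion patterns described above can be sketched as a pass over decoded instructions. This only illustrates the idea: the actual NatAmi decoder rules are not published here, and the dict format, the "pred"/"add3" pseudo-ops and the condition table are assumptions. Branch displacements are taken as measured from the end of a 2-byte Bcc.S, so a displacement equal to the next instruction's size skips exactly that instruction.

```python
# Negated condition codes: a branch that is NOT taken lets the skipped instruction run.
NEGATE = {"eq": "ne", "ne": "eq", "cs": "cc", "cc": "cs", "mi": "pl", "pl": "mi",
          "vs": "vc", "vc": "vs", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt",
          "hi": "ls", "ls": "hi"}
ALU_OPS = {"add", "sub", "and", "or", "eor"}

def fuse(instrs):
    """Apply the two fusion patterns to a list of decoded instructions."""
    out, i = [], 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if nxt is not None and cur["op"] == "bcc" and cur["disp"] == nxt["size"]:
            # Bcc over exactly one instruction -> that instruction, predicated on !cond
            out.append({"op": "pred", "cond": NEGATE[cur["cond"]], "body": nxt})
            i += 2
        elif (nxt is not None and cur["op"] == "move" and nxt["op"] in ALU_OPS
              and nxt["dst"] == cur["dst"] and nxt["src"] != cur["dst"]):
            # move Ry,Rx ; op Rz,Rx  ->  Rx = Ry op Rz (skipped if op reads Rx itself)
            out.append({"op": nxt["op"] + "3", "dst": cur["dst"],
                        "a": cur["src"], "b": nxt["src"]})
            i += 2
        else:
            out.append(cur)
            i += 1
    return out

print(fuse([{"op": "move", "src": "d0", "dst": "d3", "size": 2},
            {"op": "add", "src": "d1", "dst": "d3", "size": 2}]))
```

Because the pass works on existing opcodes, old binaries benefit without recompiling, which is the contrast with the Intel approach above.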

As for your question about a maid core, there has been much discussion on here about that. At one time it was called the "Robin" core, but the plans for a custom instruction set were dropped because dealing with two instruction sets would take too much time.

I hope this satisfies at least some of the curiosity.  ;-)

Comp Arch

Posts 33
21 Mar 2012 21:57


André Jernung wrote:

You can look at Thomas's rudimentary docs here:

Thanks a lot for the hint. Now I've got some basic information at least. Let's see how far I can get from this... :)

Samuel D Crow wrote:

The NatAmi processor core is planned to do some magic in the decoder called "opcode fusion". This means that if a branch's offset equals the length of the next instruction (so it just skips that one instruction), the branch will automatically be promoted to a predicated instruction fetch. Also, if a two-instruction sequence moves from one register to another and then modifies it with a math function, the decoder will automatically convert it to the equivalent 3-operand opcode, so that the original two instructions can be executed in one clock instead of two.

Also thanks for the fast response! Yes, indeed, this "opcode fusion" is something the NatAmi team can really be proud of. (As a totally uninteresting and irrelevant aside: the Amiga programmer mentioned somewhere above, Kevin Kralian, author of the Apple2000 emulator, used such a method to speed up execution of the 6502 opcodes. :) ) Is there already a list of all opcodes that can be fused? Or is there still room left for additions? Could I ask, please, whether the instruction 'MOVEQ' will be fused with other instructions as well? I'm especially thinking of the combination
      MOVEQ  #0, d0
      RTS
as I use this all the time at the end of a function to signal "no error occurred". Would be nice if that was fused together.
Samuel D Crow wrote:

  As for your question about a maid core, there has been much discussion on here about that. At one time it was called the "Robin" core, but the plans for a custom instruction set were dropped because dealing with two instruction sets would take too much time.

I agree. Get the 68050 running first. And then a second 680x0 would be the right way to go IMHO. Not only could it be used as a co-processor, but also as another core to run a different program in parallel, even though the current AROS (or whatever OS) does not support multiple cores yet. But who knows, this may change in the future.
Samuel D Crow wrote:

  I hope this satisfies at least some of the curiosity.  ;-)

Well, for a brief moment... Remember Oliver Twist? 'Please, sir, I want some more...' :)


Samuel D Crow
USA
(Natami Team)
Posts 1295
21 Mar 2012 22:48


comp arch wrote:

  Also thanks for the fast response! Yes, indeed, this "opcode fusion" is something the NatAmi team can really be proud of. (As a totally uninteresting and irrelevant aside: the Amiga programmer mentioned somewhere above, Kevin Kralian, author of the Apple2000 emulator, used such a method to speed up execution of the 6502 opcodes. :) ) Is there already a list of all opcodes that can be fused? Or is there still room left for additions? Could I ask, please, whether the instruction 'MOVEQ' will be fused with other instructions as well? I'm especially thinking of the combination
        MOVEQ  #0, d0
        RTS
  as I use this all the time at the end of a function to signal "no error occurred". Would be nice if that was fused together.

As nice as it would be, it would not be using 3 operand encoding nor predication so it cannot be fused.  The moveq operation writes to the D0 register while the RTS instruction writes to the A7 (stack pointer) and PC registers.  You cannot write to two registers at the same time.  The only reason that a move d0, d3;  add xxx, d3 can be fused is that the destination register is the same for both opcodes.  It's not a question of finding commonly used sequences of opcodes, it's a question of mapping existing ALU functionality into the existing instruction set so it can be used without recompiling.
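The rule stated here can be expressed as a check on the set of registers each instruction writes. This is a simplified sketch of that rule only: condition codes are ignored, only register destinations are handled, and the helper names are hypothetical.

```python
def registers_written(mnemonic, operands):
    """Registers an instruction writes (condition codes deliberately ignored)."""
    base = mnemonic.lower().split(".")[0]
    if base == "rts":
        return {"a7", "pc"}              # pops the return address: SP and PC both change
    if base in {"move", "moveq", "add", "sub", "and", "or", "eor"}:
        return {operands[-1].lower()}    # this sketch only handles register destinations
    raise ValueError(f"not modelled: {mnemonic}")

def fusable_by_destination(first, second):
    """A fusion candidate only if both instructions write one and the same register."""
    written = registers_written(*first) | registers_written(*second)
    return len(written) == 1

print(fusable_by_destination(("move.l", ["d0", "d3"]), ("add.l", ["d1", "d3"])))  # one destination
print(fusable_by_destination(("moveq", ["#0", "d0"]), ("rts", [])))               # d0, a7 and pc
```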
