
Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.
Please read the forum usage manual.



Welcome to the Natami lounge.
Meet new AMIGA friends here and enjoy having a friendly chit chat.

Amiga's Bottlenecks
Gunnar von Boehn
Germany
(Moderator)
Posts 5775
17 Oct 2010 18:37


Thomas Richter wrote:

  SAD.W (a0),(a1),d0,d1
 
  compute the differences of the 16-bit words pointed to by a0 and a1, of a block of d0 entries long, add up the absolute values of the differences to d1. Probably a "step" instruction (increment, take difference, add up) would be sufficient.
 
  The native 68K instruction sequence is longer and requires an additional register.
 
  move.w (a0)+,d1
  sub.w (a1)+,d1
  bcc.s .nocarry
  neg.w d1
  .nocarry:
  ext.l d1
  add.l d1,d2
 
  Similarly, the same with multiplication:
 
  move.w (a0)+,d1
  sub.w (a1)+,d1
  muls.w d1,d1
  add.l d1,d2
 
  Scalar product:
 
  move.w (a0)+,d1
  muls.w (a1)+,d1
  add.l d1,d2
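
For comparison, the three inner loops above can be written in portable C. This is only a sketch of the arithmetic that a SAD-style instruction would accelerate; the function names `sad16`, `ssd16` and `dot16` are made up for illustration, and unlike the 68K sequences the differences are widened before use, so they cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of absolute differences of two blocks of n 16-bit words. */
uint32_t sad16(const int16_t *a, const int16_t *b, size_t n)
{
    uint32_t sum = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];   /* widened, cannot wrap */
        sum += (uint32_t)(d < 0 ? -d : d);
    }
    return sum;
}

/* Sum of squared differences ("the same with multiplication"). */
uint64_t ssd16(const int16_t *a, const int16_t *b, size_t n)
{
    uint64_t sum = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        int64_t d = (int64_t)a[i] - (int64_t)b[i];
        sum += (uint64_t)(d * d);
    }
    return sum;
}

/* Scalar product of two blocks of n 16-bit words. */
int64_t dot16(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t sum = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        sum += (int64_t)a[i] * (int64_t)b[i];
    return sum;
}
```

Each loop body is one load pair, one subtract or multiply, and one accumulate, which is the "step" a combined instruction would fold into a single operation.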
 


hmm, let me think about this a little.

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
17 Oct 2010 18:43


Thomas Richter wrote:

  One of the often used schemes in encoding is "bit-packing" of bits into a byte-array, i.e. you hold a bit-pointer, have an array of ULONG or UBYTE and want to insert bits, then incrementing the pointer. BFINS does the insertion quite fine, but it does not increment a bit-pointer, i.e. doesn't adjust its registers.
 
  BFINS Dn,<ea>{offset:width}
 
  would require a form where the offset is incremented by the number of bits inserted, and when this wraps around, <ea> is incremented:
 
  BFINS D0,(a0)+{d1:3}
 

 
  The offset in D1 can be a 32bit value.
  This means if you do:
 

  BFINS D0,(a0){d1:3}
  ADDq.l #3,d1
 

  Then this should do exactly what you want.
  This Bitoffset in the register allows you to address bitwise 256 MB this way.
  Would this work for you?
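
As a rough C model of that two-instruction sequence (a sketch only, not a description of how the core implements it): the hypothetical helper `bf_insert` writes the low `width` bits of a value at an absolute bit position, MSB first, which is how the 68020 bit-field instructions number bits, and the caller advances the position the way the ADDQ does.

```c
#include <assert.h>
#include <stdint.h>

/* Insert the low `width` bits of `value` into the bit array at `base`,
 * starting at bit position `bitpos`. Bit 0 is the MSB of base[0], so
 * fields are filled left to right. Hypothetical helper for illustration. */
void bf_insert(uint8_t *base, uint32_t bitpos, unsigned width, uint32_t value)
{
    assert(base != 0);
    assert(width >= 1 && width <= 32);
    for (unsigned i = 0; i < width; i++) {
        uint32_t pos  = bitpos + i;
        uint8_t  mask = (uint8_t)(0x80u >> (pos & 7u));
        unsigned bit  = (value >> (width - 1u - i)) & 1u;
        if (bit)
            base[pos >> 3] |= mask;
        else
            base[pos >> 3] &= (uint8_t)~mask;
    }
}
```

Usage mirrors the assembly: call `bf_insert(buf, pos, 3, d0)` and then do `pos += 3`. Only the bit position changes; the base pointer stays fixed.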
 
 
 
 
Thomas Richter wrote:

  Similar for BFEXT.
 
  For JPEG coding/decoding it would be better if the instruction would be byte-oriented, i.e.
 
  insert three bits into the bit-buffer pointed to by a0, increment d1 by 3, if that wraps around 8, increment a0 by one, subtract 8 from d1.
 
  Bits should be filled left to right (i.e. MSB first).
 
  Similar with dynamic width in a third register.
 
  This is the core instruction for huffman coding, the basis for many codecs.
 

 
  I did not understand your BFEXT example.
  Can you please explain this again?
 

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
17 Oct 2010 19:30


Gunnar von Boehn wrote:

Thomas Richter wrote:

  One of the often used schemes in encoding is "bit-packing" of bits into a byte-array, i.e. you hold a bit-pointer, have an array of ULONG or UBYTE and want to insert bits, then incrementing the pointer. BFINS does the insertion quite fine, but it does not increment a bit-pointer, i.e. doesn't adjust its registers.
   
    BFINS Dn,<ea>{offset:width}
   
    would require a form where the offset is incremented by the number of bits inserted, and when this wraps around, <ea> is incremented:
   
    BFINS D0,(a0)+{d1:3}
 

 
  The offset in D1 can be a 32bit value.
  This means if you do:
 

    BFINS D0,(a0){d1:3}
    ADDq.l #3,d1
 

  Then this should do exactly what you want.
  This Bitoffset in the register allows you to address bitwise 256 MB this way.
  Would this work for you?

Nope, it wouldn't. The problem is that d1 cannot grow arbitrarily large. After a while, you need to reset d1 and increment a0, and that takes additional instructions that would be better avoided. Bit-packing is one of the heavy-duty operations in codecs.

Let me give an example:


void PutBits(UBYTE n,ULONG bitbuffer)
{
  assert(n > 0 && n  m_ucBits) {
    // If so, output all bits we can.
    n -= m_ucBits;  // that many bits go away
    m_ucBuffer |= (bitbuffer>>n) & ((1

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
17 Oct 2010 19:32


Gunnar von Boehn wrote:

Thomas Richter wrote:

  One of the often used schemes in encoding is "bit-packing" of bits into a byte-array, i.e. you hold a bit-pointer, have an array of ULONG or UBYTE and want to insert bits, then incrementing the pointer. BFINS does the insertion quite fine, but it does not increment a bit-pointer, i.e. doesn't adjust its registers.
   
    BFINS Dn,<ea>{offset:width}
   
    would require a form where the offset is incremented by the number of bits inserted, and when this wraps around, <ea> is incremented:
   
    BFINS D0,(a0)+{d1:3}
 

 
  The offset in D1 can be a 32bit value.
  This means if you do:
 

    BFINS D0,(a0){d1:3}
    ADDq.l #3,d1
 

  Then this should do exactly what you want.
  This Bitoffset in the register allows you to address bitwise 256 MB this way.
  Would this work for you?

Nope, it wouldn't. The problem is that d1 cannot grow arbitrarily large. After a while, you need to reset d1 and increment a0, and that takes additional instructions that would be better avoided. Bit-packing is one of the heavy-duty operations in codecs.

Let me give an example:


Posting doesn't work... I'll write an email.

Most of this stuff could be done in hardware by a single instruction. In C, it looks kind of messy, but it is really just a bit-inserter. In reality, a couple of extras need to be considered, as JPEG and JPEG 2000 use a "bit-stuffing" procedure. JPEG inserts a 0x00 byte after every 0xff byte written, JPEG 2000 inserts a zero-bit after every 0xff byte written.
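
Since the posted code got cut off, here is an independent sketch of such a bit-inserter in C, not the original routine. It assumes MSB-first packing and JPEG-style byte stuffing (a 0x00 byte after every 0xFF byte written); the `BitWriter` type and the `bw_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *out;    /* destination buffer                   */
    size_t   cap;    /* capacity of the buffer in bytes      */
    size_t   len;    /* bytes written so far                 */
    uint8_t  cur;    /* partially filled byte, MSB first     */
    unsigned nfree;  /* free bits left in cur, always 1..8   */
} BitWriter;

void bw_init(BitWriter *w, uint8_t *buf, size_t cap)
{
    w->out = buf; w->cap = cap; w->len = 0; w->cur = 0; w->nfree = 8;
}

/* Write one complete byte; stuff a zero byte after 0xFF (JPEG rule). */
static void bw_emit(BitWriter *w, uint8_t b)
{
    assert(w->len < w->cap);
    w->out[w->len++] = b;
    if (b == 0xFF) {
        assert(w->len < w->cap);
        w->out[w->len++] = 0x00;
    }
}

/* Append the low n bits of `bits`, most significant bit first. */
void bw_put(BitWriter *w, unsigned n, uint32_t bits)
{
    assert(n >= 1 && n <= 32);
    while (n > 0) {
        unsigned take  = n < w->nfree ? n : w->nfree;
        uint32_t chunk = (bits >> (n - take)) & ((1u << take) - 1u);
        w->cur   |= (uint8_t)(chunk << (w->nfree - take));
        w->nfree -= take;
        n        -= take;
        if (w->nfree == 0) {          /* byte complete: output and reset */
            bw_emit(w, w->cur);
            w->cur = 0;
            w->nfree = 8;
        }
    }
}
```

The branch on `nfree == 0` and the 0xFF test in `bw_emit` are the parts a plain BFINS plus ADDQ cannot express, which is why the last aligned byte written matters.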

BFEXTU should work alike, just in the opposite direction.

Greetings,
Thomas


Cesare Di Mauro
Italy

Posts 526
18 Oct 2010 03:14


Gunnar von Boehn wrote:

Ok, i'll bite on this one.

Gunnar, I only answered the thread question: "What are the bottlenecks of the Amiga, 68k and OS".

So my answers weren't related to Natami, but I'll give some now. ;)
 
Cesare Di Mauro wrote:

  1) Lack of a modern SIMD unit (CPU)
 

  A SIMD unit can be added to the 68k in a clean, compatible manner.
  But SIMD is useful for only a fraction of software,
  while a good INTEGER unit is always needed.

Sure, but I think that Motorola 68K ISA is mature enough (that doesn't mean that NO more changes are needed!), whereas a new SIMD unit opens up a new world of possibilities.

Also, if you look at the evolution of x86 since the introduction of MMX, you see that only a few changes were made to the "core" ISA, while the HUGE changes were reserved for the SIMD side.

Needs change over time. Massive intra-parallelization (SIMD) and inter-parallelization (multicore -> SMP or AMP) are common these days.
Cesare Di Mauro wrote:

    2) Lack of SMP (OS)
 

Well we can certainly live without it.

 
We already discussed this. SMP isn't a problem for Natami, since it will never be a multi-FPGA project (Claudio also stated this).

So, monitoring the nested ExecBase's task count can be made relatively cheap and efficient, in order to allow SMP even on an (new) "Amiga platform".

FPGAs are becoming more and more capable, and you said that copy & pasting a 68K core is just a joke.
That's why you chose 68K ISA for "Robin".
That's why you can add more general purpose 68K cores in an SMP configuration when more space (and time ;) becomes available.
Cesare Di Mauro wrote:
 
    3) Stack frames make interrupts and traps expensive. Shadow registers ala-ARM or FIDO are a better solution in terms of speed and latency. (CPU & OS)
 

But interrupt handling never was limitation on AMIGA.

It depends strictly on what kind of usage you make of interrupts or traps.

Think about trapping the unsupported Integer or FPU opcodes for 68040 or 68060 in order to emulate them: it was VERY EXPENSIVE.

As an emulator writer, I was fascinated by the idea of using the MMU's page fault mechanism to trap accesses to specific emulated guest areas, in order to do useful work without killing the whole emulation speed.

To give an example, suppose that I want to emulate an x86 with full VGA support. Normally I need to check EVERY (guest) memory read or write to see if it targets the VGA segments (which range from A0000 to BFFFF), and perform specific operations based on the current VGA memory mapping (which is quite a mess for 16-color modes or X-Modes).

Those checks eat up A LOT of performance, even if the executed applications never address the VGA's memory.

Using an MMU with faulting pages mapped onto the A0000-BFFFF guest memory range lets the emulator run at full speed on normal memory accesses (the guest memory can be addressed without any checks at all!), trapping only on the VGA's pages. But if traps are slow, they can kill performance for VGA accesses.

That's why a light-weight trap mechanism such as the one on ARMs or FIDO can be very useful. Emulators can run pretty well on such systems, with properly written code.
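
To illustrate the cost being described, a software-only emulator has to funnel every guest write through something like the following. This is a minimal sketch; `Guest`, `guest_write8` and `vga_write` are hypothetical names, and the VGA handler is a stub that only counts accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VGA_BASE 0xA0000u   /* first byte of the VGA window */
#define VGA_LAST 0xBFFFFu   /* last byte of the VGA window  */

typedef struct {
    uint8_t      *ram;       /* flat guest memory                  */
    size_t        ram_size;  /* size of guest memory in bytes      */
    unsigned long vga_hits;  /* number of accesses sent to the VGA */
} Guest;

/* Stub: a real emulator would apply the current VGA memory mapping here. */
static void vga_write(Guest *g, uint32_t addr, uint8_t value)
{
    (void)addr;
    (void)value;
    g->vga_hits++;
}

/* Every guest store pays for the range check, even if the program
 * never touches video memory. */
void guest_write8(Guest *g, uint32_t addr, uint8_t value)
{
    assert(addr < g->ram_size);
    if (addr >= VGA_BASE && addr <= VGA_LAST) {
        vga_write(g, addr, value);
        return;
    }
    g->ram[addr] = value;
}
```

With faulting pages over the VGA window, the common path shrinks to the plain store and the comparison moves into the trap handler, so trap latency then decides how fast VGA accesses are.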
Cesare Di Mauro wrote:

    6) Lack of resource tracking, virtual memory, and memory protection (OS)

I agree that resource tracking is something very good.
But Virtual memory is fluff.

It's questionable. ;) The problem here is AmigaOS, which makes it very difficult to implement, but some applications were written even on AmigaOS to make it usable.
Memory protection is mainly a kludge - and in a DMA driven system like the AMIGA it can by design never work 100%.

It's true for Amiga, not for a SoC such as Natami. That's talking only at the hardware level.

On the software side, memory protection is a big problem for AmigaOS, unfortunately. 
Cesare Di Mauro wrote:

    7) Lack of GUI enhancements, such as vectorial graphic presentation (OS)

Taste is a personal matter but I would consider this fluff.

Retargeting graphics while keeping fonts and graphics at maximum quality every time isn't fluff, but a nice feature.

Take a bitmap and zoom and/or rotate it however you like. Now do the same with a PDF or a vector image (SVG, Adobe Illustrator). The difference is HUGE.

That's where modern operating systems are going.
Cesare Di Mauro wrote:
 
    8) Lack of modern transactional/journaled filesystem, with metadata support and large storage (OS)

PFS?

It doesn't support large storage.

Also, I haven't found technical stuff about its robustness (journaling) and metadata handling, but that's a problem of mine. :P
Cesare Di Mauro wrote:

    9) chip / fast memory subdivision which makes no sense if the available bandwidth is almost the same. It makes sense only if we have chip mem = embedded or graphic memory with enormous bandwidth compared to the main memory. (Hardware)

I think the opposite is true, my friend.
  The logical chip and fast subdivision was very important for the AMIGA. This separation is a clever and simple way to keep the chipset and CPU synchronised. Without this separation no AMIGA with a cache and without an MMU would ever have worked.

You are talking about early and/or lower-end Amiga systems. Amiga 3000 and 4000 had a full MMU, and the same was true for most accelerator boards.

Now in 2010 you have MMUs available even on potato chips. :P

Natami has several options to address this problem. You can add a PARTIAL MMU in order to mark memory that was allocated as "chip ram", and keep the system coherent.

A better option, which doesn't require operating system patches, would be to add bus-snooping logic to monitor memory changes made by the "custom chipset" part, and update the CPU's memory cache.
Natami is a SoC, so it can be cheaper than a multichip system.

At this time I think that it is a waste to have chip and fast memory with similar bandwidths, but logically separated. With such low bandwidth available, it's much better to let the system use all the memory bandwidth based on runtime needs.

It would be a totally different question if you had an asymmetric system, with the chipset part having an order of magnitude more bandwidth than the main memory, such as the CPU vs GPU bandwidth on PCs or on the Xbox 360 (which has massive bandwidth available thanks to its eDRAM).
Cesare Di Mauro wrote:
 
    10) Lack of multiuser support (OS)

  I would say this is only important for Server but not for homecomputers.

I used it in '94/'95, when I was working on Fighting Spirits, thanks to MultiUserFileSystem (I don't know if anyone else remembers it) with my plain Amiga 1200.

I use it right now in my desktop to have limited and controlled access for my wife, children and... me too. ;)

Cesare Di Mauro
Italy

Posts 526
18 Oct 2010 04:10


Thomas Richter wrote:

  As I stated before, I just answered the thread question: "What are the bottlenecks of the Amiga, 68k and OS", so you have to give the right context on my previous answers. ;)
 
Cesare Di Mauro wrote:

      1) Lack of a modern SIMD unit (CPU)
     

    This is not a bottleneck, but only one solution for bottlenecks.

    Any modern system has a SIMD unit, because they solve common problems very well.
   
Cesare Di Mauro wrote:

      2) Lack of SMP (OS)
     

    This is an OS problem, not a hardware problem. Not fixable by hardware.

    Right, and I already stated it: (OS). ;)
   
Cesare Di Mauro wrote:

      3) Stack frames make interrupts and traps expensive. Shadow registers ala-ARM or FIDO are a better solution in terms of speed and latency. (CPU & OS)

    Are they? I'm not sure. Interrupt handling was rarely a problem for me.

    Neither for me when I was developing games for Amiga, since I had totally killed interrupts. :D
   
    Anyway, trapping can be a problem. Take a look at my answer to Gunnar.
   
Cesare Di Mauro wrote:

      4) Lack of chunky graphic modes (GPU)
     

    This is a hardware problem, but not really CPU related.

    Right again, that's why I wrote: (GPU).
   
However, there is one point here: Amiga gfx is planar-operated and uses many planar concepts like planemasks. Which means that converting from and to chunky will be a bottleneck. It was a bottleneck for P96. This means that some provision in the system should help here. This could be the blitter, but it requires extensions for that.
     
    Actually, I would probably design the system in a different way today. Instead of a blitter, I would add blitter-type instructions to the CPU to let it perform most of the operations the blitter would have done. That would include bit-packing and unpacking instructions - this stuff could be useful, either in the CPU or in the blitter.
     
    Actually, as the blitter is limited to chip mem, the CPU is probably a better place. Off-image bitplanes could then be blitted by the CPU.
     
    2D operations, similar topic: This could also be done by the CPU, and would then be possible in fastmem as well. Rectangle-fill, rectangular copy, cookie-cut.

    Modern SIMD units have bit shifting, masking, and un/packing, so maybe a carefully designed SIMD unit can be used to implement these operations.
   
    But I don't think that something like cookie-cut can be made fast this way. I'll talk below about it.
   
Cesare Di Mauro wrote:

      5) Lack of modern GPU architecture, and shaders particularly (GPU)
     

    True enough. What could be done in the CPU to help here? For 3D, I would need multiply-add a lot. For shaders, you likely need a table-based interpolation, similar to what you find in the CPU32. Further, vector operations to handle red-green-blue simultaneously in a single instruction.

    Modern GPUs do them better, because they were already designed to make those tasks in the most efficient way.
   
    Shaders, also, need a lot of MADDs too (but in the near future FMA's will be used instead of MADDs); nowadays they are specialized stream-processors that execute... code. Even conditional code is now part of them. And GPU's like nVidia's Fermi can execute object-methods too (C++ programs can be easily ported and run).
   
    So, why put this burden on the CPU's shoulders? It will become more complex and will never keep up with modern GPUs' capabilities and speeds.
   
Cesare Di Mauro wrote:

      6) Lack of resource tracking, virtual memory, and memory protection (OS)

    Unfortunately, an Os design issue, nothing that requires a fix in hardware. Of course, a MMU would be helpful for some of them. For example, a MMU would help to implement a virtual sandbox for applications, but this is a major work to design and write.

    I fully agree.
   
Cesare Di Mauro wrote:

      7) Lack of GUI enhancements, such as vectorial graphic presentation (OS)

    Not exactly a hardware issue.
   
Cesare Di Mauro wrote:

      8) Lack of modern transactional/journaled filesystem, with metadata support and large storage (OS)

    Neither exactly a hardware issue.

    That's why I wrote (OS). :P
 
Cesare Di Mauro wrote:
 
      9) chip / fast memory subdivision which makes no sense if the available bandwidth is almost the same. It makes sense only if we have chip mem = embedded or graphic memory with enormous bandwidth compared to the main memory. (Hardware)

    This is a hardware design issue. If Gunnar wants to stick to this, at least a mechanism needs to be provided to transfer memory to and from chip mem fast. As DMA devices can only access chip memory currently, the CPU must shuffle data between chip and fast mem for any type of IO that has the fast memory as target. Thus, a possible resolution would be to have a CPU block-move instruction that does not fill the cache, but invalidates cache-entries only, and that works faster than a series of MOVE16. That *would* be useful.

    I already proposed in the past some sort of "block" instructions, which will make excellent usage of the available bandwidth (and caches too).
   
    They also were "bus width agnostic", so they can scale linearly with the system evolution. But the idea was rejected.
   
Cesare Di Mauro wrote:
 
      10) Lack of multiuser support (OS)

    Not really an issue fixable by hardware.

    Same as above. ;)
   
Let me add a couple from my list:
     
      MP3-Playing: Again, multimedia-related. Would require fast bit-manipulations for decoding, fast vector operations, fast DCT transformations. Here again multiply-add comes into play, and support for fixpoint arithmetics if FPU extensions are too hard. That means multiply-round-shift instructions.

    Let's use SIMD for these.
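
For reference, the "multiply-round-shift" operation mentioned above looks like this in C for Q15 fixed point. A sketch only: `q15_mul` is a made-up name, and it assumes that `>>` on a negative `int32_t` is an arithmetic shift, which holds on common compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two Q15 fixed-point values (1.0 == 32768), round to nearest,
 * shift back to Q15 and saturate the single overflow case (-1.0 * -1.0). */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product            */
    p += (int32_t)1 << 14;                 /* add 0.5 LSB for rounding */
    p >>= 15;                              /* assumed arithmetic shift */
    if (p > INT16_MAX)
        p = INT16_MAX;
    assert(p >= INT16_MIN);
    return (int16_t)p;
}
```

A DCT or MP3 filter bank runs this multiply, add, shift pattern in its innermost loops, which is why folding it into one instruction pays off when no FPU is available.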
   
JPEG: Mostly DCT and bit-shuffling. Something like a bit-buffer instruction might be helpful: Append bits from one register into another register, increment bit-position, set carry if full. bit-field instructions might be helpful here.

    I wrote a JPEG 2000 decoder in the past, so I know it very well.
   
    Having "bit buffer" instructions will be a premium for not only JPEG / JPEG 2000, but general compression algorithms, because they are heavily based on this kind of operations.
   
    But how can compilers make use of them? It's quite difficult to recognize such patterns in the ASTs and emit the proper instructions.
   
    I think that such instructions will be used only in hand-written assembly code (or parts of it). 
   
Saturating arithmetics: Often used as last stage in such processing chains, both in JPEG and MP3.
     
    Video coding: Fast "sum of absolute differences", the main speed-brake is the motion prediction. Better even, "sum of squares" of two rectangular memory regions, aka "scalar product".

    SIMD units are good for these too.
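
Saturating arithmetic, as mentioned for the last stage of JPEG and MP3 processing, boils down to clamping instead of wrapping. A small C sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide intermediate result to the 0..255 range of an 8-bit sample. */
uint8_t clamp_u8(int32_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Signed 16-bit addition that saturates instead of wrapping around. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Store n intermediate values as clamped 8-bit samples. */
void store_clamped(uint8_t *dst, const int32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = clamp_u8(src[i]);
}
```

In scalar code each clamp costs two compares and branches per sample; a saturating add or pack does the same work without branching.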
   
Cryptography: Is there a need for this, for example?

    Yes. VIA already supports AES hardware acceleration on its CPUs, and Intel has also added specific instructions to its latest processors.
   
Alpha-channel, transparency: Again, multiply-add instructions required here.

    Again, SIMD is useful here.
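
To make the multiply-add in alpha blending concrete, here is a per-channel sketch in C. It assumes 8-bit channels and a constant 8-bit alpha (255 meaning fully opaque); `blend_u8` and `blend_span` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* result = (src * alpha + dst * (255 - alpha)) / 255, rounded to nearest. */
uint8_t blend_u8(uint8_t src, uint8_t dst, uint8_t alpha)
{
    uint32_t t = (uint32_t)src * alpha + (uint32_t)dst * (255u - alpha) + 127u;
    return (uint8_t)(t / 255u);
}

/* Blend n source channel values over the destination in place. */
void blend_span(uint8_t *dst, const uint8_t *src, uint8_t alpha, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = blend_u8(src[i], dst[i], alpha);
}
```

That is two multiplies and two adds per channel, three or four channels per pixel, all independent of each other, so it maps naturally onto either a multiply-add instruction or a SIMD unit.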
   
Bobs: Can be currently done by the blitter. But should they?

    I prefer specific "block" instructions for this. SIMD is too general, and can waste bandwidth and time.
   
    Also, a SIMD unit has fixed-size registers, so it can't scale as well as a block instruction (which only needs to know the size of the buffer to work on).
   
Would it probably make sense to have parts of the CPU microcode re-writable while the CPU is working, i.e. "define your own instruction at run time". No need to go deep into the CPU, but very short instruction sequences could be pre-compiled by the CPU, buffered somewhere within the CPU and executed faster by a single opcode than a subroutine jump.

    I prefer some table-based jumps and/or calls: they can be used efficiently on several tasks. Emulation is one, virtual machines another, and compilers can use them to implement switch constructs.
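
A table-based call in the sense above could look like the following C sketch of a tiny interpreter loop. The opcode set, the `Vm` and `Insn` types and `vm_run` are invented for the example; the point is only the indexed indirect call that replaces a chain of compares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t acc; } Vm;
typedef void (*OpHandler)(Vm *vm, int32_t operand);

enum { OP_LOAD, OP_ADD, OP_MUL, OP_NEG, OP_COUNT };

static void op_load(Vm *vm, int32_t x) { vm->acc = x; }
static void op_add (Vm *vm, int32_t x) { vm->acc += x; }
static void op_mul (Vm *vm, int32_t x) { vm->acc *= x; }
static void op_neg (Vm *vm, int32_t x) { (void)x; vm->acc = -vm->acc; }

/* The jump table: one handler per opcode, indexed directly by opcode. */
static const OpHandler dispatch[OP_COUNT] = { op_load, op_add, op_mul, op_neg };

typedef struct { uint8_t op; int32_t operand; } Insn;

/* Run n instructions and return the final accumulator value. */
int32_t vm_run(const Insn *code, size_t n)
{
    Vm vm = { 0 };
    assert(code != NULL);
    for (size_t i = 0; i < n; i++) {
        assert(code[i].op < OP_COUNT);
        dispatch[code[i].op](&vm, code[i].operand);   /* table-based call */
    }
    return vm.acc;
}
```

An emulator's opcode decoder and a compiled `switch` both reduce to this load-from-table-and-jump pattern.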
   
Emulation: "Virtual hardware": Would require the CPU to run through an exception cycle if a predefined hardware address is read from or written to. Could be done by a full MMU plus complete exception processing, but might be worth implementing separately.

    What about trapping speed? It needs to be fast to accommodate such needs. ;)
   
String-handling: strlen,strcpy and strchr could be handled by fast and simple copy/test and loop instructions.
     
      Greetings,
      Thomas

    That was part of the "block" instructions which I talked about.

Cesare Di Mauro
Italy

Posts 526
18 Oct 2010 04:17


Geoffrey Kramer wrote:

 
Cesare Di Mauro wrote:

    2) Lack of SMP (OS)
 
    10) Lack of multiuser support (OS)
 

 
  As the Amiga doesn't have a multicore CPU, I don't consider the lack of SMP a bottleneck (it is for the X1000, not for the NatAmi).

That doesn't mean that we cannot have SMP. ;)
I fail to see how multiuser support would increase the performance of the Amiga architecture.

The thread was not targeting performance only.

Cesare Di Mauro
Italy

Posts 526
18 Oct 2010 04:26


Gunnar von Boehn wrote:
This Bitoffset in the register allows you to address bitwise 256 MB this way.
  Would this work for you?

We are talking about a single compressed file (or stream), so for a 32-bit system I think we can live with this limit. :P

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
18 Oct 2010 12:10


Gunnar von Boehn wrote:

  The offset in D1 can be a 32bit value.
  This means if you do:
 

    BFINS D0,(a0){d1:3}
    ADDq.l #3,d1
 

  Then this should do exactly what you want.
  This Bitoffset in the register allows you to address bitwise 256 MB this way.
  Would this work for you?

Sorry, was getting late tonite. Yes, indeed, this should be sufficient. However, I also need direct access to the last byte(!) written. Byte = really aligned byte in memory, not the last eight bits written.

Reason for that is the bitstuffing in JPEG and JPEG 2000. JPEG inserts a zero-byte after each 0xff byte, JPEG 2000 inserts a zero-bit after each 0xff byte.

So long,
Thomas



Geoffrey Kramer

Posts 21
18 Oct 2010 13:06


Cesare Di Mauro wrote:

Geoffrey Kramer wrote:

 
Cesare Di Mauro wrote:

    2) Lack of SMP (OS)
   
    10) Lack of multiuser support (OS)
   

   
    As the Amiga doesn't have a multicore CPU, I don't consider the lack of SMP a bottleneck (it is for the X1000, not for the NatAmi).

  That doesn't mean that we cannot have SMP. ;)
 
I fail to see how multiuser support would increase the performance of the Amiga architecture.

  The thread was not targeting performance only.

By bottlenecks, I was thinking about the parts that actually act as a brake on the Amiga architecture. Like planar graphics, which were great for 2D but a pain in the a.. for raycasting and other stuff (I'm not saying that planar is crap, but it would have been great if the 1200 had some chunky modes too), or the "chip" and "fast" RAM concept.

SMP or multiuser support could improve things, but they are not there, and so they are not slowing down the Amiga.

My first idea was to list most of the existing weaknesses of the Amiga and maybe find some that the team missed.

But it's also true that people like Gunnar, Thomas and you are better than me at that topic.

Richard Maudsley
United Kingdom

Posts 821
18 Oct 2010 14:48


Maybe multiuser can be implemented similar to what Win9x (or OS 9) had.

A program in the startup-sequence that you use to log in, which then assigns folders for each user, moves files like user-startup and prefs, etc...

Obviously nearly no security, but it would give the illusion of multiple users to the users, and still be compatible with all software.

Megol .

Posts 676
18 Oct 2010 14:51


Cesare Di Mauro wrote:

Geoffrey Kramer wrote:

 
Cesare Di Mauro wrote:

    2) Lack of SMP (OS)
   
    10) Lack of multiuser support (OS)
   

   
    As the Amiga doesn't have a multicore CPU, I don't consider the lack of SMP a bottleneck (it is for the X1000, not for the NatAmi).

  That doesn't mean that we cannot have SMP. ;)
 
I fail to see how multiuser support would increase the performance of the Amiga architecture.

  The thread was not targeting performance only.

Please look at earlier discussions about SMP to understand the complexities involved. Even with special hardware support there will be incompatibilities, in fact almost every change in either hardware or OS will be incompatible with some Amiga software...

Samuel D Crow
USA
(Natami Team)
Posts 1295
18 Oct 2010 18:36


@Cesare Di Mauro

Amiga has always supported parallelism in hardware.  SMP is just a shortcut for CPU manufacturers that don't want to design custom cores.  The Amiga way is to design custom processors to run in parallel.  I think that an SPU-like non-symmetric processor is the proper solution for parallel processing on the NatAmi.

SID Hervé
France

Posts 663
18 Oct 2010 19:16


Hello

Regarding the proposals for JPEG and other specific uses: wouldn't it be more appropriate to extend the instruction set for general-purpose use? Or will those new instructions be commonly reused?

Thanks


Cesare Di Mauro
Italy

Posts 526
19 Oct 2010 05:23


Geoffrey Kramer wrote:
By bottlenecks, I was thinking about the parts that actually act as a brake on the Amiga architecture. Like planar graphics, which were great for 2D but a pain in the a.. for raycasting and other stuff (I'm not saying that planar is crap, but it would have been great if the 1200 had some chunky modes too), or the "chip" and "fast" RAM concept.
 
SMP or multiuser support could improve things, but they are not there, and so they are not slowing down the Amiga.
 
My first idea was to list most of the existing weaknesses of the Amiga and maybe find some that the team missed.
 
But it's also true that people like Gunnar, Thomas and you are better than me at that topic.

OK, I understand now. I hope that something I said was useful for your purposes.

Regarding the bottlenecks and weaknesses of the Amiga (hardware), I wrote an article some time ago.
The original in Italian is here: EXTERNAL LINK and a rough translation with Google here: EXTERNAL LINK  It's quite technical and you have to know some internals of how the Amiga chipset worked (there are links inside to other articles that explain them).

Cesare Di Mauro
Italy

Posts 526
19 Oct 2010 05:27


Richard Maudsley wrote:

Maybe multiuser can be implemented similar to what Win9x (or OS 9) had.
 
  A program in the startup-sequence that you use to log in, which then assigns folders for each user, moves files like user-startup and prefs, etc...
 
  Obviously nearly no security, but it would give the illusion of multiple users to the users, and still be compatible with all software.

You don't need such a trick.

MultiUserFileSystem worked and works quite well: EXTERNAL LINK ;)

Cesare Di Mauro
Italy

Posts 526
19 Oct 2010 05:33


Megol . wrote:
Please look at earlier discussions about SMP to understand the complexities involved. Even with special hardware support there will be incompatibilities, in fact almost every change in either hardware or OS will be incompatible with some Amiga software...

I remember the discussion.

The (extremely condensed) summary is that Thomas said that SMP isn't possible on Amigas, particularly because of this ExecBase field:
BYTE TDNestCnt; /* task disable nesting count */
Contention between the CPUs cannot be controlled, since this global variable can be, and WAS (sic!), accessed directly instead of through the Forbid API.

Gunnar's response was that Natami can have direct control over this memory location (remember that we are talking about a SoC), regulating the access and contention between the CPUs.

Do you know of other problems that SMP can raise on Amiga?

Cesare Di Mauro
Italy

Posts 526
19 Oct 2010 05:52


Samuel D Crow wrote:
@Cesare Di Mauro
 
  Amiga has always supported parallelism in hardware.  SMP is just a shortcut for CPU manufacturers that don't want to design custom cores.

That's really questionable. You can say that SMP has pros and cons for solving specific problems, but rejecting the whole idea isn't technically justified.

As a professional, I must keep in mind the needs and requirements of a problem, evaluate the costs / benefits of the tools available, and choose the "better" solution.

Certainly I cannot discard some tools just on an emotional basis.
The Amiga way is to design custom processors to run in parallel.  I think that an SPU-like non-symmetric processor is the proper solution for parallel processing on the NatAmi.

But you are just shifting the problem here, with a Cell-like AMP.

In such a system you still have an SMP system with multiple symmetric cores working together to solve the problem(s).

The difference lies in a single element, the PPE, which works as the "master" / arbiter / controller of the SMP subsystem. This mode is called PPE-centric in the Cell literature.

However, there's another mode, called SPE-centric, in which the SPEs regulate themselves without PPE intervention. This looks like an SMP system with NUMA, with the addition of some useful synchronization directives between the cores.

Anyway, you still have an SMP. What changes is how it is used by the system, but in the end you have a number of identical cores over which to distribute the work. So you have to think about dividing the problems into subproblems to parallelize, the same way you do with an SMP system.

Certainly an AMP approach is preferable on Amiga, since SMP poses big problems, as we know (unless Gunnar solves them! :D). So I'm with you here: in Natami it'll work more "naturally".

But I'd really like the additional cores to be 68K-derived, since I really like this architecture and I'd find it cumbersome to program with two different ISAs, as in the Cell.

Cesare Di Mauro
Italy

Posts 526
19 Oct 2010 06:03


SID Hervé wrote:

Hello
 
  Regarding the proposals for JPEG and other specific uses: wouldn't it be more appropriate to extend the instruction set for general-purpose use? Or will those new instructions be commonly reused?
 
  Thanks

Manipulating bit-streams is certainly a specific thing, but also a common need.

I think that we cannot have JPEG-only instructions, such as scanning the bit stream for a $FF byte written or read (in order to insert/discard a $00 byte or a 0 bit, as is done in JPEG and JPEG 2000).

But a general instruction set for working with bit streams helps A LOT in many areas, and can make the code FASTER (and more compact, too).

A multi-cycle instruction is preferable to having nothing.

Just to be clear, I don't care if the pipeline stalls for one or more cycles, waiting for its result. The most important thing is that it solves a problem that on other architectures requires MANY cycles and introduces A LOT of data dependencies, slowing down execution.

The only question is about compiler support for such useful things, which can be a really hard problem to solve (how to recognize those patterns and emit the proper code?).

Richard Maudsley
United Kingdom

Posts 821
19 Oct 2010 08:29


Cesare Di Mauro wrote:

Richard Maudsley wrote:

  Maybe multiuser can be implemented similar to what Win9x (or OS 9) had.
 
  A program in the startup-sequence that you use to log in, which then assigns folders for each user, moves files like user-startup and prefs, etc...
 
  Obviously nearly no security, but it would give the illusion of multiple users to the users, and still be compatible with all software.

  You don't need such a trick.
 
  MultiUserFileSystem worked and works quite well: EXTERNAL LINK ;)

Looks more like a trick than my approach :p

Besides, changing the file system will no doubt cause problems somewhere.
