
Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.
Please read the forum usage manual.



Do you have ideas and feature wishes? Post them here and discuss your ideas.

Stream Decompression
Matt Hey
USA

Posts 729
05 Feb 2011 20:44


Wojtek P wrote:

 
Gunnar von Boehn wrote:
   
    2nd Example:
    Our Shoot Em Up game currently has a size of a bit over 200MB.
    I assume in the final version it will be around 500MB.
    The 500 MB will probably compress down to 200 MB.
    300 MB space saved and loading speed more than doubled.
 

  What incredibly good lossless compressor can compress graphics data so well??
 

 
I think Wojtek made a good point here. I also talked about the eventual need for texture compression. What is needed is hardware gfx decompression. S3TC is the industry standard, but it costs money and would likely take a lot of FPGA space. Dithering is not compression, but it does make a smaller image look closer to a bigger one. What we need is lightweight (cheap) image compression that is simple to decompress to truecolor and 16-bit chunky formats. How difficult would HAM8 be to decompress in the FPGA to truecolor and 16-bit? Maybe it's an ignorant suggestion, but many Amiga users already have HAM8 savers in their paint programs, and it is 1/2 the size of 16-bit and 1/4 the size of truecolor. I don't think new HAM modes make sense as a display format, but HAM might make sense as a lightweight compression format.
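
To give a feel for how little state HAM8 decoding needs, here is a minimal software sketch of the idea, not a description of any Natami hardware. It assumes each pixel has already been converted to one chunky byte with the two control bits on top and six data bits below (the real chipset works on planar data and may arrange the bits differently), a 64-entry base palette, and a hypothetical `border` colour that is held before the first pixel of a line. The function names are made up for illustration.

```python
def ham8_to_rgb888(pixels, palette, border=(0, 0, 0)):
    """Decode one scanline of HAM8 pixel values into 24-bit RGB tuples.

    pixels  : iterable of ints 0..255 (assumed: 2 control bits + 6 data bits)
    palette : list of 64 (r, g, b) base colours, 8 bits per channel
    border  : colour held before the first pixel of the line (assumption)
    """
    r, g, b = border
    out = []
    for p in pixels:
        ctrl, data = p >> 6, p & 0x3F
        if ctrl == 0:        # set: load a base colour from the palette
            r, g, b = palette[data]
        elif ctrl == 1:      # modify blue, hold red and green
            b = (data << 2) | (b & 0x03)
        elif ctrl == 2:      # modify red, hold green and blue
            r = (data << 2) | (r & 0x03)
        else:                # modify green, hold red and blue
            g = (data << 2) | (g & 0x03)
        out.append((r, g, b))
    return out


def rgb888_to_rgb565(rgb):
    """Pack an 8-bit-per-channel colour into a 16-bit 5-6-5 chunky value."""
    r, g, b = rgb
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
```

Each output pixel depends only on the previous pixel and one palette lookup, so the decoder carries just one held colour from pixel to pixel; that is the property that makes the format look cheap to expand on the fly.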


SID Hervé
France

Posts 663
05 Feb 2011 20:57


The type of decompression is not important, because the FPGA can be updated.

I guess hardware decompression costs less than a processor. In addition, the CPU would be relieved of that work.

Another consideration is how the NatAmi will be used: I think it will more often be used as a simple player.

Cesare Di Mauro
Italy

Posts 526
05 Feb 2011 21:04


I think it would be better to have a HAM8/YUV mode instead of the classic HAM8/RGB, since with a YUV color space a pixel transition will be much softer.

Eyes are more sensitive to light (gradient) changes, so with a HAM/YUV mode, if the first "change command" applies to the Y channel, it will produce far fewer artifacts.

Maybe a test program will prove it.

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
05 Feb 2011 21:07


Actually, I'm not too keen on this stream compression idea overall. It means that you cannot fill the hard disk compatibly from other systems without having access to the FPGA. That, however, would be a nice feature to have, namely to prepare disks or read them from a PC or a native Amiga.

Wojtek and others: Lossless image compression can get you a compression ratio of about 1:2, not more. There are simple hardware-suitable algorithms for that; for example, JPEG-LS would do that. But in either case, despite being the JPEG guy here, I would *not* recommend doing that. It adds another level of complexity that is quite unnecessary, and the decompression would better be done in software if it is needed by the program, instead of being done invisibly to the system by some hardware.

Greetings from Newark,

Thomas
 

Cesare Di Mauro
Italy

Posts 526
05 Feb 2011 21:19


Team Chaos Leader wrote:

  I vote for having a second 68070 to do the compression/decompression.

  - It uses a ton of LE and SRAM

  + It is 100,000x more flexible.
  + Codecs can be changed at will
  + Multiple codecs can be used.  Specialized codec for audio, a 2nd for text and exes, a 3rd for gfx, etc.
  + Makes excellent use of all Jens' 100s of hours of hard work debugging. You have a great codeslave.  Why not use him? :D

Because when you need to decompress something, the main CPU always has to wait for the data to be decompressed and available before it can continue executing and use it.

So, I see no advantage in having the decompression done by a second CPU: the main one can do the work as well.

  Most of the time this unit will be unused.  While it is not being used for compression/decompression it can be used for any other thing imaginable.

Sure.

  A hardcoded FPGA codec would be very kewl but it is just so inflexible.  We get stuck with just 1 algorithm.  When we discover a better algorithm in the future?  Oh well, we are stuck with an old hardcoded algorithm. :(

We don't know how the NOVA algorithm works, so you can't say that.

From Gunnar's words, it seems pretty easy and very cost effective for the FPGA.

ZIP's deflate algorithm will require a lot more resources, since it involves both LZ77 and Huffman coding, and needs proper buffers.

  So I vote for using a 2nd 68070 to do the job.  Like Wojtek said, we can leave out some lame instructions such as BCD, if that helps.

I don't know how much space is required for the BCD instructions, but if they are cheap, there's no need to change the ISA for the second 68K processor.

Anyway, if the ISA has to be simplified, we can remove:
- BCDs;
- CHK/CHK2/CAS/CAS2;
- MOVEP;
- double indirect address modes.

  And I am sure that in the first Natami we would probably have to greatly reduce its cache size.  But a few years down the road when better FPGA's are available then we can increase the cache for faster performance and 100% compatibility.

Yup.
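
To illustrate the remark above about deflate needing buffers: the LZ77 half of deflate resolves back-references into data that was already output, so a decoder must keep a window of recent output around (up to 32 KiB for deflate). The following is only a sketch of that one step. The token format (plain ints for literals, `(distance, length)` tuples for matches) is invented for illustration; real deflate wraps both in Huffman codes.

```python
def lz77_decode(tokens, window_size=32 * 1024):
    """Expand a list of LZ77 tokens into bytes.

    tokens      : ints 0..255 are literals; (distance, length) tuples are
                  back-references into the already decoded output
                  (hypothetical token format, for illustration only)
    window_size : how far back a reference may reach
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
            continue
        distance, length = tok
        if not 0 < distance <= min(len(out), window_size):
            raise ValueError("back-reference outside the window")
        start = len(out) - distance
        # Copy byte by byte so that overlapping references
        # (length > distance) repeat the pattern correctly.
        for i in range(length):
            out.append(out[start + i])
    return bytes(out)
```

For example, `lz77_decode([ord("a"), ord("b"), (2, 4)])` yields `b"ababab"`. In hardware the `out` buffer would have to be on-chip memory of `window_size` bytes, which is the resource cost the post refers to.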

SID Hervé
France

Posts 663
05 Feb 2011 22:17


Maybe the decompression could be configurable: the user could choose something like file mode, disk mode or nothing. This could satisfy every user.

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
05 Feb 2011 22:33


Cesare Di Mauro wrote:

So, I see no advantage in having the decompression done by a second CPU: the main one can do the work as well.

This boils down to the same argument as polling IO or DMA.

If you have a separate unit (either a DMA FPGA engine or a second CPU) doing the loading of the disk data and the decompression for you, then you can load the next level in parallel while playing the game.

The same is true with AMIGA OS and the Workbench.
Your main CPU (remember AMIGA OS only supports 1 CPU!) is free to run AMIGA OS - while the acceleration Unit (again it does not matter whether this is a CPU or special Unit) will do the copy work.

This means the "IO-acceleration" will at the end of the day off-load the core CPU.
This is very much the same situation that SCSI versus ATA was in the old days.
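
The overlap described here can be sketched in software, with a background thread standing in for the DMA engine or second CPU. This is only an analogy under stated assumptions: `LevelPrefetcher` and `read_compressed` are hypothetical names, and zlib is used merely as a stand-in codec, not the algorithm discussed in this thread.

```python
import threading
import zlib


class LevelPrefetcher:
    """Load and decompress the next level while the caller keeps running."""

    def __init__(self, read_compressed):
        # read_compressed(name) -> bytes; hypothetical disk-read callback
        self._read = read_compressed
        self._thread = None
        self._result = None

    def start(self, name):
        """Kick off loading in the background and return immediately."""
        def work():
            self._result = zlib.decompress(self._read(name))
        self._thread = threading.Thread(target=work)
        self._thread.start()

    def wait(self):
        """Block until the data is ready, then return it."""
        self._thread.join()
        return self._result
```

A game loop would call `start("level2")` when the player enters level 1, keep rendering frames, and call `wait()` only at the level transition. If the load finished in the meantime, the wait returns at once; that is the DMA-versus-polling difference.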

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
05 Feb 2011 22:37


Thomas Richter wrote:

  Actually, I'm not too keen about this stream compression idea all over. It means that you cannot fill the harddisk compatibly without having access to the FPGA, from other systems.
 

But of course you can.
You know XPK, don't you?
It's just the same concept.
 
If you don't know XPK think of Powerpacker and PPlib.

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
06 Feb 2011 03:04


Gunnar von Boehn wrote:

Thomas Richter wrote:

  Actually, I'm not too keen about this stream compression idea all over. It means that you cannot fill the harddisk compatibly without having access to the FPGA, from other systems.
 

 
  But of course you can.
  You know XPK, don't you?
  It's just the same concept.
 
  If you don't know XPK think of Powerpacker and PPlib.

There's one essential difference: PPLib and XPK are software libraries; they are not transparent between the hardware layer and the driver layer, but between the loader and the driver (at best). Which means that at least a user can disable them, can be aware of them, and can undo the operation by software. This is all not so clear if it happens lower on the stack. A disk read on Amiga by the raw disk interface will look the same on a PC, which is quite a useful feature for recovery and data analysis. This will be no longer the case with hardware compression, and requires additional tools on the PC side.

Greetings (now from Boston),

Thomas



Cesare Di Mauro
Italy

Posts 526
06 Feb 2011 05:38


Gunnar von Boehn wrote:

Cesare Di Mauro wrote:

  So, I see no advantage in having the decompression done by a second CPU: the main one can do the work as well.
 

 
  This boils down to the same argument as polling IO or DMA.
 
  If you have a separate unit (either a DMA FPGA engine or a second CPU) doing the loading of the disk data and the decompression for you, then you can load the next level in parallel while playing the game.

For me this is an uncommon scenario, since my games have used all available memory, and I think much the same will happen on Natami.

So the CPU will have to wait in any case, and the advantage is too small.

  The same is true with AMIGA OS and the Workbench.
  Your main CPU (remember AMIGA OS only supports 1 CPU!) is free to run AMIGA OS - while the acceleration Unit (again it does not matter whether this is a CPU or special Unit) will do the copy work.
 
  This means the "IO-acceleration" will at the end of the day off-load the core CPU.
  This is very much the same situation that SCSI versus ATA was in the old days.

That's an interesting scenario, but the CPU needs to have something else to do while the system is decompressing data.

Usually, you know, a human being works on a single (main) task at a time.

When I launch a new application, I wait for it to be loaded and executed; only then am I interested in working with it.

So, I must wait anyway. That's the primary point for a user.

Cesare Di Mauro
Italy

Posts 526
06 Feb 2011 05:43


Thomas Richter wrote:

Gunnar von Boehn wrote:

 
Thomas Richter wrote:

    Actually, I'm not too keen about this stream compression idea all over. It means that you cannot fill the harddisk compatibly without having access to the FPGA, from other systems.
   

 
  But of course you can.
  You know XPK, don't you?
  It's just the same concept.
   
  If you don't know XPK think of Powerpacker and PPlib.
 

  There's one essential difference: PPLib and XPK are software libraries; they are not transparent between the hardware layer and the driver layer, but between the loader and the driver (at best). Which means that at least a user can disable them, can be aware of them, and can undo the operation by software. This is all not so clear if it happens lower on the stack. A disk read on Amiga by the raw disk interface will look the same on a PC, which is quite a useful feature for recovery and data analysis. This will be no longer the case with hardware compression, and requires additional tools on the PC side.
 
  Greetings (now from Boston),
 
  Thomas

I don't know how Gunnar intends to implement the "stream decompression", but if he creates an XPK module, or something like PowerPacker's PPLoadSeg, everything will be transparent to the user.

The module will just use the decompression hardware when needed, instead of the CPU, to accelerate the operation.
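
A rough sketch of that arrangement: one unpack entry point that uses an accelerator when one is present and otherwise falls back to the CPU, so callers never see the difference. Everything here is hypothetical: `Unpacker` and the `hw_decompress` hook are invented names, zlib stands in for whatever packed format the module would handle, and this is not the XPK or PPLoadSeg API.

```python
import zlib


class Unpacker:
    """Single unpack entry point with an optional hardware path."""

    def __init__(self, hw_decompress=None):
        # hw_decompress(packed_bytes) -> bytes, or None when no
        # accelerator is available (hypothetical hook)
        self._hw = hw_decompress

    def unpack(self, packed):
        if self._hw is not None:
            return self._hw(packed)       # offloaded path
        return zlib.decompress(packed)    # plain software path
```

Because the packed data is identical on both paths, a machine without the hardware can still read it in software.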

Claudio Wieland
Germany
(Natami Team)
Posts 703
06 Feb 2011 09:02


That's right, Cesare. And the decompressor engine takes a very, very tiny amount of logic resources, compared to even just a couple more sprites (which are in fact crippled mini playfields, as little as you like to hear it. Thomas H. fully agreed to this btw).

I just don't see why a highly valuable resource like a real CPU should do this very low-level stuff, which can be integrated seamlessly into the DMA (switchable). Doing an MP3 decoder, or game AI etc. with a subcore CPU, makes sense, IMO. But not this type of work.

Think about it - this tiny, realtime hardware decompressor accelerates the system in general. It is not about saving space, obviously. Anyone who thinks that saving storage space is the motivation has misunderstood the whole discussion. Data is best when you don't have to transmit it. I think we all agree on this, especially on our not-so-fast memory interfaces.

A fully-fledged CPU is an incredibly scarce and expensive resource in a system. If you use it all the time for grunt work, you are blocking this valuable resource, which could otherwise do complex, high-level work for which a dedicated hardware solution would make no sense. Using a CPU for grunt work lowers overall system speed, no matter whether it's the main CPU or a helper CPU.

In Natami, we must find ALL possible ways to accelerate the overall system speed, at the lowest possible price in logic resources. Adding an experimental tiny decompressor doesn't hurt. Even if someone finds it useless, this is one opinion among many others. If the majority finds it actually not worthwhile, then we can remove this experimental feature again. As long as it is marked as experimental and people know it could possibly be removed again in the future, they know what they are dealing with. Natami is, for a major part, one great platform for experimenting. If we don't do experimental things here and there, we can stop right now and buy some Intel chips.

Deep Sub Micron
Germany
(MX-Board Owner)
Posts 567
06 Feb 2011 11:00


Wojtek P wrote:

  Cool feature that nobody will use.
  Because it's USELESS.

OK Wojtek, I understand. You will not use decompression hardware.
That is fair and OK for me.
But then all further comments on this topic from your side are obsolete, as HW decompression no longer affects you. I don't need to read them. Others don't need to read them. So you can save your time by not writing them.

PS: Sorry, I could not resist feeding him. I am no good at acting as a role model. :)


Wojtek P
Poland

Posts 1597
06 Feb 2011 11:46


Cesare Di Mauro wrote:

I think it would be better to have a HAM8/YUV mode instead of the classic HAM8/RGB, since with a YUV color space a pixel transition will be much softer.
 
  Eyes are more sensitive to light (gradient) changes, so with a HAM/YUV mode, if the first "change command" applies to the Y channel, it will produce far fewer artifacts.
 
  Maybe a test program will prove it.

It's better to have 3-plane 8-bit YUV with the resolution of the U and V planes halved.

Already proven.
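
For comparison with the HAM8 sizes mentioned earlier in the thread, the storage cost of such a planar layout is easy to work out. The sketch below is plain arithmetic; since the post does not say whether the chroma planes are halved in both directions or only horizontally, the hypothetical `halve_vertical` flag covers both readings, and the 640x480 frame used afterwards is just an example size.

```python
def yuv_planar_bytes(width, height, halve_vertical=True):
    """Bytes for an 8-bit Y plane plus two reduced-resolution chroma planes."""
    chroma_w = (width + 1) // 2                       # halved horizontally
    chroma_h = (height + 1) // 2 if halve_vertical else height
    return width * height + 2 * chroma_w * chroma_h
```

For a 640x480 frame this gives 460,800 bytes (12 bits per pixel) with both directions halved, or 614,400 bytes (16 bits per pixel) with only the horizontal resolution halved, against 307,200 bytes for an 8-bit-per-pixel format such as HAM8 and 921,600 bytes for 24-bit RGB.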


Wojtek P
Poland

Posts 1597
06 Feb 2011 11:51


Cesare Di Mauro wrote:

  - BCDs;
  - CHK/CHK2/CAS/CAS2;
  - MOVEP;

Exactly. The second processor DOES NOT need to be backward compatible.
I am talking about asymmetric multiprocessing: a single-processor OS, plus one or even 100 processors controlled by the first.
A simple scheduling library for the extra processors would be enough.
No task switching at all: just run one program, then the next.
With an option to use the main CPU if it's free and all the extras are busy.
Or maybe even ignore that option.

Actually, there is no need for the extra CPU to be 68k.


I Immortal
Netherlands

Posts 67
06 Feb 2011 12:14


Claudio Wieland wrote:

  In Natami, we must find ALL possible ways to accelerate the overall system speed, at the lowest possible price in logic resources. Adding an experimental tiny decompressor, doesn't hurt. Even if someone finds it useless, then this is one opinion of many many others. If the majority finds it actually not worthwhile, then we can remove this experimental feature again. As long as it is marked as experimental and people know it could possibly be removed again in the future, they know what they are dealing with. Natami is one great experimenting platform, for a major part. If we don't do experimental things here and there, we can stop right now and buy some Intel chips.

+1
I really like the idea. New applications developed for the Natami will most likely grow in size (/me throwing oil on the fire), for several reasons: we have more memory and CPU; standards like XML/HTML are widespread; more scripting languages like Python are in use. These "standards" use more space because they are text rather than binary, and larger files take more time to load. Maybe the speed increase for games will not be that large (when they already use compressed data formats), but applications can certainly benefit a lot.

SID Hervé
France

Posts 663
06 Feb 2011 12:52


One fact is evident: most applications tend to take more and more space, mainly for the data attached to them.

Given the versatility of the Amiga, and hence the NatAmi, I doubt that such a facility, available by default, would go unused. This is especially true since it will cost the processor nothing, and it could delay the purchase of another hard disk or similar.

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
06 Feb 2011 15:15


SID Hervé wrote:

There is an evident fact: most of the applications have a tendency to take more and more space, mainly data attached to them.
 
  Due to the versatility of use of the Amiga and hence NatAmi, I doubt that such a facility available by default would not be used. This is especially true since it will cost nothing to the processor and this could delay the purchase of another hard disk or related.

You basically set one specific compression algorithm in stone. However, compression is always source dependent, that is, you can either create a universal, but not well-performing algorithm, or a specialized, but great-for-its-source algorithm. By implementing the algorithm in hardware, you basically freeze the development of better algorithms.

I don't really see that this is much of a speed problem that requires hardware support, or should require hardware support. For example, if a PP compatible packer is implemented in hardware, would I use it? Probably not, there are better algorithms than PP, and zip would be more versatile. Yet, zip is not suitable for images, and bz2 is a better universal algorithm. For audio, I would neither use bz2 nor zip, but some lossy code...

Instead of implementing a specific hardware algorithm, I would rather look at which CPU instructions would be beneficial for compression. These are mostly bit-juggling, searching and table-lookup types of instructions, plus low-level entropy coding stuff (Huffman, arithmetic coding); then leave it to software to use this type of hardware support to implement specific codecs.

Greetings from Boston,

Thomas
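
As an illustration of the bit-juggling and table-lookup work mentioned above, here is a small sketch of table-driven Huffman decoding: peek a fixed number of bits, index a table, consume only as many bits as the matched code is long. It is written for clarity rather than speed, assumes a complete prefix code supplied as bit strings, and all names are made up for this example.

```python
def build_table(codes):
    """Build a direct lookup table from a prefix code.

    codes : dict mapping symbol -> bit string, e.g. {"a": "0", "b": "10"}
    Returns (table, max_len); table[bits] = (symbol, code_length).
    """
    max_len = max(len(c) for c in codes.values())
    table = [None] * (1 << max_len)
    for sym, code in codes.items():
        pad = max_len - len(code)
        base = int(code, 2) << pad
        # Every bit pattern that starts with this code maps to the symbol.
        for fill in range(1 << pad):
            table[base | fill] = (sym, len(code))
    return table, max_len


def decode(data, nbits, table, max_len):
    """Decode the first nbits bits of data (most significant bit first)."""
    # Append max_len zero bits so peeking near the end stays in range.
    acc = int.from_bytes(data, "big") << max_len
    total = len(data) * 8 + max_len
    mask = (1 << max_len) - 1
    pos, out = 0, []
    while pos < nbits:
        peek = (acc >> (total - pos - max_len)) & mask   # bit-field extract
        sym, length = table[peek]                        # table lookup
        out.append(sym)
        pos += length
    return out
```

With the code `{"a": "0", "b": "10", "c": "11"}`, the six bits `010110` (stored as the byte `0x58`) decode to `a b c a`. The inner loop is essentially one bit-field extract and one indexed load per symbol, which is the kind of operation a CPU instruction could speed up.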


Wojtek P
Poland

Posts 1597
06 Feb 2011 15:43


Cesare Di Mauro wrote:

  Because when you need to decompress something, the main CPU has always to wait for the data to be decompressed and available before continuing executing and use them.
 

  A second CPU is useful, but only because it's general purpose, so you can use it for anything.
  as for decompressing i don't see a problem getting >=10MB/s full software gunzip, far more with compressible data.
  you may make use of second CPU by decompressing 2 things at the same time if you wish.

quick test - 2GB files compressed to 700MB with gzip -9
decompressing it from /tmp (ramdisk) under unix takes 13 seconds, so 140MB/s

  CPU: Pentium(R) Dual-Core CPU      T4500  @ 2.30GHz (2294.53-MHz K8-class CPU)

gzip is C code, no assembly. how much slower would the N68k be with code that does lots of random L1 cache accesses and unpredictable branches (Huffman decode + block copies)?

i bet 10MB/s by just compiling gzip with gcc, 20MB/s with assembly.
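
A measurement like the one above can be repeated on any machine with a few lines of code. This sketch uses Python's zlib module (the same deflate format gzip uses) and reports throughput as decompressed bytes per second of wall time, which appears to be how the 140MB/s figure was derived. The function name and its parameters are chosen for this example only, and results will depend heavily on the data and the machine.

```python
import time
import zlib


def measure_decompress(raw, level=9, repeats=3):
    """Return (MB/s of decompressed output, compressed/raw size ratio)."""
    packed = zlib.compress(raw, level)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        out = zlib.decompress(packed)
        best = min(best, time.perf_counter() - t0)
    if out != raw:
        raise RuntimeError("round trip failed")
    best = max(best, 1e-9)          # guard against a zero timer reading
    return len(raw) / best / 1e6, len(packed) / len(raw)
```

Calling it as `speed, ratio = measure_decompress(open(path, "rb").read())` on representative game or application data would give a baseline to compare a software-only 68k port against.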


Wojtek P
Poland

Posts 1597
06 Feb 2011 15:56


Thomas Richter wrote:

compression. This is mostly bitjuggling, searching and table-lookup type of instructions, low-level entropy coding stuff (huffman, arithmetic coding) and then leave it to software to use this type of hardware support to implement specific codecs.

Unless I'm missing something, the 68k instruction set is rather good at it.

As for bzip2, it's BAD unless you have megabytes of cache.
grzip is too, but much less bad, and it actually compresses far better than bzip2. This is the packer I use under unix when I really care about size rather than time, e.g. before transmitting a file over a slow connection.

EXTERNAL LINK 
Porting is trivial. There is one .c file for the compressor library and another handling the command line and file I/O.

I REALLY recommend it.
It always packs faster than bzip2, unpacks at similar speed, and always packs better.

There are options to select how it works; in my experiments -b8m -m3 gives the best results.


