 |
Welcome to the Natami / Amiga Forum
This forum is for AMIGA fans interested in the new NATAMI platform.
Please read the forum usage manual.
|
Do you have questions about the Natami? Post it here and we will answer it!
|
| About the Memory |
Marcel Verdaasdonk Netherlands
| | Posts 3976 21 May 2012 06:26
| SID, I can tell you that without a doubt the Harvard architecture is more secure than a von Neumann one. So can you explain to me why two memory requests are bad? Data should be in chip, instructions in fast.
| |
SID Hervé France
| | Posts 663 21 May 2012 19:42
| There is no question of good or bad. Two requests are needed on the Amiga. But the chip and fast memories are virtual in the Natami (unlike the Amiga). Physically, there is one type of memory hosting the data (DDR2). It would be useful to request it directly. A new flag could be created (just for example: MEM_DDR). Roughly, the idea is: (1) a single memory allocation request with this new flag, (2) loading a structured data file, (3) pointers to use its data (images, animations, sounds, text, etc.). It seems cheaper in terms of code to write.
| |
Matt Hey USA
| | Posts 734 21 May 2012 20:03
| SID Hervé wrote:
| I am sorry, I forgot to indicate that the database contains text, images and sounds. What about displaying images? Their data must be loaded into chip memory. This memory is allocated through a request with the flag "MEMF_CHIP". So finally, two requests are needed.
|
The pictures (and sounds) in the database would probably be compressed, as most picture formats are today. The decompression can happen from MEMF_ANY to MEMF_CHIP by the CPU with good speed, avoiding a separate memory copy. The database could keep the filenames of the pictures (and sounds) and simply pass each filename to datatypes.library, which should minimize any copying if it's smart.
Marcel Verdaasdonk wrote:
| SID, I can tell you that without a doubt the Harvard architecture is more secure than a von Neumann one. So can you explain to me why two memory requests are bad? Data should be in chip, instructions in fast.
|
Are you trying to confuse SID? The only useful aspect of a Harvard architecture that applies to the Amiga is having separate instruction and data caches. This allows instructions to be fetched at the same time as the data requested by instructions. This is faster but doesn't provide any extra security. Maybe you are referring to not allowing instructions to use PC-relative addressing modes for writing? That's really a joke, as it can be defeated with 2 instructions. An MMU/MPU is needed for security.
| |
Marcel Verdaasdonk Netherlands
| | Posts 3976 23 May 2012 21:49
| Matt, to be honest, I was goading him. The 68K is a hybrid Harvard CPU, as a full Harvard would imply two memory buses and address spaces. (Almost sounds quite Amiga-like.) Data may never be placed in the instruction memory and instructions can never be written into data memory. (Seems very secure until you blur those simple rules; however, it is not relevant.) In the Amiga there are two options: either use any memory or chip memory. The former is the safer choice when writing OS-friendly software. To be honest, I would simply place the data in chipmem and the main in any. (Hey, this seems a little like a Harvard CPU design.) An MMU, on the other hand, does not directly imply security. (On the Xbox there are several fine examples where it usually goes wrong in the chain of trust.)
| |
SID Hervé France
| | Posts 663 24 May 2012 17:46
| Matt Hey wrote:
| The decompression can happen from MEMF_ANY to MEMF_CHIP by the CPU with good speed avoiding a separate memory copy.
|
This means there are two AllocMem() calls: MEMF_CHIP and MEMF_ANY. Maybe this could be avoided with a single AllocMem(z, MEMF_DDR2). I think the code would be both more concise and more readable. For this facility, the header file EXEC_MEMORY could be extended. According to Thomas Richter, this is just a matter of how addresses are decoded and routed.
| |
Marcel Verdaasdonk Netherlands
| | Posts 3976 24 May 2012 18:32
| SID, your request is redundant at best. There might be more modern algorithms, which I sincerely doubt in this case. But why add another option where only two options are needed? Could you please explain this to me? WHY do we need it!!!
| |
SID Hervé France
| | Posts 663 24 May 2012 23:02
| Amount of memory required:
|     chip-size = x
|     fast-size = y
| Flag attributes:
|     chip = MEMF_CHIP
|     fast = MEMF_FAST
| Example in assembly-like language:
|     move.l EXECBASE,A6
|     move.l x,D0
|     move.l chip,D1
|     jsr EXEC_ALLOCATEMEM(A6)
|     move.l D0,ptr-mem-chip
|     move.l EXECBASE,A6
|     move.l y,D0
|     move.l fast,D1
|     jsr EXEC_ALLOCATEMEM(A6)
|     move.l D0,ptr-mem-fast
| The same example in C-like language:
|     ptr-mem-chip = AllocMem(x, chip)
|     ptr-mem-fast = AllocMem(y, fast)
| And now, what is requested:
|     ddr2-size = z
|     ddr2 = MEMF_DDR2
|     move.l EXECBASE,A6
|     move.l z,D0
|     move.l ddr2,D1
|     jsr EXEC_ALLOCATEMEM(A6)
|     move.l D0,ptr-mem-ddr2
| or
|     ptr-mem-ddr2 = AllocMem(z, ddr2)
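The two variants in this post can be put side by side in plain C. This is only a sketch under stated assumptions: `AllocMemStub()` is a hypothetical stand-in for exec.library's `AllocMem()` that merely records the flags and calls `malloc()`, and `MEMF_DDR2` is the flag proposed in this thread, given an arbitrary bit value here; it does not exist in any real header. The `Buffers` type and both helper functions are likewise invented for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Flag values as found in AmigaOS exec/memory.h. */
#define MEMF_ANY   0UL
#define MEMF_CHIP  (1UL << 1)
#define MEMF_FAST  (1UL << 2)
/* Hypothetical flag from the thread; the bit is chosen arbitrarily. */
#define MEMF_DDR2  (1UL << 20)

/* Bookkeeping for the stand-in allocator below. */
unsigned long last_flags;
int alloc_calls;

/* Stand-in for AllocMem(): remembers the flags, then uses malloc(). */
void *AllocMemStub(size_t byteSize, unsigned long flags)
{
    last_flags = flags;
    alloc_calls++;
    return malloc(byteSize);
}

typedef struct {
    void *chip;  /* buffer the custom chips must be able to read */
    void *fast;  /* buffer only the CPU touches */
} Buffers;

/* Existing model: one request per memory type (x bytes chip, y bytes fast). */
Buffers alloc_split(size_t x, size_t y)
{
    Buffers b;
    b.chip = AllocMemStub(x, MEMF_CHIP);
    b.fast = AllocMemStub(y, MEMF_FAST);
    return b;
}

/* Proposed model: a single request of z bytes with the new attribute. */
void *alloc_unified(size_t z)
{
    return AllocMemStub(z, MEMF_DDR2);
}
```

Seen this way, the proposal saves one call and one flag name; whether that is worth a new attribute is what the rest of the thread argues about.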
| |
Matt Hey USA
| | Posts 734 25 May 2012 00:20
| @SID If you have raw data (bitmap or sound sample) that the chips can handle, chip memory can be allocated and the data loaded directly to this location. It's more likely that the data is compressed. Fast memory would be allocated for the compressed data and it would be copied to this location. The uncompressed size would be read or determined and memory would be allocated in chip memory. The data would be read from the fast memory compressed source and written to the uncompressed destination. It should be possible to stream the compressed data avoiding having all of it in fast ram but, in most cases, 2 buffers are needed for decompression. Two memory allocations/buffers would be required for decompressing with unified memory also. Identical data is never required in both chip and fast memory. I don't see a problem.
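The flow described here (compressed source in one buffer, uncompressed size determined first, then decoding straight into the destination) can be sketched as follows. The run-length format of (count, value) byte pairs is an assumption chosen only to keep the example short, and both function names are hypothetical. On an Amiga the source would sit in memory obtained with MEMF_ANY or MEMF_FAST and the destination in MEMF_CHIP; here they are ordinary pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy RLE format (an assumption for this sketch): pairs of (count, value). */

/* Determine the uncompressed size so the destination can be allocated. */
size_t rle_unpacked_size(const uint8_t *src, size_t src_len)
{
    size_t total = 0;
    for (size_t i = 0; i + 1 < src_len; i += 2)
        total += src[i];
    return total;
}

/* Decode directly into dst; no intermediate copy is made.
   Returns the number of bytes written, or 0 if dst is too small
   (an empty input also yields 0). */
size_t rle_unpack(const uint8_t *src, size_t src_len,
                  uint8_t *dst, size_t dst_cap)
{
    size_t out = 0;
    for (size_t i = 0; i + 1 < src_len; i += 2) {
        size_t count = src[i];
        uint8_t value = src[i + 1];
        if (out + count > dst_cap)
            return 0;
        for (size_t n = 0; n < count; n++)
            dst[out++] = value;
    }
    return out;
}
```

Two buffers are involved either way; with unified memory only the flags passed to the allocator would change, not the number of buffers.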
| |
Marcel Verdaasdonk Netherlands
| | Posts 3976 25 May 2012 02:10
| Besides what Matt said, what is the point again in replacing a keyword? Your feature request shows a lot of similarities with MEMF_ANY. So what sets your requested feature apart?
| |
SID Hervé France
| | Posts 663 26 May 2012 10:50
| Hello. There is no problem. It's just a matter of simplification, readability and writing. MEMF_CHIP and MEMF_FAST could be replaced by just one new attribute: just one allocation to load, uncompress and read.
| |
Thomas Richter Germany
| | (MX-Board Owner) Posts 1425 26 May 2012 12:50
| SID Hervé wrote:
| Hello. There is no problem. It's just a matter of simplification, readability and writing. MEMF_CHIP and MEMF_FAST could be replaced by just one new attribute: just one allocation to load, uncompress and read.
|
Sorry, I still don't get it. It sounds to me as if you believe that allocating memory is a heavyweight operation. It isn't. The overall time of a codec is spent in the actual coding or decoding, not in memory allocation. It really doesn't matter, and there is nothing to worry about. That AmigaOS reacts more critically to memory allocation changes is rather a by-product of not providing automatic stack extension. Under Linux, stack and heap are mirrored into the overall address space of the process, and the stack is extended automatically whenever needed, so it grows linearly. AmigaOS doesn't have that (lacking an MMU), and thus programs usually allocate memory for all the types of temporary objects you would otherwise simply put on the stack. The latter method is usually faster and simpler, but not available with the rather draconian 4K of stack you get by default. Besides, if that stack overflows, regardless of whether you extended it yourself or not, the program just crashes. Not a good option if you ask me.
| |
SID Hervé France
| | Posts 663 26 May 2012 15:37
| Thomas Richter wrote:
| Sorry, I still don't get it. It sounds for me as if you believe that allocating memory is a heavy-weight operation.
|
No, this is not the case and it's simpler. It is only about creating a memory area with no access restriction. The chipset retrieves directly raw data previously decompressed by the CPU.
| |
Matt Hey USA
| | Posts 734 26 May 2012 17:25
| SID Hervé wrote:
| No, this is not the case and it's simpler. It is only about creating a memory area with no access restriction. The chipset retrieves directly raw data previously decompressed by the CPU.
|
There are other considerations besides simple (generally simple=easy for humans). You may only see the problem from the viewpoint of a programmer (software view). I see 3 considerations: 1) simple or easy to use from the programmer's perspective; 2) simple or easy to create from the hardware perspective; 3) speed and performance. I think 1 supports a unified memory, while 2 and 3 likely support a divided memory.
| |
Thomas Richter Germany
| | (MX-Board Owner) Posts 1425 26 May 2012 17:53
| SID Hervé wrote:
|
Thomas Richter wrote:
| Sorry, I still don't get it. It sounds for me as if you believe that allocating memory is a heavy-weight operation.
No, this is not the case and it's simpler. It is only about creating a memory area with no access restriction.
|
I still don't get it. If you just need generic memory, MEMF_PUBLIC | MEMF_ANY will do. In fact, this is what most programs do, and MEMF_FAST is rarely useful. There are good reasons not to have a separate memory model as the Amiga has, but ease of the interface or performance of the memory allocation is not one of them.
SID Hervé wrote:
| The chipset retrieves directly raw data previously decompressed by the CPU.
|
There are good arguments, but this isn't one of them. For the CPU, writing the data elsewhere comes for free. Just to give you an example, JPEG's output DCT stage would just output to a chipmem buffer instead of a regular memory buffer. There is no double copy involved in this case. The point should rather be (your point) that with a memory interface that is fast enough, custom chip access cannot block the CPU in the first place, so it doesn't make sense to have a distinct chip memory anyhow. So in a sense, I do not see why we have MEMF_CHIP there, as I don't quite see why it makes things simpler. It only makes things complicated, and it is an anachronism based on design decisions taken 20 years ago that are no longer valid.
| |
SID Hervé France
| | Posts 663 26 May 2012 20:35
| Thomas Richter wrote:
| Just to give you an example, JPEGs output DCT stage would just output to a chipmem buffer instead of a regular memory buffer. There is no double copy involved in this case.
|
This chipmem buffer is in chip memory, so it makes no difference. Two AllocMem() calls are required. Maybe a solution would be for the CPU and the chipset to have unrestricted access to the same memory area.
Thomas Richter wrote:
| The point should rather be (your point) that with a memory interface fast enough, custom chip access cannot block off the CPU in first place, so it doesn't make sense to have a distinct chip memory anyhow. So in a sense, I do not see why we have MEMF_CHIP there as I don't quite see why it makes things simpler. It only makes things complicated, and it is an anachronism which is based on design decisions taken 20 years ago that are no longer valid.
|
It seems to me that this is a better, or at least another, way of expressing the problem.
| |
Marcel Verdaasdonk Netherlands
| | Posts 3976 27 May 2012 17:31
| SID, you do not seem to fully understand what you're asking; besides, your two-AllocMem problem does not exist in the way you explain it. (What you describe is already done at the hardware level.) ThoR is correct, but he forgot to mention some things; that is beside the point, though. (It is more likely he didn't feel like bringing system subsystems into the thread.)
| |
Thomas Richter Germany
| | (MX-Board Owner) Posts 1425 27 May 2012 20:36
| SID Hervé wrote:
|
Thomas Richter wrote:
| Just to give you an example, JPEGs output DCT stage would just output to a chipmem buffer instead of a regular memory buffer. There is no double copy involved in this case.
This chipmem buffer is in chip memory then it makes no difference. Two AllocMem() are required.
|
I still don't understand why you believe that. Nor why you believe that it makes any practical difference. If the overhead from AllocMem() in your program is considerable, something is wrong with your program, and not with AllocMem().
Look, a cleverly written JPEG decoder will do the following: take the source data, which is in its own buffers (AllocMem() in the jpeg library), and transform it into the output buffer provided by the user/client program (AllocMem() of the screen memory). There is no way the jpeg library could or would allocate screen memory in the first place. How should it know where the program wants it?
Typically, even more happens: the data first goes to an output buffer provided by the user program that is not even on-screen, and is then blitted and clipped into the output window, so three buffers are involved. And even more: if the image is color-transformed, the DCT transformation will also first go into a buffer, and an additional color transformation and interpolation step will transform the DCT output buffer into the JPEG output buffer, so four buffers then. IOW, AllocMem() is completely irrelevant as an operation, and the data is copied over and transformed more than once.
File system buffer -> Huffman buffer (1) -> Huffman to DCT input buffer (2) -> dequantization, DCT input to DCT output buffer (3 - typically different) -> (4) DCT output buffer to color transformation output buffer (5) -> color transformation output = client program input buffer to screen memory (6).
So the data is more or less touched six times, in a sense. Some of the buffers could be co-located, depending on the coding options, but usually they can't.
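The point that the library never allocates the destination itself can be illustrated with a two-stage sketch. This is not a JPEG decoder: the inverse DCT is omitted entirely, and the function names, the level shift of 128 and the clamping are assumptions made only to show one stage writing into a scratch buffer owned by the library and the next stage writing into whatever buffer the client hands over (which on an Amiga might be chip memory).

```c
#include <stddef.h>
#include <stdint.h>

/* Stage 1: dequantise coefficients into a scratch buffer owned by the
   decoder. The real inverse DCT that would follow is left out. */
void dequantise(const int16_t *coef, const uint8_t *quant,
                int16_t *scratch, size_t n)
{
    for (size_t i = 0; i < n; i++)
        scratch[i] = (int16_t)(coef[i] * quant[i]);
}

/* Stage 2: level-shift, clamp to 0..255 and write into the buffer the
   client supplied. The decoder neither allocates nor copies it again. */
void emit_pixels(const int16_t *scratch, uint8_t *client_out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int v = scratch[i] + 128;
        if (v < 0)
            v = 0;
        if (v > 255)
            v = 255;
        client_out[i] = (uint8_t)v;
    }
}
```

Which kind of memory `client_out` points to is the caller's decision; the stage itself is identical for chip and fast memory.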
SID Hervé wrote:
| It seems to me it's a better or an another way of expressing the problem.
|
I'm sorry, but the two seem to be rather unrelated.
| |
SID Hervé France
| | Posts 663 28 May 2012 00:02
| The problem is not the number of AllocMem() calls but the use of various attributes. Your details seem to indicate that it is mainly a hardware issue. I rephrase the question: while remaining compatible, would it be possible for the chipset and the CPU to have memory access without either having priority over the other? If yes, would it be possible for a new attribute to support this?
| |
Nixus Minimax Germany
| | Posts 273 28 May 2012 12:02
| SID Hervé wrote:
| I rephrase the question: while remaining compatible, would it be possible that the chipset and the CPU have a memory access without priority over the other? If yes, would it be possible that adding a new attribute supports this possibility?
|
Is this what you are talking about:
MEMF_CHIP = "I want RAM where the custom chips have priority over the CPU"
MEMF_FAST = "I want RAM where the CPU has priority over the custom chips"
MEMF_SID = "I want RAM where first come, first served"
MEMF_ANY = "Don't care, just give me some RAM"
I don't think that this makes much sense, but it seems to me that your question has not come across, which is why I am asking.
With regard to the Natami, since all accesses go through the same single hardware bus, there is no obvious reason for a differentiation between chip mem and fast mem, but perhaps Thomas has some ideas for the access arbitration where priorities of accesses will differ depending on where the target addresses reside. For example, one might assign some of the DMAs a relatively low priority and provide some internal buffers to compensate for some arbitrary latency. If the CPU accesses virtual fast mem, its access could be preferred over the low-priority DMA, but not if the CPU accesses virtual chip mem. This would be a tribute to the fact that there has been this kind of differentiation in AmigaOS from the very beginning.
Other than that, in my opinion it wouldn't be a compelling design decision to tie software to a specific hardware implementation. On the other hand, this is an Amiga where we don't want a million levels of abstraction... :) Regarding adding yet another type of RAM to allocate, I don't think that even more arbitration complexity should be visible from the perspective of a programmer.
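The arbitration idea in this post can be written down as a small decision function. It is a sketch of the policy as described, not of how the Natami actually arbitrates: the 2 MB boundary for virtual chip mem, the extra high-priority DMA class (for example display fetch) and all type and function names are assumptions introduced for illustration.

```c
#include <stdint.h>

/* Assumption: virtual chip mem occupies the first 2 MB of the address space. */
#define CHIP_LIMIT 0x00200000u

typedef enum { REQ_NONE, REQ_CPU, REQ_DMA_LOW, REQ_DMA_HIGH } Requester;

typedef struct {
    int      pending;  /* non-zero if this requester wants the bus */
    uint32_t addr;     /* target address of the access */
} Request;

/* Decide who gets the single memory bus for the next access. */
Requester arbitrate(Request cpu, Request dma_low, Request dma_high)
{
    /* Assumed class of DMA that must never be delayed. */
    if (dma_high.pending)
        return REQ_DMA_HIGH;
    /* CPU access to virtual fast mem is preferred over low-priority DMA. */
    if (cpu.pending && cpu.addr >= CHIP_LIMIT)
        return REQ_CPU;
    /* Low-priority DMA wins over a CPU access to virtual chip mem. */
    if (dma_low.pending)
        return REQ_DMA_LOW;
    if (cpu.pending)
        return REQ_CPU;
    return REQ_NONE;
}
```

Nothing in this policy needs a new allocation flag: the existing chip/fast distinction already tells the arbiter which rule applies, which is the argument against exposing more arbitration detail to the programmer.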
| |
Marcel Verdaasdonk Netherlands
| | Posts 3976 28 May 2012 15:51
| Besides all that, I think what SID wants in software should be part of the memory controller to start with, and culled from the programmer's frustum.
| |