Implementing ipatix's High Quality Audio Mixer
FieryMewtwo edited this page 2021-12-20 09:41:45 -05:00

Preface

Most GBA games use the sound driver that comes in the GBA SDK, m4a (also known as mp2k or "Sappy"). While not a bad engine, it is notoriously noisy. In response to this, ipatix created a custom audio mixer for use in the m4a engine, based on the custom engine of the Golden Sun games. This mixer has two major benefits:

  1. Less noise. The SDK's audio mixer mixes every channel at 8-bit precision and then also outputs through the final 8-bit DAC. This means that every additional DirectSound channel adds more noise, and with enough channels it can get pretty bad. This mixer avoids that issue by mixing the audio in a 16-bit space so that the only conversion to 8-bit is at the final DAC output; no matter how many channels you have, the amount of noise stays the same (equivalent to a single channel's worth of noise in the original mixer).
  2. Better performance. The code is highly optimized and contains a self-modifying code loop that is loaded into RAM and allows for faster processing than the SDK mixer, despite the new 16-bit arithmetic.

For more details on how the mixer itself actually works, visit ipatix's repository for the mixer. However, none of this information is necessary if you just want to use it in your game.
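Before the tutorial proper, here is a rough illustration of the first point. This is not the mixer's actual code, just a minimal sketch that assumes each channel's contribution is a non-negative Q8 fixed-point value (8 fractional bits); the function names are made up for this example. It compares truncating every channel before summing with summing at full precision and truncating once.

```c
#include <stdint.h>

/* Illustrative only: each entry is one channel's contribution
 * (sample * volume) in Q8 fixed point, assumed non-negative here. */

/* Narrow mixing: every channel is truncated to a whole step before it is
 * added, so the fractional part is lost once per channel. */
int mix_truncate_per_channel(const int32_t *contrib_q8, int channels)
{
    int sum = 0;
    for (int i = 0; i < channels; i++)
        sum += contrib_q8[i] >> 8;
    return sum;
}

/* Wide mixing: channels are summed at full precision and the result is
 * truncated only once, at the very end. */
int mix_truncate_once(const int32_t *contrib_q8, int channels)
{
    int32_t acc = 0;
    for (int i = 0; i < channels; i++)
        acc += contrib_q8[i];
    return acc >> 8;
}
```

With five channels that each contribute 0.75 of an output step (192 in Q8), the first function returns 0 and the second returns 3. The error of the first approach grows with the channel count, while the second is never off by more than one step.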

Implementing the Mixer

Replacing the Mixer Code

Replace the src/m4a_1.s file with this one. This is a rewritten version of the file with a version of ipatix's mixer already inserted. The version in this file has been tailored specifically for pokeemerald and is therefore slightly different from the version in his repository.

In src/m4a.c, change the size of SoundMainRAM_Buffer to 0xB40. This buffer is where the mixer code itself is loaded. Still in src/m4a.c, add this line right below the line mentioned above:

BSS_CODE ALIGNED(4) u32 hq_buffer_ptr[size] = {0};

This is the intermediate audio buffer in which the mixing will be done. With the aforementioned changes, the lines should look like this:

BSS_CODE ALIGNED(4) char SoundMainRAM_Buffer[0xB40] = {0};
BSS_CODE ALIGNED(4) u32 hq_buffer_ptr[size] = {0};

Here, [size] is the length of one frame of audio, which varies by the sample rate you use. Here is a list of the possible sample rates and their corresponding frame sizes:

5734Hz: 0x60
7884Hz: 0x84 (This mode is not aligned to the buffer length and is not supported by the mixer)
10512Hz: 0xB0
13379Hz: 0xE0 (This is the default engine rate; without any modifications, this is what the GBA Pokemon games use)
15768Hz: 0x108
18157Hz: 0x130
21024Hz: 0x160
26758Hz: 0x1C0
31536Hz: 0x210
36314Hz: 0x260
40137Hz: 0x2A0
42048Hz: 0x2C0

Find the sample rate you are using and use its corresponding frame size as the size of the array. For example, for the default sample rate, the size of the array is 0xE0 (note that the array actually needs to occupy frame size x 4 bytes; this is why it is declared as u32 rather than char or u8).
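If it helps, the lookup and the byte arithmetic can be sketched in plain C. This is only a helper for working out numbers on your PC, not something to add to the game; the names frame_size_for_rate, hq_buffer_bytes and mixer_iwram_bytes are invented for this example, and the footprint only counts the two buffers mentioned in this tutorial.

```c
#include <stddef.h>

struct RateEntry {
    unsigned rate_hz;     /* engine sample rate */
    unsigned frame_size;  /* samples per frame, from the list above */
};

/* Supported rates only; 7884Hz is left out because the mixer does not
 * support it. */
static const struct RateEntry kRates[] = {
    {5734, 0x60},   {10512, 0xB0},  {13379, 0xE0},  {15768, 0x108},
    {18157, 0x130}, {21024, 0x160}, {26758, 0x1C0}, {31536, 0x210},
    {36314, 0x260}, {40137, 0x2A0}, {42048, 0x2C0},
};

/* Returns the frame size for a rate, or 0 if the rate is not supported. */
unsigned frame_size_for_rate(unsigned rate_hz)
{
    for (size_t i = 0; i < sizeof(kRates) / sizeof(kRates[0]); i++)
        if (kRates[i].rate_hz == rate_hz)
            return kRates[i].frame_size;
    return 0;
}

/* hq_buffer_ptr is a u32 array with frame_size elements, so it takes
 * frame_size * 4 bytes. */
unsigned hq_buffer_bytes(unsigned frame_size)
{
    return frame_size * 4;
}

/* Code buffer (0xB40 bytes) plus the intermediate mixing buffer. */
unsigned mixer_iwram_bytes(unsigned frame_size)
{
    return 0xB40 + hq_buffer_bytes(frame_size);
}
```

At the default rate, hq_buffer_bytes(0xE0) is 0x380 (896) bytes and the two buffers together take 3776 bytes.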

We have finished "inserting" the mixer, but DO NOT COMPILE YET. It will fail, and that is because... we are out of memory!

Making Room in RAM

The one downside to the new mixer is that it requires more RAM than the default mixer due to the increased code size and the additional mixing buffer. IWRAM is precious in GBA programming and incredibly limited (you only have 32KB). Luckily for us, there are structures that we can move from IWRAM to EWRAM without affecting gameplay, which will make room for our new mixer.

We will be moving two structs that store data related to the RFU--that is, the Wireless Adapter. These two structs are gRfu and gRfuAPIBuffer, both located in src/link_rfu_2.c. If you head to that file, you will see these two lines starting at line 80:

u32 gRfuAPIBuffer[RFU_API_BUFF_SIZE_RAM / 4];
struct RfuManager gRfu;

To move these to EWRAM, change these two lines to:

EWRAM_DATA u32 gRfuAPIBuffer[RFU_API_BUFF_SIZE_RAM / 4] = {};
EWRAM_DATA struct RfuManager gRfu = {};

Next, we need to go to librfu_rfu to make a small code change that allows the Wireless Adapter to function with these structs in EWRAM. Scrolling down to line 134 reveals a check for whether the API buffer is in EWRAM; if the check is true, the function returns an error and the Wireless Adapter cannot be used. I have confirmed that the adapter works on console with the check removed, so I am not sure what its purpose is, but all we need to do is comment out or delete these two lines:

-if (((uintptr_t)APIBuffer & 0xF000000) == 0x2000000 && copyInterruptToRam)
-    return ERR_RFU_API_BUFF_ADR;
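For reference, the address test in that check can be read as follows. On the GBA, EWRAM starts at 0x02000000 and IWRAM at 0x03000000, so masking the address with 0xF000000 and comparing against 0x2000000 asks "is this pointer in EWRAM?". The helper below is a small sketch of just that test; the name is_ewram_address is made up here, and the real check additionally requires copyInterruptToRam to be set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same mask and comparison as the removed check, applied to a raw
 * 32-bit GBA address. */
bool is_ewram_address(uint32_t addr)
{
    return (addr & 0xF000000u) == 0x2000000u;
}
```

For example, is_ewram_address(0x02000000) is true, while an IWRAM address such as 0x03000000 gives false.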

Next, go to sym_common.txt and delete the line that says .include "link_rfu_2.o" on line 37:

    .include "AgbRfu_LinkManager.o"
-   .include "link_rfu_2.o"
    .include "rtc.o"

Finally, delete the file common_syms/link_rfu_2.txt.

Now we are done--the ROM can finally be built and should work with the new mixer. But there are more things we can do.

Making More Room in RAM (Optional)

The changes we have made leave enough room in RAM for the mixer to work with no problems at the default engine rate. However, if you increase the engine rate, and thus increase the size of our new hq_buffer_ptr array, you may still run into memory issues. Luckily, there is one more thing we can do.

The Pokemon games use 5 DirectSound channels, but m4a always allocates space for the maximum number, 12. This is wasteful, and we can reclaim that memory. Note that if you have edited the number of DirectSound channels in m4a.c to be larger than 5, you will want to use that value instead of 5 in the following replacements.

In constants/m4a_constants.inc, change MAX_DIRECTSOUND_CHANNELS from 12 to 5. Next, in include/gba/m4a_internal.h, make the same change to the MAX_DIRECTSOUND_CHANNELS define. Now we have reclaimed all of the memory we can, and the mixer can go to higher rates without overflowing memory.
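To get a feel for how much this frees up, here is a back-of-the-envelope sketch. The helper name is invented for this example, and the per-channel size of 0x40 bytes used below is an assumption about the channel struct; check the real struct size in include/gba/m4a_internal.h in your own tree before relying on the number.

```c
#include <stddef.h>

/* Bytes freed by shrinking the channel array from old_max to new_max
 * entries. channel_struct_bytes is supplied by the caller; 0x40 is an
 * assumed value, not one taken from this tutorial. */
size_t directsound_bytes_saved(unsigned old_max, unsigned new_max,
                               size_t channel_struct_bytes)
{
    return (size_t)(old_max - new_max) * channel_struct_bytes;
}
```

Under that assumption, going from 12 to 5 channels gives directsound_bytes_saved(12, 5, 0x40), which is 448 bytes.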

Changing the PCM Buffer Size (for Accurate Reverb)

Note that this section only applies if you have increased the engine rate in m4a.c and have not disabled reverb. By default, the PCM DMA buffer is large enough to hold 7 frames of audio, which is how many frames the reverb algorithm expects to be there. If you increase the engine rate without increasing the PCM buffer size, the game will not crash, but the reverb will sound shorter and, depending on how high you set the engine rate, almost nonexistent.

The fix is to increase the PCM buffer size so that it can hold 7 frames of audio at the new engine rate. You will notice that by default the buffer size is 1584; this value comes from the frame size (see the list in the Replacing the Mixer Code section) multiplied by 7, with 0x10 added for safety (this is unnecessary and we will not be doing it for our new sizes). Thus we can calculate the PCM buffer size that we need for every possible engine frequency:

5734Hz: 672
7884Hz: 924 (This mode is not aligned to the buffer length and is not supported by the mixer)
10512Hz: 1232
13379Hz: 1568 (As mentioned, the actual value used by the default mixer adds 16 to this)
15768Hz: 1848
18157Hz: 2128
21024Hz: 2464
26758Hz: 3136
31536Hz: 3696
36314Hz: 4256
40137Hz: 4704
42048Hz: 4928
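The values in this list are just the frame size multiplied by 7. If you want to double-check a value, the calculation can be sketched like this (REVERB_FRAMES and pcm_dma_buf_size are names made up for this example, not identifiers from the game's source):

```c
/* Number of frames the reverb algorithm expects in the PCM DMA buffer. */
enum { REVERB_FRAMES = 7 };

/* PCM buffer size for a given frame size, without the extra 0x10 that
 * the default value adds. */
unsigned pcm_dma_buf_size(unsigned frame_size)
{
    return frame_size * REVERB_FRAMES;
}
```

For the default frame size, pcm_dma_buf_size(0xE0) is 1568, and adding 0x10 gives the stock value of 1584.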

Now all that we need to do is replace the definitions of the buffers. This must be done in the following two locations:

  1. In constants/m4a_constants.inc, PCM_DMA_BUF_SIZE is set on line 3; change it there.
  2. In include/gba/m4a_internal.h, PCM_DMA_BUF_SIZE is set on line 169; change it there.

Making DMA Transfers Atomic

Because the new sound engine uses DMA3 (by default, see Mixer Config below), which the game does not expect, we have to make some adjustments to prevent game crashes. The reason is that pokeemerald writes the DMA registers value by value, which leaves a very small chance for an interrupt to occur in between those register writes. If the writes are interrupted before the DMA transfer is enabled (which is done by the last write), the register state is corrupted, and once the interrupt returns, a DMA transfer is initiated with the wrong values.

We can fix this very easily by using the STMIA instruction for DMA register writes, which cannot be interrupted. That way the registers are always in a well-defined state: either a transfer has been initialized and completed, or it has not begun yet. The following patch makes the game write to DMA using STMIA in include/gba/macro.h:

 #define DmaSet(dmaNum, src, dest, control)        \
 {                                                 \
     vu32 *dmaRegs = (vu32 *)REG_ADDR_DMA##dmaNum; \
-    dmaRegs[0] = (vu32)(src);                     \
-    dmaRegs[1] = (vu32)(dest);                    \
-    dmaRegs[2] = (vu32)(control);                 \
-    dmaRegs[2];                                   \
+    u32 eval_src = (u32)(src);                    \
+    u32 eval_dst = (u32)(dest);                   \
+    u32 eval_ctl = (u32)(control);                \
+    register u32 r_src asm("r0") = eval_src;      \
+    register u32 r_dst asm("r1") = eval_dst;      \
+    register u32 r_ctl asm("r2") = eval_ctl;      \
+    asm volatile("stmia %0!, {%1, %2, %3}" : "+l" (dmaRegs) : "l" (r_src), "l" (r_dst), "l" (r_ctl) : "memory");  \
 }

Mixer Config

Lastly, I would like to briefly explain the two options at the top of the mixer code, namely ENABLE_REVERB and ENABLE_DMA.

ENABLE_REVERB does exactly what it sounds like; if set to 1, reverb will be applied as normal, sounding just like the original game. If set to 0, all reverb processing will be skipped, and the game will have no reverb. This is completely up to personal choice, but if you just want it to sound like the original game, leave it on.

ENABLE_DMA, when set to 1, has the code use DMA3 for transferring data rather than doing a standard CPU copy. This results in faster and smaller code, but it can lead to the DMA being used for so long that it halts the CPU for an extra hblank cycle. This almost certainly will not affect you, and the results are not noticeable. If you do not know what exactly this means, just leave it on as-is. However, if you know that your use case does not allow DMA, you can set this to 0 to make the mixer do a CPU copy instead; note that this will make the code slightly larger (the code buffer size is set up for the worst-case scenario, so do not worry about that) and slightly slower (but the speed difference should not be noticeable).