Solving two RX5 mysteries
2025-12-21
In my last blog post I wrote about how I made my own software to program GliGli's RX5USB memory cartridge for the Yamaha RX5 drum machine. There were still some open problems and in this post I will report on my progress addressing those.
Mystery 1: reversing audio playback sometimes did not work
The RX5 has a function to play back sounds in reverse. It's a bit of a gimmick but it sounds cool. Once I got my own software to the point where I could put my own samples on the RX5USB and play them back on the RX5, I was disappointed to discover that in reverse mode, many sounds came out as digital garbage noise. Why was this happening?
I now know this was because I was writing the sample end points into the ROM header incorrectly for 12-bit samples.
Digital audio data is a stream of numbers. Confusingly, both the stream as a whole and the individual numbers get referred to as "samples". To avoid confusion, I will use the alternative term "word" for the individual numbers in the audio stream.
The RX5 supports two word sizes ("bit depths"): 8-bit and 12-bit. The 12-bit format is awkward because the memory of the RX5 is organized into 8-bit bytes. If you store each 12-bit word in 2 bytes (16 bits), loading the data is easy but you waste 4 bits per word. If you don't want to waste those 4 bits, you end up packing two 12-bit words into 3 bytes, which makes loading words more awkward. On the RX5 a pair of 12-bit words, say 0x123 0x456, is stored in memory as 0x63 0x12 0x45: the first byte holds the low 4 bits of both words, and the next two bytes hold the high 8 bits of the first and the second word respectively.
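To make the byte layout concrete, here is a minimal Python sketch that packs and unpacks one pair of 12-bit words. The nibble placement is inferred from the single 0x123 0x456 example above, and the function names `pack_pair` and `unpack_pair` are made up for illustration; this is not the code from my software.

```python
def pack_pair(w0: int, w1: int) -> bytes:
    """Pack two 12-bit words into 3 bytes, following the 0x123 0x456 example."""
    if not (0 <= w0 <= 0xFFF and 0 <= w1 <= 0xFFF):
        raise ValueError("words must fit in 12 bits")
    # First byte: low nibble of the second word on top, low nibble of the first below.
    shared = ((w1 & 0xF) << 4) | (w0 & 0xF)
    # Second and third byte: the high 8 bits of each word.
    return bytes([shared, w0 >> 4, w1 >> 4])


def unpack_pair(data: bytes) -> tuple[int, int]:
    """Inverse of pack_pair: recover two 12-bit words from 3 bytes."""
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    shared, high0, high1 = data
    w0 = (high0 << 4) | (shared & 0xF)
    w1 = (high1 << 4) | (shared >> 4)
    return w0, w1


packed = pack_pair(0x123, 0x456)
print(packed.hex(" "))  # 63 12 45
```

Reading either word means touching the shared first byte plus one other byte, which is why the machine needs to know whether it is looking at the first or the second word of a pair.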
If the RX5 has to jump to a specific word in a 12-bit sample then it needs to know if it is landing on an odd or an even word. In principle it could figure it out by dividing the offset from the beginning by 3 and looking at the remainder but apparently that is not what the RX5 does. Instead, the ROM header has an extra flag bit set on sample addresses to distinguish even and odd word addresses.
What does this have to do with reversing audio playback? In order to play back in reverse, the RX5 has to jump to the end of the sample. My reversed audio bug was happening for 12-bit samples with an even number of words: these need the flag, but I was never setting it. Only odd length samples would play back correctly in reverse. The even length samples were getting garbled.
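The rule I ended up with can be sketched in a few lines of Python. Note that `EVEN_FLAG` is a placeholder: I am not documenting the real bit position or the address encoding of the ROM header here, and `encode_end_address` is a hypothetical helper name. Only the decision logic (12-bit and an even number of words) comes from the description above.

```python
# Placeholder value: the real position of the flag bit in the header is not given here.
EVEN_FLAG = 1 << 23


def encode_end_address(end_address: int, num_words: int, bit_depth: int) -> int:
    """Return the end address field, with the flag set when the rule above calls for it."""
    if bit_depth not in (8, 12):
        raise ValueError("the RX5 only supports 8-bit and 12-bit words")
    if end_address < 0 or end_address & EVEN_FLAG:
        raise ValueError("end address collides with the flag bit")
    # Only 12-bit samples with an even number of words need the flag.
    needs_flag = bit_depth == 12 and num_words % 2 == 0
    if needs_flag:
        return end_address | EVEN_FLAG
    return end_address
```

My original bug amounts to always taking the last branch, which only happens to be right for 8-bit samples and for 12-bit samples with an odd number of words.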
I came across this extra bit flag while studying the original Yamaha ROMs, trying to get loop points working for custom samples. Once I noticed the flag I wanted to know what it was for and I eventually found the pattern. When I fixed my code that sets the end address on a custom sample to include the bit flag, that turned out to fix the reverse playback problem.
I still haven't found a good way to set loop points but now that the end point is set correctly, you can at least loop whole samples with loop 1 in rx5-build.
Mystery 2: the "unknown" header field
The hard work of reverse engineering the RX5 ROM format was done by GliGli but he left us a few challenges. This "even address flag" was one of them; I don't see it anywhere in the original RX5USB software. But a more obvious mystery is the "unknown" header field. GliGli figured out what all the bytes in the RX5 ROM header mean, except one which he labeled as "unknown".
Looking at the WRC01 Yamaha ROM, the "unknown" field ("Unk" below) appears to be correlated with the envelope generator parameters of the voices:
| Name (WRC01 A) | AR | D1R | D1L | D2R | Unk | Name (WRC01 B) | AR | D1R | D1L | D2R | Unk |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SD 3 | 99 | 24 | 59 | 42 | 31 | Gun | 99 | 7 | 59 | 30 | 13 |
| BD 3 | 99 | 39 | 59 | 53 | 29 | FMprc1 | 99 | 28 | 58 | 33 | 23 |
| CgaHMT | 99 | 55 | 51 | 47 | 35 | FMprc2 | 99 | 34 | 58 | 36 | 24 |
| CgaHOP | 99 | 34 | 59 | 42 | 30 | FMprc3 | 99 | 33 | 58 | 38 | 25 |
| CgaLO | 99 | 29 | 59 | 34 | 28 | EbassH | 99 | 43 | 55 | 36 | 23 |
| BgoHI | 99 | 47 | 1 | 99 | 35 | EbassL | 99 | 12 | 59 | 41 | 23 |
| BgoLO | 99 | 46 | 52 | 41 | 31 | DXorch | 99 | 12 | 59 | 38 | 21 |
| TimblH | 99 | 37 | 49 | 37 | 23 | DXmrmb | 99 | 37 | 47 | 39 | 25 |
| TimblL | 99 | 37 | 45 | 37 | 23 | DXclav | 99 | 16 | 59 | 44 | 25 |
| AgoHI | 99 | 39 | 45 | 36 | 20 | Hey | 45 | 2 | 59 | 99 | 99 |
| AgoLO | 99 | 34 | 55 | 37 | 20 | Wao | 45 | 2 | 59 | 99 | 99 |
| Cuica | 99 | 21 | 59 | 52 | 39 | Ooo | 70 | 2 | 59 | 99 | 99 |
| Cstnt | 99 | 64 | 55 | 51 | 31 | | | | | | |
| Whstl | 99 | 6 | 59 | 60 | 24 | | | | | | |
| Timpn | 99 | 16 | 59 | 30 | 13 | | | | | | |
| GlsCsh | 99 | 35 | 1 | 99 | 25 | | | | | | |
But what do these envelope parameters mean, and what is the envelope generator for?
The RX5 envelope generator
The RX5 can adjust the volume of samples on the fly during playback using a volume envelope generator. The user can change the envelope parameters for creative effect, but the main reasons it's there are dynamic range and storage efficiency.
Nowadays sample-playback instruments may contain gigabytes of audio data, but an RX5 ROM cartridge can hold only 254 KB of audio data (plus 2 KB of metadata). The smaller the word size, the more audio you can squeeze into that memory. But with small word sizes you get a reduced dynamic range in your digital audio. As a workaround, the samples on the RX5 have had their dynamics compressed so that they play back at a constant volume. This sounds unnatural but gives a good signal-to-noise ratio. On playback, the RX5 uses its volume envelope generator to restore a natural volume profile.
As an example, consider the timpani sound from WRC01.
It looks and sounds different when played back by the machine:
The envelope generator takes the unnatural, constant loudness sound in ROM and applies a natural sounding fade-out effect to it.
Now we know what the envelope generator is for. How does it work? The manual says:

Each sound starts at volume level 0 and rises to full volume at the "attack rate". Then it decays to an intermediate "decay 1 level" at the "decay 1 rate". After that it decays to zero at the "decay 2 rate". If the gate time runs out at some point during the envelope, the sound decays to zero at the "release rate" instead, but in practice this never happens: on the RX5, all the default sounds set the gate time to 6.5 s and the release rate to 60, which are the maximum values. The decay 2 stage will reach 0 before the gate time has elapsed.
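As a rough illustration of the shape described above, here is a small Python sketch of a three-stage envelope. It is a toy model under loud assumptions: the segments are linear, levels run from 0.0 to 1.0, and rates are expressed in level units per second. I do not know how the RX5 maps its 0 to 99 parameter values to actual slopes, and the gate time and release stage are left out because, as noted, they never come into play with the default sounds.

```python
def envelope_level(t: float, attack_rate: float, decay1_rate: float,
                   decay1_level: float, decay2_rate: float) -> float:
    """Level (0.0 to 1.0) at time t seconds for a linear attack/decay 1/decay 2 shape.

    Rates are in level units per second, which is an assumption of this sketch
    and not the RX5's own 0 to 99 scale.
    """
    # Stage 1: rise from 0 to full volume at the attack rate.
    attack_time = 1.0 / attack_rate
    if t < attack_time:
        return t * attack_rate
    t -= attack_time

    # Stage 2: fall from full volume to the decay 1 level at the decay 1 rate.
    decay1_time = (1.0 - decay1_level) / decay1_rate
    if t < decay1_time:
        return 1.0 - t * decay1_rate
    t -= decay1_time

    # Stage 3: fall from the decay 1 level to zero at the decay 2 rate.
    return max(0.0, decay1_level - t * decay2_rate)


# Sample the toy envelope every quarter second.
levels = [envelope_level(i * 0.25, 4.0, 2.0, 0.5, 1.0) for i in range(6)]
print(levels)
```

Multiplying a constant-loudness sample by a curve like this is what turns the flat ROM data into something with a natural-sounding fade-out.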
The reverse playback envelope
What happens during reverse playback? The RX5 manual says the machine will play the audio in reverse, shaped by the normal volume envelope in reverse. On a hunch I decided to check whether this is true and, surprise, surprise, it's not!
I recorded a number of sounds from the WRC01 cartridge in their normal forward form, and reversed by the RX5. Then in post processing I reversed the reverse playback sounds a second time, so that they run forward again. Here is the double reversed sound next to the original.
Above: normal playback, below: RX5 reversed playback, reversed a second time in Reaper.
This shows that reverse playback uses a different envelope from normal playback. Could the "unknown" ROM parameter have to do with this reverse envelope?
I made a ROM image containing four copies of the same 1-second sine wave tone. The tone has constant volume. The four copies have different "unknown" values: 10, 20, 30 and 40. This is what they looked like when played in reverse on the RX5:
Four 1-second sine tones with "unknown" set to 10, 20, 30 and 40 respectively.
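For reference, a test signal like this is easy to generate. The Python sketch below builds one constant-volume tone and pairs it with the four "unknown" values. The sample rate, the frequency, the voice names and the dictionary layout are all placeholders chosen for illustration; quantizing to 8 or 12 bits and writing the actual ROM image are left out.

```python
import math

SAMPLE_RATE = 25_000  # placeholder value, not a documented RX5 figure
FREQUENCY = 440.0     # placeholder pitch for the test tone


def sine_tone(duration_s: float = 1.0, sample_rate: int = SAMPLE_RATE,
              frequency: float = FREQUENCY) -> list[float]:
    """Constant-amplitude sine wave as floats in the range -1.0 to 1.0."""
    num_words = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * frequency * i / sample_rate)
            for i in range(num_words)]


tone = sine_tone()

# Four voices sharing the same audio, differing only in the "unknown" header value.
test_voices = [
    {"name": f"SINE{value}", "unknown": value, "samples": tone}
    for value in (10, 20, 30, 40)
]
```

Because the input has no volume contour of its own, any change in loudness in the recording has to come from the machine.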
Instead of a constant tone, like the raw audio we put in the ROM, we get a swell. The higher the "unknown" parameter, the faster the swell. This looks like a rate parameter!
The "unknown" field is the attack rate of the reverse playback envelope.
GliGli's RX5USB software hard-codes the reverse envelope attack rate to 99, the maximum, which makes reversed sounds play back without a swell. This is a good default for custom samples because they usually don't have the heavily compressed tail you see in the original Yamaha samples.
Conclusion
Sometimes being stubborn and pedantic pays off. I think I'm running out of RX5 mysteries. See you next time!