Expansion Evolution – Something Nerdy Studios

The Former Dawn project is largely about pushing the limits of game development on the NES, and thus far I’ve focused on the video aspects. But I’d be remiss if I didn’t explain how our audio components of MXM-1 (and therefore Former Dawn) came to be.

Click here to skip to the TL;DR. What follows is the detailed account.

Prehistory

The much vaunted, most advanced memory mapper before ours. Its expansion audio left much to be desired.

We spent most of 2019 getting our feet wet with the NES’s hardware and 6502 assembly. At the time, despite the fact that I already wanted a new and advanced memory mapper, we did not have one. So for a while we vacillated between MMC3 and MMC5 as our best option from the established ones. As some of you may know, MMC5 was used in many North American NES localizations of Famicom (read: Japanese) games — most notably Castlevania III. Unsurprisingly, it actually has expansion audio capabilities that saw commercial use in Japan but not elsewhere. In particular it offers two additional square wave channels and one extra PCM channel. (Its PCM channel has to be spoon-fed by the CPU, which is death to a high performance game engine.) The first NES cartridge manufacturer that we reached out to told us that they could produce MMC3 based cartridges easily, but that MMC5 was more questionable. Therefore we thought we were limited to MMC3, which meant no expansion audio at all and (by our judgment) extremely limited ROM space. So we never really entertained the possibility of using MMC5’s infamously bad 8-bit PCM feature, which is a good thing. But even with MMC3, I was willing to dedicate a larger amount of the ROM to DPCM samples than was typically done in NES game dev in the original commercial era.

As amazing as this is, it would’ve been overkill *and* far too expensive to put into an NES expansion system around the year 1990. Even most PC owners couldn’t afford it at the time.

Around the same time, we were figuring out ways we could innovate in software alone, and that included an attempt by Dominic to simulate Roland’s LA synth (“Linear Arithmetic Synthesis”) technology that debuted in their D-50 keyboard but later became the basis for the MT-32 and related MIDI modules. See here for an excellent explanation and showcase.

He created a proof-of-concept demo ROM and it was intriguing to say the least, but it wasn’t long before we teamed up with INL and it became clear that we were going to be able to craft our own memory mapper for the NES after all. But expansion audio still wasn’t something we wanted to deal with because of how daunting the task seemed and our relative lack of experience in it.

Around this time, I contacted a veteran demo coder I had met on the aforementioned VOGONS forum; specifically, I asked him what he thought about the possibility of implementing a MOD player on the NES. It’s no secret that despite my love for the NES, my greatest influences in the audio department come from two principal sources: the SNES, and the MOD tracker community which was largely associated with the demoscene of the late 80s and early 90s. He replied and told me that something like a MOD player had been implemented on the NES using native PCM, but it (like MMC5’s PCM channel) took so much CPU time that it wasn’t really possible to create a good game that uses it. It’s for demo purposes only. Instead, he gave me referrals for two well-seasoned chiptuners who know how to wrangle the native 2A03 channels of the NES.

I reached out to both of them and they both ended up involved in the project in some capacity; more on that later.

QuadDPCM Era

Up to this point in the project I had been far stricter in my philosophy concerning what was legitimate to put into our memory mapper and what wasn’t. I wanted to absolutely minimize anything that could be regarded as “computation” in the cartridge, and expansion audio seemed like the closest thing on the NES to that; a vaguely related precursor to the general purpose processors that were used as video coprocessors in SNES cartridges in the mid 90s. Because of that, there didn’t seem to be much of anything we could do to improve the audio directly. (Or video, for that matter.)

Runtime Mixing Period

A graphical representation of the L² quadDPCM LUT. This won’t help you understand the math, but it looks cool.

Then one day, Dominic had the idea of doing what had never been done on the NES before: virtual DPCM channels that are mixed in software. It was a very clever approach: the basic idea is to construct a lookup table (“LUT”) that takes in two bytes as input and produces a single byte of output. The inputs are bit-packed bytes of DPCM sample data, and the output is as well. By precomputing this LUT and storing it in the ROM, we could mix two channels at a time. The number of channels that can be mixed this way is theoretically unbounded, but in practice the noise floor becomes so bad after 3 mixings that quad channel was about what we could hope for in a sound track that we were willing to put in our game. [Math jargon: Dominic used what I recognized as a distance minimization algorithm which embeds the vector space F₂⁸ (representing DPCM) in R⁸ (representing PCM) and uses the induced metric from the L¹ norm on the vector space R⁸. I suggested using the L² norm instead, which did alter the LUT but we’ve never been sure if it really improved the quality of the results.] The ROM footprint for the LUT was 32KiB, which is the same size as the entire PRG-ROM for Super Mario Bros. Most developers in the NESdev community would consider this an excessive expenditure, but by this point we were already beginning to develop our own memory mapper and it was very clear to us that 32KiB was not only something we could accommodate, but a price well worth paying.
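To make the lookup-table idea concrete, here is a minimal Python sketch of how a single entry could be computed. It is not Dominic's actual tool: the function names (`trajectory`, `mix_entry`, `build_lut`), the LSB-first bit order, the unit step per bit, and the choice of averaging as the definition of "mixing" are all assumptions made for illustration. Each DPCM byte is embedded in R⁸ as the sequence of levels it decodes to, and the entry for a pair of bytes is the byte whose decoded shape lies closest to the average of the two inputs.

```python
# Sketch only: names, bit order, step size, and averaging are assumptions.

def trajectory(byte):
    """Decode one DPCM byte into 8 relative output levels (LSB first)."""
    level = 0
    levels = []
    for i in range(8):
        level += 1 if (byte >> i) & 1 else -1   # 1 = step up, 0 = step down
        levels.append(level)
    return tuple(levels)

# Embed all 256 possible DPCM bytes as points in R^8 once, up front.
TRAJ = [trajectory(b) for b in range(256)]

def mix_entry(a, b, p=2):
    """Return the DPCM byte whose decoded shape is closest to the average of a and b.

    p=1 minimizes the L1 distance; p=2 minimizes the squared L2 distance.
    """
    target = [(x + y) / 2 for x, y in zip(TRAJ[a], TRAJ[b])]

    def cost(c):
        return sum(abs(t - v) ** p for t, v in zip(target, TRAJ[c]))

    return min(range(256), key=cost)

def build_lut(p=2):
    """Full table indexed as lut[a * 256 + b]. Slow in pure Python; build-time only."""
    return bytes(mix_entry(a, b, p) for a in range(256) for b in range(256))
```

Mixing a byte with itself returns that byte, and the result does not depend on the order of the two inputs. Built naively like this the table is 64KiB; the sketch does not try to reproduce the 32KiB layout described above.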

Technique in hand, I had Dominic download and modify the C++ source code for FamiTracker to support this pseudo-expansion audio system. He was quite successful despite the difficulty of working with that codebase, and we now had an internal tool for multichip compositions using the native 2A03 (non-DMC) audio channels alongside what we called, by then, quadDPCM. A big part of the reason why we considered quadDPCM viable is that we were willing to have all of its audio samples use the NES’s maximum sample frequency of 33,144Hz. This was done in the classical NES era (famously the bit-reversed Double Dribble intro voice sample), but never universally employed within a game. Samples back then were always judged in terms of their quality at different playback rates and then the lowest acceptable rate chosen in order to reduce ROM footprint. Since we could afford it, we went for the max. But one of the consequences of doing this runtime based mixing was that individual virtual DPCM channels could not have “virtual playback frequencies”; they were all locked to the same one. This implied that every pitch-shifted version of a DPCM sample had to be created at build time and baked into the ROM so that the music engine could call upon whatever it needed on demand, never varying the overall playback frequency of 33,144Hz.
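As a rough illustration of what "baking every pitch-shifted version at build time" involves, the following Python sketch resamples a PCM instrument once per semitone so that every variant can be played back at the same fixed rate. The function names, the use of linear interpolation, and the one-octave range are assumptions for the example, not a description of our actual build tooling; in the real pipeline each variant would then still need to be converted to DPCM.

```python
# Sketch only: linear interpolation and the semitone range are assumptions.

def pitch_shift(pcm, semitones):
    """Resample pcm so it sounds `semitones` higher when played at the fixed rate."""
    ratio = 2 ** (semitones / 12)            # equal-temperament frequency ratio
    n_out = int(len(pcm) / ratio)
    out = []
    for i in range(n_out):
        pos = i * ratio                      # read position in the source sample
        i0 = min(int(pos), len(pcm) - 1)
        i1 = min(i0 + 1, len(pcm) - 1)
        frac = pos - i0
        out.append(round(pcm[i0] * (1 - frac) + pcm[i1] * frac))
    return out

def bake_variants(pcm, lowest=-12, highest=12):
    """Build every pitch variant ahead of time, keyed by semitone offset."""
    return {s: pitch_shift(pcm, s) for s in range(lowest, highest + 1)}
```

Raising a sample by an octave halves its length and lowering it by an octave doubles it, so the ROM cost of an instrument grows with the number of distinct pitches the soundtrack needs from it.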

These were a few of the very first experiments that composers on our team came up with using gen1 quadDPCM:

“Inversion Battle”

“Coping ATM”

“Walk With Me”

“Bass1”

Build Time Mixing Period

One of those two composers took our quadDPCM-enhanced FamiTracker and created several excellent demo pieces with it. This generated a lot of excitement for the project and helped propel us further into the engineering effort. Although the noise floor was raised by the virtual DPCM channel mixing, it wasn’t unacceptably bad…but I did find a way to improve it. Specifically, I came up with the idea of giving FamiTracker full PCM instead of DPCM for the instruments, then taking the track data generated by FamiTracker and using it to identify the sections of PCM that would become individual DPCM streams; they were then mixed together ahead of time as PCM, crunched down to DPCM, and stored in the ROM for playback on the NES. This had the effect of lowering the noise floor significantly but at the expense of more ROM space, which we could by that point afford. (The entire OST was slated to use no more than ~30MiB of DPCM data, which is well within what we can support on the cartridge.) I also implemented mapper code in Verilog to support DPCM samples up to 16MiB in length.
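The "mix as PCM first, then crunch down to DPCM" step can be sketched in a few lines of Python. This is a simplified illustration, not our build tool: the function names, the assumption of 7-bit unsigned samples, mixing by averaging, and dropping any trailing partial byte are all choices made for the example. The encoder follows the usual 1-bit delta scheme in which each bit moves a 7-bit output level up or down by 2, packed LSB first.

```python
# Sketch only: 7-bit unsigned samples, averaging, and truncation are assumptions.

def mix_pcm(streams):
    """Average several equal-length PCM streams sample by sample."""
    return [sum(column) // len(column) for column in zip(*streams)]

def encode_dpcm(pcm, start_level=64):
    """Encode 7-bit PCM as 1-bit deltas, 8 samples per byte, LSB first."""
    level = start_level
    out = bytearray()
    for base in range(0, len(pcm) - len(pcm) % 8, 8):   # whole bytes only
        byte = 0
        for i in range(8):
            if pcm[base + i] > level:
                byte |= 1 << i                # 1 = step up
                if level <= 125:
                    level += 2
            elif level >= 2:                  # 0 = step down
                level -= 2
        out.append(byte)
    return bytes(out)

def decode_dpcm(data, start_level=64):
    """Reconstruct the output levels a delta decoder would produce."""
    level = start_level
    out = []
    for byte in data:
        for i in range(8):
            if (byte >> i) & 1:
                if level <= 125:
                    level += 2
            elif level >= 2:
                level -= 2
            out.append(level)
    return out
```

Because the mix happens at full PCM precision and is quantized to deltas only once, the quantization error is incurred a single time instead of once per mixing stage, which is the intuition behind the lower noise floor.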

I then had Dominic modify FamiTracker again in order to support the new approach, but this time I chose Dn-FamiTracker because it is being actively maintained and is more popular than other versions.

Here are a few demo tracks exhibiting the improvement in clarity over the runtime mixing version:

“Holy Crap Dn”

“Hella Beats”

Synth2 2G Demo

“Come Into My Garden”

In parallel to all this, I was reconsidering just how strict I should be in my mapper philosophy; I ended up concluding that helping enhance the video directly instead of indirectly was acceptable, as long as A) the circuit complexity was kept low enough to be economically viable in 1990 and B) graphics were not generated on the cartridge but merely made available to the PPU based solely on what was stored in the CHR-ROM (or CHR-RAM, if we could manage to manipulate or create it there). Because of this relaxation of the engineering boundaries, I began to consider bona fide expansion audio after all.

Sometime in the near future, we may release both our modified FamiTracker and the technical details of this for those who wish to employ it in homebrew NES games to achieve better music without expansion audio. For now, here’s an example of what we were able to produce with it:

FM Synth Era

Yamaha YM2610 Period

So I approached the senior composer on the project and asked him what kind of expansion audio on the NES he would want if he were given carte blanche. He immediately suggested that we abandon all of this and use the Yamaha YM2610 audio chip which was used in the Neo·Geo MVS arcade machines (and their home console equivalent called the AES). The YM2610 sports an insane amount of fancy tech, especially for the early 90s when it debuted. It essentially has the FM core from the Sega Genesis and the PSG from the MSX & Apple // with 7 channels of ADPCM bolted on. This seemed like overkill to me, but I wanted him to have whatever he needed to make the OST for Former Dawn really shine. It also passed the period-correctness test, since the YM2610 was in consumer electronics by 1990 (technically by 1987 if you count arcade machines).

The Yamaha YM2610. Paul from INL once referred to these as “beast mode”, and he’s not wrong.

I proceeded to order a 50X batch of real, physical YM2610 chips from a parts supplier in China. I presume they were pulled from old MVS arcade boards. We began an investigation into the hardware engineering side of this sub-project; this time around, it would not be a mere re-flashing of an FPGA or beefier chip that’s pin-compatible with the existing PCB. This would mean either a daughter board attached to the main board, or a complete re-design of the main board. Our cartridge manufacturer was not going to do the latter, so we tentatively settled on the former. However, I knew that that part of the engineering could wait because it was really only necessary for the production of the final game’s cartridges. For our dev environment, we could and should only consider what can be done in the FPGA with additional Verilog code. (And in fact, at this point I was considering having 2 if not 3 editions of the Former Dawn cartridge, based on whether or not the FPGA was handling the expansion audio or if a genuine YM2610 was present in the cartridge, and whether or not to use the EPSM for the FM part.)

As luck would have it, I was able to get special licensing for a pre-existing Verilog implementation of the YM2610; “all” that we had to do was adapt it to the Cyclone IV FPGA inside the EverDrive N8 Pro and then integrate that with the rest of our mapper design and game engine. The first part of that was a wild success; Josué was able to get it running beautifully on the EverDrive N8 Pro and for the first time, we heard genuinely amazing music pouring from our frontloader NESes. The second part was a bit of a chicken-and-egg problem, since we didn’t have a music playback engine created yet for Former Dawn or a convenient way to create tracks for the engine that would have to be created. For a while, I investigated the various possible solutions:

Since the FM core of the YM2610 is very close to that of the YMF288(/YM2608), use FamiTracker for the 2A03 part of the soundtrack and BambooTracker for the FM part. Whether the FM would play from an EPSM, our FPGA, or a real YM2610 was a hardware choice that would have software implications. Since the YMF288 doesn’t have ADPCM channels, they could be hardware-emulated in our FPGA on an FPGA-only cartridge, or they could play directly from the YM2610 for the edition of our cartridge with a genuine Yamaha chip inside. This would’ve required synchronization between the two trackers for the composer to have a ghost of a chance of getting into a flow state, so I proposed hacking in IPC features into FamiTracker and/or BambooTracker.
Modify BambooTracker to include multi-chip functionality so that the 2A03 channels could be tracked directly, getting rid of the IPC problem. After speaking to the lead maintainer of BambooTracker, it became clear that this was not a very viable option because of the messiness of that codebase.
Hack FamiTracker yet again to add support for the YM2610. We’d already encountered pushback on this topic from people who maintain FamiTracker because they regard what we’re doing as “fantasy mapper” territory; a bare minimum requirement for them is that a commercial NES game already exists that uses this multi-chip configuration; yet another chicken-and-egg problem. So this would be a pretty substantial source code modification done by us or someone we commissioned to do the work. We didn’t go very far down this road before rejecting it, especially because it would’ve resulted in yet another unmaintained public fork of FamiTracker.

And then, a miracle happened: FurnaceTracker came out. Word travels fast in the retro space, so I was abreast of its existence very quickly and was delighted to discover that it natively supported everything we’d need; our tracker problem was solved as long as our composer was willing to work with this new tool. He was! So now we moved on to what our game engine and hardware would need to look like.

“Hell Hath No Women”

“Town_2 Theme (YM2610)”

“Square Boi” (YM2610 PSG experiment)

I’ve been adamant about the fact that I do not want a general purpose coprocessor on the Former Dawn cartridge, at least not in such a way that our design requires it. (I.e. although I’ll grumble about it, I’ll accept something like INL using an STM32 chip to communicate with an SD card reader because it eases implementation, but not if that chip is available to the NES’s CPU for general assistance.) The question “How are we going to feed this thing?” had been hanging over our heads for a while on this project; “this thing” here means the YM2610. It is a very complex chip, and feeding the data it needs was going to be a serious problem to overcome. We knew that in the case of the Sega Genesis and the Neo·Geo, a Z80 CPU was included on the motherboard and used as a coprocessor to handle the I/O involved in music and sound effects playback. As it turns out, even the SNES has something similar going on although it doesn’t seem to be as widely recognized as such. The audio subsystem of the SNES has two principal chips: the S-DSP and the SPC700. (People seem to misattribute things that the S-DSP is doing to the SPC700 in the retro community, a bit like confusing “HDMA” with “Mode 7”.) But the SPC700 is essentially yet another processor with a 6502-like core whose function has been dedicated to audio I/O handling; the S-DSP actually renders the sound.

Without a coprocessor, the only way that Josué had been able to feed it the register data that it needed was to simply stream the instructions in; he had only considered the bandwidth restrictions we have in place, not the way that we are forced to conform to them because of LARPing. Yes, it is true that in the MS-DOS era, CD-ROM based games would often come in a hybrid format where part of the disc was filesystem data and the rest was redbook audio ─ but this way of listening to the soundtrack was only viable on those games because the games (almost) always installed the bulk of the assets to the hard drive and there was enough RAM on the system to load the entire level/area. This freed up the CD-ROM drive to play music; we have no such luxury with Former Dawn. It’s bad enough that I’m designing this game around the idea of what is essentially an alternate timeline in which the upgrade to the NES was done as a bolt on instead of a separate console, but it’s beyond the pale to imagine a hard drive and/or several megabytes of extra RAM being included with it. These were the very reasons that so many computers had been so expensive compared to home video game consoles, all the way up until the original Xbox’s debut in 2001.

After it became apparent that that solution was untenable, there remained only one possibility: feeding the YM2610’s registers directly from the NES’s CPU. Although this would make the amount of data and the bandwidth manageable and period-plausible, it didn’t take long to prove that it was just as unworkable because of how much CPU time it would take every frame. So at long last, the YM2610 period was over. Coincidentally, the guy who was slated to be our main composer quit the project around this same time, so we were free again to consider other options without upsetting anyone involved.

Other Yamaha FM Synth Chip Period

The Yamaha YM2414 FM synth chip. As you can see, it is not as beastly as the YM2610.

For a while, I wasn’t quite ready to abandon the FM synth idea for the expansion audio, so I had one of the other potential composers on the project take on the role of helping me figure out what off-the-shelf solutions might be acceptable. This was partly done by having him take a song he’d already composed purely for the 2A03 and enhance it with FM channels but not any kind of PCM. The following line-up of FM competitors (all from Yamaha) was considered: YM2414, YMF288, YMF262/YMF289, YM3812, and DX7.

“Town_2 Theme (YM2414)”

Within this same period, I also asked to hear proofs-of-concept for the PAULA chip from the Amiga A500 (which MOD is based on) and the SID chip from the Commodore 64. (I looked sideways at the Namco N163 wavetable synth and a few other things, but never bothered to have a PoC delivered based on them.)

“Town_2 Theme” (Amiga PAULA/MOD experiment)

“Beep Boop” (Commodore 64 / SID experiment)

None of the FM options struck my fancy, and I finally realized why. They just sound out of place on a Nintendo console. Although I strongly associate FM synth with video games from the 80s and 90s, none of the systems they were used in, arcade or home console, were made by Nintendo. Here’s the list of systems I definitely remember FM synth being used in during those decades:

Sega Master System
Sega Genesis/Megadrive
Neo·Geo
Adlib and Sound Blaster cards for MS-DOS and Windows games
Miscellaneous arcade systems

In contrast, here’s the list of home consoles created by Nintendo that were contemporaneous with the above:

NES/Famicom
SNES/Super Famicom
Nintendo 64
GameCube

Not one of these sports an FM synth chip. Some people have pointed out that the SNES’s audio system can support FM synthesis, technically…but it is an extreme technicality, and misses the point. The sound of that system is nothing like the sound of a Sega Genesis, and it’s because it lacks the kind of FM synthesis that was, for better or worse, almost entirely engineered by Yamaha. There are cases of FM instruments being sampled and used as the basis for instruments in SNES games, but the general sound of SNES games is fundamentally different. The only example that I am aware of that is any kind of counterexample is the game Lagrange Point for the Famicom. It did indeed use the Konami VRC7 chip which included a stripped down Yamaha OPLL core. But it is the only known game, of any kind, playable on a Nintendo console that has FM synth. It was also something in the cartridge, not the console.

So I concluded that it was against the soul of a Nintendo game to use FM synth at all, and thus I returned to my original vision: something like MOD, but customized for the NES context.

Custom ADPCM Era

A high level overview of ADPCM compression; it does not really show the difference between plain DPCM and Adaptive DPCM.

If you’re going to make a MOD player for a primitive console, one of the biggest problems is the tension between ROM space and audio quality. If you set your bit-depth too low, you get an unacceptable noise floor. If you set it too high, you blow out your available memory. ADPCM is a clever compromise, and one that has historical precedent on various arcade and home console systems. Nintendo used a form of it called BRR for compressed audio samples that can fit inside the paltry 64KiB of RAM available to the SPC700 / S-DSP combo. They used another form of it in the Nintendo 64 and yet another form of it in the GameCube. So now we were getting somewhere authentic. Pushing it back 1 generation into the NES makes a certain kind of sense, especially because it is used in the TurboGrafx-16 which is what MXM-1 on the NES is inspired by in the first place.

But I wanted this to remain an 8-bit system; BRR uses 16-bit ADPCM, so I considered it inappropriate. What might 8-bit ADPCM sound like? It seemed like there were some pre-existing 8-bit ADPCM “standards” out there, but technical specs were not forthcoming. Dominic and I embarked on a brief journey of original research into a bespoke ADPCM compression system that our mapper could handle and would still meet my quality requirements.

I came up with two broad forms of it: 1) “flattened” ADPCM with the bit-depth specified for the entire sample (either 1-bit, 2-bit, or 4-bit step sizes) and 2) “multi-adaptive” ADPCM (hence “MADPCM”), in which individual headered chunks in the audio stream specify not only the step size’s bit-depth but also the number of chunks that hold that configuration. Both approaches had merit, and sometimes the quality drop from true PCM was barely noticeable. I created my own ADPCM compressor tool in Java to experiment with these approaches and we even tweeted our results (one time) when we were confident that we had something that was workable. But in the end, the quality was too hard to nail down in a general way. I didn’t want my composers to have to struggle with every single sample to get the quality to an acceptable level. We only achieved a compression ratio of about 1.4X over raw PCM when the quality wasn’t unacceptably compromised.

But I did notice that the compression afforded by this custom ADPCM was tantalizing for a specific purpose within Former Dawn: audio to accompany FMV. Every little bit of data compression counts in that case. The video will probably have to be compressed with something like LZ77, which means that the decompression simply can’t happen on the NES’s CPU. So the video decompression will have to be implemented in MXM-1, freeing up the NES’s CPU almost entirely for audio decompression. It is also the case that LZ77 compression, despite working fairly well with the CHR graphics format of the NES, does not work well with audio samples. A back-of-the-envelope calculation revealed that ADPCM decompression could be done on the NES’s CPU, just barely, and meet my quality standard for FMV. All of this combined with the fact that audio is often a little scratchy in FMV of the late 80s / early 90s made me decide to relegate ADPCM to this purpose. So we moved on to raw PCM as the basis for MXM-1 expansion audio to be used in-game outside of FMV.

“Pluck (MADPCM_max4bit)”

Try to ignore the pops and clicks in this last example (they’re merely bugs in some old code, but I don’t have a bug-free version). This is the Kzer-Za theme from my favorite video game: Star Control II: The Ur-Quan Masters, re-rendered using MADPCM samples instead of straight PCM:

“Kzer-Za Theme (MADPCM)”

QuadPCM Era

With all of that history behind us, it was time to re-examine raw 8-bit PCM for plausibility. This seemed like the best way to achieve my dream of enhancing the native 2A03 audio with something like a MOD player bolt-on ─ a kind of simultaneous homage to the NES, MOD, S3M, and the SNES. There were six main questions that emerged:

Q1 – Could we afford the space needed for PCM?

Q2 – Should the samples be stored in ROM or RAM?

Q3 – What kind of interpolation should we use? (Using FFTs seemed right out, so variable pitching with interpolation seemed the only way.)

Q4 – How many channels of this stuff could we afford, given our LARPing constraints?

Q5 – What should the cap on sample frequency be?

Related to Q5 was

Q6 – Could we support toploader NESes?

…and there was one side question, given the SNES influence:

Q7 – Would an echo buffer like on the SNES be possible within our 8-bit constraints? Would it sound good enough to justify?

To investigate the first question, I looked more closely at the MOD standard and ensured that the samples used were 8-bit mono PCM (they are) and that there’s no data compression employed (there isn’t). Given that many amazing MOD files exist that are measured in a few dozen KiB, I concluded that we could store all of the samples for a decent OST in ROM, use no RAM, and move on with life. Dedicating 1MiB to the sound font for Former Dawn’s soundtrack is well within my philosophy. That much space should afford us, conservatively, somewhere around 50 to 200 audio samples to use as the basis for instruments in the soundtrack. Since the SNES’s audio system is the closest pre-existing one to what we’re engineering, I looked at the sample suite employed in a few of my favorite SNES games and concluded that we were in the right ballpark.

But whether or not there should be a global sound font stored entirely in ROM, or a partially global sound font which is stored mostly in ROM but partly in CD-ROM and loaded per-track into RAM…that remains an open question. We will be OK either way, so which way we end up will emerge as a consequence of the soundtrack being developed over the course of the rest of the project and the memory model evolving to fit the game’s design and implementation.

The *correct* filtering kernel to use. Essentially, the engineers at Sony who created the audio subsystem for the SNES made the “mistake” of only using the middle lobe, which strips all of the harmonics that you get from the small lobes emanating off to the sides.

The interpolation question was far more interesting. Dominic and I had recently watched an excellent video explaining (among other things) how the SNES re-pitches samples at runtime. The gist is that it uses what is called a “Gaussian filter”, which, although enjoying some nice mathematical properties, results in the extremely muffled sound that is characteristic of the SNES. We can only speculate, but it seems like they chose what is essentially a fancy low-pass filter because they wanted to be able to get away with BRR compression without noticeable high frequency artifacts in the output. I wanted to investigate the right way to do it; to learn from the SNES’s mistakes. So we delved into sinc function interpolation, again with some experimental tooling written in Java. I decided that dedicating 1KiB of ROM space would be reasonable and set the effective LUT size to 2048 entries, taking advantage of symmetry. Unlike in the SNES’s case, though, my LUT has 8-bit signed values instead of 11-bit unsigned values. In some sense it’s really only 7½ bit because we can’t easily take full advantage of that last ½ bit of numerical range without greatly complicating our hardware design as synthesized on the FPGA. Intuitively, I set the number of lobes of the sinc function that should be included to 7; the results were absolutely amazing, except that we had a “ringing” problem that was audible to the human ear and easily detectable with the frequency analyzer built into Audacity. So Dominic slapped a Hann window on the LUT, the ringing went away, and we had our filter.
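For readers who want to experiment, here is a hedged Python sketch of a Hann-windowed sinc table and a 7-tap interpolator built on it. It is not our Java tooling or the Verilog implementation. The constants and names are assumptions: in particular, it treats the kernel as spanning 7 input samples (3.5 on each side of the read position), stores 1024 entries and mirrors them to get 2048 effective entries, and scales to ±127 rather than using the full signed 8-bit range.

```python
import math

# Sketch only: the kernel width, table layout, and scaling below are assumptions.
TAPS = 7                   # one table fetch per tap, 7 taps per output sample
STORED = 1024              # 1KiB of 8-bit entries kept in ROM
EFFECTIVE = 2 * STORED     # 2048 entries once the stored half is mirrored
HALF_WIDTH = TAPS / 2      # kernel covers [-3.5, 3.5) input samples
SCALE = 127                # symmetric scaling, leaving -128 unused

def build_half_lut():
    """Quantized Hann-windowed sinc for x > 0, sampled at bin centers."""
    lut = []
    for i in range(STORED):
        x = (i + 0.5) / STORED * HALF_WIDTH
        sinc = math.sin(math.pi * x) / (math.pi * x)
        hann = 0.5 * (1 + math.cos(math.pi * x / HALF_WIDTH))
        lut.append(round(SCALE * sinc * hann))
    return lut

HALF_LUT = build_half_lut()

def lookup(j):
    """Read effective entry j (0..2047) using the even symmetry of the kernel."""
    return HALF_LUT[j - STORED] if j >= STORED else HALF_LUT[STORED - 1 - j]

def interpolate(samples, pos):
    """Evaluate the signal at fractional position pos with 7 weighted taps."""
    center = math.floor(pos + 0.5)           # nearest input sample
    acc = 0
    for k in range(-(TAPS // 2), TAPS // 2 + 1):
        n = center + k
        d = pos - n                           # distance from tap to read position
        j = min(int((d + HALF_WIDTH) / TAPS * EFFECTIVE), EFFECTIVE - 1)
        s = samples[min(max(n, 0), len(samples) - 1)]   # clamp at the edges
        acc += s * lookup(j)
    return round(acc / SCALE)
```

Reading exactly on an input sample returns that sample unchanged, because the neighboring taps land on the zero crossings of the sinc. Dropping the `hann` factor from `build_half_lut` gives the unwindowed kernel, which is a quick way to compare the two in a spectrum view.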

There were 5 concerns that underpinned my decision to go with 8 channels of PCM:

Could the NES drive that many channels without coprocessing?
Could the kind of RAM that was economically available in 1990 (100ns DRAM) service enough reads to make the sinc function interpolation and re-pitching possible without data corruption, stability problems, etc.?
Would we be able to create a mixer and output stage in the mapper that would actually sound good with that many channels?
I wanted the 2A03 synth channels to meaningfully participate in the soundtrack, but never have the sound effects cut out the music or vice versa. Would 8 suffice?
Was going to 8 channel PCM on the cartridge going to make the circuit complexity exceed what was plausible in 1990? Would our dev cartridge be able to handle it? Would our production cartridge?

As it turns out, the fact that we accept the NES’s CPU as the main performance bottleneck was a boon in this case. If an audio system is based on PCM sample playback and the samples are “long” (typically lasting dozens of NTSC frames), it means that the CPU doesn’t have to update any particular PCM channel very often, freeing it up for video and game state related tasks. Contrast this with the case of PSG or FM synthesis, which often have updates going to the channel registers every single frame, especially in order to achieve richer timbres. The only aspect of a PCM channel that has to be updated routinely in the middle of playback is a loopback instruction, but we decided to facilitate that with a header that the mapper can store in its own memory space. Effectively, the end result is a fairly efficient, simple, high performance wavetable synth.

On the RAM performance question, the relevant factors are maximum playback frequency, size of each data fetch, and granularity of the convolution with the filtering kernel. If this system had been created in 1990, the LUT would’ve probably been internal to the ASIC of the wavetable synth chip, and it’s very clear that access speeds would be no problem with that kind of implementation. The samples, too, might’ve been stored on separate ROM chips on a cartridge or even included in the expansion system — imposing a truly “global” sound font across many games. There have been wavetable synths released into the consumer electronics market that employ this kind of design. For us specifically, the LARPing constraints aren’t really the relevant ones — we have to deal with the realities of our dev cartridges and the eventual conversion to the production ones. With 1 byte per fetch, 7 LUT fetches per re-pitched sample, and a small ring buffer for the fetches…we were able to get it working perfectly on the EverDrive N8 Pro which is what we use as our dev cartridge.

Having a 16-bit resistor ladder DAC on the production cartridge would be nice, but it’s hard to justify because it would complicate the hardware design and we have a high enough frequency oscillator (100MHz) that a PWM DAC can be easily constructed. Similarly, on the EverDrive N8 Pro, the DAC that krikzz supplies with the out-of-the-box mapper support (SUNSOFT 5b, VRC6, etc.) is a PWM DAC, and we have followed suit with our own. (Technically, like krikzz, we have implemented a variation on the PWM DAC called a Delta-sigma DAC.) What feeds the DAC is the mixer output, and there are eight channels of 8-bit audio. This results in a theoretical lossless bit-depth of 14-bit, not 16-bit. But in practice, the lower 3 bits can be dropped entirely without audible quality loss. The only scenario in which that 14-bit level is reached is when the output is at max volume, which means that a human can’t hear the least significant bits. (Imagine trying to hear a mouse squeak when you’re standing next to a jet engine.) So our mixer & DAC combination consumes 64 bits of digital input and produces 11 bits of analog output to the NES’s audio out line. (Much thanks to Zeta for hammering out the fine technical details when he implemented the mixer in Verilog!) We think that the resulting quality is excellent — probably the best 8-bit sampled audio system ever created, and we hope you agree.
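The last stage can be illustrated with a small Python model of a mixer feeding a first-order delta-sigma modulator. This is only a behavioral sketch with hypothetical names, not Zeta's Verilog: it assumes a plain unweighted sum of eight signed 8-bit samples (which fits in 11 bits) and does not model per-channel volume, the wider intermediate precision mentioned above, or the analog filtering after the output pin.

```python
# Sketch only: an unweighted sum and a first-order modulator are assumptions.

def mix8(samples):
    """Sum eight signed 8-bit samples and offset the result to unsigned 11-bit."""
    if len(samples) != 8:
        raise ValueError("expected exactly eight channel samples")
    total = sum(samples)          # range -1024..1016
    return total + 1024           # range 0..2040, fits in 11 bits

def delta_sigma(value, cycles, bits=11):
    """First-order delta-sigma: accumulate the input and emit the carry as a 1-bit stream."""
    modulus = 1 << bits
    acc = 0
    stream = []
    for _ in range(cycles):
        acc += value
        if acc >= modulus:        # overflow becomes a '1' pulse on the output pin
            acc -= modulus
            stream.append(1)
        else:
            stream.append(0)
    return stream
```

Over 2048 clock cycles the number of 1s in the stream equals the 11-bit input value, so the average level of the pin tracks the mixer output; compared with plain PWM, the pulses are spread out across the period instead of being grouped into a single block.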

Speaking of 8-bit audio, it is very important to me that all of this constitutes an expansion system. I never wanted a whole-hog replacement of the NES’s audio. Therefore, I require that the composers on the project do their best to fully utilize the non-DMC audio synth channels present in the 2A03 alongside the PCM channels. The two systems should complement each other, not vie for centrality. At the same time, I want the sound effects of Former Dawn to be rich and distinctive, so I wanted to use the PCM channels for SFX and dedicate the 2A03 synth channels to the overall polyphony and texture of the music. Having eight total PCM channels allows us to dedicate four of them to SFX and four of them to music, and never the twain shall meet. On the SFX side, I reserve one PCM channel for an “ambient” or “mood” sound in each area of the game, something like wind blowing or water flowing. That leaves three PCM channels open for business: attacks, footsteps, special abilities, UI sounds, etc. It should be plenty for the kind of game I’m trying to create.

The circuit complexity question is one that is constantly on our minds, and we think we’ve done a great job of keeping it under control. Our system is very simplified compared to a lot of systems in the late 80s / early 90s (see: YM2610) but it also offers a lot of expressive power. The balance struck seems to be the right one for this project. Also, we do consider the circuit complexity of MXM-1’s audio subsystem separately from the memory mapping per se, so even though we have now exceeded MMC5’s circuit complexity, it’s in a way that does not violate our principles. Although it could’ve been put on individual cartridges, this part of the system seems more appropriate to embed into an expansion module that would fit underneath the frontloader NES along with the CD-ROM drive. What would’ve been on the cartridges for such a system would’ve been the fast-access ROM needed for the game’s program code, and probably the core of game’s graphics that need to be globally accessible on demand. In reality, we have something that fits well within the constraints of the EverDrive N8 Pro and our production boards, so the project’s music and SFX development is now full steam ahead!

Going back to Q5: the answer is 44.1KHz. This is the standard that was established by the Red Book audio used on CDs, and we’ve decided to adopt it for reasons similar to theirs. Because of the Nyquist theorem and the normal range of human hearing, it’s more than enough and also matches the highest quality audio samples that we will take off the shelf or record ourselves. It also makes it possible to turn the NES into a CD player in a weird sense: CD-quality sample rate, but with 8-bit samples and in mono. It does seem like the kind of thing that Nintendo might’ve done if they had gone this route in 1990, i.e. upgrading the NES instead of replacing it completely. In reality, we rarely if ever use a source sample that has that high of a sampling frequency, but having a cap that high allows us to re-pitch samples up and down, achieving a sort of “meta compression” in our sound font. A single sample can be used to span many octaves and still sound good, which is a rare feat on these kinds of audio systems.
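One way to picture the headroom that a high cap buys is with a little arithmetic. The sketch below assumes that re-pitching upward raises a sample’s effective playback rate and that this rate cannot exceed the 44.1KHz cap; the function names and the 11025Hz source rate are made up for the example.

```python
import math

MAX_RATE_HZ = 44_100  # the playback cap from the text


def playback_rate(source_rate_hz, semitones):
    """Effective rate when a sample is shifted by `semitones` (equal temperament)."""
    return source_rate_hz * 2 ** (semitones / 12)


def headroom_octaves(source_rate_hz):
    """How many octaves a sample can be pitched up before reaching the cap."""
    return math.log2(MAX_RATE_HZ / source_rate_hz)


print(headroom_octaves(11_025))    # 2.0
print(playback_rate(11_025, 24))   # 44100.0
```

Under those assumptions, a sample recorded at a quarter of the cap can be played two octaves above its recorded pitch, and any distance below it, from a single copy in ROM.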

In the early stages of developing this audio expansion system, we investigated the possibility of supporting the toploader NES (model NES-101) as well as the frontloader and Famicom. Because the toploader was an economy model, it lacked that mysterious expansion port on the bottom of the unit. In order to get any enhanced audio out of that system, therefore, we would have to manually update the $4011 DMC register hundreds of times per frame during normal gameplay. Although there are ways to achieve this (see David Crane’s brilliant historical example on the Atari 2600), it would definitely impose a very high development cost on us as well as an unacceptable performance hit if we were to do it the full 734 times per frame required to achieve the max playback frequency of 44.1KHz. The NES’s available CPU cycles are its most limited resource, and we’re simply unwilling to take that high of a performance hit because of the needed richness in the gameplay.
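The 734 figure falls out of simple division, and the same arithmetic shows how little CPU time would be left between writes. The sketch below uses the NTSC values for the CPU clock (about 1.789773MHz) and frame rate (about 60.0988Hz); the function names are just for illustration.

```python
import math

CPU_HZ = 1_789_773    # NTSC 2A03 CPU clock
FRAME_HZ = 60.0988    # NTSC NES frame rate


def writes_per_frame(rate_hz):
    """$4011 writes needed in each frame to sustain `rate_hz` playback."""
    return math.ceil(rate_hz / FRAME_HZ)


def cycles_per_write(rate_hz):
    """CPU cycles available between consecutive $4011 writes."""
    return CPU_HZ / rate_hz


print(writes_per_frame(44_100))            # 734
print(round(cycles_per_write(44_100), 1))  # 40.6
```

With only about 40 cycles between writes, and the write itself plus the sample fetch eating into that, very little would remain for the game.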

This is the type of “interpolation” that one gets when pushing PCM audio out of a toploader NES; it is sometimes referred to as “zero order hold”, and it results in **terrible** audio quality.

As a possible compromise, we considered making it an option for toploader owners to drop the audio’s playback frequency to 50% or even 25% and thereby recover the CPU cycles needed for smooth gameplay. Thus we went ahead and did testing on a toploader using the techniques that would be required and discovered to our dismay that there is an unacceptable, unavoidable increase in non-white noise on top of the mere reduction in clarity. This is because of the way that manually updating the $4011 register results in a waveform change on the system. Once the DMC’s output is manually set to a specific level in its 7-bit range, the 2A03 holds the value there instead of letting it naturally fluctuate. This amounts to one of the worst kinds of interpolation one ever encounters in practice: a kind of weird ringing tinge to everything that’s hard to tune out. White noise is one thing; intermodulation is something far less pleasant. Incidentally, not using the DMC for the purpose of PCM playback frees it up for a seldom-used trick: modulating the volume of the 2A03’s triangle channel. This is a huge win because it allows the triangle to join its square brothers as a dynamic part of the NES’s musical ensemble.

Therefore, the only way to play Former Dawn properly on a toploader NES will be to mod the unit. All things considered, it’s a fairly simple and inexpensive mod, but it does require opening the chassis and soldering a resistor in between two pins. We know that this is a fairly common mod already because there are enthusiasts out there that like playing Famicom games on their toploaders via adapter cartridges; such people will be able to play Former Dawn just fine. For unmodded toploader owners who are both unwilling to mod their units and unwilling to buy frontloader NESes, it would be highly advisable to play the PC port of the game instead. We’d rather they do that than play the game with the sound muted, or even worse…listening to a confusing broken half of the soundtrack and no sound effects!

On a more positive note (no pun intended), the last question (Q7) is one of the most exciting ones. One of the things that really helps sell the SNES’s technical richness is its echo buffer. It was used for reverb in music and sound effects, notably in Chrono Trigger, A Link to the Past, Terranigma, Super Mario World, Super Metroid, and many others. We ended up implementing an 8-bit version of it that is a little more robust than the one in the SNES; in fact, we implemented 2 separate ones. One of them is dedicated to SFX and the other to music. They can be independently configured, and each of the 4 channels within each part of the music/SFX divide can choose to subscribe to the echo buffer or not. This will allow for the music to have a subtle reverb effect layered in while also having an echoing-off-the-walls effect for in-world SFX in e.g. caves or tunnels. This might be slight overkill, but it was relatively inexpensive for us to do. The bang for buck was quite high, and we think players will rather enjoy the added immersion that it allows.
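The general shape of an echo buffer with per-channel subscription can be sketched as a delay line with feedback. This is a conceptual model only, not the MXM-1 register interface: the class name, the parameters (`delay_samples`, `feedback`, `wet`), and the use of floats are all assumptions.

```python
class EchoBuffer:
    """Feedback delay line that only receives input from subscribed channels."""

    def __init__(self, delay_samples, feedback, wet):
        self.buf = [0.0] * delay_samples  # circular delay line
        self.pos = 0
        self.feedback = feedback          # how much of each echo is re-echoed
        self.wet = wet                    # echo level in the final mix

    def process(self, channel_samples, subscribed):
        """Mix one sample per channel and return the output with echo applied."""
        dry = sum(channel_samples)
        send = sum(s for s, on in zip(channel_samples, subscribed) if on)
        delayed = self.buf[self.pos]
        self.buf[self.pos] = send + delayed * self.feedback
        self.pos = (self.pos + 1) % len(self.buf)
        return dry + delayed * self.wet


# Two independent buffers, mirroring the music/SFX split described above.
music_echo = EchoBuffer(delay_samples=3, feedback=0.5, wet=1.0)
sfx_echo = EchoBuffer(delay_samples=5, feedback=0.25, wet=0.5)

# An impulse on a subscribed music channel comes back 3 samples later at full
# level, then again 3 samples after that at half level.
impulse = [[1.0, 0, 0, 0]] + [[0.0, 0, 0, 0]] * 6
out = [music_echo.process(s, [True, False, False, False]) for s in impulse]
print(out)  # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.5]
```

A short delay with low feedback on the music side gives the subtle reverb, while a longer delay with more feedback on the SFX side gives the cave-style echo, and unsubscribed channels pass through dry in both cases.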

Here are a few examples of what our final system, 2A03 + MXM-1, sounds like. All of this is working on real NES hardware:

“This Is Really Exciting”

“Titat Intro or Title idk”

“Vicroy Scott”

“First Place Will Be Mine”

“See It With Your Eyes”

“That’s A Funny Trick To Play On God”

All of the above 7 tracks were composed by hEYDON. In fact, all of the tracks in this article were composed by him except for “Inversion Battle”, “Synth2 2G Demo”, and “Kzer-Za Theme”. (Many thanks!)

TL;DR

MXM-1 now has full-blown, finalized expansion audio. It sports the following features:

8 channels of 8-bit mono PCM with a maximum playback rate of 44.1KHz (4 for SFX and 4 for music)
Sinc function interpolation which is butter smooth and allows re-pitching up and down several octaves
2 configurable echo buffers (1 for SFX and 1 for music)
11-bit delta-sigma DAC
Simple wavetable support for fine-grained, composer-defined looping
Works perfectly on any unmodded Famicom, frontloader NES with expansion audio bridge installed, or expansion-modified toploader NES

To hear what it sounds like, scroll up a bit and start clicking Play to your heart’s content.

It’s been a long journey, my friends. It’s wonderful to have arrived at our final stage of evolution.

-Jared

4 thoughts on “Expansion Evolution”

Emnesium

October 2, 2024 at 2:30 am

Is this tech going to be publicly released? This would be fantastic to compose with.
- Jared Hoag
  
  October 7, 2024 at 7:08 pm
  
  Yes, eventually. If we hit the stretch goal on our Kickstarter campaign, we’ll release the mapper (and therefore the expansion audio subsystem) before Former Dawn itself is released.
Anonymous

October 2, 2024 at 6:57 pm

Wow this is awesome. Hopefully this chip will be released publicly someday
- Jared Hoag
  
  October 7, 2024 at 7:07 pm
  
  It will! It might come sooner rather than later if our Kickstarter campaign is sufficiently successful.