Unlocking the NES (for Former Dawn)

Definition of unlock
transitive verb
3 : to free from restraints or restrictions
// the shock unlocked a flood of tears

Former Dawn aims to be the most extreme example yet of what could be called Neo Vintage — a new game for an old system. This is literally the opposite of the excellent forum site VOGONS (Very Old Games On New Systems). When forming Something Nerdy Studios in early 2019 and launching our first game project, we all felt as though creating an advanced modern RPG that targets the NES would be a really fun and cool thing to do, but that the available memory mappers from the system’s heyday seemed too limiting. We knew quite well that the easiest thing to do would be to give up and make a retro game on the PC instead, but that rubbed us the wrong way. Not only was that already a very crowded market by 2019, it wasn’t good enough merely to create an NES-like game that could never actually work on a real NES. We wanted to find out how far we could go on the real thing!

This is as close to the NES-CD as we got, and we didn’t even get this. Funny that it did give birth to one of the most successful game consoles of all time.

At first, it seemed like either MMC3 or MMC5 would be our best bet, despite the nagging feeling that something much better could be crafted if we just had the know-how and manufacturing connections. It seemed to us that there was vast untapped potential in the NES, and that providing it with gobs of ROM is the primary way to tap it. Although SNK pioneered very large ROM sizes in the early 90s via the Neo·Geo, both that system and the games for it were exceedingly expensive because of mask ROM prices at the time. What if there had been a more economical solution back then? Specifically, what if the NES had enjoyed CD-ROM games that would’ve continued its lifespan and taken a bite out of the TurboGrafx-CD, Sega CD, and 3DO? Try to imagine what modern PC gaming would be like if you only had 1 meg to play with; would anyone even take it seriously? Sure, a lot could still be done in 1 meg but it would be nowhere near enough space for the features that modern gamers expect in new games — even indie games. Similarly, there’s no good reason to think that the same is not true of the NES.

Yes, I saw this on CRTs back in the 1990s. Yes, it bothered me. Don’t pretend like this isn’t a problem. 😛

In addition to space requirements, there are many pieces of low-hanging fruit that almost no classical NES games plucked. For instance, did you know that glitchless diagonal scrolling (using 4 “nametables”) was baked into the hardware design of the NES from the very beginning? Despite Super Mario Bros. 3 using the MMC3 memory mapper, Nintendo opted not to include the tiny amount of extra RAM on the cartridge necessary to unlock MMC3’s scrolling enhancement, which is why the obnoxious graphical glitches appear at the right side of the screen during gameplay. That sort of thing felt inexcusable to us, so to get our feet wet, we implemented glitchless and perfectly smooth 8-way scrolling in an MMC3 “walking simulator” game demo. As far as we know, this had only been accomplished on the NES (read: not Famicom) in one title: Tengen’s Gauntlet (whose mapper is technically a simpler predecessor to MMC3). And even in that game there were only 4 screens per level, which is the easy way to do it. Our proof of concept sported a 16-screen level with lots of complex graphics, with the ability to add even more screens. This was a much harder feat, but more importantly, much more conducive to something like a sprawling RPG.

Just Breed is arguably the most advanced RPG on the Famicom/NES to date. It uses MMC5 and 8×8 attributes. Note the scrolling glitches at the top and right!

Thus emboldened but still lacking the ability to create our own mapper, we pushed forward with inventing as many new tricks as we possibly could — what were called “novelties” around the office. We ended up creating quite a few of these…so many that it really did seem like the ROM space constraints of MMC3 would’ve prevented us from fully exploiting them. MMC5 allowed for a little more space than MMC3 (about double), but lacked MMC3’s quad nametable feature. MMC5 also sported 8×8 attributes (I.e., denser coloration of the screen than the NES’s stock 16×16 attributes allow), but the feature is hardwired to only use a single nametable. Thus 8×8 is so broken in MMC5 that it doesn’t work correctly with hardware scrolling. This whole classical mapper situation was a mess, and in any case, no existing memory mapper for the NES allowed anywhere near enough space to facilitate expansive games full of advanced level design, rich soundtracks and sound effects, high frame rate sprite animations, intricate background animations, or SNES/PlayStation JRPG quantities of dialogue…let alone something as crazy as FMV.

It was at that time that we had the good fortune of meeting Paul Molloy from Infinite NES Lives. We told him of our ambitions and he in turn told us that he had designed a new type of FPGA-based NES cartridge that would probably serve our needs. At the time, he needed to work out some issues with it and partially redesign the PCB, but there was enough functionality there to tell us that we had our path forward. The main problem was, the hardware wasn’t enough; at least one of us had to learn Verilog in order to implement our mapper on that hardware. But in turn, before specifying the mapper in Verilog we had to know what to specify!

This logo demake for the NES by Ellen Larsson is the only legitimate part of Doom that can actually run on the NES.

Up until that point, I had cheekily called my fantasy mapper “The Gigamapper”, because it would, at a minimum, supply the NES’s CPU and PPU access to a gigabyte of ROM. (We also toyed with only giving it a gigabit of ROM, to be in line with NES and SNES ROM size nomenclature.) This base requirement now seemed almost trivial, and it was quickly apparent that we could do many, many things with a cartridge like this that had heretofore been impossible. But we definitely did not want to “cheat” in the way that Doom-on-the-Raspberry-Pi-on-the-NES does. Something undeniably anachronistic like that lay very far from our interests. In other words, the NES needed something new that could have been something old, but never was. What then, to do?

= Guiding Principles =

We decided that since 1994 marked the end of the NES’s original lifespan, we had to create an expansion system that would’ve been plausible in 1994 — technologically and economically. This is the base principle from which the rest of our mapper design philosophy flows. At the same time, we chose not to do absolutely everything that was possible in 1994. This is in part because we wanted to know what the NES was capable of “on its own” in the very specific sense that it still does all the computations that are ends unto themselves. Essentially, this means that something akin to a CD-ROM expansion is perfectly fine, but that a 3D accelerator or math co-processor is not.

In addition, video compression algorithms like MPEG were suspect at best, and we’d prefer to avoid them. Although MPEG-2 was out by 1994, it was intended for display resolutions, palette freedoms, and color depths that are not possible on the NES. Because of this, in order to use it at all we’d have to implement additional conversion logic in hardware, which would probably violate the economics requirement mentioned above (even more than adding an MPEG decoder to the BOM would do already). It would probably also result in terrible image quality compared to whatever bespoke algorithms we came up with.

We interrogated these principles and after much debate around the office and with other people in the broader NES development community, we arrived at these conclusions. For ease of reading and contrast, they are split between dos and don’ts.

= Conclusions =

Technical jargon is pretty much unavoidable when discussing a topic like this, so here it is. Anyone who doesn’t care about these details is welcome to skip below to where I will graphically demonstrate what these features unlock in NES game development.

Allowed:

  1. Direct access to as much data as could be stored on just over 1 CD-ROM. (768MiB)
  2. Indirect access to as much data as could be stored on 4 CD-ROMs. (2.8GiB)
  3. Direct access to up to 1 MiB of RAM.
  4. Interposing the PPU’s data fetches in order to alleviate onerous limitations:
    1. 256 unique tiles per screen -> 960 unique tiles per screen.
    2. 2 nametables -> 4 nametables.
    3. 16×16 attributes -> 8×8 and/or 8×1 attributes.
  5. Automatic bank switching that facilitates items 4.1 and 4.3.
  6. Nametable bankswitching, allowing high performance background animations composed with other mapper features.
  7. Attribute bankswitching that facilitates item 4.3.
  8. Multiple fine-grained CHR banks. (16 banks of 512 bytes apiece)
  9. Multiple medium-grained PRG banks. (4 banks of 8KiB apiece)
  10. Error correction or “de-glitching” features, merely to correct behavior that amounts to bugs in the NES’s hardware design.
  11. Scanline counter. (Better than MMC3’s, in that it works correctly with all other features of the mapper.)
  12. DPCM sample size expansion. (4081 bytes -> 16MiB)
  13. Audio synthesis chip [emulation] (YM2608, YM2610, YM2612, etc) for expansion audio purposes.
  14. Dual-port ROM and RAM.

Disallowed:

  1. Offloading general purpose calculations. (I.e., no CPU, FPU, or any other kind of co-processor on the cartridge.)
  2. Offloading graphical processing. (So nothing like Super FX, SA-1, etc.)
  3. PCM audio streaming via expansion audio.
  4. Exceeding the computational power or complexity of the NES itself.
  5. Exceeding the circuit complexity of MMC5, which was the most complicated classical memory mapper for the NES.
  6. Re-implementing the PPU for any reason whatsoever.
  7. Physical form factor any larger than a traditional Game Pak. (I.e., the game cartridge has to fit properly in a frontloader NES.)
  8. Transferring data from SD card into cartridge RAM at a data rate that exceeds that of a quad speed CD-ROM drive. (600KiB/s)

Our memory mapper began life as C++ code inserted into our local fork of Mesen, and only implemented feature 1. All of the other features except for 2, 13 and 14 eventually became implemented together in a single package which we call the Memory eXpansion Module 0, or MXM-0. Taking MXM-0 and combining it with stubs for 2 and 13 is what we call MXM-1. You’ll notice that feature 14 is left out of both; that is because INL is supplying that feature on the cartridge for us. It is therefore not actually part of our mapper(s). Feature 13 is also merely stubbed, because we are creating three different types of cartridges for Former Dawn — one with a genuine Yamaha YM2610 ASIC inside, one with simulated YM2610 audio synthesis coinhabiting the FPGA that implements MXM-1, and one without expansion audio entirely.

MXM-0 and MXM-1 are both now implemented in Verilog as well, with almost all features fully usable and complete. Like all of the classical memory mappers for the NES, ours run beautifully on the EverDrive N8 Pro. Very soon we will adapt MXM-1 to our prototype cartridges from INL. In other words, this mapper is real. It is not a theoretical construct! Any one of you with a genuine hardware NES (frontloader or toploader) could insert one of our N8 Pro dev cartridges and run the current build of Former Dawn right now. Compatibility with NES clones varies, but is quite good; more on that later.

= Explanation (Allowed) =

In order to understand our motivations for implementing each of the Allowed features, each one needs to be described in some detail along with visual examples if possible. In such cases, the image on the left will be an NES game suffering classical restrictions, and the image on the right will be something that is made possible by MXM — preferably from Former Dawn.

1. Direct access to as much data as could be stored on just over 1 CD-ROM. (768MiB)

So instead of this…
…we can do this on the NES.

When we began implementing what became known as MXM-0, we thought that Former Dawn was going to be stored entirely in NOR flash on the new INL board. (Since mask ROM production incurs prohibitive fixed costs, this is what everyone else in the modern NES “homebrew” scene is doing.) Because of various uncertainties in chip supplies and technical difficulties, we opted to take INL’s offer to include an SD card on the cartridge as well. As it turns out, we think that the fast-access part of the game can quite comfortably be stored in 16MiB of (NOR flash) ROM. If we had known at the very beginning that we were going to go the SD card route, we might not have facilitated direct access to 768MiB of ROM in MXM-0. But we did, and now we see no reason to remove it, especially because anyone who uses MXM-0 in the future might want to create a Neo·Geo style cartridge for the NES with a massive amount of ROM in chip form instead of SD card form. Why not? In either case, the basic point is that having so much ROM means that one can now spend that ROM liberally in order to upgrade the quantity and the quality of pretty much any aspect of an NES game you can think of.

2. Indirect access to as much data as could be stored on 4 CD-ROMs. (2.8GiB)

And instead of this…
…we can do this on the NES, but won’t because of the Copyright Act of 1976.

Because we’re basing Former Dawn’s cartridge specs and memory footprint on a classical CD-ROM console or console add-on, it made sense to base our ROM limits on actual examples of CD-ROM games from the early 90s. Most CD-ROM games used only 1 disc, but a few of them used more, even early on. Night Trap, released in 1992 for the 3DO and Sega CD, came on 2 discs. In late 1994, Slam City with Scottie Pippen was released for the Sega CD and contained 4 discs, which may be the record as of 1994. So we’re setting our max ROM size to 4 CD-ROMs, or 2.8GiB. Whether or not we will come anywhere close to that depends on how much FMV we end up including in the game. No other assets would push the ROM footprint past 1 disc’s worth. Even if the OST were 2 hours long and we stored it as raw 7-bit 33.1kHz mono PCM, we would only need about 239MB (roughly 228MiB) at one byte per sample. Before MXM-0, ROM sizes and more primitive memory mappers meant FMV on the NES was never achieved with full frame rate at full screen.
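
The soundtrack estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes that "33.1KHz" refers to the 33,144Hz rate mentioned later in this article, and that each 7-bit sample occupies a whole byte (no bit packing). The function name is made up for illustration.

```python
# Back-of-the-envelope size of a raw, uncompressed mono PCM soundtrack.
# Assumptions (not from the article): 33,144 Hz sample rate and one byte
# of storage per 7-bit sample.

def pcm_size_bytes(duration_s: int, sample_rate_hz: int, bytes_per_sample: int = 1) -> int:
    """Return the number of bytes needed to store the audio."""
    return duration_s * sample_rate_hz * bytes_per_sample

two_hours_s = 2 * 60 * 60
size = pcm_size_bytes(two_hours_s, 33_144)

print(size)                    # total bytes
print(size / 1_000_000)        # decimal megabytes
print(size / (1024 * 1024))    # binary mebibytes
```

Under those assumptions the total is a little under 239 million bytes, which fits on a single CD-ROM with plenty of room to spare.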

3. Direct access to up to 1 MiB of RAM.

In order to justify the decision to include 1MiB of RAM in this expansion system, there are 3 relevant questions: A) Can this amount of RAM reasonably be used by a 6502-based CPU? B) Would it have been economical in late 1994 to do so? C) WHY?

A) Yes. The Apple //e supported up to 1MiB of RAM. All it takes is carefully managed bankswitching and/or serial loading.
B) Technically, we are using 1MiB of static RAM, which would not have been economical in 1994. But this is because it’s more economical for us to use SRAM than DRAM in 2021. Using dynamic RAM would imply having to have some sort of memory controller on the cartridge, which would cost significant money and engineering time, not to mention possibly exceeding our available electrical power. However, consider that the original PlayStation was released in late 1994 and it had over 3MiB of RAM onboard.
C) Why? Because we need it. Classical cartridge games on the NES often had 8KiB of RAM onboard, and some had up to 32KiB. The reason why they needed comparatively little RAM was that the entire game’s assets were stored in mask ROM that acted like RAM in terms of access speeds. This is something that can be hard to appreciate for anyone coming from the PC gaming world where RAM is crucial (no pun intended). When you move to a CD-ROM sort of access model, you need a large buffer to hold levels, graphics, etc. One of the many reasons that early CD-ROM based video game consoles failed is that they didn’t have enough RAM to serve as a buffer for the data coming in from the CD-ROM drive. This unnecessarily increased the frequency of loading, or “thrashing”, and made for a terrible gameplay experience. We are actually constraining ourselves pretty significantly to crowd everything we want to do into 1MiB of RAM. Consider the fact that the FDS had a 32KiB buffer for loads from the floppy disks despite each side only having 56KiB. That means that up to 29% of a two-sided game (32KiB out of 112KiB) could be cached at any given time in RAM. Given that Former Dawn is likely to take up dozens of megabytes even without FMV involved, our corresponding proportion is more like 3%; I.e., we’re suffering with about a tenth of the buffer in an apples-to-apples comparison. There is a possibility that we can make our code efficient enough to squeeze everything down even more and work with 512KiB of RAM instead of the full 1,024KiB, in which case we will save some money on the BOM for the cartridges and feel slightly smugger.
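
To make the buffer comparison concrete, here is a small sketch of the arithmetic. Two things are assumptions rather than statements from the article: that the 29% figure counts both sides of an FDS disk (2 × 56KiB), and that "dozens of megabytes" is taken to be 32MiB purely for illustration.

```python
# Fraction of a game's data that fits in the RAM buffer at once.
# Assumptions: a two-sided FDS disk (2 x 56 KiB) and a hypothetical
# 32 MiB Former Dawn image standing in for "dozens of megabytes".

def cached_fraction(buffer_kib: float, game_kib: float) -> float:
    """Return buffer size as a fraction of total game size."""
    return buffer_kib / game_kib

fds = cached_fraction(32, 2 * 56)
former_dawn = cached_fraction(1024, 32 * 1024)

print(f"FDS:         {fds:.1%}")
print(f"Former Dawn: {former_dawn:.1%}")
print(f"ratio:       {former_dawn / fds:.2f}")
```

With these illustrative numbers the ratio lands close to one tenth, matching the comparison in the text; a larger game image would make it smaller still.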

4. Interposing the PPU’s data fetches.

Most of the classical memory mappers for the NES interpose the PPU in some way. Usually it was to provide more than the paltry 8KiB of CHR, but sometimes it was done to facilitate CHR-ROM and CHR-RAM on the same cartridge (MMC3), provide more than 256 tiles per frame (MMC5), auto-switch CHR in the middle of the frame (MMC2), among other reasons. We have taken all of these to their logical extremes; here are the details:

This is how much of a scene from Terminator 3 can be shown on the NES with only 256 tiles.
…and this is the whole shot featuring only 549 tiles. It’s not even using the full 960, but what a difference it makes!

4.1. 256 unique tiles per screen -> 960 unique tiles per screen.
Because of the 8-bit nature of the CPU (and PPU), there are many artificial restrictions in the design of the NES. One of these is the fact that without mapper support, you cannot put more than 2^8 = 256 unique tiles onto a single frame, despite the fact that the frame itself requires 960 tiles to fully cover it. MMC5 lifted this restriction up to 960 tiles out of a maximum set of 16,384. MXM-0 lifts it further to 960 tiles out of a maximum of 65,536. When limited to 256 tiles per scene, a severe burden is placed on the artist to create the illusion that such a heavy restriction is not in place. This can be accomplished either by making the scene/image smaller or by reusing tiles all over the place. The typical result in the classical NES era was a very patterned or simplistic look instead of more intricate (“entropic”) art being shown.
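
The tile counts in this section follow directly from the frame size and the width of a tile index. The sketch below just spells that out; reading 16,384 and 65,536 as 14-bit and 16-bit tile indices is an inference from the numbers, and the function names are hypothetical.

```python
# How many 8x8 tiles a frame needs versus how many a tile index can name.
# The index widths (8, 14, 16 bits) are inferred from the tile counts in
# the text, not taken from any mapper documentation.

FRAME_W, FRAME_H, TILE = 256, 240, 8

def tiles_to_cover_frame() -> int:
    """Number of tile positions in one 256x240 frame."""
    return (FRAME_W // TILE) * (FRAME_H // TILE)

def addressable_tiles(index_bits: int) -> int:
    """Number of distinct tiles an index of the given width can select."""
    return 1 << index_bits

print(tiles_to_cover_frame())   # positions per frame
print(addressable_tiles(8))     # stock NES
print(addressable_tiles(14))    # MMC5's maximum set
print(addressable_tiles(16))    # MXM-0's maximum set
```

The gap between 960 positions and 256 nameable tiles is exactly what forces tile reuse on unexpanded hardware.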

This is 8-way scrolling in Former Dawn on the NES. Note the total absence of graphical glitches at the borders.
This is 8-way scrolling in Crystalis on the NES. Note the terrible graphical glitches at the borders.

4.2. 2 nametables -> 4 nametables.
As mentioned in the preamble, we include support for 4 nametables primarily to facilitate smooth 8-way scrolling with no restrictions. In fact, we have support for more than 4 nametables, but only via bankswitching. At any given time, the PPU “sees” 4 nametables because that is how it was designed to work. One of the reasons that the developers of original era NES games accepted having such terrible glitches (resulting from only having 2 nametables) in their scrolling systems is that most retail CRT TVs of the day obscured the errors because of typical NTSC overscan. We don’t have that luxury because a lot of people use PVMs, upscalers, or emulators to play NES games in the modern era. We are holding ourselves to the ultimate standard: the game must look perfect when viewing the entire 256×240 frame, at all times.
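
For readers unfamiliar with why 2 nametables cause trouble, the following sketch models how the PPU's four logical nametable addresses ($2000, $2400, $2800, $2C00) collapse onto physical memory. The vertical and horizontal arrangements are the standard ones for the console's 2KiB of RAM; "four_screen" stands for a cartridge that supplies enough RAM for all four. The function and mode names are made up for illustration and say nothing about how MXM-0 is implemented.

```python
# Map a PPU nametable address to the physical 1 KiB nametable it lands in.
# Mode names and this function are illustrative only.

def physical_nametable(ppu_addr: int, mode: str) -> int:
    if not 0x2000 <= ppu_addr <= 0x2FFF:
        raise ValueError("address is outside the nametable region")
    logical = (ppu_addr - 0x2000) >> 10   # 0..3, one per 1 KiB table
    if mode == "vertical":                # left/right distinct, top/bottom shared
        return logical & 1
    if mode == "horizontal":              # top/bottom distinct, left/right shared
        return logical >> 1
    if mode == "four_screen":             # all four tables distinct
        return logical
    raise ValueError(f"unknown mode: {mode}")

for addr in (0x2000, 0x2400, 0x2800, 0x2C00):
    print(hex(addr),
          physical_nametable(addr, "vertical"),
          physical_nametable(addr, "horizontal"),
          physical_nametable(addr, "four_screen"))
```

With only two physical tables, one scroll axis always wraps onto data that is still visible, so freshly written tiles show up at a screen edge; with four distinct tables there is always an off-screen area to write into.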

An interior view on Former Dawn exhibiting the 8×1 attributes of MXM-0. Note the sophisticated shapes and textures that can result from the palette freedom.
An interior view on StarTropics exhibiting the 16×16 attributes (palette choices) in classical NES games. Note the blocky appearance that almost always followed.

4.3. 16×16 attributes -> 8×8 and/or 8×1 attributes.
The stock NES hardware imposes an “attribute” grid across the frame where each 16px × 16px square (a “metatile”) has to subscribe to one 4-color palette of the 4 total background tile palettes that are in the PPU’s internal RAM at any given time. This is an extreme restriction that naturally resulted in almost every game for the Famicom/NES having a certain look because of how hard it is to “fight the grid”, to use a term that David Crane coined. MMC5 lifted this restriction so that the attribute grid is 4 times more granular: 8×8 squares instead. Unfortunately, MMC5’s 8×8 attribute mode is not truly compatible with hardware scrolling because it only works with 1 nametable at a time. Because we feel strongly that the hardware scrolling feature of the NES is the most important thing about its design, we went further than MMC5 did in this regard. We re-implemented an 8×8 attribute mode in MXM-0 that is fully compatible with hardware scrolling in all 8 directions, using 4 simultaneous nametables as just mentioned in 4.2. After that, we went even further and created an 8×1 attribute mode which is also fully compatible with quad nametables. This 8×1 attribute mode is key to Former Dawn’s aesthetic, because it allows the PPU to draw as freely as possible given its intrinsic design constraints. There is no further enhancement that can be done, in other words. It is literally impossible to get 1×1 attributes (I.e., fully bitmapped graphics) across the entire frame. In a local region of a frame, multiple sprite overlays can be used to achieve this. But that comes at the extremely high cost of blowing out the sprite system, which is something that is rarely worth it. Our 8×1 attribute mode can be used freely across the entire game, which means that artists on the project have much more freedom to use their pixel art skills to achieve intricate shading across a whole frame, something heretofore impossible on the NES.
Strangely enough, this is still impossible on the SNES because of the fact that the (S-)PPU’s address and data pins are not directly exposed to the cartridge slot on that system. Therefore, as far as we know this 8×1 attribute mode in MXM-0 and MXM-1 is something wholly unique across the entire space of vintage gaming consoles which use tile-based background graphics.
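
A quick way to see how much palette freedom each attribute mode buys is to count the independently colorable cells in one frame. The sketch below does that; the byte counts assume 2 bits per cell (enough to pick one of 4 palettes, as on stock hardware) and are illustrative only, since the article does not describe how MXM-0 actually stores its attribute data.

```python
# Count independently colorable regions per 256x240 frame for each
# attribute granularity. The 2-bits-per-cell storage figure is an
# assumption for illustration, not MXM-0's documented layout.

def attribute_cells(cell_w: int, cell_h: int, frame_w: int = 256, frame_h: int = 240) -> int:
    """Number of attribute cells needed to tile the frame."""
    return (frame_w // cell_w) * (frame_h // cell_h)

def attribute_bytes(cells: int, bits_per_cell: int = 2) -> int:
    """Storage for the cells at the assumed bit width."""
    return cells * bits_per_cell // 8

for w, h in ((16, 16), (8, 8), (8, 1)):
    cells = attribute_cells(w, h)
    print(f"{w}x{h}: {cells} cells, {attribute_bytes(cells)} bytes")
```

Going from 16×16 to 8×8 quadruples the number of cells, and going from 8×8 to 8×1 multiplies it by 8 again, which is why the last step needs attribute bankswitching rather than a small fixed table.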

5. Automatic bank switching that facilitates items 4.1 and 4.3.

In order to avoid annoying timing difficulties and restrictions, we also enhanced the CHR bankswitching to be automatic based on metadata that we sneak into CHR in between regions of tile data. This helps free up the CPU to conduct the important work that only it can do. You know, running the game logic instead of babysitting the PPU or memory mapper.

6. Nametable bankswitching.

Subtlety is a virtue in game design. We have striven to achieve it, as these 5 different animated background object types integrated into one small area show.
Willow‘s use of animated background tiles was commendable for the time, but its execution missed the mark. It relies on pure CHR bank switching at a unified (and frantic) playback rate.

Going further along the lines of relieving the CPU of babysitting the PPU, we implemented nametable bankswitching in a highly usable way. Again, this is a feature that technically has been implemented in previous memory mappers (e.g., Sunsoft-4, which After Burner used), but those implementations were not fleshed out enough to be truly useful. Our nametable bankswitching is composable with automatic CHR bankswitching and 8×1 or 8×8 attributes. This allows intricately animated background tiles without forcing the CPU to traverse the nametable data and update regions of it to facilitate that animation. It also means that multiple tilesets can be used on the same screen simultaneously, and even animated at different frame rates! This subtlety is key to making Former Dawn’s environments feel dynamic and alive without feeling overpowering (as they do in Willow), something that even Chrono Trigger did not accomplish consistently. To be fair, the SNES is more than capable of accomplishing the same thing via other means, so it’s probably only lacking in Chrono Trigger because of ROM size constraints which we do not suffer.

7. Attribute bankswitching that facilitates item 4.3.

This is a straightforward requirement. I only mention it to point out that it was required in order to get other features to work.

8. Multiple fine-grained CHR banks. (16 banks of 512 bytes apiece)

Classical memory mappers had various granularities of CHR bankswitching, ranging from 8KiB (1 bank) down to 1KiB (8 banks). We took this further to 16 banks of 512 bytes apiece. This makes it possible to have 16 sprite-based entities on the screen simultaneously, all animating independently. (E.g., playable characters, NPCs, enemies, or background elements modeled with sprites.) In practice, we will rarely use more than 8 such entities because of the global per-frame sprite limit of 64. However, the freedom to eagerly load entities into independent small banks eases the programming effort enormously. For instance, projectiles and particle effects can be queued in advance of actually displaying them while current assets are being rendered. We can also mix and match different enemies and NPCs across the entire world of Astraea without duplicating graphics in ROM, and without coupling their animation frames to each other. This technical decoupling seemed very important for showing Astraea as the rich, varied world it’s supposed to be.
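
As a rough model of what 512-byte CHR banks mean for addressing, the sketch below splits the PPU's 8KiB pattern-table window into 16 slots and resolves an address through a list of bank numbers. The register list and function name are hypothetical; the real MXM-0 register interface is not described in this article.

```python
# Resolve a PPU pattern-table address through 16 independently switchable
# 512-byte CHR banks. `bank_regs` is a hypothetical stand-in for whatever
# registers the mapper actually exposes.

CHR_WINDOW = 0x2000                          # 8 KiB of pattern-table space
CHR_BANK_SIZE = 512
CHR_SLOTS = CHR_WINDOW // CHR_BANK_SIZE      # 16 slots
TILES_PER_BANK = CHR_BANK_SIZE // 16         # each 8x8 tile is 16 bytes

def chr_rom_address(ppu_addr: int, bank_regs: list[int]) -> int:
    if not 0 <= ppu_addr < CHR_WINDOW:
        raise ValueError("address is outside the pattern tables")
    if len(bank_regs) != CHR_SLOTS:
        raise ValueError("expected one bank number per slot")
    slot, offset = divmod(ppu_addr, CHR_BANK_SIZE)
    return bank_regs[slot] * CHR_BANK_SIZE + offset

banks = list(range(CHR_SLOTS))
banks[3] = 100                               # point slot 3 at a different bank
print(TILES_PER_BANK)
print(chr_rom_address(0x0610, banks))
```

Each slot holds 32 tiles, so one slot per on-screen entity lets every entity swap its animation frame by changing a single bank number without disturbing the others.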

9. Multiple medium-grained PRG banks. (4 banks of 8KiB apiece)

Splitting PRG into 4 banks of 8KiB apiece seemed like the best approach to address the concerns of 6502 Assembly code organization and ease of management at runtime. Any smaller than 8KiB and related subroutines would often not be available simultaneously. Any bigger than 8KiB and awkward bank switching would have to be conducted much more often when disparate parts of the codebase call each other.
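
The same idea applies on the CPU side. This sketch assumes the four 8KiB slots occupy the conventional $8000-$FFFF cartridge window, which the article does not state explicitly; the bank list and function name are again hypothetical.

```python
# Resolve a CPU address through 4 switchable 8 KiB PRG banks.
# Assumption: the slots cover $8000-$FFFF, the usual cartridge PRG window.

PRG_BASE = 0x8000
PRG_BANK_SIZE = 8 * 1024
PRG_SLOTS = 4

def prg_rom_address(cpu_addr: int, bank_regs: list[int]) -> int:
    if not PRG_BASE <= cpu_addr <= 0xFFFF:
        raise ValueError("address is outside the PRG window")
    if len(bank_regs) != PRG_SLOTS:
        raise ValueError("expected one bank number per slot")
    slot, offset = divmod(cpu_addr - PRG_BASE, PRG_BANK_SIZE)
    return bank_regs[slot] * PRG_BANK_SIZE + offset

print(prg_rom_address(0xA123, [0, 5, 2, 3]))
```

With four slots, a routine in one slot can call into three other banks without any switching, which is the trade-off between bank size and switch frequency the paragraph above describes.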

10. Error correction or “de-glitching” features.

We could just time mid-frame shenanigans better than programmers did in the 80s and 90s to get rid of these problems, but we decided to make it easier on ourselves with a tiny bit of extra hardware support.
I always wondered what caused this on Mega Man 3 when I was a kid; now I know! The 6502 Assembly programmers simply lacked the software OR hardware support to make it easy to debug things like this.

Anyone who has played the classical NES game library extensively will surely have run across numerous examples of rendering glitches. Prominent examples include: flickering pixels at the border between the game field and the HUD in Super Mario Bros. 3, flickering pixels mid-frame in the level selection screen in Mega Man 3, and flickering pixels when accessing the Start menu in The Legend of Zelda. These glitches manifest in part because of difficulties that developers faced in the 80s and 90s with the limited development tools of that time. The timing has to be carefully tuned in order to avoid inducing erratic behavior in the PPU when making changes to its internal state mid-frame. But sometimes they’re extremely hard to get rid of even with modern tools like Mesen’s debugger. We implemented one mapper trick to help solve these timing difficulties, and another to help reduce similar glitchiness that results from hardware interrupts firing in the middle of a scanline.

11. Scanline counter.

Currently, we have a scanline counter implemented which is fully compatible with all mapper modes. This facilitates many raster tricks like faux parallax scrolling that would be either difficult or impossible without it. We may also implement a general purpose CPU cycle counter similar to the one in Sunsoft’s FME-7 memory mapper in order to create even more advanced raster tricks. It’s difficult to graphically show the advantage of our approach for scanline counting, but it amounts to being able to do more of it while sacrificing less CPU time, and to accomplish it with less programmer time and headache. These savings can then be spent on making a better game. As mentioned in the previous article, the true number of colors in the NES’s master palette is actually 425, not 54. But accessing those additional 371 colors is difficult to do because the naive way to do it is to tint the entire screen at once to get a different 54 colors for an entire frame instead of mixing and matching them across the larger color space. The less naive way is to use a scanline counter and switch the “emphasis bits” mid-frame in order to get some of those extra colors. We will definitely do this for specific special effects in Former Dawn. It should be noted that 425 colors puts the NES near the TurboGrafx-16 and Genesis in terms of color space size, but on those systems the colors are much more (but not totally) freely usable. What it comes down to is that the NES has much more graphical power available than people are aware of, but unlocking that power either requires enormous software engineering effort, or a small amount of hardware engineering effort. We’re opting to employ both. How much time we will have to invest in fully exploring the possibilities will depend on factors that are unknown at this time.

12. DPCM sample size expansion. (4081 bytes -> 16MiB)

The APU part of the NES’s 2A03 CPU is hard wired to have a maximum DPCM sample length of 4081 bytes, which at the maximum playback rate of 33,144Hz amounts to 1 second of sampled audio. This is one of the most restrictive aspects of the NES’s design, and a real tragedy. The tragedy wasn’t felt much in the original NES era because ROM sizes were so constrained that not much sampled audio could be justified. Thankfully, the memory address range allowed by the APU for DPCM samples makes it possible for a memory mapper to offer assistance in expanding the allowed sample sizes. We’ve done this, all the way up to 16MiB. This expansion will facilitate longer sound effects, a more DPCM-rich soundtrack, and audio tracks to accompany FMV without skipping or tricky mid-frame bankswitching. It will also allow us to implement multiple “virtual DPCM channels”, thereby further enriching the soundtrack and making it possible to play DPCM sound effects in the game at the same time as the soundtrack without either one cutting out the other. It will also make it possible to play multiple DPCM sound effects simultaneously. As far as we know, nothing like this has ever been done in an NES game before.
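
The 4081-byte limit and the "1 second" figure can be checked with a little arithmetic. The sketch below relies on two facts about the stock DPCM channel: its length register is 8 bits wide and encodes a sample length of register × 16 + 1 bytes, and each byte holds 8 one-bit deltas. The function names are made up for illustration.

```python
# Stock DPCM sample length and playback time.
# Length register is 8 bits; sample length = register * 16 + 1 bytes.
# Each byte carries 8 one-bit delta samples.

def dpcm_length_bytes(length_register: int) -> int:
    if not 0 <= length_register <= 0xFF:
        raise ValueError("length register is 8 bits")
    return length_register * 16 + 1

def dpcm_duration_s(num_bytes: int, rate_hz: float = 33_144) -> float:
    """Playback time at the given output rate."""
    return num_bytes * 8 / rate_hz

stock_max = dpcm_length_bytes(0xFF)
print(stock_max)
print(dpcm_duration_s(stock_max))
print(dpcm_duration_s(16 * 1024 * 1024) / 60)   # minutes for a 16 MiB sample
```

At the top rate a full-length stock sample lasts just under a second, while a 16MiB sample at the same rate would run for more than an hour.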

13. Audio synthesis chip [emulation].

This is a big one that really demands its own blog post, which will come at some point. But in brief, we’ve chosen to have expansion audio on the cartridge that will be made possible on the frontloader NES via the expansion port plug offered by INL. (It should also work natively on the Famicom, of course.) The more advanced expansion port module from Perkka should also enable our expansion audio. The chosen synth chip for this expansion audio is slated to be the Yamaha YM2610, which was the sound chip in the Neo·Geo. We already have FPGA-based emulation of the YM2610 working, but have not written the interface for it that will relieve the 6502 CPU core of having to feed it. This was accomplished in the Neo·Geo via a dedicated onboard Z80 CPU, which we would like to avoid if at all possible. Several solutions have been proposed and we are working through the implications of them before making a decision. In any case, it is a hard requirement that the native 2A03 portion of the soundtrack sound fantastic on its own as well as combined with the YM2610 portion. Thus, anyone with any kind of NES or Famicom, modded or unmodded, will be able to enjoy the soundtrack to Former Dawn!

14. Dual-port ROM and RAM.

One of the biggest problems when dealing with any vintage video hardware (not just the NES) is managing the timing of reads and writes so that the VRAM is not being accessed simultaneously by the CPU and PPU (more generally, the GPU). In the specific case of the NES, the most useful portions of the PPU’s internal state cannot be written to at all while rendering is turned on. Thus, the safe solution has always been for the CPU to wait until either vblank or hblank to conduct writes into the PPU’s internal state. Since hblank is so short, almost nothing can be done there and what can be done is extremely difficult to time correctly. Vblank is comparatively longer, but is still quite short. The NES shipped with 2KiB of nametable/attribute RAM soldered onto the motherboard, which subjected it to these problems. But it was also designed so that external VRAM could be used instead, mapped within CHR-ROM or CHR-RAM. Thus, there is nothing that prevents such ROM or RAM from being “dual ported”; I.e., the CPU and PPU can both access it at the same time. All it requires is special ROM or RAM, mapper support, or both. Because INL was already developing a dual port NES cartridge in general, we chose to co-engineer this system with INL so that we can base much of Former Dawn’s programming on the assumption of dual-portedness. Strictly speaking, this is not part of MXM-0 or MXM-1, but it bears mentioning because it does require hardware support in a massive way. Basically, the distinction between PRG and CHR is eroded with such a system. Again, there is just barely a historical precedent for this: MMC5 contains 1KiB of “ExRAM” which is dual ported. We’ve just taken it much further and thoroughly depended on it for certain features of the game instead of treating it like a gimmick as it was in MMC5. We can do things like bank switch a RAM-based nametable into the PPU’s address space (I.e., into CHR) while also bank switching it into the CPU’s address space.
Thus configured, the CPU can modify the nametable during rendering, thereby making many more features possible like robust destructible environments, particle effects, other special effects like faux mode-7 from the SNES, and more. How far we take it won’t be known until deeper into the project, because such things have almost no precedent on the NES.
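
To make the idea concrete, here is a toy Python model of one RAM bank that is mapped into two address spaces at once. It is only a sketch of the concept: the class names, the CPU-side window at $6000, and the 1KiB bank size are our assumptions for illustration and are not taken from the MXM-0/MXM-1 design. The PPU-side address $2000 is simply where the first nametable normally lives. The model ignores bus timing and arbitration entirely, which is where the real hardware work lies.

```python
class DualPortRam:
    """A RAM bank that can be visible on more than one bus at once."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)


class Bus:
    """One address space (CPU or PPU) with windows onto shared RAM banks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.windows = []  # list of (base_address, DualPortRam)

    def map(self, base: int, ram: DualPortRam) -> None:
        """Make `ram` visible starting at `base` (a stand-in for bank switching)."""
        self.windows.append((base, ram))

    def _resolve(self, addr: int):
        for base, ram in self.windows:
            if base <= addr < base + len(ram.cells):
                return ram, addr - base
        raise ValueError(f"{self.name}: nothing mapped at ${addr:04X}")

    def read(self, addr: int) -> int:
        ram, offset = self._resolve(addr)
        return ram.cells[offset]

    def write(self, addr: int, value: int) -> None:
        ram, offset = self._resolve(addr)
        ram.cells[offset] = value & 0xFF


# One RAM-based nametable, mapped into both address spaces.
nametable = DualPortRam(1024)
cpu_bus = Bus("CPU")
ppu_bus = Bus("PPU")
cpu_bus.map(0x6000, nametable)  # hypothetical CPU-side window
ppu_bus.map(0x2000, nametable)  # PPU nametable 0

# The CPU changes a tile index; the PPU sees it on its next fetch,
# without the CPU going through the PPU's own registers.
cpu_bus.write(0x6000 + 5, 0x42)
tile_seen_by_ppu = ppu_bus.read(0x2000 + 5)
```

The point of the sketch is that both buses resolve to the same backing store, so a CPU-side write is simply present the next time the PPU fetches that byte.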

Explaining what’s not allowed in our mapper is almost as important as what is, since designing something like this in the 2020s puts one in constant danger of stepping over the line. Here are the explanations!

= Explanation (Disallowed) =

1. Offloading general purpose calculations.

As cool as this is (and it is), it is definitely not an NES game in the meaningful sense that most of us would care to use the term.

If you’re going to make a game for the NES, you have to question exactly what it means for it to be “on the NES”. What is any video game at its core? It’s a computer program that runs in real time, takes user input, and uses logic to combine the user input with graphics in order to send video output. So it seems straightforward to remain steadfast on the point that the game logic part of all this must take place on the NES; i.e., on the NES’s CPU, the 2A03. If you’re using some kind of modern general purpose processor (e.g. an ARM CPU) on the cartridge that runs the logic instead, you’re completely “cheating” in the sense that it’s not truly an NES game. Why? Because it’s akin to strapping a jet engine to a 1910s biplane: it makes it something categorically different and impossible to achieve in the device’s original context. So no matter how interesting or challenging it would be to do the modifications necessary for Doom-on-the-NES, it’s ultimately uninteresting as an addition to the NES’s game library, since almost any game could be added to the NES’s game library that way. In other words, a definition that includes everything is about as useless as a definition that includes nothing. If instead you’re using a period-accurate and purpose-specific processor (e.g. an early FPU) to assist in calculations, it’s less obvious that it’s “cheating”. But we think it’s better to avoid the problem altogether and just eschew anything that assists or replaces the 6502 core of the 2A03 for game logic purposes or related calculations. In this sense, our design is purer than MMC5’s, since MMC5 contains a general purpose integer multiplier feature accessible from a game program, and therefore edges towards fulfilling a CPU’s responsibility. Even worse than that, the 6502 does not even have a multiply instruction! 
So MMC5 is capable of enhancing the CPU of the NES in a way that’s not just adding a bit more of what it can already do — it pushes the combined system towards something more advanced like the Motorola 6809 or the Intel 8086.
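
For readers wondering what “no multiplication feature” means in practice: on a 6502, a product has to be built up in software, typically with a shift-and-add loop. The Python sketch below shows that general technique for two unsigned 8-bit values. The function name is ours, and this is neither Former Dawn’s code nor a description of how MMC5’s multiplier is built; it only illustrates the work a hardware multiplier takes off the CPU’s hands.

```python
def mul8_shift_add(a: int, b: int) -> int:
    """Multiply two unsigned 8-bit values using only shifts and adds.

    Returns the 16-bit product, the way a software routine on a CPU
    without a multiply instruction would build it up bit by bit.
    """
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must fit in 8 bits")
    product = 0
    multiplicand = a
    for _ in range(8):            # one pass per bit of the multiplier
        if b & 1:                 # low bit set: add the shifted multiplicand
            product += multiplicand
        multiplicand <<= 1        # next bit is worth twice as much
        b >>= 1
    return product & 0xFFFF


example_product = mul8_shift_add(13, 11)
```

Eight passes of test, add, and shift is cheap in Python but costs a real 6502 many cycles per multiplication, which is why a one-step hardware multiplier is a meaningful extension of the CPU rather than a convenience.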

2. Offloading graphical processing.

As cool as this might be, it is not really an SNES game in the fairest sense, since the bulk of the graphical processing is being done on the cartridge, not in the console. It’s a computer within a computer.

Similarly, a big part of what makes an NES game an NES game is the fact that the PPU is rendering the video. Small adjustments or enhancements seem OK, especially because they are grandfathered by various classical memory mappers. But putting something “big” like the SA-1 or Super FX chip on an NES cartridge would turn it into a fundamentally different system. Obviously, the typical consumer doesn’t care at all about whether or not a graphics enhancement chip is present on the cartridge. The SNES/Super Famicom game library contained many popular titles that did exactly that — including some of the most lauded ones such as Star Fox, Yoshi’s Island, and Super Mario RPG. In fact, the Super FX chip began life at Argonaut Games as part of the Star Fox project. The initial proof-of-concept game was made for the NES, not the SNES — and it was in turn adapted from their precursor Amiga game called Starglider. It was right around this time that the Super Famicom semi-final prototype was available, and Nintendo Co. Ltd. provided one to Argonaut. Brand new hardware in hand, Argonaut then ported the game from the NES to the SNES. After preparing a demo, they met with Nintendo in person and told them that despite the SNES having good 2D hardware for the time, they needed a 3D accelerator chip to make the game truly shine; thus the Super FX project was launched. No other company came that close to creating a 3D accelerator for the NES because the SNES was in full force by the time that Argonaut demonstrated the economic viability of putting an accelerator chip on a game cartridge for any system. So this phenomenon never made it to any game in the original NES’s game library, and we don’t want to be the ones to introduce it. We want to remain defensibly period correct, and this is another type of enhancement that is hard to defend. (It is also against our personal tastes.)

3. PCM audio streaming via expansion audio.

The closest analog to streaming PCM audio into the mix via the expansion audio line would’ve been “Red Book audio” — CD audio. But that’s not possible to do while a non-audio data track is being accessed on a CD-ROM game. You might notice on a classical CD-ROM game for the PC that either the audio is cut down in quality and/or is short, or the video is. This is no accident! Given that we do not have a big enough buffer to hold full PCM quality audio for anything but a trivial length of time, using PCM streaming via expansion audio during gameplay is extremely suspect. Doing it during FMV is something else, and we have already accomplished that as our Bad Apple FMV demonstrates. In addition, our composer wants the game to have an authentic early 90s sound to it anyway, and the best way to guarantee that is to use a genuine audio synth chip or at least emulate one in the FPGA. Remaining strictly period correct is much easier to accomplish that way, and helps avoid temptation.
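
The buffer problem is easy to see with a little arithmetic. Red Book audio is 44.1kHz, 16-bit, stereo, so the sketch below computes its data rate and how long a given buffer would last at that rate. The buffer sizes in the loop are hypothetical round numbers chosen for illustration, not the actual buffer sizes in MXM-0 or MXM-1.

```python
RED_BOOK_SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
RED_BOOK_CHANNELS = 2              # stereo
RED_BOOK_BYTES_PER_SAMPLE = 2      # 16-bit samples


def red_book_bytes_per_second() -> int:
    """Data rate of uncompressed CD-quality PCM audio."""
    return (RED_BOOK_SAMPLE_RATE_HZ
            * RED_BOOK_CHANNELS
            * RED_BOOK_BYTES_PER_SAMPLE)


def seconds_buffered(buffer_bytes: int) -> float:
    """How long a buffer of the given size lasts at the Red Book rate."""
    return buffer_bytes / red_book_bytes_per_second()


# Hypothetical buffer sizes, for illustration only.
for kib in (2, 8, 32):
    ms = seconds_buffered(kib * 1024) * 1000
    print(f"{kib:>2} KiB buffer holds about {ms:.1f} ms of CD-quality audio")
```

Even tens of kibibytes amount to a fraction of a second of sound, so full-quality PCM would need to be refilled continuously while the game is also trying to load everything else.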

4. Exceeding the computational power or complexity of the NES itself.

This is almost guaranteed by 1. and 2., but it’s worth mentioning anyway. Strictly speaking, it is a weaker requirement, but holding firm on it catches some edge cases that might otherwise sneak by.

5. Exceeding the circuit complexity of MMC5.

Whether it’s fair or not, the MMC5 is a somewhat controversial enhancement chip in the modern NES development community. It would’ve been extremely difficult (if not impossible) to manufacture economically in 1983 when the Famicom was first released, so it represents a clear improvement in the technology that was introduced late in the NES/Famicom’s lifetime. It is also the most advanced enhancement chip that ever made it into a commercially released NES or Famicom game. Therefore, we hold it as a good guidepost to how complex a circuit MXM-0 can be. Because MXM-1 also contains SD card access logic, we exclude that part of it from the comparison. If MXM-1 had been released in its period correct CD-ROM add-on form, the part of it that would’ve handled the CD-ROM drive itself would likely have been on a separate ASIC or set of ASICs. This helps make the comparison to MMC5 cleaner. MXM-1 will also contain an interface to (but not the implementation of) either the YM2610 or an FPGA-simulated form of the YM2610. The circuit complexity of the YM2610 in either ASIC or FPGA-simulated form is also excluded from a comparison to MMC5. To be fair, we therefore also exclude the expansion audio portion of MMC5 itself in such comparisons. So far, with all these caveats in place, MXM-1 (and thus MXM-0) is slightly less complex than MMC5. (This is due largely to the fact that we have rejected inclusion of many features of MMC5 that we regard as gimmicky, inefficient, or unneeded for our game design; examples include vertical split screen scrolling, tile fill, and variable banking modes.) We reserve the right to end up at a place where MXM-0/MXM-1 is marginally more complex than MMC5, but will strive to be reasonable and keep it under control as we finalize the design.

6. Re-implementing the PPU for any reason whatsoever.

This is almost a recapitulation of 2., but it also seemed worth pointing out. It would be crass to do this, even if we could do it and still sneak past the other requirements.

7. Physical form factor any larger than a traditional Game Pak.

This is, to borrow a term, to avoid “the image of impropriety”. It shouldn’t be a problem for us anyway, because we really aren’t doing anything that crazy! It would also be violated in spirit if MXM-1 really manifested as an expansion port module that fit underneath the NES. In any case, we think it’s better to err on the side of caution on this front, as we do on several others. We also know that our customers are expecting a cartridge that looks like a bog standard Game Pak, at least on the outside. And that’s what we’re going to deliver.

8. Exceeding the loading speed of a 4X CD-ROM drive. (600KiB/s)

Something tells me you’ve probably never even heard of the Pippin. All that glitters is not gold.

The rationale for modeling our data transfers on a quad speed drive is that such drives were available on the retail computer parts market before the release of Wario’s Woods at the end of 1994. It stands to reason that such a drive could’ve been used in a CD-ROM console by the end of 1994 as well. However, there is only one known CD-ROM based video game console that features a 4X drive, which is the Apple Bandai Pippin. Not only that, but the Pippin was released in early 1996, which admittedly weakens our justification. Most of the successful CD-ROM based consoles in the 90s used a combination of 2X drives and video compression instead, probably to keep costs down on the drive components. (The only exception is the Dreamcast, which sported a 12X speed drive, but it didn’t come out until 1998.) So we are currently experimenting with lossless compression algorithms that could reasonably have been implemented on a relatively inexpensive 2X CD-ROM add-on to the NES in 1994 or earlier. One of them is LZW. Because LZW was covered by a patent that was filed in 1983 and did not expire until 2003, it specifically would probably not have been used on an “NES CD” system in 1994 due to licensing costs. However, we are free to use it for Former Dawn since we are creating this game well after 2003. The related LZ77 algorithm is also under consideration, because it seems to have enough compression power for us while being simpler to implement in Verilog. The compression ratio afforded by LZ77 might even be ample enough to model Former Dawn’s data transfers on a 1X CD-ROM drive, which would put it in direct period-correct competition with the TurboGrafx-CD.
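
Two things in that paragraph can be sketched in a few lines of Python: the compression ratio a slower drive would need in order to deliver 4X-equivalent throughput, and the basic shape of LZ77 itself. Everything here is illustrative. The token format (offset, length, next byte), the window and match-length limits, and all function names are our assumptions for this sketch; the actual decompressor is being written in Verilog and its parameters are not described in this article.

```python
from typing import List, Tuple

CD_1X_KIB_PER_SECOND = 150  # 1X CD-ROM data rate; 4X is therefore 600KiB/s

Token = Tuple[int, int, int]  # (offset back, match length, next literal byte)


def required_ratio(target_speed: int, drive_speed: int) -> float:
    """Compression ratio needed for a slow drive to match a faster one."""
    target = target_speed * CD_1X_KIB_PER_SECOND
    actual = drive_speed * CD_1X_KIB_PER_SECOND
    return target / actual


def lz77_compress(data: bytes, window: int = 255, max_len: int = 15) -> List[Token]:
    """Greedy LZ77: emit the longest earlier match plus one literal byte."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Always leave one byte over to serve as the token's literal.
            while (length < max_len
                   and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens: List[Token]) -> bytes:
    """Replay tokens: copy from already-written output, then add the literal."""
    out = bytearray()
    for off, length, literal in tokens:
        for _ in range(length):
            out.append(out[-off])  # byte-by-byte so overlapping copies work
        out.append(literal)
    return bytes(out)


sample = b"TILE-ROW-" * 16
packed = lz77_compress(sample)
restored = lz77_decompress(packed)
print(len(sample), "bytes ->", len(packed), "tokens")
```

Note how little the decompressor does: it copies bytes it has already produced and appends one new byte per token. That simplicity is consistent with why LZ77 is attractive for a small hardware implementation, while the expensive match search happens ahead of time when the game data is built.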

There are many other (somewhat novel) aspects to Former Dawn’s design than what we’ve facilitated directly in the mapper/expansion chip. But this article is really about unlocking the potential of the NES, which we feel is the responsibility of such a chip. Therefore, software-based tricks that we have invented or are borrowing from other developers will be covered in future posts.

= Frequently Asked Questions =

Q: Isn’t this just cheating?

A: Wow, do we get this question a lot. The answer is a solid no, in the sense that we are not “cheating” any more than The Legend of Zelda or Punch-Out!! are cheating. They used RAM on the cartridge (not just for saving); we use RAM on the cartridge. They did automatic mid-frame bankswitching; so do we. The list goes on, but the two most important things to realize are that everything we’re doing in the mapper per se was possible to accomplish economically in 1989, and that most of the classical NES games you know and love used essentially the same tricks, although in less refined forms and to less overall effect due to limited ROM sizes. The full response to this question deserves its own article, and I will probably write one at some point because this question gets posed more than any other one, and it is also the most controversial.

Q: Was all of this really possible when the NES was a current gen system?

A: Yes.

Q: Was all of this really economically feasible when the NES was a current gen system? Surely it would’ve been too expensive to engineer and deliver to the market at a price people would’ve paid.

A: Actually, we think everything we’ve done could’ve been done cheaply enough to be economically feasible if not compelling, certainly by 1994 and arguably even further back in time than that. People should keep in mind that the TurboGrafx-16 enjoyed its CD-ROM expansion in Japan by 1988 and it was commercially successful there. Why should the NES have been any different? Yes, it would’ve been more difficult to program the CD-ROM games for the NES, but far from impossible, as we are continually proving as this project marches forward. The unfortunate reality is that Nintendo Co. Ltd. has had a tendency for a very long time to favor the least expensive option at any given time in history. After the burn caused by the split with Sony and the retail release of the PlayStation independent of any association with Nintendo, Nintendo opted for cartridge-only engineering for the Nintendo 64, which turned out to be very financially damaging. Even when they released a spinning media expansion for the N64 (called the 64DD), they chose to do it with (yet again) anemic data size disks by the standards of the day. Each disk held only 64MiB, while their competitors were putting out discs with 10 times that amount of data. In other words, when competing with Sega and Sony in the mid/late 90s, Nintendo found themselves in the reverse of the position they had held in the early/mid 80s when competing with Atari and Coleco. One further point is that Nintendo chose to make memory expansion far more expensive in aggregate by including the memory mappers on every single NES cartridge instead of on a common expansion module that new games could all use. If someone owned 20 NES games, they paid for their memory mapper chips 20 separate times, with the costs buried in the prices of the individual games. Our proposed system would’ve been a 1-time expense, with the games themselves being cheaper. This is the same business model as the FDS, except with a far greater amount of storage. 
That greater amount of storage would’ve prevented the expansion from becoming obsolete, as the FDS did within a year or two of release as cartridge manufacturing prices kept falling.

Q: If this was possible back then, why didn’t Nintendo or some other company do it?

A: The obvious answer is that they already had the Super Famicom / SNES lined up for research and development by the time that this was economically feasible to do (1988-1989). Nintendo probably figured that if they were going to dive into the CD-ROM market, they may as well upgrade the underlying console at the same time. What we are exploring is an alternate timeline in which they kept the base system the same and “merely” expanded it the way that NEC and Sega did. Similarly, it’s akin to what happened with MS-DOS based PCs in the early 90s — the system architecture was left completely intact or at least backwards compatible, but with CD-ROM drives being added on. Those were often bundled with sound cards that interfaced directly to them and allowed a more enriched experience than the extra data alone provided. Ultimately, though, the justification for doing this rests on the technology and the economics, not the business acumen. It is very far from the truth that every decision that Nintendo made was the correct one. Plenty of gimmicky products were engineered and released to market that were far less worthy than what it is we’re trying to accomplish. I offer for your consideration this short list of examples: Virtual Boy, Famicom Disk System, Sufami Turbo, Datach, R.O.B., 64DD, and Power Glove. Insisting on only releasing cheap hardware does not guarantee that that hardware is a good value proposition. What makes it to market and what doesn’t is as much a function of executive caprice as it is intrinsic merit.

Q: Isn’t this just cheating, though?

A: No.

Q: Why don’t you just make Former Dawn for the SNES instead? Or the PC for that matter?

A: There are many reasons for this, but the primary one is that we feel quite a bit of love and respect for the NES and its role in video game history. We see it as a system that never saw its true potential. Frankly, it’s a shame that no one before us has chosen to do the relatively small amount of hardware engineering to “dance” with the CPU and PPU in just the right way.

Q: Doesn’t using an FPGA on the cartridge invalidate your claims to period correctness? FPGAs weren’t even invented yet by the time the NES was pulled off the store shelves. FPGAs are also incredibly powerful.

A: Well, this is true in a very trivial sense — the particular implementation that we’ve chosen to employ was indeed not possible in 1994. Then again, neither were large NOR flash chips that everyone uses for modern NES homebrew releases. Other people/companies use NOR flash for their modern NES cartridges for the same reason that we’re using an FPGA for our memory mapper: modern economics. Mask ROMs are prohibitively expensive in this modern context, and so are ASICs at the production levels we are likely to be at when Former Dawn releases. Nothing technical would prevent us from sending the plans for MXM-0 or MXM-1 to a manufacturer in China and having ASICs stamped out that would accomplish exactly the same thing on our cartridges that an FPGA does. But FPGAs allow us to do it more cheaply and to develop the technology more quickly. It is our level of discipline guided by our philosophy that prevents us from doing something with an FPGA that would’ve been impossible during the NES’s original commercial lifetime. Once we release Former Dawn and subsequently release MXM-0 (and probably MXM-1) to the public under an open source license, anyone will be able to verify this.

Q: Don’t the 8×1 attributes, massive ROM space, and other features of MXM-0/MXM-1 violate the “8-bit aesthetic”? What’s the point of making an NES game if you’re going to try to make it look like an SNES game or something even more advanced?

A: Right; so Battletoads shouldn’t have been created for the NES because it was more advanced looking than Super Mario Bros.? Solstice shouldn’t have been created because it was more advanced than Solomon’s Key? How about Kirby’s Adventure or Batman: Return of the Joker? The truth is that on every video game console, the games made later look and play better than the earlier ones; it’s not just the NES. Also, as flattering as it is for people to compare what we’ve accomplished to the 16-bit era, we know that we will fail if that is the standard we are being held to. We are simply exploring what it means to maximize 8-bit video game technology, not turn 8-bit technology into 16-bit.

Q: FMV on the NES? Come on.

A: Please tell me with a straight face that kids playing the Jurassic Park NES game in 1993 wouldn’t have lost their minds if they’d seen an FMV cut scene of a T-Rex chasing down the Jeep in the jungle. Rejecting FMV as a candidate part of the NES aesthetic is born out of closed-mindedness. It is a failure of imagination, and a failure to recollect what that time was actually like. FMV is so common now that it’s pretty much expected in a AAA game release, or at least expected to be simulated with real-time rendering. But because it used to be so novel and hard to achieve technologically, almost everyone was excited about FMV in the 80s and 90s. So much so that, unfortunately, it turned into a gimmick for a lot of game development companies, and poor quality FMV games became, for a time, a type of shovelware. What we are intending to do with FMV is comparatively tasteful and driven by a desire to enhance the storytelling medium of the NES, not replace good gameplay with thin wrappers around FMV. Think Another World, not MegaRace.

Q: Cheating!

A: No. Also, that isn’t a question.

-Jared