Last Updated: October 10, 2025
Status: Active Planning
This document outlines improvements, refactors, and optimizations for the yaze emulator core. These changes aim to enhance accuracy, performance, and code maintainability.
Items are presented in order of descending priority, from critical accuracy fixes to quality-of-life improvements.
The emulator's Audio Processing Unit (APU) currently fails to load and play music. Analysis shows that the SPC700 processor gets "stuck" during the initial handshake sequence with the main CPU. This handshake is responsible for uploading the sound driver from ROM to APU RAM. The failure of this timing-sensitive process prevents the sound driver from running.
The process of starting the APU and loading a sound bank requires tightly synchronized communication between the main CPU (65816) and the APU's CPU (SPC700).
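1. The SPC700 signals that it is ready by writing $AA to port $F4 and $BB to port $F5.
2. The main CPU waits until it reads $BBAA from its APU ports.
3. The CPU writes $CC to the APU's input port.
4. The SPC700 sees $CC and prepares to receive a data block.

The "stuck" behavior occurs because one side fails to meet the other's expectation as a result of timing desynchronization.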
The result is an infinite loop on the SPC700, detected by the watchdog timer in Apu::RunCycles.
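The handshake described above can be modeled with two sets of port latches. The following is a minimal sketch, not yaze's code: the `ApuPorts` struct and the three helper functions are hypothetical names introduced only to show why each side keeps spinning until the other has written the value it expects.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of the four CPU<->APU communication ports.
// to_cpu holds what the SPC700 wrote ($F4-$F7 on its side);
// to_apu holds what the main CPU wrote.
struct ApuPorts {
  std::array<uint8_t, 4> to_cpu{};
  std::array<uint8_t, 4> to_apu{};
};

// SPC700 side: signal "ready" by writing $AA to $F4 and $BB to $F5.
inline void SpcSignalReady(ApuPorts& p) {
  p.to_cpu[0] = 0xAA;
  p.to_cpu[1] = 0xBB;
}

// CPU side: one polling step. Returns true once $BBAA was seen and
// $CC has been written to the APU's input port.
inline bool CpuPollAndStart(ApuPorts& p) {
  const uint16_t ready =
      static_cast<uint16_t>(p.to_cpu[0] | (p.to_cpu[1] << 8));
  if (ready != 0xBBAA) return false;  // SPC700 not ready yet: keep waiting
  p.to_apu[0] = 0xCC;
  return true;
}

// SPC700 side: returns true once $CC has arrived, meaning the
// data-block transfer may begin.
inline bool SpcSeesStart(const ApuPorts& p) { return p.to_apu[0] == 0xCC; }
```

If the emulated SPC700 never reaches `SpcSignalReady` at the point the CPU expects, `CpuPollAndStart` keeps returning false, and the SPC700 in turn never sees `$CC`.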
The handshake's reliance on precise timing exposes inaccuracies in the current SPC700 emulation model.
The emulator uses a static lookup table (spc700_cycles.h) for instruction cycle counts. This provides a base value but does not account for every run-time variation in an instruction's cost.
While some of this is handled (e.g., DoBranch), it is not applied universally, leading to small, cumulative errors.
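To illustrate why a static table alone is insufficient, the sketch below adds a run-time penalty on top of a base value. The function names and the tiny table are hypothetical stand-ins for spc700_cycles.h; the one hardware detail assumed is that SPC700 conditional branches cost 2 cycles when not taken and 4 when taken.

```cpp
#include <cstdint>

// Hypothetical base-cost lookup standing in for spc700_cycles.h.
// Only two opcodes are modeled here.
inline int BaseCycles(uint8_t opcode) {
  switch (opcode) {
    case 0x00: return 2;  // NOP
    case 0xD0: return 2;  // BNE, cost when the branch is not taken
    default:   return 2;  // placeholder for opcodes not modeled in this sketch
  }
}

// Conditional relative branches occupy opcodes $10, $30, ..., $F0.
inline bool IsConditionalBranch(uint8_t opcode) {
  return (opcode & 0x1F) == 0x10;
}

// Full cost = base value from the table + penalties known only at run time.
inline int TotalCycles(uint8_t opcode, bool branch_taken) {
  int cycles = BaseCycles(opcode);
  if (IsConditionalBranch(opcode) && branch_taken) cycles += 2;
  return cycles;
}
```

Computing the total in one place, rather than inside individual helpers, makes it harder for a penalty to be applied to some instructions and forgotten for others.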
The step/bstep mechanism in Spc700::RunOpcode is a significant source of fragility. It attempts to model complex instructions by spreading execution across multiple calls. This means the full cycle cost of an instruction is not consumed atomically. An off-by-one error in any step corrupts the timing of the entire APU.
The use of double for the apuCyclesPerMaster ratio can introduce minute floating-point precision errors. Over thousands of cycles required for the handshake, these small errors accumulate and contribute to timing drift between CPU and APU.
The Spc700::RunOpcode function must be refactored to calculate and consume the exact cycle count for each instruction before execution.
Remove the bstep mechanism. An instruction, no matter how complex, should be fully executed within a single call to a new Spc700::Step() function.

The main Apu::RunCycles loop should be the sole driver of APU time.
The loop should repeatedly call Spc700::Step() and Dsp::Cycle(), decrementing the cycle budget until it is exhausted.
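One possible shape for such a loop is sketched below. The `Spc700` and `Dsp` structs are hypothetical stubs (a fixed cost of 2 cycles per instruction is a placeholder), and the signed `budget_` member is an assumption made here so that any overshoot from the last instruction is carried into the next call instead of being lost.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the real Spc700 and Dsp classes.
struct Spc700 {
  // Executes one whole instruction and returns its cycle cost.
  // The fixed cost of 2 is a placeholder for illustration only.
  int Step() { return 2; }
};

struct Dsp {
  void Cycle() { ++cycles; }
  uint64_t cycles = 0;
};

class Apu {
 public:
  // Grants the APU `apu_cycles` more cycles and runs until they are used up.
  void RunCycles(int64_t apu_cycles) {
    budget_ += apu_cycles;
    while (budget_ > 0) {
      const int cost = spc700_.Step();               // one full instruction
      for (int i = 0; i < cost; ++i) dsp_.Cycle();   // DSP ticks once per APU cycle
      budget_ -= cost;                               // may go negative (overshoot)
      total_cycles_ += static_cast<uint64_t>(cost);
    }
  }

  uint64_t total_cycles() const { return total_cycles_; }
  uint64_t dsp_cycles() const { return dsp_.cycles; }

 private:
  Spc700 spc700_;
  Dsp dsp_;
  int64_t budget_ = 0;         // cycles still owed; negative after an overshoot
  uint64_t total_cycles_ = 0;  // cycles actually executed
};
```

Because an instruction is never split, the loop can overshoot by a few cycles. Keeping the negative remainder in `budget_` means the next call simply runs that much less.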
To eliminate floating-point errors, convert the apuCyclesPerMaster ratio to a fixed-point integer ratio. This provides perfect, drift-free conversion between main CPU and APU cycles over long periods.
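A minimal sketch of the integer-ratio approach follows. The clock figures are assumptions chosen for illustration (32040 samples per second at 32 APU cycles per sample, against 1364 master cycles per scanline, 262 scanlines, and 60 frames per second) and are not confirmed by this document; `Gcd` and `MasterToApuCycles` are hypothetical helper names.

```cpp
#include <cstdint>

// Assumed illustrative clock figures (not confirmed by this document).
constexpr uint64_t kApuCyclesPerSecond = 32040ULL * 32;
constexpr uint64_t kMasterCyclesPerSecond = 1364ULL * 262 * 60;

// Euclid's algorithm, usable at compile time.
constexpr uint64_t Gcd(uint64_t a, uint64_t b) {
  return b == 0 ? a : Gcd(b, a % b);
}

// Reduce the ratio so the multiplication below stays small.
constexpr uint64_t kGcd = Gcd(kApuCyclesPerSecond, kMasterCyclesPerSecond);
constexpr uint64_t kNum = kApuCyclesPerSecond / kGcd;
constexpr uint64_t kDen = kMasterCyclesPerSecond / kGcd;

// Converts an absolute master-cycle count into an absolute APU-cycle count.
// Working from absolute totals means rounding never accumulates.
constexpr uint64_t MasterToApuCycles(uint64_t master_cycles) {
  return master_cycles * kNum / kDen;
}
```

With these assumed figures the ratio reduces to 2136/44671, and one frame of 1364 * 262 master cycles maps to exactly 17088 APU cycles. The caller would compare the returned absolute target against the APU's own cycle counter to decide how many cycles to run.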
Snes::RunCycle() advances the master cycle counter by a fixed amount (+= 2). Real 65816 instructions have variable cycle counts. The current workaround of scattering callbacks_.idle() calls is error-prone and difficult to maintain.

Refactor Cpu::ExecuteInstruction to calculate and return the precise cycle cost of each instruction, including penalties for addressing modes and memory access speeds. The main Snes loop should then consume this exact value, centralizing timing logic and dramatically improving accuracy.

Snes::RunFrame() is state-driven based on the in_vblank_ flag. This can be fragile and makes it difficult to reason about component state at any given cycle.

The PPU renders pixel by pixel (Ppu::RunLine calls HandlePixel for every pixel). This is highly accurate but can be slow due to high function call overhead and poor cache locality.
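For the CPU timing item above (having Cpu::ExecuteInstruction return its cost), one possible shape is sketched here. Everything in it is hypothetical: the `Cpu` and `Snes` structs are stubs, `InstructionCost` is an invented description of one instruction, and the cost model (6 master cycles per internal cycle, plus a per-access memory speed) is an assumed simplification rather than yaze's implementation.

```cpp
#include <cstdint>

// Hypothetical description of what one instruction does on the bus.
struct InstructionCost {
  int internal_cycles;     // CPU-internal cycles with no bus access
  int memory_accesses;     // number of bus reads and writes
  int addressing_penalty;  // extra internal cycles from the addressing mode
};

// Assumed cost of one internal cycle, in master cycles.
constexpr int kInternalSpeed = 6;

struct Cpu {
  // Master cycles per memory access; assumed to depend on the region
  // being accessed and to be updated by the bus.
  int access_speed = 8;

  // Returns the full cost of the instruction in master cycles.
  int ExecuteInstruction(const InstructionCost& c) const {
    return (c.internal_cycles + c.addressing_penalty) * kInternalSpeed +
           c.memory_accesses * access_speed;
  }
};

struct Snes {
  Cpu cpu;
  uint64_t master_cycles = 0;

  // The main loop consumes exactly what the CPU reports, so timing
  // logic lives in one place instead of in scattered idle() calls.
  void RunInstruction(const InstructionCost& c) {
    master_cycles += static_cast<uint64_t>(cpu.ExecuteInstruction(c));
  }
};
```

The point of the sketch is the data flow: the CPU reports a number, and only the main loop advances the master counter.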
The code in dsp.cc and spc700.cc, inherited from other projects, is written in a very C-like style, using raw pointers, memset, and numerous "magic numbers." Modernizing it would mean adopting std::array, replacing memset, and using constexpr variables or enum class types for hardware registers and flags.

Emulator::Run queues audio samples directly to the SDL audio device. If the emulator lags for even a few frames, the audio buffer can underrun, causing audible pops and stutters.

DisassemblyViewer uses a std::map to store instruction traces. For a tool that handles frequent insertions and lookups, this can be suboptimal. Replace std::map with std::unordered_map for faster average-case performance.

The ShouldBreakOn... functions perform a linear scan over a std::vector of all breakpoints. This is O(n) and could become a minor bottleneck if a very large number of breakpoints are set. Use a `std::unordered_set<uint32_t>` for O(1) average lookup time. This would make breakpoint checking near-instantaneous, regardless of how many are active.

The SNES emulator experienced audio glitchiness and skips, particularly during the ALTTP title screen, with audible pops, crackling, and sample skipping during music playback.
The NewFrame() method was never called, causing timing drift. The fix adds a dsp.NewFrame() call before sample generation.

Interpolation options (src/app/emu/audio/dsp.cc):
| Interpolation | Notes |
|---|---|
| Linear | Fastest; retains legacy behavior. |
| Hermite | New default; balances quality and speed. |
| Cosine | Smoother than linear with moderate cost. |
| Cubic | Highest quality, heavier CPU cost. |
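For readers unfamiliar with these modes, the sketch below shows textbook forms of three of them. These are generic formulas, not the code in dsp.cc; the function names are hypothetical, and the Hermite variant shown is the common 4-point Catmull-Rom form, which is an assumption about which Hermite flavor is meant. In each function, `t` is the fractional position between samples `y1` and `y2`, in the range 0 to 1.

```cpp
#include <cmath>

// Linear: straight line between the two neighboring samples.
inline double Linear(double y1, double y2, double t) {
  return y1 + (y2 - y1) * t;
}

// Cosine: same endpoints as linear, but eased in and out with a
// half cosine so the slope is zero at both samples.
inline double Cosine(double y1, double y2, double t) {
  const double kPi = 3.14159265358979323846;
  const double t2 = (1.0 - std::cos(t * kPi)) / 2.0;
  return y1 * (1.0 - t2) + y2 * t2;
}

// 4-point Hermite (Catmull-Rom form): y0..y3 are consecutive samples
// and the result lies on the curve segment between y1 and y2.
inline double Hermite(double y0, double y1, double y2, double y3, double t) {
  const double c0 = y1;
  const double c1 = 0.5 * (y2 - y0);
  const double c2 = y0 - 2.5 * y1 + 2.0 * y2 - 0.5 * y3;
  const double c3 = 0.5 * (y3 - y0) + 1.5 * (y1 - y2);
  return ((c3 * t + c2) * t + c1) * t + c0;  // Horner evaluation
}
```

All three pass exactly through `y1` at `t = 0` and `y2` at `t = 1`. They differ in how many neighboring samples they consult and therefore in cost and smoothness, which matches the trade-offs listed in the table.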
Result: Manual testing on the ALTTP title screen, overworld theme, dungeon ambience, and menu sounds no longer exhibits audible pops or skips. Continue to monitor regression tests after the APU timing refactor lands.
- docs/E4-Emulator-Development-Guide.md - Implementation details
- docs/E1-emulator-enhancement-roadmap.md - Feature roadmap
- docs/E5-debugging-guide.md - Debugging techniques

Status: Active Planning
Next Steps: Begin APU timing refactoring for v0.4.0