Last Updated: October 10, 2025
Status: Active Planning
This document outlines improvements, refactors, and optimizations for the yaze emulator core. These changes aim to enhance accuracy, performance, and code maintainability.
Items are presented in order of descending priority, from critical accuracy fixes to quality-of-life improvements.
The emulator's Audio Processing Unit (APU) currently fails to load and play music. Analysis shows that the SPC700 processor gets "stuck" during the initial handshake sequence with the main CPU. This handshake is responsible for uploading the sound driver from ROM to APU RAM. The failure of this timing-sensitive process prevents the sound driver from running.
The process of starting the APU and loading a sound bank requires tightly synchronized communication between the main CPU (65816) and the APU's CPU (SPC700).
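1. The SPC700 signals that it is ready by writing `$AA` to port `$F4` and `$BB` to port `$F5`.
2. The main CPU waits until it reads `$BBAA` from the APU ports.
3. The main CPU writes `$CC` to the APU's input port.
4. The SPC700 sees `$CC` and prepares to receive the data block.

The "stuck" behavior occurs when timing desynchronization causes one side to miss the value the other expects. The result is an infinite loop on the SPC700, detected by the watchdog timer in `Apu::RunCycles`.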
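The handshake described above can be modeled in a few lines. This is a minimal sketch, not the yaze implementation: `ApuPorts` and the four helper functions are hypothetical names, and the `$CC` write is assumed to go to port 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the shared ports; not taken from the yaze source.
struct ApuPorts {
  uint8_t apu_to_cpu[4] = {};  // written by the SPC700 via $F4-$F7
  uint8_t cpu_to_apu[4] = {};  // written by the main CPU, read via $F4-$F7
};

// The SPC700 announces that it is ready.
inline void SpcSignalReady(ApuPorts& p) {
  p.apu_to_cpu[0] = 0xAA;  // port $F4
  p.apu_to_cpu[1] = 0xBB;  // port $F5
}

// The main CPU polls both ports as one 16-bit value.
inline bool CpuSeesReady(const ApuPorts& p) {
  const uint16_t value =
      static_cast<uint16_t>(p.apu_to_cpu[0] | (p.apu_to_cpu[1] << 8));
  return value == 0xBBAA;
}

// The main CPU starts the transfer (port 0 is an assumption of this sketch).
inline void CpuStartTransfer(ApuPorts& p) { p.cpu_to_apu[0] = 0xCC; }

// The SPC700 checks for the start command.
inline bool SpcSeesStart(const ApuPorts& p) { return p.cpu_to_apu[0] == 0xCC; }

// Runs the steps in order; each assert is an expectation that a timing
// desynchronization could violate in the real emulator.
inline void RunHandshakeInOrder(ApuPorts& p) {
  SpcSignalReady(p);
  assert(CpuSeesReady(p));
  CpuStartTransfer(p);
  assert(SpcSeesStart(p));
}
```

If either side polls before the other has written, the corresponding check fails and that side keeps waiting, which is the "stuck" condition.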
The handshake's reliance on precise timing exposes inaccuracies in the current SPC700 emulation model.
The emulator uses a static lookup table (`spc700_cycles.h`) for instruction cycle counts. This provides a base value but does not account for costs that depend on run-time conditions. While some of this is handled (e.g., `DoBranch`), it is not applied universally, leading to small, cumulative errors.
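One way to combine a static base value with a run-time adjustment is sketched below. `ExecInfo` and `TotalCycles` are hypothetical names; the only hardware fact assumed is that an SPC700 conditional branch costs two extra cycles when taken.

```cpp
#include <cassert>

// Facts known only at execution time (hypothetical struct for this sketch).
struct ExecInfo {
  bool is_conditional_branch;
  bool branch_taken;
};

// A taken conditional branch costs two cycles more than a branch not taken.
constexpr int kBranchTakenPenalty = 2;

// Starts from the table's base value and adds run-time penalties.
inline int TotalCycles(int base_cycles, const ExecInfo& info) {
  assert(base_cycles > 0);
  int cycles = base_cycles;
  if (info.is_conditional_branch && info.branch_taken) {
    cycles += kBranchTakenPenalty;
  }
  return cycles;
}
```

Routing every instruction through a single function like this would keep such adjustments from being applied in some code paths and missed in others.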
The `step`/`bstep` mechanism in `Spc700::RunOpcode` is a significant source of fragility. It attempts to model complex instructions by spreading execution across multiple calls. This means the full cycle cost of an instruction is not consumed atomically. An off-by-one error in any step corrupts the timing of the entire APU.
The use of `double` for the `apuCyclesPerMaster` ratio can introduce minute floating-point precision errors. Over the thousands of cycles required for the handshake, these small errors accumulate and contribute to timing drift between the CPU and APU.
The `Spc700::RunOpcode` function must be refactored to calculate and consume the exact cycle count for each instruction before execution.
- Remove the `step`/`bstep` mechanism. An instruction, no matter how complex, should be fully executed within a single call to a new `Spc700::Step()` function.

The main `Apu::RunCycles` loop should be the sole driver of APU time.

- The loop calls `Spc700::Step()` and `Dsp::Cycle()`, decrementing the cycle budget until it is exhausted.

Example of the new loop in `Apu::RunCycles`:
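The original listing is not reproduced here. The following is a minimal sketch of such a loop, assuming `Spc700::Step()` executes one whole instruction and returns its cycle cost. The stand-in `Spc700` and `Dsp` types, the fixed cost of 2 cycles, and the one-DSP-tick-per-32-cycles cadence are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the real SPC700 core. Every instruction costs 2 cycles here.
struct Spc700 {
  int instructions = 0;
  int Step() {
    ++instructions;
    return 2;
  }
};

// Stand-in for the DSP; Cycle() only counts how often it was ticked.
struct Dsp {
  int ticks = 0;
  void Cycle() { ++ticks; }
};

class Apu {
 public:
  // Adds `cycles` to the budget and runs whole instructions until it is spent.
  void RunCycles(int64_t cycles) {
    budget_ += cycles;
    while (budget_ > 0) {
      const int spent = spc_.Step();
      assert(spent > 0);  // a zero-cost instruction would loop forever
      for (int i = 0; i < spent; ++i) {
        ++elapsed_;
        // Assumption for this sketch: the DSP ticks once every 32 APU cycles.
        if (elapsed_ % 32 == 0) dsp_.Cycle();
      }
      budget_ -= spent;  // may go negative; the debt carries into the next call
    }
  }

  int64_t elapsed() const { return elapsed_; }
  const Spc700& spc() const { return spc_; }
  const Dsp& dsp() const { return dsp_; }

 private:
  Spc700 spc_;
  Dsp dsp_;
  int64_t budget_ = 0;
  int64_t elapsed_ = 0;
};
```

Carrying a negative budget forward means an instruction that overshoots the request is paid for on the next call, so no cycles are lost or counted twice.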
To eliminate floating-point errors, convert the `apuCyclesPerMaster` ratio to a fixed-point integer ratio. This provides perfect, drift-free conversion between main CPU and APU cycles over long periods.
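A common way to do this is to store the ratio as an integer numerator and denominator and carry the division remainder between calls. The sketch below assumes that approach; `CycleConverter` is a hypothetical name, and the two constants are illustrative values rather than figures taken from the yaze source.

```cpp
#include <cassert>
#include <cstdint>

// Converts master cycles to APU cycles using an exact integer ratio.
class CycleConverter {
 public:
  CycleConverter(uint64_t numerator, uint64_t denominator)
      : num_(numerator), den_(denominator) {
    assert(den_ != 0);
  }

  // Returns the whole APU cycles earned by `master_cycles`, keeping the
  // fractional part as a remainder so that nothing is ever rounded away.
  uint64_t Advance(uint64_t master_cycles) {
    const uint64_t total = master_cycles * num_ + remainder_;
    remainder_ = total % den_;
    return total / den_;
  }

 private:
  uint64_t num_;
  uint64_t den_;
  uint64_t remainder_ = 0;
};

// Illustrative ratio only (assumption): APU cycles per second over
// master cycles per second.
constexpr uint64_t kApuCyclesPerSecond = 32040ULL * 32;
constexpr uint64_t kMasterCyclesPerSecond = 1364ULL * 262 * 60;
```

Because the remainder is carried forward, the sum of all returned values always equals the exact floor of the cumulative conversion, no matter how the master cycles are split across calls.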
`Snes::RunCycle()` advances the master cycle counter by a fixed amount (`+= 2`). Real 65816 instructions have variable cycle counts. The current workaround of scattering `callbacks_.idle()` calls is error-prone and difficult to maintain.

Refactor `Cpu::ExecuteInstruction` to calculate and return the precise cycle cost of each instruction, including penalties for addressing modes and memory access speeds. The main `Snes` loop should then consume this exact value, centralizing timing logic and dramatically improving accuracy.

`Snes::RunFrame()` is state-driven based on the `in_vblank_` flag. This can be fragile and makes it difficult to reason about component state at any given cycle.

The PPU renders pixel by pixel (`Ppu::RunLine` calls `HandlePixel` for every pixel). This is highly accurate but can be slow due to high function call overhead and poor cache locality.
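Returning to the CPU timing recommendation above, the idea of an instruction reporting its own cost can be sketched as follows. The types `AccessSpeed`, `InstructionCost`, and `SnesClock` are hypothetical. The sketch assumes the usual SNES figures of 6, 8, or 12 master cycles per memory access and 6 master cycles per internal CPU cycle.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Master cycles per memory access, depending on the region accessed.
enum class AccessSpeed : int { kFast = 6, kSlow = 8, kExtraSlow = 12 };

// Hypothetical description of what one executed instruction did.
struct InstructionCost {
  int internal_cycles;                // CPU-internal cycles with no bus access
  std::vector<AccessSpeed> accesses;  // one entry per memory access
};

// Computes the exact master-cycle cost of the instruction.
inline int MasterCycles(const InstructionCost& cost) {
  assert(cost.internal_cycles >= 0);
  int total = cost.internal_cycles * 6;
  for (AccessSpeed speed : cost.accesses) total += static_cast<int>(speed);
  return total;
}

// The main loop consumes the returned value instead of a fixed increment.
struct SnesClock {
  uint64_t master_cycles = 0;
  void Consume(const InstructionCost& cost) {
    master_cycles += static_cast<uint64_t>(MasterCycles(cost));
  }
};
```

With this shape, all timing decisions live in one function, and the main loop no longer needs scattered idle callbacks to stay in step.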
The code in `dsp.cc` and `spc700.cc`, inherited from other projects, is written in a very C-like style, using raw pointers, `memset`, and numerous "magic numbers." Modernization targets:

- `std::array` in place of raw buffers
- alternatives to `memset` for initialization
- `constexpr` variables or `enum class` types for hardware registers and flags

`Emulator::Run` queues audio samples directly to the SDL audio device. If the emulator lags for even a few frames, the audio buffer can underrun, causing audible pops and stutters.

`DisassemblyViewer` uses a `std::map` to store instruction traces. For a tool that handles frequent insertions and lookups, this can be suboptimal. Replace `std::map` with `std::unordered_map` for faster average-case performance.

The `ShouldBreakOn...` functions perform a linear scan over a `std::vector` of all breakpoints. This is O(n) and could become a minor bottleneck if a very large number of breakpoints is set. Store breakpoints in a `std::unordered_set<uint32_t>` for O(1) average lookup time. This would make breakpoint checking near-instantaneous, regardless of how many are active.

- `docs/E4-Emulator-Development-Guide.md` - Implementation details
- `docs/E1-emulator-enhancement-roadmap.md` - Feature roadmap
- `docs/E5-debugging-guide.md` - Debugging techniques
Next Steps: Begin APU timing refactoring for v0.4.0