Skip to content

Replace CPU core with w65c02s.h#437

Open
ziplantil wants to merge 1 commit into
commanderx16:masterfrom
ziplantil:w65c02s-h
Open

Replace CPU core with w65c02s.h#437
ziplantil wants to merge 1 commit into
commanderx16:masterfrom
ziplantil:w65c02s-h

Conversation

@ziplantil

@ziplantil ziplantil commented Oct 20, 2022

Copy link
Copy Markdown

This replaces the fake6502-based emulator core with w65c02s.h, which should be more accurate.

@ziplantil ziplantil changed the title [WIP] Replace CPU core with w65c02s.h: initial commit [WIP] Replace CPU core with w65c02s.h Oct 20, 2022
@ziplantil ziplantil changed the title [WIP] Replace CPU core with w65c02s.h Replace CPU core with w65c02s.h Oct 21, 2022
@ziplantil ziplantil marked this pull request as ready for review October 21, 2022 22:08
@ziplantil

ziplantil commented Oct 22, 2022

Copy link
Copy Markdown
Author

More complex software seems to result in weird issues. For example, BASIC programs tend to print floating-point numbers and throw floating-point related errors. I'm tracking down the cause, but I have a few possible ideas:

  1. bug in w65c02s.h (not at all impossible)
  2. likely difference in when an interrupt is acknowledged. The current IRQ/NMI assert in the main emulator loop will cause the CPU to acknowledge the interrupt one instruction too late, because they are asserted after the last cycle of an instruction, but w65c02s.h latches, like the real deal, before the last cycle.
  3. the fact that in w65c02s.h the IRQ is a line held logically high/low, not just a simple function ("do an IRQ") like NMI.

@mist64

mist64 commented Oct 24, 2022

Copy link
Copy Markdown
Collaborator

This may not be a bad idea, but I think it should be discussed with the wider community further. Have you brought this up on the Discord? Or can you ping some of the main contributors and ask them to comment here?

@ziplantil

ziplantil commented Oct 27, 2022

Copy link
Copy Markdown
Author

No, I haven't brought this up anywhere. I was hoping this PR could serve as the venue for discussing this change, which is fairly major and probably won't (and shouldn't) be merged on short notice.

Even after the current issues with tne new core have been ironed out (unfortunately not something I've had much time to look into this past week), I still feel there is a great opportunity here to rewrite much of the main loop. The w65c02s.h core really prefers running many instructions at once rather than single-stepping instructions (running N instructions or cycles at once is much better for performance in general), but doing so while maintaining accuracy with interrupts from other components is the tricky part.

@mist64

mist64 commented Oct 28, 2022

Copy link
Copy Markdown
Collaborator

[...] I still feel there is a great opportunity here to rewrite much of the main loop. The w65c02s.h core really prefers running many instructions at once rather than single-stepping instructions (running N instructions or cycles at once is much better for performance in general), but doing so while maintaining accuracy with interrupts from other components is the tricky part.

Was was going to say: How about merging this code as an alternative CPU core – maybe even one that we can run in parallel to see where the differences/bugs are. But your main goal requires redesigning the interface of the main emulator to the CPU core.

Changing any of the architecture into emulating more instructions at a time is not only tricky because of interrupts, but also because of the state of other components that has to progress at the correct rate.

IIRC, VICE adopted an architecture 20 years ago or so where it was optimized to run the different components independently for as long as possible, instead of cycling through all components for every clock cycle. Here's a presentation (in German) by @fachat about this: https://www.youtube.com/watch?v=DZ6shiQ-MFQ

@ziplantil

ziplantil commented Oct 28, 2022

Copy link
Copy Markdown
Author

Fundamentally, the problem of keeping the other components in sync has only two difficult parts in it:

  1. Components that have externally facing effects, such as audio (the user can hear what the component is doing)
  2. interrupts.

Anything else is solved by using the read/write callbacks. In w65c02s.h, every single call to a read/write callback corresponds to a single CPU cycle. If the read/write targets MMIO, we run that peripheral for however many cycles we need to in order to catch up with the CPU. If there are interdependencies between components, this is also where that is handled (for example, we could run all peripherals on MMIO).

For user-facing devices, we simply need to cap the number of cycles. The audio device must be run every 1000 cycles, for instance.

Interrupts are tricky, because there may be an interrupt by a peripheral between MMIO accesses by the CPU. One solution is that every peripheral that raises interrupts is able to tell how many cycles can be safely run before an interrupt can occur.

If the plan is though to use w65c02s.h as an alternative core for now, it's indeed wise to postpone rewriting the timing loop.

@fachat

fachat commented Oct 28, 2022

Copy link
Copy Markdown

Changing any of the architecture into emulating more instructions at a time is not only tricky because of interrupts, but also because of the state of other components that has to progress at the correct rate.

IIRC, VICE adopted an architecture 20 years ago or so where it was optimized to run the different components independently for as long as possible, instead of cycling through all components for every clock cycle. Here's a presentation (in German) by @fachat about this: https://www.youtube.com/watch?v=DZ6shiQ-MFQ

And here's the english version of the same presentation: https://youtu.be/84XfgIjFtDU

@fachat

fachat commented Oct 28, 2022

Copy link
Copy Markdown

In short: As a general concept, VICE has a CPU clock tick. Each component (e.g. VIA etc) "knows" its state in its local clock tick, and can (either cycling, or even fast-forwarding) to a new clock tick when needed, e.g. forwarding to the CPU clock tick value when a timer register is being accessed. In addition, each component can register update events ("alarm" in VICE) for a specific CPU clock tick value, where a callback is called. This is e.g. used for Commodore PET video vsync interrupt every frame, or setting the interrupt flag of the CPU on e.g. VIA timer underflows.
The main CPU loop can run undisturbed, and only needs to compare its clock tick value with the next upcoming alarm tick value, where it branches of the CPU execution loop to do the callback as needed.

@ziplantil ziplantil force-pushed the w65c02s-h branch 2 times, most recently from 45b79cf to 3ed337a Compare October 29, 2022 18:51
@ziplantil

ziplantil commented Oct 29, 2022

Copy link
Copy Markdown
Author

I fixed the CPU issues - it was indeed a bug in w65c02s.h, which has been fixed now. The new core seems to work fine now as far as I can test it - it seems for some reason after the recent PR pulls, I can no longer use the keyboard on a Linux build (even on a build on the master branch - but pasting text appears to work fine).

@mist64

mist64 commented Oct 29, 2022

Copy link
Copy Markdown
Collaborator

it seems for some reason after the recent PR pulls, I can no longer use the keyboard on a Linux build (even on a build on the master branch - but pasting text appears to work fine).

Update the ROM.

@ziplantil ziplantil marked this pull request as draft October 30, 2022 20:00
@ziplantil

Copy link
Copy Markdown
Author

I'm converting this to draft for now - I'll continue working on w65c02s.h a bit more before I feel this will be ready to be merged.

Comment thread src/main.c
the emulator loop can be made a lot tighter => faster.
the CPU cycle counter can be read from the read/write callbacks.
the only question is: how do we handle interrupts?
see discussion in <https://github.com/commanderx16/x16-emulator/pull/437/> */

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interrupts are only one of the problems here. While some components have "auto-step" implementations to catch up to the current CPU cycle on a memory read or write, not all components have this implementation (in particular those that are updated in this loop). You'd want to add appropriate implementations for those components. And some of these emulator-step components do so specifically because they're something like the VIA timers and they can generate an interrupt request at arbitrary intervals.

There are also potential problems with VERA replication. Granted, right now the VERA implementation has the idiosyncrasy that it renders an entire line when it does anything at all, but I suppose the point here is that the best-case scenario is to only run as many CPU cycles as it needed to reach the next scanline.

The best thought I've had on this, which I might eventually experiment with on Box16, is to have various components that can generate an IRQ implement some concept of estimating the number of clock ticks until the next IRQ is fired -- or perhaps estimate the clock tick counter value at which we expect the next IRQ to fire. In the latter case, some hypothetical cpu_run() would tick the CPU until it observed that the next expected IRQ clock tick counter value was <= the current clock tick counter value.

@indigodarkwolf indigodarkwolf Nov 1, 2022

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(And in the case that some scheme like that were to be implemented, it would be totally acceptable for some component like VERA to lie in order to enforce opportunities to process its own internal logic -- so perhaps not an "estimated tick counter value for an IRQ" so much as a "please stop ticking the CPU here so I can run arbitrary logic (which in most cases is likely to generate an IRQ).")

@indigodarkwolf

Copy link
Copy Markdown
Collaborator

Well, I was late to the party and went to the diffs before reading the full conversation here, so my comments cover similar ground to that of @mist64.

@ziplantil ziplantil marked this pull request as ready for review November 5, 2022 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants