Skip to content

Conversation

@halldorfannar
Copy link
Contributor

@halldorfannar halldorfannar commented Jan 28, 2026

Changes proposed in this PR

Currently the Windows implementation is completed but other platforms just use a fallback to monitor in slot 1. Also added Monitor name for Windows.

Added a TODO so we can revisit the lack of class for Monitor in a future major release 😉

Fixes #153
(...)

  • Tests added/updated
  • Documentation updated
  • Changelog entry added
  • ./check.sh passed

Currently the Windows implementation is completed but other platforms just use a fallback to monitor in slot 1.
Also added Monitor name for Windows.
Pixels = list[tuple[Pixel, ...]]


class Monitor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, I've been careful to not change any existing aspects of the public API surface. I haven't talked with @BoboTiG about that, though, so he may have different priorities.

This is an example of something that would be a change to that surface. The usage would still be API-compatible, but the signature wouldn't: something using my_monitor: dict = sct.monitors[1] would stop passing type-checking.

The alternative would be to have Monitor inherit from dict, and provide @property attributes that access its entries. That would provide type and API compatibility, but we could mark dict access as deprecated, and eliminate it in the next major rev, simplifying Monitor at that time.

I know I'm the one that suggested using __getattr__ in the first place, so my apologies about the course change.

As another example, I've got one change in the works that affects a lot of ScreenShot, and I'm looking at the likely changes: I'm seeing which changes can be made in an API- and type-compatible way, and which can't. Along the same lines, I'm thinking about what parts of the exposed API we want to say are "public", in the sense of semantic versioning, and which ones we want to stop exposing with public names (such as mss.windows.MSS.user32).

What do you think, @halldorfannar and @BoboTiG?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to not touch the current MSS-specific public API (primary functions and types). The example about mss.windows.MSS.user32 is not considered MSS-specific and can be made private when we want. But Monitor should be kept backward compatible.

The code being used for years still needs to work without changes after we ship the next release. And for the next major release, I am also kind of against breaking the API since this is not really necessary.

How does it sound to both of you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to make sure we’re all aligned on terminology and expectations, since this is really about policy going forward.

First off: I completely agree with the priority of protecting existing users. MSS has earned a lot of trust, and preserving working code across releases is hugely important. Nothing here is meant to undermine that.

What I’m trying to clarify is how we define “public API” and “breaking change,” especially in Python, where the line can be fuzzy.

I’m assuming we’re roughly following Semantic Versioning principles. SemVer only really works if we agree on what the public API is. In Python, the usual convention is “no leading underscore == public,” but MSS historically hasn’t been strict about that (such as the user32 example). That’s fine — but it means we need to be explicit about what we consider stable and versioned.

One thing I’ve personally been treating as part of the public API is type declarations, not just runtime behavior. That’s increasingly important now that many users run MyPy or Pyright in CI. A type signature is effectively a promise, even if Python itself won’t enforce it at runtime.

That leads to an important distinction:

  • Runtime compatibility: what actually happens when code runs
  • Type compatibility: whether static type checkers still accept the code

They matter in different ways.

Python APIs are famously hard to change without some theoretical breakage. It’s basically impossible to guarantee that no user code could ever be affected — even adding a key to a dict could break someone doing something exotic. So instead of “never break anything,” I’ve been thinking in terms of typical usage. If something would only break highly unusual or contrived code paths, that carries much less weight than breaking common, idiomatic usage.

With that framing, here’s how I’ve been thinking about changes:

  • Runtime-breaking changes (new required parameters, removing methods, changing behavior in common paths) are the most serious. Those should only happen in a major release, after deprecation, and only if there’s a strong reason. I agree that we should avoid these without a strong reason.
  • Type-only breaking changes (changing a return type in a way that still works at runtime for typical usage) are still API breaks, so should only be at major version boundaries, but they’re lower-impact. They mostly affect users running static analysis, and usually require updating annotations rather than logic.

The Monitor discussion falls into that second category. Returning a Mapping instead of strictly a dict would still behave the same at runtime for typical access (m["left"], iteration, etc.), but it would break type checks for code that annotates it as dict. That’s why I hesitated once I realized the implication.

Now, tying this back to your response: when you say “Monitor should be kept backward compatible,” I fully agree at runtime. Where I want to be careful is whether we’re also committing to never changing exposed types, even at major versions.

The reason this matters is future performance work.

For example, today ScreenShot.bgra is declared as bytes. That implicitly commits us to copying a very large buffer every capture. On a 4K screen that’s ~32 MB, which is several milliseconds per frame — a nontrivial cost if someone is trying to hit 60 fps.

Most OS capture APIs already give us a writable buffer in CPU memory. Python can expose that safely via memoryview, avoiding the copy entirely. For typical usage, a memoryview behaves almost identically to bytes — indexing, slicing, passing to PIL / NumPy / PyTorch all work — but it’s dramatically faster.

From a runtime perspective, this change would be almost invisible to most users. From a typing perspective, however, the distinction matters more.

MSS would need to widen its type declarations (for example, from bytes, to Buffer or bytes | bytearray | memoryview) to reflect what it is returning. Any user code that has explicitly annotated bgra as bytes would start triggering type checker warnings once they upgrade. The runtime logic would still work as before, but static analysis tools would complain. That’s why I’ve been treating this class of change as a public API change, and so far only considering it at major version boundaries.

That’s the kind of change I’m trying to plan carefully around. I don’t want to sneak it in under a minor release — but I do think it’s reasonable to allow this sort of evolution at a major version boundary, because it enables MSS to better deliver on its “ultra-fast” goal.

So my proposed policy, in short, would be:

  • Keep the runtime API stable across releases, including major ones, insofar as is reasonable for typical usage. Avoid breaking existing working code unless there is a compelling reason and no practical alternative.
  • Treat type changes as public API changes, and allow them at major version boundaries when they enable meaningful improvements, even if runtime behavior for typical usage remains unchanged.
  • Be explicit about what we consider public, so users and contributors know what’s safe to rely on.

My intent here isn’t to justify breaking working code, but to make sure we have room to improve performance and structure without surprising users or their tooling.

I’m taking this opportunity to agree on the rules now, so future work doesn’t turn into a philosophical debate every time we touch a boundary.

Curious if that framing matches how you’re thinking about it, or if you see things differently.

Copy link
Owner

@BoboTiG BoboTiG Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your words are perfectly aligned with my vision, I would not have been able to better phrase it :)

I tried to follow SemVer from the beginning, hopefully enough that it made sense so far.

In my perspective, and I was too lazy at the beginning to properly make those objects private, all objects used by MSS to power MSS content (like user32 on Windows, core on macOS and so on) are not intended to be used by developers/users. I mean, they should have been private from the start, and so we can treat them as we want. It is OK to make them private once and for all.

About the MSS API already used by thousands of people, I do not close the door to improvements, and sometimes necessary breaking changes. If that deserves MSS, then lets do it!

As I am typing, I am thinking that if we change the type of Monitor, for instance, then it might be doable without, hopefully, too many problems to users: in a user code using Monitor, I guess they import the type object from MSS, so if we change it, this could be done transparently to them (best case scenario).

All in all, I am not a conservator, I know pretty well things must change if we want to move forward. Lets doing it with parcimony, and keep users in mind everytime since I would prefer to lower their burden at each release.

I am also totally aligned about preventing philosophy talks at each change (I am not a meeting person, I prefer action first :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I enjoy this conversation. It is the right discussion to have, even if it's happening as fallout from my draft MR 😄 I am also a stickler for SemVer. I have followed it on many projects to the letter, and even though it has felt constricting and painful at times, my customers have always thanked me for it.

One of the reasons I made this MR a draft was exactly the uneasy feeling I got when I saw the fallout from changing the Monitor type. Even though I was bending over backwards to ensure runtime compatibility there were still issues, as @jholveck has rightly pointed out. I also just don't like it when a supposedly simple change spreads into multiple files. It feels like a smell, something is off.

My suggestion therefore is that we keep Monitor as a dict type for now. But we add a TODO in there (or whatever mechanism this project prefers for future annnouncement) about this becoming a Class type in the next major release. That way we announce our intent.

I would also like for us to do this with other potential changes we may want to make in the future. If they require changes that break the API we need to target them for the next major release. As part of release planning we should collect those somewhere, so that we can look up the next major release and see what things we are planning to break with that release. I emphasize planning because in my experience we usually find that not all changes get implemented (some are deemed low-value while others have in the light of time and experience been deemed a bad idea 😆).

Finally, I think we should get low-hanging fruit (like this Monitor improvement) into shape so it causes no breakage and cut a release soon. We should then discuss when the next major release is planned, so we can start aligning changes with the right release (major or minor).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A TODO could be added right above the Monitor declaration. It will be a matter of looking for "TODO" strings if anyone wants to give a try.

As MSS is quite stable, there is no really version schedule. When there is a known problem with a fix, or an improvement done, then I cut a fresh version.

Since the latest available MSS version, changes to be released are huge! Cutting a new release soon would allow us to break things for the next version, which would be a major.
I would like to spread those significant improvements @jholveck and you provided so far to the community without a major release.

So yes, maybe sticking to Monitor as a dict would be a good thing until the next release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All righty then! This is a smaller change now which I think is better for everybody involved. I have removed the "Draft" label. Please review at your leisure.

@halldorfannar halldorfannar changed the title Draft: Add Monitor class and primary monitor support Draft: Enhance Monitor dict Jan 29, 2026
@halldorfannar halldorfannar changed the title Draft: Enhance Monitor dict Draft: Enhance Monitor dict, adding primary monitor and name attributes Jan 29, 2026
Had to adjust pyproject.toml so that TODOs don't fail the checking.
Wish they could be flagged as warnings but ruff does not support warnings, only errors.
Before major releases we should remove the ignore entry for TODO and see what we get.
@halldorfannar halldorfannar changed the title Draft: Enhance Monitor dict, adding primary monitor and name attributes Enhance Monitor dict, adding primary monitor and name attributes Jan 29, 2026
halldorfannar and others added 3 commits January 30, 2026 08:38
Add back a line that I accidentally dropped from the CHANGELOG.md
This will make the docs build without warning. The warning is benign.
@BoboTiG
Copy link
Owner

BoboTiG commented Jan 30, 2026

Would you mind updating https://github.com/BoboTiG/python-mss/blob/main/CHANGES.md too?

@jholveck
Copy link
Contributor

jholveck commented Jan 30, 2026

Do you know what happens with display mirroring, like if you've got a screen being shown on both your laptop and a projector? There's no bad choice there; I was just curious. I might suggest making sure it only shows up once in the list, though, and not multiple times with the same coordinates.

I'm drafting the Linux version, and just noticed that possibility. XRandR exposes all the outputs for a given MSS-style "monitor", and each output has its own name (actually, possibly two). Right now, I'm preferring the primary output's name, if it's on the primary monitor, and otherwise just whichever was first in the list.

Also, what do y'all think we should do if the system can't detect the name, like if they don't have XRandR installed? Use an empty string? None? Don't populate "name"? I kinda prefer the last; somebody searching by name can use monitor.get("name") == target_name easily enough, but I can see arguments either way.

One reason I originally wanted to improve the way we handle the monitor list by using classes with some logic was to be able to query something like sct.monitors["EPSON PJ"] to get whatever was connected to the office projector, whether it's using display mirroring or whatever. Indeed, my original concept had been to attach make primary an attribute of the monitors list, rather than the MSS object, but either of those works fine.

At the same time as we change Monitor to be a class, we might also want to make MonitorList (or whatever) a class too, to allow some of these advanced searches (with logic in __getitem__): using sct.monitors["DVI-1"] or sct.monitors["DELL U2725QE"] both could give you the same monitor, as well as sct.monitors["EPSON PJ"] if you're mirroring to a projector.

So it might make sense to have each monitor have a names array, in addition to or instead of the name string, to provide all the names that might be used to refer to that monitor: all the connection names (for Linux) or GDI names (for Windows), and the names of all the physical display devices ("DELL U2725QE"), and anything else that would be a reasonable identifier.

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 30, 2026

Also, what do y'all think we should do if the system can't detect the name, like if they don't have XRandR installed? Use an empty string? None? Don't populate "name"? I kinda prefer the last; somebody searching by name can use monitor.get("name") == target_name easily enough, but I can see arguments either way.

For now, skipping "name" from the dict steems a good idea.
With #470, we might go with an empty string to simplify typing.

I like the idea to be able to pick a monitor from a name. I never had this use case, but it is interesting to have!

@jholveck
Copy link
Contributor

While we're talking about the idea that a user might have different ways they want to identify a monitor, on Windows, there's also the PnP ID (MONITOR\DELxxxx\{...}). That one is stable, if I understand correctly, so might be what a user wants to use to keep the same monitor ID. Not much else is going to be stable across reboots and changes to their monitor configuration (like moving their laptop between their office docking station, using it on the train, and meeting rooms), if they just have a script they want to find the projector for their Thursday afternoon meeting room.

We are talking about two different use cases: one for a human-friendly label, and possibly multiple for a stable identifier. Uniqueness is also a consideration, if somebody has multiple monitors of the same model.

I'm not saying we need to address all this now; we're talking about unusual use cases. But we should at least make sure our design doesn't preclude us from making this viable in the future.

@jholveck
Copy link
Contributor

jholveck commented Jan 30, 2026

Ooof, and ChatGPT just pointed out an issue with HiDPI and mirroring.

One more gotcha: mixed scaling. On Windows especially, depending on whether you’re DPI-aware, the rectangles you enumerate can be “logical” rather than “physical”. If MSS captures in physical pixels but you list monitors in logical coords, users will think your rectangles are wrong. So whatever you do, keep “coords are in capture pixels” consistent and document it.

Apparently, Windows can mirror between a HiDPI display and a non-HiDPI display. (ChatGPT elaborates here.) But IIRC, MSS will globally set the running app as HiDPI-aware. I'm not sure how that figures into things.

I don't entirely trust LLMs to be 100% accurate on these sorts of things: HiDPI also implies lots of changes to underlying rendering that the model that ChatGPT implies here. But it does seem to be worth examining.

@halldorfannar
Copy link
Contributor Author

Would you mind updating https://github.com/BoboTiG/python-mss/blob/main/CHANGES.md too?

Why do we have two files, CHANGES.md and CHANGELOG.md? Feels a bit heavyweight, especially considering that CHANGELOG.md seems to contain most changes. If we need both then I would appreciate a guideline about what goes where. I couldn't see anything about these files in the docs and this repo does not have a CONTRIBUTING.md file.

@halldorfannar
Copy link
Contributor Author

@jholveck I like your thoughts about future improvements. Yes, some of these we will only be able to do in a future major release. I do agree that sct.monitors.primary is more ergonomic than sct.primary_monitor. But we will probably have both eventually (since we have to do the latter for now). I also like the idea of sct.monitors["some unique stable name"] since this will be quite useful for customers with multi monitor setup.

Do you know what happens with display mirroring, like if you've got a screen being shown on both your laptop and a projector? There's no bad choice there; I was just curious. I might suggest making sure it only shows up once in the list, though, and not multiple times with the same coordinates.

I would expect monitors list to contain both displays, and they would have the same coordinates. That makes sense to me. The user may have a reason for wanting to use one or the other. I do not want to reduce this to a single entry and take away that choice.

Finally, the DPI scaling curve-ball. I hadn't thought about how that effects things. If I understand you correctly, the design today is such that it is not hurting us - is that right?

@halldorfannar
Copy link
Contributor Author

halldorfannar commented Jan 30, 2026

While we're talking about the idea that a user might have different ways they want to identify a monitor, on Windows, there's also the PnP ID (MONITOR\DELxxxx\{...}). That one is stable, if I understand correctly, so might be what a user wants to use to keep the same monitor ID. Not much else is going to be stable across reboots and changes to their monitor configuration (like moving their laptop between their office docking station, using it on the train, and meeting rooms), if they just have a script they want to find the projector for their Thursday afternoon meeting room.

We are talking about two different use cases: one for a human-friendly label, and possibly multiple for a stable identifier. Uniqueness is also a consideration, if somebody has multiple monitors of the same model.

I'm not saying we need to address all this now; we're talking about unusual use cases. But we should at least make sure our design doesn't preclude us from making this viable in the future.

Yeah, the PnP ID is nice for unique identification. Do we have something similar on Linux/macOS? I suggest we name this attribute uid or something similar. On Windows it's not a very readable name, it's more like a gibberish looking path. We could then rename the current name attribute to model_name (or just model) to make it more explicit that it is not unique.

@jholveck
Copy link
Contributor

Linux doesn't have something quite like that, no.

On Linux, the connector ID, like "DVI-1", is the most common identifier.

I'll be parsing the EDID block anyway to get the model name, so I can extract the full ID. That's a 10-byte structure including the three-letter manufacturer code (packed five-bit letters, like "PHL" for Philips), product code (16 bits, usually written in hex), serial number (32 bits), and date of manufacture (week and year). Does that seem to be part of what Windows is using there too, or is it totally different?

@jholveck
Copy link
Contributor

Finally, the DPI scaling curve-ball. I hadn't thought about how that effects things. If I understand you correctly, the design today is such that it is not hurting us - is that right?

No idea. 🤷

@jholveck
Copy link
Contributor

jholveck commented Jan 31, 2026

Do you know what happens with display mirroring, like if you've got a screen being shown on both your laptop and a projector? There's no bad choice there; I was just curious. I might suggest making sure it only shows up once in the list, though, and not multiple times with the same coordinates.

I would expect monitors list to contain both displays, and they would have the same coordinates. That makes sense to me. The user may have a reason for wanting to use one or the other. I do not want to reduce this to a single entry and take away that choice.

While I normally prefer to give an API caller all the information and let them decide what to do with that information, I'm not sure about this case. We're really giving the user capture regions, not monitors. When they pass it back to us, all we use is the capture region information. You mention the user wanting to use one or the other, but MSS will do the exact same thing with the data, regardless of which the user provides.

The only time that duplicates in the monitors list are relevant, that I can think of, is when invoking sct.save() with no arguments (or an explicit mon=0), in which case it saves one screenshot per monitor. It would seem that having one screenshot per copy of a mirrored display would be more surprising than one per region. We can solve that in sct.save() ourselves, of course, but if the user is doing something similar - like separately streaming each display - then they'd see the same problem.

Edit: fix quoting. Also, I'm just giving an alternative perspective; we can still use duplicates if that still seems to make more sense.

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 31, 2026

Would you mind updating https://github.com/BoboTiG/python-mss/blob/main/CHANGES.md too?

Why do we have two files, CHANGES.md and CHANGELOG.md? Feels a bit heavyweight, especially considering that CHANGELOG.md seems to contain most changes. If we need both then I would appreciate a guideline about what goes where. I couldn't see anything about these files in the docs and this repo does not have a CONTRIBUTING.md file.

I do not recall why two files exactly. At fist, I think I wanted to split technical changes from a humand-readable changelog, so that developers can simply check CHANGES.md for technical adjustments in their code.

@halldorfannar
Copy link
Contributor Author

Linux doesn't have something quite like that, no.

On Linux, the connector ID, like "DVI-1", is the most common identifier.

I'll be parsing the EDID block anyway to get the model name, so I can extract the full ID. That's a 10-byte structure including the three-letter manufacturer code (packed five-bit letters, like "PHL" for Philips), product code (16 bits, usually written in hex), serial number (32 bits), and date of manufacture (week and year). Does that seem to be part of what Windows is using there too, or is it totally different?

This is good! This will work as UID.

@halldorfannar
Copy link
Contributor Author

Would you mind updating https://github.com/BoboTiG/python-mss/blob/main/CHANGES.md too?

Why do we have two files, CHANGES.md and CHANGELOG.md? Feels a bit heavyweight, especially considering that CHANGELOG.md seems to contain most changes. If we need both then I would appreciate a guideline about what goes where. I couldn't see anything about these files in the docs and this repo does not have a CONTRIBUTING.md file.

I do not recall why two files exactly. At fist, I think I wanted to split technical changes from a humand-readable changelog, so that developers can simply check CHANGES.md for technical adjustments in their code.

Hmmm ... so CHANGES.md is for internal changes that benefit those who contribute to this project while CHANGELOG.md is for consumers of the python package? I'm asking so I can get a clear picture of what the intent is.

@BoboTiG
Copy link
Owner

BoboTiG commented Jan 31, 2026

Would you mind updating https://github.com/BoboTiG/python-mss/blob/main/CHANGES.md too?

Why do we have two files, CHANGES.md and CHANGELOG.md? Feels a bit heavyweight, especially considering that CHANGELOG.md seems to contain most changes. If we need both then I would appreciate a guideline about what goes where. I couldn't see anything about these files in the docs and this repo does not have a CONTRIBUTING.md file.

I do not recall why two files exactly. At fist, I think I wanted to split technical changes from a humand-readable changelog, so that developers can simply check CHANGES.md for technical adjustments in their code.

Hmmm ... so CHANGES.md is for internal changes that benefit those who contribute to this project while CHANGELOG.md is for consumers of the python package? I'm asking so I can get a clear picture of what the intent is.

Yes!

@jholveck
Copy link
Contributor

jholveck commented Feb 1, 2026

Linux doesn't have something quite like that, no.
On Linux, the connector ID, like "DVI-1", is the most common identifier.
I'll be parsing the EDID block anyway to get the model name, so I can extract the full ID. That's a 10-byte structure including the three-letter manufacturer code (packed five-bit letters, like "PHL" for Philips), product code (16 bits, usually written in hex), serial number (32 bits), and date of manufacture (week and year). Does that seem to be part of what Windows is using there too, or is it totally different?

This is good! This will work as UID.

If it's the same information as Windows provides, we may as well format it the same: I don't get it in text form, so I'll be building a string. Do you have a monitor's Windows PnP ID handy?

If you also have its EDID block, that's be handy, so I can make sure I'm using everything the same (like making hex strings bytewise / big-endian vs. wordwise).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

How can I tell which is the primary monitor in the monitors array?

3 participants