Proposal: Upload flag / Dirty rects #387

dbalsom · 2024-01-13T15:58:20Z

We may want to render the surface at a faster rate than we update the underlying pixel buffer, especially if we are using render_with to overlay a gui or render with a custom shader. We could have a game tick or emulation target that runs at 30Hz, while updating a shader or debug gui at 60Hz. Currently, the pixel buffer will be uploaded to the GPU on every call to render_with(). Although not an issue for most PCs, it can become an issue on lower-spec hardware.

I propose some sort of facility to render without re-uploading the pixel buffer. To avoid breaking changes, perhaps this could be a flag set before calling render_with (set_dirty(false)?), or perhaps we can expand functionality with a separate function with a new interface, like render_with_ex. The latter may be preferable, I think, especially since we could extend this function in several ways, like providing a rect to upload only a portion of the pixel buffer. wgpu seems to support updating regions of a texture via the 'origin' field of ImageCopyTexture and the 'size' argument to write_texture, unless I am mistaken.
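For reference, wgpu's `Queue::write_texture` (in the releases current at the time of this thread) takes an `ImageCopyTexture` destination whose `origin` selects the first texel, the source bytes, an `ImageDataLayout` (`offset`, `bytes_per_row`, `rows_per_image`), and an `Extent3d` size; type names differ between wgpu releases, so check the version in use. The sketch below does not call wgpu. It only computes the values such a partial upload would need for a sub-rectangle of a tightly packed RGBA8 frame buffer, assuming the whole buffer is passed as the data and the offset points at the rect's first pixel. `Rect`, `RegionUpload`, and `region_upload` are hypothetical names.

```rust
/// A rectangle in pixel coordinates (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: u32,
    y: u32,
    width: u32,
    height: u32,
}

/// Values for a partial texture write. These mirror, but are not, wgpu's
/// `Origin3d`, `Extent3d` and `ImageDataLayout`.
#[derive(Debug, PartialEq)]
struct RegionUpload {
    origin: (u32, u32),
    size: (u32, u32),
    offset: u64,        // byte offset of the rect's top-left pixel in the buffer
    bytes_per_row: u32, // stride of the *full* buffer, not of the rect
}

const BYTES_PER_PIXEL: u32 = 4; // assumption: RGBA8, no row padding

fn region_upload(buffer_width: u32, buffer_height: u32, rect: Rect) -> Option<RegionUpload> {
    // Reject empty rects and rects that extend past the buffer edges.
    let right = rect.x.checked_add(rect.width)?;
    let bottom = rect.y.checked_add(rect.height)?;
    if rect.width == 0 || rect.height == 0 || right > buffer_width || bottom > buffer_height {
        return None;
    }
    // Row-major layout: skip `y` full rows, then `x` pixels into the row.
    let offset = (u64::from(rect.y) * u64::from(buffer_width) + u64::from(rect.x))
        * u64::from(BYTES_PER_PIXEL);
    Some(RegionUpload {
        origin: (rect.x, rect.y),
        size: (rect.width, rect.height),
        offset,
        bytes_per_row: buffer_width * BYTES_PER_PIXEL,
    })
}

fn main() {
    // A 16x8 dirty rect at (32, 10) inside a 320x240 frame.
    let up = region_upload(320, 240, Rect { x: 32, y: 10, width: 16, height: 8 }).unwrap();
    println!("{:?}", up);
}
```

The key design point is that `bytes_per_row` stays the stride of the full frame, so the GPU reads each row of the rect from the right place without first copying the rect into a separate buffer.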

parasyte · 2024-01-13T20:03:06Z · Maintainer

I do like the idea of incrementally updating the texture without rendering. Not because it will allow dirty rect algorithms, but because it allows the user to avoid keeping a copy of the framebuffer, and thus avoids a memcpy in some cases.

I have written dirty rect bookkeeping in a game engine before and my disappointment is that they amortize to the same performance as copying the whole buffer each frame. I saw another experiment recently where the author noted the same issues: https://blogs.scummvm.org/subr3v/2014/08/11/dirty-rectangles-system-performance-considerations/ (I highly recommend reading the whole series, there are five parts. This is the last of them.) In this specific case, it didn't help where you would want it most, when the FPS was lower than 60. But it did greatly improve scenes that already had better than 60 FPS without. This is not a generalization, but it matches my own experience.

pixels already has everything needed for incremental frame buffer updates. The user calls frame_mut and updates the slice in any way they need, including partially. Then render or render_with when they are ready to present it. The one downside with today's interface (if you can call it that) is that render_with always uploads the buffer before presenting. This is slightly wasteful when the buffer hasn't changed between frames. But it's a weird thing to optimize for the same reason that dirty rects don't work out in practice: most of the time, there will be a lot of activity on the screen, and the whole buffer needs to be updated each frame anyway.

We can try smaller regional uploads, though. I don't anticipate that they will be a marked improvement in the general case.
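A minimal sketch of the simplest form of dirty-rect bookkeeping, assuming a single bounding box per frame (real systems often keep a list of rects and merge overlapping ones instead). `Rect` and `DirtyTracker` are hypothetical names, not part of pixels. It also illustrates why the savings tend to evaporate in practice: two small sprites in opposite corners already make the bounding box as large as the whole frame.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: u32,
    y: u32,
    width: u32,
    height: u32,
}

impl Rect {
    fn right(&self) -> u32 {
        self.x + self.width
    }
    fn bottom(&self) -> u32 {
        self.y + self.height
    }
    /// Smallest rectangle containing both `self` and `other`.
    fn union(&self, other: &Rect) -> Rect {
        let x = self.x.min(other.x);
        let y = self.y.min(other.y);
        Rect {
            x,
            y,
            width: self.right().max(other.right()) - x,
            height: self.bottom().max(other.bottom()) - y,
        }
    }
    fn area(&self) -> u64 {
        u64::from(self.width) * u64::from(self.height)
    }
}

/// Hypothetical per-frame tracker that collapses every dirty rect into one bounding box.
#[derive(Default)]
struct DirtyTracker {
    bounds: Option<Rect>,
}

impl DirtyTracker {
    /// Record that `r` was modified this frame.
    fn mark(&mut self, r: Rect) {
        self.bounds = Some(match self.bounds {
            Some(b) => b.union(&r),
            None => r,
        });
    }
    /// Region to upload this frame (if anything changed); resets the tracker.
    fn take(&mut self) -> Option<Rect> {
        self.bounds.take()
    }
}

fn main() {
    let mut tracker = DirtyTracker::default();
    // Two 16x16 sprites in opposite corners of a 320x240 frame.
    tracker.mark(Rect { x: 0, y: 0, width: 16, height: 16 });
    tracker.mark(Rect { x: 304, y: 224, width: 16, height: 16 });
    let region = tracker.take().unwrap();
    let changed: u64 = 2 * 16 * 16;
    // prints "changed 512 px, uploading 76800 px of 76800"
    println!("changed {} px, uploading {} px of {}", changed, region.area(), 320 * 240);
}
```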

You brought up slow hardware, and I think it is worth digging into this topic in more detail. In the context of pixels, since it only deals with putting the final image on the screen, we can hand-wave away most of the complexity and say, "use a fixed timestep update and variable refresh" and call it a day.

There are two good articles (both commonly cited) that describe a good architecture for a game loop:

They both arrive at the same conclusion (and deWiTTERS even links directly to the Timestep article). E.g. the game updates at a fixed 30 Hz and the display refreshes as fast as it is able. And "as fast as it is able" means one of two things:

1. The machine is slow and unable to keep the display up with the 30 Hz state updates. The display falls behind by skipping frames.
2. The machine is fast and is capable of displaying perhaps far more than 30 FPS. Say 120 FPS on an iPhone 15 Pro or 240 FPS on a 240 Hz 4K gaming monitor.

But in case 2, as both articles astutely observe, refreshing the display that quickly for a 30 Hz game isn't really that useful! You might as well just cap the refresh rate at 30 FPS. It will look just as smooth as it would at 240 FPS. Instead, they recommend interpolating frames at higher refresh rates, which implies that you are re-rendering and re-uploading the whole frame regardless.
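A minimal sketch of the fixed-timestep accumulator with an interpolation factor, as the Timestep article describes it, assuming the caller measures frame times itself. `FixedTimestep` is a hypothetical name and nothing like it exists in pixels. A 30 ms step stands in for "30 Hz" so the arithmetic is exact.

```rust
use std::time::Duration;

/// Hypothetical fixed-timestep driver: runs the simulation in constant `step`
/// increments and reports how far into the next step the display is (alpha).
struct FixedTimestep {
    step: Duration,
    accumulator: Duration,
}

impl FixedTimestep {
    fn new(step: Duration) -> Self {
        Self { step, accumulator: Duration::ZERO }
    }

    /// Feed the wall-clock time since the previous frame. Returns how many
    /// fixed updates to run now and the interpolation factor in [0, 1).
    fn advance(&mut self, frame_time: Duration) -> (u32, f64) {
        self.accumulator += frame_time;
        let mut updates = 0;
        while self.accumulator >= self.step {
            self.accumulator -= self.step;
            updates += 1;
        }
        // A renderer could blend: prev_state * (1 - alpha) + curr_state * alpha.
        let alpha = self.accumulator.as_secs_f64() / self.step.as_secs_f64();
        (updates, alpha)
    }
}

fn main() {
    // 30 ms simulation step; a fast display delivers frames every 10 ms.
    let mut ts = FixedTimestep::new(Duration::from_millis(30));
    for _ in 0..3 {
        let (updates, alpha) = ts.advance(Duration::from_millis(10));
        println!("fast frame: {updates} update(s), alpha = {alpha:.2}");
    }
    // A slow 100 ms frame has to catch up with several updates at once.
    let (updates, alpha) = ts.advance(Duration::from_millis(100));
    println!("slow frame: {updates} update(s), alpha = {alpha:.2}");
}
```

Fast frames mostly run zero updates and only change alpha (case 2); a slow frame runs several updates back to back (case 1). A production loop would typically also clamp `frame_time` so a very long stall cannot queue an unbounded number of updates.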

And that's my takeaway, as well. If you want to avoid the slightly wasteful re-uploads with lower update frequencies, the best way to do that is by capping the frame rate with a sleep. This is the most common thing I found when I was exploring how other projects handled frame rate for #174.

The other issue with regard to frame rate is frame pacing. This is more about "emulating 30 FPS" on a higher-frequency display when updating the display at its native refresh rate. VRR helps a lot here when using a sleep to cap the frame rate. On machines without VRR, your sleep needs to account for the native refresh rate: for instance, it should begin rendering just before every even-numbered frame on a 60 Hz display. If you go over the refresh-rate budget, that causes frame pacing issues. In other words, you have to pay attention to the frequency of frame skipping. Don't just let it average out over a long period (like one second); you want the frame skipping to be relatively constant at all times.
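A small sketch of why the target rate should divide the native refresh rate when there is no VRR, assuming a frame can only be shown on a refresh boundary (vsync) and is never shown early. `present_vblank` and `frame_intervals` are hypothetical helpers. At 30 FPS on 60 Hz every frame stays on screen for exactly two refreshes; at 60 FPS on 144 Hz the average is right but individual frames alternate irregularly between two and three refreshes, which is the uneven pacing described above.

```rust
/// Index of the display refresh at which frame `k` is presented when targeting
/// `target_fps` on a `refresh_hz` display. Rounds up (integer ceiling) so a
/// frame is never shown before its ideal time.
fn present_vblank(k: u64, refresh_hz: u64, target_fps: u64) -> u64 {
    (k * refresh_hz + target_fps - 1) / target_fps
}

/// How many refreshes each of the first `n` frames stays on screen.
fn frame_intervals(n: u64, refresh_hz: u64, target_fps: u64) -> Vec<u64> {
    (0..n)
        .map(|k| {
            present_vblank(k + 1, refresh_hz, target_fps) - present_vblank(k, refresh_hz, target_fps)
        })
        .collect()
}

fn main() {
    // Even pacing: every frame is held for 2 refreshes.
    println!("60 Hz / 30 FPS:  {:?}", frame_intervals(6, 60, 30)); // [2, 2, 2, 2, 2, 2]
    // Uneven pacing: frames alternate between 3 and 2 refreshes.
    println!("144 Hz / 60 FPS: {:?}", frame_intervals(5, 144, 60)); // [3, 2, 3, 2, 2]
}
```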

To recap, these kinds of issues are largely outside of the scope of pixels, but there is some indirect relationship to be sure. The invaders example uses a fixed timestep (240 Hz) and fixed refresh rate (60 FPS) for the reasons described:

pixels/examples/invaders/simple-invaders/src/lib.rs

Lines 38 to 42 in 5461133

```rust
// Fixed time step (240 fps)
pub const FPS: usize = 240;
pub const TIME_STEP: Duration = Duration::from_nanos(1_000_000_000 / FPS as u64);
// Internally, the game advances at 60 fps
const ONE_FRAME: Duration = Duration::from_nanos(1_000_000_000 / 60);
```

pixels/examples/invaders/src/main.rs

Lines 153 to 158 in 5461133

```rust
// Sleep the main thread to limit drawing to the fixed time step.
// See: https://github.com/parasyte/pixels/issues/174
let dt = TIME_STEP.as_secs_f64() - Time::now().sub(&g.current_instant());
if dt > 0.0 {
    std::thread::sleep(Duration::from_secs_f64(dt));
}
```

Why is the update frequency so high in this example? It's for predictable and robust physics integration. It seems kind of silly to do this for a Space Invaders clone, but it makes sense when you add particle physics. Like I did in this capture from an experimental branch:

simple-invaders-particle-collisions.mov

Also, I think the linked articles make an implied assumption that state updates always take longer than drawing. In my experience it's the opposite. State updates are relatively simple and fast, at least in the kinds of games that I have worked on. Physics, AI, path finding, ray casting: they all have various optimizations, and many of these can run in parallel, making use of SIMD and multiple CPU cores. On the other hand, taxing your GPU is ridiculously easy to do with complex shaders and scenes. Many of the optimizations happen on the CPU side, like culling VBOs to reduce memory bandwidth. But that's still part of drawing.

VBOs aren't really a problem with 2D games, but some postprocessing effects like blur and bloom can be notoriously expensive. That is also outside the wheelhouse of pixels, which presents preprocessed textures.

That's pretty much everything I know about the subject. Hope it is helpful.


dbalsom · 2024-01-18T01:23:31Z · Author

Thanks for the detailed response. The requirements of an emulator can be a little unique compared to a game.

> as both articles astutely observe, refreshing the display that quickly for a 30 Hz game isn't really that useful! You might as well just cap the refresh rate at 30 FPS

The usefulness is that the pixel buffer might be updated at one frame rate while a shader runs at a higher one. Consider a CRT shader that does some sort of rolling raster effect at the native refresh rate (144 Hz in my case). Or drawing egui at 60 fps for smooth window dragging and scrolling, regardless of how fast the pixel buffer is updated. In these cases, uploading the pixel buffer on every frame is a waste.


dbalsom · Jan 18, 2024 · Author

Even if we discount whatever the heck Meson is doing with that crazy memcpy, the render_with call is still 8% of frame time.

parasyte · Jan 18, 2024 · Maintainer

Thank you for the flame graph! I see write_texture takes about 8-9% of the profile. The queue submission is about 50%, but that's all emulating the WebGPU pipeline on GLES, if I understand correctly. You could get a 9% savings by skipping the texture upload. Totally fair.

parasyte · Jan 19, 2024 · Maintainer

I've been thinking about this. The dirty flag is probably the way to go, but it would just get set implicitly when the user calls frame_mut(). Optimistically, if the user calls this method, we assume they actually want to modify the buffer. And the dirty flag check is inexpensive enough to do internally in the renderer. What do you think, as a first pass? Will this take care of the ask?

dbalsom · Jan 19, 2024 · Author

That sounds like a pretty elegant solution, as long as you're not able to save the mutable reference between frames.

parasyte · Jan 19, 2024 · Maintainer

Yeah, you can’t call render while there is an outstanding exclusive borrow. I’ll whip it up when I’m feeling better (got a cold) and ping you on the PR.
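The implicit dirty flag discussed in this thread can be sketched with a stand-in type. `PixelBuffer` and its `uploads` counter are hypothetical; this mimics the idea, not the actual pixels implementation, and a real renderer would call write_texture where the counter is incremented. The simulation assumes an emulator producing 60 frames per second while the display renders at 144 Hz.

```rust
/// Hypothetical stand-in for a pixel-buffer wrapper with an implicit dirty flag.
struct PixelBuffer {
    frame: Vec<u8>,
    dirty: bool,
    uploads: u32, // counts simulated texture uploads
}

impl PixelBuffer {
    fn new(width: usize, height: usize) -> Self {
        // Start dirty so the first render uploads the initial contents.
        Self { frame: vec![0; width * height * 4], dirty: true, uploads: 0 }
    }

    /// Handing out a mutable borrow is taken to mean the caller will write to it.
    fn frame_mut(&mut self) -> &mut [u8] {
        self.dirty = true;
        &mut self.frame
    }

    fn render(&mut self) {
        if self.dirty {
            // Simulated upload: a real renderer would copy `self.frame` to the GPU here.
            self.uploads += 1;
            self.dirty = false;
        }
        // Drawing/presenting with the existing texture is omitted.
    }
}

fn main() {
    let mut pixels = PixelBuffer::new(320, 200);
    let mut last_emu_frame = None;
    for i in 0..144u32 {
        // Index of the newest emulated frame at this 144 Hz render tick.
        let emu_frame = i * 60 / 144;
        if last_emu_frame != Some(emu_frame) {
            pixels.frame_mut()[0] = emu_frame as u8; // stand-in for writing a new frame
            last_emu_frame = Some(emu_frame);
        }
        pixels.render();
    }
    println!("144 renders, {} uploads", pixels.uploads); // 144 renders, 60 uploads
}
```

Because both `frame_mut` and `render` take `&mut self`, the borrow checker rejects any attempt to hold the returned slice across a `render` call, which is what makes the implicit flag safe to rely on.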
