Quad SPI PSRAM seems too slow and question for improvement #311

dapeda42 · 2024-05-10T10:25:47Z

Hello,

I designed a custom PCB based on EPDiy version 7. Overall it works well, but I keep running into the error "Assert failed: retrieve_line_isr render_lcd.c:41 (thread < NUM_RENDER_THREADS)" described on the EPDiy v7 common errors page (https://github.com/vroland/epdiy/wiki/v7-common-errors). Adjusting the bus speed did not resolve the issue. Updating to one of the latest "main" revisions (as of May 9th, 2024) improved the situation slightly, possibly because of the optimized rendering computation (using assembly code, fantastic!).

I suspect the main issue with my custom board is that it uses an ESP32-S3 module with 2 MB of Quad SPI PSRAM. The original EPDiy version 7 uses an ESP32-S3 module with Octal SPI PSRAM, which may offer twice the speed of Quad SPI. Since a significant amount of frame data resides in PSRAM, my Quad SPI setup may be too slow. The next version of my custom PCB will therefore use an ESP32-S3 module with Octal SPI PSRAM.

With my current setup, I am also looking for ways to further reduce the computation time required for rendering. As far as I understand, around 32 or more frames are needed to render the frame buffer on the display. Each frame is derived from the previous color or gray shade, the new color or gray shade, and the waveform. The current software version calculates the frames "on-the-fly": while one row of a frame is transmitted to the display via the LCD interface (using DMA), the software calculates the next row of the frame (very simply speaking).
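To make the per-row computation concrete, here is a minimal sketch of how one output row of one frame could be derived from the previous shade, the new shade, and a waveform lookup table. It is not EPDiy's actual implementation: the names `SHADES` and `compute_row`, the flat LUT layout, the one-shade-per-byte input format, and the bit packing order are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADES 16 /* assumption: 4-bit grayscale, 16 shades */

/*
 * Hypothetical helper: compute one output row for a single frame (phase).
 * lut holds SHADES * SHADES entries; lut[from * SHADES + to] is the 2-bit
 * drive code for a pixel going from shade `from` to shade `to` in this phase.
 * prev and next hold one shade per byte (unpacked for clarity).
 * out receives 4 pixels per byte, first pixel in the lowest two bits.
 * width_px is assumed to be a multiple of 4.
 */
static void compute_row(const uint8_t *lut, const uint8_t *prev,
                        const uint8_t *next, uint8_t *out, size_t width_px)
{
    for (size_t x = 0; x < width_px; x += 4) {
        uint8_t packed = 0;
        for (size_t i = 0; i < 4; i++) {
            uint8_t from = prev[x + i] & 0x0F;
            uint8_t to = next[x + i] & 0x0F;
            uint8_t code = lut[from * SHADES + to] & 0x03;
            packed |= (uint8_t)(code << (2 * i));
        }
        out[x / 4] = packed;
    }
}
```

In the on-the-fly scheme, a function like this would have to finish for every row of every frame before the DMA transfer of the preceding row completes.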

This situation requires that the computation time for each row is lower than the transmission time of a row to the display, necessitating high computational power and fast memory access, which Quad SPI PSRAM may not sufficiently provide.
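The timing constraint can be sketched as a simple budget check. The helper names, the 2 bits per pixel on the bus, and the omission of blanking and setup time are assumptions for illustration; real per-row timing depends on the display and the LCD peripheral configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: time needed to clock one row out over the parallel
 * bus, in nanoseconds. Assumes 2 bits per pixel on the bus and ignores
 * blanking intervals and DMA setup overhead.
 */
static uint64_t row_transmit_ns(uint32_t width_px, uint32_t bus_width_bits,
                                uint32_t bus_hz)
{
    assert(bus_width_bits > 0 && bus_hz > 0);
    uint64_t row_bits = (uint64_t)width_px * 2u;
    /* Number of bus clock cycles, rounded up to whole transfers. */
    uint64_t transfers = (row_bits + bus_width_bits - 1u) / bus_width_bits;
    return transfers * 1000000000ull / bus_hz;
}

/* The on-the-fly scheme only works if the next row is ready in time. */
static bool row_meets_deadline(uint64_t compute_ns, uint64_t transmit_ns)
{
    return compute_ns < transmit_ns;
}
```

For example, with a hypothetical 1872-pixel row on an 8-bit bus at 20 MHz, the budget is 23400 ns per row; lowering the bus clock to 10 MHz doubles it to 46800 ns, which is why reducing the bus speed relaxes the deadline.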

Here's my question: Why not pre-calculate all frames and store each one in external PSRAM in advance, then transmit these frames to the display through the LCD interface using DMA in a second step? This approach would eliminate any "on-the-fly" calculation during transmission. Given the ample space available in external PSRAM, it should be feasible to store 32 or more frames.

Pre-calculating all frames could potentially eliminate the race condition encountered with the "on-the-fly" approach and may also be faster overall: Separating reads and writes to external PSRAM between the pre-calculation phase and the transmission phase could enhance speed. In the pre-calculation phase, only writes to external PSRAM are required (in the best case, if all other memories like frame buffers, LUTs, etc. are NOT in PSRAM), while the transmission phase involves only reads from external PSRAM. In contrast, the "on-the-fly" approach interleaves reads and writes from/to external PSRAM, which may take more time overall.

vroland (Maintainer) · 2024-05-10T19:27:55Z

Hi, regarding your first question:
Reducing the speed should also make the error go away, if you set it low enough. But there are some things that you may want to double check: Setting the data cache line length to 64 bytes, setting the highest frequency possible, compiling in optimized mode.

Regarding the optimization: I already considered precomputing the frames, but this is not practical for larger displays. E.g., for the 1872*1404 7.8" display, a single frame takes 1872 * 1404 * 2 bits per pixel / 8 bits per byte / 2**20 = ~0.63 MiB, so we cannot fit enough frames into PSRAM. Many waveforms have even longer sequences than 32, like 50-ish. If you only use smaller displays it could work for you, but not in the general case.
Secondly, the on-the-fly calculation never actually writes back to PSRAM, just to a queue in internal memory. So the access pattern should be a pretty much optimal sequential read on the PSRAM. Also, pre-computing the frames would increase latency significantly, which can be a problem for some use cases. So again, it may work for your use case, but I don't think it makes sense for a driver that aims to be universal.
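The per-frame arithmetic above can be checked with a small sketch. The function names and the PSRAM budget passed in are hypothetical; the only figures taken from the thread are the 1872*1404 resolution, 2 bits per pixel, and frame counts of 32 to roughly 50.

```c
#include <stdint.h>

/* Bytes needed for one precomputed frame at 2 bits per pixel, rounded up. */
static uint64_t frame_bytes(uint32_t width_px, uint32_t height_px)
{
    return ((uint64_t)width_px * height_px * 2u + 7u) / 8u;
}

/* Bytes needed to hold every frame of one update. */
static uint64_t precomputed_bytes(uint32_t width_px, uint32_t height_px,
                                  uint32_t frames)
{
    return frame_bytes(width_px, height_px) * frames;
}

/* How many whole frames fit into a given memory budget (hypothetical). */
static uint32_t max_frames_that_fit(uint32_t width_px, uint32_t height_px,
                                    uint64_t budget_bytes)
{
    return (uint32_t)(budget_bytes / frame_bytes(width_px, height_px));
}
```

For the 1872*1404 display, one frame is 657072 bytes, so 32 frames need 21026304 bytes (about 20 MiB). Assuming a budget of 8 MiB, only 12 frames would fit even if nothing else lived in PSRAM; with 2 MiB, only 3.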

2 replies

dapeda42 May 13, 2024
Author

I've reviewed the settings for optimization, data cache, and CPU frequency, and everything appears to be in order. However, I still suspect that the Quad SPI PSRAM is simply too slow.

I'll update you once I have the setup with Octal SPI up and running.

Regarding the on-the-fly calculation: Initially, I assumed that the result of the on-the-fly calculation would be stored in PSRAM. However, if it's written to local SRAM instead, then it nullifies the speed advantage of the alternative approach (pre-calculation).

The challenge with on-the-fly calculation lies in the CPU's near-constant engagement during the process and subsequent display update. Essentially, during the display update, which takes about a second or more, the CPU becomes occupied solely with that task. This can potentially halt other critical tasks that may be active or required during the display update. Pre-calculation doesn't pose this issue, but it does present a challenge in terms of overall RAM size for all frames.

In summary, I wanted to briefly discuss the feasibility of pre-calculation. Perhaps it makes sense for small displays, and maybe an option for pre-calculation could be added to functions like epd_hl_update_screen or epd_draw_base.

dapeda42 Jun 14, 2024
Author

I finally got my ESP32-S3 board with octal PSRAM working.
With octal PSRAM it works as expected (no "Assert failed: retrieve_line_isr render_lcd.c:41 (thread < NUM_RENDER_THREADS)", even at high bus speeds).
After digging into the approach used for the ESP32-S3 (constant phase time, resulting in a lot of frames per update), I understand that the on-the-fly approach is best: there is not enough memory to hold all pre-computed frames.
