Raspberry Pi 5: CEF or WebKit for Off-Screen Rendering?
This article draws on a year of working with both CEF and WPE. It explains the difference between the two frameworks, provides build and run instructions for full JS+HTML+CSS web pages using WPE on a Raspberry Pi 5 with zero-copy at 60+ FPS in Full HD, and explains why CEF simply can't get you there. By the end, you'll have:
WPE for arm64 and amd64, an OpenGL pipeline with EGL, running on a built-in Wayland compositor. Don't let Wayland intimidate you: on Raspberry Pi 5, it comes out of the box, so you can run both Wayland and X11 applications without rebooting. That said, WPE zero-copy works only with Wayland. All code examples are in C++.
Why does this question matter?
Embedded development demands a well-optimized product that's easy to extend. The goal is a stable 60 FPS when displaying full-screen Full HD web pages with HTML and JavaScript, along with a rendering pipeline that uses hardware acceleration, meaning no reliance on SHM (shared memory through the CPU) and direct frame delivery to the GPU: the so-called zero-copy approach. In my case, I specifically needed an OpenGL pipeline to mix multiple video frameworks onto a single screen, across different layers and through different shaders.
Why I walked away from CEF
When I started my project, a fairly large application for arm64/amd64 systems, the stack included OpenFrameworks as a nice wrapper around OpenGL, FFmpeg for video rendering, and CEF as a full web engine for rendering heavy pages with bidirectional data transfer between the page and C++ code. As long as the plan involved small screens, something like 400×600, there were no issues. CEF handled CPU-based rendering just fine. Let me walk through exactly how CEF renders on Raspberry Pi 5.
On the official site, under Off-Screen Rendering, there's a short guide on configuring CEF to render without creating a window. Essentially, you create a browser class that draws frames into a specific CPU memory region, and you take it from there, passing the data into whatever pipeline you're using. That sounds fine in theory, but here's the catch: data gets copied through the CPU, and it happens twice, maybe even three times; I was never able to pin that down exactly.
The first copy happens when you copy the data from the pointer passed in OnPaint() into your own buffer, because once that function returns, the pointer becomes invalid, so you can't bind that memory region directly to a texture. The second copy moves your buffer into GPU memory as an OpenGL texture. The result is suboptimal, even with CEF's DirtyRects optimization, which lets you memcpy only the regions that actually changed.
But as it turned out, the double copy wasn't even the real problem. The real problem was how long CEF itself takes to put the frame into the buffer it hands to you. Here's the benchmark for copying data from OnPaint into my byte array for Full HD web pages (memcpy method):
TEST: avg - 7067.17, max - 10046, count - 3078
TEST: avg - 7067.33, max - 10046, count - 3079
TEST: avg - 7067.37, max - 10046, count - 3080
TEST: avg - 7067.33, max - 10046, count - 3081
TEST: avg - 7067.55, max - 10046, count - 3082

Values are in microseconds.
On average, a single memcpy takes 7 milliseconds. Taking the maximum (10ms) and multiplying by 2 for the GPU transfer gives us 20ms per frame, which means the Raspberry Pi 5 tops out at around 40–50 FPS through CPU. And on top of that, the actual call rate of OnPaint in CEF turned out to be just 14 frames per second. On screen, it looked terrible. And since that function is called from deep inside CEF, there's nothing you can do to influence it.
I decided to dig into why this was happening and figure out how to configure CEF to use a texture instead of going through the CPU. After some forum searching, I found a mention of OnAcceleratedPaint, which passes a GPU texture handle directly. Unfortunately, this method isn't covered in the official documentation, and how to implement it isn't at all clear, so I opened a thread on the forum asking about OnAcceleratedPaint support on Raspberry Pi 5. I was pointed to relevant commits, studied them thoroughly, and still never managed to get OnAcceleratedPaint() to fire even once. Maybe I was doing something wrong, but I followed the commits to the letter, and then some. No matter what I tried, the CPU rendering path kept getting called.
After digging through the CEF source code, I found that OnAcceleratedPaint() is only triggered when the output format is ARGB, while the RPi5 outputs in NV12. If the format isn't ARGB, the method fails and falls back to OnPaint. Source: cef/libcef/browser/osr/video_consumer_osr.cc (line 124). This was true in CEF version 138 and also in version 140, which I upgraded to specifically for these experiments.
The most telling test is to run this on your device:
sudo apt install libdrm-tests
modetest -p
The output will show, at least in my case, that BROADCOM_SAND128 contains NV12, NV21, and P030. That means the RPi5 decodes frames in those formats, none of which are supported by the CEF source code.
You can also open Chromium and navigate to chrome://gpu; the log there will make it clear that zero-copy in CEF is simply unavailable on the RPi5 platform.
Given all of the above, I decided to look for a different web engine.
WPE
After searching around, I found a few alternatives: Ultralight, WebKit, and Qt WebEngine. Ultralight was out: it's commercial. Qt was out too: my entire codebase was in pure C++, and dragging in a ton of Qt classes wasn't something I wanted to do. That left WebKit.
What I discovered was that there are essentially no up-to-date articles on how to install WebKit on Raspberry Pi and set up zero-copy rendering. There's this one, but it's built around creating a custom Linux image with pre-built WebKit libraries, and several GitHub repos take the same approach. I didn't want to build a custom Linux with Yocto or anything like that; I just wanted an application that would run on any Linux distro as a plain .deb package. And I got there.
Download the required libraries
These were the latest versions as of late 2025. Next, you need to compile all of them, but do not start with WpeWebKit, because it depends on the others. Compile the remaining three libraries directly on the RPi5 (or using a cross-compiler; I compiled directly on the RPi5 for confidence). Once those three are built, start compiling WpeWebKit.
You'll run into errors about missing libraries. Do not disable them with flags; just find the required libraries on GitHub, compile and install them, and restart the WpeWebKit build. There will be several of them, but they're all small (libwoff, libavif, etc.). Once the errors clear, launch the wpewebkit-2.50.4 build with -j1. Here's why: WpeWebKit is a massive library and takes roughly a full day to compile. That's not a joke. Running it on all cores will exhaust the RPi5's resources and crash the build. I kicked off the single-threaded build in the evening and went home. I'd recommend using xterm over SSH to run sudo ninja -j1, so the RPi5 keeps compiling after you disconnect.
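Roughly, the configure-and-build sequence looks like this. Take the exact flags as a sketch: -DPORT=WPE selects the WPE port in WebKit's standard CMake build, and the rest are common defaults you may want to adjust for your tree:

```shell
# Inside the unpacked wpewebkit-2.50.4 source tree.
mkdir build && cd build

# Configure the WPE port with the Ninja generator.
cmake -DPORT=WPE -DCMAKE_BUILD_TYPE=Release -GNinja ..

# One job only: parallel builds exhaust the RPi5's RAM and crash.
# Run inside xterm over SSH (or tmux/screen) so it survives a disconnect.
sudo ninja -j1
```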
After a successful build, install the library. That's the hardest part done: you now have a WpeWebKit runtime environment.
Setting up the sysroot for cross-compilation
Now, copy all your libraries into your sysroot so you're not compiling your application on the RPi5. Full instructions are also in my repository at the end of this article. Create a folder that will serve as your SYS_ROOT (I used root_fs) and sync everything from the RPi5:
rsync -av --mkpath --no-perms user@rpi_ip:/usr/include ./root_fs/usr/
rsync -av --mkpath --no-perms user@rpi_ip:/usr/local/include ./root_fs/usr/local/
rsync -av --mkpath --no-perms user@rpi_ip:/usr/local/lib ./root_fs/usr/local/
rsync -av --mkpath --no-perms user@rpi_ip:/usr/lib ./root_fs/usr/
rsync -av --mkpath --no-perms user@rpi_ip:/usr/share ./root_fs/usr/
rsync -av --mkpath --no-perms user@rpi_ip:/lib ./root_fs/

Note: Some libraries may have been installed in custom paths on the RPi5. If you get errors like cannot open .../libm.so: No such file or directory, find that library on the RPi5 and sync it over the same way. After running these commands, fix the absolute symlinks in the sysroot using the script included in the repository:
./scripts/fix_sysroot_links.sh ./my/path/root_fs

Install the arm64 cross-compilers
sudo apt install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu

Then point CMake to your root_fs by setting CMAKE_SYSROOT and configuring the correct pkg-config paths through environment variables:
export PKG_CONFIG_SYSROOT_DIR="/absolute/path/to/root_fs"
export PKG_CONFIG_LIBDIR="$PKG_CONFIG_SYSROOT_DIR/usr/local/lib/pkgconfig:$PKG_CONFIG_SYSROOT_DIR/usr/lib/aarch64-linux-gnu/pkgconfig:$PKG_CONFIG_SYSROOT_DIR/usr/share/pkgconfig"
unset PKG_CONFIG_PATH

Build
mkdir build && cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=../cmake/rpi5_toolchain.cmake -DCMAKE_SYSROOT=/absolute/path/to/root_fs
make

The result is a binary that runs in the WPE environment you compiled on the RPi5. To run it, place the run.sh script next to the binary and execute it as root. The script sets up the Wayland environment and launches your application. If you want to run this on an RPi5 that doesn't have WPE compiled on it, just build a .deb package with all the required libraries bundled inside. That's what I do; there's nothing complicated about it. Build it, install it, check for missing libraries in your sysroot, add them to the package, and repeat until everything runs. Yes, there's no pre-built WPE binary for arm64, but the WPE compilation is the only slow part. Everything after that is routine.
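For reference, a minimal aarch64 cross-compilation toolchain file typically looks like the sketch below; the repository's actual rpi5_toolchain.cmake may set more than this:

```cmake
# Minimal aarch64 cross-toolchain sketch (illustrative, not the repo's exact file).
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)

set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)

# Search headers and libraries only inside the sysroot,
# but look for build tools on the host.
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
```

Passing -DCMAKE_SYSROOT on the command line, as shown above, keeps the toolchain file reusable across machines with different sysroot paths.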
Compiling for AMD64 and running on WSL2
The steps here are identical to the arm64 process; just build the same set of libraries, and you're done. On native machines (not WSL), it works the same way: Wayland with native zero-copy rendering. On WSL, there's no direct GPU access, so my repository example includes an SHM fallback. That's actually how I do my own testing: compile for amd64 on WSL2, verify that everything renders correctly, then run on a native Linux machine. So the amd64 build works in both native Linux environments and in WSL2 via the standard CPU-to-texture memory copy.
Why did I also have to drop OpenFrameworks?
OpenFrameworks is tightly coupled to X11, and based on discussions on the forum with the lead OF developer, Wayland support isn't on the roadmap anytime soon. So I had to remove OF from the project entirely, but it wasn't as painful as it sounds, since OF is, as I mentioned, just a wrapper around OpenGL. After writing a few small wrappers of my own, the code barely changed.
My take on CEF vs. WPE
In the end, I found working with WPE vastly more enjoyable than CEF. The main reason: the code is orders of magnitude cleaner and easier to understand.
CEF demands its own CefString types, constant manual reference counting, multi-process launch logic set up before your application starts, thread-correctness checks inside browsers, and keeping in mind that each browser-class instance is essentially an individual Chrome tab. I also really disliked controlling everything through flags, then through string-based flags for the V8 JS engine. It felt like an unreasonable amount of framework-specific boilerplate for what should be straightforward tasks. With WPE, you just create a page class, configure it through functions rather than magic strings, and things work.
The same goes for JS ↔ C++ communication. In WPE, it's as simple as pointing to a callback that gets invoked when a message arrives from JS. In CEF, you have to implement a full class to send and receive messages, whereas WPE has a single unified send/receive mechanism. CEF creates its own default entry point (cefQuery). Meanwhile, WPE lets you create any custom endpoint you want, including a cefQuery equivalent, with just a few lines:
ucm_ = webkit_user_content_manager_new();
webkit_user_content_manager_register_script_message_handler(ucm_, "cefQuery", nullptr);
g_signal_connect(ucm_, "script-message-received::cefQuery", G_CALLBACK(onScriptMessage), this);

onScriptMessage is your own function with whatever logic you need.
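On the page side, WebKit exposes each registered handler under window.webkit.messageHandlers. A sketch of the call that triggers onScriptMessage; the stub object exists only so the snippet runs outside a browser, since in a real WPE page WebKit injects the handler itself:

```javascript
// In a real WPE page, WebKit injects window.webkit.messageHandlers.<name>
// for every name registered via
// webkit_user_content_manager_register_script_message_handler().
// The stub below is purely so this snippet runs outside the browser.
const webkit =
  (typeof window !== "undefined" && window.webkit) || {
    messageHandlers: {
      cefQuery: { postMessage: (msg) => ({ sent: msg }) },
    },
  };

// This call ends up in the onScriptMessage C++ callback.
const result = webkit.messageHandlers.cefQuery.postMessage(
  JSON.stringify({ request: "getStatus" })
);
```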
Sending a message back looks like this:
webkit_web_view_evaluate_javascript(web_view, std::string("MY string").c_str(), -1, nullptr, nullptr, nullptr, nullptr, nullptr);

Here, web_view is your WebKitWebView* instance.

Memory leaks in CEF vs. WPE
The classic browser problem is, of course, RAM consumption. While working with CEF, we had a persistent leak on the browser side. We ran both Valgrind and AddressSanitizer; our own code was clean. The leaks disappeared when the browser module was disabled. In production, we had to restart the application once a week because even with pages being killed and recreated each time, memory was slowly consumed and never recovered. 8 gigabytes gone in a week.
With WPE, I haven't seen anything like that. My application ran for 8 days: in the first three days it consumed about 1,300 MB of RAM and then stopped growing. WPE appears to handle memory management significantly better than CEF. Both frameworks expose cache-size and optimization settings, but in practice CEF seems to ignore them entirely, while WPE simply doesn't balloon.
Conclusion
WPE is a genuinely pleasant and well-suited framework for working with full JS+HTML+CSS web pages in embedded development environments. The only real downside I can point to is sparse documentation: you end up reading the source code a lot, and the official site is essentially just an alphabetical list of functions with brief descriptions. Building a first working example from that is genuinely difficult. But it's doable.
