I'm not sure switching to a different video library would be as effective as just using gstreamer better; it's very full-featured and even has plugins to use most of the other video libraries, like the ffmpeg ones. At this point I've invested a lot of time in figuring out how gstreamer works, and I'm sure there's a significant learning curve for all the cross-platform video playback libraries.
Last night I got a significant improvement to video playback working, at least on a couple of Linux machines I had handy. I'll do more testing with Mac and Windows soon, but here's what I found:
1. The code was selecting the YUY2 pixel encoding for the raw video frames. YUY2 is a packed YUV format, a common default for streaming sources like USB webcams, but it's almost never used by stored-media codecs such as the ones in our screen-capture videos or DVD/Blu-ray rips. That meant gstreamer silently inserted a conversion step into its decode path to translate the codec's native planar YUV output (usually I420/IYUV), costing an entire buffer copy and a software loop over every frame before we even got access to it. Requesting I420 directly avoids the conversion (see the first sketch after this list).
2. Each GstBuffer, which holds references to the raw video data for a frame, was being copied into a newly allocated buffer as soon as we received it. Then, when the display loop ran, we copied that buffer's contents into the video's SDL_Texture. That's an unnecessary extra copy of the frame data, which slows things down further, and it also discards the metadata attached to the buffer. Instead of copying the data, I now just take a reference on the GstBuffer and store that; the display routine copies directly from the GstBuffer into the SDL_Texture and drops the reference when it's done (second sketch below).
3. The current code doesn't account for some subtleties in how the raw frame data is sometimes laid out. I have some handheld emulation videos (Game Boy variants) that displayed garbled because the video data is not contiguous in the frame buffer: the codec pads each line out to a power of 2, and the display routine is expected to handle the difference between the video's horizontal resolution and the larger 'stride' of the data buffer, which wasn't being done. When the buffer data is non-contiguous, gstreamer attaches a GstVideoMeta structure to each GstBuffer describing the offset and stride of every plane in the buffer (the planes aren't the same size; in I420, for example, each U or V sample covers the space of four Y samples), which lets each plane be located separately and passed into the SDL_UpdateYUVTexture function (third sketch below). After adding this, all my videos played correctly!
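For reference, here's roughly what the format fix looks like. This is a minimal sketch assuming the pipeline delivers frames through an appsink; the helper name and the exact spot where the caps get set are my own stand-ins, not the actual project code.

```c
#include <gst/gst.h>
#include <gst/app/gstappsink.h>

/* Ask the decode chain for planar I420 directly instead of letting
 * negotiation fall through to packed YUY2. With these caps pinned on
 * the sink, the stored-media codecs can hand over their native output
 * without gstreamer inserting a videoconvert pass in between. */
static void force_i420_caps(GstElement *appsink)
{
    GstCaps *caps = gst_caps_new_simple("video/x-raw",
                                        "format", G_TYPE_STRING, "I420",
                                        NULL);
    gst_app_sink_set_caps(GST_APP_SINK(appsink), caps);
    gst_caps_unref(caps);
}
```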
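The refcounting change, sketched under the same appsink assumption; `pending_frame` and the callback name are hypothetical placeholders for however the real code hands the latest frame to the display loop.

```c
#include <gst/gst.h>
#include <gst/app/gstappsink.h>

/* Hypothetical holder for the most recently decoded frame
 * (guarded by a lock in real code). */
static GstBuffer *pending_frame;

/* appsink "new-sample" callback: keep a reference to the GstBuffer
 * instead of copying its contents into our own allocation. */
static GstFlowReturn on_new_sample(GstAppSink *sink, gpointer user_data)
{
    GstSample *sample = gst_app_sink_pull_sample(sink);
    if (!sample)
        return GST_FLOW_ERROR;

    GstBuffer *buf = gst_sample_get_buffer(sample);
    if (pending_frame)
        gst_buffer_unref(pending_frame);   /* drop the previous frame */
    pending_frame = gst_buffer_ref(buf);   /* just bump the refcount */

    gst_sample_unref(sample);              /* buffer survives via our ref */
    return GST_FLOW_OK;
}
```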
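And the stride-aware upload, again as a sketch: `upload_frame` and its parameters are illustrative, but `gst_buffer_get_video_meta` and `SDL_UpdateYUVTexture` are the actual APIs involved.

```c
#include <gst/gst.h>
#include <gst/video/video.h>
#include <SDL.h>

/* Display-side copy: map the buffer once and honor the per-plane
 * offsets/strides from GstVideoMeta when present, falling back to the
 * tightly packed I420 layout when the buffer is contiguous. The texture
 * is assumed to be SDL_PIXELFORMAT_IYUV (I420) at width x height. */
static void upload_frame(SDL_Texture *texture, GstBuffer *buf,
                         int width, int height)
{
    GstMapInfo map;
    if (!gst_buffer_map(buf, &map, GST_MAP_READ))
        return;

    GstVideoMeta *meta = gst_buffer_get_video_meta(buf);
    if (meta) {
        /* Non-contiguous planes: each has its own offset and stride. */
        SDL_UpdateYUVTexture(texture, NULL,
                             map.data + meta->offset[0], meta->stride[0],
                             map.data + meta->offset[1], meta->stride[1],
                             map.data + meta->offset[2], meta->stride[2]);
    } else {
        /* Contiguous I420: a width*height Y plane, then the
         * quarter-size U and V planes. */
        const guint8 *y = map.data;
        const guint8 *u = y + (gsize)width * height;
        const guint8 *v = u + (gsize)(width / 2) * (height / 2);
        SDL_UpdateYUVTexture(texture, NULL,
                             y, width, u, width / 2, v, width / 2);
    }

    gst_buffer_unmap(buf, &map);
    gst_buffer_unref(buf);   /* done with the ref taken in the callback */
}
```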
I had perceptibly slow video playback and some input-response lag on my i5 NUC, but after these changes playback appears to be full speed and more responsive to control input. I'll test on Mac and Windows and some other Linux machines, then open a pull request once I'm fairly confident I didn't introduce any problems.