I found a case where an improperly encoded video has B-frames in a baseline
stream, which the spec prohibits. We get memory errors in this case since the
aux buffer isn't set up. Remove the baseline flag if such an erroneous stream
is detected, allowing it to be decoded without errors.
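
A minimal sketch of the check described above, not the driver's actual code.
The struct, field, and function names are hypothetical; the slice type
numbering follows the H.264 slice_type syntax element, where values 5..9
repeat 0..4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* H.264 slice_type values, taken modulo 5. */
enum slice_type { SLICE_P = 0, SLICE_B = 1, SLICE_I = 2 };

/* Hypothetical per-stream decoder state; the field name is an assumption. */
struct decoder_state {
    bool baseline_profile;
};

/*
 * Clear the baseline flag when a B slice shows up in a stream that claims
 * to be baseline. Returns true if the flag was cleared by this call.
 */
static bool fixup_baseline_flag(struct decoder_state *dec, int slice_type)
{
    assert(dec != NULL);

    if (dec->baseline_profile && slice_type % 5 == SLICE_B) {
        dec->baseline_profile = false;
        return true;
    }

    return false;
}
```

With the flag cleared, the stream would presumably take the non-baseline
path, where the aux buffer is set up.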
The kernel driver caches mappings based on the dmabuf pointer. Reusing the
same FD avoids re-creating the dmabuf, which uses the caches more efficiently
and prevents unnecessary remappings.
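
The FD reuse can be sketched as a lazily exported, cached descriptor. This is
an illustration under assumptions: struct buffer, its fields, and fake_export
are hypothetical stand-ins, and a real exporter would ask the kernel for the
dmabuf FD instead of returning a constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer object; all names here are assumptions. */
struct buffer {
    int dmabuf_fd;                        /* -1 until the first export */
    int (*export_fd)(struct buffer *buf); /* performs the actual export */
    int export_calls;                     /* bookkeeping for this example */
};

/* Stand-in exporter: counts calls and returns a fixed descriptor. */
static int fake_export(struct buffer *buf)
{
    buf->export_calls++;
    return 42;
}

/* Export once, then hand out the same FD on every later call. */
static int buffer_get_dmabuf_fd(struct buffer *buf)
{
    assert(buf != NULL && buf->export_fd != NULL);

    if (buf->dmabuf_fd < 0)
        buf->dmabuf_fd = buf->export_fd(buf);

    return buf->dmabuf_fd;
}
```

Because every caller receives the same FD, the kernel sees the same dmabuf
each time and can keep serving its cached mapping.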
The V4L API doesn't provide userspace with flexible memory management
controls. This doesn't play well with VDPAU; the best we can do is
mitigate that limitation by adding complexity to the VDPAU driver.
The surface cache prevents remapping of all buffers on each decode
invocation by pinning each surface to a dedicated V4L buffer index, which
avoids a severe performance hit. Syncing one buffer may take up to 30ms
and we may have (17 * 3 + 1) * 2 = 104 such buffers.
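
A rough sketch of the pinning idea, assuming a simple linear table with no
eviction; the real cache is likely more involved. All names are hypothetical.
Taking the figures above at face value, syncing every buffer at the worst-case
cost would take 104 * 30ms = 3120ms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_V4L_BUFFERS ((17 * 3 + 1) * 2) /* 104, figure from the text */
#define SYNC_WORST_MS   30                 /* per-buffer worst case from the text */

/* Hypothetical cache: slot i holds the surface pinned to V4L buffer index i. */
struct surface_cache {
    uint32_t surface_id[MAX_V4L_BUFFERS];
    int used[MAX_V4L_BUFFERS];
};

/*
 * Return the V4L buffer index pinned to the surface, assigning the first
 * free slot on first use. *needs_mapping is set to 1 only when a new slot
 * was assigned, so a cache hit skips the expensive remap. Returns -1 when
 * the cache is full.
 */
static int surface_cache_get_index(struct surface_cache *c, uint32_t id,
                                   int *needs_mapping)
{
    int free_slot = -1;

    assert(c != NULL && needs_mapping != NULL);

    for (int i = 0; i < MAX_V4L_BUFFERS; i++) {
        if (c->used[i] && c->surface_id[i] == id) {
            *needs_mapping = 0;
            return i;
        }
        if (!c->used[i] && free_slot < 0)
            free_slot = i;
    }

    if (free_slot < 0)
        return -1;

    c->used[free_slot] = 1;
    c->surface_id[free_slot] = id;
    *needs_mapping = 1;
    return free_slot;
}
```
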
decoder: Pass non-coherent memory to bitstream reader
Micro-optimize bitstream reading by using CPU-cached memory instead of
going to DRAM on each memory access. We read only a couple of bytes, so
this change doesn't make much difference in practice.