In my application, I capture video using the Media Foundation APIs and convert it to a common format. I need some further clarification on the format (memory representation) of a captured video frame as exposed through the IMFMediaBuffer and IMF2DBuffer interfaces. Per the documentation:
"Every video format defines a contiguous or packed representation. This representation is compatible with the standard layout of a DirectX surface in system memory, with no additional padding. For RGB video, the contiguous representation has a pitch equal to the image width in bytes, rounded up to the nearest DWORD boundary. For YUV video, the layout of the contiguous representation depends on the YUV format. For planar YUV formats, the Y plane might have a different pitch than the U and V planes."
Is there any further discussion or documentation on what contiguity means for the various YUV formats? I'm still relatively new to video processing, so I found articles like this one on YUV render formats very helpful.
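To make the question concrete, here is my current reading of the contiguous I420 layout. This is just my interpretation of the quoted paragraph, it assumes even width and height, and the struct and function are my own, so corrections are welcome:

```cpp
#include <windows.h>

// My reading of the contiguous (packed) I420 layout for a width x height
// frame: a full-resolution Y plane followed immediately by quarter-resolution
// U and V planes, with no padding between rows or between planes.
struct I420Layout
{
    DWORD yOffset, yPitch;   // pitch = width
    DWORD uOffset, uPitch;   // pitch = width / 2
    DWORD vOffset, vPitch;   // pitch = width / 2
    DWORD totalBytes;        // width * height * 3 / 2
};

I420Layout ContiguousI420(DWORD width, DWORD height)
{
    I420Layout l = {};
    l.yPitch = width;
    l.uPitch = l.vPitch = width / 2;
    l.yOffset = 0;
    l.uOffset = l.yPitch * height;                   // U follows Y
    l.vOffset = l.uOffset + l.uPitch * (height / 2); // V follows U
    l.totalBytes = l.vOffset + l.vPitch * (height / 2);
    return l;
}
```

As a sanity check, I'd expect MFCalculateImageSize to return the same totalBytes for MFVideoFormat_I420, but I'd like confirmation that the plane ordering and pitches are right.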
It seems I need to support three cases for each format, two of which turn out to be the same:

1. IMFMediaBuffer only (the sample does not expose the IMF2DBuffer interface)
2. IMF2DBuffer whose native format is contiguous
3. IMF2DBuffer whose native format is non-contiguous

If I read correctly, an IMFMediaBuffer and a contiguous IMF2DBuffer are basically the same thing.
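Here is a sketch of how I'm planning to handle the lock for those cases, under the assumption that a plain IMFMediaBuffer always holds the contiguous representation; ProcessFrame is a hypothetical placeholder for my conversion code:

```cpp
#include <mfapi.h>
#include <mfobjects.h>

// Hypothetical placeholder for my conversion code.
void ProcessFrame(BYTE *pScanline0, LONG lPitch);

HRESULT LockFrame(IMFMediaBuffer *pBuffer, LONG lDefaultStride)
{
    IMF2DBuffer *p2D = nullptr;

    if (SUCCEEDED(pBuffer->QueryInterface(IID_PPV_ARGS(&p2D))))
    {
        // Cases 2 and 3: IMF2DBuffer. Lock2D reports the actual pitch,
        // which may include padding (non-contiguous) and may be negative
        // (bottom-up image); pScanline0 is the top row either way.
        BYTE *pScanline0 = nullptr;
        LONG  lPitch = 0;
        HRESULT hr = p2D->Lock2D(&pScanline0, &lPitch);
        if (SUCCEEDED(hr))
        {
            ProcessFrame(pScanline0, lPitch);
            p2D->Unlock2D();
        }
        p2D->Release();
        return hr;
    }

    // Case 1: plain IMFMediaBuffer, which (as I understand it) always holds
    // the contiguous representation, so the default stride from the media
    // type applies.
    BYTE *pData = nullptr;
    DWORD cbMax = 0, cbCurrent = 0;
    HRESULT hr = pBuffer->Lock(&pData, &cbMax, &cbCurrent);
    if (SUCCEEDED(hr))
    {
        ProcessFrame(pData, lDefaultStride);
        pBuffer->Unlock();
    }
    return hr;
}
```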
So, for either the contiguous or non-contiguous representation, I need to know the orientation and stride of the image in order to process it.
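For the stride and orientation, my plan is the pattern below: read MF_MT_DEFAULT_STRIDE if the source set it, otherwise compute a value with MFGetStrideForBitmapInfoHeader. A negative stride should mean the image is stored bottom-up (common for RGB formats). Is that enough to cover both representations?

```cpp
#include <mfapi.h>

// How I'm resolving the default stride of the native format. A negative
// stride should mean the image is bottom-up.
LONG GetDefaultStride(IMFMediaType *pType)
{
    // The source may have set the stride attribute directly.
    UINT32 stride = 0;
    if (SUCCEEDED(pType->GetUINT32(MF_MT_DEFAULT_STRIDE, &stride)))
        return (LONG)(INT32)stride;

    // Otherwise derive it from the subtype (FOURCC) and the frame width.
    GUID subtype = GUID_NULL;
    UINT32 width = 0, height = 0;
    LONG lStride = 0;
    if (SUCCEEDED(pType->GetGUID(MF_MT_SUBTYPE, &subtype)) &&
        SUCCEEDED(MFGetAttributeSize(pType, MF_MT_FRAME_SIZE, &width, &height)) &&
        SUCCEEDED(MFGetStrideForBitmapInfoHeader(subtype.Data1, width, &lStride)))
    {
        return lStride;
    }

    return 0; // couldn't determine the stride
}
```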
Right now I am trying to support capture in YUY2, RGB24, and I420.
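For those three formats specifically, here is the per-format contiguous pitch I'm assuming, based on the quoted paragraph (again, my own reading, and it assumes even widths for the YUV formats):

```cpp
#include <mfapi.h>

// Contiguous pitch for the formats I'm capturing, per my reading of the docs:
//   YUY2:  packed 4:2:2, 2 bytes per pixel
//   RGB24: 3 bytes per pixel, rounded up to a DWORD boundary
//   I420:  planar; this is the Y-plane pitch, U and V use width / 2
LONG ContiguousPitch(REFGUID subtype, DWORD width)
{
    if (subtype == MFVideoFormat_YUY2)  return (LONG)(width * 2);
    if (subtype == MFVideoFormat_RGB24) return (LONG)((width * 3 + 3) & ~3u);
    if (subtype == MFVideoFormat_I420)  return (LONG)width;
    return 0; // format not handled
}
```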