Thursday, October 30, 2008 12:44:15 PM (Pacific Standard Time, UTC-08:00)
(Cool Software | School)
This is part 2 of my VidSpeak series, where I show off an app I wrote for my multimedia course. Check out part 1.
To recap from last time, the problem this assignment is trying to solve is converting video frames to some kind of audio representation. These are the steps required:
- Extract the next frame from the video
- Scale the frame down to 64x64 pixels
- Make the frame a grayscale image
- "Quantize" the grayscale frame into 4-bit colour
- Convert the frame to sound
I covered extracting video frames in Part 1, so now we have to manipulate the image to prepare it for rendering as sound.
2. Scale the frame down to 64x64 pixels
All the transformation steps (2-4) are performed by "filters" which receive a System.Drawing.Bitmap and return a transformed one (a filter could also modify the incoming Bitmap and then just return the same object). Scaling is trivial with the "GetThumbnailImage" method on System.Drawing.Image:
public Bitmap ProcessImage(Bitmap input) {
    return new Bitmap(input.GetThumbnailImage(TargetWidth, TargetHeight, null, IntPtr.Zero));
}
3. Make the frame a grayscale image
There may be a function in .NET to do this, but as part of the assignment we were provided with a formula to convert RGB images to 256-shade grayscale images, so I decided to use that. Now, the Bitmap class provides GetPixel and SetPixel methods, but they are slow when you need to touch every pixel. Fortunately, the Bitmap class provides a method called "LockBits", which locks the Bitmap's pixel data in memory and hands back a pointer to it. With that, I made a base class for filters that process a Bitmap on a pixel-by-pixel basis:
BitmapData data = input.LockBits(new Rectangle(0, 0, input.Width, input.Height),
                                 ImageLockMode.ReadWrite,
                                 PixelFormat.Format24bppRgb);
unsafe {
    for (int row = 0; row < data.Height; row++) {
        // Rows are Stride bytes apart, which can be more than Width * 3 because of padding
        PixelData* rowStart = (PixelData*)((byte*)data.Scan0 + row * data.Stride);
        for (int col = 0; col < data.Width; col++) {
            ProcessPixel(rowStart + col);
        }
    }
}
input.UnlockBits(data);
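One detail to watch when walking locked bitmap memory: each row occupies BitmapData.Stride bytes, and GDI+ rounds rows up to a multiple of 4 bytes, so Stride can be larger than Width * 3. The sketch below is not part of VidSpeak. StrideMath and its two methods are hypothetical names used only to illustrate the arithmetic, assuming a top-down 24bpp bitmap:

```csharp
using System;

// Hypothetical helper, not part of VidSpeak: shows how row padding
// affects addressing in a 24bpp bitmap.
public static class StrideMath {
    private const int BytesPerPixel = 3; // Format24bppRgb: blue, green, red

    // Bytes per row once the row is rounded up to the next multiple of 4.
    public static int PaddedStride(int width) {
        return (width * BytesPerPixel + 3) / 4 * 4;
    }

    // Byte offset of pixel (col, row) from the start of the pixel data,
    // assuming a top-down bitmap (positive stride).
    public static int PixelOffset(int col, int row, int stride) {
        return row * stride + col * BytesPerPixel;
    }
}
```

For the 64-pixel-wide frames used here, PaddedStride(64) is 192, exactly 64 * 3, so there is no padding at that size. A 65-pixel-wide image would have a stride of 196, one padding byte per row, and any addressing that ignored it would drift further off with every row.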
PixelData is a simple 3-byte struct:
[StructLayout(LayoutKind.Sequential)]
public struct PixelData {
    public byte Blue;
    public byte Green;
    public byte Red;
}
Then, my grayscale image filter just has to implement ProcessPixel:
byte convertedValue = (byte) Math.Floor((0.299 * pixel->Red) +
                                        (0.587 * pixel->Green) +
                                        (0.114 * pixel->Blue));
pixel->Red = convertedValue;
pixel->Green = convertedValue;
pixel->Blue = convertedValue;
Kinda strange to see the "->" operator in C#, eh?
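To get a feel for the weights, here is the same formula as a standalone function that takes plain bytes instead of a pixel pointer. This is only an illustrative sketch: LumaExample and ToGray are hypothetical names, not part of VidSpeak.

```csharp
using System;

// Hypothetical standalone version of the grayscale formula above,
// working on plain byte values rather than a PixelData pointer.
public static class LumaExample {
    public static byte ToGray(byte red, byte green, byte blue) {
        // Weighted sum of the three channels, rounded down.
        return (byte)Math.Floor(0.299 * red + 0.587 * green + 0.114 * blue);
    }
}
```

With these weights, pure red (255, 0, 0) becomes 76, pure green (0, 255, 0) becomes 149 and pure blue (0, 0, 255) becomes 29, so green contributes by far the most to the resulting brightness.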
4. "Quantize" the image to 4-bit colour
Quantization is the process of "compressing a range of values to a single quantum value" (http://en.wikipedia.org/wiki/Quantization_(image_processing)). In this case, we are quantizing an 8-bit grayscale image to a 4-bit grayscale image. To do this, we take each pixel value and assign it to one of 16 partitions (4 bits can hold values 0 through 15). The following code (in FixedPartitionQuantizationStrategy) does this (note that _partitionSize is 16 in this case, since 256 values / 16 partitions = 16 values per partition).
public byte Quantize(byte input) {
    return (byte)Math.Floor((double)input / _partitionSize);
}
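As a rough sketch of how the pieces might fit together, here is a self-contained version of the strategy. The interface name, the class name and the Quantize signature come from this post. The constructor taking a number of levels, and the use of integer division in place of Math.Floor, are assumptions made for illustration and not necessarily what the real class does.

```csharp
using System;

// Interface name and Quantize signature are from the post; the rest is assumed.
public interface IQuantizationStrategy {
    byte Quantize(byte input);
}

public class FixedPartitionQuantizationStrategy : IQuantizationStrategy {
    private readonly int _partitionSize;

    // Assumption: the strategy is configured with the number of output levels,
    // which must divide 256 evenly so every partition has the same size.
    public FixedPartitionQuantizationStrategy(int levels) {
        if (levels < 1 || levels > 256 || 256 % levels != 0) {
            throw new ArgumentOutOfRangeException("levels");
        }
        _partitionSize = 256 / levels; // 16 levels -> 16 values per partition
    }

    public byte Quantize(byte input) {
        // Integer division rounds down, matching Math.Floor for non-negative input.
        return (byte)(input / _partitionSize);
    }
}
```

With 16 levels, inputs 0 through 15 all map to 0, 16 maps to 1, 127 maps to 7, 128 maps to 8 and 255 maps to 15.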
Quantization is done by the QuantizeImageFilter, which uses a base class similar to the one behind the grayscale conversion above and hands the quantization itself off to an IQuantizationStrategy:
protected override unsafe void ProcessPixel(PixelData* pixel) {
    pixel->Red = _strategy.Quantize(pixel->Red);
    pixel->Green = _strategy.Quantize(pixel->Green);
    pixel->Blue = _strategy.Quantize(pixel->Blue);
}
And that's it! We now have an image that is 64x64 pixels in 4-bit grayscale. Next we have to render it out to sound. Stay tuned for Part 3 for that!