On this page

VidSpeak Part 2 - Bitmap Manipulation
VidSpeak Part 1 - Extracting Frames from Video in C#!


Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

 Thursday, October 30, 2008
Thursday, October 30, 2008 8:44:15 PM (GMT Standard Time, UTC+00:00) ( Cool Software | School )

This is part 2 of my VidSpeak series, where I show off an app I wrote for my multimedia course.  Check out part 1.

To recap from last time, the problem this assignment is trying to solve is to convert video frames to some kind of audio representation.  These are the steps required:

  1. Extract the next frame from the video
  2. Scale the frame down to 64x64 pixels
  3. Make the frame a grayscale image
  4. "Quantize" the grayscale frame into 4-bit colour
  5. Convert the frame to sound

I covered extracting video frames in Part 1, so now we have to manipulate the image to prepare it for rendering as sound.

2. Scale the frame down to 64x64 pixels

All the transformation steps (2-4) are performed by "filters" which receive a System.Drawing.Bitmap and return a transformed one (a filter could also modify the incoming Bitmap and then just return the same object).  Scaling is trivial with the "GetThumbnailImage" method on System.Drawing.Image:

public Bitmap ProcessImage(Bitmap input) {
    return new Bitmap(input.GetThumbnailImage(TargetWidth, TargetHeight, null, IntPtr.Zero));
}
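
GetThumbnailImage doesn't let you choose how the pixels are resampled. If that ever matters, a roughly equivalent filter can be sketched with Graphics.DrawImage. This is an alternative, not what VidSpeak does; the DrawImageScaler class and its Scale method are names made up for this example.

```csharp
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

// Hypothetical alternative to GetThumbnailImage; not part of the VidSpeak project.
public static class DrawImageScaler {
    public static Bitmap Scale(Bitmap input, int targetWidth, int targetHeight) {
        // 24bpp matches the pixel format the later filters lock the bitmap with
        Bitmap output = new Bitmap(targetWidth, targetHeight, PixelFormat.Format24bppRgb);
        using (Graphics g = Graphics.FromImage(output)) {
            // Pick the resampling algorithm explicitly
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            // Stretch (or shrink) the whole input onto the output surface
            g.DrawImage(input, 0, 0, targetWidth, targetHeight);
        }
        return output;
    }
}
```

Calling DrawImageScaler.Scale(frame, 64, 64) would give a 64x64 Bitmap, the same size the filter above produces.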

3. Make the frame a grayscale image

There may be a function in .NET to do this, but as part of the assignment we were provided with a formula to convert RGB images to 256-shade grayscale images, so I decided to use that.  Now, the Bitmap class provides GetPixel and SetPixel methods, but they aren't really very performant when you need to touch every pixel.  Fortunately, the Bitmap class provides a method called "LockBits" which locks the Bitmap data into memory and hands back a pointer to it.  With that, I made a base class for filters that process a Bitmap on a pixel-by-pixel basis:

BitmapData data = input.LockBits(new Rectangle(0, 0, input.Width, input.Height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
try {
    unsafe {
        for (int row = 0; row < data.Height; row++) {
            // Rows are data.Stride bytes apart, which can include padding beyond Width * 3
            PixelData* rowStart = (PixelData*)((byte*)data.Scan0 + row * data.Stride);
            for (int col = 0; col < data.Width; col++) {
                ProcessPixel(rowStart + col);
            }
        }
    }
} finally {
    // Always release the lock, even if ProcessPixel throws
    input.UnlockBits(data);
}

PixelData is a simple 3-byte struct:

[StructLayout(LayoutKind.Sequential)]
public struct PixelData {
    public byte Blue;
    public byte Green;
    public byte Red;
}
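
One detail worth spelling out when walking a locked Bitmap: each row starts Stride bytes after the previous one, and the stride is rounded up to a multiple of 4 bytes, so it can be larger than Width * 3. Here is a small sketch of that arithmetic on plain integers. StrideDemo, PaddedStride and PixelOffset are hypothetical names for this example only, and a positive (top-down) stride is assumed.

```csharp
using System;

// Illustration only: these helpers are not part of VidSpeak.
public static class StrideDemo {
    public const int BytesPerPixel = 3; // Format24bppRgb: blue, green, red

    // Row length in bytes, rounded up to the next multiple of 4
    public static int PaddedStride(int width) {
        return (width * BytesPerPixel + 3) & ~3;
    }

    // Byte offset of the pixel at (col, row) from the start of the buffer
    public static int PixelOffset(int col, int row, int stride) {
        return row * stride + col * BytesPerPixel;
    }
}
```

For the 64-pixel-wide frames in this assignment, PaddedStride(64) is 192, which is exactly 64 * 3, so there is no padding at all. A 5-pixel-wide image, on the other hand, has 15 bytes of pixel data per row but a stride of 16.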

Then, my grayscale image filter just has to implement ProcessPixel:

byte convertedValue = (byte) Math.Floor((0.299 * pixel->Red) + (0.587 * pixel->Green) + (0.114 * pixel->Blue));
pixel->Red = convertedValue;
pixel->Green = convertedValue;
pixel->Blue = convertedValue;

Kinda strange to see the "->" operator in C#, eh?
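
If you'd rather keep floating point out of the per-pixel path, the same weights can be applied with integer maths. This is just a sketch of an equivalent formulation, not code from the project; the Grayscale class and ToGray method are hypothetical names.

```csharp
using System;

// Hypothetical helper, not taken from the VidSpeak source.
public static class Grayscale {
    // Same weights as the formula above, scaled by 1000 so the maths stays in integers.
    // The weights sum to 1000, so the result always fits in 0..255.
    public static byte ToGray(byte red, byte green, byte blue) {
        return (byte)((299 * red + 587 * green + 114 * blue) / 1000);
    }
}
```

With this version, pure red (255, 0, 0) comes out as 76, pure green as 149, pure blue as 29, and white stays at 255. Integer division gives the floor of the exact weighted sum, so there are no rounding surprises at the edges.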

4. "Quantize" the image to 4-bit colour

Quantization is the process of "compressing a range of values to a single quantum value" (http://en.wikipedia.org/wiki/Quantization_(image_processing)).  In this case, we are quantizing an 8-bit Grayscale image to a 4-bit Grayscale image.  To do this, we take each pixel value and assign it to one of 16 partitions (4 bits can hold values 0 through 15).  The following code (in FixedPartitionQuantizationStrategy) does this (note _partitionSize is 16 in this case).

public byte Quantize(byte input) {
    return (byte)Math.Floor((double)input / _partitionSize);
}
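
To make the arithmetic concrete: with a partition size of 16, dividing and flooring is the same as dropping the low 4 bits of the value. The sketch below shows that, plus a way to stretch the 0..15 levels back over 0..255 if you want to preview the quantized frame (otherwise it looks almost black). QuantizeDemo and both of its methods are hypothetical names, not part of VidSpeak.

```csharp
using System;

// Illustration only; VidSpeak's own strategy is the Quantize method shown above.
public static class QuantizeDemo {
    // For a partition size of 16, integer division and a 4-bit right shift agree
    public static byte Quantize4Bit(byte input) {
        return (byte)(input >> 4);
    }

    // Stretch a 4-bit level (0..15) back over 0..255, e.g. for previewing the result
    public static byte ExpandForDisplay(byte level) {
        return (byte)(level * 17);
    }
}
```

So inputs 0 through 15 land in partition 0, 16 through 31 in partition 1, and so on up to 240 through 255 in partition 15. ExpandForDisplay(15) gives 255 because 15 * 17 = 255.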

Quantization is done by the QuantizeImageFilter, which uses a similar base class to the Grayscale conversion above.  Then, an IQuantizationStrategy is used to perform the Quantization itself:

protected override unsafe void ProcessPixel(PixelData* pixel) {
    pixel->Red = _strategy.Quantize(pixel->Red);
    pixel->Green = _strategy.Quantize(pixel->Green);
    pixel->Blue = _strategy.Quantize(pixel->Blue);
}
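
For anyone who hasn't opened the zip, here is a rough sketch of how the strategy plugs in. The shape of IQuantizationStrategy is inferred from the Quantize method shown earlier, the constructor is a guess, and QuantizeBuffer is a made-up stand-in that works on a plain byte array instead of a locked Bitmap.

```csharp
using System;

// Assumed shape of the interface, inferred from the Quantize method shown earlier
public interface IQuantizationStrategy {
    byte Quantize(byte input);
}

public class FixedPartitionQuantizationStrategy : IQuantizationStrategy {
    private readonly int _partitionSize;

    // Constructor signature is a guess; the post only says _partitionSize is 16
    public FixedPartitionQuantizationStrategy(int partitionSize) {
        if (partitionSize <= 0) throw new ArgumentOutOfRangeException("partitionSize");
        _partitionSize = partitionSize;
    }

    public byte Quantize(byte input) {
        // Integer division floors for non-negative values
        return (byte)(input / _partitionSize);
    }
}

// Stand-in for the filter: applies a strategy to every byte of a pixel buffer
public static class QuantizeBuffer {
    public static void Apply(byte[] pixels, IQuantizationStrategy strategy) {
        for (int i = 0; i < pixels.Length; i++) {
            pixels[i] = strategy.Quantize(pixels[i]);
        }
    }
}
```

Running QuantizeBuffer.Apply over the bytes { 0, 100, 200, 255 } with a partition size of 16 leaves { 0, 6, 12, 15 } in the array. The point of the strategy is that the filter doesn't care how the partitions are chosen, so a different scheme can be dropped in without touching the pixel loop.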

And that's it! We now have an image that is 64x64 pixels in 4-bit Grayscale.  Next we have to render it out to sound.  Stay tuned for Part 3 for that!

 Tuesday, October 21, 2008
Tuesday, October 21, 2008 8:39:44 PM (GMT Standard Time, UTC+00:00) ( Cool Software | School )

I know, it's been too long since I blogged, but it's pretty busy at school right now :). Anyway, I'm taking a course in Multimedia this semester, and as part of that course, I have to write a program to convert frames in a Video to short Audio clips. I thought it might be interesting to examine how that is done, in C#.  So, over the course of about 3-4 posts, I'll go over the code that I wrote.  I've attached the full project to this post, so you can take a look at it right now.  The GUI app should work, though I can't guarantee it. All I can give it is the "Works on My Machine" seal of approval :) 

Here are the steps involved:

  1. Extract the next frame from the video
  2. Scale the frame down to 64x64 pixels
  3. Make the frame a grayscale image
  4. "Quantize" the grayscale frame into 4-bit colour
  5. Convert the frame to sound

The Code

The Code is a Visual Studio 2008 solution, written for .Net 3.5.  It uses unsafe code for image processing and sound generation, so you can't run it without full trust (i.e. you can't run it off of a network share).

1. Extract the next frame from the video

I used a "Pipeline" (http://en.wikipedia.org/wiki/Pipeline_(software)) architecture, so this phase is handled by a component I call a "Frame Source" which is expected to return a new frame when asked (or return null to signal the end of the input). I used the DirectShow COM library "DexterLib" to do the extraction. DexterLib contains a class called MediaDet (for MediaDetector) which does most of the work. Here's the code for the function which retrieves a frame at a specified timecode (in seconds). 

FYI: "_detector" is an instance of DexterLib.MediaDetClass() ("_detector" is of type IMediaDet), "_streamLength" is the length of the video stream in seconds, "_bufferHandle" is an IntPtr referring to an unmanaged buffer (allocated with Marshal.AllocHGlobal) to hold the bitmap, and "_bufferSize"/"_frameSize" are the size of the buffer and the size of each video frame (respectively).

// WARNING: This method will destroy the bitmap retrieved in a previous call to this method
public Bitmap GetFrameAtTime(double timeCode) {
    // Get the bitmap at this time
    Bitmap frame = null;
    unsafe {
        byte* bufferPointer = (byte*)_bufferHandle;
        _detector.GetBitmapBits(timeCode, ref _bufferSize, ref *bufferPointer, _frameSize.Width, _frameSize.Height);
        frame = new Bitmap(_frameSize.Width, // Width
                           _frameSize.Height, // Height
                           (_frameSize.Width * 3 + 3) & ~3, // Stride (rows are padded to a multiple of 4 bytes)
                           PixelFormat.Format24bppRgb, // Pixel Format
                           new IntPtr(bufferPointer + Marshal.SizeOf(typeof(BITMAPINFOHEADER)))); // Start of Buffer
    }

    return frame;
}

(Note: If you look at the actual code, you will notice I snipped out some stuff from the beginning of this function to display it on the blog. The missing code just handles an (experimental) feature I added to allow me to start at any location in the video, rather than always starting at the beginning.)

After loading the frame, I have to flip it, because Dexter loads the frame upside-down.  Fortunately, the System.Drawing.Image class provides a RotateFlip method to do just that! I also rotate it 90 degrees clockwise, so that each row of the transformed image maps to a column of the frame. This makes step 5 easier, since Bitmaps are stored in "row-major" order (http://en.wikipedia.org/wiki/Row-major_order).
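
To see what the flip plus the rotation do together, here is a small sketch on plain 2D arrays rather than Bitmaps. FlipRotateDemo and its methods are hypothetical names for this illustration; the project itself works on Bitmaps through RotateFlip.

```csharp
using System;

// Illustration on plain arrays; the project itself uses Image.RotateFlip on Bitmaps.
public static class FlipRotateDemo {
    // Mirror top-to-bottom: row r swaps with row (height - 1 - r)
    public static int[,] FlipVertical(int[,] a) {
        int h = a.GetLength(0), w = a.GetLength(1);
        int[,] result = new int[h, w];
        for (int r = 0; r < h; r++)
            for (int c = 0; c < w; c++)
                result[r, c] = a[h - 1 - r, c];
        return result;
    }

    // Rotate 90 degrees clockwise: output row i is input column i, read bottom to top
    public static int[,] Rotate90Clockwise(int[,] a) {
        int h = a.GetLength(0), w = a.GetLength(1);
        int[,] result = new int[w, h];
        for (int i = 0; i < w; i++)
            for (int j = 0; j < h; j++)
                result[i, j] = a[h - 1 - j, i];
        return result;
    }
}
```

Starting from the 2x3 array { { 1, 2, 3 }, { 4, 5, 6 } }, flipping and then rotating gives the 3x2 array { { 1, 4 }, { 2, 5 }, { 3, 6 } }. In other words, the two steps together amount to a transpose of the raw buffer, so each row of the result really is one column of the input. As far as I can tell, RotateFlipType.Rotate90FlipX describes the same net transformation in a single RotateFlip call.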

To use the Frame Source, all my program has to do is call the GetFrameAtTime method, passing in a timecode (in seconds).  This is handled in the FrameProcessor by the GetNextFrame method:

_source.GetFrameAtTime((DateTime.Now - _startTime).TotalSeconds)

Rather than going frame-by-frame, I'm extracting the next frame by time.  So, if it takes 4 seconds to process a frame, the next frame I take is approximately 4 seconds after the frame I just processed.
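
Here's a tiny simulation of that behaviour, with the wall clock replaced by a list of made-up processing times so the result is predictable. SamplingDemo and SampledTimecodes are hypothetical names, and stopping once the elapsed time passes the stream length is my assumption about how the end of input is detected.

```csharp
using System;
using System.Collections.Generic;

// Simulation only: replaces the wall clock with a list of made-up processing times.
public static class SamplingDemo {
    // Returns the timecodes (in seconds) that would be requested from the frame source
    public static List<double> SampledTimecodes(double streamLength, IList<double> processingTimes) {
        List<double> timecodes = new List<double>();
        double elapsed = 0.0; // stands in for (DateTime.Now - _startTime).TotalSeconds
        int i = 0;
        while (elapsed < streamLength && i < processingTimes.Count) {
            timecodes.Add(elapsed);
            elapsed += processingTimes[i]; // time spent processing the frame just taken
            i++;
        }
        return timecodes;
    }
}
```

For a 10 second stream where the frames take 4, 2, 3 and 5 seconds to process, the requested timecodes are 0, 4, 6 and 9. Everything in between is simply skipped, which keeps the audio roughly in step with real time at the cost of dropping frames.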

Here's the code: VidSpeak.zip (267.21 KB)
