Sample Project: Playing and Recording Sound via the Wave API
By Damon Chandler
 

< screen shot >

Download Demo and Sample Project --  WaveDemo.zip (145 KB)
 
 

    While it is possible to use the MCI (directly or via the TMediaPlayer component) to record wave files, using the Wave API offers superior control.  Most importantly, this latter method allows direct access to the sound buffer.  In this way, one is free to manipulate the actual sound data to add special effects, change the rate, or perform any other signal processing techniques.  This power comes at a cost however, as the Wave API is not nearly as straightforward to use as the MCI.  Indeed, there are several applications for which the use of the Wave API can be considered overkill.  Nonetheless, it is good to know that there are options beyond the MCI or the more complicated DirectSound interface. 

Before we begin, let us first examine the important Wave API structures.  Fortunately, unlike the Mixer control API, there is not a multitude of structures to memorize.  In fact, knowing only two is enough to get started.  First, the crucial WAVEHDR structure:

typedef struct wavehdr_tag
{
    LPSTR lpData; <--- long pointer to the actual buffer data

    DWORD dwBufferLength; <--- number of bytes of the data pointed to by lpData

    DWORD dwBytesRecorded; <--- number of bytes that have been 
                                 recorded (when used as an input buffer)

    DWORD dwUser; <--- extra 32-bit value (think of it as a Tag)

    DWORD dwFlags; <--- flags indicating the state of the buffer

    DWORD dwLoops; <--- value indicating the number of times 
                         the buffer is to be played

    struct wavehdr_tag *lpNext; <--- reserved; do not use

    DWORD reserved; <--- reserved; do not use

} WAVEHDR;
 
 

The WAVEHDR is perhaps the most important structure when working with wave audio.  Of its many data members, lpData and dwFlags are the most commonly used.  For the latter member, we're usually looking for the WHDR_PREPARED flag, which indicates that the header is currently prepared by the system, and the WHDR_DONE flag, which tells us that the driver is finished with the buffer and we can now safely access the actual bits.  Another vital structure, WAVEFORMATEX, is used to actually initialize the wave audio device:

typedef struct
{
    WORD wFormatTag; <--- constant indicating the format of the 
                           waveform-audio 

    WORD nChannels; <--- number of audio channels to use

    DWORD nSamplesPerSec; <--- sampling rate in hertz

    DWORD nAvgBytesPerSec; <--- average data transfer rate in bytes 
                                per second (format specific)

    WORD nBlockAlign; <--- block alignment that is specific to the 
                            format specified by the wFormatTag member

    WORD wBitsPerSample; <--- number of bits used to store each sample
                               (i.e., 8 or 16 for PCM)

    WORD cbSize; <--- size of any extra information (usually not used,
                       and not applicable to PCM format)

} WAVEFORMATEX;
 

The WAVEFORMATEX structure is used when initially opening a waveform-audio device.  The wFormatTag member indicates the format of the audio that is to be used.  Unless you're using a special format, this member is usually set to WAVE_FORMAT_PCM, indicating standard Pulse Code Modulation.  The nChannels member indicates whether the sound will be monaural (specify 1) or stereo (specify 2).  The nSamplesPerSec member indicates the sampling rate.  This value must be specified in hertz, not kilohertz, so use 44100 for CD-quality audio, which is sampled at 44.1 kHz.  Other typical values are 8000 (8.0 kHz), 11025 (11.025 kHz), and 22050 (22.05 kHz).  wBitsPerSample indicates how many bits will be used to represent each sample; the PCM format uses either 8 or 16.  The nAvgBytesPerSec and nBlockAlign members are format specific.  For the PCM format, use the following formulas to calculate these fields:
 

   nBlockAlign = (nChannels * wBitsPerSample) / 8
   nAvgBytesPerSec = (nSamplesPerSec * nBlockAlign)
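As a quick check of these two formulas, here is a minimal sketch that evaluates them in portable C++.  The PcmFormat struct and the MakePcmFormat() and BufferSeconds() helpers are hypothetical stand-ins invented for this illustration (they are not part of the Wave API), so the arithmetic can be tried without <mmsystem.h>.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the PCM-related fields of WAVEFORMATEX
// (not the real structure), used only to check the arithmetic.
struct PcmFormat
{
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint16_t wBitsPerSample;
    std::uint16_t nBlockAlign;
    std::uint32_t nAvgBytesPerSec;
};

// Fill in the two derived fields using the PCM formulas.
PcmFormat MakePcmFormat(std::uint16_t channels, std::uint32_t rate,
                        std::uint16_t bits)
{
    assert(bits == 8 || bits == 16);  // the PCM sample sizes discussed here
    PcmFormat f;
    f.nChannels = channels;
    f.nSamplesPerSec = rate;
    f.wBitsPerSample = bits;
    f.nBlockAlign = static_cast<std::uint16_t>((channels * bits) / 8);
    f.nAvgBytesPerSec = rate * f.nBlockAlign;
    return f;
}

// Seconds of audio that fit in a buffer of the given size.
double BufferSeconds(const PcmFormat &f, std::uint32_t bufferBytes)
{
    return static_cast<double>(bufferBytes) / f.nAvgBytesPerSec;
}
```

For CD-quality stereo, MakePcmFormat(2, 44100, 16) gives nBlockAlign = 4 and nAvgBytesPerSec = 176400, so a 1,000,000-byte buffer holds roughly 5.7 seconds of audio.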
 
 

Also important are the two types of handles: HWAVEIN and HWAVEOUT.  These are handles to input (record) and output (playback) devices, respectively. 
 

Now that all of the introductions are out of the way, let's actually start to write some wrapper functions.  First, let's do the header/data wrapper functions -- there are two of them.  The first allocates the memory and then sets the fields of the WAVEHDR structure, while the second simply frees the memory.  These are straightforward... 

//---------------------------------------------------------------------------

#include <windows.h>   // must come before mmsystem.h
#include <mmsystem.h>
/*********************************************************************\
*          WaveHeader (WAVEHDR) wrapper functions
\*********************************************************************/

bool WaveMakeHeader(unsigned long ulSize, HGLOBAL &HData, HGLOBAL &HWaveHdr,
    LPSTR &lpData, LPWAVEHDR &lpWaveHdr)
{
    HData = GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE, ulSize);
    if (!HData) return false;

    lpData = (LPSTR)GlobalLock(HData);
    if (!lpData)
    {
        GlobalFree(HData);
        return false;
    }

    HWaveHdr = GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE, sizeof(WAVEHDR));
    if (!HWaveHdr)
    {
        GlobalUnlock(HData);
        GlobalFree(HData);
        return false;
    }

    lpWaveHdr = (LPWAVEHDR)GlobalLock(HWaveHdr);
    if (!lpWaveHdr)
    {
        // the lock failed, so HWaveHdr only needs to be freed
        GlobalFree(HWaveHdr);
        GlobalUnlock(HData);
        GlobalFree(HData);
        return false;
    }

    // zero-out the WAVEHDR then assign the lpData member of the 
    // header to the allocated data buffer and the dwBufferLength
    // member of the header to the size of data block
    ZeroMemory(lpWaveHdr, sizeof(WAVEHDR));
    lpWaveHdr->lpData = lpData;
    lpWaveHdr->dwBufferLength = ulSize;

    return true;
}
 

void WaveFreeHeader(HGLOBAL &HData, HGLOBAL &HWaveHdr)
{
    GlobalUnlock(HWaveHdr);
    GlobalFree(HWaveHdr);
    GlobalUnlock(HData);
    GlobalFree(HData);
}

 


 
 

In case you're wondering, the HGLOBAL type is nothing more than a handle to a global memory object.  To get the actual pointer to this memory, use the GlobalLock() API function (don't forget GlobalUnlock()). 

Next come the waveIn*() wrapper functions.  You use the waveIn*() functions for recording, and the waveOut*() functions for playback.  The first function, WaveRecordOpen(), initializes the WAVEFORMATEX structure required by the waveInOpen() API call, which simply opens the wave recording device.  In this structure, you specify the parameters of the recording, including the sampling frequency, number of channels, bits per sample, and format...

//---------------------------------------------------------------------------

/*********************************************************************\
*               WaveIn (recording) wrapper functions
\*********************************************************************/

bool WaveRecordOpen(LPHWAVEIN lphwi, HWND Hwnd, int nChannels,
   long lFrequency, int nBits)
{
    WAVEFORMATEX wfx;
    wfx.wFormatTag = WAVE_FORMAT_PCM;
    wfx.nChannels = (WORD)nChannels;
    wfx.nSamplesPerSec = (DWORD)lFrequency;
    wfx.wBitsPerSample = (WORD)nBits;
    wfx.nBlockAlign = (WORD)((wfx.nChannels * wfx.wBitsPerSample) / 8);
    wfx.nAvgBytesPerSec = (wfx.nSamplesPerSec * wfx.nBlockAlign);
    wfx.cbSize = 0;

    // pass the window handle as a pointer-sized value so it is not truncated
    MMRESULT result = waveInOpen(lphwi, WAVE_MAPPER, &wfx, (DWORD_PTR)Hwnd, 0,
                                 CALLBACK_WINDOW);

    if (result == MMSYSERR_NOERROR) return true;
    return false;
}
 


 
 

This next function, WaveRecordBegin(), wraps the preparation (waveInPrepareHeader()) and queuing (waveInAddBuffer()) of the header (and data block) that the device driver will use to store the wave audio.  You can later manipulate this data as you wish.  Finally, the function starts the recording process...

//---------------------------------------------------------------------------

bool WaveRecordBegin(HWAVEIN hwi, LPWAVEHDR &lpWaveHdr)
{
    // prepare the header, queue the buffer, then start recording;
    // give up at the first call that fails
    MMRESULT result = waveInPrepareHeader(hwi, lpWaveHdr, sizeof(WAVEHDR));
    if (result != MMSYSERR_NOERROR) return false;

    result = waveInAddBuffer(hwi, lpWaveHdr, sizeof(WAVEHDR));
    if (result != MMSYSERR_NOERROR) return false;

    result = waveInStart(hwi);
    return (result == MMSYSERR_NOERROR);
}
 


 
 

The next two functions simply unprepare the header (at which point you are free to manipulate it -- transform it, filter it, copy it, etc.) and close the recording driver...

//---------------------------------------------------------------------------

void WaveRecordEnd(HWAVEIN hwi, LPWAVEHDR &lpWaveHdr)
{
    waveInStop(hwi);
    waveInReset(hwi);
    waveInUnprepareHeader(hwi, lpWaveHdr, sizeof(WAVEHDR)); 
}

void WaveRecordClose(HWAVEIN hwi)
{
    waveInReset(hwi);
    waveInClose(hwi);
}
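Once the header has been unprepared, the recorded samples can be processed in place.  As a rough illustration of the kind of manipulation this makes possible, here is a minimal sketch that applies a gain to 16-bit PCM data.  ApplyGain16() is a hypothetical helper written for this article, not part of the Wave API; it assumes the buffer holds 16-bit samples in the machine's native byte order, which on Windows means the little-endian layout that PCM uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scale 16-bit PCM samples in place by `gain`, clamping the result so that
// loud samples saturate instead of wrapping around.
// `data` and `bytes` would typically be lpWaveHdr->lpData and
// lpWaveHdr->dwBytesRecorded.
void ApplyGain16(char *data, std::size_t bytes, double gain)
{
    assert(bytes % 2 == 0);  // whole 16-bit samples only
    for (std::size_t i = 0; i + 1 < bytes; i += 2)
    {
        std::int16_t sample;
        std::memcpy(&sample, data + i, sizeof(sample));

        double scaled = sample * gain;
        if (scaled > 32767.0) scaled = 32767.0;
        if (scaled < -32768.0) scaled = -32768.0;

        sample = static_cast<std::int16_t>(scaled);
        std::memcpy(data + i, &sample, sizeof(sample));
    }
}
```

A call such as ApplyGain16(lpWaveHdr->lpData, lpWaveHdr->dwBytesRecorded, 0.5) would halve the amplitude of a 16-bit recording.  Note that 8-bit PCM samples are unsigned and centered on 128, so they would need different handling.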
 


 
 

Finally come the waveOut*() playback wrapper functions.  These are very similar to their recording counterparts.  The first function opens the playback device; the second prepares the header and then plays the data using the waveOutWrite() API function.  The last two functions unprepare the header and close the device...

//---------------------------------------------------------------------------

/*********************************************************************\
*               WaveOut (playback) wrapper functions
\*********************************************************************/

bool WavePlayOpen(LPHWAVEOUT lphwo, HWND Hwnd, int nChannels,
   long lFrequency, int nBits)
{
    WAVEFORMATEX wfx;
    wfx.wFormatTag = WAVE_FORMAT_PCM;
    wfx.nChannels = (WORD)nChannels;
    wfx.nSamplesPerSec = (DWORD)lFrequency;
    wfx.wBitsPerSample = (WORD)nBits;
    wfx.nBlockAlign = (WORD)((wfx.nChannels * wfx.wBitsPerSample) / 8);
    wfx.nAvgBytesPerSec = (wfx.nSamplesPerSec * wfx.nBlockAlign);
    wfx.cbSize = 0;

    // pass the window handle as a pointer-sized value so it is not truncated
    MMRESULT result = waveOutOpen(lphwo, WAVE_MAPPER, &wfx, (DWORD_PTR)Hwnd, 0,
                                  CALLBACK_WINDOW);

    if (result == MMSYSERR_NOERROR) return true;
    return false;
}

bool WavePlayBegin(HWAVEOUT hwo, LPWAVEHDR &lpWaveHdr)
{
    // prepare the header, then hand the buffer to the driver for playback
    MMRESULT result = waveOutPrepareHeader(hwo, lpWaveHdr, sizeof(WAVEHDR));
    if (result != MMSYSERR_NOERROR) return false;

    result = waveOutWrite(hwo, lpWaveHdr, sizeof(WAVEHDR));
    return (result == MMSYSERR_NOERROR);
}

void WavePlayEnd(HWAVEOUT hwo, LPWAVEHDR &lpWaveHdr)
{
    waveOutReset(hwo);
    waveOutUnprepareHeader(hwo, lpWaveHdr, sizeof(WAVEHDR));
}

void WavePlayClose(HWAVEOUT hwo)
{
    waveOutReset(hwo);
    waveOutClose(hwo);
}
 


 
 

That's it for the wrapper functions.  You may have noticed that the WaveRecordOpen() and WavePlayOpen() functions take an HWND as a parameter.  The window whose handle you pass into these functions receives the MM_WIM_DATA message when the driver is finished with a recording buffer, and the MM_WOM_DONE message when playback of a buffer is complete.  To take advantage of these messages, set up message handlers using the BEGIN_MESSAGE_MAP() / END_MESSAGE_MAP() scheme.  Here are some members to add to the header of your form...
 

//---------------------------------------------------------------------------

// in header...

    unsigned long ulSize;
    bool FAllocated;
    bool FRecording;
    bool FPlaying; 
    HWAVEIN hwi;
    HWAVEOUT hwo; 
    HGLOBAL HData, HWaveHdr;
    LPSTR lpData;
    LPWAVEHDR lpWaveHdr;

    void __fastcall MMWimData(TMessage &Msg);
    void __fastcall MMWomDone(TMessage &Msg); 

BEGIN_MESSAGE_MAP
    MESSAGE_HANDLER(MM_WIM_DATA, TMessage, MMWimData)
    MESSAGE_HANDLER(MM_WOM_DONE, TMessage, MMWomDone)
END_MESSAGE_MAP(TForm)

 


 
 

Next, implement your application using an approach similar to this example...

//---------------------------------------------------------------------------

// in source...
/*********************************************************************\
*               Example usage...
\*********************************************************************/

__fastcall TForm1::TForm1(TComponent* Owner)
    : TForm(Owner)
{
    ulSize = 1000000;  // size of data block (let's use 1 meg)
    FAllocated = false;
    FRecording = false;
    FPlaying = false;

    TrackBar1->Max = 0xFFFF;
    TrackBar1->Frequency = 1000;

    DWORD current_volume;
    waveOutGetVolume(0, &current_volume);
    TrackBar1->Position = TrackBar1->Max - LOWORD(current_volume);
}

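__fastcall TForm1::~TForm1()
{
     // stop and close any active device before freeing the buffer it uses
     if (FRecording)
     {
         WaveRecordEnd(hwi, lpWaveHdr);
         WaveRecordClose(hwi);
     }
     if (FPlaying)
     {
         WavePlayEnd(hwo, lpWaveHdr);
         WavePlayClose(hwo);
     }
     if (FAllocated) WaveFreeHeader(HData, HWaveHdr);
}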

void __fastcall TForm1::RecordButtonClick(TObject *Sender)
{
    // open recording driver...
    if (WaveRecordOpen(&hwi, Handle, 2, 44100, 16))
    {
        // allocate buffer / header
        if (WaveMakeHeader(ulSize, HData, HWaveHdr, lpData, lpWaveHdr))
        {
            FAllocated = true;

            // start recording...
            if (WaveRecordBegin(hwi, lpWaveHdr))
            {
                FRecording = true;
                Caption = "Recording...";
            }
        }
    }
}

void __fastcall TForm1::StopButtonClick(TObject *Sender)
{
    if (FRecording)
    {
        WaveRecordEnd(hwi, lpWaveHdr);
        FRecording = false;
    }
    if (FPlaying)
    {
        WavePlayEnd(hwo, lpWaveHdr);
        FPlaying = false;
    }
}

void __fastcall TForm1::TrackBar1Change(TObject *Sender)
{
    DWORD new_volume = 
        (DWORD)MAKEWPARAM(TrackBar1->Max - TrackBar1->Position,
                          TrackBar1->Max - TrackBar1->Position);
    waveOutSetVolume((HWAVEOUT)WAVE_MAPPER, new_volume);
}
 

// These next Form methods are the message handlers.  The first is the MM_WIM_DATA
// message handler.  This message is sent when the driver is finished with a
// buffer: either the StopButton was clicked (recording terminated) or the buffer
// has been filled.  So for example, once the 1 meg that we allocated is full,
// your message handler will fire.  If you can't allocate enough memory in one
// shot, you can use this handler to feed another buffer to the driver.  In this
// example, we simply terminate recording, then initiate playback.  The
// MM_WOM_DONE message handler is called when the waveform is done playing...

void __fastcall TForm1::MMWimData(TMessage &Msg)
{
    Caption = "DONE RECORDING";
    if (FRecording)
    {
        // the buffer filled up on its own; stop and unprepare the header
        WaveRecordEnd(hwi, lpWaveHdr);
        FRecording = false;
    }
    WaveRecordClose(hwi);

    if (WavePlayOpen(&hwo, Handle, 2, 44100, 16))
    {
        if (WavePlayBegin(hwo, lpWaveHdr))
        {
             FPlaying = true;
             Caption = "Playing...";
        }
    }
}

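void __fastcall TForm1::MMWomDone(TMessage &Msg)
{
    Caption = "DONE PLAYING";
    if (FPlaying)
    {
        // playback finished on its own; unprepare the header
        WavePlayEnd(hwo, lpWaveHdr);
        FPlaying = false;
    }
    WavePlayClose(hwo);
}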
 


 
 

As you may have noticed in the above code, I added some volume functionality with a TTrackBar component, which is used simply to adjust the playback volume.  The crucial functions here are waveOutGetVolume() and waveOutSetVolume().
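The volume value used by these two functions packs both channels into a single 32-bit number: the low-order word holds the left-channel level and the high-order word holds the right-channel level, each ranging from 0x0000 (silence) to 0xFFFF (full volume).  The following sketch shows that packing in portable C++.  PackVolume(), LeftVolume(), RightVolume(), and LevelFromTrackBar() are hypothetical helpers made up for illustration; the last one assumes a vertical trackbar whose top position is the loudest, as in the example above.

```cpp
#include <cassert>
#include <cstdint>

// Combine left and right levels into the 32-bit layout described above:
// low-order word = left channel, high-order word = right channel.
std::uint32_t PackVolume(std::uint16_t left, std::uint16_t right)
{
    return static_cast<std::uint32_t>(left) |
           (static_cast<std::uint32_t>(right) << 16);
}

// Extract the individual channel levels again.
std::uint16_t LeftVolume(std::uint32_t volume)
{
    return static_cast<std::uint16_t>(volume & 0xFFFF);
}

std::uint16_t RightVolume(std::uint32_t volume)
{
    return static_cast<std::uint16_t>(volume >> 16);
}

// Convert a trackbar position (0 at the top) into a level, so that the top
// of the trackbar is the loudest setting.  `max` is expected to be 0xFFFF
// or less so the result fits in 16 bits.
std::uint16_t LevelFromTrackBar(int position, int max)
{
    assert(max > 0 && max <= 0xFFFF);
    assert(position >= 0 && position <= max);
    return static_cast<std::uint16_t>(max - position);
}
```

With TrackBar1->Max set to 0xFFFF, a position of 0 maps to 0xFFFF (full volume) and a position of 0xFFFF maps to 0 (silence).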

The main advantage of using the Waveform API over the MCI is that you have direct access to the data, and can record large amounts of data.  The disadvantage is that it's a little lower level, and it is easier to hang your system since you're not going through the Media Control Interface (MCI).  Further, you'll have to use the multimedia I/O API to write the data out to file (or do it manually), which is not nearly as simple as using the MCI or TMediaPlayer. 
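To give a feel for the "do it manually" route, here is a minimal sketch that writes raw PCM data to a .wav file using only the standard C++ library.  MakeWavHeader() and WriteWavFile() are hypothetical helpers written for this illustration (they are not part of the multimedia I/O API), and the sketch assumes the simplest layout: a 44-byte header followed by uncompressed PCM data of even length.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <vector>

// Append the low `bytes` bytes of `value` in little-endian order.
static void PutLE(std::vector<char> &out, std::uint32_t value, int bytes)
{
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Append a four-character chunk identifier.
static void PutTag(std::vector<char> &out, const char *tag)
{
    out.insert(out.end(), tag, tag + 4);
}

// Build the 44-byte header of a simple PCM wave file.
std::vector<char> MakeWavHeader(std::uint16_t channels, std::uint32_t rate,
                                std::uint16_t bits, std::uint32_t dataBytes)
{
    const std::uint16_t blockAlign =
        static_cast<std::uint16_t>((channels * bits) / 8);
    std::vector<char> h;
    PutTag(h, "RIFF"); PutLE(h, 36 + dataBytes, 4);  // file size minus 8
    PutTag(h, "WAVE");
    PutTag(h, "fmt "); PutLE(h, 16, 4);              // size of the PCM format chunk
    PutLE(h, 1, 2);                                  // format tag 1 = PCM
    PutLE(h, channels, 2);
    PutLE(h, rate, 4);
    PutLE(h, rate * blockAlign, 4);                  // average bytes per second
    PutLE(h, blockAlign, 2);
    PutLE(h, bits, 2);
    PutTag(h, "data"); PutLE(h, dataBytes, 4);
    assert(h.size() == 44);
    return h;
}

// Write the header followed by the sample data; returns false on failure.
bool WriteWavFile(const char *path, const char *data, std::uint32_t dataBytes,
                  std::uint16_t channels, std::uint32_t rate, std::uint16_t bits)
{
    std::ofstream file(path, std::ios::binary);
    if (!file) return false;
    const std::vector<char> header = MakeWavHeader(channels, rate, bits, dataBytes);
    file.write(header.data(), static_cast<std::streamsize>(header.size()));
    file.write(data, static_cast<std::streamsize>(dataBytes));
    return file.good();
}
```

In the example above, a call along the lines of WriteWavFile("recording.wav", lpWaveHdr->lpData, lpWaveHdr->dwBytesRecorded, 2, 44100, 16) after recording has ended would save the captured audio; the file name here is only a placeholder.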
 
 
