Sample Project: Playing and Recording Sound via the Wave API
By Damon Chandler

Download Demo and Sample Project -- WaveDemo.zip (145 KB)
While it is possible to use the MCI (directly or via the TMediaPlayer component) to record wave files, using the Wave API offers superior control. Most importantly, this latter method allows direct access to the sound buffer. In this way, one is free to manipulate the actual sound data to add special effects, change the rate, or perform any other signal-processing techniques. This power comes at a cost, however, as the Wave API is not nearly as straightforward to use as the MCI. Indeed, there are several applications for which the use of the Wave API can be considered overkill. Nonetheless, it is good to know that there are options beyond the MCI and the more complicated DirectSound interface.
Before we begin, let us first examine the important Wave API structures. Fortunately, unlike the Mixer control API, there are not a multitude of structures to memorize. In fact, knowing only two is enough to get started. First, the crucial WAVEHDR structure:
typedef struct wavehdr_tag
{
   LPSTR lpData;               // long pointer to the actual buffer data
   DWORD dwBufferLength;       // number of bytes of the data pointed to by lpData
   DWORD dwBytesRecorded;      // number of bytes that have been recorded
                               //   (when used as an input buffer)
   DWORD dwUser;               // extra 32-bit value (think of it as a Tag)
   DWORD dwFlags;              // flags indicating the state of the buffer
   DWORD dwLoops;              // value indicating the number of times
                               //   the buffer is to be played
   struct wavehdr_tag *lpNext; // reserved; do not use
   DWORD reserved;             // reserved; do not use
} WAVEHDR;
The WAVEHDR is perhaps the most important structure when working with wave audio. Of its many data members, lpData and dwFlags are the most frequently used. For the latter member, we're usually looking for the WHDR_PREPARED flag, indicating that the header has been properly prepared (or unprepared) by the system, and the WHDR_DONE flag, telling us that we can now safely access the actual bits. Another vital structure, WAVEFORMATEX, is used to actually initialize the wave audio device:
typedef struct
{
   WORD  wFormatTag;      // constant indicating the format of the waveform-audio
   WORD  nChannels;       // number of audio channels to use
   DWORD nSamplesPerSec;  // sampling rate in hertz
   DWORD nAvgBytesPerSec; // average data transfer rate in bytes per second
                          //   (format specific)
   WORD  nBlockAlign;     // block alignment that is specific to the format
                          //   specified by the wFormatTag member
   WORD  wBitsPerSample;  // number of bits used to store each sample
                          //   (i.e., 8 or 16 for PCM)
   WORD  cbSize;          // size of any extra information (usually not used,
                          //   and not applicable to PCM format)
} WAVEFORMATEX;
The WAVEFORMATEX structure is used when initially opening a waveform-audio device. The wFormatTag member indicates the actual format of the audio that is to be used. Unless you're using a special format, this member is usually set to WAVE_FORMAT_PCM, indicating standard Pulse Code Modulation. The nChannels member indicates whether the sound will be monaural (specify 1) or stereo (specify 2). The nSamplesPerSec member indicates the sampling rate, specified in hertz (not kHz). For example, CD-quality audio is usually sampled at 44.1 kHz, so use 44100; other typical values are 8000, 11025, and 22050. wBitsPerSample indicates how many bits will be used to represent each sample; the PCM format uses either 8 or 16. The nAvgBytesPerSec and nBlockAlign members are format specific. For the PCM format, use the following formulas to calculate these fields:

nBlockAlign = (nChannels * wBitsPerSample) / 8
nAvgBytesPerSec = (nSamplesPerSec * nBlockAlign)
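For instance, to describe CD-quality (44.1 kHz, 16-bit, stereo) PCM audio, you might fill in the structure as follows -- just a quick sketch using the formulas above (wfx is an arbitrary variable name):

// a quick sketch: CD-quality, 16-bit, stereo PCM
WAVEFORMATEX wfx;
wfx.wFormatTag      = WAVE_FORMAT_PCM; // standard Pulse Code Modulation
wfx.nChannels       = 2;               // stereo
wfx.nSamplesPerSec  = 44100;           // 44.1 kHz, specified in hertz
wfx.wBitsPerSample  = 16;              // 16 bits per sample
wfx.nBlockAlign     = (WORD)((wfx.nChannels * wfx.wBitsPerSample) / 8); // = 4
wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;             // = 176400
wfx.cbSize          = 0;               // no extra format data for PCM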
Also important are the two types of handles: HWAVEIN and HWAVEOUT. These are handles to input (record) and output (playback) devices, respectively.
Now that all of the introductions are out of the way, let's actually start to write some wrapper functions. First, let's do the header/data wrapper functions -- there are two of them. The first allocates the memory and then sets the fields of the WAVEHDR structure, while the second simply frees the memory. These are straightforward...
//---------------------------------------------------------------------------
#include <mmsystem.h>

/*********************************************************************\
*  WaveHeader (WAVEHDR) wrapper functions
\*********************************************************************/
bool WaveMakeHeader(unsigned long ulSize, HGLOBAL &HData, HGLOBAL &HWaveHdr,
                    LPSTR &lpData, LPWAVEHDR &lpWaveHdr)
{
   HData = GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE, ulSize);
   if (!HData) return false;

   lpData = (LPSTR)GlobalLock(HData);
   if (!lpData)
   {
      GlobalFree(HData);
      return false;
   }

   HWaveHdr = GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE, sizeof(WAVEHDR));
   if (!HWaveHdr)
   {
      GlobalUnlock(HData);
      GlobalFree(HData);
      return false;
   }

   lpWaveHdr = (LPWAVEHDR)GlobalLock(HWaveHdr);
   if (!lpWaveHdr)
   {
      GlobalUnlock(HWaveHdr);
      GlobalFree(HWaveHdr);
      GlobalUnlock(HData);
      GlobalFree(HData);
      return false;
   }

   // zero-out the WAVEHDR then assign the lpData member of the
   // header to the allocated data buffer and the dwBufferLength
   // member of the header to the size of the data block
   ZeroMemory(lpWaveHdr, sizeof(WAVEHDR));
   lpWaveHdr->lpData = lpData;
   lpWaveHdr->dwBufferLength = ulSize;
   return true;
}

void WaveFreeHeader(HGLOBAL &HData, HGLOBAL &HWaveHdr)
{
   GlobalUnlock(HWaveHdr);
   GlobalFree(HWaveHdr);
   GlobalUnlock(HData);
   GlobalFree(HData);
}
In case you're wondering, the HGLOBAL type is nothing more than a handle to a global memory object. To get the actual pointer to this memory, use the GlobalLock() API function (and don't forget GlobalUnlock()).
Next come the waveIn*() wrapper functions. You use the waveIn*() functions for recording, and the waveOut*() functions for playback. The first function, WaveRecordOpen(), initializes the WAVEFORMATEX structure required for the waveInOpen() API call, which simply opens the wave recording device. In this structure, you specify the parameters of the recording, including the sampling frequency, number of channels, bits per sample, format, etc...
//---------------------------------------------------------------------------
/*********************************************************************\
*  WaveIn (recording) wrapper functions
\*********************************************************************/
bool WaveRecordOpen(LPHWAVEIN lphwi, HWND Hwnd, int nChannels,
                    long lFrequency, int nBits)
{
   WAVEFORMATEX wfx;
   wfx.wFormatTag = WAVE_FORMAT_PCM;
   wfx.nChannels = (WORD)nChannels;
   wfx.nSamplesPerSec = (DWORD)lFrequency;
   wfx.wBitsPerSample = (WORD)nBits;
   wfx.nBlockAlign = (WORD)((wfx.nChannels * wfx.wBitsPerSample) / 8);
   wfx.nAvgBytesPerSec = (wfx.nSamplesPerSec * wfx.nBlockAlign);
   wfx.cbSize = 0;

   MMRESULT result = waveInOpen(lphwi, WAVE_MAPPER, &wfx, (LONG)Hwnd,
                                NULL, CALLBACK_WINDOW);
   if (result == MMSYSERR_NOERROR) return true;
   return false;
}
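These wrappers simply return false on failure. If you'd like a human-readable description of what went wrong, one option (not part of the demo project) is to pass the MMRESULT to waveInGetErrorText() before bailing out. For example, the tail end of WaveRecordOpen() could be reworked along these lines:

// a sketch of reporting the error text (not part of the demo project)
MMRESULT result = waveInOpen(lphwi, WAVE_MAPPER, &wfx, (LONG)Hwnd,
                             NULL, CALLBACK_WINDOW);
if (result != MMSYSERR_NOERROR)
{
   char szError[MAXERRORLENGTH];
   waveInGetErrorText(result, szError, MAXERRORLENGTH);
   MessageBox(NULL, szError, "waveInOpen failed", MB_OK | MB_ICONERROR);
   return false;
}
return true;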
This next function, WaveRecordBegin(), wraps the preparation (waveInPrepareHeader()) and queuing (waveInAddBuffer()) of the header (and data block) that the device driver will use to store the wave audio. You can later manipulate this data as you wish. Finally, the function starts the recording process...
//---------------------------------------------------------------------------
bool WaveRecordBegin(HWAVEIN hwi, LPWAVEHDR &lpWaveHdr)
{
   MMRESULT result = waveInPrepareHeader(hwi, lpWaveHdr, sizeof(WAVEHDR));
   if (result == MMSYSERR_NOERROR)
   {
      result = waveInAddBuffer(hwi, lpWaveHdr, sizeof(WAVEHDR));
      if (result == MMSYSERR_NOERROR)
      {
         result = waveInStart(hwi);
         if (result == MMSYSERR_NOERROR) return true;
      }
   }
   return false;
}
The next two functions simply unprepare the header (at which point you are free to manipulate the recorded data -- transform it, filter it, copy it, etc.; a small sketch of this follows the listing below) and close the recording driver...
//---------------------------------------------------------------------------
void WaveRecordEnd(HWAVEIN hwi, LPWAVEHDR &lpWaveHdr)
{
   waveInStop(hwi);
   waveInReset(hwi);
   waveInUnprepareHeader(hwi, lpWaveHdr, sizeof(WAVEHDR));
}

void WaveRecordClose(HWAVEIN hwi)
{
   waveInReset(hwi);
   waveInClose(hwi);
}
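As mentioned above, once the header has been unprepared (via WaveRecordEnd()), you're free to touch the recorded bytes directly through lpData. As a simple illustration, here's a sketch that halves the amplitude of 16-bit PCM data (the function name is hypothetical, and it assumes the device was opened with nBits set to 16):

// a sketch: attenuate recorded 16-bit PCM samples by half (hypothetical helper)
void HalveVolume(LPWAVEHDR lpWaveHdr)
{
   // dwBytesRecorded holds the number of valid bytes captured by the driver
   short* pSamples = (short*)lpWaveHdr->lpData;
   unsigned long nSamples = lpWaveHdr->dwBytesRecorded / sizeof(short);
   for (unsigned long i = 0; i < nSamples; ++i)
   {
      pSamples[i] /= 2; // scale each sample toward silence
   }
}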
Finally come the waveOut*() playback wrapper functions. These are very similar to their recording counterparts. The first function opens the playback device; the second function prepares the header and then plays the data using the waveOutWrite() API function. The last two functions unprepare the header and then close the device...
//---------------------------------------------------------------------------
/*********************************************************************\
*  WaveOut (playback) wrapper functions
\*********************************************************************/
bool WavePlayOpen(LPHWAVEOUT lphwo, HWND Hwnd, int nChannels,
                  long lFrequency, int nBits)
{
   WAVEFORMATEX wfx;
   wfx.wFormatTag = WAVE_FORMAT_PCM;
   wfx.nChannels = (WORD)nChannels;
   wfx.nSamplesPerSec = (DWORD)lFrequency;
   wfx.wBitsPerSample = (WORD)nBits;
   wfx.nBlockAlign = (WORD)((wfx.nChannels * wfx.wBitsPerSample) / 8);
   wfx.nAvgBytesPerSec = (wfx.nSamplesPerSec * wfx.nBlockAlign);
   wfx.cbSize = 0;

   MMRESULT result = waveOutOpen(lphwo, WAVE_MAPPER, &wfx, (LONG)Hwnd,
                                 NULL, CALLBACK_WINDOW);
   if (result == MMSYSERR_NOERROR) return true;
   return false;
}

bool WavePlayBegin(HWAVEOUT hwo, LPWAVEHDR &lpWaveHdr)
{
   MMRESULT result = waveOutPrepareHeader(hwo, lpWaveHdr, sizeof(WAVEHDR));
   if (result == MMSYSERR_NOERROR)
   {
      result = waveOutWrite(hwo, lpWaveHdr, sizeof(WAVEHDR));
      if (result == MMSYSERR_NOERROR) return true;
   }
   return false;
}

void WavePlayEnd(HWAVEOUT hwo, LPWAVEHDR &lpWaveHdr)
{
   waveOutReset(hwo);
   waveOutUnprepareHeader(hwo, lpWaveHdr, sizeof(WAVEHDR));
}

void WavePlayClose(HWAVEOUT hwo)
{
   waveOutReset(hwo);
   waveOutClose(hwo);
}
That's it for the wrapper functions. Notice that the WaveRecordOpen() and WavePlayOpen() functions take an HWND as a parameter. This allows whichever window's handle you pass into these functions to receive the MM_WIM_DATA message for recording status, and the MM_WOM_DONE message for playback completion notification. To take advantage of these messages, set up message handlers using the BEGIN_MESSAGE_MAP / END_MESSAGE_MAP scheme. Here are some members to add to the header of your form...
//---------------------------------------------------------------------------
// in header...
unsigned long ulSize;
bool FAllocated;
bool FRecording;
bool FPlaying;
HWAVEIN hwi;
HWAVEOUT hwo;
HGLOBAL HData, HWaveHdr;
LPSTR lpData;
LPWAVEHDR lpWaveHdr;
void __fastcall MMWimData(TMessage &Msg);
void __fastcall MMWomDone(TMessage &Msg);
BEGIN_MESSAGE_MAP
   MESSAGE_HANDLER(MM_WIM_DATA, TMessage, MMWimData)
   MESSAGE_HANDLER(MM_WOM_DONE, TMessage, MMWomDone)
END_MESSAGE_MAP(TForm)
Next, implement your application using an approach similar to this example...
//---------------------------------------------------------------------------
// in source...
/*********************************************************************\
*  Example usage...
\*********************************************************************/
__fastcall TForm1::TForm1(TComponent* Owner)
   : TForm(Owner)
{
   ulSize = 1000000; // size of data block (let's use 1 meg)
   FAllocated = false;
   FRecording = false;
   FPlaying = false;

   TrackBar1->Max = 0xFFFF;
   TrackBar1->Frequency = 1000;
   DWORD current_volume;
   waveOutGetVolume(0, &current_volume);
   TrackBar1->Position = TrackBar1->Max - LOWORD(current_volume);
}

__fastcall TForm1::~TForm1()
{
   if (FAllocated) WaveFreeHeader(HData, HWaveHdr);
   if (FRecording) WaveRecordClose(hwi);
   if (FPlaying) WavePlayClose(hwo);
}

void __fastcall TForm1::RecordButtonClick(TObject *Sender)
{
   // open recording driver...
   if (WaveRecordOpen(&hwi, Handle, 2, 44100, 16))
   {
      // allocate buffer / header
      if (WaveMakeHeader(ulSize, HData, HWaveHdr, lpData, lpWaveHdr))
      {
         FAllocated = true;
         // start recording...
         if (WaveRecordBegin(hwi, lpWaveHdr))
         {
            FRecording = true;
            Caption = "Recording...";
         }
      }
   }
}

void __fastcall TForm1::StopButtonClick(TObject *Sender)
{
   if (FRecording)
   {
      WaveRecordEnd(hwi, lpWaveHdr);
      FRecording = false;
   }
   if (FPlaying)
   {
      WavePlayEnd(hwo, lpWaveHdr);
      FPlaying = false;
   }
}

void __fastcall TForm1::TrackBar1Change(TObject *Sender)
{
   DWORD new_volume =
      (DWORD)MAKEWPARAM(TrackBar1->Max - TrackBar1->Position,
                        TrackBar1->Max - TrackBar1->Position);
   waveOutSetVolume((HWAVEOUT)WAVE_MAPPER, new_volume);
}

// These next Form methods are the message handlers. The first is the
// MM_WIM_DATA message handler. This message is sent when either the
// StopButton is clicked (recording terminated) or the device needs more
// data. So, for example, if the 1 meg that we allocated is filled, your
// message handler will fire. You can see that if you can't allocate
// enough memory in one shot, then you can use this handler to feed more
// memory to the driver. In this example, we simply terminate recording,
// then initiate playback. The MM_WOM_DONE message handler is called
// when the waveform is done playing...
void __fastcall TForm1::MMWimData(TMessage &Msg)
{
   Caption = "DONE RECORDING";
   if (FRecording) WaveRecordEnd(hwi, lpWaveHdr);
   FRecording = false;
   WaveRecordClose(hwi);
   if (WavePlayOpen(&hwo, Handle, 2, 44100, 16))
   {
      if (WavePlayBegin(hwo, lpWaveHdr))
      {
         FPlaying = true;
         Caption = "Playing...";
      }
   }
}

void __fastcall TForm1::MMWomDone(TMessage &Msg)
{
   Caption = "DONE PLAYING";
   FPlaying = false;
   WavePlayClose(hwo);
}
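Incidentally, if you want continuous recording rather than a single one-meg grab, the MM_WIM_DATA handler is the place to recycle buffers: the message's LParam carries a pointer to the WAVEHDR that the driver just filled, so you can unprepare it, save or process its contents, and then prepare and re-add it (or another buffer) to keep the driver fed. Here's a rough sketch of that idea -- it is not how the demo project behaves:

// a rough sketch of recycling the buffer for continuous recording
// (not how the demo project behaves)
void __fastcall TForm1::MMWimData(TMessage &Msg)
{
   // LParam points to the WAVEHDR that the driver has just filled
   LPWAVEHDR lpHdr = (LPWAVEHDR)Msg.LParam;
   waveInUnprepareHeader(hwi, lpHdr, sizeof(WAVEHDR));

   // ...copy or process lpHdr->lpData (dwBytesRecorded bytes) here...

   if (FRecording)
   {
      // hand the same buffer back to the driver to continue recording
      waveInPrepareHeader(hwi, lpHdr, sizeof(WAVEHDR));
      waveInAddBuffer(hwi, lpHdr, sizeof(WAVEHDR));
   }
}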
Notice in the example code that I also added some volume functionality with a TTrackBar component; it's simply used to manipulate the playback volume. The crucial functions here are waveOutGetVolume() and waveOutSetVolume().
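Note that the volume value is a packed DWORD: the low-order word holds the left-channel level and the high-order word holds the right-channel level (0x0000 is silence, 0xFFFF is full volume); devices that don't support independent left/right control just use the low-order word. For example, to set the channels separately you could do something like this:

// set the left channel to half volume and the right channel to full volume
DWORD dwVolume = (DWORD)MAKELONG(0x7FFF /* left */, 0xFFFF /* right */);
waveOutSetVolume((HWAVEOUT)WAVE_MAPPER, dwVolume);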
The main advantage of using the Wave API over the MCI is that you have direct access to the data, and you can record large amounts of data. The disadvantage is that it's a little lower level, and it's easier to halt your system since you're not going through the Media Control Interface (MCI). Further, you'll have to use the multimedia I/O API to write the data out to a file (or do it manually) -- not nearly as simple as with the MCI or TMediaPlayer.
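For reference, here's a rough sketch of what writing the recorded PCM data to a .wav file with the multimedia I/O (mmio*()) functions might look like. This isn't part of the demo project and error checking is omitted; the function name is hypothetical, wfx is assumed to be the same WAVEFORMATEX used to open the device, and dwBytes would be lpWaveHdr->dwBytesRecorded:

// a rough sketch of saving PCM data with the multimedia I/O API
// (hypothetical helper; not part of the demo project)
void WaveSaveToFile(char* szFileName, const WAVEFORMATEX& wfx,
                    LPSTR lpData, DWORD dwBytes)
{
   HMMIO hmmio = mmioOpen(szFileName, NULL, MMIO_CREATE | MMIO_WRITE);

   // the outer 'RIFF' chunk of form type 'WAVE'
   MMCKINFO ckRIFF = {0};
   ckRIFF.fccType = mmioFOURCC('W', 'A', 'V', 'E');
   mmioCreateChunk(hmmio, &ckRIFF, MMIO_CREATERIFF);

   // the 'fmt ' chunk holds the format description
   // (a plain PCM header is just the first 16 bytes of the WAVEFORMATEX)
   MMCKINFO ckFmt = {0};
   ckFmt.ckid = mmioFOURCC('f', 'm', 't', ' ');
   mmioCreateChunk(hmmio, &ckFmt, 0);
   mmioWrite(hmmio, (char*)&wfx, sizeof(PCMWAVEFORMAT));
   mmioAscend(hmmio, &ckFmt, 0);

   // the 'data' chunk holds the raw samples
   MMCKINFO ckData = {0};
   ckData.ckid = mmioFOURCC('d', 'a', 't', 'a');
   mmioCreateChunk(hmmio, &ckData, 0);
   mmioWrite(hmmio, lpData, dwBytes);
   mmioAscend(hmmio, &ckData, 0);

   mmioAscend(hmmio, &ckRIFF, 0); // fix up the RIFF chunk size
   mmioClose(hmmio, 0);
}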
Download Demo and Sample Project -- WaveDemo.zip (145 KB)