The ultimate objective of this project is to develop Embedded Shruti, the next-generation text-to-speech software for regional Indian languages such as Hindi and Bengali. The keyword here is "next generation".

Date	31.07.2017

BOOL WriteFile(
    HANDLE hFile,
    LPCVOID lpBuffer,
    DWORD nNumberOfBytesToWrite,
    LPDWORD lpNumberOfBytesWritten,
    LPOVERLAPPED lpOverlapped );

Parameters

hFile

Handle to the file to be written to. The file handle must have been created with GENERIC_WRITE access to the file.

lpBuffer

Pointer to the buffer containing the data to be written to the file.

nNumberOfBytesToWrite

Number of bytes to write to the file.

A value of zero specifies a null write operation. A null write operation does not write any bytes but does cause the time stamp to change. WriteFile does not truncate or extend the file. To truncate or extend a file, use the SetEndOfFile function.

Named pipe write operations across a network are limited to 65,535 bytes.

lpNumberOfBytesWritten

Pointer to the number of bytes written by this function call. WriteFile sets this value to zero before doing any work or error checking.

lpOverlapped

Unsupported; set to NULL.

Return Values

Nonzero indicates success. Zero indicates failure. To get extended error information, call GetLastError.

Remarks

If part of the file is locked by another process and the write operation overlaps the locked portion, this function fails.

Accessing the output buffer while a write operation is using the buffer may lead to corruption of the data written from that buffer. Applications must not read from, write to, reallocate, or free the output buffer that a write operation is using until the write operation completes.

Once the basic Windows CE operations for creating, reading and writing files are understood, the crude way of porting is to replace all the C file operations (fopen, fread, fwrite, fclose and fseek) with code built on CreateFile, ReadFile and WriteFile.

The following diagram shows the file operations done in the hindianalyser native code which are replaced by CreateFile, ReadFile and WriteFile operations to make it compatible with Windows CE API.

As discussed above, clicking the Analyse button takes the input, invokes the hindianalyser DLL and returns the number of tokens written to the tokens file.

Each file operation is now implemented using the Windows CE API. After this stage the tokens are saved on disc in a file named Tokens.txt. Each entry in the file has this form:

The token name: for example, the token name can be 0704, which is the name of the sound file corresponding to this token; the sound file itself is retrieved in the hindiengine phase.
The token type: specifies whether the token is a vowel or a consonant; the sound generation algorithm works on that basis.

Please look at the references to know the algorithms used in Shruti which are also used in the Embedded version.

Another important difference between the Win32 API and the Windows CE API lies in memory allocation. In this port, C-style allocation with malloc is replaced by LocalAlloc:

char *s;
s = (char *)LocalAlloc(LMEM_FIXED, cBytes);

LocalAlloc allocates cBytes bytes on the local heap and returns a pointer, stored here in s; the block is later released with LocalFree. The local heap always exists by default in Windows CE, but developers can also create heaps of their own to write more memory-efficient code.

3. Generate button: Clicking the Generate button first loads the Hindiengine library into main memory. The function exported by the library is then called with tokenLength as input, where tokenLength is the number of bytes in the Tokens.txt file. The Generate function in the DLL reads Tokens.txt, retrieves each token and its type, and then concatenates the sound files corresponding to the tokens into one file.

For example, if the token is 0704, it retrieves the sound file 0704.wav from the sound library (in this model the sound library is a directory that contains all the sound files).

Thus for all tokens the sound files will be read from the directory, appropriately concatenated and the output sound file will be produced.
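The lookup from token to sound file can be sketched as below. This is a minimal illustration in portable C with narrow strings, not the project's code: the function name sound_file_path and the directory used in the usage note are hypothetical, and a Windows CE build would normally use wide-character strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Builds "<dir>\<token>.wav" in out (a buffer of cap bytes).
   Returns 1 on success and 0 if the path does not fit.
   The name sound_file_path is hypothetical, used only for this sketch. */
int sound_file_path(const char *dir, const char *token, char *out, size_t cap)
{
    int n;

    assert(dir != NULL && token != NULL && out != NULL);
    n = snprintf(out, cap, "%s\\%s.wav", dir, token);
    return n >= 0 && (size_t)n < cap;
}
```

For instance, with a hypothetical sound directory \Shruti\sounds and the token 0704, the function produces \Shruti\sounds\0704.wav.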

In this DLL, Windows CE counterparts for fread, fwrite and fseek were written; they are given shortly. Before that, note that the last action of the Generate button click is to play the sound file generated by the DLL function.

//Code to play a wav format file on Windows CE
MMRESULT PlayWave(LPCTSTR szWavFile)
{
    HWAVEOUT hwo = NULL;
    WAVEHDR whdr;
    MMRESULT mmres = MMSYSERR_NODRIVER;   // stays set if no audio device exists
    MMRESULT mmclean;
    CWaveFile waveFile;
    HANDLE hDoneEvent = NULL;
    UINT devId;
    DWORD dwOldVolume;
    BOOL bOpened = FALSE, bPrepared = FALSE;

    // Zero the header first so that the clean-up code can always test lpData
    ZeroMemory(&whdr, sizeof(WAVEHDR));

    // Open wave file
    if (!waveFile.Open(szWavFile)) {
        TCHAR szErrMsg[MAX_ERRMSG];
        _stprintf(szErrMsg, TEXT("Unable to open file: %s\n\n"), szWavFile);
        MessageBox(NULL, szErrMsg, TEXT("File I/O Error"), MB_OK);
        return MMSYSERR_ERROR;            // report the failure to the caller
    }

    // Event that the audio driver signals when the buffer has been played
    hDoneEvent = CreateEvent(NULL, FALSE, FALSE, TEXT("DONE_EVENT"));

    // Open the first audio device that accepts the wave format
    for (devId = 0; devId < waveOutGetNumDevs(); devId++) {
        mmres = waveOutOpen(&hwo, devId, waveFile.GetWaveFormat(),
                            (DWORD) hDoneEvent, 0, CALLBACK_EVENT);
        if (mmres == MMSYSERR_NOERROR) {
            bOpened = TRUE;
            break;
        }
    }
    if (!bOpened) goto cleanup;

    // Save the current volume, then set the maximum volume
    mmres = waveOutGetVolume(hwo, &dwOldVolume);
    if (mmres != MMSYSERR_NOERROR) goto cleanup;
    mmres = waveOutSetVolume(hwo, 0xFFFFFFFF);
    if (mmres != MMSYSERR_NOERROR) goto cleanup;

    // Initialize wave header (all other members are already zero)
    whdr.lpData = new char[waveFile.GetLength()];
    whdr.dwBufferLength = waveFile.GetLength();

    // Play buffer
    waveFile.Read(whdr.lpData, whdr.dwBufferLength);
    mmres = waveOutPrepareHeader(hwo, &whdr, sizeof(WAVEHDR));
    if (mmres != MMSYSERR_NOERROR) goto cleanup;
    bPrepared = TRUE;
    mmres = waveOutWrite(hwo, &whdr, sizeof(WAVEHDR));
    if (mmres != MMSYSERR_NOERROR) goto cleanup;

    // Wait for audio to finish playing
    while (!(whdr.dwFlags & WHDR_DONE)) {
        WaitForSingleObject(hDoneEvent, INFINITE);
    }

    // Restore the previous volume
    mmres = waveOutSetVolume(hwo, dwOldVolume);

cleanup:
    // Release every resource on all paths, keeping the first error seen
    if (bPrepared) {
        mmclean = waveOutUnprepareHeader(hwo, &whdr, sizeof(WAVEHDR));
        if (mmres == MMSYSERR_NOERROR) mmres = mmclean;
    }
    if (bOpened) {
        mmclean = waveOutClose(hwo);
        if (mmres == MMSYSERR_NOERROR) mmres = mmclean;
    }
    delete [] whdr.lpData;
    if (hDoneEvent != NULL) CloseHandle(hDoneEvent);
    waveFile.Close();
    return mmres;
}
Take a look at the source code to understand the sound-producing code clearly.
Now let’s look at the file operations performed in hindiengine.dll, shown in the following diagram, and at how fread, fwrite and fseek are carried out:

From the figure it is clear that the hindiengine native code contained a number of file operations, which were rewritten with Windows CE API functions to port it to a Pocket PC running Windows CE.

Mapping of fread, fwrite, fseek and fscanf to Windows CE API functions:

fread: It can be implemented using ReadFile.

fwrite: It can be implemented using WriteFile.

fseek: Read the file up to the desired position. The Windows CE API also offers SetFilePointer, which moves the file pointer without reading.

fscanf: It can be implemented by reading the file byte by byte, collecting the characters in a temporary array until a delimiter (say, a blank) is reached, and then converting the collected text to the required format, such as an integer or a string. The following code snippet retrieves the token and the token type from the Tokens.txt file, saving the token in a string and the token type in an integer variable.

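//Code to implement fscanf
do  //Read tokens till the phrasal boundary
{
    while (no_of_blanks != 2) {
        //Stop on a read error or at the end of the file
        if (!ReadFile(fin, &c, 1, &bytesRead, NULL) || bytesRead == 0) {
            length = tokenLength;
            break;
        }
        length += 1;
        if (c == ' ')
            no_of_blanks += 1;            //delimiter: move on to the next field
        else if (no_of_blanks == 0)
            token[tokencount++] = c;      //first field: token name
        else if (no_of_blanks == 1)
            ttype1[ttypecount++] = c;     //second field: token type
        if (length == tokenLength)
            break;
    }
    token[tokencount] = '\0';
    ttype1[ttypecount] = '\0';
    tokencount = 0;
    ttypecount = 0;
    no_of_blanks = 0;
    ttype = atoi(ttype1);
    //token and ttype now hold one (token, type) pair
} while (length != tokenLength);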

The variable token contains the token as a string and the variable ttype contains the type of the token. The same substitute for fscanf is used to read files such as Tokens.txt, the intonation file and the epoch file. For the wav files, fread and fseek are sufficient, since the size of the sound data can be obtained by reading the 4 bytes that follow the 40th byte of the wav file. Thus, using the Windows CE implementations of fseek and fread, the wav files can be read from the sound file library (a directory in this implementation) and concatenated according to the ILPS algorithm to produce the speech.
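Reading the size field of a wav file can be sketched as follows. This is a minimal example in portable C. It assumes the common 44-byte PCM header, in which the "data" tag sits at offset 36 and the 4-byte little-endian size follows at offset 40. The function name wav_data_size is hypothetical, and files with extra header chunks are rejected rather than handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAV_HEADER_SIZE 44u   /* size of the canonical PCM wav header */

/* Extracts the size of the sound data from the first bytes of a wav file.
   Returns 1 and stores the size in *size, or returns 0 if fewer than 44
   bytes are available or the "data" tag is not at offset 36. */
int wav_data_size(const unsigned char *hdr, size_t len, uint32_t *size)
{
    assert(hdr != NULL && size != NULL);
    if (len < WAV_HEADER_SIZE)
        return 0;
    if (memcmp(hdr + 36, "data", 4) != 0)
        return 0;
    /* The size is stored least significant byte first. */
    *size = (uint32_t)hdr[40]
          | ((uint32_t)hdr[41] << 8)
          | ((uint32_t)hdr[42] << 16)
          | ((uint32_t)hdr[43] << 24);
    return 1;
}
```

Assembling the value byte by byte keeps the code independent of the byte order of the processor in the device.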

Please take a look at the reference section to get the algorithm used in NLP module and ILPS module.

4.1.1 Drawbacks

Take a close look at the fscanf implementation above. Its main drawback is that every character costs a separate ReadFile call. A library implementation of fscanf also examines every character, but it reads the file in blocks and scans them in memory rather than fetching one byte at a time, so it is far more efficient. In this implementation the whole file is read character by character, and this linear scan takes time. The intonation file and the epoch file, read the same way, also take a long time.

The epoch file holds the epoch values for each token, and this model searches it linearly for the values of a given token. Linear search is expensive, which is the second main disadvantage of this model. An important observation follows: if the token name is treated as a key, the epoch value and the sound file can both be retrieved by that key. This observation leads to the use of an extendible-hash-based database in the subsequent models. The next model first explains such a database and then presents the implementation of Embedded Shruti on top of it.
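The cost of one read call per character can be reduced by reading the file in a single block and scanning that block in memory. The sketch below shows only the in-memory scanning step in portable C, assuming the "name type " layout described for Tokens.txt. The names Token, next_field and parse_tokens are hypothetical, and the numeric type codes in the usage note are invented for illustration. On Windows CE the buffer would be filled by one ReadFile call of tokenLength bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELD 32   /* longest field kept, including the terminator */

typedef struct {
    char name[MAX_FIELD];   /* token name, for example "0704" */
    int type;               /* token type as stored in the file */
} Token;

/* Copies the field that starts at buf[*pos] and ends at the next blank (or
   at the end of the buffer) into dst, truncating it to cap - 1 characters.
   Advances *pos past the field and its blank; returns the copied length. */
static size_t next_field(const char *buf, size_t len, size_t *pos,
                         char *dst, size_t cap)
{
    const char *start = buf + *pos;
    const char *blank = memchr(start, ' ', len - *pos);
    size_t field_len = blank ? (size_t)(blank - start) : len - *pos;
    size_t copy_len = field_len < cap - 1 ? field_len : cap - 1;

    memcpy(dst, start, copy_len);
    dst[copy_len] = '\0';
    *pos += field_len + (blank ? 1 : 0);
    return copy_len;
}

/* Parses "name type " records from a buffer already held in memory.
   Returns the number of tokens written to out (at most max_out). */
size_t parse_tokens(const char *buf, size_t len, Token *out, size_t max_out)
{
    size_t pos = 0, n = 0;

    assert(buf != NULL && out != NULL);
    while (pos < len && n < max_out) {
        char type_text[MAX_FIELD];

        if (next_field(buf, len, &pos, out[n].name, MAX_FIELD) == 0)
            break;                          /* empty name: stop parsing */
        next_field(buf, len, &pos, type_text, MAX_FIELD);
        out[n].type = atoi(type_text);
        n++;
    }
    return n;
}
```

Called on the text "0704 1 0165179 2 ", parse_tokens yields two tokens, 0704 with type 1 and 0165179 with type 2. The scan is still linear in the file size, but it runs over memory instead of issuing a system call per byte.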

4.2 Model 2: Windows CE port using GDBM (Without voice.db)

Chapter 2 introduced the GNU Database Manager (GDBM) and explained why it was chosen in place of Microsoft SQL Server for CE. The crude port discussed in the last model has several disadvantages; in this model the linear scan of the epoch file is avoided by using GDBM, a database based on extendible hashing. GDBM was ported to the Windows CE platform; see the source code of GDBMCE for details.

At this point it is important to understand extendible hashing, since this application needs a hash-based database rather than an SQL database. Each line of the epoch file starts with the token, followed by 4 different epoch values to be used in the ILPS algorithm (Hindiengine module). Extendible-hash-based databases are very efficient when retrieval is done by a specified key (in this case the token name, such as 0704): the complexity of a retrieval is O(1 + alpha), where alpha is the load factor, which stays small for a balanced database. In this model the epoch file is read and saved in GDBM databases, using the token name as the key and the epoch as the value. Four epoch databases are created, containing the following:

epoch1.db -> Key is the token name and the value is the first epoch value specified on the line for that token name in the epoch.txt file.

epoch2.db -> Key is the token name and the value is the second epoch value specified on the line for that token name in the epoch.txt file.

epoch3.db -> Key is the token name and the value is the third epoch value specified on the line for that token name in the epoch.txt file.

epoch4.db -> Key is the token name and the value is the fourth epoch value specified on the line for that token name in the epoch.txt file.

For example take one line from epoch.txt file:

0165179 104 206 307 409

epoch1.db -> Key is 0165179, value is 104

epoch2.db -> Key is 0165179, value is 206

epoch3.db -> Key is 0165179, value is 307

epoch4.db -> Key is 0165179, value is 409
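Splitting one line of epoch.txt into its key and four values can be sketched in portable C as below. The names EpochRecord and parse_epoch_line are hypothetical; each epoch[i] would then be stored under the token name in the corresponding database file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EPOCH_COUNT 4    /* epoch values per line of epoch.txt */
#define MAX_TOKEN   16   /* longest token name kept, including terminator */

typedef struct {
    char token[MAX_TOKEN];     /* key: the token name */
    int  epoch[EPOCH_COUNT];   /* values destined for epoch1.db .. epoch4.db */
} EpochRecord;

/* Parses a line such as "0165179 104 206 307 409".
   Returns 1 if the token and all four epoch values were read, else 0. */
int parse_epoch_line(const char *line, EpochRecord *rec)
{
    int fields;

    assert(line != NULL && rec != NULL);
    fields = sscanf(line, "%15s %d %d %d %d", rec->token,
                    &rec->epoch[0], &rec->epoch[1],
                    &rec->epoch[2], &rec->epoch[3]);
    return fields == 1 + EPOCH_COUNT;
}
```

The width in %15s matches MAX_TOKEN - 1, so an over-long token cannot overflow the buffer.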

The present version of Embedded Shruti uses epoch1.db. For producing better quality speech later versions of the software might use the other epoch database files.

The intonation file is also saved in a GDBM database and used accordingly in the program. The new model of the hindiengine can thus be represented by the following picture:

Both the epoch database and the intonation database are saved on disc (secondary storage rather than main memory or RAM). The advantage is that the linear scan is now avoided and the epoch can be obtained in almost O(1) time given its key, which is the token name.

With the basic structure of this model understood, let’s take a detailed look at extendible hashing and at why it is so efficient when a value is to be retrieved by its key.
4.2.1 Extendible hashing

Traditional hash methods are burdened with two disadvantages:

1. Sequential processing of a file according to the natural order of the keys is not supported.
2. They are not extendible. The hash table size is predetermined and relies heavily on the hash function; overestimating the number of records wastes space, while underestimating it forces rehashing.

The extendible hashing method allows hashing to adapt to dynamic files. Hash tables are naturally balanced. By extending the hash address space from the directory address space, hash tables can be made extendible.

4.2.2 Extending hash tables

Assumptions:

A hash function, h, exists.
If K is a key, then K’ = h(K) is a pseudokey.

The file is structured into two levels:

Leaves: contain pairs (K, I(K)), where I(K) is the information associated with K, that is, the record associated with K or a pointer to the record. Each leaf contains a header that stores its local depth.

Directory: contains a header that stores the depth, and pointers to the leaf pages.

Example

Figures 1 to 3 explain the working of extendible hash structures.
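The two-level structure described above can be sketched as a small in-memory table. This is an illustration under stated assumptions, not the GDBM implementation: the pseudokey is the key itself (h is the identity), a leaf holds only two records so that splits happen quickly, directory slots are chosen by the low-order bits of the pseudokey, and the names ExtHash, Leaf, eh_new, eh_put and eh_get are hypothetical. Releasing the table is omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define LEAF_CAP 2   /* records per leaf page, kept tiny on purpose */

typedef struct {
    int local_depth;            /* pseudokey bits this leaf distinguishes */
    int count;
    unsigned keys[LEAF_CAP];
    int vals[LEAF_CAP];
} Leaf;

typedef struct {
    int depth;                  /* directory holds 2^depth leaf pointers */
    Leaf **dir;
} ExtHash;

static Leaf *leaf_new(int local_depth)
{
    Leaf *l = calloc(1, sizeof *l);
    assert(l != NULL);
    l->local_depth = local_depth;
    return l;
}

ExtHash *eh_new(void)
{
    ExtHash *h = malloc(sizeof *h);
    assert(h != NULL);
    h->depth = 0;
    h->dir = malloc(sizeof *h->dir);
    assert(h->dir != NULL);
    h->dir[0] = leaf_new(0);
    return h;
}

/* Directory slot: the low `depth` bits of the pseudokey. */
static unsigned slot(const ExtHash *h, unsigned pseudokey)
{
    return pseudokey & ((1u << h->depth) - 1u);
}

/* Looks up key; returns 1 and stores the value in *val if it is present. */
int eh_get(const ExtHash *h, unsigned key, int *val)
{
    const Leaf *l = h->dir[slot(h, key)];
    for (int i = 0; i < l->count; i++)
        if (l->keys[i] == key) { *val = l->vals[i]; return 1; }
    return 0;
}

/* Inserts or updates key, splitting leaves and doubling the directory. */
void eh_put(ExtHash *h, unsigned key, int val)
{
    for (;;) {
        Leaf *l = h->dir[slot(h, key)];
        for (int i = 0; i < l->count; i++)
            if (l->keys[i] == key) { l->vals[i] = val; return; }
        if (l->count < LEAF_CAP) {
            l->keys[l->count] = key;
            l->vals[l->count] = val;
            l->count++;
            return;
        }
        if (l->local_depth == h->depth) {   /* no spare bit: double directory */
            unsigned n = 1u << h->depth;
            h->dir = realloc(h->dir, 2 * n * sizeof *h->dir);
            assert(h->dir != NULL);
            for (unsigned i = 0; i < n; i++)
                h->dir[n + i] = h->dir[i];
            h->depth++;
        }
        /* Split the full leaf on the next pseudokey bit. */
        unsigned bit = 1u << l->local_depth;
        Leaf *sib = leaf_new(l->local_depth + 1);
        int kept = 0;
        l->local_depth++;
        for (int i = 0; i < l->count; i++) {
            if (l->keys[i] & bit) {
                sib->keys[sib->count] = l->keys[i];
                sib->vals[sib->count] = l->vals[i];
                sib->count++;
            } else {
                l->keys[kept] = l->keys[i];
                l->vals[kept] = l->vals[i];
                kept++;
            }
        }
        l->count = kept;
        for (unsigned i = 0; i < (1u << h->depth); i++)
            if (h->dir[i] == l && (i & bit))
                h->dir[i] = sib;
        /* Loop again: the target leaf may now have room. */
    }
}
```

A lookup touches one directory entry and one leaf regardless of how many records are stored, which is the property the epoch and intonation databases rely on. Only the leaf that overflows is split; the other leaves are left untouched even when the directory doubles.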

4.2.3 Using GDBMCE library

The gdbmce.dll library exports all the functions needed for the database operations. Chapter 2 explained all the functions of GDBMCE, and all of them are ported to the Windows CE platform. The .def (module definition) file of the gdbmce dynamic link library exports the following functions, which can be accessed through function pointers as discussed in the last model.

//GDBMCE .def file
LIBRARY GDBMCE
EXPORTS
    gdbm_open
    gdbm_close
    gdbm_store
    gdbm_fetch
    gdbm_delete
    gdbm_firstkey
    gdbm_nextkey
    gdbm_reorganize
    gdbm_sync
    gdbm_exists
    gdbm_setopt
    gdbm_errno
    gdbm_version

The following code snippets perform the database operations using gdbmce.dll.

//Database variable
GDBM_FILE dbf;

//Variables to work with the database
datum key, content;

datum is a data structure defined in gdbmce.h which has two important members. The first member is a pointer to a character array, and the second member is an integer that stores the number of bytes in that array. Values are stored in and retrieved from the database through this data structure.

//Define the function pointers to call the gdbm functions
typedef GDBM_FILE(*GDBMOpen_ptr)(WCHAR*,int,int,int,void*);
typedef void(*GDBMClose_ptr)(GDBM_FILE);
typedef int(*GDBMStore_ptr)(GDBM_FILE,datum,datum,int);
typedef datum(*GDBMFetch_ptr)(GDBM_FILE,datum);

//Instance to call the dll
HINSTANCE hInst1;

//Loading the engine dll
hInst1 = ::LoadLibrary(L"gdbmce.dll");

//Getting the function pointers of 4 GDBM functions
GDBMOpen_ptr pgdbm_open=(GDBMOpen_ptr)GetProcAddress(hInst1,L"gdbm_open");
GDBMClose_ptr pgdbm_close=(GDBMClose_ptr)GetProcAddress(hInst1,L"gdbm_close");
GDBMStore_ptr pgdbm_store=(GDBMStore_ptr)GetProcAddress(hInst1,L"gdbm_store");
GDBMFetch_ptr pgdbm_fetch=(GDBMFetch_ptr)GetProcAddress(hInst1,L"gdbm_fetch");

If any of the pointers (pgdbm_open, pgdbm_close, pgdbm_store, pgdbm_fetch) is NULL, the corresponding function is not exported by the dynamic link library, or the library itself could not be loaded. It is always advisable to check the function pointers for NULL before calling through them.

//Opening a database

//Database reader
dbf = (*pgdbm_open)(newstring,512,GDBM_READER,0777,0);

newstring contains the name of the database to open. 512 specifies the block size in which data is transferred from the disc. GDBM_READER specifies that the database is opened in read mode. 0777 is the mode of the database file, written in octal, and means read, write and execute permission on the database file thus created. The final 0 (NULL) selects the default function to be called on a fatal error.

//Database writer
dbf = (*pgdbm_open)(newstring,512,GDBM_WRITER,0777,0);

GDBM_WRITER specifies that the database is opened for reading and writing. It also requires that the database already be present on the disc.

//Create a database
dbf = (*pgdbm_open)(newstring,512,GDBM_WRCREAT,0777,0);

This creates the database if it does not exist and opens it for both reading and writing. *pgdbm_fetch reads the database and retrieves the value corresponding to a given key, while *pgdbm_store writes a value into the database under the key provided.

//Storing into a database

Suppose there are two strings. The first, tokenname, contains the key, which is the token name (“0704” for example); the second, epoch, contains the epoch value corresponding to the key (“107” for example). To store this pair in the database, the following code is used:

//Storing the key and its size into the datum structure
key.dptr = tokenname;
key.dsize = strlen(tokenname);

//Storing the value and its size into the datum structure
content.dptr = epoch;
content.dsize = strlen(epoch);

//Storing into the database after successful opening
(*pgdbm_store)(dbf, key, content, GDBM_INSERT);

Note the GDBM_INSERT option in the function call. With GDBM_INSERT the pair is inserted only if the key is not yet present; if the key already exists, the store operation fails. GDBM_REPLACE is used when the key already exists and its value needs to be changed.

//Fetch pairs from the database

Suppose the epoch value corresponding to the token name (“0165179” for example) is to be fetched from a database that was opened successfully in GDBM_READER mode. The following code is used to fetch the value:

//Storing the key and its size into the datum structure
key.dptr = tokenname;
key.dsize = strlen(tokenname);

//Fetch the value corresponding to the key
content = (*pgdbm_fetch)(dbf,key);

//content.dptr is NULL if the key was not found. The stored value has no
//terminating '\0' because its size was taken with strlen, so copy exactly
//content.dsize bytes into the string epoch and terminate it.
//In standard GDBM the caller must also release content.dptr after use.
int epochval = 0;
if (content.dptr != NULL) {
    memcpy(epoch, content.dptr, content.dsize);
    epoch[content.dsize] = '\0';
    epochval = atoi(epoch);
}

Thus, when the operation completes, epochval contains the integer value of the epoch string fetched from the database.
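The conversion of a fetched value into an integer can be checked in isolation with the portable sketch below. The struct merely mirrors the two members of datum described earlier so that the example compiles without gdbmce.h, and the helper name datum_to_int is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in with the same two members as the datum structure of GDBM. */
typedef struct {
    char *dptr;    /* pointer to the bytes */
    int   dsize;   /* number of bytes, no terminator included */
} datum;

/* Converts a fetched value to an integer.
   Returns 0 if the key was not found (dptr is NULL) or if the value is
   too long for the local buffer; otherwise stores the number in *result
   and returns 1. */
int datum_to_int(datum value, int *result)
{
    char text[16];

    assert(result != NULL);
    if (value.dptr == NULL || value.dsize < 0 ||
        (size_t)value.dsize >= sizeof text)
        return 0;
    memcpy(text, value.dptr, (size_t)value.dsize);
    text[value.dsize] = '\0';      /* stored values carry no terminator */
    *result = atoi(text);
    return 1;
}
```

Copying by dsize rather than with strcpy matters because the value was stored with a size taken from strlen, which leaves out the terminating character.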
4.2.4 Advantages

Model 2 avoids the linear scan of the epoch file and the intonation file, so this implementation is much faster and more efficient than Model 1. A performance comparison is given in the next chapter, where both implementations are checked on the same input. For this application, where key-value pairs are to be retrieved efficiently, extendible hashing is a well-suited data structure, and GDBMCE uses extendible hashing.

4.2.5 Drawbacks

There are two drawbacks in this model:

First, the Tokens.txt file generated by the hindianalyser phase is scanned linearly by the hindiengine dynamic link library to retrieve the token name and the token type (see the implementation of fscanf using ReadFile in the Model 1 section of this chapter). To avoid this linear scan, the next model also saves the token name and token type in a GDBMCE database, keyed by an index that starts from 0 and increases by one for each further token.
Second, the sound library is still a directory that contains the wav files. In the next model the sound files are also kept in a GDBM database, with the token name as the key and the sound file as the value. The directory with its many wav files is replaced by a single file called voice.db. This makes it easy to transfer the sound library to the Windows CE device, and the single file can be provided on flash RAM accompanying the application. The voice database also avoids the linear scanning of voice files during speech generation: since each voice file is stored in the database as a character array, efficient array operations such as memcpy can replace the linear scan of the sound files done in the present model.

These drawbacks led to the development of a third model, in which the voice database is added to the epoch database and the intonation database that were already there. The tokens are also saved in a database, and no ordinary file operations are used in this version. That makes the third model the most robust of the three. Let’s now look at the changes made in the third model.

4.3 Model 3: Windows CE port using GDBM (With voice.db)

First take a look at the new dataflow structure of the Hindiengine module. The following figure displays the dataflow structure after the GDBM databases are added:

The modifications made to Model 2 to obtain Model 3 were already discussed in the last section. Model 3 is the most efficient implementation, so let’s take a complete pictorial view of Embedded Shruti in this implementation.

In this implementation the tokens database and the voice database are added. The basic database operations discussed in the last section apply to this model as well.
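Two small pieces of Model 3 can be sketched in portable C: forming the index key under which the n-th token is stored, and joining sound values with memcpy once they have been fetched as character arrays. The names Clip, token_index_key and concat_clips are hypothetical, the decimal text form of the key is an assumption, and the plain append shown here stands in for the joining rules of the ILPS algorithm, which are not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One sound value as fetched from the voice database (hypothetical type). */
typedef struct {
    const char *data;
    size_t size;
} Clip;

/* Writes the key of the token with the given index (0, 1, 2, ...) as
   decimal text. Returns 1 on success, 0 if out is too small. */
int token_index_key(unsigned index, char *out, size_t cap)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, cap, "%u", index);
    return n >= 0 && (size_t)n < cap;
}

/* Appends the clips one after another with memcpy.
   Returns a malloc'd buffer that the caller frees, with its length in
   *total, or NULL if the allocation fails. */
char *concat_clips(const Clip *clips, size_t count, size_t *total)
{
    size_t i, offset = 0, sum = 0;
    char *out;

    assert(clips != NULL && total != NULL);
    for (i = 0; i < count; i++)
        sum += clips[i].size;
    out = malloc(sum ? sum : 1);
    if (out == NULL)
        return NULL;
    for (i = 0; i < count; i++) {
        if (clips[i].size > 0)
            memcpy(out + offset, clips[i].data, clips[i].size);
        offset += clips[i].size;
    }
    *total = sum;
    return out;
}
```

Because every value arrives with its size, the output buffer can be allocated once and filled with block copies, with no per-byte file reads.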
