The ultimate objective of this project is to develop the next generation Text to Speech software for regional Indian languages like Hindi and Bengali which is called Embedded Shruti. The keyword here is "next generation"

Date	31.07.2017

4.3.1 Making voice database

In earlier models the sound files are saved in a directory called voice; the file matching the token name is taken from the directory and, after modification, appended to the output voice file. In this model the sound files are saved in a GDBM database, keyed by the name of the sound file, which is also the token name. For example, if the voice library contains a file called 0164179.wav, the token is the file name without its extension (0164179 in this case), and the key is set to 0164179. The wav file 0164179.wav is then saved in the database under this key. Later hindiengine retrieves the sound file using this key.

The database file voice.db was created on the Linux platform, since scanning a directory with system calls is easier on Linux and GDBM is preinstalled on almost all Linux platforms. A GDBM database file created on Linux can be used on Windows CE through the GDBMCE library function calls. The following code shows how the voice database is made from the voice directory:

/* Linux code: voicedb.c */

/* Header files needed */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <gdbm.h>

/*
 * Steps by which the voice database is created:
 * 1. The directory of sound files is transferred to the Linux machine.
 * 2. Scan the directory and get the name of each file.
 * 3. key = file name without the trailing .wav
 * 4. content = the wav file. Bytes 40-43 of the wav header give the size
 *    of the sound data; adding the 44 header bytes gives the file size.
 * 5. The wav file is stored under the key described above.
 */

int main(void)
{
    /* Database handle for the voice database */
    GDBM_FILE dbf;
    /* Key and content of one database record */
    datum key, content;
    /* Directory pointer to read the contents of the voice directory */
    struct dirent *dpointer;
    DIR *dirp;
    FILE *fpt;
    const char *voice = "voice/";
    char path[512];
    char *name, *ext, *buffer;
    unsigned int size;
    int i = 0;

    dbf = gdbm_open("voice.db", 512, GDBM_WRCREAT, 0777, 0);
    if (dbf == NULL)
    {
        fprintf(stderr, "Error opening voice.db\n");
        exit(1);
    }

    if ((dirp = opendir("voice")) == NULL)
    {
        fprintf(stderr, "Error opening voice\n");
        perror("dirlist");
        exit(1);
    }

    /* Code to scan the directory and put each wav file into the database */
    while ((dpointer = readdir(dirp)) != NULL)
    {
        /* Skip ".", ".." and anything that is not a .wav file */
        name = dpointer->d_name;
        ext = strrchr(name, '.');
        if (ext == NULL || ext == name || strcmp(ext, ".wav") != 0)
            continue;

        /* Build the path first, then cut the extension off the name */
        snprintf(path, sizeof(path), "%s%s", voice, name);
        *ext = '\0';

        /* Inserting values in key */
        key.dptr = name;
        key.dsize = (int)strlen(name);

        fpt = fopen(path, "rb");
        if (fpt == NULL)
        {
            perror(path);
            continue;
        }

        /* Read the 4-byte data size at offset 40 (little-endian host assumed) */
        fseek(fpt, 40, SEEK_SET);
        if (fread(&size, 4, 1, fpt) != 1)
        {
            fclose(fpt);
            continue;
        }
        fseek(fpt, 0, SEEK_SET);

        /* Add the 44 bytes of the header */
        size = size + 44;
        buffer = (char *)malloc(size);
        if (buffer != NULL && fread(buffer, size, 1, fpt) == 1)
        {
            /* Inserting values in content */
            content.dptr = buffer;
            content.dsize = (int)size;

            /* Inserting in database */
            i++;
            printf("Inserting in database File no : %d\n", i);
            gdbm_store(dbf, key, content, GDBM_INSERT);
        }
        free(buffer);
        fclose(fpt);
    }

    closedir(dirp);
    gdbm_close(dbf);
    return 0;
}
/* compile line */

bash$ cc voicedb.c -o voicedb -lgdbm

/* execution */

bash$./voicedb

This will create the voice database in the present working directory. The database will be transferred to Windows CE emulator or device for use.

4.3.2 Making tokens database

To make the tokens database, the GDBMCE dynamic link library is first loaded into memory. A database file named tokens.db is then opened. This file holds the token and type for the characters of the input text, which the frontend provides through the file TextIscii.txt. The token and type are saved in the database file as they are produced by the Natural Language Processor.

An index value that starts at 0 serves as the key to this database. Each time a new token and type pair is added under the current index, the index is increased by 1. The Hindianalyser phase returns the final value of the index, and the frontend passes this value to Hindiengine so that it can retrieve all the tokens and their respective types.

When the token and type are inserted in the database, a delimiter is added between the two. The following code shows the insertion technique:

char buffer[20];

char keybuffer[20];

//0 concatenated with word[i] gives the token name.

//1 is the token type in this case.

//delimiter | is added between the token name and type.

sprintf(buffer,"%d%d|%d",0,word[i],1);

//content is the datum variable needed for insertion

content.dptr = buffer;

content.dsize = strlen(buffer);

//key is the datum variable to hold the key which is index in this case

sprintf(keybuffer,"%d",index);

key.dptr = keybuffer;

key.dsize = strlen(keybuffer);

//dbftokens is the handler to the tokens database

//Function is called with GDBM_REPLACE as the argument so that the
//record will be overwritten if the key already exists.

(*pgdbm_store)(dbftokens,key,content,GDBM_REPLACE);

//increase the index value

index = index + 1;

4.3.3 Producing the output sound file

Hindiengine will first get the token name and token type from tokens database. Once the token name is obtained, it is used as a key to retrieve the sound file from the voice database. The following code shows the whole process:

//Fill the key to retrieve content from the database

sprintf(tokenbuffer,"%d",index);

tokenskey.dptr = tokenbuffer;

tokenskey.dsize = strlen(tokenbuffer);

tokenscontent = (*pgdbm_fetch)(dbftokens,tokenskey);
//Get the token and the type from tokenscontent.

token = strtok(tokenscontent.dptr,"|");

ttype1 = strtok(NULL,"|");

ttype = atoi(ttype1);

//This code finds the length of the voice data and copies the first 40 bytes
//of the wav header into a variable Header, which will later be added to the
//output sound file. In wav files the 4 bytes after the first 40 bytes give
//the size of the sound data.

voicekey.dptr = token;

voicekey.dsize = strlen(token);

voicecontent = (*pgdbm_fetch)(dbfvoice,voicekey);

memcpy(Header,&voicecontent.dptr[0],40);

len = (voicecontent.dsize-44);

//This code copies the remaining wav file (except the first 44 bytes of
//the header) into a temporary character array called data and it will
//be concatenated to the output sound file later.

memcpy(data,&voicecontent.dptr[44],len);

This completes the description of model 3.
4.3.4 Advantages

This model is the most efficient of the three models discussed so far. Linear scans of the files are completely avoided, and efficient memory functions like memcpy are used to increase the performance of the software. As all databases are kept on the disc, this model gives an upper bound on the performance of any disc-based implementation: no implementation that always accesses the disc rather than main memory can take less time than this one. No flat-file operations are used in this implementation, which also makes it more robust.

4.3.5 Drawbacks

This model also has the following drawbacks:

The tokens database is saved on disc rather than in main memory. Both the Hindianalyser and Hindiengine modules therefore need disc access, first to write the tokens and then to read them again, which takes more time than RAM access. One solution is to maintain the list of tokens in main memory rather than writing it to a database on disc. Another is to keep the tokens database in Flash RAM rather than on secondary storage (the disc), since Flash RAM access time is much less than disc access time.
Whenever a sound file is retrieved from the database the disc is accessed, since the voice database is saved on disc. This takes more time and affects the performance of the software.

These drawbacks led to the development of Model 4, in which both problems are addressed and possible solutions are suggested. The next section explains the solutions that increase the efficiency by removing the drawbacks of Model 3.

4.4 Model 4: Final Windows CE port

Model 4 is a hybrid model which judiciously uses main memory and secondary storage (disc) to obtain better performance. Model 3 implemented on Flash RAM would already perform better, but the performance can be enhanced further by this hybrid approach.

4.4.1 Approach

This model makes two important additions to the last model in an attempt to increase the performance. The additions are the following:

The token list, which was stored on the disc in the last model, is stored in main memory so that access takes less time. Since Embedded Shruti is built on Microsoft Foundation Classes for Windows CE, a number of MFC utility classes can be used to maintain a hash structure of the tokens, and the hash makes retrieval fast. In this implementation the MFC class CMap is used to store the token name and the token type, keyed by an index variable which is increased with each insertion.
The other problem in Model 3 was that disc access time is needed each time a sound file is fetched from the database. A cache structure is implemented to speed up sound file access. The cache stores N sound files in main memory, where N is selected according to the application; typically N can be 20 sound files. When a file is required, the cache is checked first. If the file is in the cache, no disc access is needed. If it is not, the file is brought from the database on disc and saved in the cache. If the cache is full, a cache replacement strategy such as Least Recently Used (LRU) chooses a victim sound file, which is removed from the cache, and the new sound file is kept in its place.

The suggested modifications make better use of the Random Access Memory and the secondary disc, so that Embedded Shruti performs as well as possible. So far the models have been tested on the Pocket-PC emulator; as mentioned in the API reference of the Pocket-PC, the performance of the software on a real Pocket-PC will be better than on the emulator.

This chapter will conclude with a graphical view of the model 4. The next chapter will give the performance comparison of different models on some input string. Since the input string remains same the time taken by different models will clearly differentiate between the performances of the implementations.

4.4.2 Dataflow Model

Chapter 5

Performance Comparison
In the last chapter different models for Embedded Shruti were discussed at length. This chapter will present the performance of each model on a given input text. This chapter will also specify the steps to run different models of Embedded Shruti on Pocket-PC emulator. Performance comparison is an important part of the development process since this phase determines which model scores over the others and therefore should be chosen as the final implementation that will be ported.
5.1 Performance Parameters

First of all the performance parameters are to be specified. For real-time applications the performance should be measured on the basis of the actual time taken. Windows CE is a real-time operating system, and software running on it serves real-time applications, so real (wall-clock) time is used to measure the performance. For each model, the time taken to obtain the output once the input text is supplied and the Analyse and Generate buttons are clicked is considered. The model which scores better than the others on this metric is considered the better implementation. There are other possible parameters, such as the RAM space and disc space used, but they are not of much interest here. For embedded applications real-time constraints must be satisfied, so the time taken for the application to execute is the most important performance parameter.

5.2 Simulations on test inputs

This section will explain how to run each model on Pocket-PC emulator and then to check the program on given test inputs. For every implementation the steps will be provided to run the application.

5.2.1 Model 1

For running the application on Model 1 the following steps should be done:

First of all compile the source code. The source code includes the source code for Frontend, and the source for the two dynamic link libraries Hindianalyser and Hindiengine. All the three modules will be compiled using the eMbedded Visual C++.
On the eMbedded Visual C++ IDE, specify the SDK as the Pocket-PC, and the next fields as Win32 (WCE x86 debug) and Pocket PC 2002 emulation.
If the compilation is successful, then the dynamic link libraries will be made and transferred into the Pocket-PC emulator.
The executable Frontend.exe will also be made and transferred into the Pocket PC’s default executable path.
The dynamic link libraries are copied into \Windows directory on the target emulator or the target device. An MFC dll is also copied to \Windows directory as MFC dll is needed to run the Frontend executable
Now before running the application upload the files epoch.txt and inton_bengali onto the emulator. To upload the files go to Tools -> Remote File Viewer. Once the Remote File Viewer appears the files can be transferred onto the emulator or the device.
After the files are transferred the next step is to transfer the sound library on the emulator or target device. A new directory will be made called voice on the device and then using Remote File Viewer the wav files will be uploaded into the voice directory.
The application can be run by clicking on Frontend.exe on the start menu. The Frontend GUI will start and the Bengali text for Text-to-Speech conversion will be applied to it.
An example of a Bengali or Hindi Text can be “mera naam piyush hai”. This text will be entered in input area and then performance will be obtained by pressing the Analyse button and the Generate button.
A sound file will be generated as the output and will be played on the emulator or the device.
The next section will compare the real time performance of this model with other models.

5.2.2 Model 2

In this model GDBMCE library is used. Therefore first of all source code for GDBMCE library will be compiled.
Successful compilation will upload the gdbmce.dll to \Windows folder on the target device or the emulator.
epoch.txt and inton_bengali files will not be transferred to the device. Instead the hash database epoch1.db and inton_bengali.db will be transferred to the device.
All remaining steps are the same as in Model 1.

5.2.3 Model 3

This model does not require the sound library (the folder consisting of wav files) to be transferred to the device. Instead, the voice database (voice.db) is transferred to the device.
All other steps remain the same as in Model 2.

5.3 Comparison of performance

The following table gives the performance of each model according to the time taken by the Hindianalyser module (clicking on the Analyse Button) and the time taken by the Hindiengine module (clicking on the Generate Button). The input text used for the comparison is “mera naam piyush hai”.

Model Name    Hindianalyser Performance           Hindiengine Performance
Model 1       4 seconds (Tokens in file)          70 seconds (Very inefficient)
Model 2       4 seconds (Tokens in file)          10 seconds (Increase in efficiency)
Model 3       4 seconds (Token database added)    5 seconds (Best performance)

On the basis of this performance chart it can be concluded that Model 3 is the most efficient of the three and is to be used for the final implementation. Model 4, which extends Model 3 with the hybrid approach discussed in the last chapter, is also suggested.

Chapter 6

Conclusions
This thesis provides the complete design and implementation of Embedded Shruti. It started with an introduction to technologies that were used in this software product. After that details of different models were provided. The last chapter provides a comparison of the performance of different models and the reason for choosing Model 3 as the final implementation.
Embedded Shruti has several advantages over the desktop version. To run the desktop version one needs a desktop computer system, which is costly compared to a Personal Digital Assistant like the Pocket-PC. The Pocket-PC is a mobile device, so the software can be used on the move. A person with speech disorders only has to carry a PDA with Embedded Shruti installed on it. The person can communicate with others using the software, and since it is installed on a PDA rather than a desktop computer, it can be taken anywhere.
I personally feel that Embedded Shruti realizes the dream of providing Shruti software to every person who needs it. For a person with speech disorders this software will be an integral part of life. Carry a Personal Digital Assistant having Embedded Shruti and you have the power to communicate with people despite your serious speech disorders.

Chapter 7

Future Work
Embedded Shruti is still to be tested completely on a number of test inputs of varying length. The Hindianalyser module has been tested completely, but the Hindiengine module has not been tested rigorously. The software gives correct results for the input strings on which it has been tested so far, but more testing is required.
The final version which is to be shipped to customers will save the voice database, intonation database and epoch database on a Flash rather than on the secondary storage of the Pocket-PC device. Flash comparatively takes less time than disc and so it will surely increase the performance. A Flash version of the code is to be written. In the Flash version of the code, the path of the database has to be changed. Presently since the database is on the root directory of Pocket-PC, the path is simply the name of the database like “voice.db”. For example when the database is opened, the path of the database is “voice.db”. But when a Flash will be connected to the device the path will change to “\Storage Card\voice.db”. In the Flash version of the code this change will be incorporated.
The software is tested on Pocket-PC emulator till now. Once the testing and debugging is done it will be ported on Pocket-PC hardware. The databases, the dynamic link libraries (hindianalyser, hindiengine and gdbmce) and the application Frontend.exe will be transferred to the device using eMbedded Visual C++. The environment is changed to Pocket-PC (default device) from Pocket-PC (Emulation) and the device will be connected to the development workstation using some COM port. After that the files can be transferred to Pocket-PC.
Chapter 8

Software Screenshots

8.1 Hindianalyser module on Standard SDK emulator (First port)

The output of Hindianalyser module is shown in the second text box. 0204 is the token and 0 is the token type. The output has repetitive sequence of token and token type.

8.2 Model 1 on Pocket-PC Emulator

8.2.1 Hindianalyser module execution: The text box contains the token and token type generated in this phase. The pairs generated are:

0204 0 0204172 3 0172 1 0172207 2 0207 0 -2 5 0198 0 0198164 3 0164 1 0164164 4 0164 1 0164204 2 0204 0 -2 5 0200 0 0200166 3 0166 1 -1 5 0205 0 0205168 3 0168 1 0168213 2 0213 0 -2 5 0216 0 0216173 3 0173 1 -2 5

8.2.2 Hindiengine Module execution: After the Analyse button is clicked the tokens are generated and displayed on the second text box. This completes the execution of Hindianalyser module. Hindiengine module is called when Generate button is clicked. After Generate is clicked the speech file will be generated and played. The second text box will be updated by the number of bytes in the tokens.txt file.

8.3 Model 3 on Pocket-PC emulator: Model 1 screenshots are shown above. Please refer to Chapter 4 for details of the models. Model 3 is implemented using only hash databases, and no file operations are used. This model works efficiently compared to the other two models. The Model 3 screenshots follow.
8.3.1 Hindianalyser module execution: In Model 3 the database files are sent to the device before the Frontend is executed. The database files are: inton_bengali.db (intonation database), epoch.db (epoch values) and the voice database (voice.db). This model gives the best performance, as shown in Chapter 5.

8.3.2 Hindiengine Module execution: The next screenshot shows the application after the Hindiengine module is executed by clicking on Generate button. It plays the sound and gives the output as the total number of pairs in the tokens database.

The number of < token, token type> pairs generated from the input text is shown at the text box. The Hindiengine model execution is efficient compared to Model 1 and Model 2. Therefore this Model emerges as the winner and it will be used in the final version of Embedded Shruti.

Chapter 9

References

Embedded Shruti is an implementation project. The documents which helped me in this project are listed chapter-wise:

Chapter 1

None

Chapter 2

1. Programming Microsoft Windows CE (Second Edition) by Douglas Boling

2. Windows CE .NET documentation.

3. Pocket-PC SDK documentation.

4. Microsoft eMbedded Visual C++ documentation.

5. Microsoft SQL Server CE documentation.

6. GDBM man pages

Chapter 3

1. Choudhury M. Rule-based Grapheme to Phoneme Mapping for Hindi Speech Synthesis. Presented at the 90th Indian Science Congress of ISCA, Bangalore, 2003

2. Source code of Desktop version of Shruti

Chapter 4

1. GDBM man pages.

2. Source code of GDBM port to Windows CE.

3. Ronald Fagin, Jürg Nievergelt, Nicholas Pippenger, H. Raymond Strong, Extendible hashing - a fast access method for dynamic files, ACM Transactions on Database Systems, New York, NY, Volume 4 Number 3, 1979, pages 315-344.

The complete source code of the implementation is available in MediaLab, Indian Institute of Technology, Kharagpur. Take a look at the source code to understand how the software is working. All the models are implemented separately and you can yourself do a performance evaluation of the respective models.
