
Details of work done in the area of Speech Research and System Development

  1. Voice Project: The Voice Oriented Interactive Computing Environment (VOICE) project was undertaken as part of the Knowledge Based Computer Systems (KBCS) Project, Phase II, under the aegis of the (then) Department of Electronics (DOE). I was the Co-Principal Investigator (Co-PI) from its initiation (1995) and the Principal Investigator (PI) from 1998 until the end of the project (2000).

    The project aimed to build application prototypes of voice input/output systems in Indian languages, utilizing the technology developed at TIFR during KBCS Project, Phase I. The systems developed were: (1) An isolated-word, speaker-independent speech recognition system with a 200-word vocabulary. It used HMM methodology and was trained on about 100 speakers, including men, women and children. The queries were organized in a menu structure and were used in a travel guide that accepts a query (a single menu word) by voice. The recognition accuracy for this application was near 100%. (2) A text-to-speech system for Hindi. It accepts Hindi text in an ITRANS-like transliteration code and generates the corresponding Hindi speech. The output was acceptably intelligible. (3) A continuous Hindi speech database. It contains 800 sentences designed to be phonetically balanced, recorded by about 100 speakers. A CD of the database was handed over to the sponsoring agency (DOE). [1995-2000]
  2. Text-to-Speech Systems: Under the KBCS (Knowledge Based Computer Systems) project, the development of a text-to-speech system for Hindi was completed. Considerable progress was made in developing a similar system for Indian English. A morphological parser was designed to extract root words from the text for subsequent dictionary look-up. Compilation of a phonetic dictionary for Indian English was in progress. [1998-1999]
  3. Speech Recognition: As an application of the TIFR-developed HMM-based, isolated-word, speaker-independent speech recognizer, the development of a speech-activated 'travel guide' was undertaken under the KBCS project. The basic blocks of the system were developed and worked with selected travel information, which could be augmented to make the system practically useful. The system prompted the user to specify the information needed by speaking out any of the menu words displayed; several routes could thus be traversed. On recognizing the spoken word, the system prompted for further clarification of the query with textual captions, graphics (maps, etc.) and voice. [1998-1999]
  4. Information Access by Dialing: Work was carried out in the area of speech recognition and synthesis for automatic interaction with computers. A system for accessing information from a database by voice, over the telephone, was proposed.

    The recognizer was based on the Hidden Markov Model (HMM) technique. Specifically, a strict left-to-right model with continuous Gaussian mixture densities was employed; state transitions were restricted to forward moves of at most two states. The system was initially trained by Indian speakers, with recognition accuracy of around 90%, and was later trained on a larger and more varied pool of speakers. A 'word spotting' technique was pursued to speed up the information retrieval process.
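    The strict left-to-right topology mentioned above can be sketched as follows (a minimal illustration, not the project's code; the function name and the uniform transition probabilities are assumptions made for the sketch -- in practice the probabilities would be estimated during training):

```python
import numpy as np

def left_to_right_transitions(n_states, max_jump=2):
    """Transition matrix for a strict left-to-right HMM: each state may
    stay put or move forward by at most `max_jump` states; all backward
    transitions get probability zero.  Probabilities here are uniform
    over the allowed moves purely for illustration."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        js = range(i, min(i + max_jump + 1, n_states))
        for j in js:
            A[i, j] = 1.0 / len(js)
    return A

A = left_to_right_transitions(5)
# Each row sums to 1, and the lower triangle is all zeros
# (no backward transitions).
```

    With `max_jump=2` this matches the constraint described above: a state may only transition to itself or to the next two states.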

    The R & D work on speech synthesis was based on formant synthesis. A 'Text Analyzer' generated phoneme symbols and stress/punctuation markers from the input text. A context-sensitive, language-specific rule module generated formants and other features corresponding to the utterance. The final stage was implemented as a series of filters modeling the vocal tract, excited by a source that was either a train of quasi-periodic pulses, simulating the vibration of the vocal cords, or random noise, simulating the generation of noisy sounds.

    The rule-based phoneme-to-acoustic-phonetic-parameter conversion stage was the main focus of research. The control parameters of this model, e.g. energies, pitch and formant frequencies, were varied to produce the desired utterances specified by the phoneme strings. The rule base governing the control parameters was highly language-dependent and dependent on phonetic context, and was quite complex and extensive. Rules were developed for Hindi and Indian English. [1997-1998]
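    The source-filter idea underlying formant synthesis can be illustrated in miniature (a hedged sketch, not the project's implementation; the resonator design, formant values, sample rate and all function names are assumptions for illustration only):

```python
import math

def resonator_coeffs(f, bw, fs):
    """Second-order (two-pole) resonator centered at formant frequency
    f (Hz) with bandwidth bw (Hz), for sample rate fs (Hz)."""
    r = math.exp(-math.pi * bw / fs)          # pole radius (< 1: stable)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * f / fs)
    b2 = -r * r
    a0 = 1.0 - b1 - b2                        # normalize DC gain
    return a0, b1, b2

def apply_resonator(x, f, bw, fs):
    """Filter the signal x through one formant resonator."""
    a0, b1, b2 = resonator_coeffs(f, bw, fs)
    y = [0.0, 0.0]
    for s in x:
        y.append(a0 * s + b1 * y[-1] + b2 * y[-2])
    return y[2:]

def synthesize_vowel(formants, fs=16000, f0=120, dur=0.2):
    """Voiced source (impulse train at pitch f0) passed through a
    cascade of formant resonators -- the source-filter model in
    miniature.  A noise source would replace the impulse train for
    unvoiced sounds."""
    n = int(fs * dur)
    period = int(fs / f0)
    x = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for f, bw in formants:
        x = apply_resonator(x, f, bw, fs)
    return x

# Rough /a/-like formants (assumed values): F1=700, F2=1200, F3=2600 Hz.
samples = synthesize_vowel([(700, 130), (1200, 160), (2600, 200)])
```

    In a full formant synthesizer, the rule base described above would supply time-varying trajectories for the pitch, energies and formant parameters rather than the fixed values used here.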
  5. Real-time Synthesizer: The speech synthesizer software (developed for the VOICE project) was ported to SUN 10/51 and Pentium platforms. Structural modifications were also carried out for faster program execution, achieving real-time synthesis on these platforms: less than a second is now needed to synthesize a second of speech. This is important for two reasons: (a) with real-time synthesis, it is possible to read text from a file, synthesize it, and play back the synthesized speech without any break; (b) fast synthesis drastically reduces the time needed for synthesis experiments. [1995-1996]