Convert Text to Speech and Speech to Text in Python


Some applications need to read text aloud, as phones do, while others need to turn audio into text for tasks such as transcription and note-taking. In this article, we build a simple speech-to-text and text-to-speech converter using two libraries: SpeechRecognition and gTTS.

Speech to text and text to speech converter in python project:

To implement the project, we need to install two main libraries plus one supporting library, PyAudio, which SpeechRecognition uses to access the microphone. This project is for beginners, so no prior experience with these libraries is required.

Project Prerequisites:

We use tkinter, Python's built-in GUI library, to build the interface of the application. Import it to test its installation and refer to the following in case an error pops up: Tkinter Installation.

The other modules to install using pip are:

pip install SpeechRecognition
pip install gTTS
pip install PyAudio
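
To confirm the installation before running the project, a quick check along these lines may help. This is a minimal sketch: the helper name missing_modules is hypothetical, and the list assumes the usual import names for these packages (gtts, speech_recognition and pyaudio).

```python
from importlib.util import find_spec

# Import names of the modules this project relies on
REQUIRED = ["tkinter", "gtts", "speech_recognition", "pyaudio"]

def missing_modules(names=REQUIRED):
    """Return the names that the import system cannot find."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

Any name printed as missing can then be installed with the matching pip command above.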

Download code:

You can download the source code in the given link: Text to Speech and Speech to Text Project

Project File Structure:

Below is the flow of the project.

Importing necessary libraries
Declaring functions to view languages and their codes, convert text to speech and speech to text
Creating a user interface

1. Importing necessary libraries:

#Convert Speech to text and text to Speech: PythonGeeks
#import packages
from gtts import gTTS, lang
import os
from tkinter import *
from tkinter import messagebox
import speech_recognition as sr

Code Explanation:

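from gtts import gTTS, lang: To convert text to speech, we use the library gtts (Google Text-to-Speech). gTTS is the class that performs the conversion, and lang gives access to the supported languages.
import os: Provides access to operating system functions, such as creating, switching, viewing or deleting directories. Here we use it to run the command that plays the audio file.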
from tkinter import *: Create the user interface using Tkinter which contains widgets and support to receive user input.
from tkinter import messagebox: Prompt the user using dialog boxes using messagebox in tkinter
import speech_recognition as sr: Import the module to convert speech to text.

2. Declaring functions to view languages and their codes, convert text to speech and speech to text:

#define functions
#text to speech conversion
def text_to_speech():
    #read inputs given by user
    text = text_entry.get("1.0","end-1c")
    language = accent_entry.get().strip()
    #Check if the user submitted inputs
    if not text.strip() or not language:
        messagebox.showerror(message="Enter required details")
        return
    #Using the inputs, convert the text to speech
    speech = gTTS(text = text, lang = language, slow = False)
    #save the speech to an MP3 file
    speech.save("text.mp3")
    #Play the file using mpg123 on Linux (on Windows, run "start text.mp3" instead)
    os.system("mpg123 "+"text.mp3")

Code explanation:

def text_to_speech(): Declare the function text_to_speech to perform the text to speech conversion.
text = text_entry.get("1.0","end-1c"): Read the contents of the text box using get(). Because it is a Text widget, get() takes a start and an end index. "1.0" means line 1, character 0, which is the very beginning. "end-1c" means the end of the content minus one character, which drops the trailing newline that the Text widget always appends.
language = accent_entry.get(): To get the accent, we use get() on the entry box, accent_entry. Since it is an Entry widget, we don't specify any index here.
if ... return: Check that the user filled in both fields. If either is missing, show an error dialog using the showerror() function of the messagebox module and return. The message parameter holds the text shown to the user.
speech = gTTS(): Create a gTTS object from the text, the language code and slow set to False. Setting slow to True makes the audio read more slowly.
speech.save(): Save the speech to an audio file. Specify the file name with its extension.
os.system(): Run a shell command to play the audio. Here the command is mpg123 followed by the file name, so the mpg123 command-line player must be installed.
If mpg123 is not on your system, install it using sudo apt install mpg123
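
Since mpg123 is mainly a Linux tool, the playback step can be made portable by choosing the command per operating system. The following is a minimal sketch, not part of the original project: the helper names playback_command and play are hypothetical, and it assumes afplay on macOS and the cmd.exe built-in start on Windows.

```python
import platform
import subprocess

def playback_command(path, system=None):
    """Build a command list that plays an audio file on the given OS."""
    system = system or platform.system()
    if system == "Windows":
        # 'start' is a cmd.exe built-in; the empty string is the window title
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        # afplay ships with macOS
        return ["afplay", path]
    # Assume Linux (or similar) with mpg123 installed
    return ["mpg123", path]

def play(path):
    """Run the player command and wait for it to exit."""
    subprocess.run(playback_command(path), check=False)
```

With this in place, the os.system() call could be swapped for play("text.mp3"). Passing the command as a list also avoids building a shell string by hand.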

#List the supported languages and their keys
def list_languages():
       #access languages and access codes using lang.tts_langs()
       messagebox.showinfo(message=list(lang.tts_langs().items()))

Code explanation:

def list_languages(): Declare the function list_languages() to display the supported languages.
messagebox.showinfo(): Display the languages in a dialog box. This is analogous to showerror(); the only difference is the icon shown in the dialog. Since lang.tts_langs() returns a dictionary that maps language codes to language names, we read the pairs with items() and display them as a list.
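
A raw list of tuples is hard to read in a dialog box. As a minimal sketch, the dictionary could be turned into one line per language first. The helper name format_languages is hypothetical, and the sample dictionary below is only a small stand-in for what lang.tts_langs() returns.

```python
def format_languages(langs):
    """Turn a {code: name} mapping into one 'code: name' line per language."""
    return "\n".join(f"{code}: {name}" for code, name in sorted(langs.items()))

# A small stand-in for the dictionary returned by lang.tts_langs()
sample = {"fr": "French", "en": "English", "hi": "Hindi"}
print(format_languages(sample))
```

Inside list_languages(), the result could then be passed to showinfo() as message=format_languages(lang.tts_langs()).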

#Python speech to text conversion
def speech_to_text():
    #Initialise the recognizer class
    recorder = sr.Recognizer()
    #Read the recording duration in seconds
    try:
        duration = int(duration_entry.get())
    except ValueError:
        messagebox.showerror(message="Enter the duration")
        return
    #Prompt the user to record
    messagebox.showinfo(message="Speak into the microphone and wait after finishing the recording")
    #use the microphone
    with sr.Microphone() as mic:
        #Calibrate for background noise, then record audio from the user
        recorder.adjust_for_ambient_noise(mic)
        #record() accepts a duration in seconds; listen() does not
        audio_input = recorder.record(mic, duration=duration)
    try:
        #Convert to text
        text_output = recorder.recognize_google(audio_input)
        #Display the output
        messagebox.showinfo(message="You said:\n "+text_output)
    except (sr.UnknownValueError, sr.RequestError):
        messagebox.showerror(message="Couldn't process the audio input.")

Code explanation:

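def speech_to_text(): Declare the function speech_to_text() to perform the speech to text conversion.
recorder = sr.Recognizer(): Create a Recognizer object and assign it to recorder.
duration: Read the duration from the duration_entry widget using get() and convert it to an integer with int(). If the conversion fails, show an error and return.
with sr.Microphone() as mic: Activate the microphone and use it to record the audio
recorder.adjust_for_ambient_noise(mic): Calibrate the recorder for ambient noise by sampling the microphone. An optional duration parameter, in seconds and one second by default, sets how long it samples, for example duration=0.2.
audio_input: Record from the microphone for the requested number of seconds. Note that duration is a parameter of the recorder's record() method; listen() takes timeout and phrase_time_limit instead.
text_output: Send the recorded audio to recognize_google(), which calls Google's web speech API and therefore needs an internet connection, and display the returned text in a dialog.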
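
The duration check can also be pulled into a small function that rejects zero and negative values as well as non-numeric input. This is a minimal sketch; the helper name parse_duration is hypothetical and is not part of the original project.

```python
def parse_duration(raw):
    """Return a positive whole number of seconds, or None if raw is invalid."""
    try:
        seconds = int(raw.strip())
    except ValueError:
        # Empty input, letters, or decimals such as "2.5" end up here
        return None
    return seconds if seconds > 0 else None
```

In speech_to_text(), the value of duration_entry.get() could be passed to this function, showing the error dialog whenever it returns None.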

3. Creating a user interface:

#Invoke call to class to view a window
window = Tk()
#Set dimensions of window and title
window.geometry("500x300")
window.title("Convert Speech to text and text to Speech: PythonGeeks")
title_label = Label(window, text="Convert Speech to text and text to Speech: PythonGeeks").pack()
#Read inputs
#text_to_speech input
text_label = Label(window, text="Text:").place(x=10,y=20)
text_entry = Text(window, width=30,height=5)
text_entry.place(x=80,y=20)
#Accent input
accent_label = Label(window, text="Accent:").place(x=10,y=110)
accent_entry = Entry(window,  width=26)
accent_entry.place(x=80,y=110)
#Duration input
duration_label = Label(window, text="Duration:").place(x=10,y=140)
duration_entry = Entry(window,  width=26)
duration_entry.place(x=80,y=140)
 
#Perform the functions
button1 = Button(window,text='List languages', bg = 'Turquoise',fg='Red',command=list_languages).place(x=10,y=190)
button2 = Button(window,text='Convert Text to Speech', bg = 'Turquoise',fg='Red',command=text_to_speech).place(x=130,y=190)
button3 = Button(window,text='Convert Speech to Text', bg = 'Turquoise',fg='Red',command=speech_to_text).place(x=305,y=190)
 
#Start the event loop; it runs until the window is closed
window.mainloop()

Code explanation:

window = Tk(): Initialise the window with tkinter constructor to use the objects and widgets
window.geometry("500x300"): Set the dimensions of the window by specifying the width and the height of the window in pixels
window.title(): Give the application window a title
title_label, text_label, duration_label, accent_label: Define a label with the parameters: the parent window and the text to display. A Label shows static text that the user cannot edit.
accent_entry, duration_entry: Entry widget is an input field to obtain user input. Specify the width of the widget using the width parameter.
text_entry: Text widget is another input field to obtain user input. Use this widget to obtain long lines of text from the user. Specify the height and width of the text box.
place(): Positions a widget at explicit coordinates. x is the distance from the left edge of the window and y is the distance from the top edge, both in pixels.
pack(): Positions widgets automatically. By default, each widget is stacked below the previous one and centered horizontally.
button1, button2, button3: A Button runs a function when the user clicks it. The parameters are the parent window, the button text, the background color (bg), the text color (fg) and the function to call, which is passed through the command parameter.
window.mainloop(): Start the event loop, which keeps the window open and responds to user actions. The call returns only when the user closes the window, so widgets created after this line are not displayed.
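
Hand-typed place() coordinates are easy to get wrong, and two widgets given the same y value will overlap. As a minimal sketch, the y coordinates could be computed from the row heights instead. The helper name stack_rows and the pixel heights used here are assumptions for illustration only.

```python
def stack_rows(heights, top=20, gap=10):
    """Return the y coordinate of each row, given the row heights in pixels."""
    positions = []
    y = top
    for height in heights:
        positions.append(y)
        y += height + gap
    return positions

# Hypothetical heights: a five-line Text box, then two one-line Entry rows
print(stack_rows([80, 20, 20]))  # [20, 110, 140]
```

Each returned value can then be passed as the y argument of place() for both the label and the input widget of that row.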

Python Text to Speech & Speech to Text Output

Run the program to open the application window, which contains the text, accent and duration fields and the three buttons.

Summary

We have successfully implemented a speech to text and text to speech converter program in Python. This project provides practical exposure to several Python libraries: gTTS, SpeechRecognition and tkinter.
