
Python Project: Audio Transcription and Text-to-Speech Conversion Using Wav2Vec2 and Pyttsx3

This Python project combines speech-to-text transcription and text-to-speech synthesis using Librosa, SciPy, PyTorch, and Hugging Face's Transformers library. The script loads an audio file, resamples it to 16 kHz (the rate the model expects), transcribes the speech with Facebook's Wav2Vec2 model (facebook/wav2vec2-base-960h), and reads text aloud with pyttsx3 using a selectable voice and speaking rate. It is a compact starting point for anyone interested in speech processing, voice technology, or natural language processing in Python.



import librosa

from scipy.signal import resample

import torch

from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer

import pyttsx3



# Load audio file

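audio_file = "path/to/audio.wav"  # replace with the path to your audio file

# sr=None keeps the file's native sample rate; audio is a 1-D float array
audio, sr = librosa.load(audio_file, sr=None)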



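def resample_audio(audio, orig_sr, target_sr):

    """Resample a 1-D signal from orig_sr to target_sr using SciPy's FFT-based resample."""

    duration = audio.shape[0] / orig_sr

    target_length = int(duration * target_sr)

    resampled_audio = resample(audio, target_length)

    # Cast to float32 so the input tensor dtype matches the model weights

    return resampled_audio.astype("float32")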


# Example usage:

# resampled_audio = resample_audio(audio, 48000, 16000)
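
The length calculation inside resample_audio goes through a floating-point duration and then truncates with int(). As a minimal sketch, the same arithmetic can be done entirely in integers, rounding to the nearest sample instead of truncating. The helper name target_length is hypothetical and not part of the original script.

```python
def target_length(num_samples, orig_sr, target_sr):
    """Number of samples after resampling, computed with integer arithmetic."""
    # Round to the nearest sample: (n * target + orig / 2) // orig
    return (num_samples * target_sr + orig_sr // 2) // orig_sr


print(target_length(48000, 48000, 16000))  # 1 s at 48 kHz -> 16000
print(target_length(66150, 44100, 16000))  # 1.5 s at 44.1 kHz -> 24000
```

The result could be passed as the second argument to scipy.signal.resample in place of the int(duration * target_sr) value.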


# Resample if necessary

if sr != 16000:

    audio = resample_audio(audio, sr, 16000)

    sr = 16000


print(f"Audio loaded; sample rate is now {sr} Hz.")


# Load Wav2Vec2 model and tokenizer

# Note: recent Transformers releases deprecate Wav2Vec2Tokenizer in favor of Wav2Vec2Processor

tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")


print("Model loaded successfully.")


# Tokenize input

input_values = tokenizer(audio, return_tensors="pt").input_values


# Perform inference

with torch.no_grad():

    logits = model(input_values).logits


# Get predicted ids

predicted_ids = torch.argmax(logits, dim=-1)


# Decode the ids to text

transcription = tokenizer.batch_decode(predicted_ids)[0]

print("Transcription: ", transcription)
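
The argmax step produces one token id per audio frame, so the same character usually appears many times in a row, separated by blank tokens. Decoding a CTC model's output greedily means collapsing consecutive repeats and then dropping the blanks, which is the step batch_decode takes care of here. A rough sketch of that idea follows; the tiny vocabulary and the blank id below are made up for illustration and do not match the real model's vocabulary.

```python
from itertools import groupby

BLANK = 0                           # hypothetical id of the CTC blank token
VOCAB = {1: "H", 2: "I", 3: " "}    # hypothetical toy vocabulary


def ctc_greedy_decode(ids):
    """Collapse consecutive repeated ids, then remove blank tokens."""
    collapsed = [token_id for token_id, _ in groupby(ids)]
    return "".join(VOCAB[i] for i in collapsed if i != BLANK)


frames = [0, 1, 1, 0, 2, 2, 2, 0, 0]   # per-frame argmax ids
print(ctc_greedy_decode(frames))        # HI
```

Because repeats are collapsed before blanks are removed, a blank between two identical ids is what lets a doubled letter survive: [1, 0, 1] decodes to "HH", while [1, 1] decodes to "H".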


# Text-to-Speech

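def text_to_speech(text, voice_gender='female', rate=150):

    engine = pyttsx3.init()

    voices = engine.getProperty('voices')

    # Voice order is platform-dependent: index 0 is often a male voice and

    # index 1 a female voice on Windows, but this is not guaranteed.

    index = 0 if voice_gender == 'male' else 1

    if voices:

        # Fall back to the last installed voice if the index is out of range

        engine.setProperty('voice', voices[min(index, len(voices) - 1)].id)

    engine.setProperty('rate', rate)  # speaking rate in words per minute

    engine.say(text)

    engine.runAndWait()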


# Example usage

long_text = "hi what happened"

text_to_speech(long_text, voice_gender='male', rate=150)  # For male voice

text_to_speech(long_text, voice_gender='female', rate=180)  # For female voice


print("Text-to-Speech conversion completed.")
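
Choosing a voice by list position depends on which voices the operating system happens to have installed. One possible alternative, sketched below, is to match on the voice's name and fall back to a default index. The helper pick_voice_id is hypothetical, and the SimpleNamespace objects are stand-ins for the voice objects returned by engine.getProperty('voices'), which expose id and name attributes; the names used are invented.

```python
from types import SimpleNamespace


def pick_voice_id(voices, keyword, default_index=0):
    """Return the id of the first voice whose name contains keyword."""
    if not voices:
        raise ValueError("no voices are installed")
    for voice in voices:
        if keyword.lower() in (voice.name or "").lower():
            return voice.id
    # No match: fall back to a valid index
    return voices[min(default_index, len(voices) - 1)].id


# Stand-in voice list with invented names, for illustration only
voices = [
    SimpleNamespace(id="voice-a", name="Example Voice Alex"),
    SimpleNamespace(id="voice-b", name="Example Voice Maria"),
]
print(pick_voice_id(voices, "maria"))    # voice-b
print(pick_voice_id(voices, "unknown"))  # voice-a
```

With a real engine, the returned id would be passed to engine.setProperty('voice', ...), just as in text_to_speech.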



