AI Voice-Controlled Chatbot with Google Gemini API (Python + HTML) | Hands-Free AI Assistant

Download a fully functional AI Voice-Controlled Chatbot built with Python, HTML, and the Google Gemini API. This hands-free assistant supports voice commands, speech recognition, and speech synthesis, with an animated robot UI and a Flask backend. Easy setup with full source code included.

Monday, June 9, 2025

AI Voice-Controlled Chatbot Using Google Gemini API

This project is a voice-enabled chatbot that combines Google’s Gemini 1.5 Flash model with real-time speech recognition and speech synthesis to create an interactive AI assistant. It is designed to be simple and engaging, with an animated robot UI and smooth voice controls.


System Requirements

To successfully run and deploy this chatbot project, you will need the following:

1. Software Requirements

  • Python 3.8 or higher: The backend server is built using Flask, a Python web framework.

  • Flask: Lightweight web server framework for Python. Install via pip.

  • Requests library for Python: For making HTTP requests to the Gemini API.

  • python-dotenv: To load environment variables such as your API key from a .env file.

  • Modern web browser: Chrome or Edge, which support the Web Speech API features used here (webkitSpeechRecognition and speechSynthesis). Firefox does not support speech recognition by default.

  • Code editor or IDE (optional but recommended): VSCode, PyCharm, Sublime Text, etc.

2. Hardware Requirements

  • A PC or laptop with microphone support for voice input.

  • Internet connection for API communication.


How to Get the Requirements

Install Python

  • Download Python from python.org and install it.

  • Ensure python and pip commands are available in your terminal/command prompt.

Install Required Python Packages

Open your terminal or command prompt and run:

pip install flask requests python-dotenv
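
To confirm the setup before going further, a short check along these lines can help. It is only a sketch: the function names are made up for illustration, and the module list assumes the three packages from the pip command above (python-dotenv is imported under the name dotenv).

```python
import importlib.util
import sys

# Import names of the packages installed above (python-dotenv imports as "dotenv").
REQUIRED_MODULES = ("flask", "requests", "dotenv")


def python_ok(version_info=None, minimum=(3, 8)):
    """Return True if the interpreter is at least the minimum version."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python 3.8+:", "yes" if python_ok() else "no")
print("Missing packages:", missing_modules() or "none")
```

If the second line lists anything, run the pip command again in the same environment you use to start the server.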

Get Google Gemini API Key

  1. Sign in or create a Google Cloud account at Google Cloud Console.

  2. Create a new project or select an existing one.

  3. Navigate to APIs & Services > Credentials.

  4. Create an API key.

  5. Enable the Generative Language API or Gemini API for your project.

  6. Copy the API key securely.


How to Use This Chatbot Project

Step 1: Download and Extract the ZIP File

  • Download the ZIP file from this page.

  • Extract it to your preferred directory.

Step 2: Configure Environment Variables

  • Open the extracted folder.

  • Create a file named .env (if it doesn’t exist).

  • Add your Google Gemini API key in this format:

GEMINI_API_KEY=your_google_api_key_here

Note: Do not share your API key publicly to protect your usage quota and billing.
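
On the Python side, the key can then be read from the environment. The snippet below is a hedged sketch, not the code shipped in the ZIP: get_api_key is a hypothetical helper, and it assumes the variable name GEMINI_API_KEY shown above. It uses load_dotenv from python-dotenv when that package is installed.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # keeps the sketch usable before packages are installed
    load_dotenv = None


def get_api_key(environ=None):
    """Return the Gemini API key, or raise if it is missing or a placeholder."""
    if environ is None:
        if load_dotenv is not None:
            load_dotenv()  # looks for a .env file and copies its entries into os.environ
        environ = os.environ
    key = environ.get("GEMINI_API_KEY", "").strip()
    if not key or key == "your_google_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is missing; add it to the .env file")
    return key
```

Failing early with a clear message is usually easier to debug than a rejected API request later on.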

Step 3: Start the Flask Server

  • Open your terminal/command prompt.

  • Navigate to the project directory containing app.py.

  • Run the following command:

python app.py

  • The Flask server will start, usually at http://127.0.0.1:5000/.
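
The app.py in the ZIP is not reproduced here. For readers curious about what such an entry point might look like, here is a minimal sketch. The /chat route, the "message" and "reply" field names, and the helper functions are all assumptions made for illustration, not the project's actual code.

```python
def handle_chat(payload, reply_fn):
    """Validate the request body and return (response_dict, http_status)."""
    if not isinstance(payload, dict):
        payload = {}
    message = payload.get("message", "")
    if not isinstance(message, str) or not message.strip():
        return {"error": "message is required"}, 400
    return {"reply": reply_fn(message.strip())}, 200


def create_app(reply_fn):
    """Build a Flask app whose hypothetical /chat route delegates to reply_fn."""
    from flask import Flask, jsonify, request  # imported here so the helper above works without Flask

    app = Flask(__name__)

    @app.route("/chat", methods=["POST"])
    def chat():
        # silent=True yields None instead of an error page on invalid JSON
        body, status = handle_chat(request.get_json(silent=True), reply_fn)
        return jsonify(body), status

    return app
```

For a quick local trial, create_app(lambda text: "echo: " + text).run(debug=True) would start a server on port 5000 that simply echoes each message back.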

Step 4: Open the Chat Interface

  • Open a modern web browser (Chrome recommended).

  • Navigate to http://127.0.0.1:5000/.

  • You will see a clean chat interface with a robotic animated assistant in the center of the page.

Step 5: Using Voice Controls

  • Click the Start Voice button to begin voice recognition.

  • Speak naturally into your microphone; the recognized text will automatically appear in the input box.

  • The chatbot will process your message using Google Gemini AI and respond with both text and speech.

  • While the bot is speaking, you’ll see animated feedback from the robot avatar.

  • Click Stop Voice to pause voice recognition at any time.

  • Click Reset to clear the conversation and stop all listening and speaking. The assistant stays inactive until you reload the page.

Step 6: Manual Text Input (Optional)

  • You can also type your message manually in the input box.

  • Click Send or press Enter to get a response.


Project Features in Detail

  • Speech Recognition: Uses the browser’s built-in webkitSpeechRecognition for capturing voice input in real-time.

  • AI Backend: Python Flask server sends your query securely to the Gemini API and returns intelligent responses.

  • Speech Synthesis: The browser's speechSynthesis reads the AI’s response aloud in a natural-sounding voice.

  • Robotic Avatar: Animated robot face reacts visually during user speech and AI replies for immersive engagement.

  • Control Buttons: User-friendly start, stop, and reset buttons manage the conversation flow and voice input/output.

  • Security: API key is never exposed on the frontend; it’s loaded safely from a .env file on the backend.
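
The AI Backend step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's own code: it assumes the v1beta generateContent REST endpoint and the gemini-1.5-flash model ID (check both against current Google documentation, since model names change), and the function names are hypothetical. The post parameter exists only so the function can be tried without a network call.

```python
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-1.5-flash:generateContent"
)


def build_payload(user_text):
    """Wrap the user's text in the JSON body expected by generateContent."""
    return {"contents": [{"parts": [{"text": user_text}]}]}


def extract_reply(data):
    """Join the text parts of the first candidate; return '' if none exist."""
    try:
        parts = data["candidates"][0]["content"]["parts"]
    except (KeyError, IndexError, TypeError):
        return ""
    return "".join(part.get("text", "") for part in parts)


def ask_gemini(user_text, api_key, post=None):
    """Send user_text to the Gemini API and return the reply text."""
    if post is None:
        import requests  # imported lazily so the helpers above need no packages
        post = requests.post
    response = post(
        GEMINI_URL,
        params={"key": api_key},  # the key stays on the server, never in the page
        json=build_payload(user_text),
        timeout=30,
    )
    response.raise_for_status()
    return extract_reply(response.json())
```

Because the request is made from the Flask process, the browser only ever sees the reply text, which matches the Security point above.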


Troubleshooting Tips

  • Microphone Access: Ensure your browser has permission to use the microphone.

  • Browser Compatibility: Use Chrome or Edge for best voice API support.

  • API Limits: Monitor your Google Cloud quota to avoid service interruptions.

  • Errors: If the AI does not respond, check Flask server logs for API errors or misconfigurations.
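
When reading the Flask logs, it can help to translate the HTTP status of a failed API call into a likely cause. The mapping below is a rough sketch based on general HTTP conventions; it is an assumption, not an official Gemini error table, and explain_status is a hypothetical helper.

```python
STATUS_HINTS = {
    400: "Bad request: check the JSON body and that GEMINI_API_KEY is valid.",
    403: "Permission denied: the key may not be allowed to use this API.",
    404: "Not found: check the model name and endpoint URL.",
    429: "Too many requests: a rate limit or quota was exceeded.",
}


def explain_status(status_code):
    """Return a short, human-readable hint for an HTTP status code."""
    if status_code in STATUS_HINTS:
        return STATUS_HINTS[status_code]
    if 500 <= status_code < 600:
        return "Server-side error: retry after a short wait."
    return "Unexpected status {}: see the Flask server logs.".format(status_code)
```

Logging the hint next to the raw status code keeps the original information available while making the log easier to scan.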


Customization & Extension Ideas

  • Add user authentication for personalized experiences.

  • Store chat history in a database.

  • Support multiple languages by adjusting recognition and synthesis language codes.

  • Deploy on cloud platforms like Heroku, AWS, or Google Cloud for public access.

  • Integrate with other AI models or voice assistants.
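
As one example of the ideas above, chat history could be stored with Python's built-in sqlite3 module. This is a minimal sketch: the table layout and function names are assumptions for illustration, and a real deployment would likely also record a user or session identifier.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) the history database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "role TEXT NOT NULL, "
        "text TEXT NOT NULL, "
        "created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_message(conn, role, text):
    """Store one message; the 'with' block commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO messages (role, text) VALUES (?, ?)", (role, text)
        )


def load_history(conn, limit=50):
    """Return the most recent messages as (role, text) tuples, oldest first."""
    rows = conn.execute(
        "SELECT role, text FROM messages ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return list(reversed(rows))
```

Passing a file name such as "chat.db" instead of the default ":memory:" keeps the history across server restarts.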





Summary

This AI voice-controlled chatbot project is a comprehensive, end-to-end solution that combines the power of Google Gemini AI with modern web speech technologies to provide a futuristic conversational interface. It’s perfect for developers, students, or AI enthusiasts who want to explore real-world voice AI applications with a professional and polished user experience.

Download the ZIP, set it up, and start chatting with your own voice-powered AI assistant today!


🙋‍♂️ Frequently Asked Questions (FAQ)

❓ What is this AI Voice-Controlled Chatbot?

This is a voice-enabled chatbot built with Python (Flask), HTML, JavaScript, and Google Gemini API. It allows users to interact with AI using voice commands and get real-time spoken responses with an animated robotic assistant.


❓ Do I need any programming experience to use it?

Basic knowledge of Python and how to run a web server is helpful, but step-by-step instructions are included, making it beginner-friendly.


❓ How do I get a Google Gemini API key?

You can get your API key by signing in to Google AI Studio, creating a project, enabling the Generative Language API, and generating a key.


❓ Is this project free to use?

Yes, the code is completely free to download and use for learning or personal projects. Just be mindful of any usage limits imposed by the Google Gemini API.


❓ Can I deploy this chatbot online?

Yes! You can deploy it on platforms like Heroku, Vercel (frontend), PythonAnywhere, or Google Cloud for online access.


❓ What browsers support the voice features?

The chatbot uses the Web Speech API, best supported in Google Chrome and Microsoft Edge. It may not work in Firefox or Safari.


❓ Can I customize the robot animation or voice?

Absolutely! You can modify the CSS animations, add custom avatars, or change the speech synthesis voice through JavaScript.


❓ What happens when I click the “Reset” button?

The Reset button clears all messages and completely stops the chatbot’s listening and speaking processes until the page is reloaded.


❓ Is my voice data stored anywhere?

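No. This project does not record or store any audio. Speech recognition is handled by the browser, and only the transcribed text is sent to your backend. Note that some browsers, including Chrome, send audio to a cloud speech service to perform the recognition, so processing is not necessarily local to your device.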


❓ Can I use this chatbot in other languages?

Yes, you can change the language for voice input and output by modifying the language code in the JavaScript (e.g., recognition.lang = 'hi-IN' for Hindi).




