Imagine asking your lamp, “How’s the weather tomorrow?” — and instead of just glowing, it talks back to you with an AI-generated response. This is no longer science fiction. Thanks to the ESP32 microcontroller, the OpenAI API, and some clever text-to-speech (TTS) integration, you can build your own AI-powered DIY Smart Chatbot Lamp at home.
In this article, we’ll walk through the concept, hardware, software, and deployment of a ChatGPT on ESP32 project that merges IoT with artificial intelligence. You’ll learn how to turn a simple desk lamp into an interactive companion that listens to your queries, sends them to the ChatGPT API, and responds with synthesized speech.
What the ESP32 Can Do with AI (On-Device)
The ESP32 has:
Dual-core Tensilica Xtensa LX6 (~240 MHz)
~520 KB RAM (plus external PSRAM in some boards)
Wi-Fi + Bluetooth built-in
This means it can handle small AI models and lightweight inference, which is why it works well as the embedded side of an ESP32 ChatGPT voice assistant. On-device tasks it can realistically manage include:
Keyword spotting / voice commands (e.g., “ON” / “OFF”)
Simple image recognition with very low resolution
Sensor data classification (motion detection, anomaly detection, etc.)
Edge AI using TensorFlow Lite for Microcontrollers
Frameworks:
TensorFlow Lite Micro → optimized for microcontrollers.
Edge Impulse → lets you train a model in the cloud, then deploy it to ESP32 easily.
The ESP32 can’t run large AI models locally (GPT-class LLMs or big CNNs), but it can act as a bridge between sensors, actuators, and cloud AI services.
In this project, the ESP32 will:
Record your voice through a microphone.
Send the recording to a speech-to-text service and forward the transcribed text to the OpenAI API (GPT model).
Receive the chatbot’s response.
Forward the response to a TTS engine.
Play the audio through a speaker connected to the lamp.
Components Required for DIY Smart Chatbot Lamp
🛒 Here’s the hardware you’ll need:
Microcontroller: ESP32 development board (e.g., ESP32-WROOM-32 DevKit C). This is essential for its Wi-Fi/Bluetooth capability and processing power.
Lamp Base: Any aesthetically pleasing enclosure you like (a 3D-printed design, a modified existing lamp, or a simple project box).
Lights: Addressable RGB LED strip (e.g., WS2812B or SK6812) for multi-color functionality.
Microphone: INMP441 I2S digital microphone module. This is strongly recommended over analog mics for its clarity and noise resistance.
Speaker: A small 3W-5W amplifier board (e.g., PAM8403) paired with a 4Ω, 3W speaker. The ESP32 cannot drive a speaker directly.
Power:
A 5V DC power supply (at least 2A) to power the ESP32, LEDs, and amplifier.
A USB cable for programming the ESP32.
Connections: Jumper wires, a breadboard (for prototyping), and possibly a soldering iron for final assembly.
Resistor & Capacitor: A 330-470Ω resistor in series with the LED data line, plus a 100µF (or larger) electrolytic capacitor across the strip’s power input to smooth voltage spikes.
ESP32 Chatbot Architecture
Think of this project as a pipeline:
🎤 Voice Input → 🎛️ ESP32 processes audio (I2S) → ☁️ OpenAI API for chatbot response → 🗣️ ESP32 sends response to TTS service → 🔊 Audio output via speaker → 💡 Lamp control (on/off/dim)
This modular design ensures flexibility: you can swap the AI backend (OpenAI, Hugging Face, or custom LLM) or the TTS provider (Google Cloud, Amazon Polly, ESP32 local TTS libraries).
Step 1: Setting Up the ESP32 Development Environment
You’ll need:
Arduino IDE or PlatformIO
ESP32 board definitions installed
Libraries:
WiFi.h (for connectivity)
HTTPClient.h (for API requests)
ArduinoJson.h (to parse JSON responses from OpenAI)
I2S.h (for microphone and audio output)
Tip: The ESP32 supports FreeRTOS multitasking: you can assign one core to network handling and the other to audio processing to avoid bottlenecks. Keep in mind that using the ChatGPT API requires an account, an API key, and billing setup, as it is not free.
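Before wiring anything, it helps to confirm the board can reach your network. Here is a minimal connectivity sketch; the credential names are placeholders you must fill in yourself:

```cpp
#include <WiFi.h>

// Placeholder credentials -- replace with your own network details
const char* WIFI_SSID = "YOUR_SSID";
const char* WIFI_PASS = "YOUR_PASSWORD";

void setup() {
  Serial.begin(115200);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  Serial.print("Connecting to Wi-Fi");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println();
  Serial.print("Connected, IP: ");
  Serial.println(WiFi.localIP());
}

void loop() {
  // Later steps add audio capture and API calls here
}
```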
Key Pin Connections:
LED Strip: Data In pin → GPIO 16 (with a 330Ω resistor in series). 5V and GND to power supply.
I2S Microphone (INMP441):
SCK → GPIO14
WS → GPIO15
SD → GPIO32
VCC → 3.3V
GND → GND
Audio Amplifier (PAM8403):
Audio In (L/R) → GPIO25 (DAC1) or GPIO26 (DAC2) for analog output; a digital I2S audio board would use the I2S pins instead.
VCC → 5V
GND → GND
Speaker to amplifier outputs.
Note: Using the ESP32’s internal DAC for audio output is simpler but lower quality. For better quality, use an external I2S DAC board (like a MAX98357) which would share the I2S bus with the microphone.
Step 2: Capturing Voice Input
The ESP32 connects to an I2S MEMS microphone, giving you a 16 kHz audio stream suitable for speech capture. The raw audio can either be sent to a speech-to-text (STT) API (such as the OpenAI Whisper API, Google Speech-to-Text, or Vosk) or processed offline with a lightweight embedded STT model. Either way, this is the stage that lets the ESP32 ChatGPT voice assistant understand spoken commands and respond in near real time.
Sample code for capturing speech:
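The following is a minimal capture sketch, assuming the legacy I2S driver from Arduino-ESP32 core 2.x and the INMP441 wiring shown earlier (SCK→14, WS→15, SD→32). The buffer sizes and the 24-to-16-bit scaling shift are illustrative choices, not the only valid ones:

```cpp
#include <driver/i2s.h>

// Pin mapping from the wiring section above
#define I2S_MIC_SCK 14
#define I2S_MIC_WS  15
#define I2S_MIC_SD  32

#define SAMPLE_RATE 16000

// The INMP441 delivers 24-bit samples inside 32-bit frames
int32_t rawSamples[512];
int16_t pcmSamples[512];

void setupMic() {
  i2s_config_t cfg = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 512,
    .use_apll = false
  };
  i2s_pin_config_t pins = {
    .mck_io_num = I2S_PIN_NO_CHANGE,
    .bck_io_num = I2S_MIC_SCK,
    .ws_io_num = I2S_MIC_WS,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_MIC_SD
  };
  i2s_driver_install(I2S_NUM_0, &cfg, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pins);
}

// Reads one buffer of audio and converts it to 16-bit PCM for the STT service
size_t recordChunk() {
  size_t bytesRead = 0;
  i2s_read(I2S_NUM_0, rawSamples, sizeof(rawSamples), &bytesRead, portMAX_DELAY);
  size_t n = bytesRead / sizeof(int32_t);
  for (size_t i = 0; i < n; i++) {
    pcmSamples[i] = rawSamples[i] >> 14;  // scale the 24-bit data down to 16 bits
  }
  return n;  // number of 16-bit samples now in pcmSamples
}
```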
Step 3: Sending Queries to OpenAI API
Once you have text from the speech recognition service, you send it to the OpenAI API. This function sends the user’s message and retrieves the AI’s response, enabling ChatGPT on ESP32 for an interactive voice assistant experience.
Sample code for sending and receiving messages:
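Here is a minimal sketch of the request using HTTPClient and ArduinoJson. The model name, buffer sizes, and the certificate-skipping WiFiClientSecure setup are prototyping assumptions, not production choices:

```cpp
#include <WiFiClientSecure.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>

const char* OPENAI_KEY = "YOUR_API_KEY";  // placeholder; store securely (see notes below)

// Sends the transcribed text to the Chat Completions endpoint and
// returns the assistant's reply, or an empty string on failure.
String askChatGPT(const String& userText) {
  WiFiClientSecure client;
  client.setInsecure();  // prototyping shortcut; pin the server certificate in production

  HTTPClient http;
  http.begin(client, "https://api.openai.com/v1/chat/completions");
  http.addHeader("Content-Type", "application/json");
  http.addHeader("Authorization", String("Bearer ") + OPENAI_KEY);

  // Build the JSON request body (model name here is an assumption; use any chat model)
  DynamicJsonDocument reqDoc(512);
  reqDoc["model"] = "gpt-4o-mini";
  JsonArray msgs = reqDoc.createNestedArray("messages");
  JsonObject msg = msgs.createNestedObject();
  msg["role"] = "user";
  msg["content"] = userText;
  String body;
  serializeJson(reqDoc, body);

  String reply = "";
  int code = http.POST(body);
  if (code == 200) {
    DynamicJsonDocument resDoc(8192);  // size to your expected reply length
    deserializeJson(resDoc, http.getString());
    reply = resDoc["choices"][0]["message"]["content"].as<String>();
  } else {
    Serial.printf("OpenAI request failed: HTTP %d\n", code);
  }
  http.end();
  return reply;
}
```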
Step 4: Converting Text to Speech (TTS)
You now need to turn text into audio, which is the second-to-last step in creating an ESP32 Voice Assistant with ChatGPT, enabling the device to reply with natural speech. Options include:
Cloud TTS APIs (Google Cloud, Amazon Polly, Microsoft Azure) → higher quality, but needs internet.
ESP32 local TTS libraries (like ESP8266SAM or Talkie) → robotic but works offline.
Example code using Google Cloud TTS:
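A minimal sketch of the REST call, assuming an API-key query parameter on the v1 text:synthesize endpoint. Long replies can outgrow a fixed JSON buffer, so treat the sizes here as placeholders:

```cpp
#include <WiFiClientSecure.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>

const char* GOOGLE_TTS_KEY = "YOUR_GOOGLE_API_KEY";  // placeholder

// Returns base64-encoded MP3 audio for the given text, or "" on failure.
String synthesizeSpeech(const String& text) {
  WiFiClientSecure client;
  client.setInsecure();  // prototyping shortcut

  HTTPClient http;
  String url = String("https://texttospeech.googleapis.com/v1/text:synthesize?key=") + GOOGLE_TTS_KEY;
  http.begin(client, url);
  http.addHeader("Content-Type", "application/json");

  DynamicJsonDocument reqDoc(1024);
  reqDoc["input"]["text"] = text;
  reqDoc["voice"]["languageCode"] = "en-US";
  reqDoc["audioConfig"]["audioEncoding"] = "MP3";
  String body;
  serializeJson(reqDoc, body);

  String audioB64 = "";
  int code = http.POST(body);
  if (code == 200) {
    DynamicJsonDocument resDoc(32768);  // long replies may need a larger buffer or streaming
    deserializeJson(resDoc, http.getString());
    audioB64 = resDoc["audioContent"].as<String>();
  } else {
    Serial.printf("TTS request failed: HTTP %d\n", code);
  }
  http.end();
  return audioB64;
}
```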
Do not forget that the ESP32 must base64-decode the incoming data and be able to play MP3 audio through the MAX98357A I2S DAC to get proper sound output. A sketch covering both steps follows.
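One way to do both, sketched with the mbedtls base64 helper bundled in the ESP32 core and the ESP8266Audio library (which also runs on ESP32). The I2S pin numbers are assumptions to adapt to your wiring:

```cpp
#include "mbedtls/base64.h"
#include <AudioGeneratorMP3.h>        // from the ESP8266Audio library
#include <AudioFileSourcePROGMEM.h>
#include <AudioOutputI2S.h>

// Decodes base64 TTS audio and plays it through a MAX98357A I2S DAC.
void playBase64Mp3(const String& audioB64) {
  // Worst-case decoded size is 3/4 of the base64 length
  size_t maxLen = (audioB64.length() * 3) / 4 + 4;
  uint8_t* mp3Buf = (uint8_t*)malloc(maxLen);
  if (!mp3Buf) return;

  size_t mp3Len = 0;
  mbedtls_base64_decode(mp3Buf, maxLen, &mp3Len,
                        (const uint8_t*)audioB64.c_str(), audioB64.length());

  AudioFileSourcePROGMEM src(mp3Buf, mp3Len);
  AudioOutputI2S out;
  out.SetPinout(27, 26, 25);  // BCLK, LRCLK, DIN -- example pins, not fixed requirements
  AudioGeneratorMP3 mp3;
  mp3.begin(&src, &out);
  while (mp3.isRunning()) {
    if (!mp3.loop()) mp3.stop();  // pump the decoder until playback finishes
  }
  free(mp3Buf);
}
```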
Voice Recognition in the ESP32 Voice Assistant
We use a two-step process:
On-Device Wake Word Detection (Optional but recommended): A simple model (e.g., built with Espressif’s ESP-SR wake-word engine or a TinyML model trained in Edge Impulse) can listen continuously for a wake word like “Hey Lamp” to start recording. This saves power and avoids constant API calls.
Cloud-Based Speech-to-Text (STT): We use the OpenAI Whisper API for high-accuracy transcription of the recorded audio command.
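A hedged sketch of such a Whisper request, built by hand as a multipart upload. It assumes the recording is already a complete 16 kHz mono WAV held in memory, which only works for short clips:

```cpp
#include <WiFiClientSecure.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>

const char* OPENAI_KEY = "YOUR_API_KEY";  // placeholder

// Sends a short WAV recording to the Whisper API and returns the transcript.
String transcribeAudio(const uint8_t* wavData, size_t wavLen) {
  WiFiClientSecure client;
  client.setInsecure();  // prototyping shortcut

  const char* boundary = "----esp32chatlamp";  // arbitrary multipart boundary
  String head = String("--") + boundary + "\r\n";
  head += "Content-Disposition: form-data; name=\"model\"\r\n\r\nwhisper-1\r\n";
  head += String("--") + boundary + "\r\n";
  head += "Content-Disposition: form-data; name=\"file\"; filename=\"speech.wav\"\r\n";
  head += "Content-Type: audio/wav\r\n\r\n";
  String tail = String("\r\n--") + boundary + "--\r\n";

  // Assemble the multipart body in one heap buffer (fine for short clips)
  size_t bodyLen = head.length() + wavLen + tail.length();
  uint8_t* body = (uint8_t*)malloc(bodyLen);
  if (!body) return "";
  memcpy(body, head.c_str(), head.length());
  memcpy(body + head.length(), wavData, wavLen);
  memcpy(body + head.length() + wavLen, tail.c_str(), tail.length());

  HTTPClient http;
  http.begin(client, "https://api.openai.com/v1/audio/transcriptions");
  http.addHeader("Authorization", String("Bearer ") + OPENAI_KEY);
  http.addHeader("Content-Type", String("multipart/form-data; boundary=") + boundary);

  String text = "";
  int code = http.POST(body, bodyLen);
  if (code == 200) {
    DynamicJsonDocument doc(2048);
    deserializeJson(doc, http.getString());
    text = doc["text"].as<String>();
  }
  free(body);
  http.end();
  return text;
}
```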
Step 5: Controlling the Lamp
You can add intelligence:
If the AI response contains “turn on light,” activate the relay.
If the AI says “dim the light,” adjust PWM on a MOSFET driver (see the dimming sketch below).
Example code for activating the relay:
```cpp
// Simple keyword matching on the chatbot's reply text
if (reply.indexOf("turn on") >= 0) {
  digitalWrite(RELAY_PIN, HIGH);
} else if (reply.indexOf("turn off") >= 0) {
  digitalWrite(RELAY_PIN, LOW);
}
```
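For the dimming case, a minimal sketch using the classic LEDC PWM API from Arduino-ESP32 core 2.x. The MOSFET gate pin and the matched keywords are illustrative assumptions:

```cpp
#define MOSFET_PIN  17    // example gate-driver pin -- adjust to your wiring
#define PWM_CHANNEL 0
#define PWM_FREQ    5000  // 5 kHz, well above visible flicker
#define PWM_RES     8     // 8-bit duty range (0-255)

void setupDimmer() {
  // Classic LEDC API (Arduino-ESP32 core 2.x)
  ledcSetup(PWM_CHANNEL, PWM_FREQ, PWM_RES);
  ledcAttachPin(MOSFET_PIN, PWM_CHANNEL);
}

void handleDimming(const String& reply) {
  if (reply.indexOf("dim") >= 0) {
    ledcWrite(PWM_CHANNEL, 64);   // roughly 25% brightness
  } else if (reply.indexOf("full") >= 0) {
    ledcWrite(PWM_CHANNEL, 255);  // full brightness
  }
}
```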
Advanced Technical Notes
Concurrency: Use separate FreeRTOS tasks for audio input, Wi-Fi requests, and lamp control (see the task sketch after this list).
Memory Optimization: Responses from OpenAI can be long, so allocate sufficient heap memory or use streaming responses.
Security: Store API keys securely (ESP32 has an NVS flash storage or use an external secure element like ATECC608A).
Latency: Expect ~1–2 seconds of delay (network + API response + TTS). To optimize, pre-buffer AI responses while playing audio.
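As an illustration of the concurrency note above, here is a skeletal two-task split. Stack sizes, priorities, and core assignments are reasonable defaults rather than tuned values:

```cpp
// Task handles for the two pipelines
TaskHandle_t audioTask;
TaskHandle_t networkTask;

void audioLoop(void* param) {
  for (;;) {
    // Capture I2S audio here and push chunks onto a queue
    vTaskDelay(pdMS_TO_TICKS(10));
  }
}

void networkLoop(void* param) {
  for (;;) {
    // Pop audio from the queue and call the STT/ChatGPT/TTS APIs
    vTaskDelay(pdMS_TO_TICKS(10));
  }
}

void startTasks() {
  // Pin audio to core 1 and networking to core 0 (where the Wi-Fi stack runs)
  xTaskCreatePinnedToCore(audioLoop, "audio", 8192, NULL, 2, &audioTask, 1);
  xTaskCreatePinnedToCore(networkLoop, "net", 8192, NULL, 1, &networkTask, 0);
}
```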
Final Demo Flow of ESP32 Voice Assistant with ChatGPT
User asks: “Lamp, tell me a joke.”
ESP32 records audio → sends to STT API → gets text.
Text sent to OpenAI → receives witty response.
ESP32 forwards response to TTS API → gets audio.
Audio played on speaker → lamp responds in real time.
Bonus: AI can also toggle brightness or color of the lamp as part of its response.
Future Enhancements for the ESP32 Voice Assistant
Local LLM: Use a small, quantized model (e.g., Llama 3.1 8B) running on a more powerful SBC (like a Raspberry Pi 5) as the brain, making the device fully offline and private.
Home Automation Hub: Integrate with platforms like Home Assistant or MQTT to control other smart devices in your home (“Hey Lamp, turn off the bedroom light”).
Visual Feedback: Add a small OLED screen to display answers, weather, or time.
Wake Word Training: Train a custom wake word model using Edge Impulse or similar platforms.
User Profiles: Allow voice recognition for different users and personalize responses and preferences.
Multi-Modal: Add a camera module to allow the lamp to describe its surroundings or identify objects.
Conclusion
This Chatbot Lamp Project demonstrates how ChatGPT on ESP32 can bridge the physical and digital worlds. By combining cloud AI APIs with local IoT hardware, you create a device that feels futuristic but is built with accessible components.
It teaches IoT fundamentals (sensors, actuators, networking).
It introduces AI integration (GPT API, TTS, STT).
It challenges you to think about user experience and humanized interaction.
Whether you’re an electronics hobbyist, AI enthusiast, or blogger, this project blends all three worlds into one glowing, talking lamp 🙂
FAQ
Can the ESP32 run AI locally?
The ESP32 can run tiny ML models (keyword spotting, simple classification) but cannot host large language models. In our project, the ESP32 acts as a Wi-Fi client that calls cloud AI services such as the OpenAI API.
How to keep API keys secure on the ESP32?
Avoid hardcoding keys directly in firmware. Use NVS (non-volatile storage), environment-based build flags, or proxy requests through a secure backend. For stronger protection, store secrets server-side and expose only a limited project token to the ESP32.
How to make the ESP32 talk using text-to-speech (TTS)?
Use a cloud TTS service (e.g., Google, Azure, Amazon) for natural voices. The ESP32 sends the chatbot text and receives audio (often base64 MP3). Decode and play via I2S DAC/amp. Offline TTS libraries exist but sound robotic and are memory-constrained.