    How ElevenLabs Voice AI Is Replacing Screens in Warehouse and Manufacturing Operations

    By Awais, March 27, 2026, 10 min read
    Warehouse operator using voice picking to prepare orders.

    A picking operation is the process of collecting items from storage locations to fulfil customer orders.

    It is one of the most labour-intensive activities in logistics, accounting for up to 55% of total warehouse operating costs.

    Example of warehouse layout where operators need to pick in multiple locations – (Image by Samir Saci)

    For each order, an operator receives a list of items to collect from their storage locations.

    They walk to each location, identify the product, pick the right quantity, and confirm the operation before moving to the next line.

    In most warehouses, operators rely on RF scanners or handheld tablets to receive instructions and confirm each pick.

    • What happens when operators need both hands for handling?
    • How do you onboard operators who don’t read the local language?

    Voice picking solves this by replacing the screen with audio instructions: the system tells the operator where to go and what to pick, and the operator confirms verbally.

    Illustration of an operator using voice picking – (Image by Samir Saci)

    When I was designing supply chain solutions in logistics companies, vocalisation was the default choice, especially for price-sensitive projects.

    Based on my experience, with vocalisation, operators’ productivity can reach 250 boxes/hour for retail and FMCG operations.

    The concept is not new. Hardware providers and software vendors have offered voice-picking solutions since the early 2000s.

    But these systems come with significant constraints:

    • Proprietary hardware at $2,000 to $5,000 per headset
    • Vendor-locked software with limited customisation
    • Long deployment cycles of 3 to 6 months per site
    • Rigid language support that requires retraining for each new language

    For a 50-FTE warehouse, the total investment reaches $150K to $300K, excluding training costs.
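
As a quick sanity check on these figures, here is a minimal sketch of the hardware share alone. It assumes one proprietary headset per FTE, which is my assumption and not a figure from any vendor.

```python
# Back-of-the-envelope hardware cost for a traditional voice-picking rollout.
# Assumption (not from the article): one proprietary headset per FTE.
FTE_COUNT = 50
HEADSET_PRICE_RANGE_USD = (2_000, 5_000)  # per-headset range quoted above


def hardware_cost_range(fte_count: int, price_range: tuple[int, int]) -> tuple[int, int]:
    """Return the (low, high) headset spend for the whole team."""
    low, high = price_range
    return fte_count * low, fte_count * high


low, high = hardware_cost_range(FTE_COUNT, HEADSET_PRICE_RANGE_USD)
print(f"Headsets alone: ${low:,} to ${high:,}")
```

Under that assumption, headsets alone account for most of the quoted total, before any software or deployment work.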

    It is too expensive for my customers.

    What if you could achieve similar results using a smartphone, a custom-made web application, and modern AI voice technology?

    In this article, I will show how I built a minimalist voice-picking module that integrates with Warehouse Management Systems, using ElevenLabs for text-to-speech and speech recognition.

    Example of screens of this app designed to be used on a smartphone with a vocal interface – (Image by Samir Saci)

    This web application has been deployed in the distribution centre of a small supermarket chain with great results (the customer is happy!).

    The objective is not to design solutions that compete with market leaders, but rather to offer an alternative to logistics and manufacturing operations that lack the capacity to invest in expensive equipment and want customised solutions.

    Problem Statement

    Before we get into voice-picking powered by ElevenLabs, let me introduce the logistics operations this AI-powered web application will support.

    Layout of the distribution centre – (Image by Samir Saci)

    This is the central distribution centre of a small supermarket chain that delivers to 50 stores in Central Europe.

    Layout of the warehouse with 10 aisles and 12 pallet positions displayed on the app – (Image by Samir Saci)

    The facility is organised in a grid layout with aisles (A through L) and positions along each aisle:

    • Each location stores a specific item (called SKU) with a known quantity in boxes.
    • Operators need to know where to go and what to expect when they arrive.
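
To make the grid concrete, here is a minimal sketch of how such locations could be modelled. The class name, the code format (aisle letter followed by position number) and the stock quantities are hypothetical illustrations, not the schema of the actual WMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLocation:
    aisle: str     # aisle letter, e.g. "A"
    position: int  # position along the aisle, counted from 1
    sku: str       # item stored at this location
    boxes: int     # quantity on hand, in boxes (illustrative values below)

    @property
    def code(self) -> str:
        """Short code shown to the operator, e.g. "A1" (assumed format)."""
        return f"{self.aisle}{self.position}"


def build_index(locations):
    """Index locations by code so the app can look up what to expect on arrival."""
    return {location.code: location for location in locations}


# Hypothetical stock levels for three locations.
LOCATIONS = [
    StorageLocation("A", 1, "Organic Green Tea 500g", 40),
    StorageLocation("A", 3, "Earl Grey Tea 250g", 25),
    StorageLocation("B", 2, "Arabica Coffee Beans 1kg", 60),
]

index = build_index(LOCATIONS)
print(index["B2"].sku, index["B2"].boxes)
```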

    What is the objective? Boost operators’ productivity!

    They were not happy about the order allocation and walking paths provided by their old system.

    Solutions used to optimise picking operations for this warehouse – (Image by Samir Saci)

    They first asked us to reduce operators’ walking distance and boost the number of boxes picked per hour using the solutions presented in this article.

    The solution was a web application connected to the Warehouse Management System (WMS) database that guides the operator through the warehouse.

    Operators can check their picking list but also detailed information per location – (Image by Samir Saci)

    This visual layout provides a real-time view of what we have in the system, with a better routing solution.

    Our objective is to go from a productivity of 75 boxes/hour to 200 boxes/hour with:

    • Better order allocation, using spatial clustering and pathfinding to minimise the walking distance per box picked
    • Voice-picking to guide operators in a flawless manner

    How the Picking Flow Works

    Before jumping into the vocalisation of the tool, let me introduce the process of order picking.

    Three stores sent orders to the warehouse:

    • Store 1 ordered 3 boxes of Organic Green Tea 500g that are located in Location A1
    • Store 2 ordered 2 boxes of Earl Grey Tea 250g that are located in Location A3
    • Store 3 ordered 5 boxes of Arabica Coffee Beans 1kg that are located in Location B2

    A picking batch is a group of store orders consolidated into a single work assignment.

    The operator will prepare the three orders in a single batch – (Image by Samir Saci)

    The system generates a batch with multiple order lines with instructions:

    • Where to go (the storage location)
    • What to pick (the SKU reference)
    • How many boxes to collect
    Picking list (left), layout (middle), details of location (right) – (Image by Samir Saci)
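
The three store orders above can be consolidated into one batch along these lines. This is a minimal sketch: the `OrderLine` fields and the sort by location code are assumptions for illustration, and the real application sequences lines with its own optimisation logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderLine:
    store: str     # who ordered
    sku: str       # what to pick
    location: str  # where to go, e.g. "A1" (assumed format: letter + number)
    boxes: int     # how many boxes to collect


def location_key(code: str) -> tuple[str, int]:
    """Sort key so that "A2" comes before "A10"."""
    return code[0], int(code[1:])


def build_batch(order_lines):
    """Group store order lines into a single work assignment, ordered by location."""
    return sorted(order_lines, key=lambda line: location_key(line.location))


batch = build_batch([
    OrderLine("Store 3", "Arabica Coffee Beans 1kg", "B2", 5),
    OrderLine("Store 1", "Organic Green Tea 500g", "A1", 3),
    OrderLine("Store 2", "Earl Grey Tea 250g", "A3", 2),
])
for line in batch:
    print(f"{line.location}: pick {line.boxes} x {line.sku} for {line.store}")
```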

    The operator just has to process each line sequentially.

    Once they confirm a pick, the system advances to the next instruction.

    This sequential flow is critical because it determines the walking path through the warehouse using the optimisation algorithms.

    Example of the original pathfinding solution (bottom) and the optimised (top)
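
To illustrate how the line sequence drives the walking path, here is a simple greedy nearest-neighbour heuristic with Manhattan distances. It is a stand-in for illustration only, not the optimisation algorithm used in the application. The coordinate mapping (aisle letter as x, position as y) and the start point "A0" are hypothetical, and a real layout would also need to account for racks that cannot be crossed.

```python
def to_xy(code: str) -> tuple[int, int]:
    """Map a code such as "B2" to (aisle index, position). Assumed code format."""
    return ord(code[0].upper()) - ord("A"), int(code[1:])


def manhattan(a: tuple[int, int], b: tuple[int, int]) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_neighbour_route(start: str, codes: list[str]) -> list[str]:
    """Visit the closest remaining location first; ties are broken alphabetically."""
    remaining = list(codes)
    route = []
    current = to_xy(start)
    while remaining:
        nxt = min(remaining, key=lambda c: (manhattan(current, to_xy(c)), c))
        remaining.remove(nxt)
        route.append(nxt)
        current = to_xy(nxt)
    return route


def route_length(start: str, route: list[str]) -> int:
    """Total grid distance walked from the start through every stop in order."""
    points = [to_xy(start)] + [to_xy(code) for code in route]
    return sum(manhattan(a, b) for a, b in zip(points, points[1:]))


route = nearest_neighbour_route("A0", ["B2", "A3", "A1"])
print(route, route_length("A0", route))
```

A greedy heuristic like this is easy to reason about but can miss the shortest overall path, which is one reason a dedicated optimisation step is worth the effort.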

    As this is a custom application, we could implement this optimisation without relying on an external software vendor.

    Why build a custom solution? Because it’s cheaper and easier to implement.

    Initially, the customer planned to purchase a commercial solution and wanted me to integrate the pathfinding solution.

    After investigation, we discovered that it would have been more expensive to integrate the app into the vendor solution than to build something from scratch.

    What is the process without the AI-based voice feature?

    Manual Mode: The Screen-Based Baseline

    In manual mode, the operator reads each instruction on screen and confirms by tapping a button.

    Two actions are available at each step:

    • Confirm Pick: operator collected the right quantity
    • Report Issue: the location is empty, the quantity doesn’t match, or the product is damaged
    Our operator has to press the button to confirm the picking or report an issue – (Image by Samir Saci)
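
The two actions can be sketched as a small session object. The class and method names are hypothetical, and I assume here that reporting an issue logs the reason and advances to the next line, which may differ from how a given WMS handles discrepancies.

```python
from enum import Enum


class IssueReason(Enum):
    EMPTY_LOCATION = "empty location"
    QUANTITY_MISMATCH = "quantity mismatch"
    DAMAGED_PRODUCT = "damaged product"


class PickingSession:
    """Walks through the lines of a batch one at a time."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.index = 0
        self.log = []  # (line, outcome) pairs, in the order they were processed

    @property
    def current(self):
        """The line waiting for the operator, or None once the batch is done."""
        return self.lines[self.index] if self.index < len(self.lines) else None

    def confirm_pick(self):
        self._record("CONFIRMED")

    def report_issue(self, reason: IssueReason):
        self._record(f"ISSUE: {reason.value}")

    def _record(self, outcome: str):
        if self.current is None:
            raise RuntimeError("batch already completed")
        self.log.append((self.current, outcome))
        self.index += 1  # assumption: both actions advance to the next line


session = PickingSession(["A1", "A3", "B2"])
session.confirm_pick()
session.report_issue(IssueReason.EMPTY_LOCATION)
print(session.current, session.log)
```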

    I built the manual mode as a reliable fallback in case we have issues with ElevenLabs.

    But it keeps the operator’s eyes and one hand tied to the device at every step.

    We need to add vocal commands!

    Voice Mode: Hands-Free with ElevenLabs

    Now that you know why we want the voice mode to replace screen interaction, let me explain how I added two AI-powered components.

    Technical architecture of this application – (Image by Samir Saci)

    Text-to-Speech: ElevenLabs Reads the Instructions

    When the operator starts a picking session in voice mode, each instruction is converted to speech using the ElevenLabs API.

    Instead of reading “Location A-03-2, pick 4 boxes of SKU-1042” on a screen, the operator hears a natural voice say:

    “Location Alpha Three Two. Pick four boxes.”
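
Turning the on-screen code into that sentence could look like the sketch below. The function name, the reading of the three code segments as aisle, position and level, and the word tables are my assumptions for illustration. The SKU is left out, as in the spoken example above.

```python
# Spelling words for aisles A to L (using the "Alpha" spelling from the example above).
AISLE_WORDS = {
    "A": "Alpha", "B": "Bravo", "C": "Charlie", "D": "Delta",
    "E": "Echo", "F": "Foxtrot", "G": "Golf", "H": "Hotel",
    "I": "India", "J": "Juliett", "K": "Kilo", "L": "Lima",
}

# This sketch only covers numbers from 0 to 20.
NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]


def spoken_instruction(location_code: str, boxes: int) -> str:
    """Convert e.g. ("A-03-2", 4) into a sentence suitable for text-to-speech."""
    aisle, position, level = location_code.split("-")
    spoken_location = " ".join([
        AISLE_WORDS[aisle.upper()],
        NUMBER_WORDS[int(position)].capitalize(),  # "03" is read as "Three"
        NUMBER_WORDS[int(level)].capitalize(),
    ])
    unit = "box" if boxes == 1 else "boxes"
    return f"Location {spoken_location}. Pick {NUMBER_WORDS[boxes]} {unit}."


print(spoken_instruction("A-03-2", 4))
```

Spelling out letters and digits as words before synthesis avoids leaving the reading of a code like "A-03-2" to the speech engine.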

    ElevenLabs provides several advantages over basic browser-based TTS:

    • Natural intonation that is easy to understand in a noisy warehouse
    • 29+ languages available out of the box, with no retraining
    • Consistent voice quality across all instructions
    • Sub-second generation for short sentences like pick instructions
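
For reference, a text-to-speech call can be sketched with the standard library only. The endpoint path, the `xi-api-key` header and the `model_id` value reflect my understanding of the public ElevenLabs REST API and should be checked against the current documentation. The voice ID and API key are placeholders, and this is not the code of the application described here.

```python
import json
import urllib.request

# Assumed endpoint of the ElevenLabs REST API; verify against the official docs.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Prepare the HTTP request without sending it (handy for testing)."""
    payload = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=payload,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesise(text: str, voice_id: str, api_key: str) -> bytes:
    """Send the request and return the audio bytes to play on the smartphone."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


# Placeholders: replace with a real voice ID and API key before calling synthesise().
request = build_tts_request("Location Alpha Three Two. Pick four boxes.",
                            "YOUR_VOICE_ID", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```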

    But what about speech recognition?

    Speech-to-Text: The Operator Confirms Verbally

    After hearing the instruction, the operator walks to the location, picks the items, and needs to confirm.

    Here, I made a deliberate design choice relying on speech recognition and the reasoning capabilities of ElevenLabs.

    Using a single endpoint, we capture the response and match it against expected commands:

    • “Confirm” or “Done” to validate the pick
    • “Problem” or “Issue” to flag a discrepancy
    • “Repeat” to hear the instruction again

    The agentic part translates the operator’s feedback and tries to match it to the expected interactions (CONFIRM, ISSUE, or REPEAT).
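
The final matching step can be pictured with the simplified stand-in below. It uses plain keyword matching on an English transcript, whereas the application relies on ElevenLabs to interpret the operator's answer, including in other languages. The function and table names are hypothetical.

```python
import re

# Expected commands and the English keywords listed above.
COMMAND_KEYWORDS = {
    "CONFIRM": {"confirm", "done"},
    "ISSUE": {"problem", "issue"},
    "REPEAT": {"repeat"},
}


def match_command(transcript: str):
    """Return CONFIRM, ISSUE or REPEAT, or None when nothing matches."""
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    for command, keywords in COMMAND_KEYWORDS.items():
        if words & keywords:
            return command
    return None


print(match_command("Done!"))
print(match_command("There is a problem here"))
print(match_command("Can you repeat, please?"))
```

Returning `None` for anything unrecognised lets the app ask again instead of guessing, which matters when a wrong confirmation would corrupt the stock figures.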

    The complete process from left to right: Step 1 -> Step 2 -> Step 3 – (Image by Samir Saci)

    For a multilingual warehouse, this is a significant benefit:

    • A Czech operator and a Filipino operator can both receive instructions in their native language from the same system, without any hardware change.
    • I don’t have to consider all the languages possible in the design of the solution

    Why use ElevenLabs?

    For another feature, the inventory cycle count tool presented in this video, I have used n8n with AI agent nodes to perform the same task.

    n8n workflow for the voice-powered inventory cycle count tools – (Image by Samir Saci)

    This was working quite well, but it required a more complex setup:

    • Two AI nodes: one for the audio transcription using OpenAI models, and one AI agent to format the output of the transcription
    • The system prompts were assuming that the operator was speaking English.

    I have replaced that with a single ElevenLabs endpoint with multilingual capabilities.

    Putting both components together, a single pick cycle looks like this:

    The Complete Voice Picking Cycle – (Image by Samir Saci)
    1. The app calls ElevenLabs to generate the audio instruction
    2. The operator hears: “Location Alpha Three Two. Pick four boxes.”
    3. The operator walks to the location (hands free, eyes free)
    4. The operator picks the items and says, “Confirm”
    5. The speech recognition endpoint processes the confirmation and moves to the next picking location
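
The five steps above can be sketched as a loop in which the two ElevenLabs calls are passed in as functions. Here `speak` and `listen` are hypothetical callables standing in for the text-to-speech and speech recognition calls, and the retry limit with an "UNRESOLVED" outcome (to hand the line over to manual mode) is my assumption.

```python
def run_voice_batch(lines, speak, listen, max_attempts=3):
    """Process (location, boxes) lines one by one using voice interaction.

    speak(text) plays an instruction; listen() returns "CONFIRM", "ISSUE",
    "REPEAT" or None when the answer was not understood.
    """
    results = []
    for location, boxes in lines:
        unit = "box" if boxes == 1 else "boxes"
        instruction = f"Location {location}. Pick {boxes} {unit}."
        outcome = "UNRESOLVED"  # assumption: hand over to manual mode in that case
        for _ in range(max_attempts):
            speak(instruction)
            command = listen()
            if command in ("CONFIRM", "ISSUE"):
                outcome = command
                break
            # "REPEAT" or an unrecognised answer: say the instruction again
        results.append((location, outcome))
    return results


# Usage with fake voice functions, so the flow can be tested without any API call.
spoken = []
answers = iter(["REPEAT", "CONFIRM", "ISSUE"])
results = run_voice_batch([("A1", 3), ("A3", 2)], spoken.append, lambda: next(answers))
print(results)
```

Injecting the voice functions keeps the picking flow independent of the provider, which also makes the manual mode an easy drop-in when the voice service is unavailable.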

    The entire interaction takes a few seconds of system time.

    What about the costs?

    This is where the comparison with traditional systems becomes striking.

    Comparative study – (Image by Samir Saci)

    For this mid-size warehouse with 50 FTEs, they estimated that the traditional approach costs roughly $60K to $150K in the first year.

    The AI-powered approach costs a few API calls.

    The trade-off is clear: traditional systems offer proven reliability and offline capability for high-volume operations.

    In case of failures, we have the manual mode as a fallback.

    This AI-powered approach offers accessibility and speed for organisations that cannot justify a six-figure investment.

    What Does That Mean for Operations Managers and Decision Makers?

    Voice picking is no longer a technology reserved for the largest 3PLs and retailers with large budgets.

    If your warehouse has WiFi and your operators have smartphones, you can prototype a voice-guided picking system in days.

    It is easy to test it on a real batch to measure the impact before committing any significant budget for productisation.

    Three scenarios where this approach makes particular sense:

    • Multilingual facilities where operators struggle with screen-based instructions in a language that is not their own
    • Multi-site operations where deploying proprietary hardware to every small warehouse is not economically viable
    • High-turnover environments where training time on complex scanning systems directly impacts productivity

    What about other processes?

    Good news, the same architecture extends beyond picking.

    Voice-guided workflows can support any process where an operator needs instructions while keeping their hands free.

    You can find a live demo of an inventory cycle counting tool here:

    How to start this journey?

    As you could easily guess, the front end of these applications has been vibecoded using Lovable and Claude Code.

    For the backend, if you have limited coding capabilities, I would suggest starting with n8n.

    Example of n8n workflows – (Image by Samir Saci)

    n8n is a low-code automation platform that lets you connect APIs and AI models using visual workflows.

    The initial version of this solution was built with this tool:

    1. I started with a backend connected to a Telegram Bot
    2. Users were playing with the tool using this interface
    3. After validation, we moved that to a web application

    This is the easiest way to start, even with limited coding skills.

    I share a step-by-step tutorial with free templates to start automating from day 1 in this video:

    Let me know what you plan to build using all these nice tools!

    About Me

    Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer who is using data analytics to improve logistics operations and reduce costs.

    If you’re looking for tailored consulting solutions to optimise your supply chain and meet sustainability goals, please contact me.
