The Scandit Vision AI Engine is the intelligence layer powering barcode scanning, ID verification, and shelf intelligence in Scandit Smart Data Capture.
Built on 15+ years of real-world enterprise experience, the engine is purpose-built to solve specific, measurable business problems.
Accuracy to 99.9%, on-device processing, and real-time performance on commercial mobile devices are core, non-negotiable design principles.
Models run locally on end-user devices by default, keeping sensitive data secure without relying on live customer data for training.
Imagine a grocery store associate, about to do some routine price checks. An alert pops up on their smart device. The store's bestselling product is no longer on shelf, but five crates are sitting in the backroom.
They shift priority, head out back, and scan the crowded storage area with their device. An augmented reality (AR) overlay instantly identifies not only the right product, but the box with the soonest expiry date.
Shelf restocked in minutes with the optimal product. No guesswork, no wasted time, no sales lost. All this is powered by the Scandit Vision AI Engine, the intelligence behind Scandit Smart Data Capture.
By automating complex tasks and providing real-time insights, the Scandit Vision AI Engine empowers workers and customers to make smarter, faster, more accurate decisions. Think of it as the brain behind every scan, guiding actions and driving efficiency across an entire business.
The main advantage… is the fact that in one click you can have the image of the whole store. The fact that you can actually have the vision of the availability of products on the shelf on a daily basis is something that cannot compare to any other manual process…. Improved sales come from improved availability.
Piotr Lubiewa-Wielezynski, Sales Development Director, Carrefour
What is vision AI?
Vision AI is the branch of AI that understands visual information from the physical world. Whereas generative AI (GenAI) creates new content, vision AI analyzes what is already there. It tells you what actual products are on your actual shelves, whether an identity card is real or fake, or what information you need from a complex label printed with three barcodes and six text fields.
Vision AI is related to the much older term computer vision — people have been trying to teach computers to "see" for decades — but is now replacing it. This reflects the way in which AI has accelerated the field and unlocked use cases impossible to achieve with traditional algorithms.
Vision AI analyzes visual data and surfaces actions from insights in real time, making it highly effective in real-world environments.
What makes the Scandit Vision AI Engine special?
Vision AI looks outwards to the physical world, not inwards to a recursive digital universe.
Retail, logistics, manufacturing, and healthcare are not social media platforms. They depend for their existence on physical objects and processes, such as making sure products are on shelf when consumers want to buy them, delivering the right goods to the right person at the right time, or giving a patient the correct medication.
The Scandit Vision AI Engine is purpose-built to solve these specific, measurable real-world problems. It's been developed through more than 15 years of building solutions for the world's largest retailers, manufacturers, couriers, logistics brands, and pharmaceutical companies.
What makes the Scandit Vision AI Engine special is as much the principles driving it as our technical expertise:
Accuracy is non-negotiable: Getting to 99.9% accuracy is key in our space. When we say that an ID is fake, that there are five product facings, or that a patient-medication combination is correct, there’s almost no tolerance for error.
Security and privacy come first: By default, our models are not trained on live customer data. Instead, customers get a final, trained model, with the ability to interpret millions of pixels in milliseconds. In most instances, the model runs locally on end-user devices, keeping sensitive data where it belongs.
Models must run efficiently on commercial mobile devices: From the outset, we’ve had to build to run on frontline workers’ devices, where speed is critical and battery life precious.
AI is used in a deliberate and targeted fashion: We focus on automating specific (and tedious) tasks that previously required substantial manual intervention. This delivers significant, quantifiable value to businesses.
Workflows are always user-centered: The Scandit Vision AI Engine is built to deliver fast, robust, and ergonomic data capture even in difficult conditions. The overarching goal is always to create a user experience that “just works”, and that neither the developer nor the user has to worry about.
$33m
One of the world's largest retailers saved $33m a year by checking top stock using Scandit MatrixScan Pick.
Here’s an overview of how the Scandit Vision AI Engine powers our three main product lines. Our barcode scanning products share a unified codebase and common machine learning models. ID scanning and ShelfView are technically distinct — but they're built by the same team, draw on the same deep expertise, and follow the same principles.
Barcode scanning
Scandit has been using machine learning (ML) to decode barcodes from the camera feed of smart devices for over 15 years.
Back when we founded Scandit, that was a hard engineering problem. Modern AI was in its infancy. The cameras on early smartphones also lacked autofocus and were simply not of high enough quality.
Early on, the CIO of one of Europe’s largest grocers told us that our technology would never be good enough. We now count seven of the top ten US grocers and five out of the top ten European retailers as customers.
If you don’t work in the field, it’s easy to underestimate just how hard vision AI is. Humans are highly visual creatures. Around half of your brain — the most powerful supercomputer we know of — is devoted to visual processing.
What’s easy for humans is hard for computers. Take a look at the image below. It's easy for the human eye to tell that the barcode the user wants to scan is the one on the product they're holding, not any of the barcodes visible on the shelf in the background.
Until 2024, when Scandit SDK 7 was released, there was no software in the world that could do that reliably.
Let's break it down. To successfully scan the barcode in this image, the Scandit Vision AI Engine has to:
Identify which barcode the user wants to scan, by analyzing contextual data such as movement and barcode characteristics.
Decode the identified barcode, which in this instance is tiny, printed on a curved surface, and with some glare. (Other factors we often contend with are damaged codes, low light, extreme angles, and blurry images.)
With the Scandit Vision AI Engine, edge cases like this don’t need to be hard-coded. Instead, our AI-powered barcode scanning does all of this automatically. It adapts to different environments to reliably scan codes without requiring explicit instructions from either the developer or the end user.
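To see why hard-coding this is unattractive, here is a naive rule-based version of the first step, written in Python as a sketch only. The `Candidate` fields, the weights, and the `motion` signal are all assumptions made for illustration. This is not a description of how the Scandit Vision AI Engine works internally, and a fixed formula like this breaks down quickly in real scenes.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One barcode located in a frame (hypothetical structure)."""
    data: str
    cx: float      # center x, normalized to 0..1
    cy: float      # center y, normalized to 0..1
    area: float    # fraction of the frame covered by the code
    motion: float  # assumed 0..1 signal: movement relative to the background


def score(c: Candidate, w_center=0.5, w_area=0.3, w_motion=0.2) -> float:
    """Hand-tuned guess at how likely the user means this code."""
    max_dist = math.hypot(0.5, 0.5)
    centered = 1.0 - min(math.hypot(c.cx - 0.5, c.cy - 0.5) / max_dist, 1.0)
    size = min(c.area * 10.0, 1.0)  # arbitrary scaling, saturates at 10% of frame
    return w_center * centered + w_area * size + w_motion * c.motion


def pick_intended(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the highest-scoring candidate, or None if nothing was found."""
    return max(candidates, key=score) if candidates else None
```

Every weight here would need retuning per device, store layout, and use case, which is the kind of manual work a learned model is meant to remove.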
Scandit’s AI-driven data capture capabilities demonstrate remarkable technical expertise and show a deep focus on innovations that truly enhance the user’s scanning experience in real-world conditions.
And that’s just scanning a single code. Our MatrixScan products do all of the above, but here the Scandit Vision AI Engine also scans multiple codes in parallel.
It tracks their position and adds augmented reality (AR) overlays to solve specific use cases such as counting, finding, or picking items.
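Tracking matters for counting because two boxes can carry identical barcodes. The short Python sketch below assumes a hypothetical tracker that gives each physical code a stable `track_id` across frames. The function name and input shape are illustrative and not part of any Scandit API.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# One detection per visible code in a frame: (track_id, decoded data).
Detection = Tuple[int, str]


def count_items(frames: Iterable[List[Detection]]) -> Counter:
    """Count physical items per barcode value across a sequence of frames.

    A code seen in many frames is counted once, because it keeps the same
    track_id. Two items with the same data are counted twice, because the
    (assumed) tracker gives them different track_ids.
    """
    seen: Dict[int, str] = {}
    for frame in frames:
        for track_id, data in frame:
            seen.setdefault(track_id, data)
    return Counter(seen.values())
```

Counting by decoded value alone would report one item where there are several identical ones, which is why position tracking sits underneath use cases like counting.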
Smart Label Capture, our newest barcode scanning product, goes beyond scanning multiple barcodes to capture text and understand label structure.
With one press, users can scan complex labels in their entirety: multiple barcode formats (e.g. 15-digit IMEI numbers), field positioning, and contextual relationships (e.g. "BEST BEFORE" adjacent to a date).
The result is that your application receives precisely the data it needs with a single press. Nothing more. Nothing less.
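As a rough illustration of what understanding label structure involves, the Python sketch below pulls two fields out of raw label text: a 15-digit IMEI, validated with the Luhn check digit that IMEI numbers carry, and a date that follows a "BEST BEFORE" prompt. The function names, the ISO date format, and the sample strings are assumptions for illustration. It is not a description of how Smart Label Capture is implemented.

```python
import re
from datetime import date, datetime
from typing import Optional


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_imei(text: str) -> Optional[str]:
    """Return the first standalone 15-digit run that passes the Luhn check."""
    for match in re.finditer(r"(?<!\d)\d{15}(?!\d)", text):
        if luhn_valid(match.group()):
            return match.group()
    return None


def find_best_before(text: str) -> Optional[date]:
    """Return the date right after 'BEST BEFORE' (assumed YYYY-MM-DD)."""
    match = re.search(r"BEST BEFORE\s*:?\s*(\d{4}-\d{2}-\d{2})", text, re.IGNORECASE)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d").date()
```

The point of the sketch is the pairing of a value with its context: a 15-digit number only counts as an IMEI if it passes the checksum, and a date only counts as an expiry date if it sits next to the right prompt.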
ID scanning and verification
Identity documents are messy: formats vary across jurisdictions, older designs stay in circulation, and fraudsters mimic subtle layout and encoding details. This makes accurate ID checks hard for frontline teams.
In ID scanning and verification, the Scandit Vision AI Engine combines ML, vision language models (VLM), and other technologies like optical character recognition (OCR). This transforms unstructured visual input into structured, verifiable identity data that users can trust.
Traditional ID scanners rely on rigid templates and mainly “extract” data, so they fail when documents deviate from specs or when fakes exploit inconsistencies. For example, US driver’s licenses all use PDF417 barcodes, but encode fields differently by state.
Scandit's AI-powered identity verification works differently. It mirrors how counterfeit art is detected: not by looking at the subject itself, but by examining brush strokes, techniques, and materials to assess whether they align with an authentic work.
Similarly, the Scandit Vision AI Engine doesn’t simply capture and decode individual data values. Instead, it analyzes structural characteristics — how fields are stored, how barcodes are generated and printed, and how visual elements are laid out — to accurately capture data and detect fakes.
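A small Python sketch can make the idea of checking structure, rather than just reading values, more concrete. It assumes a simplified payload in which each line starts with a three-letter element ID from the AAMVA card design standard (`DAQ`, `DCS`, `DAC`, `DBB`, `DBA`) and that dates use the US `MMDDCCYY` layout. The real header and subfile framing are omitted, the function names are hypothetical, and the checks are far simpler than anything used for actual fraud detection.

```python
from datetime import date, datetime
from typing import Dict, List

# AAMVA element IDs mapped to readable names (subset, for illustration).
FIELDS = {
    "DAQ": "license_number",
    "DCS": "family_name",
    "DAC": "first_name",
    "DBB": "date_of_birth",
    "DBA": "expiry_date",
}


def parse_elements(payload: str) -> Dict[str, str]:
    """Split a simplified payload into named fields, one element per line."""
    fields: Dict[str, str] = {}
    for line in payload.splitlines():
        code, value = line[:3], line[3:].strip()
        if code in FIELDS and value:
            fields[FIELDS[code]] = value
    return fields


def parse_date(value: str) -> date:
    """Parse an MMDDCCYY date; raises ValueError if the layout is wrong."""
    return datetime.strptime(value, "%m%d%Y").date()


def structural_issues(fields: Dict[str, str]) -> List[str]:
    """Return a list of structural problems; an empty list means none found."""
    issues = [f"missing {name}" for name in FIELDS.values() if name not in fields]
    try:
        born = parse_date(fields["date_of_birth"])
        expires = parse_date(fields["expiry_date"])
    except (KeyError, ValueError):
        issues.append("unreadable date")
        return issues
    if expires <= born:
        issues.append("expiry date is not after date of birth")
    return issues
```

Even this toy version shows the shift in approach: the values themselves may all be readable, yet the way they are encoded and relate to each other can still give a document away.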
2 million
One of the largest US food delivery companies validates over 2 million IDs monthly using a Scandit-powered worker app
ShelfView
Our shelf analytics product, ShelfView, is a specialist solution that analyzes images of store shelves captured using smart devices, fixed-position cameras, and robots. Similarly to how LLMs are trained, we ingest vast datasets to understand retail shelves comprehensively and accurately.
It lets retailers see what is really on their shelves, not just what their inventory system says.
Scene parsing identifies trays, shelves, products, and shelf labels.
Products are identified down to the SKU level with image recognition.
OCR and barcode scanning are used on shelf labels to extract product and price information.
Together, they create a precise digital representation of the shelf (often called a realogram). Associates then receive prioritized alerts for missing or misplaced products, pricing updates, and promotional errors, so issues can be resolved before customers notice them.
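The comparison step can be sketched in a few lines of Python. Here the realogram is a list of detected facings and the expected state is a lookup keyed by SKU. The data shapes, function name, and alert wording are assumptions for illustration rather than ShelfView's actual data model.

```python
from typing import Dict, List, NamedTuple, Optional


class Facing(NamedTuple):
    """One product facing detected on the shelf (hypothetical shape)."""
    sku: str
    shelf: int
    label_price_cents: Optional[int]  # price read from the shelf label, if any


class Expected(NamedTuple):
    """What the retailer's systems say should be there (hypothetical shape)."""
    shelf: int
    price_cents: int


def shelf_alerts(realogram: List[Facing], expected: Dict[str, Expected]) -> List[str]:
    """Compare detected facings with the expected state and list the gaps."""
    alerts: List[str] = []
    seen = {facing.sku for facing in realogram}
    for sku in sorted(expected.keys() - seen):
        alerts.append(f"missing: {sku}")
    for facing in realogram:
        exp = expected.get(facing.sku)
        if exp is None:
            alerts.append(f"unexpected: {facing.sku} on shelf {facing.shelf}")
            continue
        if facing.shelf != exp.shelf:
            alerts.append(
                f"misplaced: {facing.sku} on shelf {facing.shelf}, "
                f"expected shelf {exp.shelf}"
            )
        if facing.label_price_cents is not None and facing.label_price_cents != exp.price_cents:
            alerts.append(f"price mismatch: {facing.sku}")
    return alerts
```

Prices are held as integer cents so that the comparison is exact. Prioritizing the resulting alerts, for example by sales impact, is left out of the sketch.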
What's next for the Scandit Vision AI Engine?
The future of AI lies in domain-specific models tailored to unique industry needs. Ultimately, the Scandit Vision AI Engine is moving towards a situation where you don’t scan barcodes, scan IDs, scan objects, or scan text. You just… scan.
Holistic scene analysis will ingest multiple data sources and context, then return exactly the data and insights you need for your specific scenario — whether you’re a consumer, store associate, delivery driver, or operations manager.
You won't have to switch tools. You won't have to tell it explicitly what you want.
It provides not just inventory visibility, but real-time actions for store teams.
The evolution of vision AI makes it possible to add a comprehensive intelligence layer between business environment and user. That's something that retailers, healthcare providers, logistics companies, manufacturers, and all the other industries who create value in the physical world have never truly been able to do before.
Look up from your screen. The world's biggest problems are physical, not digital. The Scandit Vision AI Engine tackles them head-on.