These Bots Have Eyes: Why the Evolution of Visual Chatbots Is a Boon for Entrepreneurs

We humans are emotional, visual creatures who communicate with body language and subtle cues. When will our bots be like this, too?

By Ronen Rozenberg | Sep 24, 2018

Opinions expressed by Entrepreneur contributors are their own.

The world may be enamored with bots at the moment, but they’ve actually been around for quite some time. The first bots were used in the finance industry more than a decade ago, to automatically buy and sell equities based on key market indicators. This technology was a novel concept at the time, but it’s now ubiquitous in the industry, with the financial robo-advice market projected to grow to $7 trillion by 2025, according to CNBC.

Today’s bots have evolved to be much more capable than their predecessors. Conversational AI platforms, known as chatbots, automate and scale one-on-one conversations, with use cases that extend well beyond finance into sales, marketing and customer support.

What’s more, chatbots continue to evolve: Just a few years ago, the notion that a bot could answer a text message or suggest a product for purchase was revolutionary. But these capabilities are now commonplace, with chatbots a near-standard help feature on websites and other online platforms.

The next stage in bot technology should have entrepreneurs salivating.

So, here’s an interesting question: What if chatbots had eyes?

Businesses using chatbot technology today have most likely done so for two main reasons: to enhance the customer experience and to save money. Juniper Research projects that bots will cut business costs by as much as $8 billion by 2022, so this technology is poised to have a huge impact on SMBs and enterprises alike.

Yet bots still pose a multitude of problems for entrepreneurs, especially where customer experience is concerned. The hard truth is that chatbots may fail to deliver experiences as seamless, efficient and pleasant as hoped. And often the reason is simple: Chatbots cannot see.

Some explanation may be needed here: When a customer interacts with a chatbot, the success of the exchange depends heavily on the customer’s ability to accurately describe, and type, the issue at hand. The chatbot, in turn, has only a limited ability to interpret the customer’s phrases, nuances and complex reality.

This limitation carries over into the chatbot’s ability to help the customer solve the problem, because the bot’s responses are further constrained by its preprogrammed pool of words and phrases.

According to a PointSource survey, 59 percent of customers surveyed said bots weren’t getting the job done. One likely reason is that customers are more than text: They are emotional, visual creatures who communicate with body language and subtle cues, and who use their eyes and brains to see and sense the world around them. That helps explain the huge spike in visual search engines, video tutorials and visually based customer assistance.

For business owners, then, the difference between visually walking a customer through the steps required for resolution and merely typing out instructions for mechanical actions is immense: Visual engagement reduces frustration and empowers the customer rather than escalating dissatisfaction.

The good news: Early-stage visual bots have arrived.

Computer vision AI is already being used in a wide range of applications. It lets cameras recognize faces and smiles; it helps self-driving cars read traffic signs and avoid pedestrians; it allows factory robots to spot problems on the production line. In customer engagement, it will let a visual chatbot act as a virtual assistant that can actually see the customer’s problem. The implications for business owners are significant.

The ecommerce industry, and the fashion industry in particular, has been among the early adopters of visual chatbots. Levi’s AI-powered virtual stylist and Amazon’s Echo Look can advise shoppers on the products or styles best suited to them. If brands can use computer vision to “see” and understand their customers on an individual level, they can take personalized sales, marketing and service much further. These are exciting developments, but many more use cases along the customer journey remain untapped.

The path to further evolution

For visual chatbots to reach mass adoption, vendors and enterprises will need to adopt the core technologies that support them: computer vision AI and augmented reality (AR). This evolution will unfold in four phases:

Phase one: Text to image. At this early stage, the chatbot receives text-based input from the customer, interprets it and retrieves a relevant visual from a knowledge base or a search engine. The request can be specific, such as “Please show me the room in the hotel I’ve reserved,” or general, such as “How do I program my new coffee machine?”

Phase two: Image to image/text. At this more advanced phase, the bot applies computer vision AI to process the image it receives, and replies with either words or visuals. For example, a museum-goer snaps a photo of an item of interest, and a museum chatbot recognizes the item and shares details about the artist and the item’s background.

Phase three: Image to smart image. At this stage, the bot applies computer vision both to process the input and to produce the reply. For example, a customer contacts his insurance company following a car accident. The bot asks him to upload images of the vehicle, identifies the damaged areas, assesses the extent of the damage and estimates the potential cost of repairs. This information speeds up the claims cycle and saves the business money. Insurtech companies such as CCC have been focused on developing these capabilities, and “virtual adjuster” bots are maturing as a result.

Phase four: Interactive visual conversation. The most advanced stage arrives when the chatbot can switch to real-time video, letting the customer show the issue and receive interactive AR guidance. Such a bot can walk customers through complicated tasks, giving feedback and correcting mistakes along the way. For example, when a customer unboxes a new router, a “virtual technician” recognizes the cables and inputs and uses AR to guide the customer through the installation.
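To make the text-to-image phase a little more concrete, here is a minimal sketch of that step, assuming a tiny hypothetical knowledge base in which each visual is tagged with keywords. The `Visual` class, the `KNOWLEDGE_BASE` entries and file paths, and the `retrieve_visual` function are all illustrative inventions, not any vendor’s product, and simple keyword overlap stands in for the language understanding or search engine a real chatbot would likely rely on.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Visual:
    """A visual asset in a hypothetical knowledge base, tagged with keywords."""
    path: str
    tags: frozenset[str]


# Hypothetical knowledge base: the paths and tags are illustrative only.
KNOWLEDGE_BASE: tuple[Visual, ...] = (
    Visual("visuals/hotel_room_deluxe.jpg", frozenset({"hotel", "room", "reserved"})),
    Visual("visuals/coffee_machine_setup.mp4", frozenset({"coffee", "machine", "program"})),
    Visual("visuals/router_cables.png", frozenset({"router", "cables", "install"})),
)


def tokenize(text: str) -> set[str]:
    """Lowercase the customer's text and split it into alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_visual(request: str, kb: Sequence[Visual] = KNOWLEDGE_BASE) -> Optional[str]:
    """Return the path of the visual sharing the most tags with the request.

    Returns None when no visual shares any tag with the request.
    """
    words = tokenize(request)
    best: Optional[Visual] = None
    best_score = 0
    for visual in kb:
        score = len(words & visual.tags)  # number of overlapping keywords
        if score > best_score:
            best, best_score = visual, score
    return best.path if best is not None else None
```

Calling `retrieve_visual("How do I program my new coffee machine?")` matches the tags "coffee", "machine" and "program" and returns the coffee machine clip. A request that matches no tags returns `None`, at which point a real bot would presumably fall back to a text answer or a human agent.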

Next step: Teaching the visual bots

Advanced visual bots harness deep learning to recognize and analyze images with a high degree of accuracy. Deep learning, however, requires a massive data set to train the model effectively.

For a visual bot to correctly identify vehicle damage, as in the example above, its model must have been trained on tens of thousands of images of each damage type. To identify a specific coffee machine model, the bot needs to have processed a large number of images of that model, taken in various lighting conditions and from various angles and positions.

Building these massive data sets is time-consuming, labor-intensive and costly, which puts the task out of scope for many enterprises and vendors. But the work will get done, and, as with all technology, its price will drop over time until it becomes affordable for all sorts of businesses.

A smart investment for business owners

Chatbots are quickly becoming an integral part of the user experience, and as long as humans are on the other end of the conversation, the future of chatbots is going to be visual.

The transformation to visual bots will be an evolutionary process, in which bots gradually move from text-based understanding to image processing, and eventually to fully visual interactions. For entrepreneurs and business owners, the potential upside is far-reaching: an improved customer experience; better loyalty and retention; lower costs; and higher revenue, thanks to the personalized sales and service that “seeing” chatbots will bring.