How AI Is Changing the Future of Content Consumption
From e-books to voice search, AI is going to change the face of the modern world
While many organizations and sectors are deploying AI solutions for the first time, one industry has been gobbling up algorithms and artificial intelligence solutions - Media. Big players in media have been successfully using personalization and intelligent algorithms for quite some time. Highlighting the media industry’s large appetite for AI, Ted Sarandos, Head of Content at Netflix, said, “Our shows intentionally don’t have a unifying brand. Our brand is content personalization.”
But the applications of AI in media are not limited to content personalization. Media teams have to deal with manual processes for everything - from tagging media to creating multilingual subtitles. Recent advances in AI are automating many of these tasks. Developments in computer vision, speech-to-text and natural language processing algorithms are changing the face of media creation, distribution and, most importantly, media consumption.
Voice is the most natural way for people to communicate. That’s why the use of voice search in media is skyrocketing. Even in India, voice is becoming a dominant way to interact with computers - Google reported that Hindi voice queries are growing 400% YoY. Media companies are scrambling to meet this demand - YouTube has voice search, and Gaana recently introduced a voice assistant in its mobile apps.
Voice queries are very different from traditional search. People ask long questions like “Show me when Kohli scored a century” and expect to see the exact moment when Kohli scored. Traditional media is not really equipped to do that - the match recording is a huge, monolithic file more than four hours long. The part where Kohli actually scores a century is just a 2-minute clip in this long file. This means the clip has to be created manually, an approach that cannot scale to all the different queries people will ask.
AI can change all that. AI can analyze the video frame by frame and the commentary word for word. Using face detection, AI can identify Kohli, and NLP algorithms can detect the time when the commentators announce the century. Together, these signals define a short section of the video in which Kohli most likely scored a century. This section is created dynamically, without any manual editing, and can be played when the user makes the search query.
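To make the idea concrete, here is a minimal sketch of how the two signals might be combined. It assumes the face detector and the speech-to-text system have already run, and that their outputs are available as plain timestamps. The names `TranscriptSegment` and `find_clip`, the padding values and the sample data are all hypothetical and chosen only for illustration; this is not how any particular product does it.

```python
from dataclasses import dataclass


@dataclass
class TranscriptSegment:
    start: float  # seconds from the start of the recording
    end: float
    text: str     # what the commentators said in this segment


def find_clip(face_times, segments, keywords, padding=60.0, face_window=30.0):
    """Return (start, end) of the first commentary segment that mentions
    every keyword while the player's face was detected nearby, or None."""
    for seg in segments:
        text = seg.text.lower()
        if not all(k.lower() in text for k in keywords):
            continue
        # Require at least one face detection close to the commentary.
        face_nearby = any(
            seg.start - face_window <= t <= seg.end + face_window
            for t in face_times
        )
        if face_nearby:
            # Pad both sides so the viewer sees the build-up and the celebration.
            return (max(0.0, seg.start - padding), seg.end + padding)
    return None


# Hypothetical outputs of the face detector and speech-to-text steps.
face_times = [118.0, 8990.0, 9003.0]
segments = [
    TranscriptSegment(120.0, 125.0, "Kohli takes guard"),
    TranscriptSegment(9000.0, 9006.0, "And that is a century for Kohli!"),
]

clip = find_clip(face_times, segments, ["kohli", "century"])
print(clip)
```

With this sample data the sketch picks out a clip of roughly two minutes from a recording that is hours long, and nobody has to cut it by hand. A real system would need fuzzier matching than simple keyword lookup, but the principle of intersecting visual and spoken signals is the same.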
Once the media can understand voice interactions, it can start to interact with you. This creates exciting new avenues for storytelling. Look at a company like Novel Effect, which has created a new experience of reading storybooks with children. As you read the book aloud, the smart voice recognition stays in sync and the AI automatically plays custom-created sounds and music, such as thunder, violins or songs, at the right time to keep the child engaged in the story.
The media is now talking with you, understanding you and then creating something completely new. Is this still a book or something more? Is it a story or an experience? AI is unshackling media from the traditional silos of audio, video and text and fusing them together into something exciting.
Content personalization is all about recommendation. While it suggests a video or audio clip that you will like, the media itself does not change for you. In the future, the media will change itself and adapt to you. With AI, production teams can give the media more context - the location, the user profile, the metadata of the file - and based on that context the media can change dynamically.
Consider a simple example: a movie contains spoken profanity. Traditionally, editors bleep out the profanity manually to make the movie suitable for younger audiences. Deep learning algorithms, in combination with speech-to-text algorithms, can detect profanity and the time at which it was spoken, and automatically bleep it with very high accuracy. This type of media accepts user input (the app can ask the user, “What is your birth date?”) and, based on that input, the profanity is dynamically bleeped if the user is below 18 and left untouched otherwise. The key word here is dynamic - it happens on the fly, without any human intervention.
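The age-based decision can be sketched in a few lines. This sketch assumes a speech-to-text step has already produced word-level timestamps; the `PROFANITY` word list, the function names and the sample words are hypothetical placeholders, and a production system would rely on a trained model rather than a fixed list.

```python
from datetime import date

# Hypothetical placeholder list; a real system would use a trained classifier.
PROFANITY = {"damn", "hell"}


def age_on(birth_date, today):
    """Age in whole years on the given day."""
    years = today.year - birth_date.year
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday has not happened yet this year
    return years


def bleep_intervals(words, birth_date, today, min_age=18):
    """words is a list of (start_sec, end_sec, word) tuples from speech-to-text.
    Returns the time ranges to bleep for this viewer; empty for adults."""
    if age_on(birth_date, today) >= min_age:
        return []
    return [
        (start, end)
        for start, end, word in words
        if word.lower().strip(".,!?") in PROFANITY
    ]


# Hypothetical word-level transcript of a short stretch of dialogue.
words = [(12.0, 12.4, "Well,"), (12.5, 12.9, "damn!"), (13.0, 13.5, "okay")]
today = date(2024, 1, 1)

child_cut = bleep_intervals(words, date(2012, 5, 1), today)
adult_cut = bleep_intervals(words, date(1990, 5, 1), today)
print(child_cut, adult_cut)
```

The same file produces two different playback experiences. The stored media is never edited; only the list of ranges to mute is computed when the viewer presses play.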
The Rise of Intelligent Content
Traditional media infrastructure cannot support all these use cases, and that is why making content intelligent is a top priority for media houses. Instead of storing media as unintelligent, monolithic files, companies can have AI take a pass at the files to label, transcribe and classify them. This makes the media itself intelligent and ready for the future of voice search and other exciting new experiences.
Anup Gosavi is a serial entrepreneur and co-founder of Spext, a Media AI company based out of San Francisco and Bangalore. He has an MBA from Babson College in Boston. Products he designed have been used by hundreds of thousands of people and featured multiple times under the Best Design category on the App Store.