Ivan Crewkov CEO & Co-Founder of Buddy AI – Interview Series – Unite.AI

Ivan Crewkov is the CEO & Co-Founder of Buddy AI, the world’s first conversational AI tutor for kids, on a mission to ensure all students are able to afford 1:1 English tutoring. After moving to the US from Siberia, Ivan witnessed his preschool-aged daughter struggle to learn English. This inspired him to build Buddy, a fictional character that kids can actually converse with through the power of generative AI.


Since its launch in 2020, the Buddy app has won several awards and topped the charts in the App Store’s Kids and Education category with over 36M downloads worldwide.

In 2014, you launched Cubic.ai, one of the first smart speakers and voice-assistant apps for smart homes. What were some of your key takeaways from this experience?

I’m not sure I can take the credit for launching Cubic.ai. I joined the company a year after it was founded and was given the co-founder title for my contributions.

Here are the key takeaways:

  • Hardware is hard, but someone has to do it anyway. Securing venture funding for hardware startups is extremely hard. The only thing that makes things a bit easier is crowdfunding.
  • The voice-first product space is vast and diverse. What applies to smart homes does not apply to early learning, from the technology to the UX design.

Could you share the genesis story of Buddy and how it originated from your family moving to the USA from Siberia?

With Cubic.ai, I moved from Siberia to the U.S. in 2014 and brought my family with me. My older daughter Sofia started learning English as a second language when she went to a preschool in Mountain View, California, at the age of 4. Sofia struggled to begin speaking English for the first 3–5 months in preschool. We were worried because the language barrier kept her from making friends and playing with most of her peers. We started looking for ways to help her learn to speak.

It became clear that language apps for kids do not teach children to speak (and that has not changed since), and that language apps for grown-ups like Duolingo do not work for children because of the UX. So, we started taking lessons on platforms that connect children with live teachers via video conferencing, such as Cambly, VipKid, Novakid, and GoStudent. As I observed Sofia learning with live tutors virtually, I saw the benefit of 1:1 attention and active speaking practice, but I also saw the shortcomings of these programs in general.

For example, as they scale, many online tutoring platforms and online schools have to hire people without a pedagogical background, skills in teaching children, or even adequate English proficiency. To ensure a certain quality of education, these platforms and schools strictly script their curricula and lesson plans, and teachers have to use pre-canned exercises, including audio and video fragments. Unfortunately, this means that on many platforms, tutors basically work like bots.

Still, online tutoring has been the only way for most people to learn to SPEAK English, especially in non-English speaking countries. But partly because of the teacher shortage, it is way too expensive for most families. Learning with live teachers is a premium education service few families can afford.

My co-founder and I came to the realization that AI tutoring is the only scalable way to provide 1:1 English-speaking tutoring to every child worldwide. Soon, we learned that it is also the best approach from an educational standpoint. When we were considering Buddy’s earliest prototypes, we were inspired by research in the field of Virtual Humans in Education.

Academic studies show that animated pedagogical agents have educational advantages over more traditional learning tools and environments. For example, see Face-to-Face Interaction with Pedagogical Agents, Twenty Years Later, a 2016 article that surveys the field and cites much of the relevant material. Here is one quote:

“In particular, the meta-analysis found that agents do enhance learning in comparison with learning environments that do not feature agents. […] Perhaps most interesting was the finding that, in formal education, pedagogical agents seem to be more effective for younger learners than for older learners. […] studies have found, for example, that students interacting with pedagogical agents exhibit stronger learning outcomes when 1) pedagogical agents speak rather than communicate with text, 2) pedagogical agents use human-like gestures, 3) pedagogical agents communicate conversationally rather than formally, and 4) pedagogical agents use polite rather than direct phrasing.” 

This strengthened our confidence in the multimodal AI tutoring approach. We decided that Buddy would be a multimodal AI tutor – an animated pedagogical agent capable of voice recognition and natural language processing. At its core, an AI Tutoring system consists of three main technologies:

  1. Automatic speech recognition (ASR) and analysis, which allow us to process and analyze the student’s speech.
  2. Natural language processing (NLP), natural language understanding, and dialogue management, which process the content of the student’s speech and produce the next response. The response consists of both verbal and non-verbal components.
  3. An embodied animated virtual character that both provides listening feedback and plays back the system’s response. The character is animated procedurally: the system creates animations on the fly from the NLP response.

All three components are crucial to our approach because only in combination do they allow us to build an engaging, interactive tutor and deliver a successful educational experience.
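
As a rough illustration only, and not Buddy’s actual implementation, the three stages described above could be wired together along these lines. Every name in this sketch (`recognize_speech`, `next_response`, `animate`, `tutor_turn`, `TutorResponse`) is a hypothetical stand-in, and each stage is a trivial placeholder for what would be a trained model or an animation engine in a real system.

```python
from dataclasses import dataclass


@dataclass
class TutorResponse:
    text: str     # verbal component, to be spoken by the character
    gesture: str  # non-verbal component, to be animated


def recognize_speech(audio: bytes) -> str:
    """Stage 1 (ASR) placeholder. A real system would decode audio here;
    in this sketch the 'audio' is simply UTF-8 text."""
    return audio.decode("utf-8").strip().lower()


def next_response(transcript: str, target_word: str) -> TutorResponse:
    """Stage 2 (NLP and dialogue management) placeholder. Checks whether the
    student said the target word and picks a verbal and a non-verbal reply."""
    if target_word in transcript.split():
        return TutorResponse(f"Great job! You said '{target_word}'.", "smile")
    return TutorResponse(f"Let's try again. Say '{target_word}'.", "encourage")


def animate(response: TutorResponse) -> list[str]:
    """Stage 3 (animated character) placeholder. Builds a list of animation
    cues on the fly from the dialogue response."""
    return [f"gesture:{response.gesture}", f"speak:{response.text}"]


def tutor_turn(audio: bytes, target_word: str) -> list[str]:
    """One turn of the loop: student speech in, character behavior out."""
    transcript = recognize_speech(audio)
    response = next_response(transcript, target_word)
    return animate(response)


if __name__ == "__main__":
    for cue in tutor_turn(b"Apple", "apple"):
        print(cue)
```

The point of the sketch is the data flow: the character’s behavior is derived from the dialogue response, which is in turn derived from the recognized speech, so a weakness in any one stage shows up in the whole experience.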

My daughter Sofia and my co-founder’s son Arseny became Buddy’s first users. Sofia used the earliest versions of Buddy through the 1st grade.

Several years later, my younger daughter Alisa started using Buddy at three years old when she went to preschool. Now, she is in Transitional Kindergarten and plays with Buddy almost every day. When Alisa started learning with Buddy, she had several speech issues, so Buddy did not understand her most of the time. But after a couple of weeks of practice, not only her English but also her speech improved, as she tried her best to make Buddy understand her.

Why are the legacy ways of teaching a second language so ineffective?

Today, we are focused on solving particular education problems connected to speech. You can’t learn to speak without speaking practice:

  • Most traditional educational tools focus on teaching other language skills like reading or writing.
  • Language Apps for kids don’t teach speaking skills.
  • Some Language Apps for adults today provide speaking practice using AI, but these services don’t work for kids because of UX, safety concerns, and privacy regulations.
  • Live tutors are too expensive for most families. Unfortunately, many tutors don’t have pedagogical training or aren’t proficient in English.

Buddy is a multimodal AI tutor.

  • It’s superior to traditional learning apps because it works like a live teacher in many ways. Let me quote one of our advisors, Dr. Alex Desatnik, PhD, University College London:

“Voice-based virtual tutor. This concept may sound simple, but there is science behind it. From a psychology of learning standpoint, the virtual talking character is an embodiment of the teacher. This approach creates an effect called epistemic trust, strengthening the student’s motivation and engagement, and improving the learning outcomes.”

  • Buddy has some advantages even over human teachers. Buddy does not judge, and some children find it easier to start talking to Buddy than to a teacher. That’s why today, many tutors use Buddy as an icebreaker that helps children overcome their fear and discomfort and start speaking the language.

Buddy works to help teachers, not to replace them.

I think it’s very important to note this. Buddy can help teachers automate the mundane part of their job – providing regular practice. We want to give power to school teachers. Buddy is like a team of tutors and teacher assistants, working individually with every child in the class and reporting to the class teacher.

Can you discuss how Buddy uses elements of gamification to keep children excited about learning?

Fun fact: Buddy’s mobile app was downloaded 22 million times in 2023, and over 70% of these downloads were made by children. For children, our app is a game where they play with Buddy, their talking virtual friend and a popular YouTuber. Children download the app and convince their parents to pay for a subscription by explaining that Buddy is a teacher.

To make this approach work, we are designing Buddy as a game with a story and a universe. We work with Hollywood character designers and writers to create Buddy and his story. We have a very strong game design team working directly with our educators and turning curriculum and exercises into mini-games in Buddy’s world.

What are some other core functionalities that make Buddy so powerful in teaching a second language?

Our core functionality is really focused on Buddy as a multimodal AI tutor:

  • Speech recognition
  • Conversational AI
  • Avatar visual behavior

What are some of the machine learning algorithms that are used at Buddy?

We are developing the whole stack of technologies, working together to enable our multimodal AI tutoring approach.

  • BSR (Buddy’s Speech Recognition) is a proprietary speech recognition engine built specifically to work with accented children’s speech and to comply with regulations like COPPA.
  • BLM (Buddy’s language model) — Conversational AI Engine for Children. Safe, fast, and free to operate. It focuses on specific educational functionality and is much less versatile than large language models.
  • BABE (Buddy’s Avatar Behavior Engine). This technology generates our character’s visual behavior based on the context of the conversation. Buddy understands when he needs to smile, change color, or put on a silly hat.

Many voice recognition systems struggle with accents, especially those of young children. How does Buddy overcome these challenges?

By developing BSR, our proprietary Speech Recognition technology.

Our unique audience and market required the development of proprietary technology. Buddy must recognize the highly accented speech of young English as a Foreign Language (EFL) learners. Another complicating factor is that beginner students start by learning separate, often short words, which are very difficult to recognize without context. Finally, the children’s market is highly regulated, and voice recognition is subject to the Children’s Online Privacy Protection Act (COPPA) since voice recordings are considered Personally Identifiable Information (PII).

BSR handles children’s speech with different accents, recorded on a variety of mobile devices with microphones of varying acoustic quality and in real-life environments with many kinds of background noise. It is also COPPA-compliant by design.

Working globally, we managed to accumulate a unique data set to train our model on. Today, BSR outperforms commercial off-the-shelf solutions in recognizing and understanding accented children’s speech.

How do you plan on expanding market penetration to target parents who may be unfamiliar with AI technology?

Buddy started seeing success before AI became a buzzword, and most of our users aren’t the typical early tech adopters. We are successfully solving an important educational problem, and it just so happens that we are using AI for it.

Still, one of the challenges we face is getting parents to treat learning with Buddy as seriously as learning with a live tutor: not skipping lessons, sticking to a schedule, and so on. The current AI revolution seems to be helping with that.

I’d say that the next big step for us is to start working more closely with teachers and schools. We are running a pilot partnership with a school in Brazil and discussing partnerships with a dozen more educational institutions.

What is your vision for the future of AI tutors and education in general?

AI tutors are the best and the only scalable way to solve humanity’s #1 educational problem – the global teacher shortage. We need about 69 million new teachers to address just basic learning needs. For subjects that require 1:1 tutoring, like language learning, the problem is much worse.

The AI revolution accelerated the development of AI tutors, though primarily in the adult segment using off-the-shelf solutions, while early learning remains dramatically underserved. We are proud to be pioneers of AI tutoring for young children.

Regarding our future, Buddy started as a language learning tutor, but in the longer term, it will become an AI tutoring platform teaching a wide variety of subjects to children under 12. We have already started rolling out an early version of our first non-language course, the School Preparation Curriculum for U.S. children. We see Buddy as the child’s learning assistant, growing up with the child from the age of 3 or 4 and teaching multiple courses over many years.

Thank you for the great interview. Readers who wish to learn more should visit Buddy AI.

This post was originally published on the third-party site named in the title.
