Building an offline e-learning application with speech recognition

Even in the best of times, working from home can be challenging. Working from home with children in tow, as countless millions around the world must now do, can be nearly impossible. As our habits and practices evolve to embrace our new reality of social distancing and home isolation, our thoughts increasingly turn to how best to keep our children educated and engaged at the same time. Unfortunately, off-the-shelf distance and virtual learning tools solve only part of the problem for parents with children to homeschool. But as it turns out, crafting your own e-learning solution and virtual curriculum for homeschooling your children doesn't have to be a daily chore. In fact, in roughly a hundred lines of code, you can build an offline e-learning application that uses the SpeechRecognition API for speech-to-text transcription to help your child with language acquisition.

We had a special episode of Tag1 Team Talks bringing together Laslo Horvath (Senior Laravel Developer at Tag1 and the creator of an offline language-learning application); Michael Meyers (Managing Director at Tag1); and yours truly (Preston So, Editor in Chief at Tag1; Senior Director, Product Strategy at Oracle; and author of Decoupled Drupal in Practice). During this half-length edition, we dove into how Laslo built, in a matter of days, a compelling e-learning application for practicing key skills in German and English to keep his homebound children occupied while he maintains focus on mission-critical projects at Tag1 Consulting. In this blog post, we'll share some of the insights that led Laslo to open-source his application for others to benefit from.

The challenge

These days, engineers have plenty on their plate when it comes to client implementations. Now that schools have closed for the foreseeable future, keeping children engaged in learning can be downright impossible in an environment where parents still need to maintain their productivity and utmost focus. Since children are increasingly accustomed to the digital world, and since Laslo thinks like a developer, his ideal solution to any hairy problem involves a great degree of automation. But whereas automating curricula like mathematics courses may be more straightforward, Laslo's children are currently learning to read in school, a skill that is less obviously suited to automation.

By mid-March, authorities in Vienna, where Laslo and his family live, had shut schools for everyone but the small proportion of parents who are essential frontline workers. Though Vienna has been at the forefront of introducing distance learning programs for children now adjusting to homeschooling, many e-learning tools tend to be dull, exam-based, completion-focused workflows that may not be as effective in keeping distracted students engaged. As Laslo eloquently states, "The learning itself is something you have to do with the kids. The teacher role is still on you, and we have to give kids what they need to learn."

Laslo's children are currently learning how to read in the German and English languages. In the reading and speaking curriculum Laslo developed for practicing these languages, students are required to read simple sentences aloud. A machine listens to the pronunciation and corrects the children if they make a mistake. Since at the moment Laslo has considerable deliverables to provide for a project with an imminent launch, he needs to invest most of his day in his work. Unfortunately, the most productive hours of Laslo's day are also the most productive hours for his children when it comes to concentration. "I didn't want [my son] to start learning at 7pm," said Laslo. "I wanted him to start learning during regular hours in the day."

Fortunately, Laslo has formidable experience to draw from while developing his e-learning application. With a background in speech-to-text (STT) and text-to-speech (TTS) applications, Laslo once built an enterprise resource planning (ERP) system on top of SAP, originally intended for a demonstration at an annual artificial intelligence (AI) summit in Germany. At the time, most AI activity was focused on chatbots, voice, and other conversational interfaces. Thus, Laslo prototyped a chatbot for the application where users could insert machine-interpretable sentences. If users wanted to order fifty bottles of hand sanitizer, they could do so by saying "order me 50 hand sanitizers." Though the prototype was a rousing success in 2016, speech recognition was then nowhere near where it is today, and Laslo soon tabled the project as no more than an interesting proof of concept.

The solution

Laslo's resulting user interface allows his star pupil, his son, to practice his German and English skills on his own. First, Laslo's son types in a written sentence in German, thus practicing both typing and writing skills. Then, he reads the sentence aloud, and the software checks the response against the expected pronunciation of the sentence to ensure he is reading it correctly. Fortunately for Laslo, because his son loves reading books, it isn't a significant lift for him to derive hours of entertainment from this e-learning application.

To make it a reality, Laslo built a lightweight application in a matter of hours consisting only of HTML5, JavaScript, and SVG animations. All things considered, the application is a mere one hundred lines of code and leverages existing standards to their fullest extent. For instance, now that many browsers offer speech recognition natively, building voice detection and voice input validation into a web application is straightforward.

As for the JavaScript side of things, limited logic handles the communication between user and browser. The JavaScript code listens to the user's voice input, parses the response using the browser's speech recognition, and displays how confident the recognizer is that a particular word was uttered. The browser returns multiple candidate matches for what was said, but the script only looks at the first result. Finally, the JavaScript uses client-side logic to mark correctly matched text in green as the student progresses through the interface.
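
The listening step described above might look roughly like the following. This is a minimal sketch, not Laslo's actual code: `listenOnce` and `onHeard` are hypothetical names, and the recognition constructor is passed in as a parameter so the logic stays independent of any one browser. The properties used (`lang`, `interimResults`, `maxAlternatives`, `onresult`, `start()`) belong to the browser's SpeechRecognition interface. Browser support varies, and in some browsers (Chrome, for example) recognition is performed by a remote service, so behavior without connectivity may differ from browser to browser.

```javascript
// Sketch only: listenOnce and onHeard are hypothetical names.
// In a browser, pass in the real constructor, for example:
//   const Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
function listenOnce(RecognitionCtor, lang, onHeard) {
  const recognition = new RecognitionCtor();
  recognition.lang = lang;            // BCP 47 tag, e.g. "de-DE" or "en-US"
  recognition.interimResults = false; // only report final results
  recognition.maxAlternatives = 3;    // the browser may return several candidates

  recognition.onresult = (event) => {
    // Each result holds a list of alternatives; this sketch reads only the first.
    const result = event.results[event.resultIndex];
    const first = result[0];
    onHeard({
      transcript: first.transcript.trim(),
      confidence: first.confidence, // a number between 0 and 1
    });
  };

  recognition.start();
  return recognition;
}
```

A page could call `listenOnce(Ctor, "de-DE", ({ transcript, confidence }) => { ... })` and display both values to the student.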

Laslo then enlisted an animator friend to assist with a few SVG animations to make the e-learning application more aesthetically appealing. An SVG of a human hand points out the word that should be pronounced at that moment, just like a parent pointing at a word in a picture book during storytime, and a few party poppers go off when the task has been completed successfully, giving his son positive feedback. And last but not least, Laslo built localization into the application, thus allowing anyone using the application to supply not only their own corpus of reading samples but also corpora in other languages.
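
To illustrate how per-language reading samples could be organized, here is a small sketch. The `corpora` map, the sample sentences, and `pickCorpus` are all assumptions made for illustration; the source does not describe how the application stores its corpora. Keys are BCP 47 language tags, the format expected by the `lang` property of a SpeechRecognition instance.

```javascript
// Hypothetical corpus layout: BCP 47 language tag -> reading samples.
const corpora = new Map([
  ["de-DE", ["Der Hund läuft.", "Die Katze schläft."]],
  ["en-US", ["The dog runs.", "The cat sleeps."]],
]);

// Pick reading samples for a locale: exact match first, then any corpus
// in the same primary language, then the fallback.
function pickCorpus(locale, fallback = "en-US") {
  if (corpora.has(locale)) {
    return { lang: locale, sentences: corpora.get(locale) };
  }
  const language = locale.split("-")[0].toLowerCase();
  for (const [tag, sentences] of corpora) {
    if (tag.split("-")[0].toLowerCase() === language) {
      return { lang: tag, sentences };
    }
  }
  return { lang: fallback, sentences: corpora.get(fallback) };
}
```

With this layout, `pickCorpus("de-AT")` falls back to the `de-DE` samples, and the returned `lang` can be assigned to a recognizer's `lang` property so that speech is interpreted in the matching language.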

[Image: screenshot of the e-learning application]

In a more sophisticated piece of software, admits Laslo, especially in the voice context, having a comprehensive corpus that also gracefully handles fallbacks would normally be expected. For example, Alexa often asks whether a user meant a particular word over another, something that can be configured by Alexa skill developers. Fortunately for Laslo, all he needed to do for this scenario was to compare strings, thanks to his ability to stand on the shoulders of giants in the form of the SpeechRecognition API.
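
A string comparison of this kind could be sketched as follows. The helper names `normalize` and `markWords` are hypothetical, and the normalization rules (lowercasing, stripping punctuation) are assumptions rather than a description of the original application. The comparison is purely positional, so a dropped or inserted word shifts every word after it; a more forgiving matcher would need an alignment step.

```javascript
// Hypothetical helpers; the original application's comparison may differ.
// Reduce text to lowercase words with punctuation removed.
function normalize(text) {
  return text
    .normalize("NFC")                 // keep composed letters such as "ä" consistent
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // drop punctuation, keep letters and digits
    .trim()
    .split(/\s+/)
    .filter(Boolean);
}

// Compare the expected sentence with the recognized transcript word by word.
// Each entry says whether that word could be shown in green.
function markWords(expected, heard) {
  const heardWords = normalize(heard);
  return expected.trim().split(/\s+/).map((word, i) => {
    const key = normalize(word)[0];
    return { word, correct: key !== undefined && key === heardWords[i] };
  });
}
```

For the expected sentence "Der Hund läuft schnell." and the transcript "der Hund lauft schnell", every word is marked correct except "läuft".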

Best of all, Laslo's e-learning application lacks infrastructure entirely (one could certainly call it serverless!): there are no servers to run and no complex hosting to manage. Laslo was able to build a compelling educational tool with no more than HTML and JavaScript in the browser. This means he can keep the application local with no connectivity to the internet (an important consideration for children who are increasingly left alone on their own devices) and share the tool with his friends and colleagues without attaching a lengthy list of hosting setup instructions or back-end dependencies.

Conclusion

Because Laslo's e-learning application was built primarily for his own convenience rather than for a customer, he doesn't plan to build many additional features, but he does plan to encourage his friends to try the application out. Among the new capabilities on the roadmap is support for lessons with preset text supplied by a local database. In addition, Laslo plans to introduce skill levels (basic, intermediate, and advanced) to offer a cumulative experience for his children. Though Laslo's son is in preschool now, later, in the third grade, he may be expected to read entire sentences fluently rather than individual words, which would require the matching algorithm to adjust as well. For even more advanced children, detecting entire paragraphs could be a possible addition.
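
One way the planned skill levels could change what the student is asked to read at once is sketched below. This is speculative: the level names come from the roadmap above, but `unitsForLevel` and its splitting rules (words, then sentences, then the whole paragraph) are assumptions, not a description of planned code.

```javascript
// Hypothetical: split a lesson text into the units a student reads at each level.
function unitsForLevel(text, level) {
  const clean = text.trim();
  switch (level) {
    case "basic":        // one word at a time
      return clean.split(/\s+/);
    case "intermediate": // one sentence at a time, split after ., ! or ?
      return clean.split(/(?<=[.!?])\s+/);
    case "advanced":     // the whole paragraph in one go
      return [clean];
    default:
      throw new Error(`Unknown level: ${level}`);
  }
}
```

Each unit returned would then be the text that the recognized transcript is compared against, so the same lesson text could serve all three levels.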

For parents trying to maintain their focus while working from home with distracted children, it can seem impossible to reconcile the need for absolute focus at work with the need for children to learn. In this blog post, we examined one possible path for fed-up parents in the form of lightweight e-learning applications, especially those that can run entirely offline without infrastructure. Thanks to his offline e-learning application, which is available as an open-source project on GitHub and leverages the SpeechRecognition API, a feature supported in browsers such as Chrome, Laslo can continue to work on mission-critical projects for Tag1 with his undivided attention and automate the way his children learn from home in these unprecedented times.

Special thanks to Jeremy Andrews, Laslo Horvath, and Michael Meyers for their feedback during the writing process.

Photo by visuals on Unsplash

Preston So

Editor in Chief
