Who’s the Phoneme?

Ever since attending Google I/O earlier this year (one of the best conferences I’ve ever attended… seriously Google, please have me back next year!), I felt I needed to build a dedicated app for the Google Assistant.  I kicked around a ton of ideas, but at first I felt daunted by all the fundamentals I didn’t know, even for making something simple in this space:  My programming language(s) of choice were not in the ecosystem.  I knew nothing about Conversational UI.  TensorFlow, Dialogflow, and all of the Machine Learning (ML) terminology were new to me, and the SDKs were changing rapidly.  So I did the same thing I did in the early days of Android: I just jumped in and started learning.

I once had a job (which seems like eons ago) where I worked on many things that became the precursors to modern-day AI and ML concepts.  So a lot of this was learning new names for concepts I was already familiar with.  Libraries for AI had come a long way as well, and there were now off-the-shelf pieces for things that entire companies had been built around back in the day.

Even after getting a pretty good grasp of the basics, I held off for a bit on actually building something, secretly hoping that Google would roll out Kotlin support across all of their server-side infrastructure. 😉 That never happened, but I’d also done a fair amount of Node.js and general JavaScript development over the years, so I finally decided to dive in there.  Node.js is pretty much required for the backend components of Firebase on Android as well, so time spent staying fluent in it is never wasted.  I was pleasantly surprised to see that Speech Synthesis Markup Language (SSML) was pervasive in the voice assistant space.  I had followed the early W3C recommendation pretty closely while working on a (way before its time) augmented reality gaming engine as a side project.  This is where the title of the post comes in…

I dabbled in a lot of odd things in college… so many things that, in the years since I learned them, seemed so far afield of my chosen career as to be laughable.  One of my big obsessions (and it still is) was language and its origins.  Why is it that ancient Sanskrit is so similar in some ways to Classic Mayan?  I could discuss this stuff forever, but for the purposes of this digression, my point is that I’ve taken a bunch of linguistics and language courses.  One course in particular turned out to be immensely useful in creating a primarily voice-centric assistant app.  In this course, I learned about phonemes.  Phonemes are the distinct units of sound in a language, and writing a word as a sequence of phonemes is a convenient way to specify exactly how it should be pronounced.  It’s probably pretty obvious why this is valuable when dealing with highly technical subject matter using just voice, but think about how differently English words with nearly identical spellings can be pronounced (tough, though, and through, for example) and you’ll get the idea.  Phonemes are often represented using symbols from the International Phonetic Alphabet.  I chuckled to myself at the irony of building a beer-related voice assistant using the IPA, but this was the secret sauce that made my app sound like a beer judging expert and not some rube reading unfamiliar words out of a homebrewing manual.
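To make the phoneme idea a bit more concrete, here is a minimal sketch in TypeScript (chosen because the backend was written for Node.js) that wraps known beer style names in the standard SSML 1.1 `<phoneme>` element with `alphabet="ipa"`. The `LEXICON` table, the `toSsml` and `escapeXml` helpers, and the IPA transcriptions for Gose and Kölsch are all hypothetical illustrations, not the app’s actual code. Whether a given assistant platform honors `<phoneme>` is also an assumption worth checking against that platform’s SSML documentation.

```typescript
// Hypothetical pronunciation lexicon: word -> IPA transcription.
const LEXICON: Record<string, string> = {
  Gose: "ˈɡoːzə",
  Kölsch: "kœlʃ",
};

// Escape characters that are special in XML so plain text cannot break the SSML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wrap every word found in the lexicon in an SSML <phoneme>
// element (IPA alphabet) and escape everything else.
function toSsml(sentence: string, lexicon: Record<string, string> = LEXICON): string {
  // Splitting with a capturing group keeps the whitespace/punctuation separators,
  // so the sentence can be reassembled exactly.
  const parts = sentence.split(/([\s.,;:!?]+)/);
  const body = parts
    .map((part) => {
      // hasOwnProperty avoids matching inherited keys such as "constructor".
      if (Object.prototype.hasOwnProperty.call(lexicon, part)) {
        const ipa = lexicon[part];
        return `<phoneme alphabet="ipa" ph="${escapeXml(ipa)}">${escapeXml(part)}</phoneme>`;
      }
      return escapeXml(part);
    })
    .join("");
  return `<speak>${body}</speak>`;
}

console.log(toSsml("A Gose is tart, while a Kölsch is crisp."));
```

Splitting on whitespace and common punctuation keeps the lookup simple; a real pronunciation lexicon would likely also need case-insensitive matching and support for multi-word style names.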

I ended up building something for the Assistant that I’m proud of, and I learned a bunch along the way.  The app is currently under review with Google, otherwise I’d be telling you all to go out and try it.  I’ll leave that for a future post. UPDATE: The app is now available; read more about it here.