Last week I wrote about an AI startup that’s building technology that can alter, in real time, the accent of someone’s speech. But what if the AI goal instead is to make it possible for people, speaking however they do, to be understood just as they are, removing some of the bias inherent in a lot of AI systems in the process? There’s a major need for that, too, and now a UK startup called Speechmatics, which has built AI to translate speech to text regardless of the speaker’s accent or how they talk, is announcing $62 million in funding to expand its business.
Susquehanna Growth Equity out of the US led the round, with UK investors AlbionVC and IQ Capital also participating. This Series B is a big step up for Speechmatics. The company was originally spun out of AI research in Cambridge back in 2006 by founder Dr. Tony Robinson, and prior to this had raised only around $10 million (Albion and IQ are among those past backers, along with the CIA-backed In-Q-Tel and others).
In the interim it has built up a customer base of some 170 companies (it sells only B2B, powering consumer-facing or business-facing services). While it doesn’t disclose the full list, some of the names include what3words, 3Play Media, Veritone, Deloitte UK and Vonage, which variously use the tech not just for making transcriptions in the traditional sense, but for taking in spoken words to help other aspects of an app function, such as automatic captioning, or to power wider accessibility features.
Its engine today is able to translate speech to text in 34 languages. In addition to using the funding to continue improving accuracy and for business development, it will also be adding more languages and looking at different use cases, such as building speech to text that can work in the trickier environment of motor vehicles (where engine noise and vibrations affect how AIs can ingest the sounds).
“What we have done is gather millions of hours of data in our effort to tackle AI bias. Our goal is to understand any and every voice, in multiple languages,” said Katy Wigdahl, the CEO of the startup (a title she previously co-held with Robinson, who has since stepped back from an executive role).
This manifests in the company’s product focus as well as its mission, and that’s something it’s also looking to expand.
“The way we look at language is global,” Wigdahl said. “Google will have a different pack for every version of English but our one pack will understand every one.” It initially made its tech available only by way of a private API sold to customers; now, in an effort to bring in more users (and potentially more paying users), it’s also offering more open API tools for developers to play with the tech, and a drag-and-drop sampler on its site.
And indeed, if one of Speechmatics’ challenges is in training AI to be more human in its understanding of how people speak, the other is to carve out a name for itself against other major providers of speech-to-text technology.
Wigdahl said the company today competes against “big tech”: major companies like Amazon, Google and Microsoft (which now owns Nuance) that have built speech recognition engines and provide the tech as a service to third parties.
But it says it consistently scores better than these in tests of comprehending languages in the many ways they are actually spoken. One test it cited to me was Stanford’s ‘Racial Disparities in Speech Recognition’ study, where it recorded “an overall accuracy of 82.8% for African American voices compared to Google (68.6%) and Amazon (68.6%).” It said that “equates to a 45% reduction in speech recognition errors, the equivalent of three words in an average sentence.” It also provided TC with a “competitor weighted average”:
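For what it’s worth, the quoted 45% figure does check out as a relative reduction in word error rate, where the error rate is simply 100% minus the quoted accuracy. A quick sketch of the arithmetic (the 20-word average sentence length is my assumption, not a figure from the company):

```python
# Sanity-check of the error-reduction claim using the accuracies quoted above.
speechmatics_acc = 0.828
big_tech_acc = 0.686  # Google and Amazon were both quoted at 68.6%

# Error rates are the complements of accuracy.
speechmatics_err = 1 - speechmatics_acc  # ~17.2% of words wrong
big_tech_err = 1 - big_tech_acc          # ~31.4% of words wrong

# Relative reduction in errors versus the big-tech figure.
reduction = (big_tech_err - speechmatics_err) / big_tech_err
print(f"{reduction:.1%}")  # ~45%, matching the quoted claim

# The "three words in an average sentence" line also roughly holds if you
# assume a ~20-word sentence: the absolute gap is ~14.2 percentage points.
words_saved = (big_tech_err - speechmatics_err) * 20
print(round(words_saved))  # ~3 words
```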
There is indeed a massive opportunity here, though, when you consider the space between smaller developers and outsized technology giants like Apple, Google, Microsoft and Amazon: hundreds of large companies that might not be at the level (or have the interest) of building in-house AI for this purpose, but that are definitely interested in the capability, and would prefer not to be reliant on those huge companies, which are sometimes their competitors and sometimes their outright foils. Take a company like Spotify, for example. (To be clear, Wigdahl did not tell me Spotify was a customer, but said that it is a typical example of the kind of size and situation in which someone might knock on Speechmatics’ door.)
That too has been partly why investors are so keen to fund this company. Susquehanna has a history of backing companies that look like they might give the power players a run for their money (it was an early and big backer of TikTok).
“The Speechmatics team are undoubtedly a different pedigree of technologists,” said Jonathan Klahr, MD of Susquehanna Growth Equity, in a statement. “We started tracking Speechmatics when our portfolio companies told us that again and again Speechmatics win on accuracy against all the other options including those coming from ‘Big Tech’ players. We are primed to work with the team to ensure that more companies can get exposed to and adopt this superior technology.” Klahr is joining the board with this round.
Indeed, as tech becomes a more natural part of everyday life and those making it look for ways to reduce any and all friction around its usage, voice has emerged as a major opportunity, as well as a pain point. So tech that works in “reading” and understanding all kinds of voices can potentially be applied in all kinds of ways.
“The view is voice will become the dominant human-machine interface and Speechmatics are the category leaders in deep learning to speech, with category defining accuracy and understanding across industry use-case and requirements,” added Robert Whitby-Smith, a partner at AlbionVC. “We have witnessed the impressive growth of the team and product over the last few years since our Series A investment in 2019 and as responsible investors we are delighted to support the company’s inclusive mission to understand every voice globally.”