Because of how the Google Home voice assistant records our conversations, sometimes after being triggered by mistake, audio clips (both those recorded on purpose and those recorded by accident) are being sent to engineers working on Google Home voice processing.
How it’s supposed to work: Google Home should only be activated when someone says the triggers “OK, Google” or “Hey, Google.” But it’s not hard to flip that switch accidentally: if someone nearby says “Google,” or even a word that sounds like “Google,” the speaker often starts recording.
The audio clips have included people’s bedroom sound symphonies, their kids’ or grandkids’ voices, payment information from transactions, medical information they divulge when searching for information about their ailments, and far more.
This all comes from a new report from Belgian broadcaster VRT News that relied on input from three Google insiders.
Listening in on the kids
With the help of a whistleblower, VRT listened to some of the clips. Its reporters managed to hear enough to discern the addresses of several Dutch and Belgian people using Google Home, even though some of them never said the listening trigger phrases. One couple looked surprised and uncomfortable when the news outlet played them recordings of their grandchildren.
The whistleblower who leaked the recordings was working as a subcontractor to Google, transcribing the audio files for subsequent use in improving its speech recognition. They reached out to VRT after reading about how Amazon workers are listening to what you tell Alexa, as Bloomberg reported in April.
They’re listening, but they aren’t necessarily deleting: a few weeks ago, Amazon confirmed – in a letter responding to a lawmaker’s request for information – that it keeps transcripts and recordings picked up by its Alexa devices forever, unless a user explicitly requests that they be deleted.
VRT talked to cybersecurity expert Bavo Van den Heuvel, who spotted potential dangers in the prospect of humans listening to our voice assistant recordings, given that they can be made just about anywhere: in a doctor’s office, in a business meeting, or where people deal with sensitive files, such as police stations, lawyers’ offices or courts.
It’s not just Dutch and Belgian contractors who are listening to Google Home requests, though those are the only recordings VRT listened to. The whistleblower showed the news outlet a platform with recordings from all over the world, meaning that there are likely thousands of contractors listening in on Assistant recordings. From VRT:
That employee let us look into the system in which the employees have to listen to recordings from the Google Assistant. There must be thousands of employees worldwide; in Flanders and the Netherlands, a dozen employees are likely to hear recordings from Dutch-speaking users.
Google’s well aware that its contractors can listen to these recordings, and it’s aware of the privacy questions that raises. To keep those contractors from identifying the people they’re listening to, Google strips identifying data from the recordings.
Of course, it’s common for data-gorging companies to point to a lack of identity details and equate that lack to a privacy shield. But in these days of Big Data, the claim has been proved to be flawed. After all, as we’ve noted in the past, data points that are individually innocuous can be enormously powerful and revealing when aggregated. That is, in fact, the essence of Big Data.
Take, for example, the research done by MIT graduate students a few years back to see how easy it might be to re-identify people from three months of credit card data, sourced from an anonymized transaction log.
The upshot: with 10 known transactions – easy enough to rack up if you grab coffee from the same shop every morning, park at the same lot every day and pick up your newspaper from the same newsstand – the researchers found they had a better than 80% chance of identifying you.
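To make the idea concrete, here is a minimal sketch of how a handful of known purchases can narrow an "anonymized" log down to one person. It is not the MIT researchers' method or data: the log, the pseudonymous IDs (`u1`, `u2`, `u3`), the shop names and the `candidates` function are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical anonymized log: (pseudonymous_id, shop, day).
# There are no names in it, only made-up IDs and purchases.
LOG = [
    ("u1", "coffee_shop", 1), ("u1", "parking_lot", 1), ("u1", "newsstand", 1),
    ("u2", "coffee_shop", 1), ("u2", "bakery", 1), ("u2", "newsstand", 2),
    ("u3", "coffee_shop", 1), ("u3", "parking_lot", 1), ("u3", "bookstore", 2),
]


def candidates(log, known):
    """Return the pseudonymous IDs whose records contain every known (shop, day) pair."""
    seen = defaultdict(set)
    for user, shop, day in log:
        seen[user].add((shop, day))
    wanted = set(known)
    return sorted(user for user, points in seen.items() if wanted <= points)


# Each extra purchase the observer knows about shrinks the candidate set.
one_point = candidates(LOG, [("coffee_shop", 1)])
two_points = candidates(LOG, [("coffee_shop", 1), ("parking_lot", 1)])
three_points = candidates(
    LOG, [("coffee_shop", 1), ("parking_lot", 1), ("newsstand", 1)]
)

print(one_point)     # ['u1', 'u2', 'u3']
print(two_points)    # ['u1', 'u3']
print(three_points)  # ['u1']
```

In this toy log, one known purchase matches everybody, two match two people, and three match exactly one. Once a single pseudonymous ID is left, every other record under that ID belongs to a person whose name the observer already knows.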
But we don’t need to go to Big Data science to identify the people in these recordings. They do it themselves. That’s how VRT managed to identify the people in the recordings they listened to. Here’s VRT:
By listening to the things the users themselves say, it is not rocket science to find out their identity…
In addition, employees who listen to the excerpts must search every word, address, name or company name [when] they are not sure how they are written, via Google or Facebook, to find out the correct spelling. In this way they often find out quickly who has spoken the piece in question.
Google: Yes, we’re listening. Just a little.
Google responded to VRT with an emailed statement in which it acknowledged that people are indeed listening to recordings… but not many.
Google said that humans listen to only 0.2% of all audio clips. And those clips have been stripped of personally identifiable information (PII) as well, Google said.
We’ve got to do this work to make the technology better, Google said:
We work with language experts around the world to improve speech technology by making transcripts from a small number of audio clips. This work is crucial for the development of technology that makes products such as the Google Assistant possible.
Heads will roll, ears and all
…and we’ve got to find that whistleblower, Google said:
We have recently learned that one of these language experts may have violated our data security policy by leaking Dutch-language audio clips.
We are actively investigating this and when we find a breach of our policy, we will take action quickly, up to and including the termination of our agreement with the partner.