The race to fingerprint the human voice
Since 9/11, voice scientists have been searching for a way to find a
person's unique 'voice-print'. Accurately identifying you by your voice
is more difficult than it sounds, according to a new discovery.
"You do not have to say anything, but it may harm your defence if you
do not mention, when questioned, something you later rely on in court.
Anything you do say may be given in evidence." We have all heard this
1,000 times, yet we barely give a thought to what may happen to all
the recordings that the police make of their interviews. Or indeed to
the somewhat more mundane equivalent: "This call may be recorded for
training purposes."
However, without your permission - or even your knowledge - your
recorded voice may be about to play a key role in the race to
fingerprint the human voice.
Fuelled by 9/11, spurred on by the advance of our digital society and
made possible by raw computing power, increasingly sophisticated
automated speaker recognition systems (ASRS) are now bringing the
prospect of a "voice-print" enticingly close, threatening to make the
skilled voice scientist redundant.
These automated systems, already widely used by police and
intelligence services on the Continent, use a background population of
voices to make a statistical judgement on the significance of any
similarity or difference between the voice of the criminal and that of a
suspect. They can do so in as little as 15 minutes, where a human might
have taken 15 hours.
Attacks
"September 11 was the trigger for this as, after the attacks, the
police and intelligence services realised that while there were so many
recordings of the voices of the terrorists they didn't have the
technology they needed to extract information from them," Antonio Moreno
says.
Moreno is the technical director of Agnitio Corp, which was spun out
of the Technical University of Madrid in 2004 and provides forensic
automated speaker recognition systems, such as its market-leading
Batvox, to the police forces of more than 20 countries, including
Germany and the US but not yet the UK.
"By the time of Spain's own 9/11 [the Madrid train bombings of 11
March 2004], Batvox could be used to identify some of the men behind the
bombings as, although they wore masks on YouTube, they spoke naturally."
For Professor Peter French, founder of the UK's leading and oldest
forensic speech laboratory, JP French Associates, the bugging, recording
and identification of people traffickers, drug dealers and terrorists
was only the beginning of this revolution.
"The ubiquity of mobile phones means that almost the first thing you
do if you are attacked is call 999, and as all 999 calls are recorded a
lot of people inadvertently record their rape or mugging and capture
their attackers' voices," French says.
Now, though, the "great quest" is to fingerprint the human voice and
"many engineers keep telling me that all they need is more time to tweak
the algorithms and they can achieve full accuracy", French says.
Francis Nolan, Professor of phonetics in the Department of Theoretical and
Applied Linguistics at the University of Cambridge, agrees that the
balance is shifting towards these automated systems due to the technical
advances that have made them possible. The importance of speaker
identification has grown for the simple reason that "it's not the amount
or nature of crime that has changed, it's just the sheer amount of
recorded material that is now available".
Nolan adds that "while on the Continent the police are more likely to
use an automated system, in the UK the tradition has been to use a
skilled dialectician", who would analyse one at a time the sound of the
vowels and even the rise and fall of the voice, its melody, through
a complicated system of notation called the International Phonetic
Alphabet.
Later, acoustic tests were introduced that allowed the dialectologist
to measure the different elements of the speech signal and so extract
information that was beyond the ability of humans to hear.
Even with the help of technology to run these tests, the role of the
specialist remains largely the same: to make a judgement as to whether -
for example - any differences between two recorded voices were down to
smoking, drugs, flu or even whom they were talking to, or whether they
were the voices of two entirely different people.
Augment
Now, Nolan says, "we are beginning to augment the human element still
further through the introduction of automated systems such as Batvox",
which analyses the speech signal to capture the characteristics of
each speaker's vocal tract and builds a statistical model that can
compare an unknown voice against the voices of known speakers,
regardless of what they are saying.
Batvox, for example, then produces a likelihood ratio, much like a
DNA profile does, to suggest how significant such a match is.
The system depends on a reference population of hundreds of human
voices from which to learn what is the norm.
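To make the idea of a likelihood ratio concrete, here is a minimal sketch in Python. It is not Batvox's method, which the article does not describe in detail. It assumes that each comparison between two recordings has already been reduced to a single similarity score, and that both same-speaker and different-speaker scores follow a normal distribution. The function name and all the numbers are invented for illustration.

```python
from statistics import NormalDist, mean, stdev


def likelihood_ratio(score, same_speaker_scores, reference_scores):
    """Return how much more likely `score` is if the two recordings come
    from the same speaker than if they come from different speakers.

    Assumption: both score populations are modelled as normal
    distributions, a simplification chosen for this sketch.
    """
    # Scores seen when the suspect is compared with himself or herself.
    same = NormalDist(mean(same_speaker_scores), stdev(same_speaker_scores))
    # Scores seen when the questioned voice is compared with the
    # reference population, i.e. what is "the norm" for other people.
    different = NormalDist(mean(reference_scores), stdev(reference_scores))
    return same.pdf(score) / different.pdf(score)


# Hypothetical similarity scores between 0 (unlike) and 1 (alike).
same_speaker_scores = [0.82, 0.88, 0.91, 0.85, 0.79]
reference_scores = [0.10, 0.25, 0.31, 0.18, 0.22, 0.05, 0.27]

lr_match = likelihood_ratio(0.86, same_speaker_scores, reference_scores)
lr_mismatch = likelihood_ratio(0.20, same_speaker_scores, reference_scores)
print(lr_match > 1, lr_mismatch < 1)
```

A ratio well above 1 supports the view that the two recordings share a speaker, and a ratio well below 1 supports the opposite. The ratio only weighs the evidence; it does not give the probability that the suspect is the speaker.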
For Nolan, while these ASRS are "an extra tool" in the specialist's
toolbox, "there is a real danger that these systems hide from a jury
the implications of the complexities of the human voice and language".
- The Independent