Google's fact-checking bots build vast knowledge bank
The search giant is automatically building Knowledge Vault, a massive
database that could give us unprecedented access to the world's facts
GOOGLE is building the largest store of knowledge in human history -
and it's doing so without any human help.
Instead, Knowledge Vault autonomously gathers and merges information
from across the web into a single base of facts about the world, and the
people and objects in it.
The breadth and accuracy of this gathered knowledge is already
becoming the foundation of systems that allow robots and smartphones to
understand what people ask them. It promises to let Google answer
questions like an oracle rather than a search engine, and even to turn a
new lens on human history.
Knowledge Vault is a type of "knowledge base" - a system that stores
information so that machines as well as people can read it. Where a
database deals with numbers, a knowledge base deals with facts. When you
type "Where was Madonna born" into Google, for example, the place given
is pulled from Google's existing knowledge base.
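As a rough illustration of what "dealing with facts" can mean, a knowledge base is often pictured as a set of subject-predicate-object statements. The sketch below is a toy under stated assumptions: the `facts` dictionary, the `place_of_birth` predicate name and the `lookup` helper are all hypothetical, and none of them reflects how Google actually stores or queries its data.

```python
# Toy knowledge base: (subject, predicate) -> object.
# The layout and the predicate name are illustrative assumptions,
# not Google's actual schema.
facts = {
    ("Madonna", "place_of_birth"): "Bay City, Michigan",
    ("Elvis Presley", "place_of_birth"): "Tupelo, Mississippi",
}


def lookup(subject, predicate):
    """Return the stored object for a fact, or None if it is unknown."""
    return facts.get((subject, predicate))


print(lookup("Madonna", "place_of_birth"))
```

A question such as "Where was Madonna born" then reduces to finding the right subject and predicate, which is the hard part that this toy leaves out.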
This existing base, called Knowledge Graph, relies on crowd sourcing
to expand its information. But the firm noticed that growth was
stalling; humans could only take it so far.
So Google decided it needed to automate the process. It started
building the Vault by using an algorithm to automatically pull in
information from all over the web, using machine learning to turn the
raw data into usable pieces of knowledge.
Knowledge Vault has pulled in 1.6 billion facts to date. Of these,
271 million are rated as "confident facts", to which Google's model
ascribes a more than 90 per cent chance of being true. It does this by
cross-referencing new facts with what it already knows.
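The idea of a "confident fact" can be sketched as a simple threshold on a probability attached to each candidate fact. The example below assumes such a probability has already been produced by some model. The `Fact` type, the `confident_facts` helper and the scores are invented for illustration and say nothing about how Google computes its estimates.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    predicate: str
    obj: str
    probability: float  # assumed model estimate that the fact is true


# "More than 90 per cent", as described in the article.
CONFIDENCE_THRESHOLD = 0.9


def confident_facts(facts):
    """Keep only facts whose probability is strictly above the threshold."""
    return [f for f in facts if f.probability > CONFIDENCE_THRESHOLD]


# Made-up candidate facts with made-up scores.
candidates = [
    Fact("Madonna", "place_of_birth", "Bay City, Michigan", 0.97),
    Fact("Madonna", "place_of_birth", "New York City", 0.12),
    Fact("Elvis Presley", "place_of_birth", "Tupelo, Mississippi", 0.90),
]

for fact in confident_facts(candidates):
    print(fact)
```

Only the first candidate passes here, because a score of exactly 0.90 is not "more than" 90 per cent.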
"It's a hugely impressive thing that they are pulling off," says
Fabian Suchanek, a data scientist at Télécom ParisTech in France.
Google's Knowledge Graph is currently bigger than the Knowledge
Vault, but it only includes manually integrated sources such as the CIA
Knowledge Vault offers Google fast, automatic expansion of its
knowledge - and it's only going to get bigger. As well as the ability to
analyse text on a webpage for facts to feed its knowledge base, Google
can also peer under the surface of the web, hunting for hidden sources
of data such as the figures that feed Amazon product pages, for example.
Tom Austin, a technology analyst at Gartner in Boston, says that the
world's biggest technology companies are racing to build similar vaults.
"Google, Microsoft, Facebook, Amazon and IBM are all building them,
and they're tackling these enormous problems that we would never even
have thought of trying 10 years ago," he says. The potential of a
machine system that has the whole of human knowledge at its fingertips
is huge. One of the first applications will be virtual personal
assistants that go way beyond what Siri and Google Now are capable of.
"Before this decade is out, we will have a smart priority inbox that
will find for us the 10 most important emails we've received and handle
the rest without us having to touch them," Austin says. Our virtual
assistant will be able to decide what matters and what doesn't.
Other agents will carry out the same process to watch over and guide
our health, sorting through a knowledge base of medical symptoms to find
correlations with data in each person's health records. IBM's Watson is
already doing this for cancer at Memorial Sloan Kettering Hospital in
Knowledge Vault promises to supercharge our interactions with
machines, but it also comes with an increased privacy risk. The Vault
doesn't care if you are a person or a mountain - it is voraciously
gathering every piece of information it can find.
"Behind the scenes, Google doesn't only have public data," says
Suchanek. "It can also pull in information from Gmail, Google+ and
YouTube. You and I are stored in the Knowledge Vault in the same way as
Elvis Presley."
Google researcher Kevin Murphy and his colleagues will present a
paper on Knowledge Vault at the Conference on Knowledge Discovery and
Data Mining in New York on August 25.
As well as improving our interactions with computers, large stores of
knowledge will be the fuel for augmented reality, too. Once machines get
the ability to recognise objects, Knowledge Vault could be the
foundation of a system that can provide anyone wearing a heads-up
display with information about the landmarks, buildings and businesses
they are looking at in the real world. "Knowledge Vault adds local
entities - politicians, businesses. This is just the tip of the
iceberg," Suchanek says.
Richer vaults of knowledge will also change the way we study human
society. "This is the most visionary thing," says Suchanek. "The
Knowledge Vault can model history and society."
Google already has a way to track mentions of names over time using
historical texts, measuring the popularity of Albert Einstein vs Charles
Darwin, for instance.
By adding knowledge bases - which know the gender, age and place of
birth of myriad people - historians would be able to track more in-depth
questions, such as the popularity of female singers over time, for
example. Suchanek has already carried out a version of this kind of
data-driven history. By combining a knowledge base called YAGO with data
from French newspaper Le Monde, he was able to show how the gender gap
in French politics changed over time. This was only possible because
YAGO knows the gender of every French politician, and can apply that
knowledge to names mentioned in Le Monde. He will present the work at
the Very Large Databases Conference in Hangzhou, China, in September. It
might even be possible to use a knowledge base as detailed and broad as
Google's to start making accurate predictions about the future based on
analysis and forward projection of the past, says Suchanek.
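The kind of analysis described for YAGO and Le Monde, joining names mentioned in a newspaper to attributes held in a knowledge base, might look roughly like the sketch below. It is not Suchanek's method: the `gender_of` mapping, the `mentions` list and the `female_share_by_year` function are hypothetical stand-ins, and the data is invented.

```python
from collections import Counter, defaultdict

# Hypothetical slice of a knowledge base: person -> gender.
gender_of = {
    "Person A": "female",
    "Person B": "male",
    "Person C": "female",
}

# Hypothetical (year, name) mentions extracted from newspaper text.
mentions = [
    (1950, "Person B"),
    (1950, "Person B"),
    (1950, "Person A"),
    (2000, "Person A"),
    (2000, "Person C"),
    (2000, "Person B"),
    (2000, "Unknown Person"),  # not in the knowledge base, so skipped
]


def female_share_by_year(mentions, gender_of):
    """Share of mentions per year that refer to people recorded as female."""
    counts = defaultdict(Counter)
    for year, name in mentions:
        gender = gender_of.get(name)
        if gender is not None:  # ignore names the knowledge base lacks
            counts[year][gender] += 1
    return {
        year: c["female"] / sum(c.values())
        for year, c in sorted(counts.items())
    }


print(female_share_by_year(mentions, gender_of))
```

The join only works for names the knowledge base recognises, which is why the coverage of the knowledge base matters for this sort of study.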
"This is an entirely new generation of technology that's going to result
in massive changes - improvement in how people live and have fun, and
how they make war," says Austin. "This is a quantum leap."