
AI edges closer to understanding 3D space the way we do

If I show you a single picture of a room, you can tell me right away that there’s a table with a chair in front of it, that they’re probably about the same size, about this far from each other, with the walls this far away — enough to draw a rough map of the room. Computer vision systems don’t have this intuitive understanding of space, but the latest research from DeepMind brings them closer than ever before.

The new paper from the Google-owned research outfit was published today in the journal Science (complete with news item). It details a system whereby a neural network, knowing practically nothing, can look at one or two static 2D images of a scene and reconstruct a reasonably accurate 3D representation of it. We’re not talking about going from snapshots to full 3D images (Facebook’s working on that) but rather about replicating the intuitive, space-conscious way that all humans view and analyze the world.

When I say it knows practically nothing, I don’t mean it’s just some standard machine learning system. Most computer vision algorithms work via what’s called supervised learning, in which they ingest a great deal of data that’s been labeled by humans with the correct answers — for example, images with everything in them outlined and named.

This new system, on the other hand, has no such knowledge to draw on. It works entirely without preconceived ideas of how we see the world: that objects’ colors change toward their edges, that they get bigger and smaller as their distance changes, and so on.

It works, roughly speaking, like this. One half of the system is its “representation” part, which observes a given 3D scene from some angle and encodes it in a complex mathematical form called a vector. Then there’s the “generative” part, which, based only on the vectors created earlier, predicts what the scene would look like from a different angle.

(DeepMind has released a video showing a bit more of how this works.)
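To make that two-part structure concrete, here’s a minimal sketch of the idea in PyTorch. To be clear, this is not DeepMind’s actual Generative Query Network implementation; the network shapes, the seven-number viewpoint encoding and the names RepresentationNet and Generator are all assumptions made for brevity. It only illustrates the pattern described above: encode each observation into a vector, combine the vectors, and render a new view from the result.

```python
# Illustrative sketch only, not DeepMind's published model. All sizes and
# names here are assumptions chosen to keep the example short.
import torch
import torch.nn as nn

class RepresentationNet(nn.Module):
    """Encodes one (image, camera viewpoint) observation into a vector."""
    def __init__(self, r_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # 7 viewpoint numbers: assumed to be a 3D position plus two camera
        # angles expressed as sin/cos pairs.
        self.fc = nn.Linear(64 + 7, r_dim)

    def forward(self, image, viewpoint):
        return self.fc(torch.cat([self.conv(image), viewpoint], dim=-1))

class Generator(nn.Module):
    """Predicts the image seen from a query viewpoint, given only the
    combined scene representation r; it never sees the original images."""
    def __init__(self, r_dim=256, out_hw=64):
        super().__init__()
        self.out_hw = out_hw
        self.net = nn.Sequential(
            nn.Linear(r_dim + 7, 512), nn.ReLU(),
            nn.Linear(512, 3 * out_hw * out_hw), nn.Sigmoid(),
        )

    def forward(self, r, query_viewpoint):
        x = self.net(torch.cat([r, query_viewpoint], dim=-1))
        return x.view(-1, 3, self.out_hw, self.out_hw)

# Encode each observation of a scene separately, sum the vectors into one
# scene representation, then "imagine" the scene from an unseen viewpoint.
repnet, gen = RepresentationNet(), Generator()
images = torch.rand(2, 3, 64, 64)      # two snapshots of the same scene
viewpoints = torch.rand(2, 7)          # the camera poses that took them
r = repnet(images, viewpoints).sum(dim=0, keepdim=True)
prediction = gen(r, torch.rand(1, 7))  # a (1, 3, 64, 64) rendered guess
```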

Think of it like someone handing you a couple of pictures of a room, then asking you to draw what you’d see if you were standing in a specific spot in it. Again, this is simple enough for us, but computers have no natural ability to do it; their sense of sight, if we can call it that, is extremely rudimentary and literal, and of course machines lack imagination.

Yet few words better describe the ability to say what’s behind something when you can’t see it.

“It was not at all clear that a neural network could ever learn to create images in such a precise and controlled manner,” said lead author Ali Eslami in a release accompanying the paper. “However we found that sufficiently deep networks can learn about perspective, occlusion and lighting, without any human engineering. This was a super surprising finding.”

It also allows the system to accurately recreate a 3D object from a single viewpoint, such as the block structures illustrated in the paper.

I’m not sure I could do that.

Obviously there’s nothing in any single observation to tell the system that some part of the blocks extends away from the camera forever. But it creates a plausible version of the block structure regardless, one that is accurate in every way. Adding one or two more observations requires the system to reconcile multiple views, but results in an even better representation.
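Continuing the hypothetical sketch above, that refinement falls out naturally: each extra observation just adds another encoded vector into the same sum, so the representation the generator works from can only gain information as views accumulate.

```python
# Continuing the illustrative sketch (again, not the paper's actual code):
# extra observations of the same scene fold into the same summed vector.
more_images = torch.rand(3, 3, 64, 64)  # three additional snapshots
more_views = torch.rand(3, 7)           # the viewpoints that produced them
r_refined = r + repnet(more_images, more_views).sum(dim=0, keepdim=True)
better_prediction = gen(r_refined, torch.rand(1, 7))
```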

This kind of ability is critical for robots especially, because they have to navigate the real world by sensing it and reacting to what they see. With limited information, such as some important clue that’s temporarily hidden from view, they can freeze up or make illogical choices. But with something like this in their robotic brains, they could make reasonable assumptions about, say, the layout of a room without having to ground-truth every inch.

“Although we need more data and faster hardware before we can deploy this new type of system in the real world,” Eslami said, “it takes us one step closer to understanding how we may build agents that learn by themselves.”

