Scientists at MIT and the Qatar Computing Research Institute (QCRI) have leveraged videogame technology to generate broadcast-quality 3D video of soccer matches from a 2D source in real time. The resulting video can reportedly be enjoyed with any 3D TV or virtual reality headset, and could lead to much more 3D content becoming available in the near future. At its best, 3D technology can deliver highly immersive user experiences, but the content available on these platforms is limited by the large amount of work needed to produce it. Whether in videogames or in movies, content producers must take specific measures, ranging from extra programming time to specialized equipment, to create content suitable for VR headsets or 3D TVs.
Past attempts at developing general-purpose systems for making the conversion from 2D to 3D automatic don’t always perform as hoped, often producing odd visual artifacts that harm visual quality and break the sense of immersion. The avenue now being explored by MIT and QCRI researchers is to focus on a relatively narrow domain (soccer games) and leverage data from videogames to help produce 3D content on the fly. Today’s sports videogames keep a detailed three-dimensional map of the pitch and players throughout the match, taking tens of 2D snapshots every second in order to display the data on the player’s screen. Reverse-engineering this process, the scientists reasoned, would be a good way to build a 3D map from a flat image.
Using content from EA Sports’ popular FIFA 13 videogame, researchers Kiana Calagari and colleagues built a comprehensive database of tens of thousands of videogame screenshots along with their corresponding 3D maps, reflecting the most common camera angles and game situations seen in a TV broadcast of a soccer match. The researchers then created a system that takes a screenshot from a television broadcast, subdivides it into smaller sections, matches each section against the database, and finally stitches all the pieces back together to produce a broadcast-quality 3D picture. According to the researchers, the converted video can be played on any 3D-enabled device and features none of the artifacts produced by less specialized techniques: in a user study, the majority of subjects reportedly rated the quality of the resulting image at five out of five.
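The subdivide-match-stitch pipeline can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the researchers' actual implementation: it assumes fixed-size, non-overlapping patches and a brute-force sum-of-squared-differences search, where the real system uses a far larger database and more sophisticated matching. All function and variable names here are hypothetical.

```python
import numpy as np

PATCH = 8  # patch size in pixels (illustrative)

def build_database(frames, depth_maps, patch=PATCH):
    """Collect (image patch, depth patch) pairs, mimicking a database
    harvested from game screenshots and their 3D depth maps."""
    entries = []
    for img, depth in zip(frames, depth_maps):
        h, w = img.shape[:2]
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                entries.append((img[y:y + patch, x:x + patch],
                                depth[y:y + patch, x:x + patch]))
    return entries

def estimate_depth(frame, database, patch=PATCH):
    """Subdivide a 2D frame into patches, find the closest database patch
    for each (by sum of squared differences), and stitch the matched
    depth patches back together into a full depth map."""
    h, w = frame.shape[:2]
    depth = np.zeros((h, w))
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            query = frame[y:y + patch, x:x + patch]
            best = min(database, key=lambda e: np.sum((e[0] - query) ** 2))
            depth[y:y + patch, x:x + patch] = best[1]
    return depth
```

Given a database built from synthetic game frames, calling `estimate_depth` on a broadcast frame returns a depth map assembled entirely from pieces of the game-derived maps, which is the core idea the researchers exploited.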
Though no such plans have been announced, it seems plausible that the technique could be adapted to other sports. If the technology is commercially successful, the amount of 3D content that would suddenly become available for public consumption could be a driving factor in making 3D TV and VR technology more mainstream.
Currently, the researchers say, the system takes about a third of a second to process a frame of video. But successive frames could all be processed in parallel, so that the third-of-a-second delay needs to be incurred only once. A broadcast delay of a second or two would probably provide an adequate buffer to permit conversion on the fly. Even so, the researchers are working to bring the conversion time down still further. “This is a clever use of game content, which leads to better results and easier acquisition of large and diverse reference data,” says Hanspeter Pfister, a professor of computer science at Harvard University. “One of the main insights of the paper is that domain-specific methods are able to yield bigger improvements than more general approaches. This is an important lesson that will have ramifications for other domains.”
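The parallelism argument above — that successive frames can be converted concurrently, so the per-frame delay is paid only once — can be demonstrated with a small Python sketch. The 0.3-second figure comes from the article; `convert_frame` is a hypothetical stand-in for the real conversion step.

```python
from concurrent.futures import ThreadPoolExecutor
import time

CONVERSION_DELAY = 0.3  # seconds per frame, per the article (illustrative)

def convert_frame(frame):
    """Stand-in for converting one 2D frame to 3D; the sleep models
    the roughly third-of-a-second processing time."""
    time.sleep(CONVERSION_DELAY)
    return f"3D({frame})"

def convert_stream(frames, workers=8):
    """Convert frames in parallel. With enough workers, total latency
    approaches a single frame's delay rather than the serial sum,
    so a broadcast buffer of a second or two would suffice."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_frame, frames))
```

Converting eight frames serially would take about 2.4 seconds; run through `convert_stream` with eight workers, the whole batch finishes in roughly 0.3 seconds, matching the article's point that the delay need only be incurred once.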
For more information, please visit www.mit.edu.