72 photos are taken around a circle of radius 20 cm and stitched into an omnidirectional stereo panorama with an arbitrary baseline.
A stereo photosphere (Google Cardboard Camera format) can be found here: goo.gl/photos/DYXvro9wDLT4jNk29. Download it to your phone and put it in DCIM/CardboardCamera.
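To make the capture geometry concrete, here is a minimal NumPy sketch of the classic strip-based way to build such a panorama, assuming the 72 photos are already ordered around the circle and exposure-matched: a narrow vertical strip offset to one side of each image center goes into one eye's panorama and a strip offset to the other side goes into the other eye's, and the offset controls the effective baseline. This is only an illustration of the idea, not the stitching pipeline used for the video; the function and parameter names are my own, and alignment and blending are omitted.

```python
import numpy as np

def stereo_panorama(photos, strip_width=20, eye_offset=80):
    """Strip-based omnidirectional stereo stitch (simplified sketch).

    photos      : list of HxWx3 arrays, one per capture angle (e.g. 72),
                  ordered around the circle and already exposure-matched.
    strip_width : width in pixels of the vertical strip taken from each photo.
    eye_offset  : horizontal offset (pixels) of each strip from the image
                  center; a larger offset gives a wider effective baseline.
    """
    h, w, _ = photos[0].shape
    cx = w // 2
    left_strips, right_strips = [], []
    for img in photos:
        # Strips on either side of the image center; which side maps to which
        # eye depends on the direction the camera sweeps around the circle.
        left_strips.append(img[:, cx + eye_offset : cx + eye_offset + strip_width])
        right_strips.append(img[:, cx - eye_offset - strip_width : cx - eye_offset])
    # Concatenate the strips side by side to form the two panoramas.
    return np.hstack(left_strips), np.hstack(right_strips)
```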
3D Parallax Photosphere: youtube.com/watch?v=1oWBsR8zTP0

[CVPR 2021] NeX: Real-time View Synthesis with Neural Basis Expansion
Supasorn Suwajanakorn, 2021-03-10 | This is a supplementary video for "NeX: Real-time View Synthesis with Neural Basis Expansion" by Suttisak Wizadwongsa*, Pakkapon Phongthawee*, Jiraphon Yenphraphai*, and Supasorn Suwajanakorn (* co-first authors).
Abstract: We present NeX, a new approach to novel view synthesis based on enhancements of the multiplane image (MPI) that can reproduce NeXt-level view-dependent effects, in real time. Unlike a traditional MPI that uses a set of simple RGBα planes, our technique models view-dependent effects by parameterizing each pixel as a linear combination of basis functions learned from a neural network. Moreover, we propose a hybrid implicit-explicit modeling strategy that improves fine detail and produces state-of-the-art results. Our method is evaluated on benchmark forward-facing datasets as well as our newly introduced dataset designed to test the limits of view-dependent modeling with significantly more challenging effects, such as rainbow reflections on a CD. Our method achieves the best overall scores across all major metrics on these datasets, with more than 1000× faster rendering than the state of the art.

this. is. real-time.
Supasorn Suwajanakorn, 2020-12-08 | A novel 3D scene representation capable of reproducing *really hard* view-dependent effects in real time. 60 FPS. A 1000x speed-up over the state of the art.
Input is a set of photos captured with a phone. Output is a realistic, interactive 3D rendering of the scene.
Work done by Suttisak Wizadwongsa, Pakkapon Phongthawee, Dome Yenphraphai, and Supasorn Suwajanakorn.
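To make the pixel parameterization described in the NeX abstract above concrete, here is a minimal NumPy sketch: each MPI pixel stores a base color k0 and a set of reflectance coefficients k1..kN, and its color for a viewing direction v is k0 plus a linear combination of global basis functions H_n(v). In the paper the basis functions are produced by a neural network; below they are just given numbers, and all names and dimensions are illustrative assumptions.

```python
import numpy as np

def view_dependent_color(k0, ks, basis_values):
    """Color of one MPI pixel for one viewing direction.

    k0           : (3,) base RGB color (the view-independent part).
    ks           : (N, 3) per-pixel reflectance coefficients k_1..k_N.
    basis_values : (N,) values of the global basis functions H_n(v) evaluated
                   at the viewing direction v (learned by a network in NeX;
                   hard-coded numbers here).
    """
    return k0 + basis_values @ ks  # C(v) = k0 + sum_n H_n(v) * k_n

# Toy example: one pixel, two basis functions, one viewing direction.
k0 = np.array([0.5, 0.4, 0.3])
ks = np.array([[0.1, 0.0, 0.0],
               [0.0, 0.2, 0.1]])
H = np.array([0.8, -0.3])  # H_1(v), H_2(v) for this direction
print(view_dependent_color(k0, ks, H))
```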
Royalty free music from Bensound.

Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning (NIPS 2018)
Supasorn Suwajanakorn, 2018-11-23 | We present KeypointNet, an end-to-end geometric reasoning framework to learn an optimal set of category-specific 3D keypoints, along with their detectors. Given a single image, KeypointNet extracts 3D keypoints that are optimized for a downstream task. We demonstrate this framework on 3D pose estimation by proposing a differentiable objective that seeks the optimal set of keypoints for recovering the relative pose between two views of an object. Our model discovers geometrically and semantically consistent keypoints across viewing angles and instances of an object category. Importantly, we find that our end-to-end framework using no ground-truth keypoint annotations outperforms a fully supervised baseline using the same neural network architecture on the task of pose estimation. The discovered 3D keypoints on the car, chair, and plane categories of ShapeNet are visualized on keypointnet.github.io/.
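As an illustration of how relative pose can be recovered from two sets of corresponding 3D keypoints, here is a minimal NumPy sketch using the closed-form orthogonal Procrustes (Kabsch) solution. This is a standard estimator, not necessarily the exact differentiable objective used in the paper, and the names are my own.

```python
import numpy as np

def relative_pose_from_keypoints(P, Q):
    """Rigid transform (R, t) aligning keypoints P to keypoints Q.

    P, Q : (K, 3) arrays of corresponding 3D keypoints from two views.
    Returns a rotation R (3x3) and translation t (3,) with Q ~= P @ R.T + t,
    via the Kabsch / orthogonal Procrustes solution.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```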
This paper was presented at NIPS 2018 (Oral).

Teaser: Synthesizing Obama: Learning Lip Sync from Audio
Supasorn Suwajanakorn, 2017-07-12 | Synthesizing Obama: Learning Lip Sync from Audio. Supasorn Suwajanakorn, Steven M. Seitz, Ira Kemelmacher-Shlizerman. SIGGRAPH 2017.
Given audio of President Barack Obama, we synthesize a high-quality video of him speaking with accurate lip sync, composited into a target video clip. Trained on many hours of his weekly address footage, a recurrent neural network learns the mapping from raw audio features to mouth shapes. Given the mouth shape at each time instant, we synthesize a high-quality mouth texture and composite it with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track.
http://grail.cs.washington.edu/projects/AudioToObama/
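A minimal PyTorch sketch of the audio-to-mouth-shape stage described above: a recurrent network maps a sequence of per-frame audio features to per-frame mouth-shape coefficients. The feature dimensions, layer sizes, and names are illustrative assumptions, not the network from the paper.

```python
import torch
import torch.nn as nn

class AudioToMouth(nn.Module):
    """Sketch: recurrent net mapping audio features to mouth shapes.

    Input  : a sequence of per-frame audio feature vectors (e.g. MFCC-like).
    Output : per-frame mouth-shape coefficients (e.g. weights of a low-dimensional
             lip-landmark basis). All dimensions below are made up for illustration.
    """
    def __init__(self, audio_dim=28, mouth_dim=18, hidden=60):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, mouth_dim)

    def forward(self, audio_feats):  # (batch, time, audio_dim)
        h, _ = self.rnn(audio_feats)
        return self.out(h)           # (batch, time, mouth_dim)

model = AudioToMouth()
dummy_audio = torch.randn(1, 100, 28)  # 100 frames of placeholder features
print(model(dummy_audio).shape)        # torch.Size([1, 100, 18])
```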

Depth from Focus with Your Mobile Phone
Supasorn Suwajanakorn, 2017-04-10 | Video supplement for "Depth from Focus with Your Mobile Phone" (CVPR 2015).
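As a rough illustration of the depth-from-focus idea behind the paper (not its actual method, which also aligns the frames and optimizes a regularized depth map), here is a sketch that assigns each pixel the index of the focal-stack frame in which it appears sharpest; the focus measure and names are my own.

```python
import numpy as np

def depth_from_focus(stack):
    """stack : (F, H, W) grayscale focal stack, frame f focused at some depth.
    Returns, per pixel, the index of the frame where local contrast peaks,
    which serves as a coarse depth label.
    """
    def focus_measure(img):
        # Squared Laplacian as a simple sharpness measure ...
        lap = (4 * img
               - np.roll(img, 1, 0) - np.roll(img, -1, 0)
               - np.roll(img, 1, 1) - np.roll(img, -1, 1))
        m = lap ** 2
        # ... averaged over a small neighborhood to stabilize it.
        return (m + np.roll(m, 1, 0) + np.roll(m, -1, 0)
                  + np.roll(m, 1, 1) + np.roll(m, -1, 1)) / 5.0

    measures = np.stack([focus_measure(f) for f in stack])  # (F, H, W)
    return measures.argmax(axis=0)
```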
Paper: http://homes.cs.washington.edu/~supasorn/Suwajanakorn_Depth_From_Focus_2015_CVPR_paper.pdf

Automatic 360 Panorama Head for DSLR
Supasorn Suwajanakorn, 2017-03-06 | An automatic rig for capturing 2D/3D panoramas and 360 photospheres with motorized pan-tilt motors. The rig is controlled by a Wemos D1 Mini (ESP8266) and has a joystick and an OLED screen for displaying the menu. The structure is aluminum and 3D-printed parts, designed in Fusion 360. Software and hardware are open source. http://www.thingiverse.com/thing:2084208
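A minimal sketch of the kind of capture loop such a rig runs: sweep the pan-tilt head over a grid of angles and fire the camera at each stop. The helper callables and angle values below are hypothetical placeholders, not the rig's actual firmware (which is linked above).

```python
import time

def capture_panorama(set_pan, set_tilt, trigger_shutter,
                     pan_step=30, tilt_angles=(-30, 0, 30), settle_s=1.0):
    """Sweep a pan-tilt head over a grid of angles, taking a photo at each stop.

    set_pan, set_tilt : callables that move the motors to an angle in degrees
                        (hypothetical stand-ins for the motor driver).
    trigger_shutter   : callable that fires the camera shutter.
    """
    for tilt in tilt_angles:
        set_tilt(tilt)
        for pan in range(0, 360, pan_step):
            set_pan(pan)
            time.sleep(settle_s)  # let vibrations settle before shooting
            trigger_shutter()

# Dry run with dummy hardware functions:
capture_panorama(lambda a: print("pan", a),
                 lambda a: print("tilt", a),
                 lambda: print("click"))
```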
3D Anaglyph Photosphere: goo.gl/photos/1s5AUaHT2Pbz3pza7
3D Parallax Photosphere: youtube.com/watch?v=1oWBsR8zTP0
3D Stereoscopic Photosphere for VR (Google Cardboard Camera format): goo.gl/photos/61VhSRtq2Ka4yDVn7 (download and put it in DCIM/CardboardCamera of an Android phone)

Google Cardboard VR Positional Tracking (outside-in, tethered)
Supasorn Suwajanakorn, 2016-01-20 | I'm trying to add positional tracking (full 6DOF) to a Cardboard app. Proper calibration is needed. Tracking is done on a computer, and position values are sent through USB to the phone (TCP via ADB).
Latency is not superb, but it should be better than using the phone's camera and the phone's processing power.
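A minimal sketch of the computer side of the "TCP via ADB" link mentioned above: forward a TCP port over USB with adb, then stream tracked positions to an app listening on the phone. The port number, packet format, and function name are assumptions for illustration, not the app's actual protocol.

```python
import socket
import struct
import subprocess

PORT = 5555  # assumed port; the phone app would listen on this

# Forward localhost:PORT on the computer to PORT on the phone over USB.
subprocess.run(["adb", "forward", f"tcp:{PORT}", f"tcp:{PORT}"], check=True)
sock = socket.create_connection(("127.0.0.1", PORT))

def send_position(x, y, z):
    """Send one tracked head position as three little-endian floats."""
    sock.sendall(struct.pack("<3f", x, y, z))

# A real tracker would call this in a loop at 60+ Hz with fresh positions.
send_position(0.0, 1.6, 0.0)
```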

This is a result from our paper "What Makes Tom Hanks Look Like Tom Hanks", submitted to the International Conference on Computer Vision (ICCV) 2015.
Webpage: http://grail.cs.washington.edu/projects/3DPersona/

What Makes Tom Hanks Look Like Tom Hanks
Supasorn Suwajanakorn, 2015-10-22 | We reconstruct a controllable model of a person from a large photo collection that captures his or her persona, i.e., physical appearance and behavior. The ability to operate on unstructured photo collections enables modeling a huge number of people, including celebrities and other well-photographed people, without requiring them to be scanned. Moreover, we show the ability to drive or puppeteer the captured person B using any other video of a different person A. In this scenario, B acts out the role of person A but retains his/her own personality and character. Our system is based on a novel combination of 3D face reconstruction, tracking, alignment, and multi-texture modeling, applied to the puppeteering problem. We demonstrate convincing results on a large variety of celebrities derived from Internet imagery and video.

Total Moving Face Reconstruction (ECCV 2014)
Supasorn Suwajanakorn, 2014-09-04 | We present an approach that takes a single video of a person's face and reconstructs a high-detail 3D shape for each video frame. We target videos taken under uncontrolled and uncalibrated imaging conditions, such as YouTube videos of celebrities. ECCV 2014. http://grail.cs.washington.edu/projects/totalmoving/

Total Moving Face Reconstruction
Supasorn Suwajanakorn, 2014-08-25 | Please see our updated video: youtube.com/watch?v=C1iLVAUiC7s

Illumination-Aware Age Progression
Supasorn Suwajanakorn, 2014-06-07 | We present an approach that takes a single photograph of a child as input and automatically produces a series of age-progressed outputs between 1 and 80 years of age, accounting for pose, expression, and illumination. Leveraging thousands of photos of children and adults at many ages from the Internet, we first show how to compute average image subspaces that are pixel-to-pixel aligned and model variable lighting. These averages depict a prototype man and woman aging from 0 to 80, under any desired illumination, and capture the differences in shape and texture between ages. Applying these differences to a new photo yields an age-progressed result. Contributions include relightable age subspaces, a novel technique for subspace-to-subspace alignment, and the most extensive evaluation of age progression techniques in the literature. Learn more at http://grail.cs.washington.edu/aging/
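As a crude illustration of the "apply the difference between age averages" step described in the abstract (ignoring the relighting, subspace-to-subspace alignment, and warping the actual method performs), here is a sketch that assumes the input photo and the two average images are already pixel-aligned; the names are my own.

```python
import numpy as np

def age_progress(photo, avg_source_age, avg_target_age):
    """Crude age progression: add the difference between two aligned age averages.

    photo          : (H, W, 3) float image of the input face, values in [0, 1].
    avg_source_age : (H, W, 3) average face at the input's age, aligned to photo.
    avg_target_age : (H, W, 3) average face at the target age, same alignment.
    """
    return np.clip(photo + (avg_target_age - avg_source_age), 0.0, 1.0)
```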