Decoding the brain using fMRI and YouTube
The scholarly paper is Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies by Nishimoto, Vu, Naselaris, Benjamini, Yu, and Gallant.
Here’s how it works: fMRI (functional magnetic resonance imaging) scans light up “pixels” in three dimensions, 2 mm cubes called voxels. You’ve seen the images: color maps of the brain. The colors represent the level of blood flow in each voxel. Since an fMRI scan takes about a second to record, each voxel’s color represents the time-averaged blood flow during that second.
Three different subjects (each of whom was also an author of the paper) watched YouTube videos from within an fMRI scanner. Brain scans were taken as rapidly as possible while they watched a large number of 12-minute videos, each viewed only once. The resulting scans were used to “train” models. The models consisted of fits to the 3D scans, and a separate model was developed for each person.
By fitting a subject’s model to the time-ordered series of scans and then optimizing the model over a large sample of known videos, the model translates between measured blood flow and features in the video like shapes, edges, and motion.
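A common way to fit this kind of encoding model is regularized linear regression, one weight vector per voxel. The sketch below is only an illustration of that idea: the dimensions, the random stand-in data, and the use of plain ridge regression are all my assumptions, not the paper’s actual features or fitting procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 1000 one-second scans, 50 video features, 200 voxels.
n_scans, n_features, n_voxels = 1000, 50, 200

X = rng.standard_normal((n_scans, n_features))      # stand-in video features (edges, motion, ...)
true_w = rng.standard_normal((n_features, n_voxels))
Y = X @ true_w + 0.1 * rng.standard_normal((n_scans, n_voxels))  # stand-in voxel responses

# Ridge regression, solved in closed form: (X'X + lam*I) W = X'Y.
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)

# Once fitted, the model predicts a brain-scan time series for any new video,
# which is exactly what the next stage of the pipeline needs.
Y_pred = X @ W
```

The key point is the direction of the mapping: the model goes from video features to predicted brain activity, and decoding works by running that mapping forward over many candidate videos.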
The derived models were then used to predict brain scans. That is, the models were applied to 5000 hours of random YouTube video. The result was a huge set of predictions for how a subject would have responded to a given video had they seen it. These predicted responses served as Bayesian prior probability distributions. (If this last sentence doesn’t ring a bell, ignore it, but if you are familiar with Bayesian statistics, stroke your chin and say, “uh huh” or “hmmm,” according to your feelings about Bayes’ theorem.)
The subjects then watched random YouTube movies that they’d never seen before and that were not included in the 5000 hours used to build the big set of predicted brain scans, again within an fMRI scanner. These test movies lasted 9 minutes and were repeated ten times. The ten scans were averaged and then compared to the 5000 hours of predicted brain scans. Likelihoods were calculated for each comparison, and the 100 most likely predictions were assembled with their likelihoods. The videos corresponding to those 100 brain scan predictions were then added together, with intensities weighted by their likelihoods. The resulting videos, decoded from the brain scans of observers, are remarkably similar to those actually watched. The ghost-like, multiply exposed nature of the reconstructed videos comes from overlapping the 100 contributing videos.
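The scoring-and-averaging step above can be sketched in a few lines. Everything here is stand-in data: random “predicted scans” and “videos” in place of the real 5000-hour library, and a simple Gaussian likelihood in place of whatever likelihood model the paper actually used.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in library: 5000 candidate clips, each with a predicted 100-voxel scan.
n_clips, n_voxels = 5000, 100
predicted_scans = rng.standard_normal((n_clips, n_voxels))

# Pretend the subject watched clip 42: measured scan = prediction + noise.
measured_scan = predicted_scans[42] + 0.5 * rng.standard_normal(n_voxels)

# Gaussian log-likelihood of the measured scan under each predicted scan.
log_lik = -0.5 * np.sum((predicted_scans - measured_scan) ** 2, axis=1)

# Keep the 100 most likely candidates and normalize their weights.
top = np.argsort(log_lik)[-100:]
w = np.exp(log_lik[top] - log_lik[top].max())
w /= w.sum()

# Reconstruction: likelihood-weighted average of the candidate videos.
# Each "video" here is just a hypothetical flattened pixel vector.
videos = rng.standard_normal((n_clips, 64))
reconstruction = (w[:, None] * videos[top]).sum(axis=0)
```

Averaging 100 weighted clips is also what produces the ghostly, multiply exposed look of the reconstructions: each contributing video bleeds through in proportion to its likelihood.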
Now, I glossed over the subtlety that posed the greatest challenge to the team. The problem with fMRI scans is that they have poor time resolution. You can’t make an fMRI movie. Rather, the images are taken over at least a second. The team’s biggest innovation was how they combined these essentially static images into brain scan animations that could be compared to video.
They used three time-encoding models: a static model that acted as a sort of control sample, a model that included local motion of blood oxygen levels but without specific direction, and a directional model that correlated motion with direction. Though it’s a gross oversimplification, you can think of them as different ways of interpolating between images taken at one-second intervals. The interpolations then animate the brain scans for comparison to video. Without the dynamic models to thread between fMRI scans, the videos couldn’t be decoded.
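To see why some temporal model is needed at all: video features change many times per second, while scans arrive only about once per second. The sketch below is a loose illustration of bridging that gap, with a made-up smoothing-and-delay filter and a stand-in feature signal; the paper’s actual temporal models are fit to data, not hand-picked like this.

```python
import numpy as np

# Hypothetical feature time series at video frame rate: 15 fps, 60 s clip.
fps, seconds = 15, 60
t = np.arange(fps * seconds) / fps
feature = np.sin(2 * np.pi * 0.2 * t)  # stand-in motion-energy signal

# Crude stand-in for the sluggish blood-flow response: a delay-and-smooth
# filter (gamma-like shape, normalized to unit area).
kernel_t = np.arange(0, 8, 1 / fps)
kernel = kernel_t * np.exp(-kernel_t / 2)
kernel /= kernel.sum()

# The fast feature signal becomes a slow, smeared signal...
slow = np.convolve(feature, kernel)[: len(feature)]

# ...which can then be sampled at the scanner's ~1 Hz rate, one value per scan.
sampled = slow[::fps]
```

The decoding comparison then happens at that 1 Hz rate, which is why the quality of the temporal model, not just the spatial one, limits how well the video can be recovered.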
The scientists were surprised that the technique worked, but not as surprised as the population of people who’ve seen the results on YouTube. Possible applications of the technology, when eventually mature, range from the ability to record your dreams, the ultimate psychology tool, to another avenue for the NSA to dig into your privacy, this time, your ultimate privacy.
Personally, my biggest worry is that my dreams might not merit an X rating.
How do you feel about this technology? What do you think the most positive and negative applications are likely to be?