SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments - @ICLR 2025
This repo contains the codebase of the SIM framework:
SIM is a:
- surface-based
- self-supervised
- tri-modal (audio, video, fMRI)
learning framework that generalises fMRI decoding during movie-watching experiments to new subjects and new movie scenes.
ICLR Poster | arXiv Paper | OpenReview Submission
V 0.2 - 23.04.25
Added audio/video preprocessing scripts
V 0.1 - 07.03.25
Initial commit: added the basis of the SIM codebase for tri-modal alignment
For CUDA setup and Python dependency installation, please follow the instructions in install.md.
Coming Soon
This work uses the HCP 7T movie-watching experiment from [1]. Raw fMRI files can be downloaded directly from the HCP platform https://db.humanconnectome.org/ after registration.
We used the Movie Task fMRI 1.6mm/59k Functional Preprocessed data files (~700GB), available for download at https://db.humanconnectome.org/ (see image).
[1] David C. Van Essen, Stephen M. Smith, Deanna M. Barch, Timothy E.J. Behrens, Essa Yacoub, and Kamil Ugurbil. The WU-Minn Human Connectome Project: An overview. NeuroImage, 80:62–79, 2013.
Movie files are available for download on the HCP platform https://db.humanconnectome.org/ under 7T Movie Stimulus Files (11GB) (see image). There are four movie files in mp4 format: 7T_MOVIE1_CC1.mp4, 7T_MOVIE2_HO1.mp4, 7T_MOVIE3_CC2.mp4, 7T_MOVIE4_HO4.mp4.
There are four fMRI recording sessions, which we will call MOVIE1-4 for simplicity, corresponding respectively to 7T_MOVIE1_CC1.mp4, 7T_MOVIE2_HO1.mp4, 7T_MOVIE3_CC2.mp4 and 7T_MOVIE4_HO4.mp4.
Once the HCP data has been downloaded, the raw fMRI files are located at ./{subjectID}/MNINonLinear/Results/{fMRI_session_ID}/tfMRI_MOVIE1_7T_AP_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries.nii
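For example, a short (hypothetical) snippet for gathering all movie-watching dtseries files across subjects, assuming the directory layout above:

```python
from pathlib import Path

HCP_ROOT = Path(".")  # root of the downloaded HCP data (adjust as needed)

# Collect all 7T movie-watching dtseries files across subjects and sessions.
dtseries_files = sorted(
    HCP_ROOT.glob(
        "*/MNINonLinear/Results/*/"
        "tfMRI_MOVIE*_7T_*_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries.nii"
    )
)
print(f"Found {len(dtseries_files)} fMRI runs")
```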
The following steps are performed:
- Cifti files are separated into left and right hemispheres.
- The resulting gifti files are then demeaned.
- The gifti files are then resampled from native resolution (59292 vertices) to $I6$ resolution (40962 vertices). This resampling is necessary to integrate with the SiT framework, which utilises regular icosahedral grids (e.g. $I3$) to patch the input surface data (at $I6$).
- Right hemispheres are symmetrised to appear like left hemispheres.
- fMRI frames are extracted at TR=1 second and saved into gifti files.
Coming Soon: fMRI processing script
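Until the fMRI processing script is released, here is a minimal sketch of the steps above for one hemisphere. It assumes Connectome Workbench (wb_command) is on the PATH; all file names, including the sphere meshes used for resampling, are placeholders:

```python
import subprocess
import nibabel as nib
import numpy as np

cifti = "tfMRI_MOVIE1_7T_AP_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries.nii"

# 1. Separate the cifti file into left/right hemisphere gifti metric files.
subprocess.run([
    "wb_command", "-cifti-separate", cifti, "COLUMN",
    "-metric", "CORTEX_LEFT", "movie1.L.func.gii",
    "-metric", "CORTEX_RIGHT", "movie1.R.func.gii",
], check=True)

# 2. Demean each vertex's timeseries and save the result.
gii = nib.load("movie1.L.func.gii")
data = np.stack([d.data for d in gii.darrays])  # (T, 59292)
data = data - data.mean(axis=0, keepdims=True)
demeaned = nib.gifti.GiftiImage(darrays=[
    nib.gifti.GiftiDataArray(row.astype(np.float32)) for row in data
])
nib.save(demeaned, "movie1.L.demeaned.func.gii")

# 3. Resample from the native 59k mesh (59292 vertices) to the regular
#    $I6$ icosahedral grid (40962 vertices); sphere file names are placeholders.
subprocess.run([
    "wb_command", "-metric-resample",
    "movie1.L.demeaned.func.gii",
    "L.sphere.native.surf.gii", "L.sphere.ico6.surf.gii",
    "BARYCENTRIC", "movie1.L.ico6.func.gii",
], check=True)

# Right hemispheres are additionally symmetrised to appear like left
# hemispheres, and fMRI frames are extracted at TR=1s into gifti files.
```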
All fMRI sessions (MOVIE1-4) were divided into overlapping 3-second .mp4v movie clips (shifted by 1 second) using the OpenCV library. The preprocessing script is available in ./processing/step_1_movie_clip_extraction.ipynb.
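For reference, a minimal sketch of the clip-extraction logic (the notebook above is the authoritative version; paths and names are illustrative):

```python
import cv2

def extract_clips(movie_path, out_dir, clip_len=3, stride=1):
    """Cut overlapping clip_len-second clips, shifted by stride seconds."""
    cap = cv2.VideoCapture(movie_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    clip_frames, step = int(clip_len * fps), int(stride * fps)

    for i, start in enumerate(range(0, total - clip_frames + 1, step)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, start)  # seek to the clip start
        writer = cv2.VideoWriter(f"{out_dir}/clip_{i:05d}.mp4", fourcc, fps, size)
        for _ in range(clip_frames):
            ok, frame = cap.read()
            if not ok:
                break
            writer.write(frame)
        writer.release()
    cap.release()
```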
Audio tracks are then extracted from all movie clips to .wav format at 16 kHz using the torchaudio library. A preprocessing script is available in the MMAction2 repository (https://github.com/open-mmlab/mmaction2).
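A minimal sketch of this step with torchaudio, assuming a backend with mp4 support (e.g. ffmpeg); file names are illustrative:

```python
import torchaudio
import torchaudio.functional as F

TARGET_SR = 16_000

def extract_audio(clip_path, wav_path):
    """Save the audio track of a movie clip as 16 kHz mono .wav."""
    waveform, sr = torchaudio.load(clip_path)
    if sr != TARGET_SR:
        waveform = F.resample(waveform, orig_freq=sr, new_freq=TARGET_SR)
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    torchaudio.save(wav_path, waveform, TARGET_SR)
```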
- Video frames
We used the pre-trained VideoMAE model available in MMAction2 to extract video-frame embedding representations from the 3-second movie clips.
A preprocessing script is available in ./processing/step_2_extract_embeddings_videomae.py.
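A hedged sketch of embedding extraction via the MMAction2 API; the config/checkpoint paths and the input clip shape are placeholders that should match the VideoMAE model downloaded from the MMAction2 model zoo:

```python
import torch
from mmaction.apis import init_recognizer

# Placeholder paths: substitute the VideoMAE config/checkpoint you downloaded.
CONFIG = "videomae_config.py"
CHECKPOINT = "videomae_checkpoint.pth"

model = init_recognizer(CONFIG, CHECKPOINT, device="cuda")
model.eval()

# One preprocessed 3-second movie clip as a (1, C, T, H, W) float tensor;
# the dummy shape below must match the model's expected input.
clip = torch.randn(1, 3, 16, 224, 224, device="cuda")

with torch.no_grad():
    # Call the backbone directly to get the embedding representation
    # rather than classification logits.
    features = model.backbone(clip)
print(features.shape)
```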
- Audio
We used a pre-trained Wav2Vec2.0 model from the torchaudio library (torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H) to extract audio features from the 3-second movie clips.
A preprocessing script is available in ./processing/step_2_extract_embeddings_wave2vec.py.
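A minimal sketch of the feature-extraction step with this bundle (the clip file name is illustrative):

```python
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model().eval()

# Mono 16 kHz audio of one 3-second clip, shape (1, num_samples);
# bundle.sample_rate is 16000, matching the extracted .wav files.
waveform, sr = torchaudio.load("clip_00000.wav")
assert sr == bundle.sample_rate

with torch.no_grad():
    # extract_features returns the hidden states of each transformer layer.
    features, _ = model.extract_features(waveform)
last_layer = features[-1]  # (1, num_frames, 768) for the base model
```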
The training commands are run using torchrun, with config files located under /config/.
For training the SIM pipeline with all three modalities, please run:
```
cd tools/
torchrun --nproc_per_node=1 --nnodes=1 train_fmri_clip_ddp.py ../config/CLIP-fmri-video-audio/hparams.yml
```
For training the SIM pipeline with fMRI and video modalities, please run:
```
cd tools/
torchrun --nproc_per_node=1 --nnodes=1 train_fmri_clip_ddp.py ../config/CLIP-video/hparams.yml
```
For training the SIM pipeline with fMRI and audio modalities, please run:
```
cd tools/
torchrun --nproc_per_node=1 --nnodes=1 train_fmri_clip_ddp.py ../config/CLIP-audio/hparams.yml
```
Retrieval testing
[TODO] add the retrieval scripts