Moshiko Raboh*, Roei Herzig*, Gal Chechik, Jonathan Berant, Amir Globerson
This repository contains code used to produce the results in our paper.
We propose an intermediate "graph-like" representation, the Differentiable Scene Graph (DSG), that can be learned end-to-end from the supervision for a downstream visual reasoning task, and that achieves new state-of-the-art results on the Referring Relationships [1] task.
- A Differentiable Scene-Graph layer, which captures information about multiple entities in an image and their relations.
- A new architecture for the task of referring relationships, using a DSG as its central component.
- New state-of-the-art results on the task of referring relationships on the Visual Genome, VRD and CLEVR datasets.
The proposed architecture: The input consists of an image and a relationship query triplet <subject, relation, object>.
- A detector produces a set of bounding box proposals.
- An RoIAlign layer extracts object features from the backbone using the boxes. In parallel, every pair of box proposals is used to compute a union box, and pairwise features are extracted from it in the same way as the object features.
- These features are used as inputs to a Differentiable Scene-Graph Generator module, which outputs the Differentiable Scene Graph: a new, improved set of node and edge features.
- The DSG is used both for refining the original box proposals and as input to a Referring Relationships Classifier, which classifies each bounding box proposal as Subject, Object, Other or Background. The ground-truth label of a proposal is Other if the proposal is involved in a different query relationship over the same image; otherwise it is Background.
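Two ingredients of the pipeline above are easy to make concrete: the union box from which pairwise features are extracted, and the ground-truth labeling rule for the classifier. Below is a minimal sketch in plain Python; the function names (`union_box`, `proposal_label`) are illustrative, not taken from the repository, whose actual implementation uses TensorFlow ops.

```python
def union_box(a, b):
    """Tightest box enclosing proposals a and b, each given as [x1, y1, x2, y2]."""
    return [min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3])]

def proposal_label(idx, subject_ids, object_ids, other_query_ids):
    """Ground-truth label for proposal `idx`, following the scheme above:
    a proposal matched to the query is Subject/Object; one involved in a
    different query relationship over the same image is Other; the rest
    are Background."""
    if idx in subject_ids:
        return "Subject"
    if idx in object_ids:
        return "Object"
    if idx in other_query_ids:
        return "Other"
    return "Background"

# Two overlapping proposals: their union box spans both of them.
boxes = [[10, 10, 50, 50], [40, 30, 120, 90]]
print(union_box(boxes[0], boxes[1]))       # [10, 10, 120, 90]
print(proposal_label(0, {0}, {1}, set()))  # Subject
```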
To get started with the framework, install the following dependencies:
```bash
conda create -n dsg python=3.6.8
conda activate dsg
pip install tensorflow-gpu==1.11.0
pip install Pillow
pip install opencv-python
pip install scipy
pip install pyyaml
pip install gast==0.2.2
```
```bash
cd lib
make clean
make
cd ..
```
```bash
mkdir -p data/imagenet_weights
cd data/imagenet_weights
wget -v http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz
tar -xzvf resnet_v1_101_2016_08_28.tar.gz
mv resnet_v1_101.ckpt res101.ckpt
cd ../..
```
```bash
cd data
wget https://cs.stanford.edu/people/ranjaykrishna/referringrelationships/visualgenome.zip
unzip visualgenome.zip
rm visualgenome.zip
cd VisualGenome
mkdir JPEGImages
cd JPEGImages
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip
unzip images.zip
rm images.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip
unzip images2.zip
rm images2.zip
cd ../../
```
Train a model:
```bash
./experiments/scripts/train.sh <gpu-id> visualgenome res101 <experiment name>
```
Test a model:
```bash
./experiments/scripts/test.sh <gpu-id> visualgenome res101 <experiment name>_iter_0
```
Test a pre-trained model:
```bash
./experiments/scripts/test.sh <gpu-id> visualgenome res101 dsg_pretrained
```
This repository is built on top of tf-faster-rcnn: https://github.com/endernewton/tf-faster-rcnn.
[1] Ranjay Krishna, Ines Chami, Michael Bernstein, Li Fei-Fei, Referring Relationships, CVPR 2018.
Please cite our paper if you use this code in your own work:
```
@InProceedings{raboh2020dsg,
  title     = {Differentiable Scene Graphs},
  author    = {Moshiko Raboh and
               Roei Herzig and
               Gal Chechik and
               Jonathan Berant and
               Amir Globerson},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2020}
}
```