GitHub

About VULOC

VULOC is a vulnerability scanning tool for C/C++ source code. It uses a customized deep learning architecture, combined with high-level abstract features of vulnerability source code and low-level fine-grained features of assembly code, to detect vulnerable functions and accurately locate vulnerable lines. Specifically, VULOC first compiles C/C++ programs to obtain assembly code with addresses. Then, it uses Addr2line to generate a mapping between the assembly code and source code line numbers, slices the assembly code into code blocks, and encodes them into a neural network model. Finally, it builds a BLSTM-LOC model to learn vulnerability features and predict vulnerability locations. To the best of our knowledge, this is the first approach to leverage the mapping between assembly code and source code line numbers for vulnerability detection (note: programs that cannot be compiled are excluded from detection).

Method	Precision	F1	Re	VL
RATS	68.9%	69.9%	71.0%	7.5
Flawfinder	75.1%	76.7%	78.3%	5.2
VUDDY	70.1%	72.8%	75.7%	10.4
BovdGFE	93.1%	88.8%	90.0%	21.5
SySeVR	90.8%	92.6%	94.5%	17.1
HAN-BSVD	97.9%	97.1%	96.4%	8.5
VULOC	97.4%	97.7%	98.0%	1.0

File descriptions

./BinaryFile/: Object files path
./Yunjing/: Web implementation of VULOC (based on Django)
./assemble/: Assembly code files path
./dataset/: The train/test dataset path
./model/: Trained models path
./testcases/: The original data samples
./testcasesupport/: The header file that the original data sample compiles depends on
manifest.xml: Records the vulnerability type and sink line number corresponding to each program in the dataset

How to replicate

Requirements

python = 3.8
torch = 2.6.0
nltk = 3.9.1
matplotlib = 3.5.1
scikit-learn = 1.0.2
gcc = 9.4.0
gdb = 8.1.1 # no python packages
addr2line = 2.3.0

If you simply want to replicate the experimental results, you can skip the preprocessing steps (including compilation, disassembly, slicing, and tokenization) and directly execute the following commands for model training and vulnerability detection.

sudo apt install unzip
cd dataset
unzip dataset.zip
cd ..
python train.py

If you want to gain a deeper understanding of VULOC or build your own dataset, you can follow the detailed feature engineering and preprocessing steps from scratch by executing the commands step by step as shown below.

Step 1: Compile And Disassemble

1.Download the initial dataset.

cd testcases
wget https://samate.nist.gov/SARD/downloads/test-suites/2017-10-01-juliet-test-suite-for-c-cplusplus-v1-3.zip
unzip 2017-10-01-juliet-test-suite-for-c-cplusplus-v1-3.zip
mv C/testcases/* ./
rm -rf ./C
cd ../testcasesupport
unzip testcasesupport.zip
cd ..

2.Compile the code using GCC.

python compile.py

After compilation is complete, the binary files corresponding to each sample will be available in the BinaryFile directory.

3.Diassemble the object files.

gdb -q -x disassemble_addr2line.py

After disassembly is completed, the disassembly files bad_function_assembly.txt and good_function_assembly.txt will be saved to the assemble directory.

Step 2: Slice and Label

1.Slice and label the generated assembly code, then split the samples into train.txt and test.txt files in an 8:2 ratio.

python label_dataset.py

2.Remove samples with assembly instruction lines fewer than 10 or more than 120, and regenerate the train.txt and test.txt files, saving them to the dataset directory.

python del_long_short.py

Step 3: Train and Detect

After completing the steps above, you can proceed to train the model. Here, we use the following default hyperparameters: dropout=0.2, batch_size=8, epochs=100, learning_rate=0.001, hidden_dim=512, and n_layers=2.

python train.py

Web Deployment(Yunjing)

If you prefer not to use VULOC in the command-line interface, you can follow the steps below to deploy the web version and perform vulnerability detection through an interactive web interface.

Step 1: Environment settings

Install the django package, and copy the CodeMirror project to the Yunjing/Dima/static/CodeMirror path.

pip install django
cd Yunjing/Dima/static
mkdir CodeMirror

Step 2: Startup project

cd Yunjing
python manage.py runserver

The default IP is 127.0.0.1 and the default port number is 8000 (can be modified to specify IP and port). The website page after the project is launched is VULOC-Web(or Yunjing)。

Step 3: Usage

After registering an account, log in and follow the instructions on the page to perform vulnerability detection.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About VULOC

File descriptions

How to replicate

Requirements

Step 1: Compile And Disassemble

Step 2: Slice and Label

Step 3: Train and Detect

Web Deployment(Yunjing)

Step 1: Environment settings

Step 2: Startup project

Step 3: Usage

About

Releases 2

Packages

Contributors 5

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 101 Commits
BinaryFile		BinaryFile
Yunjing		Yunjing
assemble		assemble
dataset		dataset
figures		figures
model		model
testcasesupport		testcasesupport
Length_distribution.py		Length_distribution.py
compile.py		compile.py
del_long_short.py		del_long_short.py
disassemble_addr2line.py		disassemble_addr2line.py
label_dataset.py		label_dataset.py
load_xml.py		load_xml.py
loading_data.py		loading_data.py
manifest.xml		manifest.xml
readme.md		readme.md
train.py		train.py
units.py		units.py

DataAvailable/VULOC

Folders and files

Latest commit

History

Repository files navigation

About VULOC

File descriptions

How to replicate

Requirements

Step 1: Compile And Disassemble

Step 2: Slice and Label

Step 3: Train and Detect

Web Deployment(Yunjing)

Step 1: Environment settings

Step 2: Startup project

Step 3: Usage

About

Resources

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 5

Languages

Packages