This mini-repo aims to preprocess and normalize large graphs.
Large graphs can be challenging even to preprocess. Several graph collections are available online, one of the most prominent being the SuiteSparse Matrix Collection (link); however, the raw input files usually give no guarantee that the indices are contiguous (some indices may never appear among the non-zero values). While these 'ghost' indices can make sense in a sparse matrix context, on graphs they are completely meaningless; they only shift the indices of the following vertices without changing the graph topology. Moreover, especially in a GPU scenario where data contiguity is fundamental, these 'ghost' vertices can cause performance issues.
This repo implements functions to normalize graphs by deleting ghost vertices and shifting the indices of all the following vertices. Moreover, since these are often requested, it also supports deleting parallel edges and self loops and computing vertex degrees.
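The normalization described above can be illustrated with a minimal, single-process Python sketch. It is not the repo's MPI/CUDA implementation; the function name `normalize_edges`, its parameters, the 0-based new ids, and the treatment of edges as ordered pairs are all assumptions made for illustration.

```python
def normalize_edges(edges, drop_self_loops=True, drop_parallel=True):
    """Relabel vertices contiguously and optionally clean up the edge list.

    Hypothetical helper, not part of the repo. `edges` is a list of
    (source, target) pairs with arbitrary integer ids.
    """
    # Every id that appears in at least one edge is a real vertex;
    # ids that never appear are the 'ghost' vertices and get no new id.
    used = sorted({v for edge in edges for v in edge})
    old_to_new = {old: new for new, old in enumerate(used)}

    seen = set()
    result = []
    for u, v in edges:
        a, b = old_to_new[u], old_to_new[v]
        if drop_self_loops and a == b:
            continue  # self loop
        if drop_parallel:
            if (a, b) in seen:
                continue  # parallel edge (same ordered pair seen before)
            seen.add((a, b))
        result.append((a, b))

    # Degree counts both endpoints of every surviving edge (assumption).
    degree = [0] * len(used)
    for a, b in result:
        degree[a] += 1
        degree[b] += 1

    # used[new_id] is the old id, i.e. the new-to-old map.
    return result, used, degree


edges = [(1, 4), (4, 9), (4, 9), (9, 9)]
new_edges, new_to_old, degree = normalize_edges(edges)
print(new_edges)   # [(0, 1), (1, 2)]
print(new_to_old)  # [1, 4, 9]
print(degree)      # [1, 2, 1]
```

In this sketch the vertex set is collected before self loops are dropped, so a vertex that only appears in a self loop still receives a new id. Whether the repo behaves the same way is not stated here.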
You can compile the program by using make.
make
Since the makefile is system dependent, some variables, such as CUDA_HOME and MPI_HOME, must be adapted to your environment.
The program accepts several command-line arguments to configure its behavior:
mpirun -np <number_of_processes> ./graph_processor -f <input_file> [options]
- -o <output_file>: Specify the output file prefix.
- -O <output_path>: Specify the output path.
- -m <metadata_path>: Specify the metadata path.
- -f <input_file>: Specify the input file.
- -I <max_iterations>: Specify the maximum number of iterations.
- -M <max_memory>: Specify the maximum memory per task in MB.
mpirun -np 4 ./graph_processor -f graph.txt -o output -O results/ -m metadata/ -I 10 -M 1024
The program generates three output files: *_degree.out, *_globalmap.out, and *_mpi.mtx.
- The main output is *_mpi.mtx, which contains the new mtx file in which relabeling was performed and no ghost vertices are included.
- *_globalmap.out contains the global map used to generate *_mpi.mtx; each line includes two numbers representing the new and the old vertex id.
- If the COMPUTE_DEGREE macro is defined, the *_degree.out file shows the degree of each vertex.
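To map results computed on the normalized graph back to the original vertex ids, the global map can be loaded into a dictionary. The sketch below assumes each line holds exactly two whitespace-separated integers (new id first, then old id) and that there is no header line. The exact file layout should be checked against a real *_globalmap.out, and `load_global_map` is a hypothetical helper, not a function provided by the repo.

```python
def load_global_map(lines):
    """Build a new-id -> old-id dictionary from global map lines.

    Assumes two whitespace-separated integers per line (new id, old id);
    lines that do not have exactly two fields are skipped.
    """
    new_to_old = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or unexpected line
        new_id, old_id = int(parts[0]), int(parts[1])
        new_to_old[new_id] = old_id
    return new_to_old


# Usage with in-memory lines; with a real file, pass the open file object,
# e.g. `with open(path) as f: mapping = load_global_map(f)`.
mapping = load_global_map(["1 3\n", "2 7\n", "\n"])
print(mapping[2])  # 7
```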