Skip to content

sveron/KPAR

 
 

Repository files navigation

KPAR

We implemented the KPAR model on KKBox song dataset and MovieLens-IMDb (MI) dataset, compared it to a KPRN model (https://arxiv.org/abs/1811.04540).

Python version is 3.6.9, and environment requirements can be installed using KPAN_requirements.yml

Usage Information

To train and evaluate the KPAR model, you have multiple choices for sample the data:

  • all data (subnetwork = full)
  • random sampling (subnetwork = rs) - rs contains a random 10% sample of entities
  • "smart" sampling (subnetwork = dense) - contains the top 10% entities with highest degree
  • create a subnetwork yourself.

The first step is download the data:

  1. KKBox - download the songs.csv and train.csv from https://www.kaggle.com/c/kkbox-music-recommendation-challenge/data.
  2. MI - download the ratings.dat and movies.dat from https://www.kaggle.com/datasets/odedgolden/movielens-1m-dataset, and from IMDb datasets (https://www.imdb.com/) download movies.csv, names.csv and title.csv. Then, merge ML and IMDb by movie ID and year and named the file: ml_imdb.csv.

For simplicity, we will demonstrate the instructions on the KKBox domain. For MI domain you need to replace the word 'song' in the word 'movie'. Create a folder called song_dataset in {root}/data and place songs.csv and train.csv in song_dataset.

Then construct the knowledge graph with data_preparation.py (and data_preparation_ml.py for MI), and path-find, train, and evaluate using recommender.py.

Knowledge Graph Construction

Run data_preparation.py/data_preparation_ml.py to create relevant dictionaries from the datasets.

Arguments:

--songs_file/--movies_file to specify path to CSV containing song/movie information

--interactions_file to specify path to CSV containing user-item interactions

--subnetwork to specify data to create knowledge graph from. For our evaluation we use 'full'.

recommender.py arguments

--train to train model, --validation to add validation. --eval to evaluate

--find_paths if you want to find paths before training or evaluating

--subnetwork to specify subnetwork training and evaluating on.

--model designates the model to train or evaluate from

--model_name designates the specific model to train or evaluate: KPAR or KPRN

--path_agg_methos designates the way of path aggregation: attention (for cross attention) or weighted pooling

--load_checkpoint if you want to load a model checkpoint (weights and parameters) before training

--kg_path_file designates the file to save/load train/test paths from

--user_limit designates the max number of train/test users to find paths for

-b designates model batch size and -e number of epochs to train model for

--not_in_memory if training on entire dense subnetwork, whose paths cannot fit in memory all at once

--lr, --l2_reg specify model hyperparameters (learning rate, l2 regularization)

--nhead,--dropout specify hyperparameters for transformer layer

--path_nhead specify number of heads in path aggregation

--entity_agg designates the method for aggregate paths

--random_samples designates if the paths sampling will be random

--item-to-item True for inference task of item-to-item similarity

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%