sentence-similarity

I plan to implement some sentence similarity models from the literature in order to reproduce and study them. These models have a wide variety of applications, including:

Paraphrase Detection: Given two sentences, are they paraphrases of each other?
Semantic Textual Similarity: Given two sentences, how close are they in terms of semantic equivalence?
Natural Language Inference / Textual Entailment: Can one sentence (the hypothesis) be inferred from another sentence (the premise)?
Answer Selection: Given a question and a set of candidate answers, rank the candidates by relevance to the question.

Setup

Install packages in requirements.txt.

The ignite library, currently in alpha, needs to be installed from source. See https://github.com/pytorch/ignite.

Download the spaCy English model:

python -m spacy download en

Compile trec_eval, which is used to compute the MAP/MRR metrics for the WikiQA dataset:

cd metrics
./get_trec_eval.sh
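
For intuition about what MAP and MRR measure, here is a minimal pure-Python sketch. It is not the trec_eval implementation and is not taken from this repository; the function names and the input layout (a dict mapping a question id to 0/1 relevance labels, already sorted by model score) are assumptions made for illustration. Skipping questions that have no correct candidate is also an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def average_precision(labels: List[int]) -> float:
    """Average precision for one question.

    `labels` holds 0/1 relevance labels ordered from the highest-ranked
    candidate answer to the lowest-ranked one.
    """
    hits = 0
    precisions = []
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            hits += 1
            # Precision at the rank of each correct answer.
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def reciprocal_rank(labels: List[int]) -> float:
    """1 / rank of the first correct answer, or 0.0 if there is none."""
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def map_mrr(ranked: Dict[str, List[int]]) -> Tuple[float, float]:
    """Mean average precision and mean reciprocal rank over questions.

    Assumption of this sketch: questions without any correct candidate
    are skipped rather than counted as zero.
    """
    scored = [labels for labels in ranked.values() if any(labels)]
    if not scored:
        return 0.0, 0.0
    mean_ap = sum(average_precision(labels) for labels in scored) / len(scored)
    mrr = sum(reciprocal_rank(labels) for labels in scored) / len(scored)
    return mean_ap, mrr


if __name__ == "__main__":
    # Hypothetical rankings for two questions.
    example = {"q1": [0, 1, 1], "q2": [1, 0, 0]}
    print(map_mrr(example))
```

The numbers reported in this README come from trec_eval; use the sketch only to sanity-check your understanding of the metrics.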

Running

Baseline

SICK

# Unsupervised
$ python main.py --model sif --dataset sick --unsupervised
Test Results - Epoch: 0 pearson: 0.7199 spearman: 0.5956
# Supervised
$ python main.py --model sif --dataset sick
Test Results - Epoch: 15 pearson: 0.7763 spearman: 0.6637
$ python main.py --model mpcnn --dataset sick
$ python main.py --model bimpm --dataset sick
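
The pearson and spearman values above are correlation coefficients between predicted and gold relatedness scores. As a rough illustration of how such coefficients can be computed, here is a standard-library-only sketch. It is not the code this repository uses, and the function names are made up for the example.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of std devs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical gold and predicted relatedness scores.
    gold = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.0, 4.0, 9.0, 16.0, 25.0]
    print(pearson(gold, predicted), spearman(gold, predicted))
```

Pearson rewards a linear relationship between predictions and gold scores, while Spearman only looks at whether the ordering agrees, which is why the two numbers can differ noticeably for the same model.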

WikiQA

$ python main.py --model sif --dataset wikiqa --epochs 15 --lr 0.001
Test Results - Epoch: 15 map: 0.6295 mrr: 0.6404
$ python main.py --model mpcnn --dataset wikiqa
$ python main.py --model bimpm --dataset wikiqa

Attribution

The English Wikipedia token frequency dataset for estimating p(w) in the baseline model is obtained from the official SIF implementation: https://github.com/PrincetonML/SIF.
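
For context, in SIF each word vector is weighted by a / (a + p(w)), so frequent words contribute less to the sentence embedding. The sketch below illustrates only that weighting step under stated assumptions: the function names, the default a = 1e-3, and the toy counts and vectors are hypothetical, and the common-component removal step of the full method is left out. It is not the code in this repository or in the official implementation.

```python
from typing import Dict, List


def sif_weights(counts: Dict[str, int], a: float = 1e-3) -> Dict[str, float]:
    """Map each word to a / (a + p(w)), with p(w) estimated from counts."""
    total = sum(counts.values())
    return {word: a / (a + count / total) for word, count in counts.items()}


def sif_embedding(
    tokens: List[str],
    vectors: Dict[str, List[float]],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> List[float]:
    """Weighted average of the word vectors of `tokens`.

    Assumptions of this sketch: tokens without a vector are skipped, and
    tokens missing from `weights` get weight 1.0 (the value of
    a / (a + p(w)) when p(w) = 0).
    """
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    used = 0
    for token in tokens:
        if token not in vectors:
            continue
        weight = weights.get(token, default_weight)
        for i, value in enumerate(vectors[token]):
            acc[i] += weight * value
        used += 1
    return [value / used for value in acc] if used else acc


if __name__ == "__main__":
    # Toy frequency counts and 2-dimensional vectors, for illustration only.
    counts = {"the": 900, "cat": 100}
    vectors = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
    weights = sif_weights(counts)
    print(sif_embedding(["the", "cat"], vectors, weights))
```

With these toy counts, "cat" receives a larger weight than "the", so the resulting sentence vector leans toward the rarer word.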
