This is an in-class Kaggle competition for the Kernel Methods for Machine Learning course at AMMI.
Transcription factors (TFs) are regulatory proteins that bind specific sequence motifs in the genome to activate or repress transcription of target genes. Genome-wide protein-DNA binding maps can be profiled experimentally, so each genomic sequence region can be assigned to one of two classes for a TF of interest: bound or unbound.
The main task of this project is to classify DNA sequences, that is, to predict whether a given DNA sequence region is a binding site for a specific transcription factor.
The data comes in two forms: the principal files and the optional files.
The principal files contain 2000 training sequences and 1000 test sequences.
Models:
- Ridge Regression
- Kernel Ridge Regression
- Naive Bayes Model
- Logistic Regression
- Kernel Logistic Regression
- Weighted Kernel Logistic Regression
- Kernel Support Vector Machine
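
As a rough illustration of one model from this list, kernel ridge regression used as a classifier can be sketched in NumPy as follows. This is a minimal sketch, not the competition code: the function names `fit_krr` and `predict_labels`, the regularization value `lam`, the ±1 label encoding, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def polynomial_kernel(X, Y, degree=2, c=1.0):
    # Gram matrix with entries (x . y + c) ** degree
    return (X @ Y.T + c) ** degree

def fit_krr(K, y, lam=1e-3):
    # Assumed objective: (1/n) * ||K a - y||^2 + lam * a^T K a,
    # whose minimizer solves (K + lam * n * I) a = y.
    n = K.shape[0]
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict_labels(K_new_train, alpha):
    # Threshold the real-valued regression output at 0 to get labels in {-1, +1}.
    return np.where(K_new_train @ alpha >= 0, 1, -1)

# Toy XOR-like data (hypothetical, not the competition data):
# not linearly separable, but separable with a quadratic kernel.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

K = polynomial_kernel(X, X)
alpha = fit_krr(K, y)
pred = predict_labels(K, alpha)
```

For real sequences, `X` would be replaced by whatever numeric representation of the DNA sequences is used, and the Gram matrix between test and training points would be passed to `predict_labels`.
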
Kernels:
- Linear Kernel
- Quadratic Kernel
- Polynomial Kernel
- Exponential Kernel
- Radial Basis Kernel (RBF)
- Laplacian Kernel
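
The kernels listed above could be written as Gram-matrix functions along the following lines. This is a sketch under assumptions: the exact parameterizations used in the project are not stated here, so the forms below (in particular for the exponential and Laplacian kernels, whose definitions vary between references) and the default values of `degree`, `c`, and `sigma` are illustrative choices only.

```python
import numpy as np

def _sq_dists(X, Y):
    # Pairwise squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.maximum(d2, 0.0)

def linear_kernel(X, Y):
    return X @ Y.T

def polynomial_kernel(X, Y, degree=3, c=1.0):
    return (X @ Y.T + c) ** degree

def quadratic_kernel(X, Y, c=1.0):
    # Polynomial kernel with degree fixed to 2.
    return polynomial_kernel(X, Y, degree=2, c=c)

def rbf_kernel(X, Y, sigma=1.0):
    # exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-_sq_dists(X, Y) / (2.0 * sigma ** 2))

def exponential_kernel(X, Y, sigma=1.0):
    # Assumed form: exp(-||x - y|| / (2 sigma^2))
    return np.exp(-np.sqrt(_sq_dists(X, Y)) / (2.0 * sigma ** 2))

def laplacian_kernel(X, Y, sigma=1.0):
    # Assumed form: exp(-||x - y|| / sigma)
    return np.exp(-np.sqrt(_sq_dists(X, Y)) / sigma)
```

Each function takes two arrays of shape `(n, d)` and `(m, d)` and returns an `(n, m)` Gram matrix, so the same function can produce both the training matrix and the test-versus-training matrix.
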
- On the private score, the three best accuracies were 0.684, 0.662, and 0.648, obtained by kernel logistic regression (polynomial kernel), kernel ridge regression (polynomial kernel), and SVM with an RBF kernel, respectively.
- This indicates that the polynomial and RBF kernels work well on this data set.
- In addition, simpler models generally performed better than the support vector machine (SVM).