Adjacency matrix for Centroid and PestControl task #6

Open · KamilDre opened this issue Mar 7, 2022 · 4 comments
KamilDre commented Mar 7, 2022

Hi,

I have noticed that in multiple_categorical.py you define the adjacency matrices for both the Centroid and PestControl tasks as if the variables were ordinal rather than categorical. Is there any particular reason for this? I have also noticed that if I change the adjacency matrix in lines 39 and 137 to adjmat = torch.ones((n_v, n_v)).fill_diagonal_(0), then the output of the diffusion kernel is full of ones irrespective of the input. Would you mind shedding some light on this?
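To illustrate the distinction, here is a minimal sketch of the two constructions being contrasted. The path-graph version is only an illustration of the ordinal-style structure and may not match the exact code in multiple_categorical.py:

import torch

n_v = 5  # hypothetical number of categories for one variable

# Ordinal-style adjacency (path graph): only neighbouring category values
# are connected, which implicitly imposes an ordering on the categories.
adjmat_ordinal = torch.diag(torch.ones(n_v - 1), diagonal=1)
adjmat_ordinal = adjmat_ordinal + adjmat_ordinal.t()

# Categorical-style adjacency (complete graph): every pair of distinct
# categories is connected, treating all categories as interchangeable.
adjmat_categorical = torch.ones((n_v, n_v)).fill_diagonal_(0)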

ChangYong-Oh (Collaborator) commented Mar 8, 2022 via email

KamilDre (Author) commented Mar 8, 2022

I have put together this small example to better illustrate what I was referring to in the previous message. In the code below, n_v is the number of categories of a specific categorical variable, and factor_gram is essentially the Gram matrix over all vertices.

import torch

if __name__ == '__main__':
    n_v = 10  # number of categories
    dtype = torch.float64

    # Eigen decomposition of the graph laplacian
    adjmat = torch.ones((n_v, n_v), dtype=dtype).fill_diagonal_(0)
    laplacian = torch.diag(torch.sum(adjmat, dim=0)) - adjmat
    fourier_freq, fourier_basis = torch.linalg.eigh(laplacian, UPLO='U')

    # Calculate the factor_gram
    freq_transform = torch.exp(- fourier_freq)
    factor_gram = torch.matmul(fourier_basis * freq_transform.unsqueeze(0), fourier_basis.t())
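    # Illustrative check (not part of the original snippet): the spread of the
    # entries of factor_gram, which shrinks rapidly towards 0 as n_v grows
    print('max element difference:', (factor_gram.max() - factor_gram.min()).item())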

Now, irrespective of the dtype, the maximum difference between any two elements in the factor gram decreases very sharply to 0 as n_v increases. The graph below illustrates this (note the log scale on the y-axis).

[Figure: maximum difference between factor_gram entries vs. n_v, decreasing sharply towards 0 (log-scale y-axis)]

Consequently, as you can see, the factor gram is approximately full of ones for any categorical variable with more than a handful of categories, irrespective of the inputs. Do you have any advice on how to overcome this and apply COMBO to problems with categorical variables that have more than a handful of categories? From my investigation, the primary cause is that as n_v increases, the gap between the smallest eigenvalue of the Laplacian and the next smallest eigenvalue grows. Hence, when you compute freq_transform = torch.exp(- fourier_freq), freq_transform essentially has a 1 in its first entry and approximately 0 in the remaining entries. I also found that scaling the adjacency matrix does not necessarily help, since it simply scales all of the eigenvalues.
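To make the eigenvalue argument concrete, here is a small check. It relies on a standard fact from spectral graph theory rather than on anything in the repository: the Laplacian of the complete graph on n_v vertices has eigenvalue 0 once and n_v with multiplicity n_v - 1, so the spectral gap grows linearly with n_v.

import torch

# For a complete graph on n_v vertices the Laplacian spectrum is
# {0, n_v, ..., n_v}, so exp(-fourier_freq) collapses to roughly
# [1, 0, ..., 0] once n_v is more than a handful.
for n_v in (5, 10, 25):
    adjmat = torch.ones((n_v, n_v), dtype=torch.float64).fill_diagonal_(0)
    laplacian = torch.diag(adjmat.sum(dim=0)) - adjmat
    fourier_freq = torch.linalg.eigvalsh(laplacian)
    print(n_v, fourier_freq[0].item(), fourier_freq[1].item(),
          torch.exp(-fourier_freq[1]).item())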

ChangYong-Oh (Collaborator) commented Mar 8, 2022 via email

KamilDre (Author) commented Mar 8, 2022

Thanks a lot for your detailed answer. Indeed, with your modifications the code is more numerically stable and the problem disappears.
