ds analysis with multiple conditions #60

Open
feanaros opened this issue Feb 1, 2025 · 1 comment

feanaros commented Feb 1, 2025

I have a dataset with multiple conditions and I would like to perform DS analysis, but I am not sure if the data are prepared properly.

ei
   sample_id condition patient_id n_cells
1   HealthyA   Healthy        IDA   57406
2   HealthyB   Healthy        IDB   57360
3   HealthyE   Healthy        IDE  186564
4      NAFL1      NAFL        ID1  129166
5      NAFL2      NAFL        ID2   84568
6      NAFL3      NAFL        ID4  144629
7      NAFL4      NAFL        ID5  328842
8      NAFL5      NAFL       ID10  209022
9       NAS1       NAS        ID8   84714
10      NAS2       NAS        ID3  216991
11      NAS3       NAS        ID7   85073
12     NASH1      NASH        ID6   95879
13     NASH2      NASH       ID11   67581
14     NASH3      NASH       ID12   47626

> ds_formula1 <- createFormula(ei, cols_fixed = "condition")
> ds_formula1
$formula
y ~ condition
<environment: 0x7fbba1a1a2a8>

$data
   condition
1    Healthy
2    Healthy
3    Healthy
4       NAFL
5       NAFL
6       NAFL
7       NAFL
8       NAFL
9        NAS
10       NAS
11       NAS
12      NASH
13      NASH
14      NASH

$random_terms
[1] FALSE

> contrast <- createContrast(c(0, 1, 0, 0))
> contrast
     [,1]
[1,]    0
[2,]    1
[3,]    0
[4,]    0
> 
> ds_res4 <- diffcyt(
+   sce, 
+   formula = ds_formula1, 
+   contrast = contrast, 
+   analysis_type = "DS", 
+   method_DS = c("diffcyt-DS-LMM"),
+   clustering_to_use = "meta14", 
+   subsampling = 10000, 
+   verbose = TRUE
+ )
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta14' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DS tests using method 'diffcyt-DS-LMM'...
There were 50 or more warnings (use warnings() to see the first 50)
> diffcyt::topTable(ds_res4, format_vals = TRUE, top_n = 1000, order_by = "p_adj")
DataFrame with 504 rows and 4 columns
    cluster_id   marker_id     p_val     p_adj
      <factor>    <factor> <numeric> <numeric>
6           6        CD69   0.002060     0.147
6           6        CXCR3  0.003700     0.147
9           9        CXCR3  0.002050     0.147
12          12       CXCR3  0.002580     0.147
3           3        FoxP3  0.000912     0.147
...        ...         ...       ...       ...
11          11 CD223_LAG-3        NA        NA
12          12 CD223_LAG-3        NA        NA
13          13 CD223_LAG-3        NA        NA
14          14 CD223_LAG-3        NA        NA
13          13 CD16               NA        NA

If I try with limma DS:

> ds_design <- createDesignMatrix(ei, cols_design = "condition")
> ds_formula1 <- createFormula(ei, cols_fixed = "condition")
> contrast <- createContrast(c(0, 1, 0, 0))
> ds_res3 <- diffcyt(
+   sce, 
+   design = ds_design, 
+   contrast = contrast, 
+   analysis_type = "DS", 
+   clustering_to_use = "meta14", 
+   subsampling = 10000, 
+   verbose = TRUE,
+   transform = F
+ )
using SingleCellExperiment object from CATALYST as input
using cluster IDs from clustering stored in column 'meta14' of 'cluster_codes' data frame in 'metadata' of SingleCellExperiment object from CATALYST
calculating features...
calculating DS tests using method 'diffcyt-DS-limma'...
Warning messages:
1: In fitFDist(var, df1 = df, covariate = covariate) :
  More than half of residual variances are exactly zero: eBayes unreliable
2: In splines::ns(covariate, df = splinedf, intercept = TRUE) :
  shoving 'interior' knots matching boundary knots to inside

> diffcyt::topTable(ds_res3, format_vals = TRUE, show_logFC = T, top_n = 1000, order_by = "marker_id")
DataFrame with 504 rows and 5 columns
    cluster_id marker_id     logFC     p_val     p_adj
      <factor>  <factor> <numeric> <numeric> <numeric>
1            1      CD45     0.057    0.7820     1.000
2            2      CD45     0.303    0.5810     1.000
3            3      CD45    -0.150    0.4660     1.000
4            4      CD45    -0.140    0.2630     1.000
5            5      CD45    -0.295    0.0535     0.655
...        ...       ...       ...       ...       ...
10          10      CD16     0.000     1.000         1
11          11      CD16     0.000     1.000         1
12          12      CD16    -0.023     0.235         1
13          13      CD16     0.000     1.000         1
14          14      CD16     0.000     1.000         1


QUESTIONS:

  1. Is the contrast OK? If I change it to -1, 1, 0, 0, the results are significant.
  2. Can you explain the warning messages to me?
  3. I followed the CATALYST tutorial, with the transformation using cofactor = 5. Do I need to set transform = F or TRUE?
    Thank you

SamGG commented Feb 2, 2025

Hi. I am not one of the developers, just following the issues here. The linked issue is HelenaLC/CATALYST#417

  1. The contrasts seem OK to me. In the limma approach, when using createDesignMatrix, createFormula is not needed. The contrasts for these two create functions are the same here, but I suspect that is not always the case. To view the columns created by createDesignMatrix, print ds_design. For createFormula, I don't know exactly how its output maps to createContrast: if a factor is used, the order of the conditions may change. In your case, the conditions are in alphabetical order, so no problem IMO.
    There are warnings() in the first run that you should look at.
  2. No idea.
  3. transform tells diffcyt whether the MFI are already transformed or not. I don't know what is in the sce from CATALYST; I do my own preparation when using diffcyt. Tell us which cytometry instrument you are using, as the cofactor depends on the technology. 5 is OK for mass cytometry.
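
To make point 1 concrete, here is a minimal sketch of how the contrast lines up with the design columns. It uses base R's model.matrix rather than diffcyt itself, on the assumption that createDesignMatrix produces the same intercept-plus-dummy layout for a single factor; printing ds_design is still the way to confirm it on the real object.

```r
# Sample-level condition labels copied from the `ei` table in the question.
condition <- factor(rep(c("Healthy", "NAFL", "NAS", "NASH"), times = c(3, 5, 3, 3)))

# Base R design matrix with an intercept (assumed to mirror ds_design).
design <- model.matrix(~ condition)
print(colnames(design))

# c(0, 1, 0, 0) picks out the second column only.
contrast <- matrix(c(0, 1, 0, 0), ncol = 1)
selected <- colnames(design)[contrast[, 1] != 0]
print(selected)
```

Under that assumption, the first column is the intercept (the Healthy baseline) and the second is the NAFL-minus-Healthy difference, so c(0, 1, 0, 0) tests NAFL against Healthy. A contrast of c(-1, 1, 0, 0) would additionally subtract the baseline level itself, which is not a comparison between two groups, so significant results from it probably do not mean what they appear to.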

I think you should limit the number of markers tested by selecting only the ones you want to test. Currently, it seems that all markers are tested, which leads to an over-correction of the p-values, i.e. fewer adjusted p-values come out significant, as in the first run. In the 2nd run, CD45 is probably not worth testing.
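
The effect of the number of tests on the adjusted p-values can be sketched with base R's p.adjust. The five raw p-values are the ones visible at the top of the first topTable() output; the reduced count of 50 tests is purely hypothetical, and the sketch is not meant to reproduce the 0.147 reported above, since the full set of p-values is not shown in the thread.

```r
# Five raw p-values shown at the top of the first topTable() output.
p_top <- c(0.002060, 0.003700, 0.002050, 0.002580, 0.000912)

# Benjamini-Hochberg adjustment, treating these as the smallest of `n` tests.
adj_all <- p.adjust(p_top, method = "BH", n = 504)  # every cluster-marker pair
adj_few <- p.adjust(p_top, method = "BH", n = 50)   # hypothetical reduced set

print(data.frame(p_val = p_top, adj_all = adj_all, adj_few = adj_few))
```

In diffcyt itself, the markers_to_test argument of diffcyt() looks like the place to restrict the tested markers; check its documentation for the expected format before relying on it.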

Best.
