Run SELINA
Pre-train
To train a model on your own reference files, follow the usage below. Add --disease to the command if you are training a model for disease data.
usage: selina train [-h] --path-in PATH_IN --path-out PATH_OUT
                    [--outprefix OUTPREFIX] [--disease]

optional arguments:
  -h, --help            show this help message and exit
  --path-in PATH_IN     File path of training datasets.
  --path-out PATH_OUT   File path of the output model.
  --outprefix OUTPREFIX
                        Prefix of the output files. DEFAULT: pre-trained
  --disease             This flag should be used when the data is in
                        disease condition
This step generates two output files, both of which are used in the next step:
pre-trained_params.pt: a file containing all parameters of the pre-trained model
pre-trained_meta.pkl: a file containing the cell types and genes of the reference data
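As a minimal sketch of how the training command above could be assembled from a script, the Python helper below builds the argument list for selina train using only the flags shown in the usage text. The function names, the directory names, and the assumption that output files are named <outprefix>_params.pt and <outprefix>_meta.pkl (inferred from the default file names above) are illustrative and not part of SELINA.

```python
from pathlib import Path


def build_train_command(path_in, path_out, outprefix="pre-trained", disease=False):
    """Return the argv list for `selina train` (hypothetical helper, not part of SELINA)."""
    cmd = [
        "selina", "train",
        "--path-in", str(path_in),
        "--path-out", str(path_out),
        "--outprefix", outprefix,
    ]
    if disease:
        # Per the usage text, --disease marks the training data as disease data.
        cmd.append("--disease")
    return cmd


def expected_train_outputs(path_out, outprefix="pre-trained"):
    """Guess the two output paths, ASSUMING the pattern <outprefix>_params.pt / <outprefix>_meta.pkl."""
    out = Path(path_out)
    return out / f"{outprefix}_params.pt", out / f"{outprefix}_meta.pkl"


# Hypothetical directories; replace them with your own.
cmd = build_train_command("reference_data", "my_model", disease=True)
print(" ".join(cmd))
```

On a machine where SELINA is installed, the list can be passed to subprocess.run(cmd, check=True) to launch training.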
Predict
You can annotate the query data with either one of our pre-trained models (available on SELINA models) or a model you trained yourself. The pre-trained models fall into two categories: one for predicting normal datasets and one for predicting disease datasets. The disease models currently cover only non-small-cell lung carcinoma, type 2 diabetes, and Alzheimer’s disease, which were used to evaluate the performance of SELINA in our paper.
Because the expression profiles of disease data may be more complex than those of normal data, SELINA skips the fine-tuning step when predicting disease data. To enable this behavior, add --disease to the command.
usage: selina predict [-h] --query-expr QUERY_EXPR --model MODEL --seurat
                      SEURAT [--disease] [--prob-cutoff PROB_CUTOFF]
                      --path-out PATH_OUT [--outprefix OUTPREFIX]

optional arguments:
  -h, --help            show this help message and exit

Arguments for input:
  --query-expr QUERY_EXPR
                        File path of the query data matrix.
  --model MODEL         File path of the pre-trained model.
  --seurat SEURAT       File path of the seurat object.
  --disease             This flag should be used when the data is in some
                        disease condition

Cutoff for downstream analysis:
  --prob-cutoff PROB_CUTOFF
                        Cutoff for prediction probability. DEFAULT: 0.9.

Output Arguments:
  --path-out PATH_OUT   File path of the output files.
  --outprefix OUTPREFIX
                        Prefix of the output files. DEFAULT: query
This step will output four files:
query_predictions.txt: predicted cell type for each cell in the query data
query_probability.txt: probability of cells predicted as each of the reference cell types
query_pred.png: UMAP plot with cell type labels
query_DiffGenes.tsv: differentially expressed genes for each cell type
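The prediction step can be scripted in the same spirit. The sketch below assembles a selina predict argument list from the flags in the usage text and lists the four output files one would expect for a given prefix. The helper names, the placeholder input paths, the range check on the cutoff, and the <outprefix>_<suffix> naming pattern (inferred from the default file names above) are assumptions for illustration, not part of SELINA.

```python
from pathlib import Path


def build_predict_command(query_expr, model, seurat, path_out,
                          outprefix="query", prob_cutoff=0.9, disease=False):
    """Return the argv list for `selina predict` (hypothetical helper, not part of SELINA)."""
    # Assumption: a probability cutoff only makes sense within [0, 1].
    if not 0.0 <= prob_cutoff <= 1.0:
        raise ValueError("prob_cutoff must be between 0 and 1")
    cmd = [
        "selina", "predict",
        "--query-expr", str(query_expr),
        "--model", str(model),
        "--seurat", str(seurat),
        "--path-out", str(path_out),
        "--outprefix", outprefix,
        "--prob-cutoff", str(prob_cutoff),
    ]
    if disease:
        # Per the text above, --disease skips the fine-tuning step.
        cmd.append("--disease")
    return cmd


def expected_predict_outputs(path_out, outprefix="query"):
    """List the four output paths, ASSUMING the pattern <outprefix>_<suffix>."""
    suffixes = ["predictions.txt", "probability.txt", "pred.png", "DiffGenes.tsv"]
    return [Path(path_out) / f"{outprefix}_{suffix}" for suffix in suffixes]


# Placeholder paths; the exact input file formats are not specified in this section.
cmd = build_predict_command("path/to/query_matrix", "path/to/model",
                            "path/to/seurat_object", "results", disease=True)
print(" ".join(cmd))
for path in expected_predict_outputs("results"):
    print(path)
```

As with training, the list can be handed to subprocess.run(cmd, check=True) where SELINA is installed. Checking that the expected files exist afterwards is a cheap way to confirm the run completed.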