capital.tl.preprocessing

capital.tl.preprocessing(adata, Min_Genes=200, Min_Cells=3, Min_Mean=0.0125, Max_Mean=3, Min_Disp=0.5, N_pcs=50, n_Top_genes=2000, K=10, magic_imputation=False)

The recipe for preprocessing raw count data.

adata.raw stores every gene that passes filtering, as log-normalized values, so genes outside the highly variable set remain available for later calculations in CAPITAL.

The recipe runs the following steps:

import scanpy as sc
sc.pp.filter_cells(adata, min_genes=Min_Genes)
sc.pp.filter_genes(adata, min_cells=Min_Cells)
sc.pp.normalize_total(adata, exclude_highly_expressed=True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, min_mean=Min_Mean, max_mean=Max_Mean, min_disp=Min_Disp, n_top_genes=n_Top_genes)
adata.raw = adata
adata = adata[:, adata.var['highly_variable']]
sc.tl.pca(adata, n_comps=N_pcs)
sc.pp.neighbors(adata, n_neighbors=K, n_pcs=N_pcs)
sc.tl.diffmap(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)
sc.tl.paga(adata, groups='leiden')
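
The first two steps of the recipe are count thresholds: cells are filtered by how many genes they express, then genes by how many of the remaining cells express them. The sketch below illustrates that logic on a toy matrix with plain NumPy. It is not scanpy's implementation; the function name filter_counts and the toy data are made up for this illustration, and a gene is assumed to count as expressed when its count is above zero.

```python
import numpy as np


def filter_counts(counts, min_genes=200, min_cells=3):
    """Apply the two thresholds in order: cells first, then genes.

    counts is a dense cells x genes array of raw counts (hypothetical helper,
    not part of scanpy or CAPITAL). Returns the filtered matrix and the two
    boolean masks that were applied.
    """
    counts = np.asarray(counts)
    # Assumption: a gene is "expressed" in a cell when its count is > 0.
    genes_per_cell = (counts > 0).sum(axis=1)
    keep_cells = genes_per_cell >= min_genes
    counts = counts[keep_cells]
    # Gene filtering only sees the cells that survived the first step.
    cells_per_gene = (counts > 0).sum(axis=0)
    keep_genes = cells_per_gene >= min_cells
    return counts[:, keep_genes], keep_cells, keep_genes


# Toy data: 4 cells x 4 genes, with thresholds scaled down to match.
toy = np.array([
    [1, 0, 2, 0],
    [0, 0, 0, 5],
    [3, 1, 0, 0],
    [2, 0, 1, 0],
])
filtered, keep_cells, keep_genes = filter_counts(toy, min_genes=2, min_cells=2)
print(filtered.shape)
```

Because the gene threshold is evaluated after the cell filter, a gene expressed only in discarded cells (the last column here) is dropped as well.
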
Parameters:
  • adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

  • Min_Genes (int) – The minimum number of expressed genes a cell must have to pass scanpy.pp.filter_cells(), by default 200.

  • Min_Cells (int) – The minimum number of cells in which a gene must be expressed to pass scanpy.pp.filter_genes(), by default 3.

  • Min_Mean (float) – The lower cutoff on mean expression used when selecting highly variable genes. See scanpy.pp.highly_variable_genes(). By default 0.0125.

  • Max_Mean (int) – The upper cutoff on mean expression used when selecting highly variable genes. See scanpy.pp.highly_variable_genes(). By default 3.

  • Min_Disp (float) – The lower cutoff on dispersion used when selecting highly variable genes. See scanpy.pp.highly_variable_genes(). By default 0.5.

  • N_pcs (int) – The number of principal components computed in the PCA step and used to build the neighborhood graph, by default 50.

  • n_Top_genes (int) – The number of highly variable genes to keep, by default 2000.

  • K (int) – The number of neighbors passed to scanpy.pp.neighbors(), i.e. the size of the local neighborhood used for manifold approximation, by default 10.

  • magic_imputation (bool) – If True, MAGIC imputation is performed, by default False.
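
The keyword names in the signature use mixed case (Min_Genes, N_pcs, n_Top_genes, K), so they are easy to mistype. The following is a minimal sketch of one way to guard against that. The helpers preprocessing_kwargs and run_preprocessing are hypothetical names introduced here, not part of CAPITAL; the default values are copied from the signature above, and `import capital as cp` assumes the package is installed under that name.

```python
# Defaults copied from the documented signature of capital.tl.preprocessing.
DEFAULTS = {
    "Min_Genes": 200,
    "Min_Cells": 3,
    "Min_Mean": 0.0125,
    "Max_Mean": 3,
    "Min_Disp": 0.5,
    "N_pcs": 50,
    "n_Top_genes": 2000,
    "K": 10,
    "magic_imputation": False,
}


def preprocessing_kwargs(**overrides):
    """Merge overrides into the documented defaults (hypothetical helper).

    Names are case-sensitive, so a misspelling such as `min_genes`
    raises a TypeError instead of being passed along.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def run_preprocessing(adata, **overrides):
    """Call the recipe with validated keyword arguments (hypothetical wrapper)."""
    # Imported lazily so the helper above can be used without CAPITAL installed.
    import capital as cp

    # Returns whatever capital.tl.preprocessing returns.
    return cp.tl.preprocessing(adata, **preprocessing_kwargs(**overrides))


# Example: a larger neighborhood and MAGIC imputation, other defaults unchanged.
params = preprocessing_kwargs(K=15, magic_imputation=True)
print(params["K"], params["N_pcs"])
```

With an AnnData object of raw counts at hand, the call would then be `run_preprocessing(adata, K=15, magic_imputation=True)`.
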