CellphoneDB v4 | Try faster, more accurate cell-cell communication analysis!

2023-05-03 12:15:13, 欧易生物 (OE Biotech), 上海欧易生物医学科技有限公司

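CellphoneDB, a key tool in the cell-cell communication field, recently received a major update. Until now, CellphoneDB has played an important role in life-science research as an online database dedicated to intercellular signaling. It builds models of cell-cell communication networks from known protein-protein interactions, supports studies across different cell types, tissues, and species, and enables screening of protein interactions at scale. After several rounds of upgrades, CellphoneDB offers a rich set of tools and data resources that help researchers understand how cells interact in vivo and uncover the molecular mechanisms behind the related biological processes.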

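With this update the CellphoneDB database reaches version 4.1.0. Compared with the previous version it contains fewer ligand-receptor pairs, but the accuracy of the results has improved further. The update removes all external databases that had not been curated and adds more manually curated, high-confidence ligand-receptor pairs, for a total of 2,923 pairs.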

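The CellphoneDB software has also been updated, to version 4.0. This release is substantially faster. The new CellphoneDB is written in Python, and the input counts file is no longer limited to matrix format: h5ad files can be used directly and are read faster. The development team also provides a way to filter CellphoneDB results by cell type, ligand-receptor pair, or specific gene. In addition, this update greatly improves the computational efficiency of the statistical method (cpdb_statistical_analysis_method). In our own test on a dataset of 30,000 cells, the new CellphoneDB ran more than twice as fast as before.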

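Figure 1. Run time of the different versions. The V4.1 database was run with version 4.0 of the CellphoneDB software; the other two database versions were run with version 3.0 of the software.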

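Beyond the change in run time, the network plots also show that the number of ligand-receptor pairs in the results differs considerably between the updated database and the earlier ones.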

Figure 2. Results with the V3 database

Figure 3. Results with the V4 database

Figure 4. Results with the V4.1 database

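The Python version of CellphoneDB takes very little code to run, and the authors have annotated every parameter in detail, which makes it very friendly to beginners. Let's take a look.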

## Set up the environment (run in a shell)
conda create -n cpdb python=3.8
source activate cpdb
pip install cellphonedb

## Download the database (run in Python)
import os
from cellphonedb.utils import db_utils

cpdb_version = "v4.1.0"
cpdb_target_dir = os.path.join("/data/database/db", cpdb_version)
db_utils.download_database(cpdb_target_dir, cpdb_version)

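Next, we run the analysis with the cpdb_statistical_analysis_method method. The meta_file_path parameter takes a table that maps each cell barcode to its cell type, and the counts_file_path parameter accepts either an h5ad file or a counts matrix file.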
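Before the call, the metadata table has to exist on disk. The following is a minimal sketch of writing such a file using only the Python standard library. The barcodes, the header names `barcode_sample` and `cell_type`, and the helper `write_meta_file` are assumptions made for illustration; in practice the labels come from your own annotation, and the barcodes need to match the cell identifiers in the counts input.

```python
import csv
import os
import tempfile

# Hypothetical barcode -> cell-type labels; real labels would come from
# your own clustering/annotation results.
cell_labels = {
    "AAACCTGAGAAACCAT-1": "EVT_1",
    "AAACCTGAGAAACCGC-1": "EVT_2",
    "AAACCTGAGAAACCTA-1": "PV MYH11",
}


def write_meta_file(labels, path):
    """Write a two-column, tab-separated file: barcode, cell type."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        # Header names are an assumption made for this sketch.
        writer.writerow(["barcode_sample", "cell_type"])
        for barcode, cell_type in labels.items():
            writer.writerow([barcode, cell_type])
    return path


# Written to a temporary directory here so the sketch has no side effects.
meta_path = write_meta_file(
    cell_labels, os.path.join(tempfile.mkdtemp(), "test_meta.txt")
)

# Read the file back to check its shape.
with open(meta_path, newline="") as handle:
    rows = list(csv.reader(handle, delimiter="\t"))
print(rows[0])
print(len(rows) - 1, "cells written")
```

The resulting path is what `meta_file_path` would point to in the call that follows.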

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
    cpdb_file_path = "/data/database/db/v4.1.0/cellphonedb.zip",  # mandatory: CellPhoneDB database zip file (inside the download directory).
    meta_file_path = "test_meta.txt",        # mandatory: tsv file defining barcodes to cell label.
    counts_file_path = "test.h5ad",          # mandatory: normalized count matrix.
    counts_data = "hgnc_symbol",             # defines the gene annotation in counts matrix.
    iterations = 1000,                       # denotes the number of shufflings performed in the analysis.
    threshold = 0.1,                         # defines the min % of cells expressing a gene for this to be employed in the analysis.
    threads = 4,                             # number of threads to use in the analysis.
    debug_seed = 42,                         # debug random seed. To disable >=0.
    result_precision = 3,                    # Sets the rounding for the mean values in significant_means.
    pvalue = 0.05,                           # P-value threshold to employ for significance.
    subsampling = False,                     # To enable subsampling the data (geometric sketching).
    subsampling_log = False,                 # (mandatory) enable subsampling log1p for non log-transformed data inputs.
    subsampling_num_pc = 100,                # Number of components to subsample via geometric sketching (default: 100).
    subsampling_num_cells = 1000,            # Number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator = "|",                         # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
    debug = False,                           # Saves all intermediate tables employed during the analysis in pkl format.
    output_path = out_path,                  # Path to save results; out_path must be defined before this call.
    output_suffix = None                     # Replaces the timestamp in the output files by a user-defined string (default: None).
)
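The `iterations`, `threshold`, and `pvalue` arguments hint at how the statistical method works: cluster labels are shuffled many times to build a null distribution for each ligand-receptor mean. The following is a minimal pure-Python sketch of that idea on toy data. It is not CellphoneDB's actual implementation: the function names (`pair_mean`, `expressed_enough`, `permutation_pvalue`) and the expression values are hypothetical, and the handling of the expression threshold is simplified.

```python
import random
from statistics import mean


def cluster_values(expr, labels, cluster):
    """Expression values of the cells assigned to `cluster`."""
    return [value for value, label in zip(expr, labels) if label == cluster]


def pair_mean(ligand, receptor, labels, cluster_a, cluster_b):
    """Average of the ligand mean in cluster_a and the receptor mean in cluster_b."""
    return (mean(cluster_values(ligand, labels, cluster_a))
            + mean(cluster_values(receptor, labels, cluster_b))) / 2


def expressed_enough(expr, labels, cluster, threshold):
    """True if at least `threshold` (a fraction) of the cluster's cells express the gene."""
    values = cluster_values(expr, labels, cluster)
    return sum(v > 0 for v in values) / len(values) >= threshold


def permutation_pvalue(ligand, receptor, labels, cluster_a, cluster_b,
                       iterations=1000, threshold=0.1, seed=42):
    """Return (observed mean, empirical p-value) for one pair and one cluster pair."""
    observed = pair_mean(ligand, receptor, labels, cluster_a, cluster_b)
    # Simplification: pairs failing the expression threshold are called non-significant.
    if not (expressed_enough(ligand, labels, cluster_a, threshold)
            and expressed_enough(receptor, labels, cluster_b, threshold)):
        return observed, 1.0
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # break the link between cells and cluster labels
        if pair_mean(ligand, receptor, shuffled, cluster_a, cluster_b) >= observed:
            hits += 1
    return observed, hits / iterations


# Toy data: 15 cells in three clusters; the ligand is expressed only in A,
# the receptor only in B.
labels = ["A"] * 5 + ["B"] * 5 + ["C"] * 5
ligand = [10] * 5 + [0] * 10
receptor = [0] * 5 + [10] * 5 + [0] * 5

observed, p_value = permutation_pvalue(ligand, receptor, labels, "A", "B", iterations=200)
print(f"A|B mean={observed}, p={p_value}")

# Cluster C does not express the ligand, so the pair is filtered by the threshold.
_, p_unexpressed = permutation_pvalue(ligand, receptor, labels, "C", "B", iterations=200)
print(f"C|B p={p_unexpressed}")
```

A larger `iterations` value gives a finer-grained empirical p-value at the cost of run time, which is why the speed-up of this step matters for large datasets.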

The run produces four output files: statistical_analysis_means.txt, statistical_analysis_deconvoluted.txt, statistical_analysis_pvalues.txt, and statistical_analysis_significant_means.txt.
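As a rough illustration of how such an output table can be read outside CellphoneDB, here is a sketch that scans a p-values-style table and lists the cell-type pairs below a significance cutoff. It assumes a tab-separated layout with an `interacting_pair` column and one column per `cellA|cellB` pair; the toy table, the `GENEA_GENEB` entry, and the helper `significant_pairs` are invented for the example, and a real file has more metadata columns.

```python
import csv
import io

# Toy stand-in for a p-values table; values and the second row are invented.
toy_pvalues = (
    "interacting_pair\tEVT_1|PV MYH11\tEVT_2|PV MYH11\n"
    "CSF1_CSF1R\t0.001\t0.850\n"
    "GENEA_GENEB\t0.300\t0.020\n"
)


def significant_pairs(handle, separator="|", pvalue=0.05):
    """List (interaction, cell_a, cell_b, p) for entries with p below `pvalue`."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for column, value in row.items():
            if separator not in column:
                continue  # metadata column, not a cell-type pair
            p = float(value)
            if p < pvalue:
                cell_a, cell_b = column.split(separator, 1)
                hits.append((row["interacting_pair"], cell_a, cell_b, p))
    return hits


hits = significant_pairs(io.StringIO(toy_pvalues))
for interaction, cell_a, cell_b, p in hits:
    print(interaction, cell_a, cell_b, p)
```

For a real result, the `io.StringIO` object would be replaced by an open file handle on the p-values file, using the same `separator` that was passed to the analysis.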

The search_utils module can be used to query the results further. The code is as follows:

```
from cellphonedb.utils import search_utils

search_results = search_utils.search_analysis_results(
    query_cell_types_1 = ["EVT_1", "EVT_2", "GC", "eEVT", "iEVT"],  # List of cells 1, will be paired to cells 2 (list or 'All').
    query_cell_types_2 = ["PV MMP11", "PV MYH11", "PV STEAP4"],     # List of cells 2, will be paired to cells 1 (list or 'All').
    query_genes = ["TGFBR1"],                # filter interactions based on the genes participating (list).
    query_interactions = ["CSF1_CSF1R"],     # filter interactions based on their name (list).
    significant_means = significant_means,   # significant_means DataFrame returned by the analysis call (not a file name).
    deconvoluted = deconvoluted,             # deconvoluted DataFrame returned by the analysis call (not a file name).
    separator = "|",                         # separator (default: |) employed to split cells (cellA|cellB).
    long_format = True                       # converts the output into a wide table, removing non-significant interactions
)
```