Data preprocessing for analyzing the assemblies
This notebook contains all scripts used for data preprocessing and assembly analysis. For figure generation, please refer to figure generation.
MUMmer for synteny coordinates
In [1]:
cd /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/MUMer
In [2]:
# Step 1: Alignment of both genomes using nucmer
nucmer --maxmatch -p synteny ../genomes/DSM158.fasta ../genomes/GCF_049434525.1_MWCSPHH2ANNA_genomic.fasta
# Step 2: match-filtering
delta-filter -1 synteny.delta > synteny.filtered.delta
# Step 3: Create Coord-file (for plotting)
show-coords -rcl -T synteny.filtered.delta > synteny.coords
1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "synteny.ntref" of length 4520363
# construct suffix tree for sequence of length 4520363
# (maximum reference length is 536870908)
# (maximum query length is 4294967295)
# process 45203 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /usr/bin/mummer synteny.ntref 1.30
# reading input file "/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/MUMer/../genomes/GCF_049434525.1_MWCSPHH2ANNA_genomic.fasta" of length 4520329
# matching query-file "/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/MUMer/../genomes/GCF_049434525.1_MWCSPHH2ANNA_genomic.fasta"
# against subject-file "synteny.ntref"
# COMPLETETIME /usr/bin/mummer synteny.ntref 4.19
# SPACE /usr/bin/mummer synteny.ntref 8.75
4: FINISHING DATA
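Before plotting, the coordinate table can be given a quick numeric check with awk. The following is a minimal sketch, assuming the tab-delimited layout written by `show-coords -rcl -T` (a few header lines, then one alignment per row, with the alignment length on the reference in column 5 and the percent identity in column 7). `summarize_coords` is a hypothetical helper name and not part of MUMmer.

```shell
# Hypothetical helper: summarize a show-coords table (assumed flags: -rcl -T).
# Assumed columns: 5 = alignment length on the reference, 7 = percent identity.
summarize_coords() {
    awk -F'\t' '
        $1 ~ /^[0-9]+$/ {            # data rows start with a numeric coordinate
            n++
            aligned += $5
            weighted += $5 * $7      # weight identity by alignment length
        }
        END {
            if (n == 0) { print "no alignments found"; exit 1 }
            printf "alignments=%d aligned_ref_bp=%d mean_identity=%.2f\n", n, aligned, weighted / aligned
        }
    ' "$1"
}

# Only run when the file written by the cell above is present.
if [ -f synteny.coords ]; then
    summarize_coords synteny.coords
fi
```

The aligned length is a plain sum over rows, so overlapping alignments would be counted more than once; treat it as a rough sanity check rather than a coverage estimate.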
Mash distance calculation
In [3]:
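# Mash must already be installed in the conda environment "mash"
cd /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes
conda activate mash
mkdir -p ../mash # make sure the output directory exists
mash sketch -o ./all *.fasta # reduce all genomes to MinHash sketches stored in all.msh
mash dist all.msh all.msh > ../mash/dist.tab # calculate all-vs-all pairwise distances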
conda deactivate
Sketching DSM158.fasta...
Sketching GCF_000012905.2_ASM1290v2_genomic.fasta...
Sketching GCF_000015985.1_ASM1598v1_genomic.fasta...
Sketching GCF_000021005.1_ASM2100v1_genomic.fasta...
Sketching GCF_000212605.1_ASM21260v1_genomic.fasta...
Sketching GCF_000269625.1_PB_Rhod_Spha_2_4_1_V1_genomic.fasta...
Sketching GCF_000273405.1_Rhod_Spha_2_4_1_V1_genomic.fasta...
Sketching GCF_001576595.1_ASM157659v1_genomic.fasta...
Sketching GCF_001685625.1_ASM168562v1_genomic.fasta...
Sketching GCF_002706325.1_ASM270632v1_genomic.fasta...
Sketching GCF_003324715.1_ASM332471v1_genomic.fasta...
Sketching GCF_003846365.1_ASM384636v1_genomic.fasta...
Sketching GCF_003846385.1_ASM384638v1_genomic.fasta...
Sketching GCF_003846405.1_ASM384640v1_genomic.fasta...
Sketching GCF_003846425.1_ASM384642v1_genomic.fasta...
Sketching GCF_012647365.1_ASM1264736v1_genomic.fasta...
Sketching GCF_049434525.1_MWCSPHH2ANNA_genomic.fasta...
Sketching GCF_052246835.1_ASM5224683v1_genomic.fasta...
Writing to ./all.msh...
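The resulting distance table can be queried directly from the shell. The sketch below assumes the usual `mash dist` output layout (tab-delimited: reference ID, query ID, distance, p-value, shared hashes) and that the IDs are the FASTA file names passed to `mash sketch`. `closest_genome` is a hypothetical helper name.

```shell
# Hypothetical helper: report the genome with the smallest Mash distance
# to a given reference. Assumed columns: 1 = reference, 2 = query, 3 = distance.
closest_genome() {
    awk -F'\t' -v ref="$2" '
        $1 == ref && $2 != ref && (best == "" || $3 + 0 < min) {
            min = $3 + 0     # numeric value used for the comparison
            dist = $3        # original text kept for printing
            best = $2
        }
        END { if (best != "") printf "%s\t%s\n", best, dist }
    ' "$1"
}

# Only run when the table written by the cell above is present.
if [ -f ../mash/dist.tab ]; then
    closest_genome ../mash/dist.tab DSM158.fasta
fi
```

Self-comparisons (distance 0) are skipped so that the reference is never reported as its own nearest neighbour.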
Quality assessment of the genomes
BUSCO (completeness)
In [4]:
conda activate busco
cd /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes
busco -i ./DSM158.fasta -m genome -l rhodobacter_odb12 -c 20 -o DSM # strain DSM158
busco -i ./GCF_000012905.2_ASM1290v2_genomic.fasta -m genome -l rhodobacter_odb12 -c 20 -o NCBI_Ref # ncbi reference
busco -i ./GCF_049434525.1_MWCSPHH2ANNA_genomic.fasta -m genome -l rhodobacter_odb12 -c 20 -o SUBH2 # substrain H2
conda deactivate
2025-10-30 10:16:28 INFO: ***** Start a BUSCO v6.0.0 analysis, current time: 10/30/2025 10:16:28 *****
2025-10-30 10:16:28 INFO: Configuring BUSCO with local environment
2025-10-30 10:16:28 INFO: Running genome mode
2025-10-30 10:16:28 INFO: Downloading information on latest versions of BUSCO data...
2025-10-30 10:16:30 INFO: Input file is /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/DSM158.fasta
2025-10-30 10:16:30 INFO: The local file or folder /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/busco_downloads/lineages/rhodobacter_odb12 is the last available version.
2025-10-30 10:16:31 INFO: Running BUSCO using lineage dataset rhodobacter_odb12 (prokaryota, 2025-05-14)
2025-10-30 10:16:31 INFO: Running 1 job(s) on bbtools, starting at 10/30/2025 10:16:31
2025-10-30 10:16:32 INFO: [bbtools] 1 of 1 task(s) completed
2025-10-30 10:16:32 INFO: ***** Run Prodigal on input to predict and extract genes *****
2025-10-30 10:16:32 INFO: Running Prodigal with genetic code 11 in single mode
2025-10-30 10:16:32 INFO: Running 1 job(s) on prodigal, starting at 10/30/2025 10:16:32
2025-10-30 10:16:41 INFO: [prodigal] 1 of 1 task(s) completed
2025-10-30 10:16:42 INFO: Genetic code 11 selected as optimal
2025-10-30 10:16:42 INFO: ***** Run HMMER on gene sequences *****
2025-10-30 10:16:42 INFO: Running 1343 job(s) on hmmsearch, starting at 10/30/2025 10:16:42
2025-10-30 10:16:44 INFO: [hmmsearch] 135 of 1343 task(s) completed
2025-10-30 10:16:45 INFO: [hmmsearch] 269 of 1343 task(s) completed
2025-10-30 10:16:45 INFO: [hmmsearch] 403 of 1343 task(s) completed
2025-10-30 10:16:46 INFO: [hmmsearch] 538 of 1343 task(s) completed
2025-10-30 10:16:47 INFO: [hmmsearch] 672 of 1343 task(s) completed
2025-10-30 10:16:48 INFO: [hmmsearch] 806 of 1343 task(s) completed
2025-10-30 10:16:48 INFO: [hmmsearch] 941 of 1343 task(s) completed
2025-10-30 10:16:49 INFO: [hmmsearch] 1075 of 1343 task(s) completed
2025-10-30 10:16:50 INFO: [hmmsearch] 1209 of 1343 task(s) completed
2025-10-30 10:16:52 INFO: [hmmsearch] 1343 of 1343 task(s) completed
2025-10-30 10:16:53 INFO: Results: C:98.2%[S:98.0%,D:0.2%],F:0.5%,M:1.3%,n:1343
2025-10-30 10:16:53 INFO:
---------------------------------------------------
|Results from dataset rhodobacter_odb12 |
---------------------------------------------------
|C:98.2%[S:98.0%,D:0.2%],F:0.5%,M:1.3%,n:1343 |
|1319 Complete BUSCOs (C) |
|1316 Complete and single-copy BUSCOs (S) |
|3 Complete and duplicated BUSCOs (D) |
|7 Fragmented BUSCOs (F) |
|17 Missing BUSCOs (M) |
|1343 Total BUSCO groups searched |
---------------------------------------------------
2025-10-30 10:16:53 INFO: BUSCO analysis done. Total running time: 23 seconds
2025-10-30 10:16:53 INFO: Results written in /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/DSM
2025-10-30 10:16:53 INFO: For assistance with interpreting the results, please consult the userguide: https://busco.ezlab.org/busco_userguide.html
2025-10-30 10:16:53 INFO: Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO
2025-10-30 10:16:53 INFO: Thank you for using BUSCO! Anonymous usage data is gathered to improve the tool. You may opt out with --opt-out-run-stats.
2025-10-30 10:16:54 INFO: ***** Start a BUSCO v6.0.0 analysis, current time: 10/30/2025 10:16:54 *****
2025-10-30 10:16:54 INFO: Configuring BUSCO with local environment
2025-10-30 10:16:54 INFO: Running genome mode
2025-10-30 10:16:54 INFO: Downloading information on latest versions of BUSCO data...
2025-10-30 10:16:57 INFO: Input file is /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/GCF_000012905.2_ASM1290v2_genomic.fasta
2025-10-30 10:16:57 INFO: The local file or folder /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/busco_downloads/lineages/rhodobacter_odb12 is the last available version.
2025-10-30 10:16:57 INFO: Running BUSCO using lineage dataset rhodobacter_odb12 (prokaryota, 2025-05-14)
2025-10-30 10:16:57 INFO: Running 1 job(s) on bbtools, starting at 10/30/2025 10:16:57
2025-10-30 10:16:58 INFO: [bbtools] 1 of 1 task(s) completed
2025-10-30 10:16:58 INFO: ***** Run Prodigal on input to predict and extract genes *****
2025-10-30 10:16:58 INFO: Running Prodigal with genetic code 11 in single mode
2025-10-30 10:16:58 INFO: Running 1 job(s) on prodigal, starting at 10/30/2025 10:16:58
2025-10-30 10:17:09 INFO: [prodigal] 1 of 1 task(s) completed
2025-10-30 10:17:09 INFO: Genetic code 11 selected as optimal
2025-10-30 10:17:09 INFO: ***** Run HMMER on gene sequences *****
2025-10-30 10:17:09 INFO: Running 1343 job(s) on hmmsearch, starting at 10/30/2025 10:17:09
2025-10-30 10:17:11 INFO: [hmmsearch] 135 of 1343 task(s) completed
2025-10-30 10:17:12 INFO: [hmmsearch] 269 of 1343 task(s) completed
2025-10-30 10:17:12 INFO: [hmmsearch] 403 of 1343 task(s) completed
2025-10-30 10:17:13 INFO: [hmmsearch] 538 of 1343 task(s) completed
2025-10-30 10:17:14 INFO: [hmmsearch] 672 of 1343 task(s) completed
2025-10-30 10:17:15 INFO: [hmmsearch] 806 of 1343 task(s) completed
2025-10-30 10:17:15 INFO: [hmmsearch] 941 of 1343 task(s) completed
2025-10-30 10:17:16 INFO: [hmmsearch] 1075 of 1343 task(s) completed
2025-10-30 10:17:17 INFO: [hmmsearch] 1209 of 1343 task(s) completed
2025-10-30 10:17:19 INFO: [hmmsearch] 1343 of 1343 task(s) completed
2025-10-30 10:17:20 INFO: Results: C:98.5%[S:98.3%,D:0.2%],F:0.4%,M:1.1%,n:1343
2025-10-30 10:17:21 INFO:
---------------------------------------------------
|Results from dataset rhodobacter_odb12 |
---------------------------------------------------
|C:98.5%[S:98.3%,D:0.2%],F:0.4%,M:1.1%,n:1343 |
|1323 Complete BUSCOs (C) |
|1320 Complete and single-copy BUSCOs (S) |
|3 Complete and duplicated BUSCOs (D) |
|5 Fragmented BUSCOs (F) |
|15 Missing BUSCOs (M) |
|1343 Total BUSCO groups searched |
---------------------------------------------------
2025-10-30 10:17:21 INFO: BUSCO analysis done. Total running time: 24 seconds
2025-10-30 10:17:21 INFO: Results written in /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/NCBI_Ref
2025-10-30 10:17:21 INFO: For assistance with interpreting the results, please consult the userguide: https://busco.ezlab.org/busco_userguide.html
2025-10-30 10:17:21 INFO: Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO
2025-10-30 10:17:21 INFO: Thank you for using BUSCO! Anonymous usage data is gathered to improve the tool. You may opt out with --opt-out-run-stats.
2025-10-30 10:17:22 INFO: ***** Start a BUSCO v6.0.0 analysis, current time: 10/30/2025 10:17:22 *****
2025-10-30 10:17:22 INFO: Configuring BUSCO with local environment
2025-10-30 10:17:22 INFO: Running genome mode
2025-10-30 10:17:22 INFO: Downloading information on latest versions of BUSCO data...
2025-10-30 10:17:24 INFO: Input file is /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/GCF_049434525.1_MWCSPHH2ANNA_genomic.fasta
2025-10-30 10:17:24 INFO: The local file or folder /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/busco_downloads/lineages/rhodobacter_odb12 is the last available version.
2025-10-30 10:17:24 INFO: Running BUSCO using lineage dataset rhodobacter_odb12 (prokaryota, 2025-05-14)
2025-10-30 10:17:24 INFO: Running 1 job(s) on bbtools, starting at 10/30/2025 10:17:24
2025-10-30 10:17:26 INFO: [bbtools] 1 of 1 task(s) completed
2025-10-30 10:17:26 INFO: ***** Run Prodigal on input to predict and extract genes *****
2025-10-30 10:17:26 INFO: Running Prodigal with genetic code 11 in single mode
2025-10-30 10:17:26 INFO: Running 1 job(s) on prodigal, starting at 10/30/2025 10:17:26
2025-10-30 10:17:35 INFO: [prodigal] 1 of 1 task(s) completed
2025-10-30 10:17:35 INFO: Genetic code 11 selected as optimal
2025-10-30 10:17:35 INFO: ***** Run HMMER on gene sequences *****
2025-10-30 10:17:35 INFO: Running 1343 job(s) on hmmsearch, starting at 10/30/2025 10:17:35
2025-10-30 10:17:38 INFO: [hmmsearch] 135 of 1343 task(s) completed
2025-10-30 10:17:38 INFO: [hmmsearch] 269 of 1343 task(s) completed
2025-10-30 10:17:39 INFO: [hmmsearch] 403 of 1343 task(s) completed
2025-10-30 10:17:40 INFO: [hmmsearch] 538 of 1343 task(s) completed
2025-10-30 10:17:41 INFO: [hmmsearch] 672 of 1343 task(s) completed
2025-10-30 10:17:41 INFO: [hmmsearch] 806 of 1343 task(s) completed
2025-10-30 10:17:42 INFO: [hmmsearch] 941 of 1343 task(s) completed
2025-10-30 10:17:43 INFO: [hmmsearch] 1075 of 1343 task(s) completed
2025-10-30 10:17:44 INFO: [hmmsearch] 1209 of 1343 task(s) completed
2025-10-30 10:17:45 INFO: [hmmsearch] 1343 of 1343 task(s) completed
2025-10-30 10:17:47 INFO: Results: C:98.2%[S:98.1%,D:0.1%],F:0.5%,M:1.3%,n:1343
2025-10-30 10:17:47 INFO:
---------------------------------------------------
|Results from dataset rhodobacter_odb12 |
---------------------------------------------------
|C:98.2%[S:98.1%,D:0.1%],F:0.5%,M:1.3%,n:1343 |
|1319 Complete BUSCOs (C) |
|1317 Complete and single-copy BUSCOs (S) |
|2 Complete and duplicated BUSCOs (D) |
|7 Fragmented BUSCOs (F) |
|17 Missing BUSCOs (M) |
|1343 Total BUSCO groups searched |
---------------------------------------------------
2025-10-30 10:17:47 INFO: BUSCO analysis done. Total running time: 23 seconds
2025-10-30 10:17:47 INFO: Results written in /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/SUBH2
2025-10-30 10:17:47 INFO: For assistance with interpreting the results, please consult the userguide: https://busco.ezlab.org/busco_userguide.html
2025-10-30 10:17:47 INFO: Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO
2025-10-30 10:17:47 INFO: Thank you for using BUSCO! Anonymous usage data is gathered to improve the tool. You may opt out with --opt-out-run-stats.
mkdir: cannot create directory ‘busco’: File exists
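To compare the three runs side by side, the one-line BUSCO score can be converted into comma-separated values. This is a minimal sketch that parses the score notation shown in the log above (`C:..%[S:..%,D:..%],F:..%,M:..%,n:..`). `busco_line_to_csv` is a hypothetical helper, and the `short_summary*.txt` file pattern inside each output folder is an assumption about how BUSCO names its summary file.

```shell
# Hypothetical helper: read text on stdin and print
# complete,single,duplicated,fragmented,missing,n for each BUSCO score line.
busco_line_to_csv() {
    sed -n -E 's/.*C:([0-9.]+)%\[S:([0-9.]+)%,D:([0-9.]+)%\],F:([0-9.]+)%,M:([0-9.]+)%,n:([0-9]+).*/\1,\2,\3,\4,\5,\6/p'
}

# Collect one row per run; the folder names are the -o values used above.
for run in DSM NCBI_Ref SUBH2; do
    for f in "$run"/short_summary*.txt; do   # assumed summary file pattern
        [ -f "$f" ] || continue
        printf '%s,' "$run"
        grep -m 1 'C:' "$f" | busco_line_to_csv
    done
done
```

Lines that do not contain a score are ignored, so the helper can also be fed a complete log.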
CheckM2 (completeness & contamination)
In [5]:
conda activate checkm2
cd /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes
#mkdir CheckM2
checkm2 predict --threads 30 --force --input ./DSM158.fasta ./GCF_000012905.2_ASM1290v2_genomic.fasta ./GCF_049434525.1_MWCSPHH2ANNA_genomic.fasta --output-directory ./CheckM2
[10/30/2025 10:17:54 AM] INFO: Running CheckM2 version 1.1.0
[10/30/2025 10:17:54 AM] INFO: Running quality prediction workflow with 30 threads.
[10/30/2025 10:17:55 AM] INFO: Calling genes in 3 bins with 30 threads:
Finished processing 3 of 3 (100.00%) bins.
[10/30/2025 10:18:31 AM] INFO: Calculating metadata for 3 bins with 30 threads:
Finished processing 3 of 3 (100.00%) bin metadata.
[10/30/2025 10:18:31 AM] INFO: Annotating input genomes with DIAMOND using 30 threads
[10/30/2025 10:19:09 AM] INFO: Processing DIAMOND output
[10/30/2025 10:19:09 AM] INFO: Predicting completeness and contamination using ML models.
[10/30/2025 10:19:14 AM] INFO: Parsing all results and constructing final output table.
[10/30/2025 10:19:14 AM] INFO: CheckM2 finished successfully.
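The predictions can then be screened against quality thresholds. The sketch below assumes that CheckM2 wrote a tab-delimited `quality_report.tsv` into the output directory with the columns `Name`, `Completeness` and `Contamination`; columns are looked up by header name, so their order does not matter. `checkm2_filter` is a hypothetical helper, and the default thresholds (at least 90 % complete, at most 5 % contaminated) are illustrative values, not ones taken from this notebook.

```shell
# Hypothetical helper: list genomes that pass completeness/contamination cutoffs.
# Usage: checkm2_filter REPORT [MIN_COMPLETENESS] [MAX_CONTAMINATION]
checkm2_filter() {
    awk -F'\t' -v minc="${2:-90}" -v maxc="${3:-5}" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to columns
        $(col["Completeness"]) + 0 >= minc && $(col["Contamination"]) + 0 <= maxc {
            printf "%s\t%s\t%s\n", $(col["Name"]), $(col["Completeness"]), $(col["Contamination"])
        }
    ' "$1"
}

# Only run when the report from the cell above is present (assumed file name).
if [ -f ./CheckM2/quality_report.tsv ]; then
    checkm2_filter ./CheckM2/quality_report.tsv
fi
```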
Pangenome construction using PPanGGOLiN
Prepare the genomes.gbff.list file for PPanGGOLiN
In [7]:
cd /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/Ppanggolin/
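rm -f genomes.gbff.list # start from a clean list; -f avoids an error if the file does not exist
touch genomes.gbff.list
cd ../genomes/Annot
# one line per genome: <file name><TAB><absolute path to the annotation file>
for i in *.gbff *.gbk; do printf '%s\t%s\n' "$i" "$(pwd)/$i" >> ../../Ppanggolin/genomes.gbff.list; done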
cd ../../Ppanggolin
cat --show-tabs genomes.gbff.list
GCF_000012905.2.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_000012905.2.gbff
GCF_000015985.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_000015985.1.gbff
GCF_000021005.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_000021005.1.gbff
GCF_000212605.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_000212605.1.gbff
GCF_000269625.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_000269625.1.gbff
GCF_000273405.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_000273405.1.gbff
GCF_001576595.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_001576595.1.gbff
GCF_001685625.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_001685625.1.gbff
GCF_002706325.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_002706325.1.gbff
GCF_003324715.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_003324715.1.gbff
GCF_003846365.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_003846365.1.gbff
GCF_003846385.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_003846385.1.gbff
GCF_003846405.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_003846405.1.gbff
GCF_003846425.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_003846425.1.gbff
GCF_012647365.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_012647365.1.gbff
GCF_049434525.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_049434525.1.gbff
GCF_052246835.1.gbff^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/GCF_052246835.1.gbff
DSM158.gbk^I/home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/genomes/Annot/DSM158.gbk
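Because the list is appended to in a loop, it can be worth verifying it before starting the pangenome workflow. The sketch below checks that every line holds a name and a path separated by a tab and that the path points to an existing file. `check_anno_list` is a hypothetical helper name; it prints the number of problematic lines to stdout and the details to stderr.

```shell
# Hypothetical helper: validate a two-column (name<TAB>path) annotation list.
check_anno_list() {
    list=$1
    missing=0
    tab=$(printf '\t')
    while IFS="$tab" read -r name path; do
        [ -n "$name" ] || continue                     # skip blank lines
        if [ -z "$path" ] || [ ! -f "$path" ]; then
            echo "problem: '$name' -> '${path:-<no path>}'" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "$missing"
}

# Only run when the list from the cell above is present.
if [ -f genomes.gbff.list ]; then
    check_anno_list genomes.gbff.list
fi
```

A result of 0 means every listed annotation file was found.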
Calculate the pangenome
In [9]:
cd /home/Drives/HDD03_06T_SDE/anna/SyntenyPlotDSMSubH2/Ppanggolin/
conda activate ppanggolin
ppanggolin workflow -f -o ./output --cpu 20 --anno genomes.gbff.list
conda deactivate
2025-10-30 10:19:52 main.py:l146 INFO Command: /home/jupyter-anna/.conda/envs/ppanggolin/bin/ppanggolin workflow -f -o ./output --cpu 20 --anno genomes.gbff.list
2025-10-30 10:19:52 main.py:l147 INFO PPanGGOLiN version: 1.0.13
2025-10-30 10:19:52 annotate.py:l309 INFO Reading genomes.gbff.list the list of organism files ...
Processing DSM158.gbk: 100%|███████| 18/18 [00:09<00:00, 1.95annotation file/s]
2025-10-30 10:20:01 writeBinaries.py:l387 INFO Writing genome annotations...
100%|███████████████████████████████████████| 18/18 [00:00<00:00, 92.12genome/s]
2025-10-30 10:20:01 writeBinaries.py:l400 INFO writing the protein coding gene dna sequences
100%|███████████████████████████████| 77147/77147 [00:00<00:00, 105161.82gene/s]
2025-10-30 10:20:02 writeBinaries.py:l426 INFO Done writing the pangenome. It is in file : output/pangenome.h5
2025-10-30 10:20:02 cluster.py:l158 INFO Writing all of the CDS sequences for clustering...
100%|███████████████████████████████| 77147/77147 [00:00<00:00, 323343.87gene/s]
2025-10-30 10:20:02 cluster.py:l201 INFO Clustering all of the genes sequences...
2025-10-30 10:20:02 cluster.py:l45 INFO Creating sequence database...
2025-10-30 10:20:03 cluster.py:l54 INFO Clustering sequences...
2025-10-30 10:20:05 cluster.py:l56 INFO Extracting cluster representatives...
2025-10-30 10:20:05 cluster.py:l68 INFO Writing gene to family informations
2025-10-30 10:20:06 cluster.py:l148 INFO Adding protein sequences to the gene families
2025-10-30 10:20:06 cluster.py:l130 INFO Adding 77147 genes to the gene families
100%|███████████████████████████████| 77147/77147 [00:00<00:00, 230322.64gene/s]
2025-10-30 10:20:06 makeGraph.py:l56 INFO Computing the neighbors graph...
Processing DSM158.gbk: 100%|██████████████| 18/18 [00:00<00:00, 42.71organism/s]
2025-10-30 10:20:07 makeGraph.py:l74 INFO Done making the neighbors graph.
2025-10-30 10:20:07 partition.py:l349 INFO Estimating the optimal number of partitions...
100%|███████████████| 19/19 [00:01<00:00, 11.59Number of number of partitions/s]
2025-10-30 10:20:08 partition.py:l351 INFO The number of partitions has been evaluated at 3
2025-10-30 10:20:08 partition.py:l369 INFO Partitioning...
2025-10-30 10:20:09 partition.py:l429 INFO Partitionned 18 genomes in 0.31 seconds.
2025-10-30 10:20:09 writeBinaries.py:l405 INFO Writing gene families and gene associations...
100%|██████████████████████████| 6224/6224 [00:00<00:00, 107409.53gene family/s]
2025-10-30 10:20:09 writeBinaries.py:l407 INFO Writing gene families information...
100%|██████████████████████████| 6224/6224 [00:00<00:00, 312541.58gene family/s]
2025-10-30 10:20:09 writeBinaries.py:l414 INFO Writing the edges...
100%|██████████████████████████████████| 7398/7398 [00:00<00:00, 97064.43edge/s]
2025-10-30 10:20:09 writeBinaries.py:l328 INFO Updating gene families with partition information
100%|██████████████████████████| 6224/6224 [00:00<00:00, 195830.26gene family/s]
2025-10-30 10:20:09 writeBinaries.py:l426 INFO Done writing the pangenome. It is in file : output/pangenome.h5
2025-10-30 10:20:09 tile_plot.py:l38 INFO Drawing the tile plot...
2025-10-30 10:20:09 tile_plot.py:l54 INFO start with matrice
2025-10-30 10:20:09 tile_plot.py:l69 INFO done with making the dendrogram to order the organisms on the plot
2025-10-30 10:20:09 tile_plot.py:l104 INFO Getting the gene name(s) and the number for each tile of the plot ...
2025-10-30 10:20:09 tile_plot.py:l113 INFO Done extracting names and numbers. Making the heatmap ...
2025-10-30 10:20:10 tile_plot.py:l169 INFO Drawing the figure itself...
2025-10-30 10:20:12 tile_plot.py:l171 INFO Done with the tile plot : './output/tile_plot.html'
2025-10-30 10:20:12 ucurve.py:l13 INFO Drawing the U-shaped curve...
2025-10-30 10:20:13 ucurve.py:l60 INFO Done drawing the U-shaped curve : './output/Ushaped_plot.html'
2025-10-30 10:20:13 writeFlat.py:l225 INFO Writing the .csv file ...
2025-10-30 10:20:13 writeFlat.py:l281 INFO Writing the gene presence absence file ...
2025-10-30 10:20:13 writeFlat.py:l213 INFO Writing the gexf file for the pangenome graph...
2025-10-30 10:20:13 writeFlat.py:l213 INFO Writing the light gexf file for the pangenome graph...
2025-10-30 10:20:13 writeFlat.py:l421 INFO Writing the projection files...
2025-10-30 10:20:13 writeFlat.py:l304 INFO Writing pangenome statistics...
2025-10-30 10:20:13 writeFlat.py:l305 INFO Writing statistics on persistent duplication...
2025-10-30 10:20:13 writeFlat.py:l106 INFO Writing the json file for the pangenome graph...
2025-10-30 10:20:13 writeFlat.py:l430 INFO Writing the list of gene families for each partitions...
2025-10-30 10:20:13 writeFlat.py:l301 INFO Done writing the gene presence absence file : './output/gene_presence_absence.Rtab'
2025-10-30 10:20:13 writeFlat.py:l456 INFO Done writing the list of gene families for each partition
2025-10-30 10:20:13 writeFlat.py:l326 INFO Done writing stats on persistent duplication
2025-10-30 10:20:13 writeFlat.py:l327 INFO Writing genome per genome statistics (completeness and counts)...
2025-10-30 10:20:13 writeFlat.py:l389 INFO Done writing genome per genome statistics
2025-10-30 10:20:13 writeFlat.py:l278 INFO Done writing the matrix : './output/matrix.csv'
2025-10-30 10:20:13 writeFlat.py:l222 INFO Done writing the gexf file : './output/pangenomeGraph_light.gexf'
2025-10-30 10:20:13 writeFlat.py:l222 INFO Done writing the gexf file : './output/pangenomeGraph.gexf'
2025-10-30 10:20:13 writeFlat.py:l427 INFO Done writing the projection files
2025-10-30 10:20:14 writeFlat.py:l113 INFO Done writing the json file : './output/pangenomeGraph.json'
Genes : 77147
Organisms : 18
Families : 6224
Edges : 7398
Persistent ( min:0.67, max:1.0, sd:0.04, mean:0.99 ): 3708
Shell ( min:0.33, max:0.83, sd:0.1, mean:0.59 ): 538
Cloud ( min:0.06, max:0.61, sd:0.1, mean:0.12 ): 1978
Number of partitions : 3
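The presence/absence table written by the workflow can be used to list gene families that separate two genomes, for example the DSM158 strain and the H2 substrain. This is a minimal sketch, assuming `gene_presence_absence.Rtab` is tab-delimited with a header row (`Gene` followed by one column per genome) and 0/1 values. It further assumes that the genome columns carry the names from the first column of `genomes.gbff.list`. `families_only_in` is a hypothetical helper name.

```shell
# Hypothetical helper: print families present in genome A but absent in genome B.
# Usage: families_only_in TABLE GENOME_A GENOME_B
families_only_in() {
    awk -F'\t' -v a="$2" -v b="$3" '
        NR == 1 {
            for (i = 2; i <= NF; i++) { name = $i; gsub(/"/, "", name); col[name] = i }
            if (!(a in col) || !(b in col)) { print "genome column not found" > "/dev/stderr"; exit 2 }
            next
        }
        $(col[a]) + 0 > 0 && $(col[b]) + 0 == 0 { fam = $1; gsub(/"/, "", fam); print fam }
    ' "$1"
}

# Only run when the table from the workflow above is present.
# The column names below are assumptions derived from genomes.gbff.list.
if [ -f ./output/gene_presence_absence.Rtab ]; then
    families_only_in ./output/gene_presence_absence.Rtab DSM158.gbk GCF_049434525.1.gbff
fi
```

Swapping the last two arguments gives the families found only in the other genome.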