Change log
==========

22-09-16 (TSOPPI version remaining at 0.3.2)

- documentation updates:

  - PCGR and CPSR version numbers in the documentation text have been changed to values correct for the 0.3.2 version of TSOPPI (version number 1.0.0 for both PCGR and CPSR);
  - the separator expected in the "Variant recurrence table creation" tool's input file has been explicitly specified in the documentation.

22-06-07 (TSOPPI version 0.3.2)

- bug fix:

  - tumor sample variants not present in the provided variant recurrence table could cause TSOPPI to crash during sample post-processing; the issue should now be fixed, and these novel variants should be labeled with "no_recurrence_data" flag in the 'Class_judgement_comments' column of the variant interpretation table.

22-03-20 (TSOPPI version 0.3.1)

- bug fix:

  - in the tumor content adjusted plots within the \*_CNV_distribution_plots.pdf output file, patient sample values below the visible range were plotted at the maximum visible values instead of being plotted at the minimum visible values; this bug is now fixed.

22-03-10 (TSOPPI version 0.3.0)

- updated:

  - continuumio/miniconda3 base Docker image of TSOPPI updated to version 4.10.3p1;
  - PCGR and CPSR updated to version 1.0.0.

- added:

  - gene-wise CNV distribution plots based on a set of 85 normal InPreD samples;
  - inclusion of LocalApp's Nirvana variant annotation (allowing for recognition of coding variants according to RefSeq, despite their "intergenic" PCGR annotation);
  - enabling use of InPreD's own small variant blacklist intended for variants appearing in the variant interpretation table of most samples; variants present on this blacklist are being redirected into a new file (\*small_variant_table_blacklisted.tsv);
  - enabling use of InPreD's own gene- and small variant- whitelists intended for treatment-relevant sites/regions; details regarding these sites/regions are being saved into a new file (\*whitelist_details.tsv);
  - new output files related to DNA coverage (\*coverage_plots.pdf, \*coverage_histogram.tsv, \*coverage_details.tsv.gz) are being created for DNA tumor samples;
  - jdk-11.0.6 and picard_2.26.2 (namely its tools "AddOrReplaceReadGroups" and "CollectSequencingArtifactMetrics") can now be utilized in order to calculate sequencing artifact metrics for the tumor DNA sample (option "--enable_CSAM_check True/False");
  - patient sex is now taken into account in CNV processing and plotting:

    - chrX non-PAR genes are assumed to have 1 copy in copy number neutral male patients;
    - LocalApp-assumed patient sex (derived from tumor DNA sample data) is now being mentioned in more output files;

  - PCGR's mutational signature analysis is being automatically run for samples with >=100 assumed somatic SNVs;
  - on-screen logs of the main sample-processing TSOPPI shell script ("process_patient_samples.sh") are now being saved into dedicated files;
  - a new file integrating splicing variant and fusion variant output meant for manual QC and review is now being generated (\*fusion_and_splice_variant_candidates.tsv); the new table is based on LocalApp's final fusion output and includes events generated by both Manta/RnaFusionFilter and SpliceGirl;
  - BL variants are now being annotated by
    CPSR (they used to be ignored in the CPSR processing previously, which was inconsistent with the PCGR annotation).

- changed:

  - :doc:`the metrics plotting tool ` now has to be invoked via a wrapper bash script ("process_metrics_files.sh") that first activates the necessary Conda environment (the parameter setup has not been changed however);
  - "--create_plots" {True,False} option has been added to the metrics plotting tool; disabling the metrics plotting functionality can be advantageous when the only desired output are text-based metrics tables (typically in scenarios when many runs are being processed together);
  - parameter "--run_completion_status_file" of the metrics plotting tool can now have value "NA" (if for example a suitable RunCompletionStatus.xml file is not available);
  - WARNING messages are now being output to STDERR logs;
  - multiple changes to the CNV summary output tables:

    - \*merged_CNV_summary_FC_sorted.tsv (replacing \*merged_CNV_summary_CN_sorted.tsv) is now being sorted according to each gene's LocalApp-reported FC value (fold change);
    - \*merged_CNV_summary_location_sorted.tsv is now being sorted according to genomic position with the expected chromosome order;
    - "Expected_germline_CN" column has been added to both above-mentioned tables;

  - output of the "summarize_run_variants.py" tool is now sorted by sample InPreD IDs;
  - the "LoH" entry from LocalApp's CNV results is no longer presented as one of the targeted genes in gene-wise tables and plots generated during the post-processing;
  - the RNA and DNA post-processing tools have been merged, many changes have been made to the resulting tool ("process_patient_samples.sh"):

    - supplying a reference FASTA file is now mandatory;
    - short options have been removed;
    - multiple parameter names have been changed;
    - keeping/deleting intermediate files is now optional (option "--remove_temporary_files True/False");
    - if matched samples (normal DNA and/or tumor RNA) are provided on input
      together with a tumor DNA sample, concordance values based on known germline SNVs are calculated and reported in a new output file (\*_sample_concordance.tsv);
    - a file listing details about all input samples (sample_list.tsv) is being created in the output directory;
    - if a matched tumor RNA sample is provided on input, small variants found in the tumor DNA sample will be investigated in the tumor RNA sample using bcftools mpileup (the RNA small variant information will be reflected in e.g., the IGV links documents and the small variant interpretation table);
    - a joint mutational signature plot is now being automatically created for each DNA tumor sample (this is just a plot, which shouldn't be confused with full mutational signature analysis);

  - changes to the variant interpretation table:

    - "MUTATION_HOTSPOT" field from PCGR/CPSR has been added;
    - a new field detailing Foundation One Liquid CDx targets has been added ("F1LCDx_targets");
    - RNA sample fields ("Depth_tumor_RNA" and "AF_tumor_RNA") have been added;
    - values necessary for some of the filters to trigger were lowered ("LOW_TUMOR_DP": 50 -> 20; "LOW_TUMOR_VAF": 0.05 -> 0.02);
    - fields "Class_judgement" and "Class_judgement_comments" are now generated pre-filled with the default values "include" and ".", unless conditions for automated variant exclusion are being met (please refer to the table's header for details);
    - the order in which the included variants are listed has been changed:

      - recognized hotspot variants (i.e., variants with a non-NA entry in the 'Mutation_hotspot' field) are now listed at the top, variants with an "NA" entry in the 'Mutation_hotspot' field are listed afterwards;
      - within each of the two groups mentioned above, the variants are further sorted first by Tier (alphabetically), then by Coding status (alphabetically), and finally by TSO500 class (reverse-alphabetically);

    - inclusion criteria based on protein change consequences (as reported by Nirvana) have been extended (the
      following SequenceOntology terms now trigger inclusion: frameshift_variant, start_lost, stop_lost, stop_gained, splice_acceptor_variant, splice_donor_variant);
    - some gene names have been harmonized in order to increase compatibility between the LocalApp and PCGR/CPSR ("MYCL1" -> "MYCL", "C11orf30" -> "C11ORF30");
    - "Coding_status" field format changes:

      - the "_variant" affix in the standard SequenceOntology terms is no longer being removed;
      - status "x_noncoding" is now reported as "x:noncoding_variant" instead;

    - field name changes in the VIT;

  - changes to TSOPPI's plots:

    - the "[deprecated]" keyword has been removed from sample QC plots;
    - Illumina's callability metric (percentage of exon bases with coverage >= 50) has been added to DNA sample QC plots;
    - contamination run metrics plots have been slightly adjusted for visual clarity;
    - multiple changes to CNV plots:

      - adding detailed chromosome-wise VAF plots with labels for variants included in the interpretation table;
      - adding chromosome-wise gene CNV plots;
      - adding a genome-wide small variant VAF plot with variant sequential order on the x-axis (instead of genomic location);

  - row names/numbers are no longer present in the intermediate "master_metrics_table.tsv" file produced during run metrics plotting (the column/field headers now refer to the correct data items);
  - headers are now present in both small variant overview tables.

- removed:

  - IGV snapshot creation functionality.

- caveats:

  - the patient sex, as estimated by the LocalApp, can be wrong.

21-06-07 (TSOPPI version 0.1)

- fixing broken IGV port command links.
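
Note on the variant ordering introduced in version 0.3.0: the sketch below illustrates the described sort order (hotspot variants first, then Tier and Coding status alphabetically, then TSO500 class reverse-alphabetically). It is only an illustration under assumptions, not TSOPPI's actual code: the function name, the dictionary keys "Tier", "Coding_status" and "TSO500_class", and all example values are hypothetical; only the 'Mutation_hotspot' field name and the "NA" convention are taken from the text above.

```python
def sort_included_variants(rows):
    """Return rows ordered as described for the variant interpretation table.

    Assumed (hypothetical) keys per row: 'Mutation_hotspot', 'Tier',
    'Coding_status', 'TSO500_class'.
    """
    # Python's sort is stable, so the least significant key is applied first:
    # TSO500 class in reverse-alphabetical order.
    ordered = sorted(rows, key=lambda r: r["TSO500_class"], reverse=True)
    # Then the more significant keys: hotspot variants (non-"NA") before
    # non-hotspot ones, followed by Tier and Coding status alphabetically.
    ordered = sorted(
        ordered,
        key=lambda r: (r["Mutation_hotspot"] == "NA", r["Tier"], r["Coding_status"]),
    )
    return ordered
```

Two stable passes are used because one key is reversed while the others are not; a single tuple key cannot express that for string values without extra work.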

21-06-02

- harmonization of parameter nomenclature across all TSOPPI tools (please note: this implies numerous parameter name changes in the tool set);
- introduction of a new parameter to the DNA and RNA post-processing tools: "--inpred_nomenclature" (:doc:`InPreD sample ID nomenclature ` will be assumed to be in use only if this parameter is set to "True");
- when applicable, the new InPreD sample ID nomenclature is now reflected in all sample-wise QC plots;
- changing multiple internal parameter values in the `DNA sample post-processing tool` [deprecated as of v.0.3] (these parameters don't affect which variants will be present in the output files, they only affect how the output variants will be flagged): MIN_TUMOR_DP: 10 -> 50; MIN_TUMOR_VAF: 0.03 -> 0.05; MAX_TUMOR_VAF: 0.98 -> 0.99;
- changing "\*htm" files into "\*html" files.

21-05-24

- when utilizing a normal sample, the pipeline version string should now correctly convey that information (stating "TN", instead of the previous erroneous "T");
- genome-wide CNV plots now display centromeres, BAF plots now show GL_P variants;
- instead of the number of processed samples, the variant recurrence strings now show the number of callable samples for given variant position.

21-04-14

- initial version.

(last updated: 2022-03-04)