**ICNVA** tool
==============

About
-----

A prerequisite for assessment of certain gene properties/states ("high" copy number, loss of heterozygosity [LoH]) is the accurate determination of sample ploidy and/or allele-specific copy number status in specific genomic regions. The ICNVA tool aims to generate the necessary information for input TSO500/TSO500 HT samples.

Functionality overview
----------------------

The tool runs two types of analysis on each input sample:

- Allele-specific copy number analysis performed by PureCN. (The purpose here is to assist primarily with LoH assessment.)
- Normalized read coverage (NRC) profile plotting for selected genes (these can help identify/confirm intra-genic duplications/losses).

An in-house set of controls is utilized during both analyses.

Input files
-----------

- A FASTA file utilized in the initial LocalApp analysis (e.g., "TSO_500_LocalApp_v2.2.0.12/resources/genomes/hg19_hardPAR/genome.fa").
- LocalApp-generated analysis directory for a given sample's sequencing run.
- [optional] A file with user-defined custom parameters for the PureCN analysis.
- [optional] A file with user-defined set of genes for the NRC analysis.

Output files
------------

- [optional] A directory with intermediate PureCN analysis files (output sub-directory named "TMP"). Once generated, these files can be re-used for re-running PureCN on the same sample with a new parameter set.
- A directory with PureCN analysis results utilizing the default InPreD parameters (output sub-directory named "PureCN_results_default").
- A directory with PureCN analysis results utilizing a set of custom parameters (output sub-directory named "PureCN_results_custom", the name can be changed with parameter "-\-purecn_subdirectory_name"). A predefined permissive parameter set will be used, unless a user-defined parameter set is provided via the "-\-purecn_parameter_list" option.
- A PDF file with NRC profiles for selected genes.

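
The output layout listed above can be checked quickly after a run. The following is a minimal sketch, not part of the ICNVA tool: the function name ``icnva_check_outputs`` is hypothetical, and it only tests for the three sub-directory names given in this section (the NRC PDF is left out because its file name is not stated here).

.. code-block:: shell

    # Report which ICNVA result sub-directories exist under an output directory.
    # $1: the directory passed to --output_directory
    # $2: custom results sub-directory name (optional; defaults to the
    #     documented name, change it if --purecn_subdirectory_name was used)
    icnva_check_outputs() {
        out_dir="$1"
        custom_name="${2:-PureCN_results_custom}"
        for sub in TMP PureCN_results_default "$custom_name"; do
            if [ -d "$out_dir/$sub" ]; then
                printf 'present\t%s\n' "$sub"
            else
                printf 'missing\t%s\n' "$sub"
            fi
        done
    }

For example, ``icnva_check_outputs /hs_prefix_path/ICNVA_output/tumor_DNA_A`` prints one line per sub-directory. A missing "TMP" entry is expected whenever temporary files were removed at the end of the run.
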

Additional notes
----------------

- As of ICNVA tool version 0.2.0, each PureCN analysis results directory also contains

  - a file (per reported solution) with detailed gene-wise information, useful for LoH-status checks across solutions ("_genes_s1.tsv", ..., "_genes_s.tsv");
  - a table that summarizes selected information/metrics across all reported solutions ("_solution_overview.tsv").

- It is still possible to run a curated PureCN analysis in order to generate complete detailed information for any solution (if the gene-wise table automatically generated for each solution does not provide all of the required information). Please use the "-\-curated_output" and "-\-curated_analysis" parameter pair in such cases.
- Please note that certain sample properties/quality issues (e.g., low purity, presence of sub-clones, contamination) might obscure or dilute the signal of interest, complicating both analysis types.
- Chromosome overview plots generated by PureCN (files named "_chromosomes.pdf") display incorrect segmentation. It is recommended to simply ignore these files.
- The following commands can be run (instead of the "bash /inpred/run_ICNVA.sh" command) in order to view the content of internal configuration files:

  - "cat /inpred/resources/data/valid_NRC_gene_targets.tsv": the complete list of genes that can be selected for NRC plotting;
  - "cat /inpred/resources/data/default_NRC_gene_list.tsv": the default list of genes utilized for NRC plotting (can be changed via the "-\-nrc_gene_list" option);
  - "cat /inpred/resources/data/recognized_PureCN_options.tsv": the complete list of recognized PureCN parameters for custom analysis;
  - "cat /inpred/resources/data/default_PureCN_options.tsv": PureCN parameter settings utilized during default analysis;
  - "cat /inpred/resources/data/custom_PureCN_options.tsv": PureCN parameter settings utilized during custom analysis (unless specified otherwise with "-\-purecn_parameter_list").

- Many of the tools utilized during the ICNVA sample analysis write informative messages to stderr together with potential error messages. In order to mask uninteresting messages from the error logs, most of the final "LOGS/\*stderr.log" files contain only messages that include the strings "error", "fail" or "fault" (but not "default"). The original/complete error logs in those cases are stored within the corresponding "LOGS/\*stderr.orig.log" files.

Running the tool
----------------

Command line options:

.. code-block::

    run_ICNVA.sh [options]

      --help
          Prints this help message (the program exits).

    Core path options:

      --output_directory [opath]
          Required. Absolute path to the directory in which all of the output
          files should be stored. If not existing, the directory will be created.

      --reference_fasta_file [rff]
          Required. Absolute path to an indexed reference FASTA file (e.g., the
          LocalApp pipeline's reference fasta file, which is located in
          '[LocalApp_directory]/resources/genomes/hg19_hardPAR/genome.fa').

      --host_system_mounting_directory [hsmd]
          Required. Absolute path to the host system mounting directory; the
          specified directory should include all input and output file paths
          in its directory tree.

    Tumor DNA sample related options:

      --dna_tumor_id [tid]
          Required. ID of the input tumor DNA sample, as used in the LocalApp
          output files.

      --dna_tumor_localapp_run_directory [rpath]
          Required. Absolute path to main LocalApp output directory generated
          for the sequencing run containing processed tumor DNA sample.

    Remaining customization options:

      --purecn_parameter_list [pp]
          Optional. Absolute path to a text file with custom PureCN parameter
          settings (one parameter [+ its value, where applicable] per line).
          (default value: 'NA' - an internal list will be used)

      --nrc_gene_list [nrcgl]
          Optional. Absolute path to a list of genes for which normalized read
          count plots should be generated.
          (default value: 'NA' - an internal list will be used)

      --remove_temporary_files [True|False]
          Optional. A switch enabling/disabling removal of the "TMP" output
          sub-directory (located directly under the selected output directory).
          (default value: 'True')

      --rewrite_output [True|False]
          Optional. If set to 'True', previously existing output will be
          overwritten.
          (default value: 'True')

      --curated_output [True|False]
          Optional. If set to 'True', detailed output will be created based on
          the content of previously created/edited output files (rather than
          for the top solution).
          (default value: 'False')

      --curated_analysis [default|custom]
          Optional. Which analysis should be curated?
          (default value: 'custom')

      --purecn_subdirectory_name [psn]
          Optional. Name of the PureCN results sub-directory where results
          generated with custom parameter settings should be stored (this
          directory will be placed directly under the selected output
          directory). Useful when generating PureCN results for multiple
          option sets.
          (default value: 'PureCN_results_custom')

Example invocation using the Docker image:

.. code-block::

    $ [sudo] docker run \
        --rm \
        -it \
        -v /hs_prefix_path:/inpred/data \
        inpred/icnva_main:0.2.0 \
        bash /inpred/run_ICNVA.sh \
          --host_system_mounting_directory /hs_prefix_path \
          --reference_fasta_file /hs_prefix_path/TSO_500_LocalApp_v2.2.0.12/resources/genomes/hg19_hardPAR/genome.fa \
          --output_directory /hs_prefix_path/ICNVA_output/tumor_DNA_A \
          --dna_tumor_id tumor_DNA_A \
          --dna_tumor_localapp_run_directory /hs_prefix_path/analysis/run1

(last updated: 2023-11-02)
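
Building on the example invocation, a run that uses a user-defined PureCN parameter set and stores the results in a separately named sub-directory might be assembled as sketched below. Only options documented in the help text are used. The helper name ``icnva_custom_run_cmd``, the label ``setB`` and the parameter file location are hypothetical. The sketch prints the command rather than running it, so it can be reviewed first. How an existing "TMP" directory is picked up on a later run, and how "-\-rewrite_output" affects earlier results, is not described here and should be verified before relying on it.

.. code-block:: shell

    # Placeholders taken from the example invocation; adjust to the local setup.
    HS_PREFIX="/hs_prefix_path"
    SAMPLE_ID="tumor_DNA_A"
    # Hypothetical label for a user-defined PureCN parameter set.
    PARAM_SET="setB"

    # Print (do not execute) a docker command for a custom-parameter run.
    icnva_custom_run_cmd() {
        cat <<EOF
    docker run --rm -it \\
      -v ${HS_PREFIX}:/inpred/data \\
      inpred/icnva_main:0.2.0 \\
      bash /inpred/run_ICNVA.sh \\
      --host_system_mounting_directory ${HS_PREFIX} \\
      --reference_fasta_file ${HS_PREFIX}/TSO_500_LocalApp_v2.2.0.12/resources/genomes/hg19_hardPAR/genome.fa \\
      --output_directory ${HS_PREFIX}/ICNVA_output/${SAMPLE_ID} \\
      --dna_tumor_id ${SAMPLE_ID} \\
      --dna_tumor_localapp_run_directory ${HS_PREFIX}/analysis/run1 \\
      --purecn_parameter_list ${HS_PREFIX}/ICNVA_params/${PARAM_SET}.txt \\
      --purecn_subdirectory_name PureCN_results_${PARAM_SET} \\
      --remove_temporary_files False
    EOF
    }

Calling ``icnva_custom_run_cmd`` prints the full command, which can then be pasted into a terminal (prefixed with ``sudo`` where required). Setting "-\-remove_temporary_files" to 'False' keeps the "TMP" sub-directory, and a distinct "-\-purecn_subdirectory_name" per parameter set keeps the results of different option sets apart.
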