Variant summary creation tool

About

As part of the standard LocalApp output, a variant summary file is created for each sample Pair_ID specified in the input sample sheet. Saved as [LocalApp_run_output]/Results/[Pair_ID]/[Pair_ID]_CombinedVariantOutput.tsv, these files contain the calculated TMB and MSI values, as well as the CNV, fusion and splice variants deemed “reportable” by the LocalApp pipeline.

The “summarize_run_variants” TSOPPI tool aggregates the information available in a set of CombinedVariantOutput.tsv files generated for a given run.

Please note that the information contained in CombinedVariantOutput.tsv files (and therefore also in the output of this tool) is very useful for gaining a first impression of the somatic changes in the analyzed samples. However, the complete post-processing output generated by the TSOPPI “process DNA sample” and “process RNA sample” tools provides a better basis for variant interpretation.

Input files

  • A set of [LocalApp_run_output]/Results/[Pair_ID]/[Pair_ID]_CombinedVariantOutput.tsv files for a given LocalApp analysis run (the “[LocalApp_run_output]” directory should be specified with the --analysis_results_directory parameter; the individual CombinedVariantOutput.tsv files within that directory are found automatically).
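The automatic file discovery described above can be approximated with a recursive search. The following is only a minimal sketch, not the tool's actual implementation: the function name find_combined_variant_outputs is hypothetical, and it assumes that the Pair_ID is whatever precedes the "_CombinedVariantOutput.tsv" suffix in the file name.

```python
from pathlib import Path

# File name suffix taken from the path pattern documented above.
SUFFIX = "_CombinedVariantOutput.tsv"


def find_combined_variant_outputs(results_dir):
    """Return a {Pair_ID: path} mapping for all matching files under results_dir.

    Hypothetical helper: the Pair_ID is assumed to be the part of the
    file name that precedes the "_CombinedVariantOutput.tsv" suffix.
    """
    found = {}
    # rglob walks the whole directory tree, so the search also works when
    # the files are nested at a different depth than Results/[Pair_ID]/.
    for path in sorted(Path(results_dir).rglob("*" + SUFFIX)):
        pair_id = path.name[: -len(SUFFIX)]
        found[pair_id] = path
    return found
```

Calling find_combined_variant_outputs("/hs_prefix_path/analysis/run1") would return one entry per sample, which matches the one-line-per-sample layout of the summary table.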

Running the tool

Command line options:

usage: summarize_run_variants.py [-h] [-v] -r ANALYSIS_RESULTS_DIRECTORY -o
                               OUTPUT_FILE -s HOST_SYSTEM_MOUNTING_DIRECTORY
                               [-c CONTAINER_MOUNTING_DIRECTORY]

Condense all CombinedVariantOutput.tsv files contained in the specified
directory`s tree into a variant-overview table. The resulting table contains
one line per sample (i.e., one line per found CombinedVariantOutput.tsv
file).

  -r ANALYSIS_RESULTS_DIRECTORY, --analysis_results_directory ANALYSIS_RESULTS_DIRECTORY
                        absolute path to a TSO500 LocalApp output directory
                        (or some other directory containing
                        CombinedVariantOutput.tsv files in its directory
                        tree)
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        absolute path to the output table/file with per-sample
                        variant summaries
  -s HOST_SYSTEM_MOUNTING_DIRECTORY, --host_system_mounting_directory HOST_SYSTEM_MOUNTING_DIRECTORY
                        absolute path to the host system mounting directory;
                        the specified directory should include all input and
                        output file paths in its directory tree

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program`s version number and exit
  -c CONTAINER_MOUNTING_DIRECTORY, --container_mounting_directory CONTAINER_MOUNTING_DIRECTORY
                        container`s inner mounting point; the host system
                        mounting directory path/prefix will be replaced by the
                        container mounting directory path in all input and
                        output file paths (this parameter likely shouldn`t be
                        changed)

Example invocation using the Docker image:

$ [sudo] docker run \
    --rm \
    -it \
    -v /hs_prefix_path:/inpred/data \
    inpred/tsoppi_main:v0.1 \
      python /inpred/user_scripts/summarize_run_variants.py \
        --analysis_results_directory /hs_prefix_path/analysis/run1 \
        --output_file /hs_prefix_path/postprocessing/run1/run1_variant_summary.tsv \
        --host_system_mounting_directory /hs_prefix_path
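In the invocation above, all paths are given as host system paths, and the tool replaces the host mounting prefix with the container mounting directory. The following sketch illustrates that prefix replacement; it is not taken from the tool's source. The function name to_container_path is hypothetical, and the default of /inpred/data is an assumption based on the volume mapping in the example (-v /hs_prefix_path:/inpred/data).

```python
from pathlib import PurePosixPath


def to_container_path(host_path, host_mount, container_mount="/inpred/data"):
    """Translate a host system path into the matching path inside the container.

    Hypothetical helper; the "/inpred/data" default is assumed from the
    Docker volume mapping shown in the example invocation.
    """
    host_path = PurePosixPath(host_path)
    try:
        # Fails if host_path does not lie inside the mounted directory tree.
        relative = host_path.relative_to(host_mount)
    except ValueError:
        raise ValueError(f"{host_path} is not located under {host_mount}")
    return str(PurePosixPath(container_mount) / relative)


print(to_container_path("/hs_prefix_path/analysis/run1", "/hs_prefix_path"))
```

This also shows why the host system mounting directory has to include all input and output paths in its directory tree: a path outside of it has no counterpart inside the container.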

Output files

A single output file (with the path specified by parameter --output_file) is created; it contains aggregated and re-formatted information retrieved from the input files. Please view the file’s header for details regarding the output format.
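To inspect the header and the per-sample lines programmatically, something like the sketch below may be sufficient. It assumes that the output is a tab-separated table whose first line holds the column names; this is not confirmed by the description above and should be checked against a real output file. The function name read_summary is hypothetical.

```python
import csv


def read_summary(path):
    """Return (header, rows) for a tab-separated summary table.

    Assumption: the first line of the file is a single header row with
    the column names, and each following line describes one sample.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        # Pair each value with its column name for easier lookups.
        rows = [dict(zip(header, row)) for row in reader]
    return header, rows
```

For example, header, rows = read_summary("/hs_prefix_path/postprocessing/run1/run1_variant_summary.tsv") would give the column names in header and one dictionary per sample in rows.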