Chapter 4 Understanding pVACtools outputs
4.1 Learning Objectives
This chapter will cover:
- Understanding the output files produced by pVACtools
- Interpreting the .filtered.tsv file
- Interpreting the .aggregated.tsv file
4.2 pVACtools Output Files
Both pVACseq and pVACfuse produce three main output files:
- The all_epitopes.tsv file is a TSV file with all predicted neoantigens and all information obtained during the run.
- The filtered.tsv file has the same structure as the all_epitopes.tsv file, but the entries have been filtered down according to the thresholds set by the user during the run. The filters are explained in subsequent sections.
- The aggregated.tsv file is a condensed output file that contains only the information most pertinent to interpreting the results. It contains only the best neoantigen candidate for each variant. Our heuristic for determining the best neoantigen is described in subsequent sections of this course.
There are also a number of secondary output files produced by pVACseq and pVACfuse. The most important are:
- aggregated.metrics.json: This file is only produced by pVACseq. It contains metadata needed for visualizing your results in pVACview.
- aggregated.tsv.reference_matches: This file is created when the reference proteome match feature is enabled during a run. It contains detailed information about the reference matches found, if there are any.
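Because the three main reports are plain tab-separated files, they can be inspected with standard tooling. The following is a minimal sketch using Python's csv module; the helper name read_tsv, the in-memory example, its two columns, and the commented-out file path are all assumptions made for illustration and do not reflect the real column set.

```python
import csv
import io


def read_tsv(handle):
    """Return the rows of a tab-separated report as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Stand-in for a real report; the two columns and their values are made up.
example_report = io.StringIO("Peptide\tScore\nPEPTIDEA\t120.5\nPEPTIDEB\t842.0\n")
rows = read_tsv(example_report)
print(len(rows), rows[0]["Peptide"])

# With a real run you would pass an open file instead, for example:
# with open("sample.filtered.tsv", newline="") as handle:  # hypothetical path
#     rows = read_tsv(handle)
```

Note that csv.DictReader returns every value as a string, so numeric columns need to be converted before they are compared against thresholds.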
4.3 Interpreting the filtered.tsv File
The filtered.tsv file takes all the predicted neoantigens from the all_epitopes.tsv file and applies a number of filters to it. Filters are applied consecutively, meaning that only the entries passing the first filter will be passed along to the second filter, and so on. Only neoantigens passing all filters will be reported in this file.
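The consecutive behaviour can be pictured as a chain of predicates, where each filter only sees the survivors of the previous one. This is a minimal sketch, not pVACtools code; the function name, the record keys, and the two toy filters with their cutoffs are assumptions chosen for illustration.

```python
def apply_filters(candidates, filters):
    """Apply each filter in order; only survivors reach the next filter."""
    remaining = list(candidates)
    for keep in filters:
        remaining = [c for c in remaining if keep(c)]
    return remaining


# Hypothetical candidate records with made-up values.
candidates = [
    {"peptide": "PEPTIDEA", "ic50": 120.0, "tumor_dna_depth": 40},
    {"peptide": "PEPTIDEB", "ic50": 900.0, "tumor_dna_depth": 55},
    {"peptide": "PEPTIDEC", "ic50": 80.0, "tumor_dna_depth": 3},
]

filters = [
    lambda c: c["ic50"] <= 500,            # toy binding filter
    lambda c: c["tumor_dna_depth"] >= 10,  # toy coverage filter
]

passing = apply_filters(candidates, filters)
print([c["peptide"] for c in passing])
```

Here PEPTIDEB is removed by the first filter and PEPTIDEC by the second, so only PEPTIDEA would be reported.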
4.3.1 Binding Filter
The binding filter’s primary function is to filter neoantigen candidates on their IC50 binding affinity to an HLA allele. Because pVACtools allows users to run more than one prediction algorithm, we apply two summarization methods to the calls for each neoantigen candidate and HLA allele combination: (1) pVACtools calculates the median IC50 binding affinity across all selected prediction algorithms (reported in the Median [MT] IC50 Score column), and (2) pVACtools selects the IC50 binding affinity prediction with the lowest value (reported in the Best [MT] IC50 Score column). By default, the binding filter is applied to the median IC50 score unless users set the --top-score-metric parameter to lowest.
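The two summarization methods amount to a median and a minimum over the per-algorithm predictions. The sketch below illustrates this with made-up IC50 values; the algorithm names, function names, and the "median" label for the default metric are placeholders, not pVACtools internals.

```python
from statistics import median


def summarize_ic50(predictions):
    """Return (median, lowest) IC50 across the selected algorithms."""
    values = list(predictions.values())
    return median(values), min(values)


def score_for_filtering(median_score, best_score, top_score_metric="median"):
    """Pick the score the binding filter would be applied to."""
    return best_score if top_score_metric == "lowest" else median_score


# Made-up IC50 predictions for one peptide and HLA allele combination.
mt_predictions = {"AlgorithmA": 150.0, "AlgorithmB": 420.0, "AlgorithmC": 610.0}

median_ic50, best_ic50 = summarize_ic50(mt_predictions)
print(median_ic50, best_ic50)
print(score_for_filtering(median_ic50, best_ic50, top_score_metric="lowest"))
```

The median is more robust to a single optimistic algorithm, while the lowest value is more permissive because one strong prediction is enough.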
The binding filter discards candidates where the binding affinity is above the --binding-threshold (default: 500). However, users may set the --allele-specific-binding-thresholds flag in order to use differing binding thresholds depending on the HLA allele of the prediction, as recommended by IEDB. Custom thresholds are available for the most common 76 class I HLA alleles. For all others, the --binding-threshold value is used.
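The fallback from an allele-specific cutoff to the general threshold can be sketched as a dictionary lookup with a default. Everything below is illustrative: the allele name and its cutoff are placeholders and NOT the IEDB-recommended values, and the function names are hypothetical.

```python
DEFAULT_BINDING_THRESHOLD = 500.0  # default of --binding-threshold per the text

# Placeholder table: a made-up allele with a made-up cutoff, not IEDB data.
HYPOTHETICAL_ALLELE_THRESHOLDS = {"HLA-X*00:01": 300.0}


def binding_cutoff(allele, binding_threshold=DEFAULT_BINDING_THRESHOLD,
                   allele_specific=False):
    """Return the cutoff to use for a prediction against this allele."""
    if allele_specific:
        # Alleles without a custom value fall back to the general threshold.
        return HYPOTHETICAL_ALLELE_THRESHOLDS.get(allele, binding_threshold)
    return binding_threshold


def passes_binding(ic50, allele, **options):
    """Discard candidates whose affinity is above the applicable cutoff."""
    return ic50 <= binding_cutoff(allele, **options)


print(passes_binding(400.0, "HLA-X*00:01"))
print(passes_binding(400.0, "HLA-X*00:01", allele_specific=True))
print(passes_binding(400.0, "HLA-Y*00:02", allele_specific=True))
```

With the flag off, 400 passes the general threshold of 500; with the flag on, the same prediction fails the stricter placeholder cutoff for the first allele but still passes for an allele that has no custom value.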
In addition to the binding affinity, other optional parameters can be set to enable additional filtering on related metrics:
- --minimum-fold-change: The fold change is the ratio of the wild-type binding affinity to the mutant binding affinity (WT IC50 / MT IC50), also called agretopicity. A fold change greater than 1 means that the mutant is a better binder than the wild type. pVACtools calculates this ratio for both the median and the lowest values. Which one is filtered on depends again on the --top-score-metric set. When a minimum fold change parameter is set, the binding filter discards any prediction with an agretopicity below the set cutoff. This parameter is not available in pVACfuse because there is no matched wild-type peptide for each neoantigen candidate.
- --percentile-threshold: The prediction algorithms supported by pVACtools also report a percentile score that represents where each neoantigen’s predicted affinity falls in the range of other values for an HLA allele. Similar to the binding affinity itself, pVACtools reports the median and the lowest percentile scores across the prediction algorithms chosen by the user, and which one is used for filtering is again controlled by the --top-score-metric parameter.
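As a worked illustration of these two optional checks, the sketch below computes the fold change as WT IC50 divided by MT IC50, so that larger values mean the mutant binds better than the wild type. The function names and example numbers are assumptions, and treating a percentile equal to the threshold as failing is a choice made for this sketch only.

```python
def fold_change(wt_ic50, mt_ic50):
    """Agretopicity: wild-type IC50 divided by mutant IC50."""
    return wt_ic50 / mt_ic50


def passes_optional_filters(wt_ic50, mt_ic50, mt_percentile,
                            minimum_fold_change=None,
                            percentile_threshold=None):
    """Apply the optional checks only when their parameter is set."""
    if minimum_fold_change is not None:
        if fold_change(wt_ic50, mt_ic50) < minimum_fold_change:
            return False
    if percentile_threshold is not None:
        if not mt_percentile < percentile_threshold:
            return False
    return True


# Made-up values: the mutant binds ten times more strongly than the wild type.
print(fold_change(1000.0, 100.0))
print(passes_optional_filters(1000.0, 100.0, 0.4,
                              minimum_fold_change=1, percentile_threshold=2))
```

Leaving both parameters unset makes the function accept every candidate, which mirrors the optional nature of these checks.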
4.3.2 Coverage Filter
The Coverage Filter is generally used to filter out variants that don’t have enough read support or expression. This ensures that the remaining variants are not just artifacts and that the genes are actually expressed in the patient’s RNA.
For pVACseq, this generally relies on your VCF being annotated with coverage and expression data. In our example, the VCF has already been annotated with this data. For more information about how to add coverage and expression data to your own VCFs, please see our docs.
Additionally, filtering on the normal DNA depth and variant allele frequency (VAF) requires your VCF to be a tumor-normal sample VCF and the normal sample to be identified in your pVACseq run using the --normal-sample-name parameter. If a coverage metric doesn’t apply because the underlying data is not available, NA is reported by pVACtools. By default, the filter will skip evaluating a coverage criterion when a neoantigen’s value for it is NA.
The following thresholds are applied in pVACseq by this filter:
- --normal-cov: Normal coverage cutoff. Minimum number of required reads in the normal DNA (default: 5).
- --tdna-cov: Tumor DNA coverage cutoff. Minimum number of required reads in the tumor DNA (default: 10).
- --trna-cov: Tumor RNA coverage cutoff. Minimum number of required reads in the tumor RNA (default: 10).
- --normal-vaf: Normal VAF cutoff. Only sites BELOW this cutoff in the normal DNA will be considered (default: 0.02).
- --tdna-vaf: Tumor DNA VAF cutoff. Only sites above this cutoff will be considered (default: 0.25).
- --trna-vaf: Tumor RNA VAF cutoff. Only sites above this cutoff will be considered (default: 0.25).
- --expn-val: Gene and Transcript expression cutoff. Only sites above this cutoff will be considered (default: 1.0).
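The thresholds above, together with the rule that NA values are skipped by default, can be sketched as a small rule table. This is an illustration rather than the pVACtools implementation: the record keys are hypothetical, and the exact handling of values equal to a cutoff is an assumption read off the wording of each parameter.

```python
import operator

# (record key, comparison that must hold, cutoff); cutoffs are the defaults
# listed above, keys and boundary handling are assumptions of this sketch.
PVACSEQ_COVERAGE_RULES = [
    ("normal_depth", operator.ge, 5),
    ("tumor_dna_depth", operator.ge, 10),
    ("tumor_rna_depth", operator.ge, 10),
    ("normal_vaf", operator.lt, 0.02),
    ("tumor_dna_vaf", operator.gt, 0.25),
    ("tumor_rna_vaf", operator.gt, 0.25),
    ("gene_expression", operator.gt, 1.0),
    ("transcript_expression", operator.gt, 1.0),
]


def passes_coverage(record, rules=PVACSEQ_COVERAGE_RULES):
    """Return True unless an available metric violates its cutoff."""
    for key, compare, cutoff in rules:
        value = record.get(key, "NA")
        if value == "NA":
            continue  # criterion is skipped when the data is unavailable
        if not compare(float(value), cutoff):
            return False
    return True


# Made-up records for illustration.
supported = {
    "normal_depth": 30, "tumor_dna_depth": 60, "tumor_rna_depth": 45,
    "normal_vaf": 0.0, "tumor_dna_vaf": 0.4, "tumor_rna_vaf": 0.5,
    "gene_expression": 12.0, "transcript_expression": 8.0,
}
tumor_only = dict(supported, normal_depth="NA", normal_vaf="NA")
low_vaf = dict(supported, tumor_dna_vaf=0.1)

print(passes_coverage(supported), passes_coverage(tumor_only), passes_coverage(low_vaf))
```

The tumor-only record still passes because its two normal-sample metrics are NA and therefore not evaluated.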
For pVACfuse, this filter evaluates a fusion variant’s fusion read support and fusion transcript expression.
Arriba natively outputs a number of read metrics. These are the number of supporting split fragments with an anchor in gene1 or gene2, respectively, as well as the number of pairs (fragments) of discordant mates supporting the fusion (a.k.a. spanning reads or bridge reads). The sum of these three values is reported as Read Support in pVACfuse. The fusion transcript expression is parsed from the --starfusion-file, when provided. This is reported as FFPM (fusion fragments per million total reads).
The following thresholds are applied in pVACfuse by this filter:
- --read-support: Read Support cutoff. Sites above this cutoff will be considered (default: 5).
- --expn-val: Expression cutoff. Sites above this cutoff will be considered (default: 0.1).
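The Read Support value and the two pVACfuse cutoffs can be illustrated with a short sketch. The parameter names below are descriptive placeholders for the three Arriba read counts; skipping NA values and treating a value equal to the cutoff as failing are assumptions of this sketch.

```python
def read_support(split_fragments_gene1, split_fragments_gene2, discordant_mates):
    """Read Support is the sum of the three Arriba read counts."""
    return split_fragments_gene1 + split_fragments_gene2 + discordant_mates


def passes_fusion_coverage(support, ffpm, read_support_cutoff=5, expn_val=0.1):
    """Keep fusions whose available metrics are above their cutoffs."""
    if support != "NA" and not support > read_support_cutoff:
        return False
    if ffpm != "NA" and not ffpm > expn_val:
        return False
    return True


# Made-up counts: 3 + 2 split fragments and 4 discordant mate pairs.
support = read_support(3, 2, 4)
print(support)
print(passes_fusion_coverage(support, 0.5))
print(passes_fusion_coverage(support, "NA"))  # no expression data available
```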
4.3.3 Transcript Support Level Filter
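The Transcript Support Level (TSL) Filter removes neoantigen candidates for transcripts with a high TSL, as defined by Ensembl. Ensembl TSL values range from 1 (best supported) to 5 (least supported), so a numerically high TSL indicates a poorly supported transcript.
The cutoff for this filter is set by the --maximum-transcript-support-level parameter. Transcripts with a TSL of NA will always be filtered out.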
Annotation with TSL values through VEP is only available for GRCh38. For other species and older builds, a value of “Not Supported” is written to the report and the TSL filter will skip those variants.
This filter is currently only run by pVACseq.
4.3.4 Top Score Filter
The Top Score Filter will attempt to determine the best neoantigen candidate for each variant.
For pVACseq it works as follows. Given a set of neoantigen candidates for a variant, we first group the transcripts into sets where all transcripts in a set code for the same set of neoantigen candidates. For each transcript set we then determine the best neoantigen candidate as follows:
- Pick all neoantigens with a variant transcript that has a protein_coding Biotype.
- Of the remaining candidates, pick the ones with a variant transcript having a TSL less than or equal to the --maximum-transcript-support-level.
- Of the remaining candidates, pick the entries with no Problematic Positions.
- Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in more detail further below).
- Of the remaining candidates, pick the one with the lowest MT IC50 Score (Median or Best depending on the --top-score-metric), lowest TSL, and longest transcript.
This filter then reports the best neoantigen candidate for each transcript set.
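The pVACseq procedure can be sketched in two small functions: one groups transcripts that encode the same set of peptides, the other narrows a candidate list step by step and then ranks what is left. This is an illustrative sketch only. The record keys and function names are hypothetical, and the fallback that keeps the previous set when a step would remove every candidate is an assumption of this sketch, not something stated in the text.

```python
from collections import defaultdict


def group_transcripts(transcript_to_peptides):
    """Group transcripts that code for the same set of candidates."""
    sets = defaultdict(list)
    for transcript, peptides in transcript_to_peptides.items():
        sets[frozenset(peptides)].append(transcript)
    return list(sets.values())


def pick_best(candidates, max_tsl, top_score_metric="median"):
    """Narrow candidates step by step, then rank by IC50, TSL, and length."""
    steps = [
        lambda c: c["biotype"] == "protein_coding",
        lambda c: c["tsl"] <= max_tsl,
        lambda c: not c["problematic_positions"],
        lambda c: c["passes_anchor"],
    ]
    remaining = list(candidates)
    for keep in steps:
        narrowed = [c for c in remaining if keep(c)]
        if narrowed:  # assumption: never narrow down to an empty set
            remaining = narrowed
    score = "best_ic50" if top_score_metric == "lowest" else "median_ic50"
    return min(remaining,
               key=lambda c: (c[score], c["tsl"], -c["transcript_length"]))


# Made-up candidates for one transcript set.
base = {"biotype": "protein_coding", "tsl": 1, "problematic_positions": [],
        "passes_anchor": True, "best_ic50": 90.0, "transcript_length": 800}
candidates = [
    dict(base, peptide="A", median_ic50=200.0),
    dict(base, peptide="B", median_ic50=50.0, passes_anchor=False),
    dict(base, peptide="C", median_ic50=10.0, biotype="nonsense_mediated_decay"),
]

print(group_transcripts({"T1": ["A", "B"], "T2": ["B", "A"], "T3": ["C"]}))
print(pick_best(candidates, max_tsl=1)["peptide"])
```

Peptide A is selected even though B and C have lower IC50 values, because the biotype and anchor steps remove those two before the final ranking.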
For pVACfuse, the neoantigen candidates for each fusion are similarly grouped into sets where all transcript1-transcript2 combinations in a set code for the same set of neoantigen candidates. From there, the best neoantigen candidate for each transcript set is determined by picking the candidate with the lowest MT IC50 Score (Median or Best depending on the --top-score-metric) and the highest fusion transcript expression.
4.4 Interpreting the aggregated.tsv File
The aggregated.tsv file is a condensed output file that shows the best neoantigen candidate for each variant and reports only the information most pertinent to interpreting the results. It also assigns each of the selected neoantigen candidates a tier based on its suitability for vaccine manufacturing.
Only epitopes meeting the --aggregate-inclusion-threshold (default: 5000) are included in this report. Depending on the value used for the --top-score-metric, this means all neoantigen candidates with a Median or Best MT IC50 Score below the selected --aggregate-inclusion-threshold.
4.4.1 Determining the Best Transcript and Best Peptide of a Variant
In pVACseq, for each variant, all neoantigen candidates meeting the --aggregate-inclusion-threshold are evaluated as follows:
- Pick all entries with a variant transcript that has a protein_coding Biotype.
- Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= --maximum-transcript-support-level.
- Of the remaining entries, pick the entries with no Problematic Positions.
- Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below).
- Of the remaining entries, pick the one with the lowest MT IC50 score (Median or Best depending on the --top-score-metric), lowest Transcript Support Level, and longest transcript.
In pVACfuse, the neoantigen candidate with the lowest IC50 binding affinity for each variant is selected.
The value used for the --top-score-metric determines whether the lowest or median binding affinity is used for this comparison.
The chosen entry determines the best neoantigen candidate and the best transcript coding for it.
4.4.2 Tier and Tiering Criteria
For the purpose of assigning tiers, each best peptide is evaluated by a set of criteria. These criteria and the available tiers differ from tool to tool.
4.4.2.1 Tiering in pVACseq
The Tiers available in pVACseq are:
Tier | Criteria |
---|---|
Pass | Best Peptide passes the binding, expression, tsl, clonal, and anchor criteria |
Anchor | Best Peptide fails the anchor criteria but passes the binding, expression, tsl, and clonal criteria |
Subclonal | Best Peptide fails the clonal criteria but passes the binding, tsl, and anchor criteria |
LowExpr | Best Peptide meets the Low Expression Criteria and passes the binding, tsl, clonal, and anchor criteria |
NoExpr | Best Peptide is not expressed (RNA Expr == 0 or RNA VAF == 0) |
Poor | Best Peptide doesn’t fit in any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria |
Criteria Details
Criteria | Description | Evaluation |
---|---|---|
Binding Criteria | Pass if Best Peptide is a strong binder | IC50 MT < --binding-threshold and %ile MT < --percentile-threshold (if parameter is set). --allele-specific-binding-thresholds flag is respected. |
Expression Criteria | Pass if Best Transcript is expressed | Allele Expr > --trna-vaf * --expn-val |
Low Expression Criteria | Peptide has low expression, or no measured expression but sufficient RNA VAF and coverage | (0 < Allele Expr < --trna-vaf * --expn-val ) OR (RNA Expr == 0 AND RNA Depth > --trna-cov AND RNA VAF > --trna-vaf ) |
TSL Criteria | Pass if Best Transcript has good transcript support level | TSL <= --maximum-transcript-support-level |
Clonal Criteria | Best Peptide is likely in the founding clone of the tumor | DNA VAF > --tumor-purity / 4 |
Anchor Criteria | Fail if all mutated amino acids of the Best Peptide (Pos) are at an anchor position and the WT peptide has good binding | IC50 WT < --binding-threshold . --allele-specific-binding-thresholds flag is respected. |
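To make the interplay of the two tables concrete, the sketch below evaluates each criterion and then checks the tiers in the order they are listed. It is a reading of the tables, not the pVACtools implementation: the record keys are hypothetical, all inputs are assumed to be numeric (no NA handling), the tier order is assumed to follow the table, and the Subclonal check follows its table row literally, which does not list the expression criterion.

```python
def pvacseq_tier(p, *, max_tsl, tumor_purity, binding_threshold=500.0,
                 percentile_threshold=None, trna_vaf=0.25, trna_cov=10,
                 expn_val=1.0):
    """Assign a tier to a best-peptide record following the tables above."""
    expr_cutoff = trna_vaf * expn_val
    binding = p["mt_ic50"] < binding_threshold and (
        percentile_threshold is None
        or p["mt_percentile"] < percentile_threshold)
    expression = p["allele_expr"] > expr_cutoff
    low_expression = (0 < p["allele_expr"] < expr_cutoff) or (
        p["rna_expr"] == 0 and p["rna_depth"] > trna_cov
        and p["rna_vaf"] > trna_vaf)
    tsl = p["tsl"] <= max_tsl
    clonal = p["dna_vaf"] > tumor_purity / 4
    anchor = not (p["all_mutations_at_anchor"]
                  and p["wt_ic50"] < binding_threshold)

    if binding and expression and tsl and clonal and anchor:
        return "Pass"
    if binding and expression and tsl and clonal and not anchor:
        return "Anchor"
    if binding and tsl and anchor and not clonal:
        return "Subclonal"
    if binding and tsl and clonal and anchor and low_expression:
        return "LowExpr"
    if p["rna_expr"] == 0 or p["rna_vaf"] == 0:
        return "NoExpr"
    return "Poor"


# Made-up best-peptide record that satisfies every criterion.
good = {"mt_ic50": 100.0, "mt_percentile": 0.5, "wt_ic50": 2000.0,
        "allele_expr": 3.0, "rna_expr": 12.0, "rna_depth": 50,
        "rna_vaf": 0.4, "tsl": 1, "dna_vaf": 0.35,
        "all_mutations_at_anchor": False}

print(pvacseq_tier(good, max_tsl=1, tumor_purity=0.8))
print(pvacseq_tier(dict(good, dna_vaf=0.1), max_tsl=1, tumor_purity=0.8))
```

With a tumor purity of 0.8 the clonal cutoff is 0.2, so lowering the DNA VAF from 0.35 to 0.1 moves the same peptide from Pass to Subclonal.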
4.4.2.2 Tiering in pVACfuse
The Tiers available in pVACfuse are:
Tier | Criteria |
---|---|
Pass | Best Peptide passes the binding, read support, and expression criteria |
LowReadSupport | Best Peptide fails the read support criteria but passes the binding and expression criteria |
LowExpr | Best Peptide fails the expression criteria but passes the binding and read support criteria |
Poor | Best Peptide doesn’t fit any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria |
Criteria Details
Criteria | Description | Evaluation |
---|---|---|
Binding Criteria | Pass if Best Peptide is strong binder | IC50 MT < --binding-threshold and %ile MT < --percentile-threshold (if parameter is set). --allele-specific-binding-thresholds flag is respected. |
Read Support Criteria | Pass if the variant has read support | Read Support > --read-support |
Expression Criteria | Pass if Best Transcript is expressed | Expr > --expn-val |
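The pVACfuse tiers can be sketched in the same way. This sketch treats values above the cutoff as passing, in line with the coverage filter description ("Sites above this cutoff will be considered"). The record keys and function name are hypothetical, the defaults are the ones quoted earlier in this chapter, and NA values are not handled.

```python
def pvacfuse_tier(p, binding_threshold=500.0, percentile_threshold=None,
                  read_support=5, expn_val=0.1):
    """Assign a pVACfuse tier to a best-peptide record."""
    binding = p["mt_ic50"] < binding_threshold and (
        percentile_threshold is None
        or p["mt_percentile"] < percentile_threshold)
    support = p["read_support"] > read_support
    expression = p["expr"] > expn_val

    if binding and support and expression:
        return "Pass"
    if binding and expression and not support:
        return "LowReadSupport"
    if binding and support and not expression:
        return "LowExpr"
    return "Poor"


# Made-up fusion record that satisfies every criterion.
fusion = {"mt_ic50": 150.0, "mt_percentile": 0.8,
          "read_support": 12, "expr": 0.6}

print(pvacfuse_tier(fusion))
print(pvacfuse_tier(dict(fusion, read_support=2)))
print(pvacfuse_tier(dict(fusion, read_support=2, expr=0.01)))
```

A candidate that fails both read support and expression falls through to Poor, matching the "fails two or more criteria" description.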