vg
tools for working with variation graphs
#include <gbwt_extender.hpp>
Classes
  struct ErrorModel

Public Member Functions
  WFAExtender ()
      Create an empty WFAExtender.
  WFAExtender (const gbwtgraph::GBWTGraph &graph, const Aligner &aligner, const ErrorModel &error_model=default_error_model)
  WFAAlignment connect (std::string sequence, pos_t from, pos_t to) const
  WFAAlignment suffix (const std::string &sequence, pos_t from) const
  WFAAlignment prefix (const std::string &sequence, pos_t to) const

Public Attributes
  const gbwtgraph::GBWTGraph * graph
  ReadMasker mask
  const Aligner * aligner
  const ErrorModel * error_model

Static Public Attributes
  static const ErrorModel default_error_model
      If not specified, we use this default error model.
A class that supports haplotype-consistent seed extension in a GBWTGraph using the WFA algorithm:
Marco-Sola, Moure, Moreto, Espinosa: Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics, 2021.
The algorithm either tries to connect two seeds or extends a seed to the start/end of the read.
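To make the three cases concrete, here is a hypothetical planning step that is not taken from vg: given seeds sorted by read position, the part of the read before the first seed would go to prefix(), each stretch between consecutive seeds to connect(), and the part after the last seed to suffix(). The names Seed, Task, ExtensionTask and plan_extensions are illustrative assumptions, and the graph positions that the real calls take are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical seed: an exact match covering read[read_start, read_end).
struct Seed {
    std::size_t read_start, read_end;
};

// Which WFAExtender-style call would handle an interval.
enum class Task { Prefix, Connect, Suffix };

struct ExtensionTask {
    Task kind;
    std::size_t read_start, read_end;  // read interval to align
};

// Assumes `seeds` are sorted by read position and do not overlap.
std::vector<ExtensionTask> plan_extensions(std::size_t read_length,
                                           const std::vector<Seed>& seeds) {
    std::vector<ExtensionTask> tasks;
    if (seeds.empty()) {
        return tasks;
    }
    // Read bases before the first seed: an alignment ending at that seed.
    if (seeds.front().read_start > 0) {
        tasks.push_back({Task::Prefix, 0, seeds.front().read_start});
    }
    // Bases between consecutive seeds: an alignment connecting them.
    for (std::size_t i = 1; i < seeds.size(); ++i) {
        tasks.push_back({Task::Connect, seeds[i - 1].read_end, seeds[i].read_start});
    }
    // Read bases after the last seed: an alignment starting at that seed.
    if (seeds.back().read_end < read_length) {
        tasks.push_back({Task::Suffix, seeds.back().read_end, read_length});
    }
    return tasks;
}
```

Adjacent seeds still yield a connect() task with an empty read interval in this sketch, since the graph path between them may require a deletion.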
WFAExtender also needs an Aligner object for scoring the extension candidates. While VG maximizes a four-parameter alignment score (match, mismatch, gap open, gap extend), WFA minimizes a three-parameter score (mismatch, gap open, gap extend, with matches costing nothing). We use the conversion between the parameters from:
Eizenga, Paten: Improving the time and space complexity of the WFA algorithm and generalizing its scoring. bioRxiv, 2022.
VG scores a gap of length n as gap_open + (n - 1) * gap_extend, while WFA papers use gap_open + n * gap_extend. Hence we use gap_open - gap_extend as the effective four-parameter gap open score inside the aligner.
NOTE: Most internal arithmetic operations use 32-bit integers.
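The gap-cost relation and the score conversion described above can be checked numerically. The sketch below is an illustration, not vg's code: VgParams, AlignmentCounts, WfaCosts, to_wfa_costs and score_from_cost are hypothetical names. The WFA costs (match 0, mismatch 2 * (match + mismatch), gap open 2 * (gap_open - gap_extend), gap extend 2 * gap_extend + match) are derived here by eliminating the match count from the VG score; we assume, without having confirmed it from the source, that they agree with the constants vg uses. Arithmetic is in 32-bit integers, as in the note above.

```cpp
#include <cassert>
#include <cstdint>

// VG-style parameters, all given as positive magnitudes (illustrative type).
struct VgParams {
    std::int32_t match, mismatch, gap_open, gap_extend;
};

// Summary of one alignment (hypothetical helper type).
struct AlignmentCounts {
    std::int32_t matches, mismatches;
    std::int32_t gaps;        // number of gaps (insertions and deletions)
    std::int32_t gap_length;  // total length of all gaps
};

// VG: the first gap position costs gap_open, each further one gap_extend.
std::int32_t vg_gap_penalty(const VgParams& p, std::int32_t n) {
    return p.gap_open + (n - 1) * p.gap_extend;
}

// WFA papers: an opening cost plus gap_extend for every position.
std::int32_t wfa_gap_penalty(std::int32_t open, std::int32_t extend, std::int32_t n) {
    return open + n * extend;
}

// Four-parameter VG score (maximized). Each gap contributes
// (gap_open - gap_extend) + length * gap_extend.
std::int32_t vg_score(const VgParams& p, const AlignmentCounts& a) {
    return p.match * a.matches - p.mismatch * a.mismatches
         - (a.gaps * (p.gap_open - p.gap_extend) + a.gap_length * p.gap_extend);
}

// Three-parameter WFA costs (minimized); a match costs 0.
struct WfaCosts {
    std::int32_t mismatch, gap_open, gap_extend;
};

// With N = query length and M = reference length,
// 2 * matches = N + M - 2 * mismatches - gap_length, which gives these costs.
WfaCosts to_wfa_costs(const VgParams& p) {
    return {2 * (p.match + p.mismatch),
            2 * (p.gap_open - p.gap_extend),
            2 * p.gap_extend + p.match};
}

std::int32_t wfa_cost(const WfaCosts& c, const AlignmentCounts& a) {
    return c.mismatch * a.mismatches + c.gap_open * a.gaps + c.gap_extend * a.gap_length;
}

// N + M is fixed for a given problem, so minimizing the cost maximizes the score.
std::int32_t score_from_cost(const VgParams& p, std::int32_t query_len,
                             std::int32_t ref_len, std::int32_t cost) {
    const std::int32_t twice_score = p.match * (query_len + ref_len) - cost;
    assert(twice_score % 2 == 0);  // exact by construction
    return twice_score / 2;
}
```

For example, with match 1, mismatch 4, gap open 6 and gap extend 1, an alignment with 10 matches, 1 mismatch and one 3-base insertion scores -2 in VG terms and costs 29 in WFA terms, and (1 * (14 + 11) - 29) / 2 recovers -2.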
vg::WFAExtender::WFAExtender()
Create an empty WFAExtender.
vg::WFAExtender::WFAExtender(const gbwtgraph::GBWTGraph& graph, const Aligner& aligner, const ErrorModel& error_model = default_error_model)
Create a WFAExtender using the given GBWTGraph and Aligner objects. If an error model is passed, use that instead of the default error model. All arguments must outlive the WFAExtender.
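The attributes graph, aligner and error_model are pointers, and the error model parameter defaults to a static object. The sketch below imitates that pattern with hypothetical stand-in types (Graph, Scorer, Model, Extender); the meaning of "empty" as "all pointers null" and the value in the default model are assumptions for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-ins for GBWTGraph, Aligner and ErrorModel.
struct Graph {};
struct Scorer {};
struct Model {
    int max_mismatches;  // illustrative field only
};

class Extender {
public:
    // Shared default, analogous to default_error_model.
    static const Model default_model;

    // "Empty" extender: assumed here to mean all pointers are null.
    Extender() : graph(nullptr), scorer(nullptr), model(nullptr) {}

    // Stores addresses only, so g, s and m must outlive the extender.
    Extender(const Graph& g, const Scorer& s, const Model& m = default_model)
        : graph(&g), scorer(&s), model(&m) {}

    bool empty() const { return graph == nullptr; }

    const Model& error_model() const {
        assert(model != nullptr);  // using an empty extender is a bug
        return *model;
    }

    const Graph* graph;
    const Scorer* scorer;
    const Model* model;
};

// The static default lives for the whole program, so it always outlives any extender.
const Model Extender::default_model{4};
```

Passing a temporary as the model (for example Extender(g, s, Model{8})) would leave a dangling pointer once the full expression ends, which is exactly what the lifetime requirement rules out.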
WFAAlignment vg::WFAExtender::connect(std::string sequence, pos_t from, pos_t to) const
Align the sequence to a haplotype between the two graph positions.
The endpoints are assumed to be valid graph positions. For an alignment to exist, some haplotype must contain both endpoints and connect them. The endpoints themselves are not covered by the returned alignment.
The sequence that will be aligned is passed by value. All non-ACGT characters are masked with character X, which should not match any character in the graph.
Returns a failed alignment if there is no alignment with an acceptable score.
NOTE: The alignment is to a path after from and before to. If the two positions are identical, such a path can only exist if there is a cycle.
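The masking rule for the input sequence is simple enough to show directly. This is a hypothetical helper, not the ReadMasker API; it assumes the check is case-sensitive, so lowercase bases are masked as well.

```cpp
#include <cassert>
#include <string>

// Replace every character other than A, C, G and T with 'X'.
// Taking the sequence by value means the caller's string is left untouched.
std::string mask_non_acgt(std::string sequence) {
    for (char& c : sequence) {
        if (c != 'A' && c != 'C' && c != 'G' && c != 'T') {
            c = 'X';
        }
    }
    return sequence;
}
```

For example, mask_non_acgt("ACNGTacgt-") returns "ACXGTXXXXX"; the masked positions can then only be aligned as mismatches or gaps.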
WFAAlignment vg::WFAExtender::prefix(const std::string& sequence, pos_t to) const
A special case of connect() for aligning the sequence to a haplotype ending at the given position. If there is no alignment for the entire sequence with an acceptable score, returns the highest-scoring partial alignment, which may be empty.
Applies the full-length bonus if the result begins with a match or mismatch. TODO: Use the full-length bonus to determine the optimal alignment.
NOTE: This creates a prefix of the full alignment by aligning a suffix of the sequence.
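The full-length bonus rule for prefix() can be sketched on a hypothetical edit list. EditType, Edit, PartialAlignment and apply_prefix_bonus are illustrative names, not vg's WFAAlignment interface. The extra requirement that the alignment start at sequence offset 0 (that is, cover the whole sequence) is an assumption added here, since a partial alignment does not reach the start of the read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical edit representation.
enum class EditType { Match, Mismatch, Insertion, Deletion };

struct Edit {
    EditType type;
    std::uint32_t length;
};

struct PartialAlignment {
    std::vector<Edit> edits;
    std::size_t seq_offset;  // first aligned position in the sequence
    std::int32_t score;
};

static bool is_match_or_mismatch(const Edit& e) {
    return e.type == EditType::Match || e.type == EditType::Mismatch;
}

// A prefix() result is anchored at `to` and grows toward the start of the
// read, so the bonus depends on the first edit.
void apply_prefix_bonus(PartialAlignment& aln, std::int32_t full_length_bonus) {
    if (aln.seq_offset == 0 && !aln.edits.empty() &&
        is_match_or_mismatch(aln.edits.front())) {
        aln.score += full_length_bonus;
    }
}
```

A suffix() result would use the mirror-image check on the last edit and on whether the alignment reaches the end of the sequence.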
WFAAlignment vg::WFAExtender::suffix(const std::string& sequence, pos_t from) const
A special case of connect() for aligning the sequence to a haplotype starting at the given position. If there is no alignment for the entire sequence with an acceptable score, returns the highest-scoring partial alignment, which may be empty.
Applies the full-length bonus if the result ends with a match or mismatch. TODO: Use the full-length bonus to determine the optimal alignment.
NOTE: This creates a suffix of the full alignment by aligning a prefix of the sequence.
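The "highest-scoring partial alignment, which may be empty" behavior and the "prefix of the sequence" note can be illustrated with a deliberately simplified stand-in. This is not WFA: it is a gapless extension along a single haplotype given as a string, and GaplessExtension and gapless_suffix are hypothetical names. Because a gapless alignment always ends with a match or mismatch, the bonus applies exactly when the whole sequence is aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

struct GaplessExtension {
    std::size_t length;  // aligned bases, always a prefix of the sequence
    std::int32_t score;
};

// `haplotype` is the sequence spelled by the haplotype just after the seed.
GaplessExtension gapless_suffix(const std::string& sequence, const std::string& haplotype,
                                std::int32_t match, std::int32_t mismatch,
                                std::int32_t full_length_bonus) {
    GaplessExtension best{0, 0};  // the empty alignment
    std::int32_t score = 0;
    const std::size_t limit = std::min(sequence.size(), haplotype.size());
    for (std::size_t i = 0; i < limit; ++i) {
        score += (sequence[i] == haplotype[i]) ? match : -mismatch;
        std::int32_t candidate = score;
        if (i + 1 == sequence.size()) {
            candidate += full_length_bonus;  // whole sequence aligned
        }
        if (candidate > best.score) {  // ties keep the shorter alignment
            best = {i + 1, candidate};
        }
    }
    return best;
}
```

With match 1, mismatch 4 and a bonus of 5, aligning ACGT against ACGTAA uses all four bases with score 9, while TTT against AAA returns the empty alignment.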
const Aligner* vg::WFAExtender::aligner
const ErrorModel vg::WFAExtender::default_error_model [static]
If not specified, we use this default error model.
const ErrorModel* vg::WFAExtender::error_model |
const gbwtgraph::GBWTGraph* vg::WFAExtender::graph |
ReadMasker vg::WFAExtender::mask |