vg
tools for working with variation graphs
vg::WFAExtender Class Reference

#include <gbwt_extender.hpp>

Classes

struct  ErrorModel
 

Public Member Functions

 WFAExtender ()
 Create an empty WFAExtender.
 
 WFAExtender (const gbwtgraph::GBWTGraph &graph, const Aligner &aligner, const ErrorModel &error_model=default_error_model)
 
WFAAlignment connect (std::string sequence, pos_t from, pos_t to) const
 
WFAAlignment suffix (const std::string &sequence, pos_t from) const
 
WFAAlignment prefix (const std::string &sequence, pos_t to) const
 

Public Attributes

const gbwtgraph::GBWTGraph * graph
 
ReadMasker mask
 
const Aligner * aligner
 
const ErrorModel * error_model
 

Static Public Attributes

static const ErrorModel default_error_model
 If not specified, we use this default error model.
 

Detailed Description

A class that supports haplotype-consistent seed extension in a GBWTGraph using the WFA algorithm:

Marco-Sola, Moure, Moreto, Espinosa: Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics, 2021.

The algorithm either tries to connect two seeds or extends a seed to the start/end of the read.

WFAExtender also needs an Aligner object for scoring the extension candidates. While VG wants to maximize a four-parameter alignment score, WFA minimizes a three-parameter score. We use the conversion between the parameters from:

Eizenga, Paten: Improving the time and space complexity of the WFA algorithm and generalizing its scoring. bioRxiv, 2022.

VG scores a gap of length n as gap_open + (n - 1) * gap_extend, while WFA papers use gap_open + n * gap_extend. Hence we use gap_open - gap_extend as the effective four-parameter gap open score inside the aligner.
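The gap-score adjustment above can be checked with a little arithmetic. The following is a minimal sketch, not code from vg: the helper names and the example values (gap_open = 6, gap_extend = 1) are assumptions chosen for illustration, and the formulas assume a gap length n of at least 1.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; they are not part of the vg API.

// VG convention: the first gap base costs gap_open, each further base costs gap_extend.
constexpr std::int32_t vg_gap_score(std::int32_t gap_open, std::int32_t gap_extend, std::int32_t n) {
    return gap_open + (n - 1) * gap_extend;
}

// WFA-paper convention: gap_open is paid once, and every gap base costs gap_extend.
constexpr std::int32_t wfa_gap_score(std::int32_t gap_open, std::int32_t gap_extend, std::int32_t n) {
    return gap_open + n * gap_extend;
}

// Effective gap open score that makes the WFA formula reproduce the VG score.
constexpr std::int32_t effective_gap_open(std::int32_t gap_open, std::int32_t gap_extend) {
    return gap_open - gap_extend;
}

// Assumed example values: gap_open = 6, gap_extend = 1.
static_assert(vg_gap_score(6, 1, 1) == 6, "a one-base gap costs gap_open in VG");
static_assert(wfa_gap_score(effective_gap_open(6, 1), 1, 1) == 6, "same cost after the adjustment");
static_assert(vg_gap_score(6, 1, 4) == wfa_gap_score(effective_gap_open(6, 1), 1, 4),
              "the two conventions agree for longer gaps as well");
```

With these values a gap of length 4 scores 6 + 3 * 1 = 9 under the VG formula and (6 - 1) + 4 * 1 = 9 under the WFA formula, so subtracting gap_extend from gap_open is exactly what reconciles the two conventions.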

NOTE: Most internal arithmetic operations use 32-bit integers.

Constructor & Destructor Documentation

◆ WFAExtender() [1/2]

vg::WFAExtender::WFAExtender ( )

Create an empty WFAExtender.

◆ WFAExtender() [2/2]

vg::WFAExtender::WFAExtender ( const gbwtgraph::GBWTGraph &  graph,
const Aligner &  aligner,
const ErrorModel &  error_model = default_error_model 
)

Create a WFAExtender using the given GBWTGraph and Aligner objects. If an error model is passed, use that instead of the default error model. All arguments must outlive the WFAExtender.

Member Function Documentation

◆ connect()

WFAAlignment vg::WFAExtender::connect ( std::string  sequence,
pos_t  from,
pos_t  to 
) const

Align the sequence to a haplotype between the two graph positions.

The endpoints are assumed to be valid graph positions. In order for there to be an alignment, there must be a haplotype that includes the endpoints and connects them. However, the endpoints are not covered by the returned alignment.

The sequence that will be aligned is passed by value. All non-ACGT characters are masked with character X, which should not match any character in the graph.
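The masking step described above can be sketched as follows. This is an illustration under stated assumptions, not the ReadMasker implementation used by vg: the function names are hypothetical, and the sketch assumes that only uppercase A, C, G, and T are kept, so lowercase bases are masked as well.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helpers for illustration only; vg's ReadMasker may behave differently.

// Keeps an uppercase A, C, G, or T unchanged and maps every other character to 'X'.
// Assumption: lowercase bases are treated as non-ACGT and are masked too.
constexpr char mask_base(char c) {
    return (c == 'A' || c == 'C' || c == 'G' || c == 'T') ? c : 'X';
}

// Takes the sequence by value, as connect() does, so the caller's string is not modified.
inline std::string mask_sequence(std::string sequence) {
    std::transform(sequence.begin(), sequence.end(), sequence.begin(), mask_base);
    return sequence;
}
```

Under these assumptions, mask_sequence("ACGTNNAC") yields "ACGTXXAC". Because X never occurs in the graph, every masked position is forced to align as a mismatch or fall inside a gap.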

Returns a failed alignment if there is no alignment with an acceptable score.

NOTE: The alignment is to a path after from and before to. If the points are identical, such a path can only exist if there is a cycle.

◆ prefix()

WFAAlignment vg::WFAExtender::prefix ( const std::string &  sequence,
pos_t  to 
) const

A special case of connect() for aligning the sequence to a haplotype ending at the given position. If there is no alignment for the entire sequence with an acceptable score, returns the highest-scoring partial alignment, which may be empty.

Applies the full-length bonus if the result begins with a match or mismatch. TODO: Use the full-length bonus to determine the optimal alignment.

NOTE: This creates a prefix of the full alignment by aligning a suffix of the sequence.

◆ suffix()

WFAAlignment vg::WFAExtender::suffix ( const std::string &  sequence,
pos_t  from 
) const

A special case of connect() for aligning the sequence to a haplotype starting at the given position. If there is no alignment for the entire sequence with an acceptable score, returns the highest-scoring partial alignment, which may be empty.

Applies the full-length bonus if the result ends with a match or mismatch. TODO: Use the full-length bonus to determine the optimal alignment.

NOTE: This creates a suffix of the full alignment by aligning a prefix of the sequence.

Member Data Documentation

◆ aligner

const Aligner* vg::WFAExtender::aligner

◆ default_error_model

const WFAExtender::ErrorModel vg::WFAExtender::default_error_model
static

◆ error_model

const ErrorModel* vg::WFAExtender::error_model

◆ graph

const gbwtgraph::GBWTGraph* vg::WFAExtender::graph

◆ mask

ReadMasker vg::WFAExtender::mask

The documentation for this class was generated from the following files: