vg
tools for working with variation graphs
#include <gbwt_extender.hpp>
Public Types | |
typedef GaplessExtension::seed_type | seed_type |
typedef pair_hash_set< seed_type > | cluster_type |
Public Member Functions | |
GaplessExtender () | |
Create an empty GaplessExtender. | |
GaplessExtender (const gbwtgraph::GBWTGraph &graph, const Aligner &aligner) | |
Create a GaplessExtender using the given GBWTGraph and Aligner objects. | |
std::vector< GaplessExtension > | extend (cluster_type &cluster, std::string sequence, const gbwtgraph::CachedGBWTGraph *cache=nullptr, size_t max_mismatches=MAX_MISMATCHES, double overlap_threshold=OVERLAP_THRESHOLD) const |
Static Public Member Functions | |
static seed_type | to_seed (pos_t pos, size_t read_offset) |
Convert (graph position, read offset) to a seed. | |
static pos_t | get_pos (seed_type seed) |
Get the graph position from a seed. | |
static handle_t | get_handle (seed_type seed) |
Get the handle from a seed. | |
static size_t | get_node_offset (seed_type seed) |
Get the node offset from a seed. | |
static size_t | get_read_offset (seed_type seed) |
Get the read offset from a seed. | |
static bool | full_length_extensions (const std::vector< GaplessExtension > &result, size_t max_mismatches=MAX_MISMATCHES) |
Public Attributes | |
const gbwtgraph::GBWTGraph * | graph |
const Aligner * | aligner |
ReadMasker | mask |
Static Public Attributes | |
constexpr static size_t | MAX_MISMATCHES = 4 |
The default value for the maximum number of mismatches. | |
constexpr static double | OVERLAP_THRESHOLD = 0.8 |
A class that supports haplotype-consistent seed extension using GBWTGraph. Each seed is a pair of matching read/graph positions and each extension is a gapless alignment of an interval of the read to a haplotype. A cluster is an unordered set of distinct seeds. Seeds in the same node with the same (read_offset - node_offset) difference are considered equivalent. GaplessExtender also needs an Aligner object for scoring the extension candidates.
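The equivalence rule for seeds can be illustrated with a minimal, self-contained sketch. It does not use the vg types: `ToySeed`, `diagonal()`, `equivalent()` and `distinct_seeds()` are hypothetical names introduced here, and treating "the same node" as "the same node id and orientation" is an assumption rather than something this page states.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a seed: a graph position plus a read offset.
struct ToySeed {
    std::int64_t node_id;     // node identifier
    bool is_reverse;          // orientation (assumed to be part of "the same node")
    std::size_t node_offset;  // offset within the node
    std::size_t read_offset;  // offset within the read
};

// The (read_offset - node_offset) difference that identifies a diagonal.
inline std::int64_t diagonal(const ToySeed& s) {
    return static_cast<std::int64_t>(s.read_offset) -
           static_cast<std::int64_t>(s.node_offset);
}

// Two seeds are equivalent if they are in the same node and on the same diagonal.
inline bool equivalent(const ToySeed& a, const ToySeed& b) {
    return a.node_id == b.node_id && a.is_reverse == b.is_reverse &&
           diagonal(a) == diagonal(b);
}

// Keep the first seed from each equivalence class (quadratic, for clarity only).
std::vector<ToySeed> distinct_seeds(const std::vector<ToySeed>& seeds) {
    std::vector<ToySeed> result;
    for (const ToySeed& s : seeds) {
        bool seen = false;
        for (const ToySeed& kept : result) {
            if (equivalent(s, kept)) { seen = true; break; }
        }
        if (!seen) { result.push_back(s); }
    }
    return result;
}
```

Under this rule, seeds at (node offset 3, read offset 10) and (node offset 5, read offset 12) in the same node lie on the same diagonal, so a gapless extension from either one would cover the same alignment.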
vg::GaplessExtender::GaplessExtender()
Create an empty GaplessExtender.
vg::GaplessExtender::GaplessExtender(const gbwtgraph::GBWTGraph& graph, const Aligner& aligner)
Create a GaplessExtender using the given GBWTGraph and Aligner objects.
std::vector<GaplessExtension> vg::GaplessExtender::extend(cluster_type& cluster, std::string sequence, const gbwtgraph::CachedGBWTGraph* cache = nullptr, size_t max_mismatches = MAX_MISMATCHES, double overlap_threshold = OVERLAP_THRESHOLD) const
Find the highest-scoring extension for each seed in the cluster. If at least one full-length extension has at most max_mismatches mismatches, sort the full-length extensions in descending order by score and return the best non-overlapping ones. Two extensions overlap if the fraction of identical base mappings is greater than overlap_threshold.
If no full-length extension is good enough, trim the extensions to maximize the score and remove duplicates. In this case, the extensions are sorted by read interval. Use full_length_extensions() to determine the type of the returned extension set.
The sequence to be aligned is passed by value. All non-ACGT characters are masked with the character X, which should not match any character in the graph.
The extension allows any number of mismatches in the initial node, at least max_mismatches mismatches in the entire extension, and at least max_mismatches / 2 mismatches on each flank. If a CachedGBWTGraph is provided, it is used; otherwise a new one is allocated.
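The masking step described above can be sketched as follows. This is an illustration only and not the ReadMasker implementation: `mask_non_acgt()` is a hypothetical helper, and treating only uppercase A, C, G and T as valid is an assumption.

```cpp
#include <string>

// Replace every character other than uppercase A, C, G, T with 'X'.
// The argument is taken by value, mirroring how extend() receives the
// sequence, so the caller's string is left untouched.
std::string mask_non_acgt(std::string sequence) {
    for (char& c : sequence) {
        if (c != 'A' && c != 'C' && c != 'G' && c != 'T') {
            c = 'X';
        }
    }
    return sequence;
}
```

Because X is not expected to occur in the graph, every masked read position counts as a mismatch wherever it is aligned, so ambiguous bases such as N use up the mismatch budget.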
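static bool vg::GaplessExtender::full_length_extensions(const std::vector<GaplessExtension>& result, size_t max_mismatches = MAX_MISMATCHES)
Determine whether the extension set contains non-overlapping full-length extensions sorted in descending order by score. Use the same value of max_mismatches as in extend().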
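static handle_t vg::GaplessExtender::get_handle(seed_type seed)
Get the handle from a seed.
inline static size_t vg::GaplessExtender::get_node_offset(seed_type seed)
Get the node offset from a seed.
static pos_t vg::GaplessExtender::get_pos(seed_type seed)
Get the graph position from a seed.
inline static size_t vg::GaplessExtender::get_read_offset(seed_type seed)
Get the read offset from a seed.
static seed_type vg::GaplessExtender::to_seed(pos_t pos, size_t read_offset)
Convert (graph position, read offset) to a seed.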
const Aligner* vg::GaplessExtender::aligner
const gbwtgraph::GBWTGraph* vg::GaplessExtender::graph
ReadMasker vg::GaplessExtender::mask
constexpr static size_t vg::GaplessExtender::MAX_MISMATCHES = 4
The default value for the maximum number of mismatches.
constexpr static double vg::GaplessExtender::OVERLAP_THRESHOLD = 0.8
Two full-length alignments are distinct if the fraction of overlapping position pairs is at most this value.
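The role of the overlap threshold can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `ToyExtension`, `overlap_fraction()` and `best_non_overlapping()` are hypothetical names, each full-length extension is reduced to one graph-base identifier per read offset, the fraction is taken relative to the read length, and the selection is a simple greedy pass. The actual selection logic in GaplessExtender may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical full-length extension: graph_base[i] identifies the graph base
// that read offset i is aligned to.
struct ToyExtension {
    std::int32_t score;
    std::vector<std::int64_t> graph_base;
};

// Fraction of read offsets that both extensions map to the same graph base.
double overlap_fraction(const ToyExtension& a, const ToyExtension& b) {
    assert(a.graph_base.size() == b.graph_base.size());
    if (a.graph_base.empty()) { return 0.0; }
    std::size_t shared = 0;
    for (std::size_t i = 0; i < a.graph_base.size(); i++) {
        if (a.graph_base[i] == b.graph_base[i]) { shared++; }
    }
    return static_cast<double>(shared) / static_cast<double>(a.graph_base.size());
}

// Sort by descending score and greedily keep each extension whose overlap with
// every already kept extension is at most the threshold.
std::vector<ToyExtension> best_non_overlapping(std::vector<ToyExtension> candidates,
                                               double overlap_threshold = 0.8) {
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const ToyExtension& a, const ToyExtension& b) {
                         return a.score > b.score;
                     });
    std::vector<ToyExtension> kept;
    for (const ToyExtension& candidate : candidates) {
        bool overlaps = std::any_of(kept.begin(), kept.end(),
                                    [&](const ToyExtension& k) {
                                        return overlap_fraction(candidate, k) > overlap_threshold;
                                    });
        if (!overlaps) { kept.push_back(candidate); }
    }
    return kept;
}
```

With the default threshold of 0.8, two extensions of a 10 bp read that share 9 base mappings overlap, while two that share exactly 8 are still distinct, because the comparison is strict.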