vg
tools for working with variation graphs
vg::GaplessExtender Class Reference

#include <gbwt_extender.hpp>

Public Types

typedef GaplessExtension::seed_type seed_type
 
typedef pair_hash_set< seed_type > cluster_type
 

Public Member Functions

 GaplessExtender ()
 Create an empty GaplessExtender.
 
 GaplessExtender (const gbwtgraph::GBWTGraph &graph, const Aligner &aligner)
 Create a GaplessExtender using the given GBWTGraph and Aligner objects.
 
std::vector< GaplessExtension > extend (cluster_type &cluster, std::string sequence, const gbwtgraph::CachedGBWTGraph *cache=nullptr, size_t max_mismatches=MAX_MISMATCHES, double overlap_threshold=OVERLAP_THRESHOLD) const
 

Static Public Member Functions

static seed_type to_seed (pos_t pos, size_t read_offset)
 Convert (graph position, read offset) to a seed.
 
static pos_t get_pos (seed_type seed)
 Get the graph position from a seed.
 
static handle_t get_handle (seed_type seed)
 Get the handle from a seed.
 
static size_t get_node_offset (seed_type seed)
 Get the node offset from a seed.
 
static size_t get_read_offset (seed_type seed)
 Get the read offset from a seed.
 
static bool full_length_extensions (const std::vector< GaplessExtension > &result, size_t max_mismatches=MAX_MISMATCHES)
 

Public Attributes

const gbwtgraph::GBWTGraph * graph
 
const Aligner * aligner
 
ReadMasker mask
 

Static Public Attributes

constexpr static size_t MAX_MISMATCHES = 4
 The default value for the maximum number of mismatches.
 
constexpr static double OVERLAP_THRESHOLD = 0.8
 

Detailed Description

A class that supports haplotype-consistent seed extension using GBWTGraph. Each seed is a pair of matching read/graph positions and each extension is a gapless alignment of an interval of the read to a haplotype. A cluster is an unordered set of distinct seeds. Seeds in the same node with the same (read_offset - node_offset) difference are considered equivalent. GaplessExtender also needs an Aligner object for scoring the extension candidates.
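The equivalence rule for seeds can be illustrated with a small standalone sketch. The names ToySeed, diagonal_key, equivalent, and count_distinct are hypothetical and are not part of vg; the sketch only assumes, as stated above, that two seeds in the same node are equivalent when their (read_offset - node_offset) differences agree. Node orientation is ignored here for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a seed; this is NOT vg's seed_type.
struct ToySeed {
    std::uint64_t node;       // node identifier (orientation ignored in this sketch)
    std::size_t node_offset;  // offset within the node
    std::size_t read_offset;  // offset within the read
};

// Key that is identical for equivalent seeds: (node, read_offset - node_offset).
inline std::pair<std::uint64_t, std::int64_t> diagonal_key(const ToySeed& s) {
    return { s.node,
             static_cast<std::int64_t>(s.read_offset) -
             static_cast<std::int64_t>(s.node_offset) };
}

// Two seeds are equivalent if they lie on the same diagonal of the same node.
inline bool equivalent(const ToySeed& a, const ToySeed& b) {
    return diagonal_key(a) == diagonal_key(b);
}

// Number of seeds that remain after collapsing equivalent ones.
inline std::size_t count_distinct(const std::vector<ToySeed>& seeds) {
    std::set<std::pair<std::uint64_t, std::int64_t>> keys;
    for (const ToySeed& s : seeds) {
        keys.insert(diagonal_key(s));
    }
    return keys.size();
}
```

Under this assumption, a seed at node offset 2 and read offset 10 is equivalent to a seed in the same node at node offset 5 and read offset 13, because both would produce the same gapless alignment.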

Member Typedef Documentation

◆ cluster_type

◆ seed_type

Constructor & Destructor Documentation

◆ GaplessExtender() [1/2]

vg::GaplessExtender::GaplessExtender ( )

Create an empty GaplessExtender.

◆ GaplessExtender() [2/2]

vg::GaplessExtender::GaplessExtender ( const gbwtgraph::GBWTGraph &  graph,
const Aligner &  aligner 
)

Create a GaplessExtender using the given GBWTGraph and Aligner objects.

Member Function Documentation

◆ extend()

std::vector< GaplessExtension > vg::GaplessExtender::extend ( cluster_type &  cluster,
std::string  sequence,
const gbwtgraph::CachedGBWTGraph *  cache = nullptr,
size_t  max_mismatches = MAX_MISMATCHES,
double  overlap_threshold = OVERLAP_THRESHOLD 
) const

Find the highest-scoring extension for each seed in the cluster. If there are full-length extensions with at most max_mismatches mismatches, sort them in descending order by score and return the best non-overlapping full-length extensions. Two extensions overlap if the fraction of identical base mappings is greater than overlap_threshold.

If there are no good enough full-length extensions, trim the extensions to maximize the score and remove duplicates. In this case, the extensions are sorted by read interval. Use full_length_extensions() to determine the type of the returned extension set.

The sequence that will be aligned is passed by value. All non-ACGT characters are masked with the character X, which should not match any character in the graph.

The extension allows any number of mismatches in the initial node, at least max_mismatches mismatches in the entire extension, and at least max_mismatches / 2 mismatches on each flank. It uses the provided CachedGBWTGraph, or allocates a new one if none is provided.
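The masking step described for the sequence argument can be sketched as follows. The function mask_non_acgt is a hypothetical helper, not vg's ReadMasker, and it assumes that only the uppercase characters A, C, G, and T are left unmasked. Taking the string by value mirrors the by-value parameter of extend(), so the caller's copy stays unchanged.

```cpp
#include <string>

// Hypothetical helper (not vg's ReadMasker): replace every character other
// than uppercase A, C, G, T with 'X'. The argument is taken by value, so the
// caller's string is not modified.
inline std::string mask_non_acgt(std::string sequence) {
    for (char& c : sequence) {
        switch (c) {
            case 'A': case 'C': case 'G': case 'T':
                break;      // keep unambiguous bases
            default:
                c = 'X';    // mask everything else
        }
    }
    return sequence;
}
```

Because X should not occur in the graph, every masked position counts as a mismatch during extension, and so consumes part of the max_mismatches budget.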

◆ full_length_extensions()

bool vg::GaplessExtender::full_length_extensions ( const std::vector< GaplessExtension > &  result,
size_t  max_mismatches = MAX_MISMATCHES 
)
static

Determine whether the extension set contains non-overlapping full-length extensions sorted in descending order by score. Use the same value of max_mismatches as in extend().
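The property being tested can be illustrated with a standalone sketch. ToyExtension and toy_full_length_extensions are hypothetical and do not reproduce the vg implementation or the GaplessExtension interface. The sketch assumes that a full-length extension covers the whole read, and it takes the read length as an extra parameter that the real function does not have. It does not check the non-overlap condition.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an extension; this is NOT vg's GaplessExtension.
struct ToyExtension {
    std::size_t read_begin;  // start of the aligned read interval
    std::size_t read_end;    // end of the aligned read interval (exclusive)
    std::size_t mismatches;  // number of mismatching positions
    int score;               // alignment score
};

// True if the set is non-empty, every extension covers the entire read with
// at most max_mismatches mismatches, and scores are in descending order.
inline bool toy_full_length_extensions(const std::vector<ToyExtension>& result,
                                       std::size_t read_length,
                                       std::size_t max_mismatches = 4) {
    if (result.empty()) {
        return false;
    }
    for (const ToyExtension& e : result) {
        bool full = (e.read_begin == 0 && e.read_end == read_length);
        if (!full || e.mismatches > max_mismatches) {
            return false;
        }
    }
    return std::is_sorted(result.begin(), result.end(),
                          [](const ToyExtension& a, const ToyExtension& b) {
                              return a.score > b.score;
                          });
}
```

A caller would typically branch on this result: full-length extensions can be used as complete alignments, while trimmed partial extensions need further processing.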

◆ get_handle()

static handle_t vg::GaplessExtender::get_handle ( seed_type  seed)
inline static

Get the handle from a seed.

◆ get_node_offset()

static size_t vg::GaplessExtender::get_node_offset ( seed_type  seed)
inline static

Get the node offset from a seed.

◆ get_pos()

static pos_t vg::GaplessExtender::get_pos ( seed_type  seed)
inline static

Get the graph position from a seed.

◆ get_read_offset()

static size_t vg::GaplessExtender::get_read_offset ( seed_type  seed)
inline static

Get the read offset from a seed.

◆ to_seed()

static seed_type vg::GaplessExtender::to_seed ( pos_t  pos,
size_t  read_offset 
)
inline static

Convert (graph position, read offset) to a seed.

Member Data Documentation

◆ aligner

const Aligner* vg::GaplessExtender::aligner

◆ graph

const gbwtgraph::GBWTGraph* vg::GaplessExtender::graph

◆ mask

ReadMasker vg::GaplessExtender::mask

◆ MAX_MISMATCHES

constexpr size_t vg::GaplessExtender::MAX_MISMATCHES = 4
static constexpr

The default value for the maximum number of mismatches.

◆ OVERLAP_THRESHOLD

constexpr double vg::GaplessExtender::OVERLAP_THRESHOLD = 0.8
static constexpr

Two full-length alignments are distinct if the fraction of overlapping position pairs is at most this threshold.
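The threshold test can be sketched as follows. ToyPosition, ToyMapping, identical_fraction, and overlap are hypothetical names, not vg code. The sketch assumes that each full-length alignment is represented by one graph position per read offset, and that the overlap fraction is the share of read offsets mapped to the same graph position in both alignments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical representation: (node id, node offset) for each read offset.
using ToyPosition = std::pair<std::uint64_t, std::size_t>;
using ToyMapping = std::vector<ToyPosition>;

// Fraction of read offsets that both alignments map to the same position.
inline double identical_fraction(const ToyMapping& a, const ToyMapping& b) {
    assert(a.size() == b.size() && !a.empty());
    std::size_t same = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        if (a[i] == b[i]) {
            same++;
        }
    }
    return static_cast<double>(same) / static_cast<double>(a.size());
}

// Two alignments overlap if the fraction exceeds the threshold; otherwise
// they are treated as distinct. The default mirrors OVERLAP_THRESHOLD.
inline bool overlap(const ToyMapping& a, const ToyMapping& b,
                    double overlap_threshold = 0.8) {
    return identical_fraction(a, b) > overlap_threshold;
}
```

With the default threshold, two alignments of a 10 bp read that agree at 9 positions would overlap, while two that agree at 7 positions would be distinct.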

