vg
tools for working with variation graphs
vg::MEMClusterer Class Reference (abstract)

#include <cluster.hpp>

Inheritance diagram for vg::MEMClusterer:
vg::MinDistanceClusterer vg::NullClusterer vg::OrientedDistanceClusterer vg::TVSClusterer vg::ComponentMinDistanceClusterer vg::GreedyMinDistanceClusterer

Classes

struct  DPScoreComparator
 
class  HitEdge
 
class  HitGraph
 
class  HitNode
 

Public Types

using hit_t = pair< const MaximalExactMatch *, pos_t >
 
using cluster_t = pair< vector< hit_t >, double >
 Each cluster is a vector of hits and a paired multiplicity.
 
using match_fanouts_t = unordered_map< const MaximalExactMatch *, deque< pair< string::const_iterator, char > >>
 

Public Member Functions

 MEMClusterer ()=default
 
virtual ~MEMClusterer ()=default
 
vector< cluster_t > clusters (const Alignment &alignment, const vector< MaximalExactMatch > &mems, const GSSWAligner *Aligner, size_t min_mem_length=1, int32_t max_qual_score=60, int32_t log_likelihood_approx_factor=0, size_t min_median_mem_coverage_for_split=0, double suboptimal_edge_pruning_factor=.75, double cluster_multiplicity_diff=10.0, const match_fanouts_t *fanouts=nullptr)
 
virtual vector< pair< pair< size_t, size_t >, int64_t > > pair_clusters (const Alignment &alignment_1, const Alignment &alignment_2, const vector< cluster_t * > &left_clusters, const vector< cluster_t * > &right_clusters, const vector< pair< size_t, size_t >> &left_alt_cluster_anchors, const vector< pair< size_t, size_t >> &right_alt_cluster_anchors, int64_t optimal_separation, int64_t max_deviation)=0
 

Public Attributes

int64_t max_gap = numeric_limits<int64_t>::max()
 The largest discrepancy we will allow between the read-implied distances and the estimated gap distance.
 

Protected Member Functions

virtual HitGraph make_hit_graph (const Alignment &alignment, const vector< MaximalExactMatch > &mems, const GSSWAligner *aligner, size_t min_mem_length, const match_fanouts_t *fanouts)=0
 
int32_t estimate_edge_score (const MaximalExactMatch *mem_1, const MaximalExactMatch *mem_2, int64_t graph_dist, const GSSWAligner *aligner) const
 
void deduplicate_cluster_pairs (vector< pair< pair< size_t, size_t >, int64_t >> &cluster_pairs, int64_t optimal_separation)
 

Member Typedef Documentation

◆ cluster_t

using vg::MEMClusterer::cluster_t = pair<vector<hit_t>, double>

Each cluster is a vector of hits and a paired multiplicity.

◆ hit_t

using vg::MEMClusterer::hit_t = pair<const MaximalExactMatch*, pos_t>

Each hit contains a pointer to the original MEM and the position of that particular hit in the graph.
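To make the nesting of these two typedefs concrete, here is a minimal sketch of how a caller might walk a vector of clusters. `FakeMEM` and `FakePos` are hypothetical stand-ins for `MaximalExactMatch` and `pos_t`, and `total_hits` and `example_clusters` are illustrative helpers that are not part of vg.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for vg's MaximalExactMatch and pos_t (illustration only).
struct FakeMEM { int id; };
struct FakePos { int64_t node_id; bool is_reverse; std::size_t offset; };

// Same shape as the typedefs documented above, built on the stand-in types.
using hit_t = std::pair<const FakeMEM*, FakePos>;
using cluster_t = std::pair<std::vector<hit_t>, double>;

// Count the hits across all clusters (cluster.first is the vector of hits).
std::size_t total_hits(const std::vector<cluster_t>& clusters) {
    std::size_t n = 0;
    for (const cluster_t& cluster : clusters) {
        n += cluster.first.size();
    }
    return n;
}

// Build two small clusters whose hits all point back to the same MEM.
std::vector<cluster_t> example_clusters(const FakeMEM& mem) {
    cluster_t first;
    first.first.push_back(hit_t(&mem, FakePos{1, false, 0}));
    first.first.push_back(hit_t(&mem, FakePos{2, false, 4}));
    first.second = 1.0;  // multiplicity paired with the cluster

    cluster_t second;
    second.first.push_back(hit_t(&mem, FakePos{9, true, 3}));
    second.second = 2.0;

    return {first, second};
}
```

Because each hit stores a pointer rather than a copy, the MEMs must outlive any clusters that refer to them.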

◆ match_fanouts_t

using vg::MEMClusterer::match_fanouts_t = unordered_map<const MaximalExactMatch*, deque<pair<string::const_iterator, char> >>

Represents the mismatches that were allowed in "MEMs" from the fanout match algorithm.

Constructor & Destructor Documentation

◆ MEMClusterer()

vg::MEMClusterer::MEMClusterer ( )
default

◆ ~MEMClusterer()

virtual vg::MEMClusterer::~MEMClusterer ( )
virtual default

Member Function Documentation

◆ clusters()

vector< MEMClusterer::cluster_t > vg::MEMClusterer::clusters ( const Alignment &  alignment,
const vector< MaximalExactMatch > &  mems,
const GSSWAligner *  Aligner,
size_t  min_mem_length = 1,
int32_t  max_qual_score = 60,
int32_t  log_likelihood_approx_factor = 0,
size_t  min_median_mem_coverage_for_split = 0,
double  suboptimal_edge_pruning_factor = .75,
double  cluster_multiplicity_diff = 10.0,
const match_fanouts_t *  fanouts = nullptr 
)

Returns a vector of clusters. Each cluster is represented as a vector of MEM hits paired with a multiplicity (see cluster_t). Each hit contains a pointer to the original MEM and the position of that particular hit in the graph.

◆ deduplicate_cluster_pairs()

void vg::MEMClusterer::deduplicate_cluster_pairs ( vector< pair< pair< size_t, size_t >, int64_t >> &  cluster_pairs,
int64_t  optimal_separation 
)
protected

Sorts cluster pairs and removes copies of the same cluster pair, keeping only the copy whose distance is closest to the optimal separation.
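The behaviour described here can be sketched with standard algorithms. This is an illustration of the documented contract, not vg's actual implementation: `dedup_sketch` and `cluster_pair_t` are hypothetical names, and the sketch assumes distances are small enough that subtracting the optimal separation cannot overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

// ((left cluster index, right cluster index), estimated distance)
using cluster_pair_t = std::pair<std::pair<std::size_t, std::size_t>, int64_t>;

// Hypothetical sketch: keep one entry per cluster pair, preferring the entry
// whose distance deviates least from optimal_separation.
void dedup_sketch(std::vector<cluster_pair_t>& cluster_pairs,
                  int64_t optimal_separation) {
    auto deviation = [&](const cluster_pair_t& p) {
        return std::llabs(p.second - optimal_separation);
    };
    // Sort by cluster indices first, then by closeness to the optimum, so the
    // best copy of each pair comes first within its group.
    std::sort(cluster_pairs.begin(), cluster_pairs.end(),
              [&](const cluster_pair_t& a, const cluster_pair_t& b) {
                  if (a.first != b.first) {
                      return a.first < b.first;
                  }
                  return deviation(a) < deviation(b);
              });
    // std::unique keeps the first element of each run of equal cluster pairs.
    auto new_end = std::unique(cluster_pairs.begin(), cluster_pairs.end(),
                               [](const cluster_pair_t& a, const cluster_pair_t& b) {
                                   return a.first == b.first;
                               });
    cluster_pairs.erase(new_end, cluster_pairs.end());
}
```

Sorting first is what lets a single linear pass remove the duplicates, since all copies of a pair end up adjacent.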

◆ estimate_edge_score()

int32_t vg::MEMClusterer::estimate_edge_score ( const MaximalExactMatch *  mem_1,
const MaximalExactMatch *  mem_2,
int64_t  graph_dist,
const GSSWAligner *  aligner 
) const
protected

Once the distance between two hits has been estimated, estimates the score of the hit graph edge connecting them.

◆ make_hit_graph()

virtual HitGraph vg::MEMClusterer::make_hit_graph ( const Alignment &  alignment,
const vector< MaximalExactMatch > &  mems,
const GSSWAligner *  aligner,
size_t  min_mem_length,
const match_fanouts_t *  fanouts 
)
protected pure virtual

Initializes a hit graph and adds edges to it. This must be implemented by any inheriting class.

Implemented in vg::ComponentMinDistanceClusterer, vg::GreedyMinDistanceClusterer, vg::MinDistanceClusterer, vg::TVSClusterer, vg::OrientedDistanceClusterer, and vg::NullClusterer.

◆ pair_clusters()

virtual vector<pair<pair<size_t, size_t>, int64_t> > vg::MEMClusterer::pair_clusters ( const Alignment &  alignment_1,
const Alignment &  alignment_2,
const vector< cluster_t * > &  left_clusters,
const vector< cluster_t * > &  right_clusters,
const vector< pair< size_t, size_t >> &  left_alt_cluster_anchors,
const vector< pair< size_t, size_t >> &  right_alt_cluster_anchors,
int64_t  optimal_separation,
int64_t  max_deviation 
)
pure virtual

Given two vectors of clusters and bounds on the distance between clusters, returns a vector of pairs of cluster numbers (one in each vector) matched with the estimated distance.

Clusters are assumed to be located at the position of the first MEM hit they contain. Optionally, additional MEMs may be identified as possible anchors for the cluster. Additional anchors are provided as pairs of (cluster index, MEM index within cluster). Only one result will be returned per pair of clusters regardless of how many alternate anchors are given.

Implemented in vg::MinDistanceClusterer, vg::TVSClusterer, vg::OrientedDistanceClusterer, and vg::NullClusterer.
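The shape of the return value can be illustrated with a toy version that works on a single linear coordinate instead of a graph. This sketch is not how any vg subclass computes distances. It assumes that each cluster is reduced to one hypothetical integer position and that the distance bounds mean `optimal_separation` plus or minus `max_deviation`, which is an interpretation of the parameter names rather than something the documentation states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// ((left cluster index, right cluster index), estimated distance)
using cluster_pair_t = std::pair<std::pair<std::size_t, std::size_t>, int64_t>;

// Hypothetical toy pairing on a linear coordinate: report every (left, right)
// pair whose separation lies within max_deviation of optimal_separation.
std::vector<cluster_pair_t> pair_by_linear_position(
        const std::vector<int64_t>& left_positions,
        const std::vector<int64_t>& right_positions,
        int64_t optimal_separation,
        int64_t max_deviation) {
    std::vector<cluster_pair_t> result;
    for (std::size_t i = 0; i < left_positions.size(); ++i) {
        for (std::size_t j = 0; j < right_positions.size(); ++j) {
            int64_t dist = right_positions[j] - left_positions[i];
            if (dist >= optimal_separation - max_deviation &&
                dist <= optimal_separation + max_deviation) {
                // One result per pair of clusters, tagged with its distance.
                result.push_back({{i, j}, dist});
            }
        }
    }
    return result;
}
```

Each returned element indexes into the two input vectors, so callers can look up the paired clusters without copying them.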

Member Data Documentation

◆ max_gap

int64_t vg::MEMClusterer::max_gap = numeric_limits<int64_t>::max()

The largest discrepancy we will allow between the read-implied distances and the estimated gap distance.
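One plausible reading of this threshold is a check on the absolute difference between the two distances. The helper below is a hypothetical sketch under that assumption (`within_max_gap` is not a vg function, and vg may apply the limit differently). It also shows why the default of `numeric_limits<int64_t>::max()` effectively disables the limit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical check: is the discrepancy between the distance implied by the
// read and the distance estimated in the graph no larger than max_gap?
bool within_max_gap(int64_t read_implied_dist, int64_t graph_dist, int64_t max_gap) {
    if (max_gap < 0) {
        return false;  // a negative limit admits nothing
    }
    // Take the absolute difference in unsigned arithmetic so that extreme
    // inputs cannot trigger signed overflow.
    uint64_t a = static_cast<uint64_t>(read_implied_dist);
    uint64_t b = static_cast<uint64_t>(graph_dist);
    uint64_t diff = read_implied_dist > graph_dist ? a - b : b - a;
    return diff <= static_cast<uint64_t>(max_gap);
}
```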


The documentation for this class was generated from the following files: