vg
tools for working with variation graphs
vg::OrientedDistanceClusterer Class Reference

#include <cluster.hpp>

Inherits vg::MEMClusterer.

Public Member Functions

 OrientedDistanceClusterer (OrientedDistanceMeasurer &distance_measurer, size_t max_expected_dist_approx_error=8)
 Constructor.
 
vector< pair< pair< size_t, size_t >, int64_t > > pair_clusters (const Alignment &alignment_1, const Alignment &alignment_2, const vector< cluster_t * > &left_clusters, const vector< cluster_t * > &right_clusters, const vector< pair< size_t, size_t >> &left_alt_cluster_anchors, const vector< pair< size_t, size_t >> &right_alt_cluster_anchors, int64_t optimal_separation, int64_t max_deviation)
 Concrete implementation of virtual method from MEMClusterer.
 
- Public Member Functions inherited from vg::MEMClusterer
 MEMClusterer ()=default
 
virtual ~MEMClusterer ()=default
 
vector< cluster_t > clusters (const Alignment &alignment, const vector< MaximalExactMatch > &mems, const GSSWAligner *Aligner, size_t min_mem_length=1, int32_t max_qual_score=60, int32_t log_likelihood_approx_factor=0, size_t min_median_mem_coverage_for_split=0, double suboptimal_edge_pruning_factor=.75, double cluster_multiplicity_diff=10.0, const match_fanouts_t *fanouts=nullptr)
 

Protected Member Functions

unordered_map< pair< size_t, size_t >, int64_t > get_on_strand_distance_tree (size_t num_items, const function< pos_t(size_t)> &get_position, const function< int64_t(size_t)> &get_offset)
 
void extend_dist_tree_by_permutations (const function< pos_t(size_t)> &get_position, const function< int64_t(size_t)> &get_offset, size_t num_items, int64_t max_failed_distance_probes, size_t decrement_frequency, unordered_map< pair< size_t, size_t >, int64_t > &recorded_finite_dists, map< pair< size_t, size_t >, size_t > &num_infinite_dists, UnionFind &component_union_find, size_t &num_possible_merges_remaining)
 
void extend_dist_tree_by_buckets (const function< pos_t(size_t)> &get_position, const function< int64_t(size_t)> &get_offset, size_t num_items, unordered_map< pair< size_t, size_t >, int64_t > &recorded_finite_dists, UnionFind &component_union_find, size_t &num_possible_merges_remaining)
 
void exclude_dist_tree_merges (const function< pos_t(size_t)> &get_position, map< pair< size_t, size_t >, size_t > &num_infinite_dists, UnionFind &component_union_find, size_t &num_possible_merges_remaining, int64_t max_failed_distance_probes)
 
vector< unordered_map< size_t, int64_t > > flatten_distance_tree (size_t num_items, const unordered_map< pair< size_t, size_t >, int64_t > &recorded_finite_dists)
 
vector< pair< size_t, size_t > > compute_tail_mem_coverage (const Alignment &alignment, const vector< MaximalExactMatch > &mems)
 
HitGraph make_hit_graph (const Alignment &alignment, const vector< MaximalExactMatch > &mems, const GSSWAligner *aligner, size_t min_mem_length, const match_fanouts_t *fanouts)
 Concrete implementation of virtual method from MEMClusterer.
 
- Protected Member Functions inherited from vg::MEMClusterer
int32_t estimate_edge_score (const MaximalExactMatch *mem_1, const MaximalExactMatch *mem_2, int64_t graph_dist, const GSSWAligner *aligner) const
 
void deduplicate_cluster_pairs (vector< pair< pair< size_t, size_t >, int64_t >> &cluster_pairs, int64_t optimal_separation)
 

Protected Attributes

OrientedDistanceMeasurer & distance_measurer
 
size_t max_expected_dist_approx_error
 
bool unstranded
 

Additional Inherited Members

- Public Types inherited from vg::MEMClusterer
using hit_t = pair< const MaximalExactMatch *, pos_t >
 
using cluster_t = pair< vector< hit_t >, double >
 Each cluster is a vector of hits and a paired multiplicity.
 
using match_fanouts_t = unordered_map< const MaximalExactMatch *, deque< pair< string::const_iterator, char > >>
 
- Public Attributes inherited from vg::MEMClusterer
int64_t max_gap = numeric_limits<int64_t>::max()
 The largest discrepancy we will allow between the read-implied distances and the estimated gap distance.
 

Constructor & Destructor Documentation

◆ OrientedDistanceClusterer()

vg::OrientedDistanceClusterer::OrientedDistanceClusterer ( OrientedDistanceMeasurer &  distance_measurer,
size_t  max_expected_dist_approx_error = 8 
)

Member Function Documentation

◆ compute_tail_mem_coverage()

vector< pair< size_t, size_t > > vg::OrientedDistanceClusterer::compute_tail_mem_coverage ( const Alignment &  alignment,
const vector< MaximalExactMatch > &  mems 
)
protected

Returns a vector containing, for each read position, the number of SMEM beginnings to its left and the number of SMEM endings to its right.
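The counting described above can be illustrated with two running sums. This is a minimal sketch, not the vg implementation: `Interval` and `tail_coverage_sketch` are hypothetical names, intervals are given as inclusive read offsets rather than `MaximalExactMatch` objects, and "to the left" and "to the right" are assumed to be strict (the position itself is not counted).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SMEM: inclusive first and last read offsets.
struct Interval {
    std::size_t first;
    std::size_t last;
};

// For each read position i, returns (number of intervals beginning strictly
// before i, number of intervals ending strictly after i).
std::vector<std::pair<std::size_t, std::size_t>>
tail_coverage_sketch(std::size_t read_length, const std::vector<Interval>& intervals) {
    std::vector<std::size_t> begins_at(read_length, 0);
    std::vector<std::size_t> ends_at(read_length, 0);
    for (const Interval& iv : intervals) {
        assert(iv.first <= iv.last && iv.last < read_length);
        ++begins_at[iv.first];
        ++ends_at[iv.last];
    }

    // Pairs are value-initialized to (0, 0).
    std::vector<std::pair<std::size_t, std::size_t>> coverage(read_length);

    // Left-to-right pass: beginnings seen strictly before each position.
    std::size_t running = 0;
    for (std::size_t i = 0; i < read_length; ++i) {
        coverage[i].first = running;
        running += begins_at[i];
    }

    // Right-to-left pass: endings seen strictly after each position.
    running = 0;
    for (std::size_t i = read_length; i-- > 0;) {
        coverage[i].second = running;
        running += ends_at[i];
    }
    return coverage;
}
```

Both passes are linear in the read length, so the whole table costs O(read length + number of intervals).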

◆ exclude_dist_tree_merges()

void vg::OrientedDistanceClusterer::exclude_dist_tree_merges ( const function< pos_t(size_t)> &  get_position,
map< pair< size_t, size_t >, size_t > &  num_infinite_dists,
UnionFind &  component_union_find,
size_t &  num_possible_merges_remaining,
int64_t  max_failed_distance_probes 
)
protected

Automatically blocks off merges in the distance tree between groups that can be inferred to be on separate components.

◆ extend_dist_tree_by_buckets()

void vg::OrientedDistanceClusterer::extend_dist_tree_by_buckets ( const function< pos_t(size_t)> &  get_position,
const function< int64_t(size_t)> &  get_offset,
size_t  num_items,
unordered_map< pair< size_t, size_t >, int64_t > &  recorded_finite_dists,
UnionFind &  component_union_find,
size_t &  num_possible_merges_remaining 
)
protected

Adds edges into the distance tree by estimating the distance only between pairs of items that can be easily identified as having a finite distance (e.g. by sharing a path).

◆ extend_dist_tree_by_permutations()

void vg::OrientedDistanceClusterer::extend_dist_tree_by_permutations ( const function< pos_t(size_t)> &  get_position,
const function< int64_t(size_t)> &  get_offset,
size_t  num_items,
int64_t  max_failed_distance_probes,
size_t  decrement_frequency,
unordered_map< pair< size_t, size_t >, int64_t > &  recorded_finite_dists,
map< pair< size_t, size_t >, size_t > &  num_infinite_dists,
UnionFind &  component_union_find,
size_t &  num_possible_merges_remaining 
)
protected

Adds edges into the distance tree by estimating the distance between pairs generated by a high-entropy deterministic permutation.
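To make the idea of a deterministic permutation over item pairs concrete, here is a small sketch. It is not the permutation vg uses: it simply visits the pair indices with a stride that is coprime to the number of pairs, which is one easy way to get a reproducible, non-sequential order that still covers every pair exactly once. The names `decode_pair`, `gcd_sketch`, and `permuted_pairs_sketch` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Euclid's algorithm, written out to avoid depending on C++17 std::gcd.
std::size_t gcd_sketch(std::size_t a, std::size_t b) {
    while (b != 0) {
        std::size_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Decodes k in [0, n*(n-1)/2) into the k-th pair (i, j) with i < j,
// counting pairs row by row: (0,1), (0,2), ..., (1,2), ...
std::pair<std::size_t, std::size_t> decode_pair(std::size_t k, std::size_t n) {
    assert(n >= 2 && k < n * (n - 1) / 2);
    std::size_t i = 0;
    std::size_t row_len = n - 1;
    while (k >= row_len) {
        k -= row_len;
        --row_len;
        ++i;
    }
    return {i, i + 1 + k};
}

// Returns every pair exactly once, in an order fixed by seed_stride.
// Assumes n is small enough that k * stride does not overflow.
std::vector<std::pair<std::size_t, std::size_t>>
permuted_pairs_sketch(std::size_t n, std::size_t seed_stride) {
    std::vector<std::pair<std::size_t, std::size_t>> order;
    if (n < 2) {
        return order;
    }
    std::size_t num_pairs = n * (n - 1) / 2;
    // A stride coprime to num_pairs makes k -> (k * stride) % num_pairs a bijection.
    std::size_t stride = seed_stride == 0 ? 1 : seed_stride;
    while (gcd_sketch(stride, num_pairs) != 1) {
        ++stride;
    }
    order.reserve(num_pairs);
    for (std::size_t k = 0; k < num_pairs; ++k) {
        order.push_back(decode_pair((k * stride) % num_pairs, n));
    }
    return order;
}
```

Because the order depends only on the inputs, repeated runs probe the same pairs in the same sequence, which keeps results reproducible while avoiding a purely sequential scan.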

◆ flatten_distance_tree()

vector< unordered_map< size_t, int64_t > > vg::OrientedDistanceClusterer::flatten_distance_tree ( size_t  num_items,
const unordered_map< pair< size_t, size_t >, int64_t > &  recorded_finite_dists 
)
protected

Given a number of nodes, and a map from node pair to signed relative distance on a consistent strand (defining a forest of trees, as generated by get_on_strand_distance_tree()), flatten all the trees.

Returns a vector of maps from node ID to relative position in linear space, one map per input tree.

Assumes all the distances are transitive, even though this isn't quite true in graph space.
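The flattening step can be pictured as a breadth-first walk over each tree that accumulates signed distances from an arbitrary root. The following is a minimal sketch under stated assumptions, not the vg implementation: `flatten_forest_sketch` and `DistMap` are hypothetical names, the input uses `std::map` because the standard library has no hash for `std::pair`, each tree's root is placed at position 0, and isolated items become single-item trees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical input type: (from, to) -> signed distance from 'from' to 'to'.
using DistMap = std::map<std::pair<std::size_t, std::size_t>, std::int64_t>;

std::vector<std::unordered_map<std::size_t, std::int64_t>>
flatten_forest_sketch(std::size_t num_items, const DistMap& dists) {
    // Undirected adjacency list; walking an edge backwards negates its distance.
    std::vector<std::vector<std::pair<std::size_t, std::int64_t>>> adj(num_items);
    for (const auto& rec : dists) {
        std::size_t from = rec.first.first;
        std::size_t to = rec.first.second;
        assert(from < num_items && to < num_items);
        adj[from].emplace_back(to, rec.second);
        adj[to].emplace_back(from, -rec.second);
    }

    std::vector<bool> seen(num_items, false);
    std::vector<std::unordered_map<std::size_t, std::int64_t>> result;
    for (std::size_t root = 0; root < num_items; ++root) {
        if (seen[root]) {
            continue;
        }
        // Each unvisited item starts a new tree, rooted at position 0.
        result.emplace_back();
        auto& positions = result.back();
        positions[root] = 0;
        seen[root] = true;
        std::queue<std::size_t> pending;
        pending.push(root);
        while (!pending.empty()) {
            std::size_t here = pending.front();
            pending.pop();
            for (const auto& edge : adj[here]) {
                if (seen[edge.first]) {
                    continue;
                }
                seen[edge.first] = true;
                // Treat distances as transitive: child = parent + edge distance.
                positions[edge.first] = positions[here] + edge.second;
                pending.push(edge.first);
            }
        }
    }
    return result;
}
```

For example, edges 0→1 of distance 5 and 1→2 of distance -2 place items 0, 1, and 2 at relative positions 0, 5, and 3 in the same map.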

◆ get_on_strand_distance_tree()

unordered_map< pair< size_t, size_t >, int64_t > vg::OrientedDistanceClusterer::get_on_strand_distance_tree ( size_t  num_items,
const function< pos_t(size_t)> &  get_position,
const function< int64_t(size_t)> &  get_offset 
)
protected

Given a number of items, a callback to get each item's position, and a callback to get a fixed offset from that position, builds a distance forest with trees for items that we can verify are on the same strand of the same molecule.

We use the distance approximation to cluster the MEM hits according to the strand they fall on using the oriented distance estimation function.

Returns a map from item pair (lower number first) to distance (which may be negative) from the first to the second along the items' forward strand.
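The shape of the returned map can be shown with a toy version that probes pairs, records finite distances, and uses a union-find to avoid adding an edge inside an already connected group. This is a rough sketch under several assumptions, not the vg algorithm: `ToyPos`, `toy_distance`, `ToyUnionFind`, and `distance_forest_sketch` are hypothetical, positions are a strand label plus a linear coordinate instead of `pos_t`, an unreachable pair is signalled by the maximum `int64_t` value, the offset callback is omitted, and every pair is probed in order rather than selectively.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Toy position (hypothetical, not vg's pos_t): strand label and coordinate.
struct ToyPos {
    int strand;
    std::int64_t coord;
};

// Assumed sentinel for "no finite oriented distance".
constexpr std::int64_t kInfinite = std::numeric_limits<std::int64_t>::max();

// Toy oriented distance: finite only when both items share a strand.
std::int64_t toy_distance(const ToyPos& a, const ToyPos& b) {
    return a.strand == b.strand ? b.coord - a.coord : kInfinite;
}

// Minimal union-find with path halving.
struct ToyUnionFind {
    std::vector<std::size_t> parent;
    explicit ToyUnionFind(std::size_t n) : parent(n) {
        std::iota(parent.begin(), parent.end(), std::size_t{0});
    }
    std::size_t find(std::size_t x) {
        assert(x < parent.size());
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }
    void unite(std::size_t a, std::size_t b) {
        parent[find(b)] = find(a);
    }
};

// Returns (lower index, higher index) -> signed distance, forming a forest.
std::map<std::pair<std::size_t, std::size_t>, std::int64_t>
distance_forest_sketch(std::size_t num_items,
                       const std::function<ToyPos(std::size_t)>& get_position) {
    std::map<std::pair<std::size_t, std::size_t>, std::int64_t> recorded;
    ToyUnionFind components(num_items);
    for (std::size_t i = 0; i < num_items; ++i) {
        for (std::size_t j = i + 1; j < num_items; ++j) {
            // An edge inside one component would create a cycle, so skip it.
            if (components.find(i) == components.find(j)) {
                continue;
            }
            std::int64_t dist = toy_distance(get_position(i), get_position(j));
            if (dist == kInfinite) {
                continue;
            }
            recorded[{i, j}] = dist;
            components.unite(i, j);
        }
    }
    return recorded;
}
```

Skipping pairs that are already connected is what keeps the result a forest: each component ends up with exactly one fewer edge than it has items.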

◆ make_hit_graph()

MEMClusterer::HitGraph vg::OrientedDistanceClusterer::make_hit_graph ( const Alignment &  alignment,
const vector< MaximalExactMatch > &  mems,
const GSSWAligner *  aligner,
size_t  min_mem_length,
const match_fanouts_t *  fanouts 
)
protected virtual

Concrete implementation of virtual method from MEMClusterer.

Implements vg::MEMClusterer.

◆ pair_clusters()

vector< pair< pair< size_t, size_t >, int64_t > > vg::OrientedDistanceClusterer::pair_clusters ( const Alignment &  alignment_1,
const Alignment &  alignment_2,
const vector< cluster_t * > &  left_clusters,
const vector< cluster_t * > &  right_clusters,
const vector< pair< size_t, size_t >> &  left_alt_cluster_anchors,
const vector< pair< size_t, size_t >> &  right_alt_cluster_anchors,
int64_t  optimal_separation,
int64_t  max_deviation 
)
virtual

Concrete implementation of virtual method from MEMClusterer.

Implements vg::MEMClusterer.

Member Data Documentation

◆ distance_measurer

OrientedDistanceMeasurer& vg::OrientedDistanceClusterer::distance_measurer
protected

◆ max_expected_dist_approx_error

size_t vg::OrientedDistanceClusterer::max_expected_dist_approx_error
protected

◆ unstranded

bool vg::OrientedDistanceClusterer::unstranded
protected

The documentation for this class was generated from the following files: