vg
tools for working with variation graphs
Classes | Public Member Functions | Public Attributes | Private Types | Private Member Functions | List of all members
vg::ZipCodeForest Class Reference

#include <zip_code_tree.hpp>

Classes

struct  child_info_t
 
struct  forest_growing_state_t
 This stores information about the state of the forest as we fill it in. More...
 
struct  interval_state_t
 
struct  seed_info_t
 
struct  sort_value_t
 

Public Member Functions

 ZipCodeForest ()
 Constructor. More...
 
void fill_in_forest (const vector< Seed > &seeds, const SnarlDistanceIndex &distance_index, size_t distance_limit=std::numeric_limits< size_t >::max())
 
void print_self (const vector< Seed > *seeds) const
 
void validate_zip_forest (const SnarlDistanceIndex &distance_index, const vector< Seed > *seeds, size_t distance_limit=std::numeric_limits< size_t >::max()) const
 

Public Attributes

vector< ZipCodeTreetrees
 The actual data, a collection of ZipCodeTrees. More...
 

Private Types

typedef SnarlDistanceIndexClusterer::Seed Seed
 
typedef ZipCodeTree::tree_item_type_t tree_item_type_t
 
typedef ZipCodeTree::tree_item_t tree_item_t
 

Private Member Functions

void sort_one_interval (forest_growing_state_t &forest_state, const interval_state_t &interval) const
 
void radix_sort_zipcodes (vector< size_t > &zipcode_sort_order, const vector< sort_value_t > &sort_values_by_seed, const interval_state_t &interval, bool reverse_order, size_t min_value, size_t max_value, bool sort_by_chain_component=false) const
 
void default_sort_zipcodes (vector< size_t > &zipcode_sort_order, const vector< sort_value_t > &sort_values_by_seed, const interval_state_t &interval, bool reverse_order) const
 
void get_next_intervals (forest_growing_state_t &forest_state, const interval_state_t &interval, std::forward_list< interval_state_t > &next_intervals) const
 
bool move_slice (forest_growing_state_t &forest_state, const size_t &depth)
 
void open_chain (forest_growing_state_t &forest_state, const interval_state_t &interval)
 
void close_chain (forest_growing_state_t &forest_state, const size_t &depth, const Seed &last_seed, bool chain_is_reversed)
 
void add_child_to_chain (forest_growing_state_t &forest_state, const size_t &depth, const size_t &seed_index, bool child_is_reversed, bool chain_is_reversed)
 
void open_snarl (forest_growing_state_t &forest_state, const size_t &depth, bool is_cyclic_snarl)
 Start a new snarl of the given type at the given depth. More...
 
void close_snarl (forest_growing_state_t &forest_state, const size_t &depth, const Seed &last_seed, bool last_is_reversed)
 
size_t add_distance_matrix (forest_growing_state_t &forest_state, const size_t &depth, bool snarl_is_reversed, bool reverse_chain_order=false)
 
void reverse_chains_in_snarl (forest_growing_state_t &forest_state, const size_t &depth)
 
vector< seed_info_tget_edge_seeds (const forest_growing_state_t &forest_state, const size_t &depth) const
 
void add_edges_to_end (vector< tree_item_t > &dist_matrix, forest_growing_state_t &forest_state, const size_t &depth, const vector< seed_info_t > &edge_seeds, bool snarl_is_reversed, bool is_cyclic_snarl) const
 
bool add_edges_for_chains (vector< tree_item_t > &dist_matrix, forest_growing_state_t &forest_state, const size_t &depth, const vector< seed_info_t > &edge_seeds, bool snarl_is_reversed, bool is_cyclic_snarl) const
 

Member Typedef Documentation

◆ Seed

◆ tree_item_t

◆ tree_item_type_t

Constructor & Destructor Documentation

◆ ZipCodeForest()

vg::ZipCodeForest::ZipCodeForest ( )
inline

Member Function Documentation

◆ add_child_to_chain()

void vg::ZipCodeForest::add_child_to_chain ( forest_growing_state_t forest_state,
const size_t &  depth,
const size_t &  seed_index,
bool  child_is_reversed,
bool  chain_is_reversed 
)
private

Add a seed (or snarl starting at a seed) and the preceeding chain edge If the seed is far enough from the previous thing in the chain and it can be a new slice, split off a subtree depth is chain child's depth (may also be chain depth if trivial) seed_index is the index of the current seed in the list of seeds

◆ add_distance_matrix()

size_t vg::ZipCodeForest::add_distance_matrix ( forest_growing_state_t forest_state,
const size_t &  depth,
bool  snarl_is_reversed,
bool  reverse_chain_order = false 
)
private

Add a triangular distance matrix for a snarl to its front The matrix starts with a CHAIN_COUNT with the number of child chains, and then is a list of EDGEs; for each item in order, all distances to it from all previous items, possibly including self-loops If reverse_chain_order is set, then all the child chains will be extracted and then put back in reverse order Returns the size of the distance matrix added (plus the CHAIN_COUNT)

◆ add_edges_for_chains()

bool vg::ZipCodeForest::add_edges_for_chains ( vector< tree_item_t > &  dist_matrix,
forest_growing_state_t forest_state,
const size_t &  depth,
const vector< seed_info_t > &  edge_seeds,
bool  snarl_is_reversed,
bool  is_cyclic_snarl 
) const
private

Helper for add_distance_matrix() to add the chains' rows These are edges from everything prior to the given chain end bound Returns true if most (>90%) inter-chain distances are unreachable

◆ add_edges_to_end()

void vg::ZipCodeForest::add_edges_to_end ( vector< tree_item_t > &  dist_matrix,
forest_growing_state_t forest_state,
const size_t &  depth,
const vector< seed_info_t > &  edge_seeds,
bool  snarl_is_reversed,
bool  is_cyclic_snarl 
) const
private

Helper for add_distance_matrix() to add the last row These are edges from everything prior to the snarl end bound

◆ close_chain()

void vg::ZipCodeForest::close_chain ( forest_growing_state_t forest_state,
const size_t &  depth,
const Seed last_seed,
bool  chain_is_reversed 
)
private

Close a chain that ends at last_seed If the chain was empty, remove it and anything relating to it If must be spliced out, take out a subtree Otherwise, add the end of the chain and, if the chain was in a snarl, remember the distance to the end of the chain

◆ close_snarl()

void vg::ZipCodeForest::close_snarl ( forest_growing_state_t forest_state,
const size_t &  depth,
const Seed last_seed,
bool  last_is_reversed 
)
private

Close a snarl at the given depth with the given last_seed If the snarl has no children, then delete the whole thing Otherwise, add all necessary distances and close it

◆ default_sort_zipcodes()

void vg::ZipCodeForest::default_sort_zipcodes ( vector< size_t > &  zipcode_sort_order,
const vector< sort_value_t > &  sort_values_by_seed,
const interval_state_t interval,
bool  reverse_order 
) const
private

Helper function for sort_one_interval() to sort seeds using std::sort Sorts the slice of seeds in the given interval of zipcode_sort_order, a vector of indices into seeds

◆ fill_in_forest()

void vg::ZipCodeForest::fill_in_forest ( const vector< Seed > &  seeds,
const SnarlDistanceIndex &  distance_index,
size_t  distance_limit = std::numeric_limits<size_t>::max() 
)

Populate the zip forest If a distance limit is given, then partition into subtrees that are farther than distance_limit from each other Otherwise, the forest will just be connected components

◆ get_edge_seeds()

vector< ZipCodeForest::seed_info_t > vg::ZipCodeForest::get_edge_seeds ( const forest_growing_state_t forest_state,
const size_t &  depth 
) const
private

Helper for add_distance_matrix() Look up seeds for chain edges and remember seed_info_t for each Returns [c1_L, c1_R, c2_L, c2_R, ...] for each chain side in the snarl

◆ get_next_intervals()

void vg::ZipCodeForest::get_next_intervals ( forest_growing_state_t forest_state,
const interval_state_t interval,
std::forward_list< interval_state_t > &  next_intervals 
) const
private

Assuming that the range of seeds in forest_state.sort_values_by_seed given by interval is sorted, prepend child intervals to next_intervals. The new intervals get added in their sort order, so the start of a chain will be at the start of the list, to be popped first.

For children of chains, seeds on the chain itself and not nested will be on the same interval if there are no seeds in snarls between them, even if they are not on the same node

◆ move_slice()

bool vg::ZipCodeForest::move_slice ( forest_growing_state_t forest_state,
const size_t &  depth 
)
private

Move a slice of a chain into a new tree Chain copied from forest_state.open_chains.back().first to end This is used when a part is too far from the rest to be in the same tree Returns whether a whole chain was moved (true) or just a slice (false)

◆ open_chain()

void vg::ZipCodeForest::open_chain ( forest_growing_state_t forest_state,
const interval_state_t interval 
)
private

Open a chain that starts at the current_seed Also record its presence and distance-to-start in the parent snarl Assumes that the chain has a parent snarl, i.e. isn't top-level

◆ open_snarl()

void vg::ZipCodeForest::open_snarl ( forest_growing_state_t forest_state,
const size_t &  depth,
bool  is_cyclic_snarl 
)
private

Start a new snarl of the given type at the given depth.

◆ print_self()

void vg::ZipCodeForest::print_self ( const vector< Seed > *  seeds) const
inline

Print each zip code tree in the forest to stderr Ziptrees are prefaced by their index, e.g. "0: <tree.print_self()>"

◆ radix_sort_zipcodes()

void vg::ZipCodeForest::radix_sort_zipcodes ( vector< size_t > &  zipcode_sort_order,
const vector< sort_value_t > &  sort_values_by_seed,
const interval_state_t interval,
bool  reverse_order,
size_t  min_value,
size_t  max_value,
bool  sort_by_chain_component = false 
) const
private

Helper function for sort_one_interval() to sort seeds using radix sort Sorts the slice of seeds in the given interval of zipcode_sort_order, a vector of indices into seeds. Reverse order if reverse_order. The interval also has an is_reversed field, which refers to the orientation in the snarl tree. min_ and max_value are the minimum and maximum value being sorted on. If sort_by_chain_component is true, then sort on the chain component in sort_values

This should run in linear time, but is dependent on the values being sorted on to have a small range

◆ reverse_chains_in_snarl()

void vg::ZipCodeForest::reverse_chains_in_snarl ( forest_growing_state_t forest_state,
const size_t &  depth 
)
private

Helper for add_distance_matrix() to reverse the order of chains This is done by extracting all the chains, adding them back in reverse order, and updating the forest_state

◆ sort_one_interval()

void vg::ZipCodeForest::sort_one_interval ( forest_growing_state_t forest_state,
const interval_state_t interval 
) const
private

Sorts an interval, which must contain seeds on the same snarl/chain/node Sorting is linear-ish along top-level chains, topological-ish in snarls. Uses radix_sort_zipcodes() and default_sort_zipcodes()

For chains, everything is sorted with the prefix sum value of the chain itself from the distance index, not the order in the zip code tree. Everything will be sorted in the order of the zip code tree, but the values will be set from the distance index. Later, the values may be out of order or may need to be subtracted from the length of the chain to get the distance to the ends of the chain

◆ validate_zip_forest()

void vg::ZipCodeForest::validate_zip_forest ( const SnarlDistanceIndex &  distance_index,
const vector< Seed > *  seeds,
size_t  distance_limit = std::numeric_limits<size_t>::max() 
) const

Check that the forest is correct:

  1. Each tree is valid in itself
  2. All seed positions (i.e. ignoring duplicates) appear at least once

Member Data Documentation

◆ trees

vector<ZipCodeTree> vg::ZipCodeForest::trees

The actual data, a collection of ZipCodeTrees.


The documentation for this class was generated from the following files: