vg
tools for working with variation graphs
|
#include <zip_code_tree.hpp>
Classes | |
struct | child_info_t |
struct | forest_growing_state_t |
This stores information about the state of the forest as we fill it in. More... | |
struct | interval_state_t |
struct | seed_info_t |
struct | sort_value_t |
Public Member Functions | |
ZipCodeForest () | |
Constructor. More... | |
void | fill_in_forest (const vector< Seed > &seeds, const SnarlDistanceIndex &distance_index, size_t distance_limit=std::numeric_limits< size_t >::max()) |
void | print_self (const vector< Seed > *seeds) const |
void | validate_zip_forest (const SnarlDistanceIndex &distance_index, const vector< Seed > *seeds, size_t distance_limit=std::numeric_limits< size_t >::max()) const |
Public Attributes | |
vector< ZipCodeTree > | trees |
The actual data, a collection of ZipCodeTrees. More... | |
Private Types | |
typedef SnarlDistanceIndexClusterer::Seed | Seed |
typedef ZipCodeTree::tree_item_type_t | tree_item_type_t |
typedef ZipCodeTree::tree_item_t | tree_item_t |
Private Member Functions | |
void | sort_one_interval (forest_growing_state_t &forest_state, const interval_state_t &interval) const |
void | radix_sort_zipcodes (vector< size_t > &zipcode_sort_order, const vector< sort_value_t > &sort_values_by_seed, const interval_state_t &interval, bool reverse_order, size_t min_value, size_t max_value, bool sort_by_chain_component=false) const |
void | default_sort_zipcodes (vector< size_t > &zipcode_sort_order, const vector< sort_value_t > &sort_values_by_seed, const interval_state_t &interval, bool reverse_order) const |
void | get_next_intervals (forest_growing_state_t &forest_state, const interval_state_t &interval, std::forward_list< interval_state_t > &next_intervals) const |
bool | move_slice (forest_growing_state_t &forest_state, const size_t &depth) |
void | open_chain (forest_growing_state_t &forest_state, const interval_state_t &interval) |
void | close_chain (forest_growing_state_t &forest_state, const size_t &depth, const Seed &last_seed, bool chain_is_reversed) |
void | add_child_to_chain (forest_growing_state_t &forest_state, const size_t &depth, const size_t &seed_index, bool child_is_reversed, bool chain_is_reversed) |
void | open_snarl (forest_growing_state_t &forest_state, const size_t &depth, bool is_cyclic_snarl) |
Start a new snarl of the given type at the given depth. More... | |
void | close_snarl (forest_growing_state_t &forest_state, const size_t &depth, const Seed &last_seed, bool last_is_reversed) |
size_t | add_distance_matrix (forest_growing_state_t &forest_state, const size_t &depth, bool snarl_is_reversed, bool reverse_chain_order=false) |
void | reverse_chains_in_snarl (forest_growing_state_t &forest_state, const size_t &depth) |
vector< seed_info_t > | get_edge_seeds (const forest_growing_state_t &forest_state, const size_t &depth) const |
void | add_edges_to_end (vector< tree_item_t > &dist_matrix, forest_growing_state_t &forest_state, const size_t &depth, const vector< seed_info_t > &edge_seeds, bool snarl_is_reversed, bool is_cyclic_snarl) const |
bool | add_edges_for_chains (vector< tree_item_t > &dist_matrix, forest_growing_state_t &forest_state, const size_t &depth, const vector< seed_info_t > &edge_seeds, bool snarl_is_reversed, bool is_cyclic_snarl) const |
|
private |
|
private |
|
private |
|
inline |
|
private |
Add a seed (or snarl starting at a seed) and the preceeding chain edge If the seed is far enough from the previous thing in the chain and it can be a new slice, split off a subtree depth is chain child's depth (may also be chain depth if trivial) seed_index is the index of the current seed in the list of seeds
|
private |
Add a triangular distance matrix for a snarl to its front The matrix starts with a CHAIN_COUNT with the number of child chains, and then is a list of EDGEs; for each item in order, all distances to it from all previous items, possibly including self-loops If reverse_chain_order is set, then all the child chains will be extracted and then put back in reverse order Returns the size of the distance matrix added (plus the CHAIN_COUNT)
|
private |
Helper for add_distance_matrix() to add the chains' rows These are edges from everything prior to the given chain end bound Returns true if most (>90%) inter-chain distances are unreachable
|
private |
Helper for add_distance_matrix() to add the last row These are edges from everything prior to the snarl end bound
|
private |
Close a chain that ends at last_seed If the chain was empty, remove it and anything relating to it If must be spliced out, take out a subtree Otherwise, add the end of the chain and, if the chain was in a snarl, remember the distance to the end of the chain
|
private |
Close a snarl at the given depth with the given last_seed If the snarl has no children, then delete the whole thing Otherwise, add all necessary distances and close it
|
private |
Helper function for sort_one_interval() to sort seeds using std::sort Sorts the slice of seeds in the given interval of zipcode_sort_order, a vector of indices into seeds
void vg::ZipCodeForest::fill_in_forest | ( | const vector< Seed > & | seeds, |
const SnarlDistanceIndex & | distance_index, | ||
size_t | distance_limit = std::numeric_limits<size_t>::max() |
||
) |
Populate the zip forest If a distance limit is given, then partition into subtrees that are farther than distance_limit from each other Otherwise, the forest will just be connected components
|
private |
Helper for add_distance_matrix() Look up seeds for chain edges and remember seed_info_t for each Returns [c1_L, c1_R, c2_L, c2_R, ...] for each chain side in the snarl
|
private |
Assuming that the range of seeds in forest_state.sort_values_by_seed given by interval is sorted, prepend child intervals to next_intervals. The new intervals get added in their sort order, so the start of a chain will be at the start of the list, to be popped first.
For children of chains, seeds on the chain itself and not nested will be on the same interval if there are no seeds in snarls between them, even if they are not on the same node
|
private |
Move a slice of a chain into a new tree Chain copied from forest_state.open_chains.back().first to end This is used when a part is too far from the rest to be in the same tree Returns whether a whole chain was moved (true) or just a slice (false)
|
private |
Open a chain that starts at the current_seed Also record its presence and distance-to-start in the parent snarl Assumes that the chain has a parent snarl, i.e. isn't top-level
|
private |
Start a new snarl of the given type at the given depth.
|
inline |
Print each zip code tree in the forest to stderr Ziptrees are prefaced by their index, e.g. "0: <tree.print_self()>"
|
private |
Helper function for sort_one_interval() to sort seeds using radix sort Sorts the slice of seeds in the given interval of zipcode_sort_order, a vector of indices into seeds. Reverse order if reverse_order. The interval also has an is_reversed field, which refers to the orientation in the snarl tree. min_ and max_value are the minimum and maximum value being sorted on. If sort_by_chain_component is true, then sort on the chain component in sort_values
This should run in linear time, but is dependent on the values being sorted on to have a small range
|
private |
Helper for add_distance_matrix() to reverse the order of chains This is done by extracting all the chains, adding them back in reverse order, and updating the forest_state
|
private |
Sorts an interval, which must contain seeds on the same snarl/chain/node Sorting is linear-ish along top-level chains, topological-ish in snarls. Uses radix_sort_zipcodes() and default_sort_zipcodes()
For chains, everything is sorted with the prefix sum value of the chain itself from the distance index, not the order in the zip code tree. Everything will be sorted in the order of the zip code tree, but the values will be set from the distance index. Later, the values may be out of order or may need to be subtracted from the length of the chain to get the distance to the ends of the chain
void vg::ZipCodeForest::validate_zip_forest | ( | const SnarlDistanceIndex & | distance_index, |
const vector< Seed > * | seeds, | ||
size_t | distance_limit = std::numeric_limits<size_t>::max() |
||
) | const |
Check that the forest is correct:
vector<ZipCodeTree> vg::ZipCodeForest::trees |
The actual data, a collection of ZipCodeTrees.