|
vg
tools for working with variation graphs
|
#include <zip_code_tree.hpp>
Classes | |
| struct | iteration_position |
Public Types | |
| enum | State { S_START , S_SCAN_CHAIN , S_SCAN_DAG_SNARL , S_SCAN_CYCLIC_SNARL } |
Public Member Functions | |
| distance_iterator (size_t start_index, const vector< tree_item_t > &zip_code_tree, std::stack< size_t > chain_numbers=std::stack< size_t >(), bool right_to_left=true, size_t distance_limit=std::numeric_limits< size_t >::max()) | |
| distance_iterator (const distance_iterator &other)=default | |
| distance_iterator (distance_iterator &&other)=default | |
| distance_iterator & | operator= (const distance_iterator &other)=default |
| distance_iterator & | operator= (distance_iterator &&other)=default |
| distance_iterator & | operator++ () |
| Move in right_to_left direction until we hit another seed or the end. More... | |
| void | move_index () |
| bool | operator== (const distance_iterator &other) const |
| bool | operator!= (const distance_iterator &other) const |
| Compare for inequality. More... | |
| bool | done () const |
| Is the iteration done? More... | |
| seed_result_t | operator* () const |
Private Member Functions | |
| void | use_saved_traversal () |
| Set current iteration_position to the top of pending_traversals. More... | |
| void | save_opposite_cyclic_snarl_exit (size_t chain_num) |
| void | push (size_t value) |
| Push a value to the stack. More... | |
| size_t | pop () |
| Pop a value from the stack and return it. More... | |
| size_t & | top () |
| Get a mutable reference to the value on top of the stack. More... | |
| void | dup () |
| Duplicate the top item on the stack. More... | |
| size_t | depth () const |
| Check stack depth. More... | |
| void | swap () |
| Reverse the top two elements of the stack. More... | |
| void | stack_snarl_distances (size_t snarl_start_i, size_t chain_num, bool right_side) |
| void | stack_matrix_value (size_t matrix_start_i, bool has_main_diagonal, size_t row, size_t col) |
| size_t | get_matrix_value (size_t matrix_start_i, bool has_main_diagonal, size_t row, size_t col) |
| size_t | get_cyclic_snarl_bound_distance (size_t snarl_start_i, size_t row, bool to_end) |
| void | stack_below_top (size_t value) |
| void | halt () |
| void | unimplemented_error () |
| Throw a domain_error that the current state/symbol combo is unimplemented. More... | |
| tree_item_t | current_item () const |
| What item does index point to? More... | |
| bool | entered_snarl () const |
| bool | exited_snarl () const |
| bool | entered_chain () const |
| bool | exited_chain () const |
| void | skip_chain () |
| void | initialize_chain () |
| void | initialize_snarl (size_t chain_num) |
| Decide what to do right after entering a new snarl. More... | |
| void | continue_snarl () |
| bool | tick () |
Private Attributes | |
| size_t | end_index |
| Where we have to stop. More... | |
| const size_t | original_index |
| Where we started from. More... | |
| size_t | distance_limit |
| Distance limit we will go up to. More... | |
| const bool | original_right_to_left |
| Original direction. More... | |
| const vector< tree_item_t > & | zip_code_tree |
| References to the zip code tree to let us look up distance matrices. More... | |
| std::unordered_map< size_t, std::pair< size_t, size_t > > | best_cyclic_snarl_exits |
| iteration_position | pos |
| The iterator's current position (state, direction, stack, etc.) More... | |
| std::stack< iteration_position > | pending_traversals |
Iterator that looks sideways in the tree from a seed, possibly up to a maximum base distance.
Iteration flow:
Original Python implementation: https://github.com/benedictpaten/long_read_giraffe_chainer_prototype/blob/b590c34055474b0c901a681a1aa99f1651abb6a4/zip_tree_iterator.py.
New cyclic snarl handling flowchart: https://docs.google.com/drawings/d/1diKMXCuMteR06fF64BhSttsD5RHh3eTnd8FtyYqs17Y/edit?usp=sharing
In order to de-duplicate work, this iterator will not actually look at everything before it. If it knows that seeds from a given chain would have already output transitions to its own position, then it won't even look at anything in that chain. This occurs in cyclic snarls.
For example, since previous chains in a cyclic snarl will have already visited all other chains, later chains do not have to visit previous ones. Similarly, the traversal should never have to enter a cyclic snarl from the outside, since any possible transition crossing the snarl boundary will be found by traversals starting from seeds inside the snarl.
Since a seed in a cyclic snarl may exit in either direction, the iterator "saves" one point (the opposite of its current direction), remembering the stack state and index, to be restored later. Then, when the main iterator hits the end, it can restore the saved state and continue.
The iterator implements simple memorization of snarl exit distances. This is done for only cyclic snarls, as they are the only ones that literarlly turn one traversal into two via the exit-in-both-directions behavior. Each time the iterator calculates the running distance to exit a cyclic snarl, it will check if it has ever seen a better or equal distance, and if so will pretend that the distance is infinite/unreachable.
| vg::ZipCodeTree::distance_iterator::distance_iterator | ( | size_t | start_index, |
| const vector< tree_item_t > & | zip_code_tree, | ||
| std::stack< size_t > | chain_numbers = std::stack<size_t>(), |
||
| bool | right_to_left = true, |
||
| size_t | distance_limit = std::numeric_limits<size_t>::max() |
||
| ) |
Make a reverse iterator starting from start_index, going in the given direction, and with an optional distance limit
|
default |
|
default |
|
private |
Decide what to do when re-entering a snarl, having already stacked up distances
|
inlineprivate |
What item does index point to?
|
private |
Check stack depth.
|
inline |
Is the iteration done?
|
private |
Duplicate the top item on the stack.
|
inlineprivate |
|
inlineprivate |
Check if the current symbol is an entrance/exit, based on the direction the iterator is going (right_to_left)
|
inlineprivate |
|
inlineprivate |
|
private |
Helper which simplifies the call to stack_matrix_value() Get a distance to the start or end of a CYCLIC snarl Also handles the memorization with best_cyclic_snarl_exits row is zero-indexed within the matrix
|
private |
Helper for stack_matrix_value() Get a value from a triangular distance matrix row and col are zero-indexed within the matrix
|
private |
Stop parsing because nothing else can be below the distance limit. This moves the current index to end_index.
|
private |
Decide what to do right after entering a new chain. This chain's distance should be on top of the stack.
|
private |
Decide what to do right after entering a new snarl.
|
inline |
Move index in right_to_left direction If we hit the end and have pending traversals, use one
|
inline |
Compare for inequality.
| auto vg::ZipCodeTree::distance_iterator::operator* | ( | ) | const |
Get the index and orientation of the seed we are currently at, and the distance to it. Automatically handles orientation depending on right_to_left
| auto vg::ZipCodeTree::distance_iterator::operator++ | ( | ) |
Move in right_to_left direction until we hit another seed or the end.
|
default |
|
default |
|
inline |
Compare for equality to see if we hit end This just trusts that the two iterators are for the same tree
|
private |
Pop a value from the stack and return it.
|
private |
Push a value to the stack.
|
private |
Save a traversal exiting a cyclic snarl in the opposite direction as the current one, e.g. using an edge C1_R -> SNARL_START
|
private |
Skip the current chain, jumping to the matching end and then one past it to continue the snarl Bound pair index should be right below current distance
|
private |
Helper for stack_snarl_distances() Stack a single value below the running distance
|
private |
Helper for stack_snarl_distances() Stack a single value from a triangular distance matrix row and col are zero-indexed within the matrix
|
private |
Push distances relevant to a given chain onto the stack chain_num is one-indexed, so the first chain is 1, and the last is N right_side indicates if distances exit from the right or left side
|
private |
Reverse the top two elements of the stack.
|
private |
Tick the automaton, looking at the symbol at *it and updating the stack and state. Returns true to yield a value at the current symbol, or to halt, and false otherwise.
|
private |
Get a mutable reference to the value on top of the stack.
|
private |
Throw a domain_error that the current state/symbol combo is unimplemented.
|
private |
Set current iteration_position to the top of pending_traversals.
|
private |
Best distance for each snarl exit we've seen Stored as {snarl start index : (start dist, end dist)}
|
private |
Distance limit we will go up to.
|
private |
Where we have to stop.
|
private |
Where we started from.
|
private |
Original direction.
|
private |
Starting positions of other traversals that we will do later When the current traversal can no longer go on, we pop one of these and set the current position to it
|
private |
The iterator's current position (state, direction, stack, etc.)
|
private |
References to the zip code tree to let us look up distance matrices.
1.9.1