vg
tools for working with variation graphs
Classes | Public Types | Public Member Functions | Private Member Functions | Private Attributes | List of all members
vg::ZipCodeTree::distance_iterator Class Reference

#include <zip_code_tree.hpp>

Classes

struct  iteration_position
 

Public Types

enum  State { S_START , S_SCAN_CHAIN , S_SCAN_DAG_SNARL , S_SCAN_CYCLIC_SNARL }
 

Public Member Functions

 distance_iterator (size_t start_index, const vector< tree_item_t > &zip_code_tree, std::stack< size_t > chain_numbers=std::stack< size_t >(), bool right_to_left=true, size_t distance_limit=std::numeric_limits< size_t >::max())
 
 distance_iterator (const distance_iterator &other)=default
 
 distance_iterator (distance_iterator &&other)=default
 
distance_iteratoroperator= (const distance_iterator &other)=default
 
distance_iteratoroperator= (distance_iterator &&other)=default
 
distance_iteratoroperator++ ()
 Move in right_to_left direction until we hit another seed or the end. More...
 
void move_index ()
 
bool operator== (const distance_iterator &other) const
 
bool operator!= (const distance_iterator &other) const
 Compare for inequality. More...
 
bool done () const
 Is the iteration done? More...
 
seed_result_t operator* () const
 

Private Member Functions

void use_saved_traversal ()
 Set current iteration_position to the top of pending_traversals. More...
 
void save_opposite_cyclic_snarl_exit (size_t chain_num)
 
void push (size_t value)
 Push a value to the stack. More...
 
size_t pop ()
 Pop a value from the stack and return it. More...
 
size_t & top ()
 Get a mutable reference to the value on top of the stack. More...
 
void dup ()
 Duplicate the top item on the stack. More...
 
size_t depth () const
 Check stack depth. More...
 
void swap ()
 Reverse the top two elements of the stack. More...
 
void stack_snarl_distances (size_t snarl_start_i, size_t chain_num, bool right_side)
 
void stack_matrix_value (size_t matrix_start_i, bool has_main_diagonal, size_t row, size_t col)
 
size_t get_matrix_value (size_t matrix_start_i, bool has_main_diagonal, size_t row, size_t col)
 
size_t get_cyclic_snarl_bound_distance (size_t snarl_start_i, size_t row, bool to_end)
 
void stack_below_top (size_t value)
 
void halt ()
 
void unimplemented_error ()
 Throw a domain_error that the current state/symbol combo is unimplemented. More...
 
tree_item_t current_item () const
 What item does index point to? More...
 
bool entered_snarl () const
 
bool exited_snarl () const
 
bool entered_chain () const
 
bool exited_chain () const
 
void skip_chain ()
 
void initialize_chain ()
 
void initialize_snarl (size_t chain_num)
 Decide what to do right after entering a new snarl. More...
 
void continue_snarl ()
 
bool tick ()
 

Private Attributes

size_t end_index
 Where we have to stop. More...
 
const size_t original_index
 Where we started from. More...
 
size_t distance_limit
 Distance limit we will go up to. More...
 
const bool original_right_to_left
 Original direction. More...
 
const vector< tree_item_t > & zip_code_tree
 References to the zip code tree to let us look up distance matrices. More...
 
std::unordered_map< size_t, std::pair< size_t, size_t > > best_cyclic_snarl_exits
 
iteration_position pos
 The iterator's current position (state, direction, stack, etc.) More...
 
std::stack< iteration_positionpending_traversals
 

Detailed Description

Iterator that looks sideways in the tree from a seed, possibly up to a maximum base distance.

Iteration flow:

Original Python implementation: https://github.com/benedictpaten/long_read_giraffe_chainer_prototype/blob/b590c34055474b0c901a681a1aa99f1651abb6a4/zip_tree_iterator.py.

New cyclic snarl handling flowchart: https://docs.google.com/drawings/d/1diKMXCuMteR06fF64BhSttsD5RHh3eTnd8FtyYqs17Y/edit?usp=sharing

In order to de-duplicate work, this iterator will not actually look at everything before it. If it knows that seeds from a given chain would have already output transitions to its own position, then it won't even look at anything in that chain. This occurs in cyclic snarls.

For example, since previous chains in a cyclic snarl will have already visited all other chains, later chains do not have to visit previous ones. Similarly, the traversal should never have to enter a cyclic snarl from the outside, since any possible transition crossing the snarl boundary will be found by traversals starting from seeds inside the snarl.

Since a seed in a cyclic snarl may exit in either direction, the iterator "saves" one point (the opposite of its current direction), remembering the stack state and index, to be restored later. Then, when the main iterator hits the end, it can restore the saved state and continue.

The iterator implements simple memorization of snarl exit distances. This is done for only cyclic snarls, as they are the only ones that literarlly turn one traversal into two via the exit-in-both-directions behavior. Each time the iterator calculates the running distance to exit a cyclic snarl, it will check if it has ever seen a better or equal distance, and if so will pretend that the distance is infinite/unreachable.

Member Enumeration Documentation

◆ State

Type for the state of the I-can't-believe-it's-not-a-pushdown-automaton

Enumerator
S_START 
S_SCAN_CHAIN 
S_SCAN_DAG_SNARL 
S_SCAN_CYCLIC_SNARL 

Constructor & Destructor Documentation

◆ distance_iterator() [1/3]

vg::ZipCodeTree::distance_iterator::distance_iterator ( size_t  start_index,
const vector< tree_item_t > &  zip_code_tree,
std::stack< size_t >  chain_numbers = std::stack<size_t>(),
bool  right_to_left = true,
size_t  distance_limit = std::numeric_limits<size_t>::max() 
)

Make a reverse iterator starting from start_index, going in the given direction, and with an optional distance limit

◆ distance_iterator() [2/3]

vg::ZipCodeTree::distance_iterator::distance_iterator ( const distance_iterator other)
default

◆ distance_iterator() [3/3]

vg::ZipCodeTree::distance_iterator::distance_iterator ( distance_iterator &&  other)
default

Member Function Documentation

◆ continue_snarl()

void vg::ZipCodeTree::distance_iterator::continue_snarl ( )
private

Decide what to do when re-entering a snarl, having already stacked up distances

◆ current_item()

tree_item_t vg::ZipCodeTree::distance_iterator::current_item ( ) const
inlineprivate

What item does index point to?

◆ depth()

auto vg::ZipCodeTree::distance_iterator::depth ( ) const
private

Check stack depth.

◆ done()

bool vg::ZipCodeTree::distance_iterator::done ( ) const
inline

Is the iteration done?

◆ dup()

auto vg::ZipCodeTree::distance_iterator::dup ( )
private

Duplicate the top item on the stack.

◆ entered_chain()

bool vg::ZipCodeTree::distance_iterator::entered_chain ( ) const
inlineprivate

◆ entered_snarl()

bool vg::ZipCodeTree::distance_iterator::entered_snarl ( ) const
inlineprivate

Check if the current symbol is an entrance/exit, based on the direction the iterator is going (right_to_left)

◆ exited_chain()

bool vg::ZipCodeTree::distance_iterator::exited_chain ( ) const
inlineprivate

◆ exited_snarl()

bool vg::ZipCodeTree::distance_iterator::exited_snarl ( ) const
inlineprivate

◆ get_cyclic_snarl_bound_distance()

size_t vg::ZipCodeTree::distance_iterator::get_cyclic_snarl_bound_distance ( size_t  snarl_start_i,
size_t  row,
bool  to_end 
)
private

Helper which simplifies the call to stack_matrix_value() Get a distance to the start or end of a CYCLIC snarl Also handles the memorization with best_cyclic_snarl_exits row is zero-indexed within the matrix

◆ get_matrix_value()

size_t vg::ZipCodeTree::distance_iterator::get_matrix_value ( size_t  matrix_start_i,
bool  has_main_diagonal,
size_t  row,
size_t  col 
)
private

Helper for stack_matrix_value() Get a value from a triangular distance matrix row and col are zero-indexed within the matrix

◆ halt()

auto vg::ZipCodeTree::distance_iterator::halt ( )
private

Stop parsing because nothing else can be below the distance limit. This moves the current index to end_index.

◆ initialize_chain()

void vg::ZipCodeTree::distance_iterator::initialize_chain ( )
private

Decide what to do right after entering a new chain. This chain's distance should be on top of the stack.

◆ initialize_snarl()

void vg::ZipCodeTree::distance_iterator::initialize_snarl ( size_t  chain_num)
private

Decide what to do right after entering a new snarl.

◆ move_index()

void vg::ZipCodeTree::distance_iterator::move_index ( )
inline

Move index in right_to_left direction If we hit the end and have pending traversals, use one

◆ operator!=()

bool vg::ZipCodeTree::distance_iterator::operator!= ( const distance_iterator other) const
inline

Compare for inequality.

◆ operator*()

auto vg::ZipCodeTree::distance_iterator::operator* ( ) const

Get the index and orientation of the seed we are currently at, and the distance to it. Automatically handles orientation depending on right_to_left

◆ operator++()

auto vg::ZipCodeTree::distance_iterator::operator++ ( )

Move in right_to_left direction until we hit another seed or the end.

◆ operator=() [1/2]

distance_iterator& vg::ZipCodeTree::distance_iterator::operator= ( const distance_iterator other)
default

◆ operator=() [2/2]

distance_iterator& vg::ZipCodeTree::distance_iterator::operator= ( distance_iterator &&  other)
default

◆ operator==()

bool vg::ZipCodeTree::distance_iterator::operator== ( const distance_iterator other) const
inline

Compare for equality to see if we hit end This just trusts that the two iterators are for the same tree

◆ pop()

auto vg::ZipCodeTree::distance_iterator::pop ( )
private

Pop a value from the stack and return it.

◆ push()

auto vg::ZipCodeTree::distance_iterator::push ( size_t  value)
private

Push a value to the stack.

◆ save_opposite_cyclic_snarl_exit()

void vg::ZipCodeTree::distance_iterator::save_opposite_cyclic_snarl_exit ( size_t  chain_num)
private

Save a traversal exiting a cyclic snarl in the opposite direction as the current one, e.g. using an edge C1_R -> SNARL_START

◆ skip_chain()

void vg::ZipCodeTree::distance_iterator::skip_chain ( )
private

Skip the current chain, jumping to the matching end and then one past it to continue the snarl Bound pair index should be right below current distance

◆ stack_below_top()

void vg::ZipCodeTree::distance_iterator::stack_below_top ( size_t  value)
private

Helper for stack_snarl_distances() Stack a single value below the running distance

◆ stack_matrix_value()

void vg::ZipCodeTree::distance_iterator::stack_matrix_value ( size_t  matrix_start_i,
bool  has_main_diagonal,
size_t  row,
size_t  col 
)
private

Helper for stack_snarl_distances() Stack a single value from a triangular distance matrix row and col are zero-indexed within the matrix

◆ stack_snarl_distances()

void vg::ZipCodeTree::distance_iterator::stack_snarl_distances ( size_t  snarl_start_i,
size_t  chain_num,
bool  right_side 
)
private

Push distances relevant to a given chain onto the stack chain_num is one-indexed, so the first chain is 1, and the last is N right_side indicates if distances exit from the right or left side

◆ swap()

auto vg::ZipCodeTree::distance_iterator::swap ( )
private

Reverse the top two elements of the stack.

◆ tick()

auto vg::ZipCodeTree::distance_iterator::tick ( )
private

Tick the automaton, looking at the symbol at *it and updating the stack and state. Returns true to yield a value at the current symbol, or to halt, and false otherwise.

◆ top()

auto vg::ZipCodeTree::distance_iterator::top ( )
private

Get a mutable reference to the value on top of the stack.

◆ unimplemented_error()

void vg::ZipCodeTree::distance_iterator::unimplemented_error ( )
private

Throw a domain_error that the current state/symbol combo is unimplemented.

◆ use_saved_traversal()

void vg::ZipCodeTree::distance_iterator::use_saved_traversal ( )
private

Set current iteration_position to the top of pending_traversals.

Member Data Documentation

◆ best_cyclic_snarl_exits

std::unordered_map<size_t, std::pair<size_t, size_t> > vg::ZipCodeTree::distance_iterator::best_cyclic_snarl_exits
private

Best distance for each snarl exit we've seen Stored as {snarl start index : (start dist, end dist)}

◆ distance_limit

size_t vg::ZipCodeTree::distance_iterator::distance_limit
private

Distance limit we will go up to.

◆ end_index

size_t vg::ZipCodeTree::distance_iterator::end_index
private

Where we have to stop.

◆ original_index

const size_t vg::ZipCodeTree::distance_iterator::original_index
private

Where we started from.

◆ original_right_to_left

const bool vg::ZipCodeTree::distance_iterator::original_right_to_left
private

Original direction.

◆ pending_traversals

std::stack<iteration_position> vg::ZipCodeTree::distance_iterator::pending_traversals
private

Starting positions of other traversals that we will do later When the current traversal can no longer go on, we pop one of these and set the current position to it

◆ pos

iteration_position vg::ZipCodeTree::distance_iterator::pos
private

The iterator's current position (state, direction, stack, etc.)

◆ zip_code_tree

const vector<tree_item_t>& vg::ZipCodeTree::distance_iterator::zip_code_tree
private

References to the zip code tree to let us look up distance matrices.


The documentation for this class was generated from the following files: