vg
tools for working with variation graphs
vg::IndexedVG Class Reference

#include <indexed_vg.hpp>

vg::IndexedVG inherits from handlegraph::HandleGraph.

Classes

struct  CacheEntry
 

Public Member Functions

 IndexedVG (string graph_filename)
 
void print_report () const
 
virtual bool has_node (id_t node_id) const
 Check if a node exists by ID.
 
virtual handle_t get_handle (const id_t &node_id, bool is_reverse=false) const
 Look up the handle for the node with the given ID in the given orientation.
 
virtual id_t get_id (const handle_t &handle) const
 Get the ID from a handle.
 
virtual bool get_is_reverse (const handle_t &handle) const
 Get the orientation of a handle.
 
virtual handle_t flip (const handle_t &handle) const
 Invert the orientation of a handle (potentially without getting its ID).
 
virtual size_t get_length (const handle_t &handle) const
 Get the length of a node.
 
virtual string get_sequence (const handle_t &handle) const
 
virtual bool follow_edges_impl (const handle_t &handle, bool go_left, const function< bool(const handle_t &)> &iteratee) const
 
virtual bool for_each_handle_impl (const function< bool(const handle_t &)> &iteratee, bool parallel=false) const
 
virtual size_t get_node_count () const
 Return the number of nodes in the graph.
 
virtual id_t min_node_id () const
 
virtual id_t max_node_id () const
 
- Public Member Functions inherited from handlegraph::HandleGraph
virtual ~HandleGraph ()=default
 
template<typename Iteratee >
bool follow_edges (const handle_t &handle, bool go_left, const Iteratee &iteratee) const
 
template<typename Iteratee >
bool for_each_handle (const Iteratee &iteratee, bool parallel=false) const
 
virtual size_t get_degree (const handle_t &handle, bool go_left) const
 
virtual bool has_edge (const handle_t &left, const handle_t &right) const
 
bool has_edge (const edge_t &edge) const
 Convenience wrapper around has_edge for an edge_t argument.
 
virtual size_t get_edge_count () const
 
virtual size_t get_total_length () const
 
virtual char get_base (const handle_t &handle, size_t index) const
 
virtual std::string get_subsequence (const handle_t &handle, size_t index, size_t size) const
 
handle_t forward (const handle_t &handle) const
 Get the locally forward version of a handle.
 
edge_t edge_handle (const handle_t &left, const handle_t &right) const
 
handle_t traverse_edge_handle (const edge_t &edge, const handle_t &left) const
 
template<typename Iteratee >
bool for_each_edge (const Iteratee &iteratee, bool parallel=false) const
 

Protected Types

using cursor_t = StreamIndex< Graph >::cursor_t
 Define the type we use for cursors into the backing file.
 

Protected Member Functions

void with_cursor (function< void(cursor_t &)> callback) const
 Get temporary ownership of a cursor to the backing vg file.
 
void find (id_t id, const function< bool(const CacheEntry &)> &iteratee) const
 
bool with_cache_entry (int64_t group_vo, const function< void(const CacheEntry &)> &callback) const
 
- Protected Member Functions inherited from handlegraph::HandleGraph
virtual bool follow_edges_impl (const handle_t &handle, bool go_left, const std::function< bool(const handle_t &)> &iteratee) const =0
 
virtual bool for_each_handle_impl (const std::function< bool(const handle_t &)> &iteratee, bool parallel=false) const =0
 

Protected Attributes

string vg_filename
 
StreamIndex< Graph > index
 Index data about the vg file.
 
list< ifstream > cursor_streams
 Input streams referenced by cursors live in this list that grows forever.
 
list< unique_ptr< cursor_t > > cursor_pool
 Cursors live in this free pool.
 
mutex cursor_pool_mutex
 Access is protected by this mutex.
 
LRUCache< int64_t, shared_ptr< CacheEntry > > group_cache
 
mutex cache_mutex
 The cache is protected with this mutex.
 

Private Member Functions

 IndexedVG (const IndexedVG &other)=delete
 

Detailed Description

Use a .vg file on disk with a .vgi index to provide random access to the graph data without loading the entire graph into memory. This is something of a compromise between an XG and a VG, except that, unlike either, it does not need the whole graph in memory.

We require that all nodes in the graph appear in ID order within their chunks, and that all chunks appear in ID order. Consequently, all nodes appear in ID order in the file.
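Because IDs are sorted across the whole file, a lookup can binary-search chunk ID ranges instead of scanning every chunk. The following is a minimal C++ sketch of that idea, not the StreamIndex implementation; `ChunkRange` and `find_chunk` are hypothetical names, and chunk ranges are assumed not to overlap (though holes between them are allowed).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of one chunk: the smallest and largest node ID it holds.
// Assumed sorted by ID and non-overlapping, possibly with holes between chunks.
struct ChunkRange {
    int64_t min_id;
    int64_t max_id;
};

// Return the index of the only chunk whose range could contain `id`, or -1.
// Binary search is valid only because of the ID-ordering requirement.
long find_chunk(const std::vector<ChunkRange>& chunks, int64_t id) {
    // First chunk whose min_id is strictly greater than id.
    auto it = std::upper_bound(chunks.begin(), chunks.end(), id,
        [](int64_t value, const ChunkRange& c) { return value < c.min_id; });
    if (it == chunks.begin()) {
        return -1; // id is below every chunk's minimum
    }
    --it; // last chunk with min_id <= id
    if (id > it->max_id) {
        return -1; // id falls in a hole after this chunk
    }
    return static_cast<long>(it - chunks.begin());
}
```

For chunks `{1,10}, {11,20}, {30,40}`, ID 11 maps to chunk 1, while ID 25 falls in a hole and yields -1.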

The class cannot be copied, since internally it contains a ProtobufIterator wrapping an open file. It can only be moved.

All operations are thread-safe to call. Internally, we can't seek a cursor to another location in the middle of looping over a run of matching chunks, but we handle that ourselves.

Internally, we keep a pool of cursors into the backing graph file, and each time we need to actually access the backing graph file we grab a cursor or make one if we don't have a free one.
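A minimal C++ sketch of such a pool, assuming a hypothetical `Cursor` stand-in for `cursor_t` and a hypothetical `CursorPool` class that is not part of vg. The mutex guards only the free list, so callbacks on different threads can run concurrently, each on its own cursor.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for cursor_t: it only records which file it would read.
struct Cursor {
    explicit Cursor(std::string path) : path(std::move(path)) {}
    std::string path;
};

class CursorPool {
public:
    explicit CursorPool(std::string path) : path_(std::move(path)) {}

    // Lend a cursor to the callback, reusing a free one or creating a new one.
    void with_cursor(const std::function<void(Cursor&)>& callback) {
        std::unique_ptr<Cursor> cursor;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!free_.empty()) {
                cursor = std::move(free_.front());
                free_.pop_front();
            }
        }
        if (!cursor) {
            // No free cursor: make one outside the lock.
            cursor = std::make_unique<Cursor>(path_);
            ++created_;
        }
        callback(*cursor); // the lock is not held while the cursor is in use
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(std::move(cursor)); // return it for reuse
    }

    std::size_t created() const { return created_.load(); }

    std::size_t idle() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    std::string path_;
    std::list<std::unique_ptr<Cursor>> free_;
    mutable std::mutex mutex_;
    std::atomic<std::size_t> created_{0};
};
```

Holding the lock only while popping or pushing keeps file access out of the critical section. A nested `with_cursor` call finds the pool empty and creates a second cursor. This sketch is not exception-safe: if the callback throws, the borrowed cursor is dropped rather than returned.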

Internally we also keep a least-recently-used cache of indexed, merged-together graph groups. The cache is keyed by group start virtual offset (VO). The cache holds shared pointers to cache entries, so that one thread can evict an entry from the cache while another is still working with it.
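The eviction-safety argument can be illustrated with a small LRU cache whose values are `shared_ptr`s. This is a hypothetical sketch (`GroupCache`, `Entry`, and `get_or_load` are not vg's `LRUCache` API): evicting a key only drops the cache's own reference, so a caller that already holds the pointer keeps a valid entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for CacheEntry: the parsed contents of one group.
struct Entry {
    std::string data;
};

// Minimal LRU cache keyed by group start VO. Values are shared_ptrs, so evicting
// an entry only drops the cache's reference, never a caller's.
class GroupCache {
public:
    using Loader = std::function<std::shared_ptr<Entry>(int64_t)>;

    explicit GroupCache(std::size_t capacity) : capacity_(capacity) {}

    std::shared_ptr<Entry> get_or_load(int64_t vo, const Loader& load) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto found = index_.find(vo);
        if (found != index_.end()) {
            // Hit: move the item to the front (most recently used).
            order_.splice(order_.begin(), order_, found->second);
            return found->second->second;
        }
        // Miss: load under the lock (simple, but serializes loads), insert at the front.
        std::shared_ptr<Entry> entry = load(vo);
        order_.emplace_front(vo, entry);
        index_[vo] = order_.begin();
        if (order_.size() > capacity_) {
            // Evict the least recently used item.
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        return entry;
    }

    bool contains(int64_t vo) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return index_.count(vo) > 0;
    }

private:
    using Item = std::pair<int64_t, std::shared_ptr<Entry>>;
    std::size_t capacity_;
    std::list<Item> order_;
    std::unordered_map<int64_t, std::list<Item>::iterator> index_;
    mutable std::mutex mutex_;
};
```

With a capacity of one, loading group 200 evicts group 100, yet a `shared_ptr` obtained earlier for group 100 still points to live data.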

Member Typedef Documentation

◆ cursor_t

Define the type we use for cursors into the backing file.

Constructor & Destructor Documentation

◆ IndexedVG() [1/2]

vg::IndexedVG::IndexedVG ( string  graph_filename)

Open a .vg file. If the .vg has a .vg.vgi index, it will be loaded. If not, an index will be generated and saved.
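The index lookup rule can be sketched as follows, assuming the index lives at the graph filename with `.vgi` appended (as in `graph.vg.vgi`). `index_path_for`, `IndexAction`, and `choose_index_action` are hypothetical helpers; actually building and saving the index is out of scope here.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: the index for "graph.vg" is expected at "graph.vg.vgi".
std::string index_path_for(const std::string& graph_filename) {
    return graph_filename + ".vgi";
}

// What the constructor is documented to do: load an index if present, else build one.
enum class IndexAction { Load, BuildAndSave };

IndexAction choose_index_action(const std::string& graph_filename) {
    // Treat "can be opened for reading" as "exists"; a sketch, not a robust check.
    std::ifstream probe(index_path_for(graph_filename));
    return probe.is_open() ? IndexAction::Load : IndexAction::BuildAndSave;
}
```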

◆ IndexedVG() [2/2]

vg::IndexedVG::IndexedVG ( const IndexedVG & other)
private, delete

Member Function Documentation

◆ find()

void vg::IndexedVG::find ( id_t  id,
const function< bool(const CacheEntry &)> &  iteratee 
) const
protected

Wrapper around the index's find, with caching. Supports stopping early, but doesn't internally filter out chunks/runs where the queried node falls in a hole. Runs the iteratee, in order, on the CacheEntry objects for the runs that might have information on the requested node. Internally holds shared_ptr copies of the cache entries it is handing out references to. Users must do everything they need with the CacheEntry within the callback, as the reference may not be valid afterwards.
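The early-stop and lifetime rules can be sketched as below. This is a hypothetical illustration, not vg code: `RunEntry` stands in for `CacheEntry`, the candidate VOs are assumed to have already been chosen by the index, and `fetch` stands in for a cache lookup that returns an owning `shared_ptr`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for CacheEntry: one parsed run and where it starts.
struct RunEntry {
    int64_t vo;
    std::vector<int64_t> node_ids;
};

// Visit candidate runs in order, stopping when the iteratee returns false.
// Returns true only if every candidate was visited.
bool find_sketch(const std::vector<int64_t>& candidate_vos,
                 const std::function<std::shared_ptr<RunEntry>(int64_t)>& fetch,
                 const std::function<bool(const RunEntry&)>& iteratee) {
    for (int64_t vo : candidate_vos) {
        // Owning copy: the entry stays alive for the whole callback, even if the
        // cache evicts it meanwhile. The reference passed on must not outlive it.
        std::shared_ptr<RunEntry> entry = fetch(vo);
        if (!iteratee(*entry)) {
            return false; // stopped early
        }
    }
    return true;
}
```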

◆ flip()

handle_t vg::IndexedVG::flip ( const handle_t & handle) const
virtual

Invert the orientation of a handle (potentially without getting its ID)

Implements handlegraph::HandleGraph.

◆ follow_edges_impl()

bool vg::IndexedVG::follow_edges_impl ( const handle_t & handle,
bool  go_left,
const function< bool(const handle_t &)> &  iteratee 
) const
virtual

Loop over all the handles to next/previous (right/left) nodes. Passes them to a callback which returns false to stop iterating and true to continue. Returns true if we finished and false if we stopped early.
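The callback protocol can be shown on a toy graph. In this hypothetical sketch every node is in forward orientation, so a node ID serves as the handle; `ToyGraph` is not part of vg or libhandlegraph.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy graph with all nodes in forward orientation, so a "handle"
// is just a node ID. Each edge is stored once and read in either direction.
struct ToyGraph {
    std::vector<std::pair<int64_t, int64_t>> edges; // (from, to), read left to right

    // Same protocol as follow_edges_impl: stop when the iteratee returns false,
    // and report whether the loop ran to completion.
    bool follow_edges(int64_t node, bool go_left,
                      const std::function<bool(int64_t)>& iteratee) const {
        for (const auto& e : edges) {
            int64_t neighbor;
            if (!go_left && e.first == node) {
                neighbor = e.second;
            } else if (go_left && e.second == node) {
                neighbor = e.first;
            } else {
                continue;
            }
            if (!iteratee(neighbor)) {
                return false; // stopped early
            }
        }
        return true; // visited every neighbor
    }
};
```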

◆ for_each_handle_impl()

bool vg::IndexedVG::for_each_handle_impl ( const function< bool(const handle_t &)> &  iteratee,
bool  parallel = false 
) const
virtual

Loop over all the nodes in the graph in their local forward orientations, in their internal stored order. Stop if the iteratee returns false. Can be told to run in parallel, in which case stopping after a false return value is on a best-effort basis and iteration order is not defined.
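The difference between serial and parallel iteration can be sketched with a shared stop flag. This is a hypothetical C++ illustration using `std::thread`, not how IndexedVG schedules its work; in parallel mode the iteratee must itself be thread-safe.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch of for_each_handle semantics over node IDs in stored order.
// Serial: visit in stored order and stop at the first false.
// Parallel: split IDs across threads; the shared flag makes stopping best-effort,
// so nodes may still be visited after a false, and the order is undefined.
bool for_each_node(const std::vector<int64_t>& stored_ids,
                   const std::function<bool(int64_t)>& iteratee,
                   bool parallel = false, unsigned threads = 4) {
    if (!parallel) {
        for (int64_t id : stored_ids) {
            if (!iteratee(id)) {
                return false;
            }
        }
        return true;
    }
    if (threads == 0) {
        threads = 1;
    }
    std::atomic<bool> stop{false};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&, t]() {
            // Thread t handles indices t, t + threads, t + 2 * threads, ...
            for (std::size_t i = t; i < stored_ids.size() && !stop.load(); i += threads) {
                if (!iteratee(stored_ids[i])) {
                    stop.store(true);
                }
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return !stop.load();
}
```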

◆ get_handle()

handle_t vg::IndexedVG::get_handle ( const id_t & node_id,
bool  is_reverse = false 
) const
virtual

Look up the handle for the node with the given ID in the given orientation.

Implements handlegraph::HandleGraph.

◆ get_id()

id_t vg::IndexedVG::get_id ( const handle_t & handle) const
virtual

Get the ID from a handle.

Implements handlegraph::HandleGraph.

◆ get_is_reverse()

bool vg::IndexedVG::get_is_reverse ( const handle_t & handle) const
virtual

Get the orientation of a handle.

Implements handlegraph::HandleGraph.
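One common way to implement `get_handle`, `get_id`, `get_is_reverse`, and `flip` is to pack the ID and the orientation into one integer. The sketch below assumes that encoding and non-negative IDs; it is not necessarily how IndexedVG encodes `handle_t`, and `Handle`, `make_handle`, `handle_id`, `handle_is_reverse`, and `flip_handle` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical handle type: the node ID shifted left by one, with the low bit
// recording orientation (1 = reverse).
struct Handle {
    uint64_t packed;
};

Handle make_handle(int64_t node_id, bool is_reverse = false) {
    return Handle{(static_cast<uint64_t>(node_id) << 1) | (is_reverse ? 1u : 0u)};
}

int64_t handle_id(Handle h) { return static_cast<int64_t>(h.packed >> 1); }

bool handle_is_reverse(Handle h) { return (h.packed & 1u) != 0; }

// Flipping only toggles the orientation bit; the ID never needs to be decoded.
Handle flip_handle(Handle h) { return Handle{h.packed ^ 1u}; }
```

This is why `flip` can work "potentially without getting its ID": under this encoding it is a single XOR.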

◆ get_length()

size_t vg::IndexedVG::get_length ( const handle_t & handle) const
virtual

Get the length of a node.

Implements handlegraph::HandleGraph.

◆ get_node_count()

size_t vg::IndexedVG::get_node_count ( ) const
virtual

Return the number of nodes in the graph.

Implements handlegraph::HandleGraph.

◆ get_sequence()

string vg::IndexedVG::get_sequence ( const handle_t & handle) const
virtual

Get the sequence of a node, presented in the handle's local forward orientation.

Implements handlegraph::HandleGraph.
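For a reverse handle, presenting the sequence in the handle's orientation means taking the reverse complement of the stored forward strand. A minimal sketch, assuming uppercase DNA bases and mapping anything else to `N`; `complement` and `oriented_sequence` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Complement of one uppercase DNA base; anything unrecognized maps to 'N'.
char complement(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'T': return 'A';
        case 'C': return 'G';
        case 'G': return 'C';
        default: return 'N';
    }
}

// Present a stored forward-strand sequence as seen from a handle.
std::string oriented_sequence(const std::string& stored, bool is_reverse) {
    if (!is_reverse) {
        return stored;
    }
    std::string out(stored.rbegin(), stored.rend()); // reverse
    std::transform(out.begin(), out.end(), out.begin(), complement); // complement
    return out;
}
```

For example, a node storing `GATTACA` reads as `TGTAATC` through a reverse handle.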

◆ has_node()

bool vg::IndexedVG::has_node ( id_t  node_id) const
virtual

Check if a node exists by ID.

Implements handlegraph::HandleGraph.

◆ max_node_id()

id_t vg::IndexedVG::max_node_id ( ) const
virtual

Return the largest ID in the graph, or some larger number if the largest ID is unavailable. Return value is unspecified if the graph is empty.

Implements handlegraph::HandleGraph.

◆ min_node_id()

id_t vg::IndexedVG::min_node_id ( ) const
virtual

Return the smallest ID in the graph, or some smaller number if the smallest ID is unavailable. Return value is unspecified if the graph is empty.

Implements handlegraph::HandleGraph.

◆ print_report()

void vg::IndexedVG::print_report ( ) const

◆ with_cache_entry()

bool vg::IndexedVG::with_cache_entry ( int64_t  group_vo,
const function< void(const CacheEntry &)> &  callback 
) const
protected

Load or use the cached version of the CacheEntry for the given group start VO. If the EOF sentinel numeric_limits<int64_t>::max() is passed, the callback is not called and false is returned. (This is to enable easy looping to scan over CacheEntries.) Passing any other past-the-end VO is prohibited, and may produce an error. Handles locking the cache for updates and keeping the CacheEntry reference live while the callback is running.
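The sentinel convention lets a caller scan groups with a simple loop. In this hypothetical sketch, `Group`, `with_group`, and `concat_all` stand in for the real types; groups live in an in-memory map keyed by start VO, each group records the VO of its successor, and an unknown VO throws `std::out_of_range` (the real method only says such a VO "may produce an error").

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <map>
#include <stdexcept>
#include <string>

// EOF sentinel that ends a scan, matching the documented convention.
constexpr int64_t kEof = std::numeric_limits<int64_t>::max();

// Hypothetical group: a payload plus the VO of the next group (kEof after the last).
struct Group {
    std::string payload;
    int64_t next_vo;
};

// The sentinel returns false without calling back; an unknown VO is rejected.
bool with_group(const std::map<int64_t, Group>& groups, int64_t vo,
                const std::function<void(const Group&)>& callback) {
    if (vo == kEof) {
        return false;
    }
    auto it = groups.find(vo);
    if (it == groups.end()) {
        throw std::out_of_range("not a group start VO");
    }
    callback(it->second);
    return true;
}

// Scan every group in file order by chaining VOs until the sentinel is reached.
std::string concat_all(const std::map<int64_t, Group>& groups, int64_t first_vo) {
    std::string all;
    int64_t vo = first_vo;
    while (with_group(groups, vo, [&](const Group& g) {
        all += g.payload;
        vo = g.next_vo;
    })) {
    }
    return all;
}
```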

◆ with_cursor()

void vg::IndexedVG::with_cursor ( function< void(cursor_t &)>  callback) const
protected

Get temporary ownership of a cursor to the backing vg file.

Member Data Documentation

◆ cache_mutex

mutex vg::IndexedVG::cache_mutex
mutable, protected

The cache is protected with this mutex.

◆ cursor_pool

list<unique_ptr<cursor_t> > vg::IndexedVG::cursor_pool
mutable, protected

Cursors live in this free pool.

◆ cursor_pool_mutex

mutex vg::IndexedVG::cursor_pool_mutex
mutable, protected

Access is protected by this mutex.

◆ cursor_streams

list<ifstream> vg::IndexedVG::cursor_streams
mutable, protected

Input streams referenced by cursors live in this list that grows forever.

◆ group_cache

LRUCache<int64_t, shared_ptr<CacheEntry> > vg::IndexedVG::group_cache
mutable, protected

This is the cache that holds CacheEntries for groups we have already parsed and indexed. We can only access the cache from one thread at a time, but the shared pointers let other threads keep working with the actual data.

◆ index

StreamIndex<Graph> vg::IndexedVG::index
protected

Index data about the vg file.

◆ vg_filename

string vg::IndexedVG::vg_filename
protected

We store the graph filename, so we can have cursors to it created on demand. This is necessary to allow, e.g., random accesses to parts of the graph while looping over the graph as a whole. The downside is that we lose BGZF block caching between different streams of access.


The documentation for this class was generated from the following files: