vg
tools for working with variation graphs
|
#include <indexed_vg.hpp>
Classes | |
struct | CacheEntry |
Public Member Functions | |
IndexedVG (string graph_filename) | |
void | print_report () const |
virtual bool | has_node (id_t node_id) const |
Check if a node exists by ID. More... | |
virtual handle_t | get_handle (const id_t &node_id, bool is_reverse=false) const |
Look up the handle for the node with the given ID in the given orientation. More... | |
virtual id_t | get_id (const handle_t &handle) const |
Get the ID from a handle. More... | |
virtual bool | get_is_reverse (const handle_t &handle) const |
Get the orientation of a handle. More... | |
virtual handle_t | flip (const handle_t &handle) const |
Invert the orientation of a handle (potentially without getting its ID) More... | |
virtual size_t | get_length (const handle_t &handle) const |
Get the length of a node. More... | |
virtual string | get_sequence (const handle_t &handle) const |
virtual bool | follow_edges_impl (const handle_t &handle, bool go_left, const function< bool(const handle_t &)> &iteratee) const |
virtual bool | for_each_handle_impl (const function< bool(const handle_t &)> &iteratee, bool parallel=false) const |
virtual size_t | get_node_count () const |
Return the number of nodes in the graph. More... | |
virtual id_t | min_node_id () const |
virtual id_t | max_node_id () const |
Public Member Functions inherited from handlegraph::HandleGraph | |
virtual | ~HandleGraph ()=default |
template<typename Iteratee > | |
bool | follow_edges (const handle_t &handle, bool go_left, const Iteratee &iteratee) const |
template<typename Iteratee > | |
bool | for_each_handle (const Iteratee &iteratee, bool parallel=false) const |
virtual size_t | get_degree (const handle_t &handle, bool go_left) const |
virtual bool | has_edge (const handle_t &left, const handle_t &right) const |
bool | has_edge (const edge_t &edge) const |
Convenient wrapper of has_edge for edge_t argument. More... | |
virtual size_t | get_edge_count () const |
virtual size_t | get_total_length () const |
virtual char | get_base (const handle_t &handle, size_t index) const |
virtual std::string | get_subsequence (const handle_t &handle, size_t index, size_t size) const |
handle_t | forward (const handle_t &handle) const |
Get the locally forward version of a handle. More... | |
edge_t | edge_handle (const handle_t &left, const handle_t &right) const |
handle_t | traverse_edge_handle (const edge_t &edge, const handle_t &left) const |
template<typename Iteratee > | |
bool | for_each_edge (const Iteratee &iteratee, bool parallel=false) const |
Protected Types | |
using | cursor_t = StreamIndex< Graph >::cursor_t |
Define the type we use for cursors into the backing file. More... | |
Protected Member Functions | |
void | with_cursor (function< void(cursor_t &)> callback) const |
Get temporary ownership of a cursor to the backing vg file. More... | |
void | find (id_t id, const function< bool(const CacheEntry &)> &iteratee) const |
bool | with_cache_entry (int64_t group_vo, const function< void(const CacheEntry &)> &callback) const |
Protected Member Functions inherited from handlegraph::HandleGraph | |
virtual bool | follow_edges_impl (const handle_t &handle, bool go_left, const std::function< bool(const handle_t &)> &iteratee) const =0 |
virtual bool | for_each_handle_impl (const std::function< bool(const handle_t &)> &iteratee, bool parallel=false) const =0 |
Protected Attributes | |
string | vg_filename |
StreamIndex< Graph > | index |
Index data about the vg file. More... | |
list< ifstream > | cursor_streams |
Input streams referenced by cursors live in this list that grows forever. More... | |
list< unique_ptr< cursor_t > > | cursor_pool |
Cursors live in this free pool. More... | |
mutex | cursor_pool_mutex |
Access is protected by this mutex. More... | |
LRUCache< int64_t, shared_ptr< CacheEntry > > | group_cache |
mutex | cache_mutex |
The cache is protected with this mutex. More... | |
Private Member Functions | |
IndexedVG (const IndexedVG &other)=delete | |
Use a .vg file on disk with a .vgi index to provide random access to the graph data without loading the entire graph into memory. Sort of a compromise between an XG and a VG, except unlike either we don't need the whole graph in memory.
We require that all nodes in the graph appear in ID order within their chunks, and that all chunks appear in ID order. So all nodes are in ID order in the file.
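The ordering requirement above can be stated as a small check. This is only a sketch under stated assumptions: `Chunk` and `ids_in_file_order` are hypothetical names, a chunk is modelled as a plain list of node IDs rather than a real vg `Graph` message, and node IDs are assumed to be unique so the order is strictly increasing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: a chunk is just the node IDs it holds, in file order.
using Chunk = std::vector<std::int64_t>;

// Returns true if IDs increase strictly within each chunk and across
// consecutive chunks, i.e. the whole file is in ID order.
bool ids_in_file_order(const std::vector<Chunk>& chunks) {
    bool have_previous = false;
    std::int64_t previous = 0;
    for (const Chunk& chunk : chunks) {
        for (std::int64_t id : chunk) {
            if (have_previous && id <= previous) {
                return false;  // out of order, or a repeated ID
            }
            previous = id;
            have_previous = true;
        }
    }
    return true;
}
```

Empty chunks are skipped by this check, so they do not break the ordering.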
Cannot be copied since internally it contains a ProtobufIterator wrapping an open file. Can only be moved.
All operations are thread-safe to call. Internally we can't be seeking a cursor off to another location in the middle of looping over a run of matching chunks, but we handle that ourselves.
Internally, we keep a pool of cursors into the backing graph file, and each time we need to actually access the backing graph file we grab a cursor or make one if we don't have a free one.
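The grab-or-create pattern described above might look roughly like the following. This is a minimal sketch, not the vg implementation: `Cursor` and `CursorPool` are hypothetical stand-ins, the real cursor type wraps an input stream, and this version does not return the cursor to the pool if the callback throws.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a cursor into the backing file.
struct Cursor {
    int id;
};

class CursorPool {
public:
    // Lend a cursor to the callback, then put it back in the free pool.
    void with_cursor(const std::function<void(Cursor&)>& callback) {
        assert(callback);
        std::unique_ptr<Cursor> cursor;
        {
            // Hold the lock only while touching the pool, not during the callback.
            std::lock_guard<std::mutex> lock(pool_mutex);
            if (free_cursors.empty()) {
                cursor.reset(new Cursor{created_count++});
            } else {
                cursor = std::move(free_cursors.front());
                free_cursors.pop_front();
            }
        }
        callback(*cursor);
        std::lock_guard<std::mutex> lock(pool_mutex);
        free_cursors.push_back(std::move(cursor));
    }

    // Number of cursors ever created by this pool.
    int created() const {
        std::lock_guard<std::mutex> lock(pool_mutex);
        return created_count;
    }

private:
    mutable std::mutex pool_mutex;
    std::list<std::unique_ptr<Cursor>> free_cursors;
    int created_count = 0;
};
```

Because the lock is released before the callback runs, a callback can itself ask for another cursor without deadlocking; it simply gets a second, distinct cursor.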
Internally we also keep a least-recently-used cache of indexed merged-together graph groups. The cache is keyed by group start VO. The cache holds shared pointers to cache entries, so that one thread can be evicting something from the cache while another is still working with it.
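To illustrate why the cache stores shared pointers, here is a small sketch of a least-recently-used cache keyed by a 64-bit offset. `Entry` and `SharedLRU` are hypothetical names, not vg's `CacheEntry` or `LRUCache`, and the class itself does no locking: a caller would be expected to guard it with a mutex, as the class described here does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a parsed group of graph data.
struct Entry {
    std::string payload;
};

class SharedLRU {
public:
    explicit SharedLRU(std::size_t max_entries) : max_entries(max_entries) {
        assert(max_entries > 0);
    }

    // Insert or replace an entry, evicting the least recently used one if full.
    void put(std::int64_t key, std::shared_ptr<Entry> value) {
        auto found = index.find(key);
        if (found != index.end()) {
            order.erase(found->second);
            index.erase(found);
        }
        order.emplace_front(key, std::move(value));
        index[key] = order.begin();
        if (order.size() > max_entries) {
            index.erase(order.back().first);
            order.pop_back();  // drops only the cache's own reference
        }
    }

    // Return the entry and mark it most recently used, or nullptr on a miss.
    std::shared_ptr<Entry> get(std::int64_t key) {
        auto found = index.find(key);
        if (found == index.end()) {
            return nullptr;
        }
        order.splice(order.begin(), order, found->second);
        return found->second->second;
    }

private:
    using Item = std::pair<std::int64_t, std::shared_ptr<Entry>>;
    std::size_t max_entries;
    std::list<Item> order;  // front is the most recently used
    std::unordered_map<std::int64_t, std::list<Item>::iterator> index;
};
```

A caller that still holds the `shared_ptr` returned by `get` keeps the entry alive even after the cache evicts it, which is the property the paragraph above relies on.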
|
protected |
Define the type we use for cursors into the backing file.
vg::IndexedVG::IndexedVG | ( | string | graph_filename | ) |
Open a .vg file. If the .vg has a .vg.vgi index, it will be loaded. If not, an index will be generated and saved.
|
private delete |
|
protected |
Wrapper around the index's find, with caching. Supports stopping early, but doesn't do internal filtering of chunks/runs where the node being queried is in a hole. Runs the iteratee on CacheEntry objects for the runs that might have info on the requested node, in order. Internally holds shared_ptr copies to the cache entries it is handing out references to. Users must do everything they need the CacheEntry for within the callback, as the reference may not be valid afterwards.
Invert the orientation of a handle (potentially without getting its ID)
Implements handlegraph::HandleGraph.
|
virtual |
Loop over all the handles to next/previous (right/left) nodes. Passes them to a callback which returns false to stop iterating and true to continue. Returns true if we finished and false if we stopped early.
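The callback contract described above (return true to continue, false to stop, and report whether iteration finished) can be sketched without any graph at all. `for_each_until` and `take` are hypothetical helpers over plain integers, used only to show the control flow.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Visit items in order until the iteratee returns false.
// Returns true if every item was visited, false if we stopped early.
bool for_each_until(const std::vector<int>& items,
                    const std::function<bool(int)>& iteratee) {
    for (int item : items) {
        if (!iteratee(item)) {
            return false;
        }
    }
    return true;
}

// Example caller: collect at most `limit` items by asking to stop early.
std::vector<int> take(const std::vector<int>& items, std::size_t limit) {
    assert(limit > 0);
    std::vector<int> taken;
    for_each_until(items, [&](int item) {
        taken.push_back(item);
        return taken.size() < limit;  // false once we have enough
    });
    return taken;
}
```

The same shape applies when the items are neighboring handles: the caller decides inside the callback whether it has seen enough.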
|
virtual |
Loop over all the nodes in the graph in their local forward orientations, in their internal stored order. Stop if the iteratee returns false. Can be told to run in parallel, in which case stopping after a false return value is on a best-effort basis and iteration order is not defined.
Look up the handle for the node with the given ID in the given orientation.
Implements handlegraph::HandleGraph.
Get the ID from a handle.
Implements handlegraph::HandleGraph.
|
virtual |
Get the orientation of a handle.
Implements handlegraph::HandleGraph.
|
virtual |
Get the length of a node.
Implements handlegraph::HandleGraph.
|
virtual |
Return the number of nodes in the graph.
Implements handlegraph::HandleGraph.
|
virtual |
Get the sequence of a node, presented in the handle's local forward orientation.
Implements handlegraph::HandleGraph.
|
virtual |
Check if a node exists by ID.
Implements handlegraph::HandleGraph.
|
virtual |
Return the largest ID in the graph, or some larger number if the largest ID is unavailable. Return value is unspecified if the graph is empty.
Implements handlegraph::HandleGraph.
|
virtual |
Return the smallest ID in the graph, or some smaller number if the smallest ID is unavailable. Return value is unspecified if the graph is empty.
Implements handlegraph::HandleGraph.
void vg::IndexedVG::print_report | ( | ) | const |
|
protected |
Load or use the cached version of the CacheEntry for the given group start VO. If the EOF sentinel numeric_limits<int64_t>::max() is passed, the callback is not called and false is returned. (This is to enable easy looping to scan over CacheEntries.) Passing any other past-the-end VO is prohibited, and may produce an error. Handles locking the cache for updates and keeping the CacheEntry reference live while the callback is running.
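The sentinel behaviour is what makes a scan loop short. The sketch below assumes, hypothetically, that each entry records the start VO of the next group in a field called `next_vo`; `Group`, `with_group`, `count_nodes`, and `END_VO` are illustrative names, and a `std::map` stands in for the cache and backing file.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <map>

// EOF sentinel, as described above.
constexpr std::int64_t END_VO = std::numeric_limits<std::int64_t>::max();

// Hypothetical stand-in for a cache entry.
struct Group {
    int node_count;
    std::int64_t next_vo;  // assumed field: start VO of the following group
};

// Run the callback on the group starting at `vo`.
// Returns false without calling the callback when `vo` is the EOF sentinel.
bool with_group(const std::map<std::int64_t, Group>& groups, std::int64_t vo,
                const std::function<void(const Group&)>& callback) {
    if (vo == END_VO) {
        return false;
    }
    auto found = groups.find(vo);
    assert(found != groups.end());  // any other unknown VO is a caller error
    callback(found->second);
    return true;
}

// Scan every group from `first_vo` to the end, summing node counts.
int count_nodes(const std::map<std::int64_t, Group>& groups, std::int64_t first_vo) {
    int total = 0;
    std::int64_t vo = first_vo;
    while (with_group(groups, vo, [&](const Group& group) {
        total += group.node_count;
        vo = group.next_vo;  // advance; becomes END_VO after the last group
    })) {
        // Nothing to do here: the callback does the work and advances vo.
    }
    return total;
}
```

The loop needs no separate end-of-file test, because the call that receives the sentinel returns false and ends the scan.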
|
protected |
Get temporary ownership of a cursor to the backing vg file.
|
mutable protected |
The cache is protected with this mutex.
|
mutable protected |
Cursors live in this free pool.
|
mutable protected |
Access is protected by this mutex.
|
mutable protected |
Input streams referenced by cursors live in this list that grows forever.
|
mutable protected |
This is the cache that holds CacheEntries for groups we have already parsed and indexed. We can only access the cache from one thread at a time, but the shared pointers let us be working with the actual data in other threads.
|
protected |
Index data about the vg file.
|
protected |
We store the graph filename, so we can have cursors to it created on demand. This is necessary to have e.g. random accesses to bits of the graph while looping over the graph as a whole. The downside is we lose BGZF block caching between different streams of access.