vg
tools for working with variation graphs
#include <gfa_to_handle.hpp>
Public Types | |
using | cursor_t = string::const_iterator |
using | chars_t = pair< cursor_t, cursor_t > |
using | tag_list_t = vector< string > |
Public Member Functions | |
GFAIDMapInfo & | id_map () |
Get the ID map we should be using for parsing. | |
void | parse (istream &in) |
Static Public Member Functions | |
static string | extract (const chars_t &range) |
static size_t | length (const chars_t &range) |
static bool | empty (const chars_t &range) |
static tag_list_t | parse_tags (const chars_t &tag_range) |
static tuple< tag_list_t > | parse_h (const string &h_line) |
static tuple< string, chars_t, tag_list_t > | parse_s (const string &s_line) |
static tuple< string, bool, string, bool, chars_t, tag_list_t > | parse_l (const string &l_line) |
static tuple< string, chars_t, chars_t, tag_list_t > | parse_p (const string &p_line) |
static tuple< string, size_t, string, pair< int64_t, int64_t >, chars_t, tag_list_t > | parse_w (const string &p_line) |
static void | scan_p_visits (const chars_t &visit_range, function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)> visit_step) |
static void | scan_w_visits (const chars_t &visit_range, function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)> visit_step) |
static void | scan_visits (const chars_t &visit_range, char line_type, function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)> visit_step) |
static bool | decode_rgfa_tags (const tag_list_t &tags, string *out_name=nullptr, int64_t *out_offset=nullptr, int64_t *out_rank=nullptr) |
static nid_t | assign_new_sequence_id (const string &str, GFAIDMapInfo &id_map_info) |
static nid_t | find_existing_sequence_id (const string &str, GFAIDMapInfo &id_map_info) |
Public Attributes | |
unique_ptr< GFAIDMapInfo > | internal_id_map |
GFAIDMapInfo * | external_id_map = nullptr |
vector< std::function< void(const tag_list_t &tags)> > | header_listeners |
These listeners are called for the header line(s), if any. | |
vector< std::function< void(nid_t id, const chars_t &sequence, const tag_list_t &tags)> > | node_listeners |
vector< std::function< void(nid_t from, bool from_is_reverse, nid_t to, bool to_is_reverse, const chars_t &overlap, const tag_list_t &tags)> > | edge_listeners |
vector< std::function< void(const string &name, const chars_t &visits, const chars_t &overlaps, const tag_list_t &tags)> > | path_listeners |
vector< std::function< void(const string &sample_name, int64_t haplotype, const string &contig_name, const pair< int64_t, int64_t > &subrange, const chars_t &visits, const tag_list_t &tags)> > | walk_listeners |
vector< std::function< void(nid_t id, int64_t offset, size_t length, const string &path_name, int64_t path_rank)> > | rgfa_listeners |
int64_t | max_rgfa_rank = -1 |
Include paths from rGFA tags at this rank or lower. Set to -1 to ignore rGFA tags. | |
bool | stop_on_duplicate_paths = false |
Lower-level tools for parsing GFA elements.
Parsing functions return the fields as strings, and don't support overlaps. Optional tags get read as strings in the vectors.
Lets you register "listeners" for different kinds of GFA file items by adding functions to the various *_listeners vectors. These listeners can raise GFAFormatError or its subclasses if they do not like what the GFA is saying. Some types of GFAFormatError can be caught internally, in which case processing continues with the next line but not with the next listener for the failing line. The user is therefore responsible for handling the case where some but not all listeners for an item were called because one of them failed.
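The partial-delivery behavior described above can be illustrated with a minimal, self-contained sketch. It does not use the real GFAParser: `MiniParser`, `FormatErrorSketch`, and `run_listener_sketch` are hypothetical names, and the single `line_listeners` vector stands in for the various *_listeners vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a recoverable GFAFormatError.
struct FormatErrorSketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical miniature parser: one listener vector, one item per line.
struct MiniParser {
    std::vector<std::function<void(const std::string&)>> line_listeners;
    std::size_t failed_lines = 0;

    void parse(std::istream& in) {
        std::string line;
        while (std::getline(in, line)) {
            try {
                for (auto& listener : line_listeners) {
                    listener(line);
                }
            } catch (const FormatErrorSketch&) {
                // The remaining listeners for this line are skipped,
                // but the next line is still processed.
                ++failed_lines;
            }
        }
    }
};

// Feed three lines through three listeners; the second listener rejects "bad".
std::vector<std::string> run_listener_sketch() {
    std::vector<std::string> log;
    MiniParser parser;
    parser.line_listeners.push_back([&](const std::string& line) {
        log.push_back("first:" + line);
    });
    parser.line_listeners.push_back([&](const std::string& line) {
        if (line == "bad") {
            throw FormatErrorSketch("rejected line: " + line);
        }
    });
    parser.line_listeners.push_back([&](const std::string& line) {
        log.push_back("third:" + line);
    });
    std::istringstream in("a\nbad\nb\n");
    parser.parse(in);
    return log;
}
```

In this sketch the first listener sees the line "bad" but the third never does, so any state shared between listeners has to tolerate that mismatch.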
using vg::algorithms::GFAParser::chars_t = pair<cursor_t, cursor_t> |
using vg::algorithms::GFAParser::cursor_t = string::const_iterator |
using vg::algorithms::GFAParser::tag_list_t = vector<string> |
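A `chars_t` is a pair of iterators into a line, so it refers to characters without copying them. The following sketch shows what helpers like extract, length, and empty could look like for such a range. The `*_sketch` functions are illustrative assumptions, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>

using cursor_t = std::string::const_iterator;
using chars_t = std::pair<cursor_t, cursor_t>;

// Copy the characters of the range into an owning string.
std::string extract_sketch(const chars_t& range) {
    return std::string(range.first, range.second);
}

// Number of characters between the two cursors.
std::size_t length_sketch(const chars_t& range) {
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}

// A range is empty when both cursors point at the same position.
bool empty_sketch(const chars_t& range) {
    return range.first == range.second;
}
```

Because the range only points into the original string, that string has to outlive every `chars_t` taken from it.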
static nid_t vg::algorithms::GFAParser::assign_new_sequence_id (const string &str, GFAIDMapInfo &id_map_info)
Parse a GFA name into a numeric id.
If all ids are numeric, they will be converted directly with stol.
If all ids are non-numeric, they will get incrementing ids beginning with 1, in order they are visited.
If they are a mix of numeric and non-numeric, the numeric ones will be converted with stol until the first non-numeric one is found, then it will revert to using max-id.
Since non-numeric ids are dependent on the order the nodes are scanned, there is the unfortunate side effect that they will be different depending on whether the GFA is processed in lexicographic order or file order.
If the string ID has been seen before, returns 0.
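The rules above can be sketched as follows. This is one reading of the description, not the actual implementation: `IdMapSketch` is a hypothetical stand-in for GFAIDMapInfo, and "using max-id" is assumed to mean "assign one more than the largest ID seen so far".

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <unordered_map>

using nid_sketch_t = std::int64_t;

// Hypothetical stand-in for GFAIDMapInfo.
struct IdMapSketch {
    bool numeric_mode = true;   // true until the first non-numeric name is seen
    nid_sketch_t max_id = 0;    // largest ID handed out so far
    std::unordered_map<std::string, nid_sketch_t> name_to_id;
};

// Returns the newly assigned ID, or 0 if the name was already seen.
nid_sketch_t assign_new_id_sketch(const std::string& name, IdMapSketch& info) {
    if (info.name_to_id.count(name) != 0) {
        return 0;
    }
    bool all_digits = !name.empty() &&
        std::all_of(name.begin(), name.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; });
    if (!all_digits) {
        // Once a non-numeric name appears, numeric conversion is never used again.
        info.numeric_mode = false;
    }
    nid_sketch_t id = info.numeric_mode ? std::stol(name) : info.max_id + 1;
    info.max_id = std::max(info.max_id, id);
    info.name_to_id.emplace(name, id);
    return id;
}
```

With this sketch, feeding the names "5", "7", "chr1", "3" in that order assigns 5, 7, 8, 9, which shows why the resulting IDs depend on scan order.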
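static bool vg::algorithms::GFAParser::decode_rgfa_tags (const tag_list_t &tags, string *out_name=nullptr, int64_t *out_offset=nullptr, int64_t *out_rank=nullptr)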
Decode rGFA tags from the given list of tags from an S line. Stores rGFA parameters at the given locations if set. Returns true if a complete set of tags was found.
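As a rough illustration, rGFA stores the stable sequence name, offset, and rank of a segment in the `SN:Z:`, `SO:i:`, and `SR:i:` tags. The sketch below decodes them from a tag list with the same optional-output-pointer shape as the signature above. `decode_rgfa_tags_sketch` is a hypothetical reimplementation, and its error handling (letting `std::stoll` throw on a malformed integer) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Returns true only if all three rGFA tags (SN, SO, SR) were present.
// Output pointers may be null, in which case that value is not stored.
bool decode_rgfa_tags_sketch(const std::vector<std::string>& tags,
                             std::string* out_name = nullptr,
                             std::int64_t* out_offset = nullptr,
                             std::int64_t* out_rank = nullptr) {
    bool has_name = false, has_offset = false, has_rank = false;
    for (const std::string& tag : tags) {
        if (tag.compare(0, 5, "SN:Z:") == 0) {
            if (out_name) *out_name = tag.substr(5);
            has_name = true;
        } else if (tag.compare(0, 5, "SO:i:") == 0) {
            // std::stoll throws on a malformed integer (assumed behavior for this sketch).
            std::int64_t value = std::stoll(tag.substr(5));
            if (out_offset) *out_offset = value;
            has_offset = true;
        } else if (tag.compare(0, 5, "SR:i:") == 0) {
            std::int64_t value = std::stoll(tag.substr(5));
            if (out_rank) *out_rank = value;
            has_rank = true;
        }
        // Any other tag is ignored.
    }
    return has_name && has_offset && has_rank;
}
```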
inline static bool vg::algorithms::GFAParser::empty (const chars_t &range)
inline static string vg::algorithms::GFAParser::extract (const chars_t &range)
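static nid_t vg::algorithms::GFAParser::find_existing_sequence_id (const string &str, GFAIDMapInfo &id_map_info)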
Find the existing sequence ID for the given node name, or 0 if it has not been seen yet.
inline GFAIDMapInfo & vg::algorithms::GFAParser::id_map ()
Get the ID map we should be using for parsing.
inline static size_t vg::algorithms::GFAParser::length (const chars_t &range)
void vg::algorithms::GFAParser::parse (istream &in)
Parse GFA from the given stream.
static tuple< tag_list_t > vg::algorithms::GFAParser::parse_h (const string &h_line)
Parse an H line into its tags.
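static tuple< string, bool, string, bool, chars_t, tag_list_t > vg::algorithms::GFAParser::parse_l (const string &l_line)
Parse an L line into from name, from is_reverse, to name, to is_reverse, overlap, and tags.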
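static tuple< string, chars_t, chars_t, tag_list_t > vg::algorithms::GFAParser::parse_p (const string &p_line)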
Parse a P line into name, visits, overlaps, and tags.
static tuple< string, chars_t, tag_list_t > vg::algorithms::GFAParser::parse_s (const string &s_line)
Parse an S line to name, sequence, and tags
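To make the return shape concrete, here is a minimal sketch of splitting a tab-separated S line into a name, a sequence range, and a tag list. `parse_s_sketch` is a hypothetical reimplementation. The use of `std::runtime_error` for malformed lines is an assumption, as the real parser reports GFAFormatError.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using cursor_t = std::string::const_iterator;
using chars_t = std::pair<cursor_t, cursor_t>;
using tag_list_t = std::vector<std::string>;

// Split "S<TAB>name<TAB>sequence[<TAB>tag...]" into its parts.
// The returned sequence range points into s_line, which must outlive it.
std::tuple<std::string, chars_t, tag_list_t> parse_s_sketch(const std::string& s_line) {
    cursor_t begin = s_line.begin();
    cursor_t end = s_line.end();

    cursor_t tab1 = std::find(begin, end, '\t');
    if (tab1 == end || std::string(begin, tab1) != "S") {
        throw std::runtime_error("not an S line");
    }
    cursor_t name_begin = tab1 + 1;
    cursor_t tab2 = std::find(name_begin, end, '\t');
    if (tab2 == end) {
        throw std::runtime_error("S line has no sequence field");
    }
    cursor_t seq_begin = tab2 + 1;
    cursor_t seq_end = std::find(seq_begin, end, '\t');

    // Everything after the sequence is a tab-separated list of tags.
    tag_list_t tags;
    cursor_t cur = seq_end;
    while (cur != end) {
        ++cur;  // skip the tab before this tag
        cursor_t tag_end = std::find(cur, end, '\t');
        if (tag_end != cur) {
            tags.emplace_back(cur, tag_end);
        }
        cur = tag_end;
    }
    return std::make_tuple(std::string(name_begin, tab2), chars_t(seq_begin, seq_end), tags);
}
```

Returning the sequence as a range avoids copying a potentially very long string, at the cost of tying its lifetime to the input line.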
static tag_list_t vg::algorithms::GFAParser::parse_tags (const chars_t &tag_range)
Parse tags out from a possibly empty range to a vector of tag strings.
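A minimal sketch of this kind of tag splitting, assuming tags are separated by tabs and that empty fields are skipped (both are assumptions about the real behavior). `parse_tags_sketch` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using cursor_t = std::string::const_iterator;
using chars_t = std::pair<cursor_t, cursor_t>;
using tag_list_t = std::vector<std::string>;

// Split a possibly empty, tab-separated range into individual tag strings.
tag_list_t parse_tags_sketch(const chars_t& tag_range) {
    tag_list_t tags;
    cursor_t cur = tag_range.first;
    while (cur != tag_range.second) {
        cursor_t tag_end = std::find(cur, tag_range.second, '\t');
        if (tag_end != cur) {
            tags.emplace_back(cur, tag_end);
        }
        // Step over the tab, unless we already reached the end of the range.
        cur = (tag_end == tag_range.second) ? tag_end : tag_end + 1;
    }
    return tags;
}
```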
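static tuple< string, size_t, string, pair< int64_t, int64_t >, chars_t, tag_list_t > vg::algorithms::GFAParser::parse_w (const string &p_line)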
Parse a W line into sample, haplotype, sequence, range (start and end), walk, and tags. If some or all of the range is missing, uses NO_SUBRANGE and NO_END_POSITION from PathMetadata. Doesn't include an end position if a start position isn't set.
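The range handling can be sketched like this. GFA W lines use `*` for a missing start or end field. The sentinel `NO_POSITION_SKETCH` is a hypothetical placeholder for the PathMetadata constants, whose actual values are not stated here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical sentinel standing in for PathMetadata's "missing" constants.
constexpr std::int64_t NO_POSITION_SKETCH = -1;

// Turn the start and end fields of a W line into a subrange.
// An end position is only kept when a start position is present.
std::pair<std::int64_t, std::int64_t> parse_w_subrange_sketch(const std::string& start_field,
                                                              const std::string& end_field) {
    std::pair<std::int64_t, std::int64_t> subrange(NO_POSITION_SKETCH, NO_POSITION_SKETCH);
    if (start_field != "*") {
        subrange.first = std::stoll(start_field);
        if (end_field != "*") {
            subrange.second = std::stoll(end_field);
        }
    }
    return subrange;
}
```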
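static void vg::algorithms::GFAParser::scan_p_visits (const chars_t &visit_range, function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)> visit_step)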
Scan visits extracted from a P line. Calls a callback with all the steps. visit_step takes {rank (-1 if path empty), step node name, step reversed} and returns true if it wants to keep iterating (false means stop).
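static void vg::algorithms::GFAParser::scan_visits (const chars_t &visit_range, char line_type, function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)> visit_step)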
Scan visits extracted from a P or W line, as specified in line_type. Calls a callback with all the steps. visit_step takes {rank (-1 if path empty), step node name, step reversed} and returns true if it wants to keep iterating (false means stop).
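static void vg::algorithms::GFAParser::scan_w_visits (const chars_t &visit_range, function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)> visit_step)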
Scan visits extracted from a W line. Calls a callback with all the steps. visit_step takes {rank (-1 if path empty), step node name, step reversed} and returns true if it wants to keep iterating (false means stop).
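The two visit syntaxes differ: P lines list `name+` or `name-` separated by commas, while W lines concatenate `>name` and `<name`. The sketch below scans both with a callback that can stop the iteration early. `scan_visits_sketch` and `collect_visits_sketch` are hypothetical names. The sketch assumes that an empty path produces a single call with rank -1 and that node names do not themselves contain orientation characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using cursor_t = std::string::const_iterator;
using chars_t = std::pair<cursor_t, cursor_t>;
using visit_fn = std::function<bool(std::int64_t, const chars_t&, bool)>;

void scan_visits_sketch(const chars_t& visit_range, char line_type, const visit_fn& visit_step) {
    cursor_t cur = visit_range.first;
    cursor_t end = visit_range.second;
    if (cur == end) {
        // Assumed convention: an empty path is reported once with rank -1.
        visit_step(-1, chars_t(cur, end), false);
        return;
    }
    std::int64_t rank = 0;
    while (cur != end) {
        chars_t name;
        bool is_reverse = false;
        if (line_type == 'P') {
            // P visits look like "name+" or "name-", separated by commas.
            cursor_t sign = std::find_if(cur, end, [](char c) { return c == '+' || c == '-'; });
            if (sign == end || sign == cur) throw std::runtime_error("malformed P visit");
            name = chars_t(cur, sign);
            is_reverse = (*sign == '-');
            cur = sign + 1;
            if (cur != end) {
                if (*cur != ',') throw std::runtime_error("expected ',' between P visits");
                ++cur;
            }
        } else if (line_type == 'W') {
            // W visits look like ">name" or "<name", with no separator.
            if (*cur != '>' && *cur != '<') throw std::runtime_error("malformed W visit");
            is_reverse = (*cur == '<');
            ++cur;
            cursor_t stop = std::find_if(cur, end, [](char c) { return c == '>' || c == '<'; });
            if (stop == cur) throw std::runtime_error("W visit has no node name");
            name = chars_t(cur, stop);
            cur = stop;
        } else {
            throw std::invalid_argument("line_type must be 'P' or 'W'");
        }
        if (!visit_step(rank, name, is_reverse)) {
            return;  // the callback asked to stop iterating
        }
        ++rank;
    }
}

// Usage helper: render each visit as "rank:name+" or "rank:name-".
std::vector<std::string> collect_visits_sketch(const std::string& visits, char line_type) {
    std::vector<std::string> seen;
    scan_visits_sketch(chars_t(visits.cbegin(), visits.cend()), line_type,
        [&](std::int64_t rank, const chars_t& name, bool is_reverse) {
            seen.push_back(std::to_string(rank) + ":" + std::string(name.first, name.second) +
                           (is_reverse ? "-" : "+"));
            return true;
        });
    return seen;
}
```

Passing the node name as a range rather than a string keeps the scan allocation-free until the callback decides it needs a copy.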
vector<std::function<void(nid_t from, bool from_is_reverse, nid_t to, bool to_is_reverse, const chars_t& overlap, const tag_list_t& tags)> > vg::algorithms::GFAParser::edge_listeners |
These listeners will be called with information for all edges, after the node listeners for the involved nodes. Listeners are not protected from duplicate edges.
GFAIDMapInfo* vg::algorithms::GFAParser::external_id_map = nullptr |
vector<std::function<void(const tag_list_t& tags)> > vg::algorithms::GFAParser::header_listeners |
These listeners are called for the header line(s), if any.
unique_ptr<GFAIDMapInfo> vg::algorithms::GFAParser::internal_id_map |
int64_t vg::algorithms::GFAParser::max_rgfa_rank = -1 |
Include paths from rGFA tags at this rank or lower. Set to -1 to ignore rGFA tags.
vector<std::function<void(nid_t id, const chars_t& sequence, const tag_list_t& tags)> > vg::algorithms::GFAParser::node_listeners |
These listeners will be called with information for all nodes. Listeners are protected from duplicate node IDs.
vector<std::function<void(const string& name, const chars_t& visits, const chars_t& overlaps, const tag_list_t& tags)> > vg::algorithms::GFAParser::path_listeners |
These listeners will be called with information for all P line paths, after the listeners for all involved nodes, and for the first header if any. Listeners are not protected from duplicate path names.
vector<std::function<void(nid_t id, int64_t offset, size_t length, const string& path_name, int64_t path_rank)> > vg::algorithms::GFAParser::rgfa_listeners |
These listeners will be called with each visit of an rGFA path to a node, after the node listeners for the involved node, but in an unspecified order with respect to listeners for headers. They will be called in order along each path. The listener is responsible for detecting any gaps in the offset space and producing multiple subpaths if necessary. Listeners are protected from duplicate paths with the same name and different ranks, but not from overlaps of nodes in path offset space.
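Since the listener has to detect gaps itself, one possible approach is to start a new subpath whenever a visit's offset does not continue where the previous visit on that path ended. The sketch below does this with hypothetical types (`SubpathSketch`, `RgfaCollectorSketch`). Its `on_visit` method mirrors the listener signature and could be wrapped in a lambda and pushed onto `rgfa_listeners`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One contiguous stretch of an rGFA path in offset space: [start, end).
struct SubpathSketch {
    std::int64_t start;
    std::int64_t end;
    std::vector<std::int64_t> nodes;
};

// Hypothetical collector that groups visits into gap-free subpaths per path name.
struct RgfaCollectorSketch {
    std::map<std::string, std::vector<SubpathSketch>> subpaths;

    void on_visit(std::int64_t id, std::int64_t offset, std::size_t length,
                  const std::string& path_name, std::int64_t /*path_rank*/) {
        std::vector<SubpathSketch>& pieces = subpaths[path_name];
        if (pieces.empty() || pieces.back().end != offset) {
            // The visit does not abut the previous one: start a new subpath.
            pieces.push_back(SubpathSketch{offset, offset, {}});
        }
        pieces.back().nodes.push_back(id);
        pieces.back().end = offset + static_cast<std::int64_t>(length);
    }
};
```

This sketch only detects gaps. Overlaps of nodes in path offset space, which the parser also does not guard against, would need a separate check.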
bool vg::algorithms::GFAParser::stop_on_duplicate_paths = false |
Set to true to treat duplicate paths as errors. Otherwise, they will be treated as warnings and the duplicates will be discarded. Some GFA files, like the first HPRC graph releases, include duplicate paths.
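The effect of the flag can be sketched with a small filter that remembers which path names it has seen. `DuplicatePathFilterSketch` is a hypothetical type. Throwing `std::runtime_error` and counting warnings are assumptions standing in for the parser's own error and warning reporting.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_set>

// Hypothetical filter deciding whether a path should reach the listeners.
struct DuplicatePathFilterSketch {
    bool stop_on_duplicate_paths = false;
    std::unordered_set<std::string> seen_names;
    std::size_t warnings = 0;

    // Returns true if the path is new and should be passed on.
    bool accept(const std::string& path_name) {
        if (seen_names.insert(path_name).second) {
            return true;
        }
        if (stop_on_duplicate_paths) {
            throw std::runtime_error("duplicate path: " + path_name);
        }
        // Lenient mode: count a warning and discard the duplicate.
        ++warnings;
        return false;
    }
};
```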
vector<std::function<void(const string& sample_name, int64_t haplotype, const string& contig_name, const pair<int64_t, int64_t>& subrange, const chars_t& visits, const tag_list_t& tags)> > vg::algorithms::GFAParser::walk_listeners |
These listeners will be called with information for all W line paths, after the listeners for all involved nodes, and for the first header if any. Listeners are not protected from duplicate path metadata.