vg
tools for working with variation graphs
vg::algorithms::GFAParser Class Reference

#include <gfa_to_handle.hpp>

Public Types

using cursor_t = string::const_iterator
 
using chars_t = pair< cursor_t, cursor_t >
 
using tag_list_t = vector< string >
 

Public Member Functions

GFAIDMapInfo & id_map ()
 Get the ID map we should be using for parsing.
 
void parse (istream &in)
 

Static Public Member Functions

static string extract (const chars_t &range)
 
static size_t length (const chars_t &range)
 
static bool empty (const chars_t &range)
 
static tag_list_t parse_tags (const chars_t &tag_range)
 
static tuple< tag_list_t > parse_h (const string &h_line)
 
static tuple< string, chars_t, tag_list_t > parse_s (const string &s_line)
 
static tuple< string, bool, string, bool, chars_t, tag_list_t > parse_l (const string &l_line)
 
static tuple< string, chars_t, chars_t, tag_list_t > parse_p (const string &p_line)
 
static tuple< string, size_t, string, pair< int64_t, int64_t >, chars_t, tag_list_t > parse_w (const string &p_line)
 
static void scan_p_visits (const chars_t &visit_range, function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)> visit_step)
 
static void scan_w_visits (const chars_t &visit_range, function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)> visit_step)
 
static void scan_visits (const chars_t &visit_range, char line_type, function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)> visit_step)
 
static bool decode_rgfa_tags (const tag_list_t &tags, string *out_name=nullptr, int64_t *out_offset=nullptr, int64_t *out_rank=nullptr)
 
static nid_t assign_new_sequence_id (const string &str, GFAIDMapInfo &id_map_info)
 
static nid_t find_existing_sequence_id (const string &str, GFAIDMapInfo &id_map_info)
 

Public Attributes

unique_ptr< GFAIDMapInfo > internal_id_map
 
GFAIDMapInfo * external_id_map = nullptr
 
vector< std::function< void(const tag_list_t &tags)> > header_listeners
 These listeners are called for the header line(s), if any.
 
vector< std::function< void(nid_t id, const chars_t &sequence, const tag_list_t &tags)> > node_listeners
 
vector< std::function< void(nid_t from, bool from_is_reverse, nid_t to, bool to_is_reverse, const chars_t &overlap, const tag_list_t &tags)> > edge_listeners
 
vector< std::function< void(const string &name, const chars_t &visits, const chars_t &overlaps, const tag_list_t &tags)> > path_listeners
 
vector< std::function< void(const string &sample_name, int64_t haplotype, const string &contig_name, const pair< int64_t, int64_t > &subrange, const chars_t &visits, const tag_list_t &tags)> > walk_listeners
 
vector< std::function< void(nid_t id, int64_t offset, size_t length, const string &path_name, int64_t path_rank)> > rgfa_listeners
 
int64_t max_rgfa_rank = -1
 Include paths from rGFA tags at this rank or lower. Set to -1 to ignore rGFA tags.
 
bool stop_on_duplicate_paths = false
 

Detailed Description

Lower-level tools for parsing GFA elements.

Parsing functions return the fields as strings and do not support overlaps. Optional tags are read as strings into the tag vectors.

You can register "listeners" for the different kinds of GFA file items by adding functions to the various *_listeners vectors. A listener can raise GFAFormatError, or one of its subclasses, if it objects to what the GFA says. Some types of GFAFormatError can be caught internally, in which case processing continues with the next line of the file. It does not continue with the remaining listeners for the line that failed, however, so if some but not all of the listeners for an item end up being called because one of them failed, dealing with the resulting partial state is the user's responsibility.
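A minimal, self-contained sketch of the listener contract just described, assuming a simplified parser that only dispatches node records. `MiniParser`, `FormatError`, and `dispatch_node` are hypothetical names standing in for GFAParser, GFAFormatError, and its internal dispatch; the sketch only illustrates why, when one listener throws, the listeners registered before it have already run for that record while the ones after it never see it.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for GFAFormatError; vg's real class hierarchy may differ.
struct FormatError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical miniature parser that only handles node records.
struct MiniParser {
    std::vector<std::function<void(long id, const std::string& seq)>> node_listeners;

    // Dispatch one node to every listener in registration order. If a listener
    // throws FormatError, the remaining listeners for this record are skipped
    // and the error is swallowed so parsing could continue with the next record.
    // Returns false if the record was abandoned part way through.
    bool dispatch_node(long id, const std::string& seq) {
        try {
            for (auto& listener : node_listeners) {
                listener(id, seq);
            }
        } catch (const FormatError&) {
            return false;
        }
        return true;
    }
};
```

For example, registering a listener that throws on empty sequences between two recording listeners means a bad node is seen by the first recorder only.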

Member Typedef Documentation

◆ chars_t

using vg::algorithms::GFAParser::chars_t = pair<cursor_t, cursor_t>

◆ cursor_t

using vg::algorithms::GFAParser::cursor_t = string::const_iterator

◆ tag_list_t

using vg::algorithms::GFAParser::tag_list_t = vector<string>

Member Function Documentation

◆ assign_new_sequence_id()

nid_t vg::algorithms::GFAParser::assign_new_sequence_id ( const string &  str,
GFAIDMapInfo &  id_map_info 
)
static

Parse a GFA name into a numeric id.

If all ids are numeric, they will be converted directly with stol.

If all ids are non-numeric, they will get incrementing ids beginning with 1, in the order they are visited.

If they are a mix of numeric and non-numeric, the numeric ones will be converted with stol until the first non-numeric one is found; from then on, new ids are assigned by counting up from the maximum id seen so far.

Since non-numeric ids depend on the order in which the nodes are scanned, they will unfortunately differ depending on whether the GFA is processed in lexicographic order or in file order.

If the string ID has been seen before, returns 0.
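The id assignment rules above can be sketched as follows. `IdMapSketch` and `assign_id_sketch` are hypothetical stand-ins for GFAIDMapInfo and assign_new_sequence_id, and the sketch assumes that once the first non-numeric name appears, every later name (numeric or not) gets the next id above the maximum. It uses `std::stoll` rather than `stol` so the result always fits `int64_t`.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for GFAIDMapInfo; not vg's real layout.
struct IdMapSketch {
    std::unordered_map<std::string, int64_t> name_to_id;
    bool numeric_mode = true;  // stays true until a non-numeric name is seen
    int64_t max_id = 0;        // largest id handed out so far
};

// True if s is a non-empty run of decimal digits.
static bool is_all_digits(const std::string& s) {
    if (s.empty()) return false;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
    }
    return true;
}

// Sketch of the rules described above. Returns 0 if the name was already seen.
int64_t assign_id_sketch(const std::string& name, IdMapSketch& info) {
    if (info.name_to_id.count(name)) {
        return 0;
    }
    if (info.numeric_mode && !is_all_digits(name)) {
        info.numeric_mode = false;  // first non-numeric name: switch modes for good
    }
    int64_t id = info.numeric_mode ? std::stoll(name) : info.max_id + 1;
    info.max_id = std::max(info.max_id, id);
    info.name_to_id[name] = id;
    return id;
}
```

With names 5, 2, chrA, 3 scanned in that order, the sketch yields ids 5, 2, 6, and 7, which shows how a numeric name scanned after the switch no longer keeps its own value.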

◆ decode_rgfa_tags()

bool vg::algorithms::GFAParser::decode_rgfa_tags ( const tag_list_t &  tags,
string *  out_name = nullptr,
int64_t *  out_offset = nullptr,
int64_t *  out_rank = nullptr 
)
static

Decode rGFA tags from the given list of tags from an S line. Stores rGFA parameters at the given locations if set. Returns true if a complete set of tags was found.
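rGFA stores its stable coordinates in the `SN:Z` (stable sequence name), `SO:i` (offset on that sequence), and `SR:i` (rank) tags. A sketch of decoding them from tag strings of the form `TAG:TYPE:VALUE`, assuming each output is written only when its pointer is non-null and its tag is present. `decode_rgfa_tags_sketch` is a hypothetical name, not vg's implementation, and it does not reject malformed integer values (`std::stoll` throws on them).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Scan "TAG:TYPE:VALUE" strings for the rGFA SN (name), SO (offset), and
// SR (rank) tags. Each value found is written through the matching output
// pointer if that pointer is non-null. Returns true only if all three tags
// were present.
bool decode_rgfa_tags_sketch(const std::vector<std::string>& tags,
                             std::string* out_name = nullptr,
                             int64_t* out_offset = nullptr,
                             int64_t* out_rank = nullptr) {
    bool has_name = false, has_offset = false, has_rank = false;
    for (const std::string& tag : tags) {
        if (tag.size() < 6 || tag[2] != ':' || tag[4] != ':') {
            continue;  // not TAG:TYPE:VALUE with a non-empty value
        }
        std::string key = tag.substr(0, 5);
        std::string value = tag.substr(5);
        if (key == "SN:Z:") {
            has_name = true;
            if (out_name) *out_name = value;
        } else if (key == "SO:i:") {
            has_offset = true;
            if (out_offset) *out_offset = std::stoll(value);
        } else if (key == "SR:i:") {
            has_rank = true;
            if (out_rank) *out_rank = std::stoll(value);
        }
    }
    return has_name && has_offset && has_rank;
}
```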

◆ empty()

static bool vg::algorithms::GFAParser::empty ( const chars_t &  range)
inline static

◆ extract()

static string vg::algorithms::GFAParser::extract ( const chars_t &  range)
inline static

◆ find_existing_sequence_id()

nid_t vg::algorithms::GFAParser::find_existing_sequence_id ( const string &  str,
GFAIDMapInfo &  id_map_info 
)
static

Find the existing sequence ID for the given node name, or 0 if it has not been seen yet.

◆ id_map()

GFAIDMapInfo & vg::algorithms::GFAParser::id_map ( )
inline

Get the ID map we should be using for parsing.

◆ length()

static size_t vg::algorithms::GFAParser::length ( const chars_t &  range)
inline static

◆ parse()

void vg::algorithms::GFAParser::parse ( istream &  in)

Parse GFA from the given stream.

◆ parse_h()

tuple< GFAParser::tag_list_t > vg::algorithms::GFAParser::parse_h ( const string &  h_line)
static

Parse an H line into tags.

◆ parse_l()

tuple< string, bool, string, bool, GFAParser::chars_t, GFAParser::tag_list_t > vg::algorithms::GFAParser::parse_l ( const string &  l_line)
static

Parse an L line into the from-node name, its is_reverse flag, the to-node name, its is_reverse flag, the overlap, and the tags.

◆ parse_p()

tuple< string, GFAParser::chars_t, GFAParser::chars_t, GFAParser::tag_list_t > vg::algorithms::GFAParser::parse_p ( const string &  p_line)
static

Parse a P line into name, visits, overlaps, and tags.

◆ parse_s()

tuple< string, GFAParser::chars_t, GFAParser::tag_list_t > vg::algorithms::GFAParser::parse_s ( const string &  s_line)
static

Parse an S line into name, sequence, and tags.
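Because `chars_t` is a pair of `string::const_iterator`s into the original line, a parser can return the sequence without copying it. A sketch of S-line splitting under that representation; `parse_s_sketch` is a hypothetical name, and vg's `parse_s` may validate fields differently.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using cursor_t = std::string::const_iterator;
using chars_t = std::pair<cursor_t, cursor_t>;

// Split "S\tname\tsequence[\ttag...]" into an owned name, a zero-copy view
// of the sequence, and the tag strings. The returned chars_t points into
// s_line, so s_line must outlive it.
std::tuple<std::string, chars_t, std::vector<std::string>> parse_s_sketch(const std::string& s_line) {
    // Collect [begin, end) ranges for every tab-separated field.
    std::vector<chars_t> fields;
    cursor_t start = s_line.begin();
    while (true) {
        cursor_t tab = std::find(start, s_line.end(), '\t');
        fields.emplace_back(start, tab);
        if (tab == s_line.end()) break;
        start = tab + 1;
    }
    if (fields.size() < 3 || std::string(fields[0].first, fields[0].second) != "S") {
        throw std::runtime_error("not a valid S line");
    }
    std::string name(fields[1].first, fields[1].second);
    std::vector<std::string> tags;
    for (size_t i = 3; i < fields.size(); ++i) {
        tags.emplace_back(fields[i].first, fields[i].second);
    }
    return std::make_tuple(name, fields[2], tags);
}
```

The returned range stays valid only while the source string is alive and unmodified.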

◆ parse_tags()

GFAParser::tag_list_t vg::algorithms::GFAParser::parse_tags ( const chars_t &  tag_range)
static

Parse tags out from a possibly empty range to a vector of tag strings.

◆ parse_w()

tuple< string, size_t, string, pair< int64_t, int64_t >, GFAParser::chars_t, GFAParser::tag_list_t > vg::algorithms::GFAParser::parse_w ( const string &  p_line)
static

Parse a W line into sample, haplotype, sequence (contig) name, range (start and end), walk, and tags. If some or all of the range is missing, uses NO_SUBRANGE and NO_END_POSITION from PathMetadata. Doesn't include an end position if a start position isn't set.
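The subrange rule for W lines can be sketched as a small function over the raw start and end fields. `kNoSubrange` and `kNoEndPosition` are hypothetical stand-ins for PathMetadata's NO_SUBRANGE and NO_END_POSITION, and their values here are assumptions; the sketch also assumes a missing field is written as `*` or left empty.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-ins for PathMetadata::NO_SUBRANGE and
// PathMetadata::NO_END_POSITION; the real values may differ.
const std::pair<int64_t, int64_t> kNoSubrange{-1, -1};
const int64_t kNoEndPosition = -1;

// A field counts as missing if it is empty or the placeholder "*".
static bool is_missing(const std::string& field) {
    return field.empty() || field == "*";
}

// No start means no subrange at all (any end is ignored); a start without
// an end gets the "no end position" sentinel.
std::pair<int64_t, int64_t> w_subrange_sketch(const std::string& start_field,
                                              const std::string& end_field) {
    if (is_missing(start_field)) {
        return kNoSubrange;
    }
    int64_t start = std::stoll(start_field);
    int64_t end = is_missing(end_field) ? kNoEndPosition : std::stoll(end_field);
    return {start, end};
}
```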

◆ scan_p_visits()

void vg::algorithms::GFAParser::scan_p_visits ( const chars_t &  visit_range,
function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)>  visit_step 
)
static

Scan visits extracted from a P line. Calls a callback with all the steps. visit_step takes {rank (-1 if path empty), step node name, step reversed} and returns true if it wants to keep iterating (false means stop).
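A sketch of P-line visit scanning with an early-stopping callback, assuming the usual GFA 1 visit syntax of comma-separated segment names each followed by `+` or `-` (for example `1+,2-,3+`), ranks counted from 0, and a single `-1` call for an empty path. `scan_p_visits_sketch` is a hypothetical name and does no validation of malformed visits.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

using cursor_t = std::string::const_iterator;
using chars_t = std::pair<cursor_t, cursor_t>;

// Walk a comma-separated visit list. Each visit is a node name followed by
// '+' (forward) or '-' (reverse). An empty range produces one call with
// rank -1. Returning false from visit_step stops the scan early.
void scan_p_visits_sketch(const chars_t& visit_range,
                          std::function<bool(int64_t rank, const chars_t& node_name, bool is_reverse)> visit_step) {
    if (visit_range.first == visit_range.second) {
        visit_step(-1, visit_range, false);
        return;
    }
    int64_t rank = 0;
    cursor_t start = visit_range.first;
    while (true) {
        cursor_t comma = std::find(start, visit_range.second, ',');
        if (comma != start) {  // skip empty items such as a trailing comma
            // The last character is the orientation; the rest is the name.
            chars_t name(start, comma - 1);
            bool is_reverse = (*(comma - 1) == '-');
            if (!visit_step(rank++, name, is_reverse)) return;
        }
        if (comma == visit_range.second) break;
        start = comma + 1;
    }
}
```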

◆ scan_visits()

void vg::algorithms::GFAParser::scan_visits ( const chars_t &  visit_range,
char  line_type,
function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)>  visit_step 
)
static

Scan visits extracted from a P or W line, as specified in line_type. Calls a callback with all the steps. visit_step takes {rank (-1 if path empty), step node name, step reversed} and returns true if it wants to keep iterating (false means stop).

◆ scan_w_visits()

void vg::algorithms::GFAParser::scan_w_visits ( const chars_t &  visit_range,
function< bool(int64_t rank, const chars_t &node_name, bool is_reverse)>  visit_step 
)
static

Scan visits extracted from a W line. Calls a callback with all the steps. visit_step takes {rank (-1 if path empty), step node name, step reversed} and returns true if it wants to keep iterating (false means stop).
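W lines spell walks differently: each step starts with `>` (forward) or `<` (reverse) and there is no separator, as in `>1<2>3`. A sketch of scanning that form under the same callback contract; `scan_w_visits_sketch` is hypothetical, assumes ranks start at 0, and does not validate the input.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

using cursor_t = std::string::const_iterator;
using chars_t = std::pair<cursor_t, cursor_t>;

// Walk a ">name<name..." step list. An empty range produces one call with
// rank -1. Returning false from visit_step stops the scan early.
void scan_w_visits_sketch(const chars_t& visit_range,
                          std::function<bool(int64_t rank, const chars_t& node_name, bool is_reverse)> visit_step) {
    if (visit_range.first == visit_range.second) {
        visit_step(-1, visit_range, false);
        return;
    }
    int64_t rank = 0;
    cursor_t it = visit_range.first;
    while (it != visit_range.second) {
        bool is_reverse = (*it == '<');  // '>' is forward, '<' is reverse
        cursor_t name_start = it + 1;
        // The name runs until the next orientation marker or the end.
        cursor_t name_end = std::find_if(name_start, visit_range.second,
                                         [](char c) { return c == '>' || c == '<'; });
        if (!visit_step(rank++, chars_t(name_start, name_end), is_reverse)) return;
        it = name_end;
    }
}
```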

Member Data Documentation

◆ edge_listeners

vector<std::function<void(nid_t from, bool from_is_reverse, nid_t to, bool to_is_reverse, const chars_t& overlap, const tag_list_t& tags)> > vg::algorithms::GFAParser::edge_listeners

These listeners will be called with information for all edges, after the node listeners for the involved nodes. Listeners are not protected from duplicate edges.
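Since edge listeners are not protected from duplicates, a listener that needs each edge only once has to filter them itself. A sketch, assuming bidirected-graph semantics in which `L 1 + 2 +` and `L 2 - 1 -` describe the same edge: each edge is reduced to a canonical spelling and recorded in a set. `EdgeDeduplicator`, `canonical_edge`, and `node_id_t` (a 64-bit signed integer standing in for `nid_t`) are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>
#include <tuple>

using node_id_t = int64_t;  // hypothetical stand-in for nid_t
using EdgeKey = std::tuple<node_id_t, bool, node_id_t, bool>;

// Traversing from->to is the same bidirected edge as traversing to->from
// with both orientations flipped, so keep whichever spelling compares smaller.
EdgeKey canonical_edge(node_id_t from, bool from_rev, node_id_t to, bool to_rev) {
    EdgeKey forward(from, from_rev, to, to_rev);
    EdgeKey flipped(to, !to_rev, from, !from_rev);
    return std::min(forward, flipped);
}

// Returns true the first time an edge (in either spelling) is seen,
// false for any repeat.
struct EdgeDeduplicator {
    std::set<EdgeKey> seen;
    bool first_time(node_id_t from, bool from_rev, node_id_t to, bool to_rev) {
        return seen.insert(canonical_edge(from, from_rev, to, to_rev)).second;
    }
};
```

Inside an edge listener, returning early when `first_time(from, from_is_reverse, to, to_is_reverse)` is false would skip repeated edges.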

◆ external_id_map

GFAIDMapInfo* vg::algorithms::GFAParser::external_id_map = nullptr

◆ header_listeners

vector<std::function<void(const tag_list_t& tags)> > vg::algorithms::GFAParser::header_listeners

These listeners are called for the header line(s), if any.

◆ internal_id_map

unique_ptr<GFAIDMapInfo> vg::algorithms::GFAParser::internal_id_map

◆ max_rgfa_rank

int64_t vg::algorithms::GFAParser::max_rgfa_rank = -1

Include paths from rGFA tags at this rank or lower. Set to -1 to ignore rGFA tags.

◆ node_listeners

vector<std::function<void(nid_t id, const chars_t& sequence, const tag_list_t& tags)> > vg::algorithms::GFAParser::node_listeners

These listeners will be called with information for all nodes. Listeners are protected from duplicate node IDs.

◆ path_listeners

vector<std::function<void(const string& name, const chars_t& visits, const chars_t& overlaps, const tag_list_t& tags)> > vg::algorithms::GFAParser::path_listeners

These listeners will be called with information for all P line paths, after the listeners for all involved nodes, and for the first header if any. Listeners are not protected from duplicate path names.

◆ rgfa_listeners

vector<std::function<void(nid_t id, int64_t offset, size_t length, const string& path_name, int64_t path_rank)> > vg::algorithms::GFAParser::rgfa_listeners

These listeners will be called with each visit of an rGFA path to a node, after the node listeners for the involved node, but in an unspecified order with respect to listeners for headers. They will be called in order along each path. The listener is responsible for detecting any gaps in the offset space and producing multiple subpaths if necessary. Listeners are protected from duplicate paths with the same name and different ranks, but not from overlaps of nodes in path offset space.
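One way a listener could take on the gap detection described above is to buffer the visits for a path and split them into runs that are contiguous in offset space. A sketch; `RgfaVisit` and `split_on_gaps` are hypothetical names, a real listener would keep this state per `path_name`, and overlapping visits are merely split apart here rather than resolved.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one rGFA visit as an rgfa_listener would see it.
struct RgfaVisit {
    int64_t node_id;
    int64_t offset;
    size_t length;
};

// Group visits (already in path order) into runs that are contiguous in
// offset space: a new subpath starts whenever a visit does not begin exactly
// where the previous one ended.
std::vector<std::vector<RgfaVisit>> split_on_gaps(const std::vector<RgfaVisit>& visits) {
    std::vector<std::vector<RgfaVisit>> subpaths;
    int64_t expected_offset = 0;
    for (const RgfaVisit& v : visits) {
        if (subpaths.empty() || v.offset != expected_offset) {
            subpaths.emplace_back();  // first visit, gap, or overlap: open a new subpath
        }
        subpaths.back().push_back(v);
        expected_offset = v.offset + static_cast<int64_t>(v.length);
    }
    return subpaths;
}
```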

◆ stop_on_duplicate_paths

bool vg::algorithms::GFAParser::stop_on_duplicate_paths = false

Set to true to treat duplicate paths as errors. Otherwise, they will be treated as warnings and the duplicates will be discarded. Some GFA files, like the first HPRC graph releases, include duplicate paths.

◆ walk_listeners

vector<std::function<void(const string& sample_name, int64_t haplotype, const string& contig_name, const pair<int64_t, int64_t>& subrange, const chars_t& visits, const tag_list_t& tags)> > vg::algorithms::GFAParser::walk_listeners

These listeners will be called with information for all W line paths, after the listeners for all involved nodes, and for the first header if any. Listeners are not protected from duplicate path metadata.

