vg
tools for working with variation graphs
#include <recombinator.hpp>
Classes | |
struct | Header |
Header of the serialized file. | |
struct | Subchain |
Representation of a subchain. | |
struct | TopLevelChain |
Representation of a top-level chain. | |
Public Types | |
enum | Verbosity : size_t { verbosity_silent = 0, verbosity_basic = 1, verbosity_detailed = 2, verbosity_debug = 3, verbosity_extra_debug = 4 } |
The amount of progress information that should be printed to stderr. | |
typedef std::pair< gbwt::size_type, gbwt::size_type > | sequence_type |
A GBWT sequence as (sequence identifier, offset in a node). | |
typedef std::pair< std::uint32_t, std::uint32_t > | compact_sequence_type |
A more space-efficient representation of sequence_type. | |
Public Member Functions | |
size_t | components () const |
Returns the number of weakly connected components. | |
size_t | jobs () const |
Returns the number of GBWT construction jobs. | |
size_t | k () const |
Returns the length of the kmers. | |
size_t | kmers () const |
Returns the number of kmers in the subchains. | |
hash_map< Subchain::kmer_type, size_t > | kmer_counts (const std::string &kff_file, Verbosity verbosity) const |
void | simple_sds_serialize (std::ostream &out) const |
void | serialize_to (const std::string &filename) const |
void | simple_sds_load (std::istream &in) |
void | load_from (const std::string &filename) |
size_t | simple_sds_size () const |
Returns the size of the object in elements. | |
std::vector< size_t > | assign_reference_paths (const gbwtgraph::GBZ &gbz, const gbwt::FragmentMap &fragment_map, Verbosity verbosity) const |
Public Attributes | |
Header | header |
std::vector< size_t > | jobs_for_cached_paths |
std::vector< TopLevelChain > | chains |
A representation of the haplotypes in a graph.
The graph is partitioned into top-level chains, which are further partitioned into subchains. Each subchain contains a set of kmers and a collection of sequences. Each sequence is defined by a bitvector marking the kmers that are present.
At the moment, the kmers are minimizers with a single occurrence in the graph. The requirement is that each kmer is specific to a single subchain and does not occur anywhere else in either orientation. (If no haplotype crosses a snarl, that snarl is broken into a suffix and a prefix, and those subchains may share kmers.)
NOTE: This assumes that the top-level chains are linear, not cyclical.
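The subchain layout described above can be illustrated with a small, self-contained sketch. `ToySubchain` and `kmers_in_sequence` are hypothetical names introduced only for this illustration; the actual `Subchain` structure uses its own kmer encoding and compressed bitvectors, which are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a subchain: a set of encoded kmers and, for each
// sequence, one bit per kmer marking whether the sequence contains that kmer.
struct ToySubchain {
    std::vector<std::uint64_t> kmers;         // encoded kmers of the subchain
    std::vector<std::vector<bool>> presence;  // presence[s][i]: sequence s has kmers[i]
};

// Returns the kmers that are marked as present in the given sequence.
std::vector<std::uint64_t> kmers_in_sequence(const ToySubchain& subchain, std::size_t sequence) {
    const std::vector<bool>& bits = subchain.presence.at(sequence);
    // Each sequence must have exactly one bit per kmer.
    assert(bits.size() == subchain.kmers.size());
    std::vector<std::uint64_t> result;
    for (std::size_t i = 0; i < bits.size(); i++) {
        if (bits[i]) { result.push_back(subchain.kmers[i]); }
    }
    return result;
}
```

With kmers `{10, 20, 30}` and a sequence whose bits are `{1, 0, 1}`, the function returns `{10, 30}`. Storing one bit per kmer keeps each sequence small, because the kmers themselves are stored only once per subchain.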
Versions:
typedef std::pair<std::uint32_t, std::uint32_t> vg::Haplotypes::compact_sequence_type |
A more space-efficient representation of sequence_type.
typedef std::pair<gbwt::size_type, gbwt::size_type> vg::Haplotypes::sequence_type |
A GBWT sequence as (sequence identifier, offset in a node).
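The relationship between the two typedefs can be sketched as a checked conversion. The aliases below are local stand-ins, and the sketch assumes that `gbwt::size_type` is a 64-bit unsigned integer; `fits_compact`, `to_compact`, and `to_full` are hypothetical helpers, not part of the class.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>

// Local stand-ins for the documented typedefs (assumption: 64-bit size_type).
using sequence_type = std::pair<std::uint64_t, std::uint64_t>;
using compact_sequence_type = std::pair<std::uint32_t, std::uint32_t>;

// True if both the sequence identifier and the offset fit in 32 bits.
bool fits_compact(const sequence_type& seq) {
    constexpr std::uint64_t limit = std::numeric_limits<std::uint32_t>::max();
    return seq.first <= limit && seq.second <= limit;
}

// Narrows a sequence to the compact form; the caller must check fits_compact() first.
compact_sequence_type to_compact(const sequence_type& seq) {
    assert(fits_compact(seq));
    return { static_cast<std::uint32_t>(seq.first), static_cast<std::uint32_t>(seq.second) };
}

// Widens a compact sequence back to the full representation.
sequence_type to_full(const compact_sequence_type& seq) {
    return { seq.first, seq.second };
}
```

Under these assumptions the compact form halves the space per sequence, but it is only lossless when both fields are below 2^32.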
enum vg::Haplotypes::Verbosity : size_t |
The amount of progress information that should be printed to stderr.
std::vector< size_t > vg::Haplotypes::assign_reference_paths | ( | const gbwtgraph::GBZ & | gbz, |
const gbwt::FragmentMap & | fragment_map, | ||
Verbosity | verbosity | ||
) | const |
Assigns each reference and generic path in the graph to a GBWT construction job.
For each path handle from 0 to gbz.named_paths() - 1, the returned vector stores the construction job the path is assigned to, or jobs() if the path is empty.
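One way to consume such an assignment is to group path handles by job. The sketch below assumes that the returned vector is indexed by path handle and that the value `jobs()` acts as a sentinel for paths without a job; `paths_by_job` is a hypothetical helper that works on plain vectors and does not call into vg.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Groups path handles by construction job. assignment[i] is the job for path
// handle i; the value num_jobs is treated as "no job" and such paths are skipped.
std::vector<std::vector<std::size_t>> paths_by_job(
    const std::vector<std::size_t>& assignment, std::size_t num_jobs
) {
    std::vector<std::vector<std::size_t>> result(num_jobs);
    for (std::size_t path = 0; path < assignment.size(); path++) {
        std::size_t job = assignment[path];
        assert(job <= num_jobs); // values above the sentinel are invalid
        if (job < num_jobs) { result[job].push_back(path); }
    }
    return result;
}
```

For the assignment `{1, 0, 2, 1}` with two jobs, job 0 receives path 1, job 1 receives paths 0 and 3, and path 2 is skipped because its value equals the sentinel.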
size_t vg::Haplotypes::components | ( | ) | const |
inline |
Returns the number of weakly connected components.
size_t vg::Haplotypes::jobs | ( | ) | const |
inline |
Returns the number of GBWT construction jobs.
size_t vg::Haplotypes::k | ( | ) | const |
inline |
Returns the length of the kmers.
hash_map< Haplotypes::Subchain::kmer_type, size_t > vg::Haplotypes::kmer_counts | ( | const std::string & | kff_file, |
Verbosity | verbosity | ||
) | const |
Returns a mapping from kmers to their counts in the given KFF file. The counts include both the kmer and the reverse complement.
Reads the KFF file using OpenMP threads. Exits with std::exit() if the file cannot be opened and throws std::runtime_error if the kmer counts cannot be used.
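Counting a kmer together with its reverse complement can be sketched as follows. This is an illustration over string kmers in a plain hash map, not the KFF-based implementation; `reverse_complement` and `combined_count` are hypothetical helpers, and counting a kmer that equals its own reverse complement only once is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Returns the reverse complement of a DNA kmer over the alphabet ACGT.
std::string reverse_complement(const std::string& kmer) {
    std::string result(kmer.rbegin(), kmer.rend());
    for (char& c : result) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break; // other characters are left unchanged
        }
    }
    return result;
}

// Returns the count of the kmer summed over both orientations.
std::size_t combined_count(
    const std::unordered_map<std::string, std::size_t>& counts, const std::string& kmer
) {
    assert(!kmer.empty());
    auto count_of = [&counts](const std::string& key) -> std::size_t {
        auto iter = counts.find(key);
        return (iter == counts.end() ? 0 : iter->second);
    };
    std::string rc = reverse_complement(kmer);
    std::size_t result = count_of(kmer);
    // A kmer that is its own reverse complement is counted once in this sketch.
    if (rc != kmer) { result += count_of(rc); }
    return result;
}
```

With this convention, a query for a kmer and a query for its reverse complement return the same value, so the orientation in which a read was sequenced does not affect the count.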
size_t vg::Haplotypes::kmers | ( | ) | const |
inline |
Returns the number of kmers in the subchains.
void vg::Haplotypes::load_from | ( | const std::string & | filename | ) |
Loads the object from a file in the Simple-SDS format. Prints an error message and exits the program on failure.
void vg::Haplotypes::serialize_to | ( | const std::string & | filename | ) | const |
Serializes the object to a file in the Simple-SDS format. Prints an error message and exits the program on failure.
void vg::Haplotypes::simple_sds_load | ( | std::istream & | in | ) |
Loads the object from a stream in the Simple-SDS format. I/O errors can be detected by checking the stream state. Throws sdsl::simple_sds::InvalidData if sanity checks fail.
void vg::Haplotypes::simple_sds_serialize | ( | std::ostream & | out | ) | const |
Serializes the object to a stream in the Simple-SDS format. I/O errors can be detected by checking the stream state.
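Because the stream-based functions do not report I/O errors themselves, the caller has to inspect the stream afterwards. The sketch below shows that pattern with a hypothetical `write_elements` function standing in for a stream-based serializer; it does not produce the Simple-SDS format, and all function names are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical serializer: writes raw 64-bit elements and reports no errors itself.
void write_elements(std::ostream& out, const std::vector<std::uint64_t>& elements) {
    for (std::uint64_t value : elements) {
        out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    }
}

// Writes the elements and then detects I/O errors from the stream state.
bool write_and_check(std::ostream& out, const std::vector<std::uint64_t>& elements) {
    write_elements(out, elements);
    out.flush();
    return !out.fail(); // fail() is true if failbit or badbit is set
}

// Serializes into an in-memory buffer; returns false if the stream reported an error.
bool serialize_to_buffer(const std::vector<std::uint64_t>& elements, std::string& buffer) {
    std::ostringstream out;
    if (!write_and_check(out, elements)) { return false; }
    buffer = out.str();
    assert(buffer.size() == elements.size() * sizeof(std::uint64_t));
    return true;
}
```

The same check applies to a `std::ofstream`: test the stream after the serialization call returns, and treat a failed stream as an incomplete file.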
size_t vg::Haplotypes::simple_sds_size | ( | ) | const |
Returns the size of the object in elements.
std::vector<TopLevelChain> vg::Haplotypes::chains |
Header vg::Haplotypes::header |
std::vector<size_t> vg::Haplotypes::jobs_for_cached_paths |