vg
tools for working with variation graphs
vg::Haplotypes::Subchain Struct Reference

Representation of a subchain.

#include <recombinator.hpp>

Public Types

enum subchain_t : std::uint64_t { normal = 0, prefix = 1, suffix = 2, full_haplotype = 3 }
 Subchain types.
 
typedef gbwtgraph::Key64::value_type kmer_type
 An encoded kmer.
 

Public Member Functions

handle_t start_handle () const
 Returns the start node as a GBWTGraph handle.
 
handle_t end_handle () const
 Returns the end node as a GBWTGraph handle.
 
bool has_start () const
 Returns true if the subchain has a start node.
 
bool has_end () const
 Returns true if the subchain has an end node.
 
std::string to_string () const
 Returns a string representation of the type and the boundary nodes.
 
sequence_type get_sequence (size_t i) const
 Returns (sequence identifier, offset in a node) for the given sequence.
 
size_t distance (const gbwtgraph::GBZ &gbz, size_t i) const
 
double badness (const gbwtgraph::GBZ &gbz) const
 
void simple_sds_serialize (std::ostream &out) const
 Serializes the object to a stream in the simple-sds format.
 
void load_v1 (std::istream &in)
 Loads a less space-efficient version 1 or 2 subchain.
 
void simple_sds_load (std::istream &in)
 Loads the object from a stream in the simple-sds format.
 
size_t simple_sds_size () const
 Returns the size of the object in elements.
 

Public Attributes

subchain_t type
 The type of this subchain.
 
gbwt::node_type start
 Boundary nodes, or gbwt::ENDMARKER if not present.
 
gbwt::node_type end
 
std::vector< kmer_type > kmers
 
sdsl::int_vector< 0 > kmer_counts
 Number of haplotypes each kmer appears in.
 
std::vector< compact_sequence_type > sequences
 Sequences as (GBWT sequence id, offset in the relevant node).
 
sdsl::bit_vector kmers_present
 

Detailed Description

Representation of a subchain.

Member Typedef Documentation

◆ kmer_type

typedef gbwtgraph::Key64::value_type vg::Haplotypes::Subchain::kmer_type

An encoded kmer.

Member Enumeration Documentation

◆ subchain_t

Subchain types.

Enumerator
normal 

Normal subchain with two boundary nodes.

prefix 

A prefix with only an end node.

suffix 

A suffix with only a start node.

full_haplotype 

A full haplotype with no boundary nodes.
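
The relationship between the four types and the boundary nodes can be sketched as below. This is a stand-alone illustration, not code from vg: the enum only mirrors subchain_t as listed above, and expects_start and expects_end are hypothetical helpers that restate the enumerator descriptions. The real has_start() and has_end() members may be implemented differently.

```cpp
#include <cstdint>

// Stand-in mirroring vg::Haplotypes::Subchain::subchain_t (values as documented).
enum subchain_t : std::uint64_t { normal = 0, prefix = 1, suffix = 2, full_haplotype = 3 };

// Hypothetical helper: does a subchain of this type have a start node?
// Per the enumerator descriptions, only normal subchains and suffixes do.
constexpr bool expects_start(subchain_t type) {
    return type == normal || type == suffix;
}

// Hypothetical helper: does a subchain of this type have an end node?
// Per the enumerator descriptions, only normal subchains and prefixes do.
constexpr bool expects_end(subchain_t type) {
    return type == normal || type == prefix;
}
```

Under this reading, a full_haplotype subchain has neither boundary node, so both helpers return false for it.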

Member Function Documentation

◆ badness()

double vg::Haplotypes::Subchain::badness ( const gbwtgraph::GBZ &  gbz) const

Returns an estimate of the badness of the subchain. The ideal value is 0.0, and higher values indicate worse subchains. The estimate is based on the following factors:

  • Length of the subchain.
  • Number of haplotypes relative to the expected number.
  • Information content of the kmers (disabled).

◆ distance()

size_t vg::Haplotypes::Subchain::distance ( const gbwtgraph::GBZ &  gbz,
size_t  i 
) const

Returns the distance from the last base of start to the first base of end over the given sequence. Returns 0 if the subchain is not normal or if the sequence does not exist.

◆ end_handle()

handle_t vg::Haplotypes::Subchain::end_handle ( ) const
inline

Returns the end node as a GBWTGraph handle.

◆ get_sequence()

sequence_type vg::Haplotypes::Subchain::get_sequence ( size_t  i) const
inline

Returns (sequence identifier, offset in a node) for the given sequence.

◆ has_end()

bool vg::Haplotypes::Subchain::has_end ( ) const
inline

Returns true if the subchain has an end node.

◆ has_start()

bool vg::Haplotypes::Subchain::has_start ( ) const
inline

Returns true if the subchain has a start node.

◆ load_v1()

void vg::Haplotypes::Subchain::load_v1 ( std::istream &  in)

Loads a less space-efficient version 1 or 2 subchain.

◆ simple_sds_load()

void vg::Haplotypes::Subchain::simple_sds_load ( std::istream &  in)

Loads the object from a stream in the simple-sds format.

◆ simple_sds_serialize()

void vg::Haplotypes::Subchain::simple_sds_serialize ( std::ostream &  out) const

Serializes the object to a stream in the simple-sds format.

◆ simple_sds_size()

size_t vg::Haplotypes::Subchain::simple_sds_size ( ) const

Returns the size of the object in elements.

◆ start_handle()

handle_t vg::Haplotypes::Subchain::start_handle ( ) const
inline

Returns the start node as a GBWTGraph handle.

◆ to_string()

std::string vg::Haplotypes::Subchain::to_string ( ) const

Returns a string representation of the type and the boundary nodes.

Member Data Documentation

◆ end

gbwt::node_type vg::Haplotypes::Subchain::end

The end boundary node, or gbwt::ENDMARKER if not present.

◆ kmer_counts

sdsl::int_vector<0> vg::Haplotypes::Subchain::kmer_counts

Number of haplotypes each kmer appears in.

◆ kmers

std::vector<kmer_type> vg::Haplotypes::Subchain::kmers

A vector of distinct kmers. The number of haplotypes each kmer appears in is stored separately in kmer_counts.

◆ kmers_present

sdsl::bit_vector vg::Haplotypes::Subchain::kmers_present

A bit vector marking the presence of kmers in the sequences. Sequence i contains kmer j if and only if kmers_present[i * kmers.size() + j] == 1.
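
The indexing rule above describes a row-major matrix with one row per sequence and one column per kmer. A minimal sketch of that lookup follows. PresenceMatrix, contains, and num_sequences are hypothetical names used only for illustration, and std::vector<bool> stands in for sdsl::bit_vector so that the snippet has no external dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the relevant Subchain fields.
struct PresenceMatrix {
    std::size_t num_kmers;      // plays the role of kmers.size()
    std::vector<bool> present;  // plays the role of kmers_present

    // Row-major lookup: does sequence i contain kmer j?
    bool contains(std::size_t i, std::size_t j) const {
        assert(j < num_kmers);
        assert(i * num_kmers + j < present.size());
        return present[i * num_kmers + j];
    }

    // Number of rows implied by the vector length (0 if there are no kmers).
    std::size_t num_sequences() const {
        return num_kmers == 0 ? 0 : present.size() / num_kmers;
    }
};
```

For example, with three kmers and the bits 1 0 1 0 1 0, sequence 0 contains kmers 0 and 2, and sequence 1 contains only kmer 1.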

◆ sequences

std::vector<compact_sequence_type> vg::Haplotypes::Subchain::sequences

Sequences as (GBWT sequence id, offset in the relevant node).

◆ start

gbwt::node_type vg::Haplotypes::Subchain::start

Boundary nodes, or gbwt::ENDMARKER if not present.

◆ type

subchain_t vg::Haplotypes::Subchain::type

The type of this subchain.


The documentation for this struct was generated from the following files: