vg
tools for working with variation graphs
vg::HaplotypePartitioner Class Reference

#include <recombinator.hpp>

Classes

struct  Parameters
 Parameters for partition_haplotypes().
 
struct  Subchain
 

Public Types

typedef Haplotypes::Verbosity Verbosity
 The amount of progress information that should be printed to stderr.
 
typedef Haplotypes::sequence_type sequence_type
 A GBWT sequence as (sequence identifier, offset in a node).
 
typedef Haplotypes::Subchain::kmer_type kmer_type
 An encoded kmer.
 
typedef gbwtgraph::MinimizerIndex< gbwtgraph::Key64, gbwtgraph::Position > minimizer_index_type
 Minimizer index without payloads.
 

Public Member Functions

 HaplotypePartitioner (const gbwtgraph::GBZ &gbz, const gbwt::FastLocate &r_index, const SnarlDistanceIndex &distance_index, const minimizer_index_type &minimizer_index, Verbosity verbosity)
 Creates a new HaplotypePartitioner using the given indexes.
 
Haplotypes partition_haplotypes (const Parameters &parameters) const
 

Public Attributes

const gbwtgraph::GBZ & gbz
 
const gbwt::FastLocate & r_index
 
const SnarlDistanceIndex & distance_index
 
const minimizer_index_type & minimizer_index
 
Verbosity verbosity
 

Static Public Attributes

constexpr static size_t SUBCHAIN_LENGTH = 10000
 Target length of a subchain.
 
constexpr static size_t APPROXIMATE_JOBS = 32
 Approximate number of construction jobs to be created.
 

Private Member Functions

size_t get_distance (handle_t from, handle_t to) const
 
bool contains_reversals (handle_t handle) const
 
std::vector< Subchain > get_subchains (const gbwtgraph::TopLevelChain &chain, const Parameters &parameters) const
 
std::vector< sequence_type > get_sequence_visits (handle_t handle) const
 
std::vector< sequence_type > get_sequences (handle_t handle) const
 
std::vector< sequence_type > get_sequences (Subchain subchain) const
 
std::vector< kmer_type > unique_minimizers (gbwt::size_type sequence_id) const
 
std::vector< kmer_type > unique_minimizers (sequence_type sequence, Subchain subchain) const
 
void build_subchains (const gbwtgraph::TopLevelChain &chain, Haplotypes::TopLevelChain &output, const Parameters &parameters) const
 

Detailed Description

A tool for transforming the haplotypes in a GBWT index into a Haplotypes representation. Requires a GBZ graph, an r-index, a distance index, and a minimizer index.

Member Typedef Documentation

◆ kmer_type

An encoded kmer.

◆ minimizer_index_type

typedef gbwtgraph::MinimizerIndex<gbwtgraph::Key64, gbwtgraph::Position> vg::HaplotypePartitioner::minimizer_index_type

Minimizer index without payloads.

◆ sequence_type

A GBWT sequence as (sequence identifier, offset in a node).

◆ Verbosity

The amount of progress information that should be printed to stderr.

Constructor & Destructor Documentation

◆ HaplotypePartitioner()

vg::HaplotypePartitioner::HaplotypePartitioner ( const gbwtgraph::GBZ &  gbz,
const gbwt::FastLocate &  r_index,
const SnarlDistanceIndex &  distance_index,
const minimizer_index_type &  minimizer_index,
Verbosity  verbosity 
)

Creates a new HaplotypePartitioner using the given indexes.

Member Function Documentation

◆ build_subchains()

void vg::HaplotypePartitioner::build_subchains ( const gbwtgraph::TopLevelChain &  chain,
Haplotypes::TopLevelChain &  output,
const Parameters &  parameters 
) const
private

◆ contains_reversals()

bool vg::HaplotypePartitioner::contains_reversals ( handle_t  handle) const
private

◆ get_distance()

size_t vg::HaplotypePartitioner::get_distance ( handle_t  from,
handle_t  to 
) const
private

◆ get_sequence_visits()

std::vector< HaplotypePartitioner::sequence_type > vg::HaplotypePartitioner::get_sequence_visits ( handle_t  handle) const
private

◆ get_sequences() [1/2]

std::vector< HaplotypePartitioner::sequence_type > vg::HaplotypePartitioner::get_sequences ( handle_t  handle) const
private

◆ get_sequences() [2/2]

std::vector< HaplotypePartitioner::sequence_type > vg::HaplotypePartitioner::get_sequences ( Subchain  subchain) const
private

◆ get_subchains()

std::vector< HaplotypePartitioner::Subchain > vg::HaplotypePartitioner::get_subchains ( const gbwtgraph::TopLevelChain &  chain,
const Parameters &  parameters 
) const
private

◆ partition_haplotypes()

Haplotypes vg::HaplotypePartitioner::partition_haplotypes ( const Parameters &  parameters) const

Creates a Haplotypes representation of the haplotypes in the GBWT index.

Top-level chains (weakly connected components in the graph) are assigned to a number of jobs that can later be used as GBWT construction jobs. Multiple jobs are run in parallel using OpenMP threads.
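
One way to picture the assignment of chains to jobs is a greedy balancing pass. The sketch below is an illustration under stated assumptions, not the assignment logic vg actually uses: `assign_chains_to_jobs` and the per-chain size estimates are hypothetical, and the job count would be a value near APPROXIMATE_JOBS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of vg): assigns each chain to one of
// `num_jobs` jobs so that the total size per job is roughly balanced.
// chain_sizes[i] is an assumed size estimate for chain i, such as a node count.
// Returns a vector where element i is the job chosen for chain i.
std::vector<std::size_t> assign_chains_to_jobs(
    const std::vector<std::size_t>& chain_sizes, std::size_t num_jobs) {
    assert(num_jobs > 0);

    // Visit chains from largest to smallest; stable sort keeps ties in input order.
    std::vector<std::size_t> order(chain_sizes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return chain_sizes[a] > chain_sizes[b];
                     });

    std::vector<std::size_t> load(num_jobs, 0);
    std::vector<std::size_t> job_of_chain(chain_sizes.size(), 0);
    for (std::size_t chain : order) {
        // Pick the job with the smallest load so far (first one on ties).
        std::size_t job = static_cast<std::size_t>(
            std::min_element(load.begin(), load.end()) - load.begin());
        job_of_chain[chain] = job;
        load[job] += chain_sizes[chain];
    }
    return job_of_chain;
}
```

Processing the largest chains first keeps a single large component from landing on an already loaded job, which is why the resulting number of non-empty jobs is only approximate.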

Each top-level chain is partitioned into subchains that consist of one or more snarls. Multiple snarls are combined into the same subchain if the minimum distance over the subchain is at most the target length and there are GBWT haplotypes that cross the subchain. We also keep extending the subchain if a haplotype would cross the end in both directions. By doing this, we can avoid sequence loss with haplotypes reversing their direction, while keeping kmers specific to each subchain.
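
The grouping rule can be sketched as a single pass over the snarls of a chain. This is a simplified illustration, not the code behind get_subchains(): `SnarlSketch` and `group_snarls` are hypothetical, lengths are added per snarl instead of being queried from the distance index, and the requirement that some GBWT haplotype crosses the whole subchain is left out.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-snarl summary (not a vg type).
struct SnarlSketch {
    std::size_t min_length;  // assumed minimum distance across the snarl
    bool end_has_reversal;   // a haplotype crosses the snarl's end in both directions
};

// Groups consecutive snarls into subchains. Each result is a half-open
// range [first, last) of snarl indexes. A subchain is extended while the
// combined length stays within target_length, or while the current end
// is crossed in both directions and therefore cannot serve as a boundary.
std::vector<std::pair<std::size_t, std::size_t>> group_snarls(
    const std::vector<SnarlSketch>& snarls, std::size_t target_length) {
    std::vector<std::pair<std::size_t, std::size_t>> result;
    std::size_t i = 0;
    while (i < snarls.size()) {
        std::size_t begin = i;
        std::size_t length = snarls[i].min_length;
        ++i;
        while (i < snarls.size()) {
            bool must_extend = snarls[i - 1].end_has_reversal;
            bool fits = length + snarls[i].min_length <= target_length;
            if (!must_extend && !fits) { break; }
            length += snarls[i].min_length;
            ++i;
        }
        result.emplace_back(begin, i);
    }
    return result;
}
```

With a target length of 10000 (the value of SUBCHAIN_LENGTH), snarls of lengths 4000 and 5000 would share a subchain, while a snarl whose end is crossed in both directions pulls the next snarl into its subchain even when the target is exceeded.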

If there are no snarls in a top-level chain, it is represented as a single subchain without boundary nodes.

Haplotypes crossing each subchain are represented using minimizers with a single occurrence in the graph.
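
The idea of keeping only minimizers with a single occurrence can be illustrated with plain integers standing in for encoded kmers. The sketch below is an assumption-laden stand-in for unique_minimizers(): `single_occurrence_kmers` is hypothetical, and occurrence counts come from a flat list of graph kmers rather than from the minimizer index.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of vg): returns the distinct kmers of a
// haplotype that occur exactly once among graph_kmers, in sorted order.
// In vg the occurrence counts would come from the minimizer index instead.
std::vector<std::uint64_t> single_occurrence_kmers(
    const std::vector<std::uint64_t>& graph_kmers,
    std::vector<std::uint64_t> haplotype_kmers) {
    // Count how many times each kmer occurs in the graph.
    std::unordered_map<std::uint64_t, std::size_t> occurrences;
    for (std::uint64_t kmer : graph_kmers) {
        ++occurrences[kmer];
    }

    // Sort and deduplicate the kmers seen on the haplotype.
    std::sort(haplotype_kmers.begin(), haplotype_kmers.end());
    haplotype_kmers.erase(
        std::unique(haplotype_kmers.begin(), haplotype_kmers.end()),
        haplotype_kmers.end());

    // Drop kmers that are absent from the graph or occur more than once.
    haplotype_kmers.erase(
        std::remove_if(haplotype_kmers.begin(), haplotype_kmers.end(),
                       [&](std::uint64_t kmer) {
                           auto it = occurrences.find(kmer);
                           return it == occurrences.end() || it->second != 1;
                       }),
        haplotype_kmers.end());
    return haplotype_kmers;
}
```

Restricting the representation to single-occurrence kmers means that seeing such a kmer identifies one position in the graph, so it can be attributed to one subchain without ambiguity.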

Throws std::runtime_error on error in single-threaded parts and exits with std::exit(EXIT_FAILURE) in multi-threaded parts.

◆ unique_minimizers() [1/2]

std::vector< HaplotypePartitioner::kmer_type > vg::HaplotypePartitioner::unique_minimizers ( gbwt::size_type  sequence_id) const
private

◆ unique_minimizers() [2/2]

std::vector< HaplotypePartitioner::kmer_type > vg::HaplotypePartitioner::unique_minimizers ( sequence_type  sequence,
Subchain  subchain 
) const
private

Member Data Documentation

◆ APPROXIMATE_JOBS

constexpr size_t vg::HaplotypePartitioner::APPROXIMATE_JOBS = 32
static constexpr

Approximate number of construction jobs to be created.

◆ distance_index

const SnarlDistanceIndex& vg::HaplotypePartitioner::distance_index

◆ gbz

const gbwtgraph::GBZ& vg::HaplotypePartitioner::gbz

◆ minimizer_index

const minimizer_index_type& vg::HaplotypePartitioner::minimizer_index

◆ r_index

const gbwt::FastLocate& vg::HaplotypePartitioner::r_index

◆ SUBCHAIN_LENGTH

constexpr size_t vg::HaplotypePartitioner::SUBCHAIN_LENGTH = 10000
static constexpr

Target length of a subchain.

◆ verbosity

Verbosity vg::HaplotypePartitioner::verbosity

The documentation for this class was generated from the following files: