vg
tools for working with variation graphs
#include <recombinator.hpp>
Classes
struct Parameters
Parameters for partition_haplotypes().
struct Subchain
Public Types
typedef Haplotypes::Verbosity Verbosity
The amount of progress information that should be printed to stderr.
typedef Haplotypes::sequence_type sequence_type
A GBWT sequence as (sequence identifier, offset in a node).
typedef Haplotypes::Subchain::kmer_type kmer_type
An encoded kmer.
typedef gbwtgraph::MinimizerIndex<gbwtgraph::Key64, gbwtgraph::Position> minimizer_index_type
Minimizer index without payloads.
Public Member Functions
HaplotypePartitioner(const gbwtgraph::GBZ& gbz, const gbwt::FastLocate& r_index, const SnarlDistanceIndex& distance_index, const minimizer_index_type& minimizer_index, Verbosity verbosity)
Creates a new HaplotypePartitioner using the given indexes.
Haplotypes partition_haplotypes(const Parameters& parameters) const
Public Attributes
const gbwtgraph::GBZ& gbz
const gbwt::FastLocate& r_index
const SnarlDistanceIndex& distance_index
const minimizer_index_type& minimizer_index
Verbosity verbosity
Static Public Attributes
constexpr static size_t SUBCHAIN_LENGTH = 10000
Target length of a subchain.
constexpr static size_t APPROXIMATE_JOBS = 32
Approximate number of construction jobs to be created.
Private Member Functions
size_t get_distance(handle_t from, handle_t to) const
bool contains_reversals(handle_t handle) const
std::vector<Subchain> get_subchains(const gbwtgraph::TopLevelChain& chain, const Parameters& parameters) const
std::vector<sequence_type> get_sequence_visits(handle_t handle) const
std::vector<sequence_type> get_sequences(handle_t handle) const
std::vector<sequence_type> get_sequences(Subchain subchain) const
std::vector<kmer_type> unique_minimizers(gbwt::size_type sequence_id) const
std::vector<kmer_type> unique_minimizers(sequence_type sequence, Subchain subchain) const
void build_subchains(const gbwtgraph::TopLevelChain& chain, Haplotypes::TopLevelChain& output, const Parameters& parameters) const
A tool for transforming the haplotypes in a GBWT index into a Haplotypes representation. Requires a GBZ graph, an r-index, a distance index, and a minimizer index.
typedef Haplotypes::Subchain::kmer_type vg::HaplotypePartitioner::kmer_type
An encoded kmer.
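As a rough illustration of what an encoded kmer can look like, the sketch below packs a kmer into a 64-bit integer using 2 bits per base. The base codes (A=0, C=1, G=2, T=3), the bit order, and the helper names encode_kmer and decode_kmer are assumptions for this example; the actual encoding behind gbwtgraph::Key64 may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical encoding for illustration only: 2 bits per base
// (A=0, C=1, G=2, T=3), with the first base in the most significant used bits.
using kmer_type = std::uint64_t;

// Returns std::nullopt for kmers longer than 32 bases or containing non-ACGT characters.
std::optional<kmer_type> encode_kmer(const std::string& kmer) {
    if (kmer.size() > 32) {
        return std::nullopt;
    }
    kmer_type value = 0;
    for (char base : kmer) {
        kmer_type code = 0;
        switch (base) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: return std::nullopt;
        }
        value = (value << 2) | code;
    }
    return value;
}

// Inverse of encode_kmer() for a known kmer length k.
std::string decode_kmer(kmer_type value, std::size_t k) {
    assert(k <= 32);
    static const char bases[] = "ACGT";
    std::string result(k, 'A');
    for (std::size_t i = k; i > 0; i--) {
        result[i - 1] = bases[value & 3];
        value >>= 2;
    }
    return result;
}
```

Because this encoding does not store the length, it is only unambiguous for a fixed k, which is the usual situation in a minimizer index where all kmers have the same length.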
typedef gbwtgraph::MinimizerIndex<gbwtgraph::Key64, gbwtgraph::Position> vg::HaplotypePartitioner::minimizer_index_type
Minimizer index without payloads.
typedef Haplotypes::sequence_type vg::HaplotypePartitioner::sequence_type
A GBWT sequence as (sequence identifier, offset in a node).
typedef Haplotypes::Verbosity vg::HaplotypePartitioner::Verbosity
The amount of progress information that should be printed to stderr.
vg::HaplotypePartitioner::HaplotypePartitioner(const gbwtgraph::GBZ& gbz,
    const gbwt::FastLocate& r_index,
    const SnarlDistanceIndex& distance_index,
    const minimizer_index_type& minimizer_index,
    Verbosity verbosity)
Creates a new HaplotypePartitioner using the given indexes. The indexes are stored as const references, so they must outlive the partitioner.
Haplotypes vg::HaplotypePartitioner::partition_haplotypes(const Parameters& parameters) const
Creates a Haplotypes representation of the haplotypes in the GBWT index.
Top-level chains (weakly connected components in the graph) are assigned to a number of jobs that can later be used as GBWT construction jobs. Multiple jobs are run in parallel using OpenMP threads.
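The documentation does not say how chains are grouped into jobs. One plausible approach, shown here as a hypothetical sketch, is first-fit decreasing bin packing against a size bound of total size divided by the desired number of jobs (for example APPROXIMATE_JOBS). The function name assign_chains_to_jobs and the use of a single size value per chain are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch: chain_sizes[i] is a size measure for top-level chain i
// (for example its total sequence length). Returns the chains assigned to each job.
std::vector<std::vector<std::size_t>> assign_chains_to_jobs(
    const std::vector<std::size_t>& chain_sizes, std::size_t approximate_jobs) {
    assert(approximate_jobs > 0);
    std::size_t total = std::accumulate(chain_sizes.begin(), chain_sizes.end(), std::size_t(0));
    // Size bound per job, rounded up and at least 1.
    std::size_t bound = std::max<std::size_t>((total + approximate_jobs - 1) / approximate_jobs, 1);

    // Process the largest chains first (first-fit decreasing); ties keep input order.
    std::vector<std::size_t> order(chain_sizes.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return chain_sizes[a] > chain_sizes[b];
    });

    std::vector<std::vector<std::size_t>> jobs;
    std::vector<std::size_t> job_sizes;
    for (std::size_t chain : order) {
        // Put the chain into the first existing job that still has room for it.
        bool placed = false;
        for (std::size_t job = 0; job < jobs.size(); job++) {
            if (job_sizes[job] + chain_sizes[chain] <= bound) {
                jobs[job].push_back(chain);
                job_sizes[job] += chain_sizes[chain];
                placed = true;
                break;
            }
        }
        // Otherwise open a new job; a chain larger than the bound gets a job of its own.
        if (!placed) {
            jobs.push_back({chain});
            job_sizes.push_back(chain_sizes[chain]);
        }
    }
    return jobs;
}
```

Because an oversized chain always gets its own job, the number of jobs produced this way is only approximately the requested one; each resulting job could then be handed to an OpenMP thread.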
Each top-level chain is partitioned into subchains that consist of one or more snarls. Multiple snarls are combined into the same subchain if the minimum distance over the subchain is at most the target length and there are GBWT haplotypes that cross the subchain. The subchain is also extended if a haplotype would cross its end in both directions. This avoids losing sequence when haplotypes reverse their direction, while keeping the kmers specific to each subchain.
If there are no snarls in a top-level chain, it is represented as a single subchain without boundary nodes.
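The extension rule can be sketched on a simplified model of a top-level chain. The model is an assumption: the minimum distance over a subchain is approximated as the sum of per-snarl minimum lengths (instead of a distance index query), haplotype crossings are supplied as a callback, and one boolean per boundary node marks where some haplotype crosses in both directions. SubchainSketch and partition_chain are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model: boundary nodes 0..n separate n snarls, and snarl i lies
// between boundaries i and i + 1.
struct SubchainSketch {
    bool has_boundaries; // false when the chain has no snarls
    std::size_t start;   // first boundary node (only meaningful with boundaries)
    std::size_t end;     // last boundary node
};

std::vector<SubchainSketch> partition_chain(
    const std::vector<std::size_t>& snarl_min_lengths,   // assumed min distance across each snarl
    const std::vector<bool>& reversal_at_boundary,       // n + 1 flags: crossed in both directions
    const std::function<bool(std::size_t, std::size_t)>& haplotypes_cross, // boundary a -> boundary b
    std::size_t target_length) {
    std::size_t n = snarl_min_lengths.size();
    assert(reversal_at_boundary.size() == n + 1);
    if (n == 0) {
        // No snarls: the whole chain is a single subchain without boundary nodes.
        return {SubchainSketch{false, 0, 0}};
    }
    std::vector<SubchainSketch> result;
    std::size_t start = 0;
    while (start < n) {
        // A subchain always contains at least one snarl.
        std::size_t end = start + 1;
        std::size_t distance = snarl_min_lengths[start];
        while (end < n) {
            std::size_t extended = distance + snarl_min_lengths[end];
            // Must extend: ending here would cut a haplotype that reverses at this boundary.
            bool must_extend = reversal_at_boundary[end];
            // May extend: still within the target length and haplotypes cross the longer span.
            bool may_extend = extended <= target_length && haplotypes_cross(start, end + 1);
            if (!must_extend && !may_extend) {
                break;
            }
            distance = extended;
            end++;
        }
        result.push_back(SubchainSketch{true, start, end});
        start = end;
    }
    return result;
}
```

In this sketch consecutive subchains share a boundary node, and a single snarl longer than the target still forms a subchain on its own.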
Haplotypes crossing each subchain are represented using minimizers with a single occurrence in the graph.
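The filtering step can be illustrated with a hypothetical helper, unique_minimizers_sketch. It assumes that the minimizers of one haplotype within a subchain are already available as encoded kmers and that a map from each kmer to its number of occurrences in the graph is given; in vg that information would come from the minimizer index, whose API is not used here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using kmer_type = std::uint64_t; // assumed 64-bit kmer encoding

// Hypothetical: haplotype_kmers are the minimizers found in one haplotype within
// a subchain; graph_occurrences maps each kmer to its number of occurrences in the graph.
std::vector<kmer_type> unique_minimizers_sketch(
    const std::vector<kmer_type>& haplotype_kmers,
    const std::unordered_map<kmer_type, std::size_t>& graph_occurrences) {
    std::vector<kmer_type> result;
    for (kmer_type kmer : haplotype_kmers) {
        auto iter = graph_occurrences.find(kmer);
        // Keep only kmers with a single occurrence in the graph; such a kmer
        // pins down one graph position and is therefore specific to this subchain.
        if (iter != graph_occurrences.end() && iter->second == 1) {
            result.push_back(kmer);
        }
    }
    // Sorted, duplicate-free set (a haplotype may visit the same position twice).
    std::sort(result.begin(), result.end());
    result.erase(std::unique(result.begin(), result.end()), result.end());
    return result;
}
```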
Throws std::runtime_error on error in single-threaded parts and exits with std::exit(EXIT_FAILURE) in multi-threaded parts.
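A likely reason for this split is how OpenMP treats exceptions: an exception thrown inside a parallel region must be caught within that region by the thread that threw it, so it cannot propagate to the caller. A minimal sketch of the pattern, using the hypothetical names process_job, check_jobs, and run_jobs, looks like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical unit of work; returns false on failure.
bool process_job(int job) {
    return job >= 0;
}

// Single-threaded part: errors are reported by throwing.
void check_jobs(const std::vector<int>& jobs) {
    if (jobs.empty()) {
        throw std::runtime_error("check_jobs(): no construction jobs");
    }
}

void run_jobs(const std::vector<int>& jobs) {
    check_jobs(jobs);
    // Multi-threaded part: an exception must not leave an OpenMP parallel region,
    // so a failing thread prints an error and terminates the process instead.
    #pragma omp parallel for schedule(dynamic, 1)
    for (std::size_t i = 0; i < jobs.size(); i++) {
        if (!process_job(jobs[i])) {
            #pragma omp critical
            {
                std::cerr << "run_jobs(): job " << jobs[i] << " failed" << std::endl;
            }
            std::exit(EXIT_FAILURE);
        }
    }
}
```

Without OpenMP support the pragmas are ignored and the loop runs sequentially, so the sketch also compiles as plain C++.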
constexpr static size_t vg::HaplotypePartitioner::APPROXIMATE_JOBS = 32
Approximate number of construction jobs to be created.
const SnarlDistanceIndex& vg::HaplotypePartitioner::distance_index |
const gbwtgraph::GBZ& vg::HaplotypePartitioner::gbz |
const minimizer_index_type& vg::HaplotypePartitioner::minimizer_index |
const gbwt::FastLocate& vg::HaplotypePartitioner::r_index |
constexpr static size_t vg::HaplotypePartitioner::SUBCHAIN_LENGTH = 10000
Target length of a subchain.
Verbosity vg::HaplotypePartitioner::verbosity |