vg
tools for working with variation graphs
vg::Recombinator Class Reference

#include <recombinator.hpp>

Classes

struct  LocalHaplotype
 A local haplotype sequence within a single subchain.
 
struct  Parameters
 Parameters for generate_haplotypes().
 
struct  Statistics
 Statistics on the generated haplotypes.
 

Public Types

enum  kmer_presence { absent, heterozygous, present, frequent }
 Kmer classification.
 
typedef Haplotypes::Verbosity Verbosity
 The amount of progress information that should be printed to stderr.
 
typedef Haplotypes::sequence_type sequence_type
 A GBWT sequence as (sequence identifier, offset in a node).
 

Public Member Functions

 Recombinator (const gbwtgraph::GBZ &gbz, const Haplotypes &haplotypes, Verbosity verbosity)
 Creates a new Recombinator.
 
gbwt::GBWT generate_haplotypes (const std::string &kff_file, const Parameters &parameters) const
 

Public Attributes

const gbwtgraph::GBZ & gbz
 
const Haplotypes & haplotypes
 
gbwt::FragmentMap fragment_map
 
Verbosity verbosity
 
std::vector< size_t > jobs_for_cached_paths
 

Static Public Attributes

constexpr static size_t NUM_HAPLOTYPES = 4
 Number of haplotypes to be generated.
 
constexpr static size_t NUM_CANDIDATES = 32
 A reasonable number of candidates for diploid sampling.
 
constexpr static double BADNESS_THRESHOLD = 4.0
 Badness threshold for subchains.
 
constexpr static size_t COVERAGE = 0
 Expected kmer coverage. Use 0 to estimate from kmer counts.
 
constexpr static size_t KFF_BLOCK_SIZE = 1000000
 Block size (in kmers) for reading KFF files.
 
constexpr static double PRESENT_DISCOUNT = 0.9
 
constexpr static double HET_ADJUSTMENT = 0.05
 
constexpr static double ABSENT_SCORE = 0.8
 

Private Member Functions

Statistics generate_haplotypes (const Haplotypes::TopLevelChain &chain, const hash_map< Haplotypes::Subchain::kmer_type, size_t > &kmer_counts, gbwt::GBWTBuilder &builder, gbwtgraph::MetadataBuilder &metadata, const Parameters &parameters, double coverage) const
 

Detailed Description

A class that creates synthetic haplotypes from a Haplotypes representation of local haplotypes.

Member Typedef Documentation

◆ sequence_type

A GBWT sequence as (sequence identifier, offset in a node).

◆ Verbosity

The amount of progress information that should be printed to stderr.

Member Enumeration Documentation

◆ kmer_presence

Kmer classification.

Enumerator
absent 
heterozygous 
present 
frequent 

Constructor & Destructor Documentation

◆ Recombinator()

vg::Recombinator::Recombinator ( const gbwtgraph::GBZ &  gbz,
const Haplotypes &  haplotypes,
Verbosity  verbosity 
)

Creates a new Recombinator.

Member Function Documentation

◆ generate_haplotypes() [1/2]

Recombinator::Statistics vg::Recombinator::generate_haplotypes ( const Haplotypes::TopLevelChain &  chain,
const hash_map< Haplotypes::Subchain::kmer_type, size_t > &  kmer_counts,
gbwt::GBWTBuilder &  builder,
gbwtgraph::MetadataBuilder &  metadata,
const Parameters &  parameters,
double  coverage 
) const
private

◆ generate_haplotypes() [2/2]

gbwt::GBWT vg::Recombinator::generate_haplotypes ( const std::string &  kff_file,
const Parameters &  parameters 
) const

Generates haplotypes based on the kmer counts in the given KFF file.

Runs multiple GBWT construction jobs in parallel using OpenMP threads and generates the specified number of haplotypes in each top-level chain (component).

Each generated haplotype has a single source haplotype in each subchain. The source haplotype may consist of multiple fragments. Subchains are connected by unary paths. Suffix / prefix subchains in the middle of a chain create fragment breaks in every haplotype. If the chain starts without a prefix (ends without a suffix), the haplotype chosen for the first (last) subchain is used from the start (continued until the end).

Throws std::runtime_error on error in single-threaded parts and exits with std::exit(EXIT_FAILURE) in multi-threaded parts.
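The split between throwing and exiting matters to callers: an exception can only be caught around the single-threaded phases, because a C++ exception must not escape an OpenMP parallel region. The sketch below illustrates that convention with stand-in functions. All three names (require_kff_name, fail_in_worker, checked_setup) are hypothetical and are not part of vg; the empty-name check is an invented example of a single-threaded validation step.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical check for a single-threaded phase: failures are reported
// by throwing, so the caller can catch them and recover.
void require_kff_name(const std::string& kff_file)
{
    if (kff_file.empty()) {
        throw std::runtime_error("no KFF file name given");
    }
}

// Hypothetical failure path for code running inside a parallel region:
// an exception cannot propagate out of the region, so the worker prints
// a message and terminates the whole process instead.
[[noreturn]] void fail_in_worker(const std::string& message)
{
    std::cerr << "error: " << message << std::endl;
    std::exit(EXIT_FAILURE);
}

// Caller-side handling of the single-threaded phase. Returns true if
// the setup step succeeded and false if it threw std::runtime_error.
bool checked_setup(const std::string& kff_file)
{
    try {
        require_kff_name(kff_file);
        return true;
    } catch (const std::runtime_error& e) {
        std::cerr << "setup failed: " << e.what() << std::endl;
        return false;
    }
}
```

A caller that needs to clean up before the process ends cannot rely on a catch block for failures in the multi-threaded parts; anything that must survive such a failure has to be written out beforehand.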

Member Data Documentation

◆ ABSENT_SCORE

constexpr double vg::Recombinator::ABSENT_SCORE = 0.8
static constexpr

Score for getting an absent kmer right/wrong. This should be less than 1, if we assume that having the right variants in the graph is more important than keeping wrong variants out.
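One way to read this description is as a signed contribution per kmer. The sketch below is a toy model and not vg's scoring code: it assumes that a present kmer contributes +1 or -1 (the description only implies a reference value of 1), that an absent kmer contributes +ABSENT_SCORE when the haplotype lacks it and -ABSENT_SCORE when the haplotype contains it, and that contributions are summed. The ToyKmer type and toy_haplotype_score function are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Toy kmer record; the type and field names are hypothetical, not vg's.
struct ToyKmer {
    bool expected_present;  // true: classified as present; false: absent
    bool in_haplotype;      // whether the candidate haplotype contains it
};

// Assumption: getting a present kmer right/wrong is worth +1/-1, and
// getting an absent kmer right/wrong is worth +/-absent_score.
double toy_haplotype_score(const std::vector<ToyKmer>& kmers,
                           double absent_score = 0.8)
{
    assert(absent_score > 0.0 && absent_score < 1.0);
    double score = 0.0;
    for (const ToyKmer& kmer : kmers) {
        if (kmer.expected_present) {
            score += (kmer.in_haplotype ? 1.0 : -1.0);
        } else {
            score += (kmer.in_haplotype ? -absent_score : absent_score);
        }
    }
    return score;
}
```

Under these assumptions, a haplotype that carries one present kmer and one absent kmer still scores positive (1 - 0.8), which matches the stated preference for having the right variants over keeping wrong variants out.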

◆ BADNESS_THRESHOLD

constexpr static double vg::Recombinator::BADNESS_THRESHOLD = 4.0
static constexpr

Badness threshold for subchains.

◆ COVERAGE

constexpr size_t vg::Recombinator::COVERAGE = 0
static constexpr

Expected kmer coverage. Use 0 to estimate from kmer counts.

◆ fragment_map

gbwt::FragmentMap vg::Recombinator::fragment_map

◆ gbz

const gbwtgraph::GBZ& vg::Recombinator::gbz

◆ haplotypes

const Haplotypes& vg::Recombinator::haplotypes

◆ HET_ADJUSTMENT

constexpr double vg::Recombinator::HET_ADJUSTMENT = 0.05
static constexpr

Adjustment to the score of a heterozygous kmer every time a haplotype with (-) or without (+) that kmer is selected.
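The sign convention in this description can be written out as a one-line update. This is a minimal sketch, assuming the adjustment is a plain additive step applied once per selected haplotype; the function name toy_adjust_het_score is hypothetical, and how vg stores or initializes the score is not stated here.

```cpp
#include <cassert>

// Hypothetical helper, not part of vg. Returns the updated score of a
// heterozygous kmer after one haplotype has been selected:
// the score goes down if the selected haplotype contains the kmer (-)
// and up if it does not (+).
double toy_adjust_het_score(double score, bool selected_has_kmer,
                            double adjustment = 0.05)
{
    assert(adjustment >= 0.0);
    return selected_has_kmer ? score - adjustment : score + adjustment;
}
```

Applied repeatedly, the update pushes later selections toward the opposite choice: once a haplotype with the kmer has been picked, haplotypes without it become slightly more attractive, which is the behaviour one would want for a kmer expected on only one of two haplotypes.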

◆ jobs_for_cached_paths

std::vector<size_t> vg::Recombinator::jobs_for_cached_paths

◆ KFF_BLOCK_SIZE

constexpr size_t vg::Recombinator::KFF_BLOCK_SIZE = 1000000
static constexpr

Block size (in kmers) for reading KFF files.
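To give a sense of what a block size of 1000000 kmers means in practice, the sketch below splits a kmer count into half-open ranges of at most one block each. It only illustrates the arithmetic; the helper toy_block_ranges is hypothetical, and how vg actually iterates over a KFF file is not described on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of vg: split `total_kmers` into
// [begin, end) ranges that each hold at most `block_size` kmers.
std::vector<std::pair<std::size_t, std::size_t>>
toy_block_ranges(std::size_t total_kmers, std::size_t block_size = 1000000)
{
    assert(block_size > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t begin = 0; begin < total_kmers; begin += block_size) {
        // The last block may be shorter than block_size.
        ranges.emplace_back(begin, std::min(begin + block_size, total_kmers));
    }
    return ranges;
}
```

With the default block size, a file of 2500000 kmers would be covered by three ranges, the last one holding the remaining 500000 kmers.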

◆ NUM_CANDIDATES

constexpr size_t vg::Recombinator::NUM_CANDIDATES = 32
static constexpr

A reasonable number of candidates for diploid sampling.

◆ NUM_HAPLOTYPES

constexpr size_t vg::Recombinator::NUM_HAPLOTYPES = 4
static constexpr

Number of haplotypes to be generated.

◆ PRESENT_DISCOUNT

constexpr double vg::Recombinator::PRESENT_DISCOUNT = 0.9
static constexpr

Multiplier to the score of a present kmer every time a haplotype with that kmer is selected.
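Because the discount is a multiplier applied once per selection, its effect compounds geometrically. The sketch below shows that arithmetic under the assumption that the same multiplier is applied each time a selected haplotype contains the kmer; toy_discounted_score is a hypothetical name, and the starting score is an input, since this page does not state it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper, not part of vg: score of a present kmer after
// `times_selected` already-selected haplotypes contained it, assuming
// one multiplication by `discount` per such selection.
double toy_discounted_score(double initial_score, std::size_t times_selected,
                            double discount = 0.9)
{
    assert(discount > 0.0 && discount <= 1.0);
    return initial_score * std::pow(discount, static_cast<double>(times_selected));
}
```

Starting from a score of 1, two selections leave 0.9 * 0.9 = 0.81, so a kmer that is already well covered contributes progressively less to later choices.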

◆ verbosity

Verbosity vg::Recombinator::verbosity

The documentation for this class was generated from the following files: