vg
tools for working with variation graphs
vg::Recombinator Class Reference

#include <recombinator.hpp>

Classes

struct  LocalHaplotype
 A local haplotype sequence within a single subchain.
 
struct  Parameters
 Parameters for generate_haplotypes().
 
struct  Statistics
 Statistics on the generated haplotypes.
 

Public Types

enum  kmer_presence { absent, heterozygous, present, frequent }
 Kmer classification.
 
typedef Haplotypes::Verbosity Verbosity
 The amount of progress information that should be printed to stderr.
 
typedef Haplotypes::sequence_type sequence_type
 A GBWT sequence as (sequence identifier, offset in a node).
 

Public Member Functions

 Recombinator (const gbwtgraph::GBZ &gbz, const Haplotypes &haplotypes, Verbosity verbosity)
 Creates a new Recombinator.
 
gbwt::GBWT generate_haplotypes (const std::string &kff_file, const Parameters &parameters) const
 
std::vector< char > classify_kmers (const std::string &kff_file, const Parameters &parameters) const
 
std::vector< LocalHaplotype > extract_sequences (const std::string &kff_file, size_t chain_id, size_t subchain_id, const Parameters &parameters) const
 

Public Attributes

const gbwtgraph::GBZ & gbz
 
const Haplotypes & haplotypes
 
Verbosity verbosity
 

Static Public Attributes

constexpr static size_t NUM_HAPLOTYPES = 4
 Number of haplotypes to be generated.
 
constexpr static size_t NUM_CANDIDATES = 32
 A reasonable number of candidates for diploid sampling.
 
constexpr static size_t COVERAGE = 0
 Expected kmer coverage. Use 0 to estimate from kmer counts.
 
constexpr static size_t KFF_BLOCK_SIZE = 1000000
 Block size (in kmers) for reading KFF files.
 
constexpr static double PRESENT_DISCOUNT = 0.9
 
constexpr static double HET_ADJUSTMENT = 0.05
 
constexpr static double ABSENT_SCORE = 0.8
 

Private Member Functions

Statistics generate_haplotypes (const Haplotypes::TopLevelChain &chain, const hash_map< Haplotypes::Subchain::kmer_type, size_t > &kmer_counts, gbwt::GBWTBuilder &builder, gbwtgraph::MetadataBuilder &metadata, const Parameters &parameters, double coverage) const
 

Detailed Description

A class that creates synthetic haplotypes from a Haplotypes representation of local haplotypes.

Member Typedef Documentation

◆ sequence_type

A GBWT sequence as (sequence identifier, offset in a node).

◆ Verbosity

The amount of progress information that should be printed to stderr.

Member Enumeration Documentation

◆ kmer_presence

Kmer classification.

Enumerator
absent 
heterozygous 
present 
frequent 

Constructor & Destructor Documentation

◆ Recombinator()

vg::Recombinator::Recombinator ( const gbwtgraph::GBZ &  gbz,
const Haplotypes &  haplotypes,
Verbosity  verbosity 
)

Creates a new Recombinator.

Member Function Documentation

◆ classify_kmers()

std::vector< char > vg::Recombinator::classify_kmers ( const std::string &  kff_file,
const Parameters &  parameters 
) const

Classifies the kmers used for describing the haplotypes according to their frequency in the KFF file. Uses A, H, P, and F to represent absent, heterozygous, present, and frequent kmers, respectively.

Throws std::runtime_error on error.
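
The returned characters can be mapped back to kmer_presence values and tallied. The following is a minimal, self-contained sketch of that decoding step. It does not call into vg: the local enum is a stand-in that mirrors the enumerators listed for Recombinator::kmer_presence, and decode() and tally() are hypothetical helper names that are not part of the library.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in that mirrors the enumerators of Recombinator::kmer_presence.
enum kmer_presence { absent, heterozygous, present, frequent };

// Hypothetical helper: maps one classification character to the enum.
// Throws std::runtime_error on a character outside A, H, P, F.
kmer_presence decode(char c) {
    switch (c) {
        case 'A': return absent;
        case 'H': return heterozygous;
        case 'P': return present;
        case 'F': return frequent;
        default: throw std::runtime_error("unexpected classification character");
    }
}

// Hypothetical helper: counts how many kmers fall into each class.
// The result is indexed by the kmer_presence value.
std::array<std::size_t, 4> tally(const std::vector<char>& classes) {
    std::array<std::size_t, 4> counts{};
    for (char c : classes) {
        counts[decode(c)]++;
    }
    return counts;
}
```

In real use, the input vector would be the return value of classify_kmers(); here any std::vector< char > containing the four letters works.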

◆ extract_sequences()

std::vector< Recombinator::LocalHaplotype > vg::Recombinator::extract_sequences ( const std::string &  kff_file,
size_t  chain_id,
size_t  subchain_id,
const Parameters &  parameters 
) const

Extracts the local haplotypes in the given subchain. In addition to the haplotype sequence, this also reports the name of the corresponding path as well as (rank, score) for the haplotype in each round of haplotype selection. The number of rounds is parameters.num_haplotypes, but if the haplotype is selected earlier, it will not get further scores.

Throws std::runtime_error on error.
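
The per-round (rank, score) bookkeeping described above can be illustrated with a toy model. This sketch is not vg's implementation. It assumes that one candidate is selected per round, that ranks start from 0, and that only the candidates not yet selected are ranked. The per-round scores are supplied as input rather than derived from kmer counts, and Candidate and run_rounds() are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a candidate haplotype; not vg's LocalHaplotype.
struct Candidate {
    // One (rank, score) pair for each round in which the candidate was scored.
    std::vector<std::pair<std::size_t, double>> history;
    bool selected = false;
};

// round_scores[r][i] is the toy score of candidate i in round r.
// Every inner vector is assumed to have the same length.
std::vector<Candidate> run_rounds(const std::vector<std::vector<double>>& round_scores) {
    std::size_t n = round_scores.empty() ? 0 : round_scores.front().size();
    std::vector<Candidate> result(n);
    for (const std::vector<double>& scores : round_scores) {
        // Collect the candidates that have not been selected yet.
        std::vector<std::size_t> order;
        for (std::size_t i = 0; i < n; i++) {
            if (!result[i].selected) { order.push_back(i); }
        }
        // Rank them by descending score; ties keep their original order.
        std::stable_sort(order.begin(), order.end(),
            [&scores](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
        for (std::size_t rank = 0; rank < order.size(); rank++) {
            result[order[rank]].history.emplace_back(rank, scores[order[rank]]);
        }
        // The best remaining candidate is selected and gets no further scores.
        if (!order.empty()) { result[order.front()].selected = true; }
    }
    return result;
}
```

A candidate selected in the first round ends up with a single history entry, while a candidate that is never selected has one entry per round.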

◆ generate_haplotypes() [1/2]

Recombinator::Statistics vg::Recombinator::generate_haplotypes ( const Haplotypes::TopLevelChain &  chain,
const hash_map< Haplotypes::Subchain::kmer_type, size_t > &  kmer_counts,
gbwt::GBWTBuilder &  builder,
gbwtgraph::MetadataBuilder &  metadata,
const Parameters &  parameters,
double  coverage 
) const
private

◆ generate_haplotypes() [2/2]

gbwt::GBWT vg::Recombinator::generate_haplotypes ( const std::string &  kff_file,
const Parameters &  parameters 
) const

Generates haplotypes based on the kmer counts in the given KFF file.

Runs multiple GBWT construction jobs in parallel using OpenMP threads and generates the specified number of haplotypes in each top-level chain (component).

Each generated haplotype has a single source haplotype in each subchain. The subchains are connected by unary paths. Suffix / prefix subchains in the middle of a chain create fragment breaks. If the chain starts without a prefix (ends without a suffix), the haplotype chosen for the first (last) subchain is used from the start (continued until the end).

Throws std::runtime_error on error in single-threaded parts and exits with std::exit(EXIT_FAILURE) in multi-threaded parts.

Member Data Documentation

◆ ABSENT_SCORE

constexpr double vg::Recombinator::ABSENT_SCORE = 0.8
static constexpr

Score for getting an absent kmer right/wrong. This should be less than 1, if we assume that having the right variants in the graph is more important than keeping wrong variants out.

◆ COVERAGE

constexpr size_t vg::Recombinator::COVERAGE = 0
static constexpr

Expected kmer coverage. Use 0 to estimate from kmer counts.

◆ gbz

const gbwtgraph::GBZ& vg::Recombinator::gbz

◆ haplotypes

const Haplotypes& vg::Recombinator::haplotypes

◆ HET_ADJUSTMENT

constexpr double vg::Recombinator::HET_ADJUSTMENT = 0.05
static constexpr

Adjustment to the score of a heterozygous kmer every time a haplotype with (-) or without (+) that kmer is selected.

◆ KFF_BLOCK_SIZE

constexpr size_t vg::Recombinator::KFF_BLOCK_SIZE = 1000000
static constexpr

Block size (in kmers) for reading KFF files.

◆ NUM_CANDIDATES

constexpr size_t vg::Recombinator::NUM_CANDIDATES = 32
static constexpr

A reasonable number of candidates for diploid sampling.

◆ NUM_HAPLOTYPES

constexpr size_t vg::Recombinator::NUM_HAPLOTYPES = 4
static constexpr

Number of haplotypes to be generated.

◆ PRESENT_DISCOUNT

constexpr double vg::Recombinator::PRESENT_DISCOUNT = 0.9
static constexpr

Multiplier to the score of a present kmer every time a haplotype with that kmer is selected.
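
The score updates described for PRESENT_DISCOUNT and HET_ADJUSTMENT can be sketched as follows. This is a toy model based only on the member descriptions on this page, not the actual vg code: the constants and the enum are local copies, ScoredKmer and apply_selection() are hypothetical names, and the initial scores are left to the caller. Absent and frequent kmers are left unchanged in this sketch.

```cpp
#include <cstddef>
#include <vector>

// Local copies of the documented defaults.
constexpr double PRESENT_DISCOUNT = 0.9;
constexpr double HET_ADJUSTMENT = 0.05;

// Stand-in that mirrors the enumerators of Recombinator::kmer_presence.
enum kmer_presence { absent, heterozygous, present, frequent };

// Hypothetical pairing of a kmer class with its current score.
struct ScoredKmer {
    kmer_presence kind;
    double score;
};

// Updates the scores after a haplotype has been selected.
// in_haplotype[i] tells whether the selected haplotype contains kmer i;
// it is assumed to have the same length as kmers.
void apply_selection(std::vector<ScoredKmer>& kmers, const std::vector<bool>& in_haplotype) {
    for (std::size_t i = 0; i < kmers.size(); i++) {
        if (kmers[i].kind == present && in_haplotype[i]) {
            // A present kmer that is already covered becomes less valuable.
            kmers[i].score *= PRESENT_DISCOUNT;
        } else if (kmers[i].kind == heterozygous) {
            // With the kmer: subtract the adjustment; without it: add it.
            kmers[i].score += (in_haplotype[i] ? -HET_ADJUSTMENT : HET_ADJUSTMENT);
        }
    }
}
```

Under this model, repeated selections push later rounds toward haplotypes that cover present kmers not yet represented and that balance the two alleles of heterozygous kmers.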

◆ verbosity

Verbosity vg::Recombinator::verbosity
