vg
tools for working with variation graphs
vg::VariantAdder Class Reference

#include <variant_adder.hpp>

Inherits vg::NameMapper and vg::Progressive.

Public Member Functions

 VariantAdder (VG &graph)
 
void add_variants (vcflib::VariantCallFile *vcf)
 
Alignment smart_align (vg::VG &graph, pair< NodeSide, NodeSide > endpoints, const string &to_align, size_t max_span)
 
void align_ns (vg::VG &graph, Alignment &aln)
 
const VG & get_graph () const
 
- Public Member Functions inherited from vg::NameMapper
void add_name_mapping (const string &vcf_name, const string &fasta_name)
 
string vcf_to_fasta (const string &vcf_name) const
 
string fasta_to_vcf (const string &fasta_name) const
 
- Public Member Functions inherited from vg::Progressive
void preload_progress (const string &message)
 
void create_progress (const string &message, long count)
 
void create_progress (long count)
 
void update_progress (long i)
 
void increment_progress ()
 
void destroy_progress (void)
 

Public Attributes

size_t max_node_size = 32
 
size_t variant_range = 50
 How wide of a range in bases should we look for nearby variants in?
 
size_t flank_range = 100
 
bool ignore_missing_contigs = false
 Should we accept and ignore VCF contigs that we can't find in the graph?
 
size_t max_context_radius = 50
 
size_t whole_alignment_cutoff = 4096
 
size_t large_alignment_band_padding = 30
 
double min_score_factor = 0.95
 
size_t pinned_tail_size = 200
 
Aligner aligner
 
size_t edge_max = 0
 
size_t kmer_size = 16
 What base kmer size should we use?
 
size_t doubling_steps = 3
 What number of doubling steps should we use?
 
size_t subgraph_prune = 0
 If nonzero, prune short subgraphs smaller than this before GCSA2-indexing.
 
size_t thin_alignment_cutoff = 10000
 
size_t mapper_alignment_cutoff = 0
 
bool skip_structural_duplications = true
 
bool print_updates = false
 
- Public Attributes inherited from vg::Progressive
bool show_progress = false
 

Protected Member Functions

set< vector< int > > get_unique_haplotypes (const vector< vcflib::Variant * > &variants, WindowedVcfBuffer *cache=nullptr) const
 
string haplotype_to_string (const vector< int > &haplotype, const vector< vcflib::Variant * > &variants)
 
vector< vcflib::Variant * > filter_local_variants (const vector< vcflib::Variant * > &before, vcflib::Variant *variant, const vector< vcflib::Variant * > &after) const
 

Static Protected Member Functions

static size_t get_radius (const vcflib::Variant &variant)
 
static size_t get_center (const vcflib::Variant &variant)
 
static pair< size_t, size_t > get_center_and_radius (const vector< vcflib::Variant * > &variants)
 

Protected Attributes

VG & graph
 The graph we are modifying.
 
GraphSynchronizer sync
 
set< string > path_names
 
- Protected Attributes inherited from vg::NameMapper
map< string, string > vcf_to_fasta_renames
 
map< string, string > fasta_to_vcf_renames
 This is the reverse map from FASTA sequence name to VCF sequence name.
 

Detailed Description

A tool class for adding variants to a VG graph. Integrated NameMapper provides name translation for the VCF contigs.
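As a rough illustration of the name translation NameMapper provides, here is a minimal standalone sketch that keeps a forward and a reverse map, mirroring the vcf_to_fasta_renames and fasta_to_vcf_renames members listed above. The class name `SimpleNameMapper` is hypothetical, and returning unmapped names unchanged is an assumption this page does not confirm.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for vg::NameMapper; not the real class.
class SimpleNameMapper {
public:
    // Record a VCF contig name and the FASTA/graph name it corresponds to,
    // in both directions.
    void add_name_mapping(const std::string& vcf_name, const std::string& fasta_name) {
        vcf_to_fasta_renames[vcf_name] = fasta_name;
        fasta_to_vcf_renames[fasta_name] = vcf_name;
    }

    // Assumption: a name with no recorded mapping is returned unchanged.
    std::string vcf_to_fasta(const std::string& vcf_name) const {
        auto found = vcf_to_fasta_renames.find(vcf_name);
        return found == vcf_to_fasta_renames.end() ? vcf_name : found->second;
    }

    std::string fasta_to_vcf(const std::string& fasta_name) const {
        auto found = fasta_to_vcf_renames.find(fasta_name);
        return found == fasta_to_vcf_renames.end() ? fasta_name : found->second;
    }

private:
    std::map<std::string, std::string> vcf_to_fasta_renames;
    std::map<std::string, std::string> fasta_to_vcf_renames;
};
```

For example, after `add_name_mapping("chr1", "1")`, a VCF record on `chr1` would be looked up on the graph path `1`.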

Constructor & Destructor Documentation

◆ VariantAdder()

vg::VariantAdder::VariantAdder ( VG & graph)

Make a new VariantAdder to add variants to the given graph. Modifies the graph in place.

Member Function Documentation

◆ add_variants()

void vg::VariantAdder::add_variants ( vcflib::VariantCallFile *  vcf)

Add in the variants from the given non-null VCF file. The file must be freshly opened. The variants in the file must be sorted.

May be called from multiple threads. Synchronizes internally on the graph.
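Because add_variants requires sorted input, a caller may want to check that precondition first. This is a minimal sketch over hypothetical (contig, position) keys standing in for parsed VCF records; `VcfRecordKey` and `is_sorted_by_position` are illustrative names. It assumes "sorted" means each contig's records form one contiguous block with non-decreasing positions, which is the usual layout of a coordinate-sorted VCF.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal view of a VCF record: contig name and position.
struct VcfRecordKey {
    std::string contig;
    long position;
};

// True if each contig's records are contiguous and positions never
// decrease within a contig.
bool is_sorted_by_position(const std::vector<VcfRecordKey>& records) {
    std::set<std::string> finished_contigs;
    for (size_t i = 1; i < records.size(); i++) {
        const VcfRecordKey& prev = records[i - 1];
        const VcfRecordKey& cur = records[i];
        if (cur.contig == prev.contig) {
            if (cur.position < prev.position) {
                return false;  // went backwards within a contig
            }
        } else {
            finished_contigs.insert(prev.contig);
            if (finished_contigs.count(cur.contig)) {
                return false;  // contig reappears after a different contig
            }
        }
    }
    return true;
}
```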

◆ align_ns()

void vg::VariantAdder::align_ns ( vg::VG & graph,
Alignment & aln 
)

Turn any N/N substitutions in the given alignment against the given graph into matches. Modifies the alignment in place.
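To make the N/N rule concrete, here is a standalone sketch over a simplified edit list rather than the real protobuf Alignment. It assumes vg-style edits in which a match has equal from/to lengths and an empty replacement sequence; the `SimpleEdit` struct and `match_ns` function are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified edit: consumes from_length graph bases and
// produces to_length read bases; an empty sequence means "match".
struct SimpleEdit {
    size_t from_length;
    size_t to_length;
    std::string sequence;
};

// True if s is non-empty and consists only of N (either case).
static bool all_n(const std::string& s) {
    return !s.empty() && s.find_first_not_of("Nn") == std::string::npos;
}

// Walk the edits alongside the graph sequence they consume, and turn any
// substitution whose graph bases and read bases are all N into a match.
void match_ns(const std::string& graph_sequence, std::vector<SimpleEdit>& edits) {
    size_t offset = 0;  // current position in graph_sequence
    for (SimpleEdit& edit : edits) {
        bool substitution = edit.from_length == edit.to_length &&
                            edit.from_length > 0 && !edit.sequence.empty();
        if (substitution &&
            all_n(graph_sequence.substr(offset, edit.from_length)) &&
            all_n(edit.sequence)) {
            edit.sequence.clear();  // now counts as a match
        }
        offset += edit.from_length;
    }
}
```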

◆ filter_local_variants()

vector< vcflib::Variant * > vg::VariantAdder::filter_local_variants ( const vector< vcflib::Variant * > &  before,
vcflib::Variant *  variant,
const vector< vcflib::Variant * > &  after 
) const
protected

Glom all the given variants into one vector, throwing out variants from the before and after vectors that are too big to be in a context.
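A minimal sketch of this filtering, assuming "too big to be in a context" means a radius (twice the longest allele, as get_radius describes) above max_context_radius, whose documented default is 50. The `SimpleVariant` struct and the free functions are hypothetical; the main variant is always kept regardless of its size.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal variant: position plus ref and alt alleles.
struct SimpleVariant {
    size_t position;
    std::string ref;
    std::vector<std::string> alts;
};

// Twice the length of the longest allele, per the get_radius description.
size_t radius_of(const SimpleVariant& v) {
    size_t longest = v.ref.size();
    for (const std::string& alt : v.alts) {
        longest = std::max(longest, alt.size());
    }
    return 2 * longest;
}

// Concatenate before + main + after in order, dropping context variants
// whose radius exceeds max_context_radius.
std::vector<const SimpleVariant*> filter_local(
        const std::vector<const SimpleVariant*>& before,
        const SimpleVariant* variant,
        const std::vector<const SimpleVariant*>& after,
        size_t max_context_radius = 50) {
    std::vector<const SimpleVariant*> result;
    for (const SimpleVariant* v : before) {
        if (radius_of(*v) <= max_context_radius) result.push_back(v);
    }
    result.push_back(variant);  // the main variant is always kept
    for (const SimpleVariant* v : after) {
        if (radius_of(*v) <= max_context_radius) result.push_back(v);
    }
    return result;
}
```

Here a 40 bp deletion has radius 80 and would be dropped from the context, while a SNP (radius 2) is kept.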

◆ get_center()

size_t vg::VariantAdder::get_center ( const vcflib::Variant &  variant)
static protected

Get the center position of the given variant.

◆ get_center_and_radius()

pair< size_t, size_t > vg::VariantAdder::get_center_and_radius ( const vector< vcflib::Variant * > &  variants)
static protected

Get the center and radius around that center needed to extract everything that might be involved in a group of variants.
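One plausible way to compute a group's center and radius is to take the union of each variant's [center - radius, center + radius] window and report its midpoint and half-width. This sketch assumes a variant's center is its position plus half its ref length, and its radius is twice its longest allele (as get_radius describes). `VariantSpan` and `group_center_and_radius` are hypothetical names, not the vg implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal variant.
struct VariantSpan {
    size_t position;
    std::string ref;
    std::vector<std::string> alts;
};

// Assumed center: middle of the reference allele.
size_t center_of(const VariantSpan& v) {
    return v.position + v.ref.size() / 2;
}

// Twice the longest allele, per the get_radius description.
size_t radius_of(const VariantSpan& v) {
    size_t longest = v.ref.size();
    for (const std::string& alt : v.alts) longest = std::max(longest, alt.size());
    return 2 * longest;
}

// Union of all per-variant windows (clamped at 0), returned as
// (center, radius). The group must be non-empty.
std::pair<size_t, size_t> group_center_and_radius(const std::vector<VariantSpan>& variants) {
    size_t c0 = center_of(variants.front());
    size_t r0 = radius_of(variants.front());
    size_t left = c0 > r0 ? c0 - r0 : 0;
    size_t right = c0 + r0;
    for (const VariantSpan& v : variants) {
        size_t c = center_of(v);
        size_t r = radius_of(v);
        left = std::min(left, c > r ? c - r : 0);
        right = std::max(right, c + r);
    }
    size_t center = left + (right - left) / 2;
    size_t radius = right - center;  // rounded up so center + radius reaches right
    return {center, radius};
}
```

For a SNP at 100 (window 98 to 102) and a 4 bp ref at 110 (center 112, window 104 to 120), the union is 98 to 120, giving center 109 and radius 11.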

◆ get_graph()

const VG & vg::VariantAdder::get_graph ( ) const

◆ get_radius()

size_t vg::VariantAdder::get_radius ( const vcflib::Variant &  variant)
static protected

Get the radius of the variant around its center: the amount of sequence that needs to be pulled out to make sure you have the ref and all the alts, if they exist. This is twice the length of the longest allele among the ref and the alts.

◆ get_unique_haplotypes()

set< vector< int > > vg::VariantAdder::get_unique_haplotypes ( const vector< vcflib::Variant * > &  variants,
WindowedVcfBuffer * cache = nullptr 
) const
protected

Get all the unique combinations of variant alts represented by actual haplotypes. Arbitrarily phases unphased variants.

Can (and should) take a WindowedVcfBuffer that owns the variants, and from which cached pre-parsed genotypes can be extracted.

Returns a set of vectors of one number per variant, giving the alt number (starting with 0 for reference) that appears on the haplotype.

TODO: ought to just take a collection of pre-parsed genotypes, but in an efficient way (a vector of pointers to vectors of sample genotypes?)

◆ haplotype_to_string()

string vg::VariantAdder::haplotype_to_string ( const vector< int > &  haplotype,
const vector< vcflib::Variant * > &  variants 
)
protected

Convert a haplotype on a list of variants into a string. The string will run from the start of the first variant through the end of the last variant.

Can't be const because it relies on non-const operations on the synchronizer.
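A standalone sketch of the string construction, assuming sorted, non-overlapping variants with 0-based positions on a known reference string. The `AlleleSite` struct, the `haplotype_string` function, and the explicit reference argument are hypothetical; the real method relies on the synchronizer rather than taking a reference string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical variant site: 0-based position, ref allele, alt alleles.
struct AlleleSite {
    size_t position;
    std::string ref;
    std::vector<std::string> alts;
};

// Build the haplotype's sequence from the start of the first variant to
// the end of the last, filling gaps between variants from the reference.
// haplotype[i] is 0 for the ref allele, or k for alts[k - 1].
std::string haplotype_string(const std::string& reference,
                             const std::vector<int>& haplotype,
                             const std::vector<AlleleSite>& variants) {
    std::string result;
    if (variants.empty()) return result;
    size_t cursor = variants.front().position;
    for (size_t i = 0; i < variants.size(); i++) {
        const AlleleSite& site = variants[i];
        // Reference bases between the previous variant and this one.
        result += reference.substr(cursor, site.position - cursor);
        int allele = haplotype[i];
        result += allele == 0 ? site.ref : site.alts.at(allele - 1);
        cursor = site.position + site.ref.size();
    }
    return result;
}
```

On the reference `ACGTACGT` with a C>T SNP at 1 and an AC>A deletion at 4, the all-reference haplotype gives `CGTAC` and the all-alt haplotype gives `TGTA`.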

◆ smart_align()

Alignment vg::VariantAdder::smart_align ( vg::VG & graph,
pair< NodeSide, NodeSide > endpoints,
const string &  to_align,
size_t  max_span 
)

Align the given string to the given graph, between the given endpoints, using the most appropriate alignment method for the relative sizes involved and for whether a good alignment exists. max_span gives the maximum length in the graph that we expect our string could possibly align over (for example, across a large deletion, where we might want to follow a long path in the graph).

The endpoints have to be heads/tails of the graph.

Treats N/N substitutions as matches.

TODO: now that we have a smart aligner that can synthesize deletions without finding them with the banded global aligner, do we need max_span anymore?

Mostly exposed for testability.
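Read together with the member documentation for whole_alignment_cutoff, large_alignment_band_padding, min_score_factor and pinned_tail_size, the method choice appears to follow roughly the decision below. This sketches the decision only, under assumptions: the `Strategy` enum and `choose_strategy` function are hypothetical, and the restricted-band score is passed in rather than computed. The default thresholds are the documented defaults.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical labels for the three approaches described on this page.
enum class Strategy { WholeBanded, RestrictedBand, PinnedTails };

// graph_bp / alt_bp: sizes of the graph region and of the string to align.
// band_score: score of the restricted-band attempt (only consulted above
// the cutoff). perfect_score: score of a perfect match over the context.
Strategy choose_strategy(size_t graph_bp, size_t alt_bp,
                         double band_score, double perfect_score,
                         size_t whole_alignment_cutoff = 4096,
                         double min_score_factor = 0.95) {
    if (graph_bp <= whole_alignment_cutoff && alt_bp <= whole_alignment_cutoff) {
        // Both small: permissive banding with large band padding.
        return Strategy::WholeBanded;
    }
    if (band_score >= min_score_factor * perfect_score) {
        // The restricted band found a good enough existing copy.
        return Strategy::RestrictedBand;
    }
    // Otherwise fall back to pinned tails from each end, spliced together.
    return Strategy::PinnedTails;
}
```

With the defaults, a 10 kbp context whose perfect score is 10000 needs a restricted-band score of at least 9500 to be accepted.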

Member Data Documentation

◆ aligner

Aligner vg::VariantAdder::aligner

We use this Aligner to hold the scoring parameters. It may be accessed by multiple threads at once.

◆ doubling_steps

size_t vg::VariantAdder::doubling_steps = 3

What number of doubling steps should we use?

◆ edge_max

size_t vg::VariantAdder::edge_max = 0

Sometimes, we have to make Mappers, for graphs too big to safely use our global banded aligner on. If we do that, what max edge crossing limit should we use for simplification?

◆ flank_range

size_t vg::VariantAdder::flank_range = 100

How much additional context should we try to add outside the radius of the group of variants we actually find?

◆ graph

VG& vg::VariantAdder::graph
protected

The graph we are modifying.

◆ ignore_missing_contigs

bool vg::VariantAdder::ignore_missing_contigs = false

Should we accept and ignore VCF contigs that we can't find in the graph?

◆ kmer_size

size_t vg::VariantAdder::kmer_size = 16

What base kmer size should we use?

◆ large_alignment_band_padding

size_t vg::VariantAdder::large_alignment_band_padding = 30

When we're above whole_alignment_cutoff, what amount of band padding can we use when looking for an existing version of our sequence?

◆ mapper_alignment_cutoff

size_t vg::VariantAdder::mapper_alignment_cutoff = 0

◆ max_context_radius

size_t vg::VariantAdder::max_context_radius = 50

What's the max radius on a variant we can have in order to use that variant as context for another main variant?

◆ max_node_size

size_t vg::VariantAdder::max_node_size = 32

What's the maximum node size we should produce, and the size we should chop the input graph to? Since alt sequences are forced out to node boundaries, it makes sense for this to be small relative to whole_alignment_cutoff.
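Chopping to max_node_size amounts to cutting each node's sequence into consecutive pieces of at most that many bases. A minimal sketch on a plain string; the `chop_sequence` name is hypothetical, and real graph chopping must also rewire edges and paths, which is omitted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a sequence into consecutive pieces of at most max_node_size bases.
// max_node_size must be nonzero.
std::vector<std::string> chop_sequence(const std::string& sequence, size_t max_node_size = 32) {
    std::vector<std::string> pieces;
    for (size_t start = 0; start < sequence.size(); start += max_node_size) {
        pieces.push_back(sequence.substr(start, max_node_size));
    }
    return pieces;
}
```

With the default of 32, a 70 bp node becomes pieces of 32, 32 and 6 bp.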

◆ min_score_factor

double vg::VariantAdder::min_score_factor = 0.95

When we're doing a restricted band padding alignment, how good does it have to be, as a fraction of the perfect match score for the whole context, in order to use it?

◆ path_names

set<string> vg::VariantAdder::path_names
protected

We cache the set of valid path names, so we can detect/skip missing ones without locking the graph.
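The cached name set lets a contig check happen without locking the graph. A sketch of how that check might combine with ignore_missing_contigs; the `should_process_contig` function and the choice to throw std::runtime_error when missing contigs are not ignored are assumptions.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Decide whether records on a contig (already translated to its graph
// path name) should be added. Only the cached set is read, so no graph
// lock is needed.
bool should_process_contig(const std::set<std::string>& path_names,
                           const std::string& path_name,
                           bool ignore_missing_contigs) {
    if (path_names.count(path_name)) {
        return true;
    }
    if (ignore_missing_contigs) {
        return false;  // skip this contig's variants
    }
    throw std::runtime_error("contig not found in graph: " + path_name);
}
```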

◆ pinned_tail_size

size_t vg::VariantAdder::pinned_tail_size = 200

If the restricted band alignment doesn't find anything, we resort to pinned alignments from the ends and cutting and pasting together. How big should each pinned tail be?
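The fallback starts by taking a pinned_tail_size piece from each end of the sequence. A sketch of just that split; the `pinned_tails` name is hypothetical, and the pinned alignment of each tail and the splicing are not shown. When the sequence is shorter than two tails the pieces overlap; how the real code handles that is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>

// Left tail: first pinned_tail_size bases; right tail: last
// pinned_tail_size bases. Each is the whole string if it is shorter.
std::pair<std::string, std::string> pinned_tails(const std::string& sequence,
                                                 size_t pinned_tail_size = 200) {
    size_t tail = std::min(pinned_tail_size, sequence.size());
    return {sequence.substr(0, tail), sequence.substr(sequence.size() - tail)};
}
```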

◆ print_updates

bool vg::VariantAdder::print_updates = false

Should we print out periodic updates about the variants we have processed?

◆ skip_structural_duplications

bool vg::VariantAdder::skip_structural_duplications = true

We have code to skip large structural duplications, because aligners won't be able to distinguish the copies. TODO: we want to actually make them into cycles.

◆ subgraph_prune

size_t vg::VariantAdder::subgraph_prune = 0

If nonzero, prune short subgraphs smaller than this before GCSA2-indexing.

◆ sync

GraphSynchronizer vg::VariantAdder::sync
protected

We keep a GraphSynchronizer so we can have multiple threads working on different parts of the same graph.

◆ thin_alignment_cutoff

size_t vg::VariantAdder::thin_alignment_cutoff = 10000

◆ variant_range

size_t vg::VariantAdder::variant_range = 50

How wide of a range in bases should we look for nearby variants in?

◆ whole_alignment_cutoff

size_t vg::VariantAdder::whole_alignment_cutoff = 4096

What's the cut-off for the graph's size or the alt's size in bp under which we can just use permissive banding and large band padding? If either is larger than this, we use the pinned-alignment-based do-each-end-and-splice mode.


The documentation for this class was generated from the following files: