#include <variant_adder.hpp>

Inherits vg::NameMapper and vg::Progressive.

Public Member Functions
	VariantAdder (VG &graph)

void	add_variants (vcflib::VariantCallFile *vcf)

Alignment	smart_align (vg::VG &graph, pair< NodeSide, NodeSide > endpoints, const string &to_align, size_t max_span)

void	align_ns (vg::VG &graph, Alignment &aln)

const VG &	get_graph () const

Public Member Functions inherited from vg::NameMapper
void	add_name_mapping (const string &vcf_name, const string &fasta_name)

string	vcf_to_fasta (const string &vcf_name) const

string	fasta_to_vcf (const string &fasta_name) const

Public Member Functions inherited from vg::Progressive
void	preload_progress (const string &message)

void	create_progress (const string &message, long count)

void	create_progress (long count)

void	ensure_progress (long count)

void	update_progress (long i)

void	increment_progress ()

void	destroy_progress (void)

Public Attributes
size_t	max_node_size = 32

size_t	variant_range = 50
	How wide of a range in bases should we look for nearby variants in?

size_t	flank_range = 100

bool	ignore_missing_contigs = false
	Should we accept and ignore VCF contigs that we can't find in the graph?

size_t	max_context_radius = 50

size_t	whole_alignment_cutoff = 4096

size_t	large_alignment_band_padding = 30

double	min_score_factor = 0.95

size_t	pinned_tail_size = 200

Aligner	aligner

size_t	edge_max = 0

size_t	kmer_size = 16
	What base kmer size should we use?

size_t	doubling_steps = 3
	What number of doubling steps should we use?

size_t	subgraph_prune = 0
	If nonzero, prune short subgraphs smaller than this before GCSA2-indexing.

size_t	thin_alignment_cutoff = 10000

size_t	mapper_alignment_cutoff = 0

bool	skip_structural_duplications = true

bool	print_updates = false

Public Attributes inherited from vg::Progressive
bool	show_progress = false

Protected Member Functions
set< vector< int > >	get_unique_haplotypes (const vector< vcflib::Variant * > &variants, WindowedVcfBuffer *cache=nullptr) const

string	haplotype_to_string (const vector< int > &haplotype, const vector< vcflib::Variant * > &variants)

vector< vcflib::Variant * >	filter_local_variants (const vector< vcflib::Variant * > &before, vcflib::Variant *variant, const vector< vcflib::Variant * > &after) const

Static Protected Member Functions
static size_t	get_radius (const vcflib::Variant &variant)

static size_t	get_center (const vcflib::Variant &variant)

static pair< size_t, size_t >	get_center_and_radius (const vector< vcflib::Variant * > &variants)

Protected Attributes
VG &	graph
	The graph we are modifying.

GraphSynchronizer	sync

set< string >	path_names

Protected Attributes inherited from vg::NameMapper
map< string, string >	vcf_to_fasta_renames

map< string, string >	fasta_to_vcf_renames
	This is the reverse map from FASTA sequence name to VCF sequence name.

Additional Inherited Members
Static Public Member Functions inherited from vg::Progressive
static void	with_progress (bool show_progress, const std::string &task, const std::function< void(const std::function< void(size_t, size_t)> &progress)> &callback)

Detailed Description

A tool class for adding variants to a VG graph. Integrated NameMapper provides name translation for the VCF contigs.
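The name translation mentioned above can be pictured as a pair of lookup tables, one per direction. The following is a minimal sketch, not the vg implementation: NameMapperSketch is a hypothetical stand-in class, and the pass-through behavior for names without a mapping is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for vg::NameMapper; not the real class.
class NameMapperSketch {
public:
    // Record that vcf_name in the VCF refers to fasta_name in the graph.
    void add_name_mapping(const std::string& vcf_name, const std::string& fasta_name) {
        assert(!vcf_name.empty() && !fasta_name.empty());
        vcf_to_fasta_renames[vcf_name] = fasta_name;
        fasta_to_vcf_renames[fasta_name] = vcf_name;
    }

    // Assumption: a name with no recorded mapping is returned unchanged.
    std::string vcf_to_fasta(const std::string& vcf_name) const {
        auto found = vcf_to_fasta_renames.find(vcf_name);
        return found == vcf_to_fasta_renames.end() ? vcf_name : found->second;
    }

    // Same lookup in the reverse direction.
    std::string fasta_to_vcf(const std::string& fasta_name) const {
        auto found = fasta_to_vcf_renames.find(fasta_name);
        return found == fasta_to_vcf_renames.end() ? fasta_name : found->second;
    }

private:
    std::map<std::string, std::string> vcf_to_fasta_renames;
    std::map<std::string, std::string> fasta_to_vcf_renames;
};
```

With a mapping from "1" to "chr1" registered, a VCF record on contig "1" would be looked up on the graph path "chr1".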

Constructor & Destructor Documentation

◆ VariantAdder()

vg::VariantAdder::VariantAdder ( VG & graph )

Make a new VariantAdder to add variants to the given graph. Modifies the graph in place.

Member Function Documentation

◆ add_variants()

void vg::VariantAdder::add_variants ( vcflib::VariantCallFile * vcf )

Add in the variants from the given non-null VCF file. The file must be freshly opened. The variants in the file must be sorted.

May be called from multiple threads. Synchronizes internally on the graph.

◆ align_ns()

void vg::VariantAdder::align_ns	(	vg::VG &	graph,
		Alignment &	aln
	)

Turn any N/N substitutions in the given alignment against the given graph into matches. Modifies the alignment in place.
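As an illustration of the idea only, the sketch below works on a simplified edit list rather than on a real Alignment and graph. EditSketch and align_ns_sketch are hypothetical names, and representing a match by an empty read sequence is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified edit; the real Alignment type is richer.
struct EditSketch {
    std::string ref;   // graph bases covered by the edit
    std::string read;  // read bases; empty means "matches the graph"
};

// Turn substitutions in which both sides consist only of N into matches,
// modifying the edits in place. Returns how many edits were changed.
std::size_t align_ns_sketch(std::vector<EditSketch>& edits) {
    std::size_t changed = 0;
    for (auto& edit : edits) {
        // A substitution has a non-empty read sequence of the same length.
        bool is_substitution = !edit.read.empty() && edit.read.size() == edit.ref.size();
        if (!is_substitution) {
            continue;
        }
        bool all_n = edit.ref.find_first_not_of('N') == std::string::npos &&
                     edit.read.find_first_not_of('N') == std::string::npos;
        if (all_n) {
            edit.read.clear();  // an empty read sequence marks a match here
            ++changed;
        }
    }
    return changed;
}
```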

◆ filter_local_variants()

vector< vcflib::Variant * > vg::VariantAdder::filter_local_variants	(	const vector< vcflib::Variant * > &	before,
		vcflib::Variant *	variant,
		const vector< vcflib::Variant * > &	after
	)		const

protected

Glom all the given variants into one vector, throwing out variants from the before and after vectors that are too big to be in a context.
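A minimal sketch of this filtering, assuming that "too big to be in a context" means a radius above max_context_radius, and that a radius exactly at the limit is still accepted. SizedVariant and filter_local_variants_sketch are hypothetical names; the real function works on vcflib::Variant pointers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a variant; only the radius matters here.
struct SizedVariant {
    int id;
    std::size_t radius;
};

// Combine before + variant + after into one vector, dropping context
// variants whose radius exceeds max_context_radius. The main variant is
// always kept, whatever its size.
std::vector<const SizedVariant*> filter_local_variants_sketch(
    const std::vector<const SizedVariant*>& before,
    const SizedVariant* variant,
    const std::vector<const SizedVariant*>& after,
    std::size_t max_context_radius) {

    std::vector<const SizedVariant*> kept;
    for (const SizedVariant* candidate : before) {
        if (candidate->radius <= max_context_radius) {
            kept.push_back(candidate);
        }
    }
    kept.push_back(variant);
    for (const SizedVariant* candidate : after) {
        if (candidate->radius <= max_context_radius) {
            kept.push_back(candidate);
        }
    }
    return kept;
}
```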

◆ get_center()

size_t vg::VariantAdder::get_center ( const vcflib::Variant & variant )

static protected

Get the center position of the given variant.

◆ get_center_and_radius()

pair< size_t, size_t > vg::VariantAdder::get_center_and_radius ( const vector< vcflib::Variant * > & variants )

static protected

Get the center and radius around that center needed to extract everything that might be involved in a group of variants.

◆ get_graph()

const VG & vg::VariantAdder::get_graph ( ) const

◆ get_radius()

size_t vg::VariantAdder::get_radius ( const vcflib::Variant & variant )

static protected

Get the radius of the variant around its center: the amount of sequence that needs to be pulled out to make sure you have the ref and all the alts, if they exist. This is just going to be twice the longest of the ref and the alts.
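Taking the last sentence above at face value, the computation could look like the sketch below. AlleleVariant and get_radius_sketch are hypothetical names, and the real implementation in src/variant_adder.cpp may compute the radius differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for vcflib::Variant with only the allele strings.
struct AlleleVariant {
    std::string ref;
    std::vector<std::string> alts;
};

// Twice the length of the longest allele, counting the ref and every alt.
std::size_t get_radius_sketch(const AlleleVariant& variant) {
    std::size_t longest = variant.ref.size();
    for (const std::string& alt : variant.alts) {
        longest = std::max(longest, alt.size());
    }
    return 2 * longest;
}
```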

◆ get_unique_haplotypes()

set< vector< int > > vg::VariantAdder::get_unique_haplotypes	(	const vector< vcflib::Variant * > &	variants,
		WindowedVcfBuffer *	cache = nullptr
	)		const

protected

Get all the unique combinations of variant alts represented by actual haplotypes. Arbitrarily phases unphased variants.

Can (and should) take a WindowedVcfBuffer that owns the variants, and from which cached pre-parsed genotypes can be extracted.

Returns a set of vectors of one number per variant, giving the alt number (starting with 0 for reference) that appears on the haplotype.

TODO: ought to just take a collection of pre-parsed genotypes, but in an efficient way (a vector of pointers to vectors of sample genotypes?)
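The shape of the return value can be illustrated with a small sketch. It assumes diploid samples whose genotypes are already parsed into pairs of allele numbers, which is not how the real function receives its input. Call, SampleGenotypes, and unique_haplotypes_sketch are hypothetical names, and treating the stored order of each pair as the phase stands in for the arbitrary phasing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical input: one Call per variant, holding the allele number on
// each of the two copies (0 is the reference allele).
using Call = std::array<int, 2>;
using SampleGenotypes = std::vector<Call>;

// Collect each distinct haplotype once, as one allele number per variant.
std::set<std::vector<int>> unique_haplotypes_sketch(
    const std::vector<SampleGenotypes>& genotypes, std::size_t variant_count) {

    std::set<std::vector<int>> haplotypes;
    for (const SampleGenotypes& sample : genotypes) {
        assert(sample.size() == variant_count);
        for (std::size_t copy = 0; copy < 2; ++copy) {
            std::vector<int> haplotype;
            haplotype.reserve(variant_count);
            for (const Call& call : sample) {
                haplotype.push_back(call[copy]);
            }
            // The set discards haplotypes that were already seen.
            haplotypes.insert(std::move(haplotype));
        }
    }
    return haplotypes;
}
```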

◆ haplotype_to_string()

string vg::VariantAdder::haplotype_to_string	(	const vector< int > &	haplotype,
		const vector< vcflib::Variant * > &	variants
	)

protected

Convert a haplotype on a list of variants into a string. The string will run from the start of the first variant through the end of the last variant.

Can't be const because it relies on non-const operations on the synchronizer.
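A rough sketch of the conversion follows, using a plain reference string in place of the graph path that the real function reads through the synchronizer. PlacedVariant and haplotype_to_string_sketch are hypothetical names, positions are 0-based, and the variants are assumed to be sorted and non-overlapping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified variant: 0-based start on the reference,
// the ref allele, and the alt alleles.
struct PlacedVariant {
    std::size_t start;
    std::string ref;
    std::vector<std::string> alts;
};

// Spell out the haplotype from the start of the first variant through the
// end of the last one. Assumes sorted, non-overlapping variants.
std::string haplotype_to_string_sketch(const std::string& reference,
                                       const std::vector<int>& haplotype,
                                       const std::vector<PlacedVariant>& variants) {
    assert(haplotype.size() == variants.size());
    std::string result;
    if (variants.empty()) {
        return result;
    }
    std::size_t cursor = variants.front().start;
    for (std::size_t i = 0; i < variants.size(); ++i) {
        const PlacedVariant& variant = variants[i];
        assert(variant.start >= cursor);
        // Copy the untouched reference between the previous variant and this one.
        result += reference.substr(cursor, variant.start - cursor);
        // Allele 0 is the reference; allele k > 0 is alts[k - 1].
        int allele = haplotype[i];
        assert(allele >= 0);
        result += (allele == 0) ? variant.ref : variant.alts.at(allele - 1);
        cursor = variant.start + variant.ref.size();
    }
    return result;
}
```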

◆ smart_align()

Alignment vg::VariantAdder::smart_align	(	vg::VG &	graph,
		pair< NodeSide, NodeSide >	endpoints,
		const string &	to_align,
		size_t	max_span
	)

Align the given string to the given graph, between the given endpoints, using the most appropriate alignment method, depending on the relative sizes involved and whether a good alignment exists. max_span gives the maximum length in the graph that we expect our string to possibly align over (for cases of large deletions, where we might want to follow a long path in the graph).

The endpoints have to be heads/tails of the graph.

Treats N/N substitutions as matches.

TODO: now that we have a smart aligner that can synthesize deletions without finding them with the banded global aligner, do we need max_span anymore?

Mostly exposed for testability.

Member Data Documentation

◆ aligner

Aligner vg::VariantAdder::aligner

We use this Aligner to hold the scoring parameters. It may be accessed by multiple threads at once.

◆ doubling_steps

size_t vg::VariantAdder::doubling_steps = 3

What number of doubling steps should we use?

◆ edge_max

size_t vg::VariantAdder::edge_max = 0

Sometimes, we have to make Mappers, for graphs too big to safely use our global banded aligner on. If we do that, what max edge crossing limit should we use for simplification?

◆ flank_range

size_t vg::VariantAdder::flank_range = 100

How much additional context should we try to add outside the radius of the group of variants we actually find?

◆ graph

VG& vg::VariantAdder::graph

protected

The graph we are modifying.

◆ ignore_missing_contigs

bool vg::VariantAdder::ignore_missing_contigs = false

Should we accept and ignore VCF contigs that we can't find in the graph?

◆ kmer_size

size_t vg::VariantAdder::kmer_size = 16

What base kmer size should we use?

◆ large_alignment_band_padding

size_t vg::VariantAdder::large_alignment_band_padding = 30

When we're above whole_alignment_cutoff, what amount of band padding can we use looking for an existing version of our sequence?

◆ mapper_alignment_cutoff

size_t vg::VariantAdder::mapper_alignment_cutoff = 0

◆ max_context_radius

size_t vg::VariantAdder::max_context_radius = 50

What's the max radius on a variant we can have in order to use that variant as context for another main variant?

◆ max_node_size

size_t vg::VariantAdder::max_node_size = 32

What's the maximum node size we should produce, and the size we should chop the input graph to? Since alt sequences are forced out to node boundaries, it makes sense for this to be small relative to whole_alignment_cutoff.

◆ min_score_factor

double vg::VariantAdder::min_score_factor = 0.95

When we're doing a restricted band padding alignment, how good does it have to be, as a fraction of the perfect match score for the whole context, in order to use it?
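The acceptance test this describes amounts to a comparison against a scaled perfect score. The sketch below assumes the perfect score is the context length times a per-base match score. alignment_is_good_enough and its parameters are hypothetical names, not part of the class.

```cpp
#include <cstddef>
#include <cstdint>

// Accept an alignment only if its score reaches min_score_factor times the
// score of a perfect match over the whole context.
// Assumption: perfect score = context_length * match_score.
bool alignment_is_good_enough(std::int64_t score,
                              std::size_t context_length,
                              std::int64_t match_score,
                              double min_score_factor) {
    double perfect = static_cast<double>(context_length) * static_cast<double>(match_score);
    return static_cast<double>(score) >= min_score_factor * perfect;
}
```

With the default factor of 0.95 and a match score of 1, a 100 bp context would need a score well above 90 for the restricted band alignment to be used.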

◆ path_names

set<string> vg::VariantAdder::path_names

protected

We cache the set of valid path names, so we can detect/skip missing ones without locking the graph.

◆ pinned_tail_size

size_t vg::VariantAdder::pinned_tail_size = 200

If the restricted band alignment doesn't find anything, we resort to pinned alignments from the ends and cutting and pasting together. How big should each pinned tail be?

◆ print_updates

bool vg::VariantAdder::print_updates = false

Should we print out periodic updates about the variants we have processed?

◆ skip_structural_duplications

bool vg::VariantAdder::skip_structural_duplications = true

We have code to skip large structural duplications, because aligners won't be able to distinguish the copies. TODO: we want to actually make them into cycles.

◆ subgraph_prune

size_t vg::VariantAdder::subgraph_prune = 0

If nonzero, prune short subgraphs smaller than this before GCSA2-indexing.

◆ sync

GraphSynchronizer vg::VariantAdder::sync

protected

We keep a GraphSynchronizer so we can have multiple threads working on different parts of the same graph.

◆ thin_alignment_cutoff

size_t vg::VariantAdder::thin_alignment_cutoff = 10000

◆ variant_range

size_t vg::VariantAdder::variant_range = 50

How wide of a range in bases should we look for nearby variants in?

◆ whole_alignment_cutoff

size_t vg::VariantAdder::whole_alignment_cutoff = 4096

What's the cut-off for the graph's size or the alt's size in bp under which we can just use permissive banding and large band padding? If either is larger than this, we use the pinned-alignment-based do-each-end-and-splice mode.
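The decision described above can be sketched as a simple size check. AlignmentMode and choose_alignment_mode are hypothetical names, and this sketch covers only the two modes named here; the real smart_align has further cutoffs that it does not model.

```cpp
#include <cstddef>

// The two strategies named in the description above.
enum class AlignmentMode {
    PermissiveBanded,   // whole alignment with permissive banding
    PinnedSplice        // align each end pinned, then splice the results
};

// If either the graph or the alt is larger than the cutoff, fall back to
// the pinned-alignment mode; otherwise align the whole thing at once.
AlignmentMode choose_alignment_mode(std::size_t graph_bp,
                                    std::size_t alt_bp,
                                    std::size_t whole_alignment_cutoff) {
    if (graph_bp > whole_alignment_cutoff || alt_bp > whole_alignment_cutoff) {
        return AlignmentMode::PinnedSplice;
    }
    return AlignmentMode::PermissiveBanded;
}
```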

The documentation for this class was generated from the following files:

src/variant_adder.hpp
src/variant_adder.cpp
