#include <variant_adder.hpp>
|
set< vector< int > > | get_unique_haplotypes (const vector< vcflib::Variant * > &variants, WindowedVcfBuffer *cache=nullptr) const |
|
string | haplotype_to_string (const vector< int > &haplotype, const vector< vcflib::Variant * > &variants) |
|
vector< vcflib::Variant * > | filter_local_variants (const vector< vcflib::Variant * > &before, vcflib::Variant *variant, const vector< vcflib::Variant * > &after) const |
|
|
static size_t | get_radius (const vcflib::Variant &variant) |
|
static size_t | get_center (const vcflib::Variant &variant) |
|
static pair< size_t, size_t > | get_center_and_radius (const vector< vcflib::Variant * > &variants) |
|
A tool class for adding variants to a VG graph. Integrated NameMapper provides name translation for the VCF contigs.
◆ VariantAdder()
vg::VariantAdder::VariantAdder |
( |
VG & |
graph | ) |
|
Make a new VariantAdder to add variants to the given graph. Modifies the graph in place.
◆ add_variants()
void vg::VariantAdder::add_variants |
( |
vcflib::VariantCallFile * |
vcf | ) |
|
Add in the variants from the given non-null VCF file. The file must be freshly opened. The variants in the file must be sorted.
May be called from multiple threads. Synchronizes internally on the graph.
◆ align_ns()
Turn any N/N substitutions in the given alignment against the given graph into matches. Modifies the alignment in place.
◆ filter_local_variants()
vector< vcflib::Variant * > vg::VariantAdder::filter_local_variants |
( |
const vector< vcflib::Variant * > & |
before, |
|
|
vcflib::Variant * |
variant, |
|
|
const vector< vcflib::Variant * > & |
after |
|
) |
| const |
|
protected |
Glom all the given variants into one vector, throwing out variants from the before and after vectors that are too big to be in a context.
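As a rough illustration of the filtering described above, the sketch below uses a hypothetical `ToyVariant` struct in place of `vcflib::Variant` and takes the limit as an explicit parameter. Treating "too big" as "radius greater than `max_context_radius`" is an assumption, as is reusing the radius rule documented for `get_radius()`. It is not the actual vg implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for vcflib::Variant with only the fields used here.
struct ToyVariant {
    std::string ref;
    std::vector<std::string> alts;
};

// Radius rule as documented for get_radius(): twice the longest allele.
std::size_t toy_radius(const ToyVariant& v) {
    std::size_t longest = v.ref.size();
    for (const std::string& alt : v.alts) {
        longest = std::max(longest, alt.size());
    }
    return 2 * longest;
}

// Assumption: a context variant is "too big" when its radius exceeds the limit.
std::vector<ToyVariant*> filter_local_sketch(
    const std::vector<ToyVariant*>& before,
    ToyVariant* variant,
    const std::vector<ToyVariant*>& after,
    std::size_t max_context_radius) {
    assert(variant != nullptr);
    std::vector<ToyVariant*> kept;
    for (ToyVariant* v : before) {
        if (toy_radius(*v) <= max_context_radius) kept.push_back(v);
    }
    kept.push_back(variant);  // the main variant is always kept
    for (ToyVariant* v : after) {
        if (toy_radius(*v) <= max_context_radius) kept.push_back(v);
    }
    return kept;
}
```

The result preserves the before, main, after ordering, so the main variant's index equals the number of surviving "before" variants.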
◆ get_center()
size_t vg::VariantAdder::get_center |
( |
const vcflib::Variant & |
variant | ) |
|
|
static protected |
Get the center position of the given variant.
◆ get_center_and_radius()
pair< size_t, size_t > vg::VariantAdder::get_center_and_radius |
( |
const vector< vcflib::Variant * > & |
variants | ) |
|
|
static protected |
Get the center and radius around that center needed to extract everything that might be involved in a group of variants.
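One plausible way to combine per-variant centers and radii into a single window is sketched below. The `VariantExtent` struct and the choice of the interval midpoint as the group center are assumptions made for illustration. The source does not state how vg derives the combined values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical per-variant summary: a center position and a radius around it.
struct VariantExtent {
    std::size_t center;
    std::size_t radius;
};

// Returns (center, radius) of a window covering every variant's extent.
// Assumption: the group center is the midpoint of the covered interval.
std::pair<std::size_t, std::size_t> group_center_and_radius_sketch(
    const std::vector<VariantExtent>& extents) {
    assert(!extents.empty());
    std::size_t leftmost = std::numeric_limits<std::size_t>::max();
    std::size_t rightmost = 0;
    for (const VariantExtent& e : extents) {
        // Clamp at zero so an extent near the contig start cannot underflow.
        std::size_t left = e.center > e.radius ? e.center - e.radius : 0;
        std::size_t right = e.center + e.radius;
        leftmost = std::min(leftmost, left);
        rightmost = std::max(rightmost, right);
    }
    std::size_t center = leftmost + (rightmost - leftmost) / 2;
    // Take the larger side so the window still covers both ends after rounding.
    std::size_t radius = std::max(center - leftmost, rightmost - center);
    return std::make_pair(center, radius);
}
```

For extents (100, 10) and (150, 20) the covered interval is 90 to 170, which gives a center of 130 and a radius of 40.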
◆ get_graph()
const VG & vg::VariantAdder::get_graph |
( |
| ) |
const |
◆ get_radius()
size_t vg::VariantAdder::get_radius |
( |
const vcflib::Variant & |
variant | ) |
|
|
static protected |
Get the radius of the variant around its center: the amount of sequence that needs to be pulled out to make sure you have the ref and all the alts, if they exist. This is just going to be twice the longest of the ref and the alts.
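The rule stated above (twice the longest of the ref and the alts) can be sketched on plain strings. The function name and the use of `std::string` arguments instead of a `vcflib::Variant` are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Twice the length of the longest allele, following the documented rule.
// Takes plain strings rather than a vcflib::Variant (an assumption of this sketch).
std::size_t radius_sketch(const std::string& ref,
                          const std::vector<std::string>& alts) {
    std::size_t longest = ref.size();
    for (const std::string& alt : alts) {
        longest = std::max(longest, alt.size());
    }
    return 2 * longest;
}

// Small self-check: a 1 bp ref with a 4 bp insertion allele gives radius 8.
inline void radius_sketch_self_check() {
    std::vector<std::string> alts;
    alts.push_back("ACGT");
    alts.push_back("C");
    assert(radius_sketch("A", alts) == 8);
}
```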
◆ get_unique_haplotypes()
set< vector< int > > vg::VariantAdder::get_unique_haplotypes |
( |
const vector< vcflib::Variant * > & |
variants, |
|
|
WindowedVcfBuffer * |
cache = nullptr |
|
) |
| const |
|
protected |
Get all the unique combinations of variant alts represented by actual haplotypes. Arbitrarily phases unphased variants.
Can (and should) take a WindowedVcfBuffer that owns the variants, and from which cached pre-parsed genotypes can be extracted.
Returns a set of vectors of one number per variant, giving the alt number (starting with 0 for reference) that appears on the haplotype.
TODO: ought to just take a collection of pre-parsed genotypes, but in an efficient way (a vector of pointers to vectors of sample genotypes?)
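To make the return shape concrete, here is a minimal sketch that collects unique haplotypes from already-parsed genotypes. The nested-vector input layout (`genotypes_by_sample[sample][variant][copy]`) is hypothetical, and the sketch ignores missing calls and the `WindowedVcfBuffer` cache entirely. Unphased genotypes are "phased" simply by taking alleles in the order given, which is one arbitrary choice among many.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// genotypes_by_sample[s][v][c] is the allele number (0 = reference) that
// sample s carries at variant v on chromosome copy c. Hypothetical layout.
std::set<std::vector<int>> unique_haplotypes_sketch(
    const std::vector<std::vector<std::vector<int>>>& genotypes_by_sample) {
    std::set<std::vector<int>> haplotypes;
    for (const std::vector<std::vector<int>>& sample : genotypes_by_sample) {
        if (sample.empty()) continue;
        const std::size_t ploidy = sample.front().size();
        for (std::size_t copy = 0; copy < ploidy; ++copy) {
            std::vector<int> haplotype;
            haplotype.reserve(sample.size());
            for (const std::vector<int>& genotype : sample) {
                // This sketch requires the same ploidy at every variant.
                assert(genotype.size() == ploidy);
                haplotype.push_back(genotype[copy]);
            }
            haplotypes.insert(haplotype);  // the set drops duplicates
        }
    }
    return haplotypes;
}
```

Two diploid samples with genotypes `0|1, 0|0` and `1|0, 0|1` yield three unique haplotypes: `{0,0}`, `{0,1}` and `{1,0}`.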
◆ haplotype_to_string()
string vg::VariantAdder::haplotype_to_string |
( |
const vector< int > & |
haplotype, |
|
|
const vector< vcflib::Variant * > & |
variants |
|
) |
| |
|
protected |
Convert a haplotype on a list of variants into a string. The string will run from the start of the first variant through the end of the last variant.
Can't be const because it relies on non-const operations on the synchronizer.
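The conversion can be pictured with the sketch below, which stitches the chosen alleles together with the reference sequence between them. It assumes a plain reference string, 0-based start positions, and sorted, non-overlapping variants. The `SiteSketch` struct and function name are hypothetical, and the real method reads sequence through the synchronizer rather than from a string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a variant: 0-based start, ref allele, alt alleles.
struct SiteSketch {
    std::size_t start;
    std::string ref;
    std::vector<std::string> alts;
};

// Builds the haplotype sequence from the start of the first site through the
// end of the last site. haplotype[i] is 0 for ref or k for the k-th alt.
std::string haplotype_string_sketch(const std::string& reference,
                                    const std::vector<SiteSketch>& sites,
                                    const std::vector<int>& haplotype) {
    assert(haplotype.size() == sites.size());
    std::string out;
    if (sites.empty()) return out;
    std::size_t cursor = sites.front().start;
    for (std::size_t i = 0; i < sites.size(); ++i) {
        const SiteSketch& site = sites[i];
        assert(site.start >= cursor);  // sorted and non-overlapping
        assert(site.start + site.ref.size() <= reference.size());
        // Copy the untouched reference between the previous site and this one.
        out += reference.substr(cursor, site.start - cursor);
        const int allele = haplotype[i];
        if (allele == 0) {
            out += site.ref;
        } else {
            assert(allele >= 1 &&
                   static_cast<std::size_t>(allele) <= site.alts.size());
            out += site.alts[static_cast<std::size_t>(allele) - 1];
        }
        cursor = site.start + site.ref.size();
    }
    return out;
}
```

With reference `ACGTACGT`, a `C>T` site at position 1 and an `AC>A` site at position 4, the haplotype `{1, 0}` becomes `TGTAC` and `{0, 1}` becomes `CGTA`.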
◆ smart_align()
Align the given string to the given graph, between the given endpoints, using the most appropriate alignment method, depending on the relative sizes involved and whether a good alignment exists. max_span gives the maximum length in the graph that we expect our string to possibly align over (for cases of large deletions, where we might want to follow a long path in the graph).
The endpoints have to be heads/tails of the graph.
Treats N/N substitutions as matches.
TODO: now that we have a smart aligner that can synthesize deletions without finding them with the banded global aligner, do we need max_span anymore?
Mostly exposed for testability.
◆ aligner
We use this Aligner to hold the scoring parameters. It may be accessed by multiple threads at once.
◆ doubling_steps
size_t vg::VariantAdder::doubling_steps = 3 |
What number of doubling steps should we use?
◆ edge_max
size_t vg::VariantAdder::edge_max = 0 |
Sometimes, we have to make Mappers, for graphs too big to safely use our global banded aligner on. If we do that, what max edge crossing limit should we use for simplification?
◆ flank_range
size_t vg::VariantAdder::flank_range = 100 |
How much additional context should we try to add outside the radius of the group of variants we actually find?
◆ graph
VG& vg::VariantAdder::graph |
|
protected |
The graph we are modifying.
◆ ignore_missing_contigs
bool vg::VariantAdder::ignore_missing_contigs = false |
Should we accept and ignore VCF contigs that we can't find in the graph?
◆ kmer_size
size_t vg::VariantAdder::kmer_size = 16 |
What base kmer size should we use?
◆ large_alignment_band_padding
size_t vg::VariantAdder::large_alignment_band_padding = 30 |
When we're above the size cutoff (this appears to refer to whole_alignment_cutoff), what amount of band padding can we use when looking for an existing version of our sequence?
◆ mapper_alignment_cutoff
size_t vg::VariantAdder::mapper_alignment_cutoff = 0 |
◆ max_context_radius
size_t vg::VariantAdder::max_context_radius = 50 |
What's the max radius on a variant we can have in order to use that variant as context for another main variant?
◆ max_node_size
size_t vg::VariantAdder::max_node_size = 32 |
What's the maximum node size we should produce, and the size we should chop the input graph to? Since alt sequences are forced out to node boundaries, it makes sense for this to be small relative to whole_alignment_cutoff.
◆ min_score_factor
double vg::VariantAdder::min_score_factor = 0.95 |
When we're doing a restricted band padding alignment, how good does it have to be, as a fraction of the perfect match score for the whole context, in order to use it?
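The acceptance test this parameter implies can be sketched as a single comparison. The function name, the `match_score` parameter, and the assumption that a perfect score is simply the context length times the per-base match score are all illustrative. The real scoring parameters live in the class's `aligner` member.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when an alignment score reaches the required fraction of the
// perfect-match score for the whole context.
// Assumption: perfect score = context_length * match_score.
bool restricted_alignment_ok_sketch(int score,
                                    std::size_t context_length,
                                    int match_score,
                                    double min_score_factor = 0.95) {
    assert(match_score > 0);
    const double perfect =
        static_cast<double>(context_length) * static_cast<double>(match_score);
    return static_cast<double>(score) >= min_score_factor * perfect;
}
```

With a 100 bp context and a match score of 1, a score of 96 passes the default threshold while a score of 94 does not.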
◆ path_names
set<string> vg::VariantAdder::path_names |
|
protected |
We cache the set of valid path names, so we can detect/skip missing ones without locking the graph.
◆ pinned_tail_size
size_t vg::VariantAdder::pinned_tail_size = 200 |
If the restricted band alignment doesn't find anything, we resort to pinned alignments from the ends and cutting and pasting together. How big should each pinned tail be?
◆ print_updates
bool vg::VariantAdder::print_updates = false |
Should we print out periodic updates about the variants we have processed?
◆ skip_structural_duplications
bool vg::VariantAdder::skip_structural_duplications = true |
We have code to skip large structural duplications, because aligners won't be able to distinguish the copies. TODO: we want to actually make them into cycles.
◆ subgraph_prune
size_t vg::VariantAdder::subgraph_prune = 0 |
If nonzero, prune short subgraphs smaller than this before GCSA2-indexing.
◆ sync
We keep a GraphSynchronizer so we can have multiple threads working on different parts of the same graph.
◆ thin_alignment_cutoff
size_t vg::VariantAdder::thin_alignment_cutoff = 10000 |
◆ variant_range
size_t vg::VariantAdder::variant_range = 50 |
How wide of a range in bases should we look for nearby variants in?
◆ whole_alignment_cutoff
size_t vg::VariantAdder::whole_alignment_cutoff = 4096 |
What's the cut-off for the graph's size or the alt's size in bp under which we can just use permissive banding and large band padding? If either is larger than this, we use the pinned-alignment-based do-each-end-and-splice mode.
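The size test described above amounts to a two-way switch, sketched below. The enum and function names are hypothetical, and treating a size exactly equal to the cutoff as still eligible for whole-graph alignment is an assumption, since the description only says "larger than this".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical labels for the two modes named in the description.
enum class AlignmentModeSketch { WholeBanded, PinnedEndsAndSplice };

// Picks the pinned mode when either the graph or the alt exceeds the cutoff.
AlignmentModeSketch choose_mode_sketch(std::size_t graph_bp,
                                       std::size_t alt_bp,
                                       std::size_t whole_alignment_cutoff = 4096) {
    if (graph_bp > whole_alignment_cutoff || alt_bp > whole_alignment_cutoff) {
        return AlignmentModeSketch::PinnedEndsAndSplice;
    }
    return AlignmentModeSketch::WholeBanded;
}

// Small self-check of the behaviour at and just past the default cutoff.
inline void mode_sketch_self_check() {
    assert(choose_mode_sketch(4096, 10) == AlignmentModeSketch::WholeBanded);
    assert(choose_mode_sketch(4097, 10) == AlignmentModeSketch::PinnedEndsAndSplice);
    assert(choose_mode_sketch(100, 5000) == AlignmentModeSketch::PinnedEndsAndSplice);
}
```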