vg
tools for working with variation graphs
vg::io::MessageIterator Class Reference

#include <message_iterator.hpp>

Public Types

using TaggedMessage = pair< string, unique_ptr< string > >
 

Public Member Functions

 MessageIterator (istream &in, bool verbose=false, size_t thread_count=0)
 Constructor to wrap a stream.
 
 MessageIterator (unique_ptr< BlockedGzipInputStream > &&bgzf, bool verbose=false)
 Constructor to wrap an existing BGZF.
 
 MessageIterator ()=default
 Default constructor for an end iterator.
 
TaggedMessage & operator* ()
 
const TaggedMessage & operator* () const
 
const MessageIterator & operator++ ()
 In-place pre-increment to advance the iterator.
 
bool operator== (const MessageIterator &other) const
 
bool operator!= (const MessageIterator &other) const
 
bool has_current () const
 Return true if dereferencing the iterator will produce a valid value, and false otherwise.
 
void advance ()
 
TaggedMessage take ()
 Take the current item, which must exist, and advance the iterator to the next one.
 
int64_t tell_group () const
 
bool seek_group (int64_t virtual_offset)
 

Static Public Member Functions

static string sniff_tag (istream &stream)
 
static string sniff_tag (::google::protobuf::io::ZeroCopyInputStream &stream)
 
static std::pair< MessageIterator, MessageIterator > range (istream &in)
 Returns iterators that act like begin() and end() for a stream containing messages.
 

Static Public Attributes

const static size_t MAX_MESSAGE_SIZE = 1000000000
 We refuse to serialize individual messages longer than this size.
 

Private Member Functions

void handle (bool ok, int64_t group_virtual_offset=0, int64_t message_virtual_offset=0)
 

Private Attributes

TaggedMessage value
 
string previous_tag
 
size_t group_count
 
size_t group_idx
 
int64_t group_vo
 
int64_t item_vo
 
unique_ptr< BlockedGzipInputStream > bgzip_in
 Since these streams can't be copied or moved, we wrap ours in a unique_ptr so the iterator itself can be moved.
 
bool verbose = false
 Set this to true to print messages about what is being decoded.
 

Detailed Description

Iterator over messages in VG-format files. Yields pairs of string tag and message data. Also supports seeking and telling at the group level in bgzip files. Cannot be copied, but can be moved.

TODO: Right now we always copy all messages into an internal buffer. We should only do that if the message contents are actually accessed, with some kind of lazy fake string that evaluates on string conversion. Then we could more efficiently skip stuff with the wrong tag.

Member Typedef Documentation

◆ TaggedMessage

using vg::io::MessageIterator::TaggedMessage = pair<string, unique_ptr<string> >

Represents a pair of a tag value and some message data. If there is no valid tag for a group, as given in the Registry, the tag will be "". If there is a tag but no messages in its group, the data pointer will be null.
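The cases described above can be told apart with a small check. The sketch below is illustrative only: it redeclares the alias locally rather than including the vg header, and `MessageKind` and `classify` are hypothetical names that do not exist in vg. The tag string used in any call is likewise made up.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Local stand-in mirroring the documented alias; not the vg header itself.
using TaggedMessage = std::pair<std::string, std::unique_ptr<std::string>>;

// Hypothetical classification of the documented cases.
enum class MessageKind {
    Untagged,    // data present, but no valid tag for the group (tag is "")
    TagOnly,     // data pointer is null: the group had no messages
    TaggedData   // both a tag and message data are present
};

// Hypothetical helper: decide which documented case a value falls into.
MessageKind classify(const TaggedMessage& m) {
    if (!m.second) {
        // A null data pointer means there is nothing to decode.
        return MessageKind::TagOnly;
    }
    if (m.first.empty()) {
        // Data without a valid tag.
        return MessageKind::Untagged;
    }
    return MessageKind::TaggedData;
}
```

Checking the data pointer first avoids dereferencing null when a group carries only a tag.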

Constructor & Destructor Documentation

◆ MessageIterator() [1/3]

vg::io::MessageIterator::MessageIterator ( istream &  in,
bool  verbose = false,
size_t  thread_count = 0 
)

Constructor to wrap a stream.

◆ MessageIterator() [2/3]

vg::io::MessageIterator::MessageIterator ( unique_ptr< BlockedGzipInputStream > &&  bgzf,
bool  verbose = false 
)

Constructor to wrap an existing BGZF.

◆ MessageIterator() [3/3]

vg::io::MessageIterator::MessageIterator ( )
default

Default constructor for an end iterator.

Member Function Documentation

◆ advance()

auto vg::io::MessageIterator::advance ( )

Advance the iterator to the next message, or the end if this was the last message. Basically the same as ++.

◆ handle()

auto vg::io::MessageIterator::handle ( bool  ok,
int64_t  group_virtual_offset = 0,
int64_t  message_virtual_offset = 0 
)
private

Make sure the given Protobuf-library bool return value is true, and fail otherwise with a message. Reports the virtual offset of the invalid group and/or message.

◆ has_current()

auto vg::io::MessageIterator::has_current ( ) const

Return true if dereferencing the iterator will produce a valid value, and false otherwise.

◆ operator!=()

auto vg::io::MessageIterator::operator!= ( const MessageIterator &  other) const

Check if two iterators are not equal. Since you can only have one on a stream, this only has two equality classes: iterators that have hit the end, and iterators that haven't.

◆ operator*() [1/2]

auto vg::io::MessageIterator::operator* ( )

Get the current item. Caller may move it away. Only legal to call if we are not an end iterator.

◆ operator*() [2/2]

const TaggedMessage& vg::io::MessageIterator::operator* ( ) const

Get the current item when we are const. Only legal to call if we are not an end iterator.

◆ operator++()

auto vg::io::MessageIterator::operator++ ( )

In-place pre-increment to advance the iterator.

◆ operator==()

auto vg::io::MessageIterator::operator== ( const MessageIterator &  other) const

Check if two iterators are equal. Since you can only have one on a stream, this only has two equality classes: iterators that have hit the end, and iterators that haven't.
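To make the two-equality-class rule concrete, here is a minimal sketch using a toy iterator over a vector. `ToyIterator` and `count_items` are hypothetical stand-ins, not vg code; they only mimic the comparison behavior described above, where any two iterators that have not reached the end compare equal regardless of position.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy single-pass iterator; a stand-in, not vg::io::MessageIterator.
class ToyIterator {
public:
    // Default construction yields an end iterator.
    ToyIterator() = default;
    explicit ToyIterator(const std::vector<std::string>* items) : items_(items) {}

    bool has_current() const {
        return items_ != nullptr && index_ < items_->size();
    }
    const std::string& operator*() const { return (*items_)[index_]; }
    ToyIterator& operator++() { ++index_; return *this; }

    // Only two equality classes: "at end" and "not at end".
    bool operator==(const ToyIterator& other) const {
        return has_current() == other.has_current();
    }
    bool operator!=(const ToyIterator& other) const { return !(*this == other); }

private:
    const std::vector<std::string>* items_ = nullptr;
    std::size_t index_ = 0;
};

// Count items with the usual begin/end comparison loop.
std::size_t count_items(const std::vector<std::string>& items) {
    std::size_t n = 0;
    for (ToyIterator it(&items), end; it != end; ++it) {
        ++n;
    }
    return n;
}
```

This is enough for a `begin != end` loop, but the comparison cannot be used to test whether two iterators point at the same message.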

◆ range()

auto vg::io::MessageIterator::range ( istream &  in)
static

Returns iterators that act like begin() and end() for a stream containing messages.

◆ seek_group()

auto vg::io::MessageIterator::seek_group ( int64_t  virtual_offset)

Seek to the given virtual offset and start reading the group that is there. The next value produced will be the first value in that group. If already at the start of the group at the given virtual offset, does nothing. Return false if seeking is unsupported or the seek fails.

◆ sniff_tag() [1/2]

string vg::io::MessageIterator::sniff_tag ( ::google::protobuf::io::ZeroCopyInputStream &  stream)
static

Sniffing function to identify if data in a ZeroCopyInputStream appears to be uncompressed type-tagged message data, and, if so, what the tag is. Returns the tag if it could be sniffed, or the empty string if the tag could not be read, if the tag actually is an (illegal) empty string, or if the tag is not valid according to the Registry.

Backs the stream up to where it was before the sniffing read.

◆ sniff_tag() [2/2]

string vg::io::MessageIterator::sniff_tag ( istream &  stream)
static

Sniffing function to identify if data in a C++ stream appears to be uncompressed type-tagged message data, and, if so, what the tag is. Returns the tag if it could be sniffed, or the empty string if the tag could not be read, if the tag actually is an (illegal) empty string, or if the tag is not valid according to the Registry.

Ungets the stream up to where it was before the sniffing read.

Fails with an exception if the sniffed data cannot be ungotten, so it is not safe to run on streams that aren't seekable and don't have buffering to support this.
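The read-then-unget pattern can be sketched with standard streams alone. This is not vg's implementation and does not parse the vg tag format; `peek_prefix` is a hypothetical helper that reads a few bytes, puts them back, and throws if the stream cannot restore them.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read up to n bytes, then restore the stream position.
std::string peek_prefix(std::istream& in, std::size_t n) {
    std::string buf(n, '\0');
    if (n > 0) {
        in.read(&buf[0], static_cast<std::streamsize>(n));
    }
    // Keep only the bytes that were actually read.
    buf.resize(static_cast<std::size_t>(in.gcount()));
    // A short read sets eofbit and failbit; clear them so unget can run.
    in.clear();
    for (std::size_t i = 0; i < buf.size(); ++i) {
        in.unget();
        if (!in) {
            // The stream has no buffered history left to give back.
            throw std::runtime_error("could not unget sniffed data");
        }
    }
    return buf;
}
```

A `std::istringstream` can always give its bytes back; a pipe or socket stream with a small buffer may not, which is the failure mode the note above warns about.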

◆ take()

auto vg::io::MessageIterator::take ( )

Take the current item, which must exist, and advance the iterator to the next one.
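The take-then-advance contract can be illustrated with a toy source that owns its messages. `ToySource` and `drain` are hypothetical names, and the alias is redeclared locally; the real class reads from a stream rather than a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Local stand-in mirroring the documented alias.
using TaggedMessage = std::pair<std::string, std::unique_ptr<std::string>>;

// Toy message source; a stand-in, not vg::io::MessageIterator.
class ToySource {
public:
    explicit ToySource(std::vector<TaggedMessage> items) : items_(std::move(items)) {}

    bool has_current() const { return index_ < items_.size(); }
    void advance() { ++index_; }

    // Move the current item out, then step to the next one.
    // Precondition: has_current() is true.
    TaggedMessage take() {
        TaggedMessage out = std::move(items_[index_]);
        advance();
        return out;
    }

private:
    std::vector<TaggedMessage> items_;
    std::size_t index_ = 0;
};

// Collect every non-null payload, consuming the source.
std::vector<std::string> drain(ToySource& src) {
    std::vector<std::string> payloads;
    while (src.has_current()) {
        TaggedMessage m = src.take();
        if (m.second) {
            payloads.push_back(std::move(*m.second));
        }
    }
    return payloads;
}
```

Because the pair holds a `unique_ptr`, taking transfers ownership of the message data to the caller without copying it.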

◆ tell_group()

auto vg::io::MessageIterator::tell_group ( ) const

Return the virtual offset of the group being currently read (i.e. the group to which the current message belongs), to seek back to. You can't seek back to the current message, just to the start of the group. Returns -1 instead if the underlying file doesn't support seek/tell. Returns the past-the-end virtual offset of the file if EOF is reached.
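This page does not define the layout of a virtual offset. Assuming it follows the usual BGZF convention (compressed block start in the upper 48 bits, offset within the uncompressed block in the lower 16 bits), the value can be packed and unpacked as sketched below. All names here are hypothetical helpers, and the layout is an assumption rather than something stated in this documentation; only the -1 sentinel comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decomposition of a BGZF-style virtual offset.
struct VirtualOffsetParts {
    std::int64_t block_start;   // byte offset of the compressed block in the file
    std::int32_t within_block;  // byte offset inside the uncompressed block
};

// Pack the two parts, assuming the 48/16-bit BGZF layout.
std::int64_t make_virtual_offset(std::int64_t block_start, std::int32_t within_block) {
    const std::uint64_t packed =
        (static_cast<std::uint64_t>(block_start) << 16) |
        (static_cast<std::uint64_t>(within_block) & 0xFFFFu);
    return static_cast<std::int64_t>(packed);
}

// Unpack a virtual offset under the same assumption.
VirtualOffsetParts split_virtual_offset(std::int64_t vo) {
    const std::uint64_t bits = static_cast<std::uint64_t>(vo);
    return VirtualOffsetParts{
        static_cast<std::int64_t>(bits >> 16),
        static_cast<std::int32_t>(bits & 0xFFFFu)};
}

// tell_group() is documented to return -1 when seek/tell is unsupported.
bool tell_supported(std::int64_t vo) {
    return vo != -1;
}
```

A caller would check `tell_supported` on the value from `tell_group()` before storing it for a later `seek_group()`.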

Member Data Documentation

◆ bgzip_in

unique_ptr<BlockedGzipInputStream> vg::io::MessageIterator::bgzip_in
private

Since these streams can't be copied or moved, we wrap ours in a unique_ptr so the iterator itself can be moved.

◆ group_count

size_t vg::io::MessageIterator::group_count
private

This holds the number of messages that exist in the current group. Counts the tag, if present.

◆ group_idx

size_t vg::io::MessageIterator::group_idx
private

This holds the number of messages read in the current group. Counts the tag, if present.

◆ group_vo

int64_t vg::io::MessageIterator::group_vo
private

This holds the virtual offset of the current group's start, or the number of the current group if seeking is not available. If the iterator is the end iterator, this is -1.

◆ item_vo

int64_t vg::io::MessageIterator::item_vo
private

This holds the virtual offset of the current item, or counts up through the group if seeking is not possible. Useful for seeking back to the item later, although after that you will have to seek to a group to resume iterating.

◆ MAX_MESSAGE_SIZE

const size_t vg::io::MessageIterator::MAX_MESSAGE_SIZE = 1000000000
static

We refuse to serialize individual messages longer than this size.

◆ previous_tag

string vg::io::MessageIterator::previous_tag
private

Because the whole value pair may get moved away, we keep a previous copy of the tag and replace it. TODO: This is a bit of a hack.

◆ value

TaggedMessage vg::io::MessageIterator::value
private

Holds the most recently pulled-out message tag and value. May get moved away.

◆ verbose

bool vg::io::MessageIterator::verbose = false
private

Set this to true to print messages about what is being decoded.

