vg
tools for working with variation graphs
#include <message_iterator.hpp>
Public Types | |
using | TaggedMessage = pair< string, unique_ptr< string > > |
Public Member Functions | |
MessageIterator (istream &in, bool verbose=false, size_t thread_count=0) | |
Constructor to wrap a stream.
MessageIterator (unique_ptr< BlockedGzipInputStream > &&bgzf, bool verbose=false) | |
Constructor to wrap an existing BGZF.
MessageIterator ()=default | |
Default constructor for an end iterator.
TaggedMessage & | operator* () |
const TaggedMessage & | operator* () const |
const MessageIterator & | operator++ () |
In-place pre-increment to advance the iterator.
bool | operator== (const MessageIterator &other) const |
bool | operator!= (const MessageIterator &other) const |
bool | has_current () const |
Return true if dereferencing the iterator will produce a valid value, and false otherwise.
void | advance () |
TaggedMessage | take () |
Take the current item, which must exist, and advance the iterator to the next one.
int64_t | tell_group () const |
bool | seek_group (int64_t virtual_offset) |
Static Public Member Functions | |
static string | sniff_tag (istream &stream) |
static string | sniff_tag (::google::protobuf::io::ZeroCopyInputStream &stream) |
static std::pair< MessageIterator, MessageIterator > | range (istream &in) |
Returns iterators that act like begin() and end() for a stream containing messages.
Static Public Attributes | |
const static size_t | MAX_MESSAGE_SIZE = 1000000000 |
We refuse to serialize individual messages longer than this size.
Private Member Functions | |
void | handle (bool ok, int64_t group_virtual_offset=0, int64_t message_virtual_offset=0) |
Private Attributes | |
TaggedMessage | value |
string | previous_tag |
size_t | group_count |
size_t | group_idx |
int64_t | group_vo |
int64_t | item_vo |
unique_ptr< BlockedGzipInputStream > | bgzip_in |
Since these streams can't be copied or moved, we wrap ours in a unique_ptr so that the iterator itself can be moved.
bool | verbose = false |
Set this to true to print messages about what is being decoded.
Iterator over messages in VG-format files. Yields pairs of string tag and message data. Also supports seeking and telling at the group level in bgzip files. Cannot be copied, but can be moved.
TODO: Right now we always copy all messages into an internal buffer. We should only do that if the message contents are actually accessed, with some kind of lazy fake string that evaluates on string conversion. Then we could more efficiently skip stuff with the wrong tag.
using vg::io::MessageIterator::TaggedMessage = pair<string, unique_ptr<string> >
Represents a pair of a tag value and some message data. If there is no valid tag for a group, as given in the Registry, the tag will be "". If there is a tag but no messages in its group, the data pointer will be null.
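The two special cases above (empty tag, null data pointer) are easy to mix up, so here is a minimal sketch of how a caller might tell them apart. It uses a local alias with the same shape as TaggedMessage rather than the vg header, and the helper names `has_valid_tag`, `has_data`, and `data_of` are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Local alias with the same shape as the documented typedef (not the vg header).
using TaggedMessage = std::pair<std::string, std::unique_ptr<std::string>>;

// Hypothetical helper: an empty tag means the group had no valid tag.
inline bool has_valid_tag(const TaggedMessage& m) {
    return !m.first.empty();
}

// Hypothetical helper: a null data pointer means there was a tag
// but no message in its group.
inline bool has_data(const TaggedMessage& m) {
    return m.second != nullptr;
}

// Hypothetical checked accessor: only legal when has_data(m) is true.
inline const std::string& data_of(const TaggedMessage& m) {
    assert(has_data(m));
    return *m.second;
}
```

Checking `has_data` before dereferencing matters because a tag-only group is a normal, valid result and not an error.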
vg::io::MessageIterator::MessageIterator(istream &in, bool verbose = false, size_t thread_count = 0)
Constructor to wrap a stream.
vg::io::MessageIterator::MessageIterator(unique_ptr<BlockedGzipInputStream> &&bgzf, bool verbose = false)
Constructor to wrap an existing BGZF.
vg::io::MessageIterator::MessageIterator()
default
Default constructor for an end iterator.
void vg::io::MessageIterator::advance()
Advance the iterator to the next message, or the end if this was the last message. Basically the same as ++.
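void vg::io::MessageIterator::handle(bool ok, int64_t group_virtual_offset = 0, int64_t message_virtual_offset = 0)
private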
Make sure the given Protobuf-library bool return value is true, and fail otherwise with a message. Reports the virtual offset of the invalid group and/or message.
bool vg::io::MessageIterator::has_current() const
Return true if dereferencing the iterator will produce a valid value, and false otherwise.
bool vg::io::MessageIterator::operator!=(const MessageIterator &other) const
Check if two iterators are not equal. Since you can only have one on a stream, this only has two equality classes: iterators that have hit the end, and iterators that haven't.
TaggedMessage & vg::io::MessageIterator::operator*()
Get the current item. Caller may move it away. Only legal to call if we are not an end iterator.
const TaggedMessage & vg::io::MessageIterator::operator*() const
Get the current item when we are const. Only legal to call if we are not an end iterator.
const MessageIterator & vg::io::MessageIterator::operator++()
In-place pre-increment to advance the iterator.
bool vg::io::MessageIterator::operator==(const MessageIterator &other) const
Check if two iterators are equal. Since you can only have one on a stream, this only has two equality classes: iterators that have hit the end, and iterators that haven't.
std::pair<MessageIterator, MessageIterator> vg::io::MessageIterator::range(istream &in)
static
Returns iterators that act like begin() and end() for a stream containing messages.
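To make the begin()/end() pattern and the two equality classes concrete, the following is a toy stand-in, not the vg class. `ToyIterator` and `join_all` are hypothetical names, and the toy reads from a vector instead of a stream. It shows why comparing against a default-constructed end iterator is enough to drive a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for illustration only; it walks a vector, not a stream.
class ToyIterator {
public:
    ToyIterator() = default;  // default-constructed means "end iterator"
    explicit ToyIterator(std::vector<std::string> items)
        : items_(std::move(items)) {}

    bool has_current() const { return idx_ < items_.size(); }

    const std::string& operator*() const {
        assert(has_current());  // only legal when not at the end
        return items_[idx_];
    }

    const ToyIterator& operator++() {
        ++idx_;
        return *this;
    }

    // Only two equality classes: iterators at the end, and iterators not at the end.
    bool operator==(const ToyIterator& other) const {
        return has_current() == other.has_current();
    }
    bool operator!=(const ToyIterator& other) const { return !(*this == other); }

    // Returns a (begin, end) pair, mirroring the shape of range().
    static std::pair<ToyIterator, ToyIterator> range(std::vector<std::string> items) {
        return std::make_pair(ToyIterator(std::move(items)), ToyIterator());
    }

private:
    std::vector<std::string> items_;
    std::size_t idx_ = 0;
};

// Concatenate everything the range yields, using the pair like begin()/end().
inline std::string join_all(std::vector<std::string> items) {
    auto r = ToyIterator::range(std::move(items));
    std::string out;
    for (; r.first != r.second; ++r.first) {
        out += *r.first;
    }
    return out;
}
```

Note that under this scheme two unrelated non-end iterators compare equal, which is harmless as long as only one iterator exists per stream.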
bool vg::io::MessageIterator::seek_group(int64_t virtual_offset)
Seek to the given virtual offset and start reading the group that is there. The next value produced will be the first value in that group. If already at the start of the group at the given virtual offset, does nothing. Return false if seeking is unsupported or the seek fails.
string vg::io::MessageIterator::sniff_tag(::google::protobuf::io::ZeroCopyInputStream &stream)
static
Sniffing function to identify if data in a ZeroCopyInputStream appears to be uncompressed type-tagged message data, and, if so, what the tag is. Returns the tag if it could be sniffed, or the empty string if the tag could not be read, if the tag actually is an (illegal) empty string, or if the tag is not valid according to the Registry.
Backs the stream up to where it was before the sniffing read.
string vg::io::MessageIterator::sniff_tag(istream &stream)
static
Sniffing function to identify if data in a C++ stream appears to be uncompressed type-tagged message data, and, if so, what the tag is. Returns the tag if it could be sniffed, or the empty string if the tag could not be read, if the tag actually is an (illegal) empty string, or if the tag is not valid according to the Registry.
Ungets the stream up to where it was before the sniffing read.
Fails with an exception if the sniffed data cannot be ungotten, so not safe to run on streams that aren't seekable and don't have buffering to support this.
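The read-then-unget behavior described above can be sketched on a plain C++ stream as follows. This is not the vg implementation: `peek_prefix` and `peek_is_transparent` are hypothetical helpers, and the sketch only reads raw bytes without interpreting them as a tag. It illustrates why the stream must support putting back every byte that was consumed.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read up to n bytes, then unget all of them so the
// stream is left where it started. Throws if a byte cannot be put back.
inline std::string peek_prefix(std::istream& in, std::size_t n) {
    std::string buf;
    char c;
    while (buf.size() < n && in.get(c)) {
        buf.push_back(c);
    }
    in.clear();  // a short read sets eofbit/failbit; clear them before ungetting
    for (std::size_t i = 0; i < buf.size(); ++i) {
        if (!in.unget()) {
            throw std::runtime_error("could not unget sniffed bytes");
        }
    }
    return buf;
}

// Demonstration: after peeking, a later reader still sees the whole input.
inline bool peek_is_transparent(const std::string& data, std::size_t n) {
    std::istringstream in(data);
    peek_prefix(in, n);
    std::string all((std::istreambuf_iterator<char>(in)),
                    std::istreambuf_iterator<char>());
    return all == data;
}
```

An in-memory stream can always put back everything it has read; a pipe or socket stream may only guarantee a single byte of putback, which is the failure case the exception covers.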
TaggedMessage vg::io::MessageIterator::take()
Take the current item, which must exist, and advance the iterator to the next one.
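A common way to consume an iterator with this interface is a loop over has_current() and take(). The sketch below shows that pattern against a toy source so that it is self-contained. `ToySource` and `drain` are hypothetical names, and the toy holds its messages in a vector instead of decoding them from a file.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Local alias with the same shape as the documented typedef.
using TaggedMessage = std::pair<std::string, std::unique_ptr<std::string>>;

// Toy stand-in exposing only has_current() and take().
class ToySource {
public:
    explicit ToySource(std::vector<TaggedMessage> items)
        : items_(std::move(items)) {}

    bool has_current() const { return idx_ < items_.size(); }

    // Move the current item out and step to the next one.
    TaggedMessage take() {
        assert(has_current());  // the current item must exist
        TaggedMessage out = std::move(items_[idx_]);
        ++idx_;
        return out;
    }

private:
    std::vector<TaggedMessage> items_;
    std::size_t idx_ = 0;
};

// Hypothetical helper: hand every remaining message to a callback and
// return how many were consumed. Works with any type that provides
// has_current() and take().
template <typename Iterator, typename Callback>
std::size_t drain(Iterator& it, Callback&& on_message) {
    std::size_t count = 0;
    while (it.has_current()) {
        on_message(it.take());
        ++count;
    }
    return count;
}
```

Because take() returns the pair by value, the callback owns the message data and no copy of the payload string is needed.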
int64_t vg::io::MessageIterator::tell_group() const
Return the virtual offset of the group being currently read (i.e. the group to which the current message belongs), to seek back to. You can't seek back to the current message, just to the start of the group. Returns -1 instead if the underlying file doesn't support seek/tell. Returns the past-the-end virtual offset of the file if EOF is reached.
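Callers normally treat a virtual offset as an opaque value to pass back to seek_group(). For debugging it can still help to decode one. The sketch below assumes the standard BGZF layout, in which the upper 48 bits hold the byte offset of the compressed block's start and the lower 16 bits hold the offset within the uncompressed block. The struct and function names are hypothetical and not part of vg.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded form of a BGZF virtual offset.
struct VirtualOffsetParts {
    std::int64_t block_start;    // byte offset of the compressed block in the file
    std::uint16_t within_block;  // offset inside the uncompressed block
};

// Split a non-negative virtual offset, assuming the standard BGZF layout.
// The -1 "seek/tell unsupported" sentinel must be checked for before calling.
inline VirtualOffsetParts split_virtual_offset(std::int64_t vo) {
    assert(vo >= 0);
    VirtualOffsetParts parts;
    parts.block_start = vo >> 16;
    parts.within_block = static_cast<std::uint16_t>(vo & 0xFFFF);
    return parts;
}

// Inverse of split_virtual_offset under the same assumption.
inline std::int64_t make_virtual_offset(std::int64_t block_start,
                                        std::uint16_t within_block) {
    assert(block_start >= 0);
    return (block_start << 16) | static_cast<std::int64_t>(within_block);
}
```

One consequence of this layout is that virtual offsets are ordered the same way as file positions but are not byte counts, so subtracting two of them does not give a length.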
unique_ptr<BlockedGzipInputStream> vg::io::MessageIterator::bgzip_in
private
Since these streams can't be copied or moved, we wrap ours in a unique_ptr so that the iterator itself can be moved.
size_t vg::io::MessageIterator::group_count
private
This holds the number of messages that exist in the current group. Counts the tag, if present.
size_t vg::io::MessageIterator::group_idx
private
This holds the number of messages read in the current group. Counts the tag, if present.
int64_t vg::io::MessageIterator::group_vo
private
This holds the virtual offset of the current group's start, or the number of the current group if seeking is not available. If the iterator is the end iterator, this is -1.
int64_t vg::io::MessageIterator::item_vo
private
This holds the virtual offset of the current item, or counts up through the group if seeking is not possible. Useful for seeking back to the item later, although to iterate after that you will first have to seek to a group.
const size_t vg::io::MessageIterator::MAX_MESSAGE_SIZE = 1000000000
static
We refuse to serialize individual messages longer than this size.
string vg::io::MessageIterator::previous_tag
private
Because the whole value pair may get moved away, we keep a previous copy of the tag and replace it. TODO: This is a bit of a hack.
TaggedMessage vg::io::MessageIterator::value
private
Holds the most recently pulled-out message tag and value. May get moved away.
bool vg::io::MessageIterator::verbose = false
private
Set this to true to print messages about what is being decoded.