bit::Trie< Array > Class Template Reference

A trie structure for storing a set of integer vectors compactly.

#include <Trie.hh>


Public Member Functions

void reserve_levels (unsigned int num_levels)
 Reserve space for levels to avoid reallocating and moving arrays.
u32 num_nodes (unsigned int level) const
 The number of nodes on a given level.
u64 size () const
 The number of bytes allocated for the bit buffers of the trie structure.
const Array & symbol_array (unsigned int level) const
 Read-only access to symbol arrays.
const Array & child_limit_array (unsigned int level) const
 Read-only access to child_limit arrays.
const Array & pointer_array (unsigned int level) const
 Read-only access to pointer arrays.
Iterator insert (Iterator it, u32 symbol)
 Insert a new node as a child of the current node, or return an iterator to the existing node if it already exists.
template<class T>
Iterator insert (const std::vector< T > &vec)
 Insert a new string of symbols into the trie, or return an iterator to it if it already exists.
Iterator insert_new (Iterator it, u32 symbol)
 Insert a new node as a child of the current node.
template<class T>
Iterator insert_new (const std::vector< T > &vec)
 Insert a new string of symbols to the trie.
template<class T>
Iterator find (const std::vector< T > &vec) const
 Find a string of symbols in the trie.
void compress (unsigned int level)
 Compress the child limit and pointer arrays of a given level.
void compress ()
 Compress all levels.
void uncompress (unsigned int level)
 Uncompress a level.
void uncompress ()
 Uncompress all levels.
bool is_separated (unsigned int level)
 Check whether the leaves of the given level have been separated.
void separate_leafs (unsigned int level)
 Separate leaf information from non-leaf information (removing possible compression).
void unseparate_leafs (unsigned int level)
 Undo the leaf separation.
std::string debug_child_limit_arrays_str () const
 Display the contents of the child limit arrays.
std::string debug_str () const
 Display the contents of the trie.
void write (FILE *file) const
 Write the trie to a file.
void read (FILE *file)
 Read the trie from a file.

Private Member Functions

u32 compute_non_leaf_index (unsigned int level, u32 symbol_index) const
 Compute the non-leaf index corresponding to the main index, handling both the default and the separated mode.

Private Attributes

std::vector< Array > m_symbol_arrays
 Arrays containing symbols for each level.
std::vector< Array > m_child_limit_arrays
 Arrays containing child limits for each level.
std::vector< Array > m_pointer_arrays
 Arrays to store pointers to non-leaf indices.

Classes

class  Iterator
 A class for traversing a trie.


Detailed Description

template<class Array>
class bit::Trie< Array >

A trie structure for storing a set of integer vectors compactly.

Each vector is stored in a tree as a path from the root node so that common prefixes are shared. In order to avoid storing all child pointers in a node, the following structure is used.

Miscellaneous notes:


Member Function Documentation

template<class Array>
const Array& bit::Trie< Array >::child_limit_array (unsigned int level) const [inline]

Read-only access to child_limit arrays.

Parameters:
level = the level to access

template<class Array>
void bit::Trie< Array >::compress () [inline]

Compress all levels.

template<class Array>
void bit::Trie< Array >::compress (unsigned int level) [inline]

Compress the child limit and pointer arrays of a given level.

Works only if the Array template type implements recursive_optimal_compress().

Parameters:
level = level to compress
Exceptions:
bit::out_of_range if level exceeds number of levels

template<class Array>
u32 bit::Trie< Array >::compute_non_leaf_index (unsigned int level, u32 symbol_index) const [inline, private]

Compute the non-leaf index corresponding to the main index, handling both the default and the separated mode.

Parameters:
level = level to access
symbol_index = the entry for which the non-leaf index is computed.
Returns:
max_u32 if the node is a leaf, otherwise the non-leaf index
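To make the two modes concrete, here is a minimal sketch of the index computation on plain `std::vector<u32>` arrays instead of the bit-packed `Array` type. It assumes that in the default mode node i is a leaf when its child limit equals the previous node's limit (it has no children) and that the index maps one-to-one, and that in the separated mode the pointer-array rule documented for `m_pointer_arrays` applies, with p[-1] taken as 0. `Level` and its fields are hypothetical; only the name `max_u32` comes from this page.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using u32 = std::uint32_t;

// Stand-in for the library's max_u32 (assumed: the largest u32 value).
const u32 max_u32 = std::numeric_limits<u32>::max();

// Hypothetical plain-vector view of one trie level.
struct Level {
  bool separated = false;
  std::vector<u32> child_limit;  // per node (default) or per non-leaf node (separated)
  std::vector<u32> pointer;      // per node, used only in the separated mode
};

// Returns max_u32 for a leaf, otherwise the index of the node's child limit entry.
u32 compute_non_leaf_index(const Level &lv, u32 symbol_index) {
  if (lv.separated) {
    // Separated mode: non-leaf iff p[i] == p[i-1] + 1, with p[-1] treated as 0.
    u32 prev = symbol_index == 0 ? 0 : lv.pointer[symbol_index - 1];
    u32 cur = lv.pointer[symbol_index];
    return cur == prev + 1 ? cur - 1 : max_u32;
  }
  // Default mode: indices map one-to-one; a node without children is a leaf.
  u32 prev = symbol_index == 0 ? 0 : lv.child_limit[symbol_index - 1];
  return lv.child_limit[symbol_index] == prev ? max_u32 : symbol_index;
}
```

For example, a default-mode level with child limits `{2, 2, 5}` and its separated form (pointers `{1, 1, 2}`, child limits `{2, 5}`) both report node 1 as a leaf; node 2 has non-leaf index 2 in the first case and 1 in the second.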

template<class Array>
std::string bit::Trie< Array >::debug_child_limit_arrays_str () const [inline]

Display the contents of the child limit arrays.

template<class Array>
std::string bit::Trie< Array >::debug_str () const [inline]

Display the contents of the trie.

template<class Array>
template<class T>
Iterator bit::Trie< Array >::find (const std::vector< T > & vec) const [inline]

Find a string of symbols in the trie.

Parameters:
vec = symbol string to find
Returns:
iterator to the string or empty iterator if not found.
Exceptions:
bit::invalid_argument if vec is empty
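A minimal sketch of the lookup, assuming the default (unseparated) layout in which `child_limit[l][i]` is one past the index of node i's last child on level l+1, so that the children of node i occupy `[child_limit[l][i-1], child_limit[l][i])` (lower bound 0 for i = 0), and assuming that the children of each node are sorted by symbol because insertions are made in increasing order. Plain vectors replace `Array`, a node index (or -1) replaces the `Iterator`, and `SimpleTrie` and `find_index` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using u32 = std::uint32_t;

// Hypothetical plain-vector trie in the default (unseparated) mode.
struct SimpleTrie {
  std::vector<std::vector<u32>> symbols;      // symbols[l][i]: symbol of node i on level l
  std::vector<std::vector<u32>> child_limit;  // child_limit[l][i]: end of node i's children on level l+1
};

// Returns the index of vec's last node on level vec.size()-1, or -1 if not found.
std::int64_t find_index(const SimpleTrie &t, const std::vector<u32> &vec) {
  if (vec.empty())
    throw std::invalid_argument("find_index: empty vector");  // stands in for bit::invalid_argument
  if (vec.size() > t.symbols.size())
    return -1;
  // Level 0 holds the children of the implicit root.
  std::size_t begin = 0;
  std::size_t end = t.symbols[0].size();
  std::size_t node = 0;
  for (std::size_t l = 0; l < vec.size(); ++l) {
    const std::vector<u32> &sym = t.symbols[l];
    // Binary search among the sorted children of the current parent.
    auto first = sym.begin() + static_cast<std::ptrdiff_t>(begin);
    auto last = sym.begin() + static_cast<std::ptrdiff_t>(end);
    auto it = std::lower_bound(first, last, vec[l]);
    if (it == last || *it != vec[l])
      return -1;
    node = static_cast<std::size_t>(it - sym.begin());
    if (l + 1 < vec.size()) {
      // Narrow the search range to the children of this node.
      begin = node == 0 ? 0 : t.child_limit[l][node - 1];
      end = t.child_limit[l][node];
    }
  }
  return static_cast<std::int64_t>(node);
}
```

Each step costs one binary search within a single child range, so a lookup takes O(n log k) time for a string of length n and at most k children per node.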

template<class Array>
template<class T>
Iterator bit::Trie< Array >::insert (const std::vector< T > & vec) [inline]

Insert a new string of symbols into the trie, or return an iterator to it if it already exists.

Insertion of a string (length n) is allowed only if the prefix (length n-1) exists already.

Parameters:
vec = symbols of the inserted string
Returns:
iterator positioned at the string.
Exceptions:
bit::invalid_argument if vec is an empty string or the prefix is not found

template<class Array>
Iterator bit::Trie< Array >::insert (Iterator it, u32 symbol) [inline]

Insert a new node as a child of the current node, or return an iterator to the existing node if it already exists.

Insertions should be made only at the end of the child level. For example, each 3-gram inserted must be greater than the previously inserted 3-gram.

Bug:
Erroneous insertion in the middle of a level is not detected and may cause undefined behavior. It would at least be possible to check that the previous child, if any, is smaller. Duplicates are not checked.
Parameters:
it = parent node that will get the new child node
symbol = the symbol of the new node
Returns:
iterator to the inserted node
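The ordering restriction can be explained under an assumed child-limit layout: if the children of node i occupy a contiguous range that ends where the next node's children begin, a new child can only be appended at the end of the child level, and only while no later node on the parent's level has children yet. The sketch below uses a hypothetical plain-vector `SimpleTrie` (with `child_limit[l]` kept for every level) and a hypothetical `insert_child`; it returns the existing node when the symbol equals the last child. Unlike the documented behavior, it also rejects out-of-order insertions instead of leaving them unchecked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using u32 = std::uint32_t;

// Hypothetical plain-vector trie; child_limit[l] has one entry per node on level l.
struct SimpleTrie {
  std::vector<std::vector<u32>> symbols;
  std::vector<std::vector<u32>> child_limit;
};

// Appends `symbol` as a child of node `parent` on level `level - 1` (parent is
// ignored for level 0, whose nodes are children of the implicit root).
// Returns the index of the new or existing child on `level`.
u32 insert_child(SimpleTrie &t, std::size_t level, std::int64_t parent, u32 symbol) {
  if (level > t.symbols.size())
    throw std::out_of_range("insert_child: parent level does not exist");
  if (level == t.symbols.size()) {  // create a new, empty level
    t.symbols.emplace_back();
    t.child_limit.emplace_back();
  }
  std::vector<u32> &sym = t.symbols[level];
  std::size_t begin = 0;
  std::size_t end = sym.size();
  if (level > 0) {
    const std::vector<u32> &limits = t.child_limit[level - 1];
    if (parent < 0 || static_cast<std::size_t>(parent) >= limits.size())
      throw std::out_of_range("insert_child: no such parent");
    std::size_t p = static_cast<std::size_t>(parent);
    begin = p == 0 ? 0 : limits[p - 1];
    end = limits[p];
    // Appending works only if the parent's children end the child level.
    if (end != sym.size())
      throw std::invalid_argument("insert_child: insertion in the middle of a level");
  }
  if (end > begin) {
    if (sym[end - 1] == symbol)
      return static_cast<u32>(end - 1);  // already exists: return it
    if (sym[end - 1] > symbol)
      throw std::invalid_argument("insert_child: symbols must increase");
  }
  sym.push_back(symbol);
  // The new node has no children yet: its limit is the current end of level+1.
  std::size_t next_size = level + 1 < t.symbols.size() ? t.symbols[level + 1].size() : 0;
  t.child_limit[level].push_back(static_cast<u32>(next_size));
  if (level > 0) {
    // The parent and every later (childless) node on its level now end one further.
    std::vector<u32> &limits = t.child_limit[level - 1];
    for (std::size_t j = static_cast<std::size_t>(parent); j < limits.size(); ++j)
      ++limits[j];
  }
  return static_cast<u32>(sym.size() - 1);
}
```

The loop over later nodes makes each insertion linear in the size of the parent level; it is written this way only for clarity.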

template<class Array>
template<class T>
Iterator bit::Trie< Array >::insert_new (const std::vector< T > & vec) [inline]

Insert a new string of symbols to the trie.

Insertion of a string (length n) is allowed only if the prefix (length n-1) exists already.

Parameters:
vec = symbols of the inserted string
Returns:
iterator positioned at the new string.
Exceptions:
bit::invalid_argument if the string already exists
bit::invalid_argument if vec is an empty string or the prefix is not found

template<class Array>
Iterator bit::Trie< Array >::insert_new (Iterator it, u32 symbol) [inline]

Insert a new node as a child of the current node.

Insertions should be made only at the end of the child level. For example, each 3-gram inserted must be greater than the previously inserted 3-gram.

Bug:
Erroneous insertion in the middle of a level is not detected and may cause undefined behavior. It would at least be possible to check that the previous child, if any, is smaller.
Parameters:
it = parent node that will get the new child node
symbol = the symbol of the new node
Returns:
iterator to the inserted node
Exceptions:
bit::invalid_argument if the node already exists
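Because insertions must arrive in increasing order and insert_new() rejects duplicates, a caller would plausibly sort its input first. The sketch below (helper names are hypothetical, not part of bit) orders n-grams by length and then lexicographically, so every prefix comes before its extensions and each level is filled in increasing order, and it reports duplicates before any insertion is attempted. `std::invalid_argument` stands in for `bit::invalid_argument`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

using u32 = std::uint32_t;
using Ngram = std::vector<u32>;

// Orders n-grams by length first, then lexicographically, so that every prefix
// precedes its extensions and each level is filled in increasing symbol order.
bool insertion_order(const Ngram &a, const Ngram &b) {
  if (a.size() != b.size())
    return a.size() < b.size();
  return a < b;  // std::vector compares lexicographically
}

// Sorts n-grams for insert_new-style insertion and rejects duplicates up front.
void prepare_for_insert_new(std::vector<Ngram> &ngrams) {
  std::sort(ngrams.begin(), ngrams.end(), insertion_order);
  // After sorting, equal n-grams are adjacent.
  if (std::adjacent_find(ngrams.begin(), ngrams.end()) != ngrams.end())
    throw std::invalid_argument("duplicate n-gram");
}
```

This does not verify that every prefix is present; insert_new() would still reject a string whose prefix is missing.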

template<class Array>
bool bit::Trie< Array >::is_separated (unsigned int level) [inline]

Check whether the leaves of the given level have been separated.

Parameters:
level = level to check
Returns:
true if separated
Exceptions:
bit::out_of_range if level is the highest level or greater

template<class Array>
u32 bit::Trie< Array >::num_nodes (unsigned int level) const [inline]

The number of nodes on a given level.

It is safe to call this for non-existent levels: zero is returned.
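A minimal sketch of this bounds behavior, assuming that the node count of a level equals the length of its symbol array (one symbol per node) and using a plain vector of vectors as a hypothetical stand-in for `m_symbol_arrays`:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u32 = std::uint32_t;

// Returns the number of nodes on `level`, or 0 for a level that does not exist.
u32 num_nodes(const std::vector<std::vector<u32>> &symbol_arrays, unsigned int level) {
  if (level >= symbol_arrays.size())
    return 0;
  return static_cast<u32>(symbol_arrays[level].size());
}
```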

template<class Array>
const Array& bit::Trie< Array >::pointer_array (unsigned int level) const [inline]

Read-only access to pointer arrays.

Parameters:
level = the level to access

template<class Array>
void bit::Trie< Array >::read (FILE * file) [inline]

Read the trie from a file.

Bug:
If an exception is thrown, the object may be left in an inconsistent state.
Parameters:
file = file stream to read from
Exceptions:
bit::io_error if read fails
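The on-disk format used by write() and read() is not documented on this page. As one way to avoid the inconsistent-state problem noted under Bug, the sketch below serializes plain `u32` arrays in a hypothetical length-prefixed format with `fwrite`/`fread`, reads everything into a temporary, and swaps it into place only after every read has succeeded, which gives the strong exception guarantee. `std::runtime_error` stands in for `bit::io_error`, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <vector>

using u32 = std::uint32_t;

// Writes one array as a u32 length followed by its entries (hypothetical format).
void write_array(FILE *file, const std::vector<u32> &a) {
  u32 n = static_cast<u32>(a.size());
  if (std::fwrite(&n, sizeof n, 1, file) != 1 ||
      (n > 0 && std::fwrite(a.data(), sizeof(u32), n, file) != n))
    throw std::runtime_error("write failed");
}

// Reads one array written by write_array.
std::vector<u32> read_array(FILE *file) {
  u32 n = 0;
  if (std::fread(&n, sizeof n, 1, file) != 1)
    throw std::runtime_error("read failed");
  std::vector<u32> a(n);
  if (n > 0 && std::fread(a.data(), sizeof(u32), n, file) != n)
    throw std::runtime_error("read failed");
  return a;
}

void write_levels(FILE *file, const std::vector<std::vector<u32>> &levels) {
  u32 num_levels = static_cast<u32>(levels.size());
  if (std::fwrite(&num_levels, sizeof num_levels, 1, file) != 1)
    throw std::runtime_error("write failed");
  for (const auto &a : levels)
    write_array(file, a);
}

// Reads all levels into a temporary and swaps it in only after every read has
// succeeded, so an exception leaves `levels` unchanged.
void read_levels(FILE *file, std::vector<std::vector<u32>> &levels) {
  std::vector<std::vector<u32>> tmp;
  u32 num_levels = 0;
  if (std::fread(&num_levels, sizeof num_levels, 1, file) != 1)
    throw std::runtime_error("read failed");
  for (u32 l = 0; l < num_levels; ++l)
    tmp.push_back(read_array(file));
  levels.swap(tmp);
}
```

Raw binary output like this depends on the platform's byte order, so a portable format would need a fixed endianness.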

template<class Array>
void bit::Trie< Array >::reserve_levels (unsigned int num_levels) [inline]

Reserve space for levels to avoid reallocating and moving arrays.

Parameters:
num_levels = number of levels to reserve

template<class Array>
void bit::Trie< Array >::separate_leafs (unsigned int level) [inline]

Separate leaf information from non-leaf information (removing possible compression).

This may save space if the level has relatively many leaf nodes and the non-leaf nodes carry external information that can be omitted for leaf nodes. Note that the highest level cannot be separated, because its nodes have no children.

Warning:
Invalidates all iterators pointing to the trie on this level or higher. Lower-level iterators remain valid.
Parameters:
level = the level for which the separation is done
Exceptions:
bit::out_of_range if level is the highest level or greater
bit::invalid_call if level is already separated

template<class Array>
u64 bit::Trie< Array >::size () const [inline]

The number of bytes allocated for the bit buffers of the trie structure.

Note that this does not count the memory needed for the std::vector containers holding the Array objects; that cost is negligible compared to even a small language model.

template<class Array>
const Array& bit::Trie< Array >::symbol_array (unsigned int level) const [inline]

Read-only access to symbol arrays.

Parameters:
level = the level to access

template<class Array>
void bit::Trie< Array >::uncompress () [inline]

Uncompress all levels.

It is safe to call this even if some levels are already uncompressed.

template<class Array>
void bit::Trie< Array >::uncompress (unsigned int level) [inline]

Uncompress a level.

It is safe to call this for a level that is already uncompressed.

Parameters:
level = level to uncompress
Exceptions:
bit::out_of_range if level exceeds number of levels

template<class Array>
void bit::Trie< Array >::unseparate_leafs (unsigned int level) [inline]

Undo the leaf separation.

It is safe to call this method even if the leaves are not separated.

Warning:
This call removes any compression.
Parameters:
level = level to unseparate
Exceptions:
bit::out_of_range if level is the highest level or greater
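Undoing the separation can be sketched as the inverse of that conversion, under the same assumptions and plain-vector stand-ins: a non-leaf node (p[i] == p[i-1] + 1) takes its stored limit `compact[p[i] - 1]`, and a leaf repeats the previous node's limit, which gives it an empty child range. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u32 = std::uint32_t;

// Rebuilds a default-mode child limit array (one entry per node) from a pointer
// array and a child limit array that has entries only for non-leaf nodes.
std::vector<u32> unseparate_leafs(const std::vector<u32> &pointer,
                                  const std::vector<u32> &compact_limits) {
  std::vector<u32> limits;
  limits.reserve(pointer.size());
  u32 prev_p = 0;      // p[-1] is treated as 0
  u32 prev_limit = 0;  // a leading leaf gets the empty range ending at 0
  for (u32 p : pointer) {
    // Non-leaf nodes take their stored limit; leaves repeat the previous limit.
    u32 limit = (p == prev_p + 1) ? compact_limits[p - 1] : prev_limit;
    limits.push_back(limit);
    prev_p = p;
    prev_limit = limit;
  }
  return limits;
}
```

For the pointer array `{1, 1, 2, 2}` and compact limits `{2, 5}`, this restores the per-node limits `{2, 2, 5, 5}`.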

template<class Array>
void bit::Trie< Array >::write (FILE * file) const [inline]

Write the trie to a file.

Parameters:
file = file stream to write to
Exceptions:
bit::io_error if write fails


Member Data Documentation

template<class Array>
std::vector<Array> bit::Trie< Array >::m_child_limit_arrays [private]

Arrays containing child limits for each level.
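This page does not spell out what a child limit stores. Assuming the usual compact-trie convention, in which each entry is one past the index of the node's last child on the next level, the layout can be illustrated by a depth-first walk that lists every stored vector. Plain vectors stand in for `Array`, and `SimpleTrie`, `collect`, and `all_vectors` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u32 = std::uint32_t;

// Hypothetical default-mode layout: the children of node i on level l occupy
// [child_limit[l][i-1], child_limit[l][i]) on level l+1 (lower bound 0 for i == 0).
struct SimpleTrie {
  std::vector<std::vector<u32>> symbols;
  std::vector<std::vector<u32>> child_limit;
};

// Depth-first walk over nodes [begin, end) of `level`, appending each node's path.
void collect(const SimpleTrie &t, std::size_t level, std::size_t begin,
             std::size_t end, std::vector<u32> &path,
             std::vector<std::vector<u32>> &out) {
  for (std::size_t i = begin; i < end; ++i) {
    path.push_back(t.symbols[level][i]);
    out.push_back(path);
    if (level + 1 < t.symbols.size()) {
      std::size_t child_begin = i == 0 ? 0 : t.child_limit[level][i - 1];
      collect(t, level + 1, child_begin, t.child_limit[level][i], path, out);
    }
    path.pop_back();
  }
}

// Returns every stored vector in depth-first order.
std::vector<std::vector<u32>> all_vectors(const SimpleTrie &t) {
  std::vector<std::vector<u32>> out;
  std::vector<u32> path;
  if (!t.symbols.empty())
    collect(t, 0, 0, t.symbols[0].size(), path, out);
  return out;
}
```

With symbols `{1, 2}` on level 0, `{5, 7, 3}` on level 1 and level-0 child limits `{2, 3}`, the walk yields {1}, {1, 5}, {1, 7}, {2}, {2, 3}: node 0 owns level-1 entries 0 and 1, and node 1 owns entry 2.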

template<class Array>
std::vector<Array> bit::Trie< Array >::m_pointer_arrays [private]

Arrays to store pointers to non-leaf indices.

Node i is a non-leaf node if and only if p[i] == p[i-1] + 1. Otherwise p[i] == p[i-1], and the node is a leaf without non-leaf information. The non-leaf index of a non-leaf node i is p[i] - 1.

In the default mode, the symbol and child limit arrays have exactly the same number of entries, and their indices map one-to-one. However, if a level has many leaf nodes, it may be more efficient to store child limit entries only for non-leaf nodes. A pointer array entry is then needed for each symbol to locate its child limit entry, if any. This also makes it easy to store other information separately for leaf and non-leaf nodes.

template<class Array>
std::vector<Array> bit::Trie< Array >::m_symbol_arrays [private]

Arrays containing symbols for each level.


The documentation for this class was generated from the following file:
Generated on Mon Jan 8 15:51:04 2007 for bit by  doxygen 1.4.6