bit::Perplexity Class Reference

Compute perplexity or cross-entropy of a language model.

#include <Perplexity.hh>

Public Member Functions

 Perplexity (const LM &lm)
 Create a new perplexity computer with a language model.
void reset ()
 Reset perplexity counters.
int num_symbols () const
 Number of symbols.
int num_words () const
 Number of words.
double score () const
 Log-probability of the test data.
float cross_entropy_per_word () const
 Compute the cross-entropy per word in bits, assuming that the score is a log10-probability.
float add_symbol (const std::string &symbol_str)
 Add a symbol to the evaluation set.

Public Attributes

struct {
   std::string   word_boundary_str
 Symbol used for computing number of words.
   std::string   unk_str
 Symbol used for unknown word.
} opt

Private Attributes

bool m_start_pending
 Are we waiting for the start of a sentence?
double m_score
 Score of the test data.
int m_num_symbols
 Number of symbols in the test data (not including sentence starts).
int m_num_words
 Number of words in the test data.
int m_num_sentences
 Number of sentences in the test data.
const LM * m_lm
 The language model used for computing the perplexity.
LM::Iterator m_it
 Iterator for walking in the language model.


Detailed Description

Compute perplexity or cross-entropy of a language model.


Constructor & Destructor Documentation

bit::Perplexity::Perplexity (const LM &lm) [inline]
 

Create a new perplexity computer with a language model.


Member Function Documentation

float bit::Perplexity::add_symbol (const std::string &symbol_str) [inline]
 

Add a symbol to the evaluation set.

Parameters:
symbol_str = symbol to add
Returns:
log10-probability of the symbol
Exceptions:
bit::invalid_argument if the sentence start was missing, if a sentence start was given before the sentence end, or if an unknown symbol was given
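
The three error conditions above can be illustrated with a small stand-in class. This is a sketch under assumptions, not the implementation of bit::Perplexity: ToyPerplexity is hypothetical, a unigram table replaces the real LM, "<s>" and "</s>" are assumed names for the sentence start and end, std::invalid_argument stands in for bit::invalid_argument, and falling back to unk_str for unseen symbols is a guess at how that option is used.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for bit::Perplexity. Everything here is hypothetical: the
// unigram table replaces the real LM, and "<s>" / "</s>" are assumed markers.
class ToyPerplexity {
public:
  std::string unk_str; // empty means unknown symbols are rejected

  explicit ToyPerplexity(std::map<std::string, float> log10_probs)
    : m_log10_probs(std::move(log10_probs)) {}

  // Returns the log10-probability of the symbol and updates the counters.
  float add_symbol(const std::string &symbol_str)
  {
    if (symbol_str == "<s>") {
      if (!m_start_pending)
        throw std::invalid_argument("sentence start before sentence end");
      m_start_pending = false;
      return 0.0f; // assumption: sentence starts are neither scored nor counted
    }
    if (m_start_pending)
      throw std::invalid_argument("sentence start missing");

    std::map<std::string, float>::const_iterator it =
      m_log10_probs.find(symbol_str);
    if (it == m_log10_probs.end() && !unk_str.empty())
      it = m_log10_probs.find(unk_str); // fall back to the unknown symbol
    if (it == m_log10_probs.end())
      throw std::invalid_argument("unknown symbol: " + symbol_str);

    m_score += it->second;
    ++m_num_symbols;
    if (symbol_str == "</s>")
      m_start_pending = true; // the next symbol must open a new sentence
    return it->second;
  }

  double score() const { return m_score; }
  int num_symbols() const { return m_num_symbols; }

private:
  std::map<std::string, float> m_log10_probs;
  bool m_start_pending = true;
  double m_score = 0.0;
  int m_num_symbols = 0;
};

// Usage sketch: score one sentence, then check the symbol counter.
inline void toy_perplexity_example()
{
  std::map<std::string, float> probs;
  probs["a"] = -1.0f;
  probs["</s>"] = -0.5f;
  ToyPerplexity ppl(probs);
  ppl.add_symbol("<s>");
  ppl.add_symbol("a");
  ppl.add_symbol("</s>");
  assert(ppl.num_symbols() == 2); // "<s>" is not counted
}
```

Tracking a single start-pending flag is enough to detect both ordering errors, which matches the m_start_pending member listed under Private Attributes.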

float bit::Perplexity::cross_entropy_per_word () const [inline]
 

Compute the cross-entropy per word in bits, assuming that the score is a log10-probability.

Exceptions:
bit::invalid_call if no words have been added yet
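
The conversion from a log10 score to bits per word can be sketched as follows. This is an illustration of the arithmetic only, not the library's code: the free functions cross_entropy_bits_per_word and perplexity_per_word are hypothetical, and std::logic_error stands in for bit::invalid_call.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical helper, not part of bit::Perplexity: converts a total
// log10-probability into cross-entropy in bits per word.
inline double cross_entropy_bits_per_word(double log10_score, int num_words)
{
  if (num_words <= 0)
    throw std::logic_error("no words yet"); // stand-in for bit::invalid_call
  // log2(p) = log10(p) * log2(10); cross-entropy is the negated average.
  return -log10_score * std::log2(10.0) / num_words;
}

// Perplexity is 2 raised to the cross-entropy in bits.
inline double perplexity_per_word(double log10_score, int num_words)
{
  return std::exp2(cross_entropy_bits_per_word(log10_score, num_words));
}

// Usage sketch: a total score of -3 (log10) over 3 words means an average
// per-word probability of 0.1, so the cross-entropy is log2(10) bits.
inline void cross_entropy_example()
{
  const double h = cross_entropy_bits_per_word(-3.0, 3);
  assert(std::fabs(h - std::log2(10.0)) < 1e-9);
}
```

In the example, the perplexity per word comes out as 10, the reciprocal of the average per-word probability.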

int bit::Perplexity::num_symbols () const [inline]
 

Number of symbols.

int bit::Perplexity::num_words () const [inline]
 

Number of words.

void bit::Perplexity::reset () [inline]
 

Reset perplexity counters.

double bit::Perplexity::score () const [inline]
 

Log-probability of the test data.


Member Data Documentation

LM::Iterator bit::Perplexity::m_it [private]
 

Iterator for walking in the language model.

const LM* bit::Perplexity::m_lm [private]
 

The language model used for computing the perplexity.

int bit::Perplexity::m_num_sentences [private]
 

Number of sentences in the test data.

int bit::Perplexity::m_num_symbols [private]
 

Number of symbols in the test data (not including sentence starts).

int bit::Perplexity::m_num_words [private]
 

Number of words in the test data.

double bit::Perplexity::m_score [private]
 

Score of the test data.

bool bit::Perplexity::m_start_pending [private]
 

Are we waiting for the start of a sentence?

struct { ... } bit::Perplexity::opt
 

std::string bit::Perplexity::unk_str
 

Symbol used for unknown word.

Empty if the unknown-word symbol is not used.

std::string bit::Perplexity::word_boundary_str
 

Symbol used for computing number of words.

It is assumed that the word boundary always comes after the sentence start and before the sentence end.
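
One way such a boundary symbol can drive the word count is sketched below. The counting rule is an assumption and may differ from what bit::Perplexity does: count_words is a hypothetical helper, "<s>", "</s>" and "<w>" are assumed symbol names, and the boundary is assumed to appear exactly once between adjacent words as well as after the sentence start and before the sentence end.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: counts words in a symbol stream in which the word
// boundary follows every sentence start, separates adjacent words, and
// precedes every sentence end.
inline int count_words(const std::vector<std::string> &symbols,
                       const std::string &word_boundary_str)
{
  int boundaries = 0;
  int sentences = 0;
  int plain_symbols = 0;
  for (const std::string &s : symbols) {
    if (s == "<s>")
      ++sentences;
    else if (s == "</s>")
      continue; // sentence ends are not words
    else if (!word_boundary_str.empty() && s == word_boundary_str)
      ++boundaries;
    else
      ++plain_symbols;
  }
  // Without a boundary symbol, every ordinary symbol is taken to be a word.
  if (word_boundary_str.empty())
    return plain_symbols;
  // A sentence with n words contains n + 1 boundaries under this convention.
  return boundaries - sentences;
}

// Usage sketch: two words split into sub-word symbols.
inline void count_words_example()
{
  const std::vector<std::string> symbols = {
    "<s>", "<w>", "per", "plex", "ity", "<w>", "test", "<w>", "</s>"};
  assert(count_words(symbols, "<w>") == 2);
}
```

With sub-word units the number of symbols exceeds the number of words, which is why the class tracks the two counts separately.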


The documentation for this class was generated from the following file:
Generated on Mon Jan 8 15:51:04 2007 for bit by  doxygen 1.4.6