Skip to main content

Popular posts from this blog

how the make HCL and G graphs, and on the fly compositon of HCL and G for KALDI

Well, I had again to do something ;-) The task is to generate/create/update a decoding graph for KALDI on the fly. In my case, I aim at changing a G (grammar) in the context of a dialogue system.

One can generate a new HCLG but this would take a lot of time as this involves FST determinization, epsilon-removal, minimization, etc. Therefore, I tried to use on-the-fly composition of statically prepared HCL and G. At first, I struggled with it but later I made it work. See

Here is a short summary:

At the end, I managed to get LabelLookAheadMatcher to work. It is mostly based on the code and examples in opendcd, e.g.

First, Here is how I build and prepare the HCL and G. Please not that OpenFST must be compiled with --enable-lookahead-fsts, see

#--------------- fstdeterminize ${lang}/L_disambig.fst | fstarcsort >…

Viterbi Algorithm in C++ and using STL

To practice my C++ and STL skills, I implemented the Viterbi algorithm example from the Wikipedia page: The original algorithm was implemented in Python. I reimplemented the example in C++ and I used STL (mainly vector and map classes).  This code is in public-domain. So, use it as you want. 
The complete solution for MS Visual C++ 2008 can be found at

// ViterbiSTL.cpp : is an C++ and STL implementatiton of the Wikipedia example // Wikipedia:
// It as accurate implementation as it was possible

#include "stdafx.h"
#include "string" #include "vector" #include "map" #include "iostream"
using namespace std;
//states = ('Rainy', 'Sunny') //  //observations = ('walk', 'shop', 'clean') //  //start_probability = {'Rainy': 0.6, 'Sunny': 0.…

Temporal-Difference Learning Policy Evaluation in Python

In the code bellow, is an example of policy evaluation for very simple task. Example is taken from the book: "Reinforcement Learning: An Introduction, Surto and Barto".

This is an example of policy evaluation for a random walk policy.

Example 6.2: Random Walk from the book:
"Reinforcement Learning: An Introduction, Surto and Barto"

The policy is evaluated by dynamic programing and TD(0).

In this example, the policy can start in five states 1, 2, 3, 4, 5 and end in
two states 0 and 6. The allowed transitions between the states are as follwes:

0 <-> 1 <-> 2 <-> 3 <-> 4 <-> 5 <-> 6

The reward for ending in the state 6 is 1.
The reward for ending in the state 0 is 0.

In any state, except the final states, you can take two actions: 'left' and 'right'.
In the final states the policy and episodes end.

Because this example implements the random walk policy then both actions can be
taken with th…