Well, once again I had to do something ;-) The task is to generate or update a decoding graph for KALDI on the fly. In my case, I want to change G (the grammar) in the context of a dialogue system.
One could regenerate the full HCLG, but this takes a long time because it involves FST determinization, epsilon removal, minimization, etc. Therefore, I tried on-the-fly (OTF) composition of a statically prepared HCL and G. I struggled with it at first, but eventually I made it work. See https://github.com/jpuigcerver/kaldi-decoders/issues/1
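To make the idea of on-the-fly composition concrete, here is a toy sketch in plain C++ (no OpenFst). It is not the code used in this post: the `Arc`, `Fst`, and `LazyCompose` names are made up for illustration, and the toy transducers are epsilon-free and unweighted. It only shows the principle that a composed state is expanded when the decoder first asks for it and is then cached, so nothing is built up front.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy epsilon-free, unweighted transducer (hypothetical types, not OpenFst).
struct Arc { int ilabel; int olabel; int next; };
using Fst = std::vector<std::vector<Arc>>;  // arcs indexed by source state

// A composed state is a pair (state in A, state in B). Its outgoing arcs
// are computed on the first request only and then served from a cache.
class LazyCompose {
 public:
  using State = std::pair<int, int>;
  struct OutArc { int ilabel; int olabel; State next; };

  LazyCompose(const Fst &a, const Fst &b) : a_(a), b_(b) {}

  const std::vector<OutArc> &Arcs(State s) {
    auto it = cache_.find(s);
    if (it != cache_.end()) return it->second;  // already expanded
    std::vector<OutArc> out;
    for (const Arc &x : a_[s.first])
      for (const Arc &y : b_[s.second])
        if (x.olabel == y.ilabel)  // A's output must match B's input
          out.push_back({x.ilabel, y.olabel, {x.next, y.next}});
    return cache_.emplace(s, std::move(out)).first->second;
  }

  // Number of composed states that have been expanded so far.
  std::size_t NumExpanded() const { return cache_.size(); }

 private:
  Fst a_, b_;  // copies, so the toy object is self-contained
  std::map<State, std::vector<OutArc>> cache_;
};
```

In this toy model, replacing G only means constructing a new `LazyCompose` with a different second argument. The real setup additionally needs lookahead matchers and filters so that the lazy composition does not explore many dead-end paths.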
Here is a short summary:
In the end, I managed to get LabelLookAheadMatcher to work. It is mostly based on the code and examples in opendcd, e.g. https://github.com/opendcd/opendcd/blob/master/script/makegraphotf.sh.
First, here is how I build and prepare HCL and G. Please note that OpenFST must be compiled with --enable-lookahead-fsts, see http://www.openfst.org/twiki/bin/view/FST/ReadMe.
#---------------
# Determinize the lexicon L (with disambiguation symbols) and sort its arcs.
fstdeterminize ${lang}/L_disambig.fst | fstarcsort > ${dir}/det.L.fst
#---------------
# Compose the context-dependency transducer C with the determinized L to get CL.
fstcomposecontext \
--context-size=$N --central-position=$P \
--read-disambig-syms=${lang}/phones/disambig.int \
--write-disambig-syms=${lang}/disambig_ilabels_${N}_${P}.int \
${dir}/ilabels_${N}_${P} ${dir}/det.L.fst | \
fstarcsort > ${dir}/CL.fst
#---------------
# Build H without self-loops (Ha).
make-h-transducer \
--disambig-syms-out=${dir}/h.disambig.int \
--transition-scale=$tscale \
${dir}/ilabels_${N}_${P} \
${tree} \
${model} > ${dir}/Ha.fst
# Ha.fst is copied unchanged rather than determinized.
cat ${dir}/Ha.fst > ${dir}/det.Ha.fst
#---------------
# Convert CL to an input-label lookahead FST, relabel Ha's output labels to match, and compose.
fstconvert \
--fst_type=ilabel_lookahead \
--save_relabel_ipairs=${dir}/h.orelabel ${dir}/CL.fst |
fstarcsort --sort_type=ilabel > ${dir}/la.CL.fst
fstrelabel --relabel_opairs=${dir}/h.orelabel ${dir}/det.Ha.fst | \
fstarcsort --sort_type=olabel | \
fstcompose - ${dir}/la.CL.fst > ${dir}/det.HaCL.fst
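#---------------
# Optimize HaCL, remove disambiguation symbols, add self-loops, and store the result as a const FST.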
fstdeterminize ${dir}/det.HaCL.fst | \
fstrmsymbols ${dir}/h.disambig.int | \
fstrmepslocal | \
fstpushspecial | \
fstminimizeencoded | \
add-self-loops --self-loop-scale=$loopscale --reorder=true ${model} - |
fstarcsort --sort_type=olabel |
fstconvert --fst_type=const > ${dir}/HCL.fst
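#-----------------------------
# Convert HCL to an output-label lookahead FST, relabel G's input labels to match,
# and compose the two statically into HCLrGr.fst.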
fstconvert --fst_type=olabel_lookahead --save_relabel_opairs=${dir}/g.irelabel ${dir}/HCL.fst > ${dir}/HCLr.fst
fstrelabel --relabel_ipairs=${dir}/g.irelabel ${lang}/G.fst | \
fstarcsort |
fstconvert --fst_type=const > ${dir}/Gr.fst
fstcompose ${dir}/HCLr.fst ${dir}/Gr.fst | \
fstconvert --fst_type=const > ${dir}/HCLrGr.fst
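Second, here is the C++ function that composes HCL and G on the fly using the lookahead matcher:
#include <fst/fstlib.h>
using namespace fst;

// Lazily composes ifst1 (the lookahead HCL) with ifst2 (the relabeled G).
// The caller owns the returned ComposeFst.
ComposeFst<StdArc>* OTFComposeFst(
    const StdFst &ifst1, const StdFst &ifst2,
    const CacheOptions& cache_opts = CacheOptions()) {
  typedef LookAheadMatcher<StdFst> M;
  typedef AltSequenceComposeFilter<M> SF;
  typedef LookAheadComposeFilter<SF, M> LF;
  typedef PushWeightsComposeFilter<LF, M> WF;
  typedef PushLabelsComposeFilter<WF, M> ComposeFilter;
  typedef M FstMatcher;
  ComposeFstOptions<StdArc, FstMatcher, ComposeFilter> opts(cache_opts);
  return new ComposeFst<StdArc>(ifst1, ifst2, opts);
}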
My observation is that to reach the same WER, I must use less aggressive pruning for the OTF-composed HCL and G, which results in about a 20% increase in RTF. If I fix the RTF instead, the WER is about 20% worse in relative terms for the OTF-composed HCL and G. So the OTF composition has some cost, but it is not that bad. It is usable.
Please note that the preparation of HCL and G is a bit different from the one in https://github.com/opendcd/opendcd/blob/master/script/makegraphotf.sh . For example, I could not determinize Ha.fst as it appeared to be non-functional. Also, the determinization of L is important; otherwise the final HCL graph will not be "small enough" and the OTF composition will not be that efficient.
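As a rough intuition for why determinizing L makes the graph smaller, here is a toy sketch in plain C++. It is a simplification and not what fstdeterminize actually does: it ignores word labels, weights, disambiguation symbols, and optional silence, and only compares one separate chain of states per pronunciation against a prefix tree in which pronunciations share their common leading phones. All names and the tiny lexicon in the usage note are made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Pron = std::vector<std::string>;  // one pronunciation = list of phones

// One separate chain per pronunciation, all leaving a shared start state.
std::size_t CountChainStates(const std::vector<Pron> &prons) {
  std::size_t n = 1;  // the start state
  for (const Pron &p : prons) n += p.size();
  return n;
}

// Prefix tree over the same pronunciations: common prefixes share states.
std::size_t CountTrieStates(const std::vector<Pron> &prons) {
  std::vector<std::map<std::string, int>> trie(1);  // node 0 is the root
  for (const Pron &p : prons) {
    int s = 0;
    for (const std::string &phone : p) {
      auto it = trie[s].find(phone);
      if (it != trie[s].end()) {
        s = it->second;  // reuse the existing branch
      } else {
        int next = static_cast<int>(trie.size());
        trie[s][phone] = next;  // record the new branch first,
        trie.emplace_back();    // then allocate the new node
        s = next;
      }
    }
  }
  return trie.size();
}
```

With the three hypothetical pronunciations `k ae t`, `k ae b`, and `k aa r`, the chain layout needs 10 states while the prefix tree needs 7. On a real lexicon with tens of thousands of words the sharing is much larger.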
Comments
I want to add to and change the lexicon dynamically so that I can add new names or OOV words to the recognition model. An immediate response would be much appreciated. Thanks!