With the availability of high-performance computing hardware and large volumes of video data, deep learning algorithms now enable human-understandable descriptions of videos. Automatic commentary generation for cricket videos takes advantage of these intelligent techniques. In the proposed approach, a VGG-16 network extracts visual patterns from video frames, which are then fed to an encoder-decoder LSTM model. The model accepts a variable-length sequence of input frames and produces a variable-length sequence of output words. Moreover, it captures temporal information to predict the line and length bowled by the bowler, the shot selected by the batsman, and the outcome of the ball. Because no cricket commentary dataset was previously available, a novel cricket commentary dataset containing video-commentary pairs is presented. Evaluation is also performed on the benchmark video captioning datasets Microsoft Video Description (MSVD) and MSR Video to Text (MSR-VTT). Captions generated by our model are evaluated with the standard video captioning metrics METEOR, BLEU, ROUGE-L, and CIDEr, on which it outperforms the baseline model.
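The following is a minimal sketch of the described pipeline, assuming a PyTorch implementation; the hidden sizes, embedding dimension, and class/variable names are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

class CommentaryGenerator(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # VGG-16 as a frozen visual feature extractor: drop the final
        # classification layer so each frame yields a 4096-d feature.
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        vgg.classifier = nn.Sequential(*list(vgg.classifier.children())[:-1])
        for p in vgg.parameters():
            p.requires_grad = False
        self.cnn = vgg
        # Encoder LSTM reads the variable-length frame-feature sequence
        # and summarizes the temporal context in its final state.
        self.encoder = nn.LSTM(4096, hidden_dim, batch_first=True)
        # Decoder LSTM, conditioned on that state, emits the commentary
        # one word at a time, allowing variable-length output.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frames, captions):
        # frames: (batch, num_frames, 3, 224, 224); captions: (batch, seq_len)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, state = self.encoder(feats)        # encode temporal information
        emb = self.embed(captions)
        hidden, _ = self.decoder(emb, state)  # decode conditioned on video
        return self.out(hidden)               # per-step vocabulary logits
```

In this sketch the encoder's final hidden state initializes the decoder, which is one common way such encoder-decoder LSTM captioners pass video context to the language model; the actual wiring used in the paper may differ.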