We present an approach to automatically generating verbal commentaries for tennis games. We introduce a novel application that requires a combination of techniques from computer vision, natural language processing and machine learning. A video sequence is first analysed using state-of-the-art computer vision methods to track the ball, fit the detected edges to the court model, track the players, and recognise their strokes. Based on the recognised visual attributes, we formulate the tennis commentary generation problem in the framework of long short-term memory (LSTM) recurrent neural networks as well as structured SVMs. In particular, we investigate the pre-embedding of descriptive terms and the choice of loss function for the LSTM. We introduce a new dataset of 633 annotated pairs of tennis videos and corresponding commentary. We perform both automatic and human-based evaluation, and demonstrate that the proposed pre-embedding and loss function lead to substantially improved accuracy of the generated commentary.