The paper proposes a hybrid method for generating cricket commentary that combines three types of neural networks: Convolutional Neural Networks (CNNs) for processing video frames, Long Short-Term Memory (LSTM) networks for sequential text generation, and Graph Convolutional Networks (GCNs) for semantic understanding. By integrating these components, the model generates ball-by-ball commentary that is coherent and context-aware. The pipeline first processes video frames from a cricket match with the CNN, whose feature maps retain the essential visual information. Fully connected layers then project these features into a representation suitable as input to the LSTM, which generates the commentary one word at a time while capturing the temporal dependencies inherent in ball-by-ball events. Finally, the GCN is applied to strengthen the semantic relationships among the generated captions. Evaluation metrics such as BLEU, METEOR, and ROUGE are used to assess the proficiency of the model.
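Since the paper is described at the architectural level only, the following is a minimal PyTorch sketch of the CNN → fully connected → LSTM pipeline it outlines. The class names, feature dimensions, and the ResNet-18 backbone are illustrative assumptions, and the GCN layer is a generic stand-in, since the paper's exact graph construction over captions is not specified here.

```python
# Minimal sketch of the described hybrid pipeline (assumptions: PyTorch,
# a ResNet-18 visual backbone, and toy dimensions; not the paper's code).
import torch
import torch.nn as nn
from torchvision import models


class CommentaryGenerator(nn.Module):
    """CNN encoder -> FC projection -> LSTM decoder, per the described pipeline."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # CNN: extract visual features from each video frame
        # (backbone choice is an assumption).
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier
        # Fully connected layer: map CNN features to the LSTM input size.
        self.fc = nn.Linear(512, embed_dim)
        # LSTM: generate the commentary one word at a time.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frames, captions):
        # frames: (batch, 3, H, W); captions: (batch, seq_len) token ids.
        feats = self.cnn(frames).flatten(1)       # (batch, 512)
        visual = self.fc(feats).unsqueeze(1)      # (batch, 1, embed_dim)
        words = self.embed(captions)              # (batch, seq_len, embed_dim)
        # Prepend the visual feature as the first step of the input sequence.
        seq = torch.cat([visual, words], dim=1)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                   # per-step vocabulary logits


class GCNLayer(nn.Module):
    """One graph-convolution step, H' = ReLU(A_hat H W), as a generic stand-in
    for the paper's semantic-refinement stage over generated captions."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, node_feats, adj_norm):
        # node_feats: (num_nodes, in_dim);
        # adj_norm: normalized adjacency matrix, (num_nodes, num_nodes).
        return torch.relu(adj_norm @ self.weight(node_feats))
```

At inference time the decoder would presumably be unrolled greedily or with beam search from the visual feature, feeding each predicted word back in as the next input, as is standard for this encoder-decoder captioning setup.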