Video Storytelling: Textual Summaries for Events

Li, Junnan; Wong, Yongkang; Zhao, Qi; Kankanhalli, Mohan S.

doi:10.1109/TMM.2019.2930041

arXiv:1807.09418 (cs)

[Submitted on 25 Jul 2018 (v1), last revised 14 May 2020 (this version, v3)]

Abstract: Bridging vision and natural language is a longstanding goal in computer vision and multimedia research. While earlier works focus on generating a single-sentence description for visual content, recent works have studied paragraph generation. In this work, we introduce the problem of video storytelling, which aims at generating coherent and succinct stories for long videos. Video storytelling introduces new challenges, mainly due to the diversity of the story and the length and complexity of the video. We propose novel methods to address these challenges. First, we propose a context-aware framework for multimodal embedding learning, where we design a Residual Bidirectional Recurrent Neural Network to leverage contextual information from the past and the future. Second, we propose a Narrator model to discover the underlying storyline. The Narrator is formulated as a reinforcement learning agent that is trained by directly optimizing the textual metric of the generated story. We evaluate our method on the Video Story dataset, a new dataset that we have collected to enable the study. We compare our method with multiple state-of-the-art baselines and show that it achieves better performance in both quantitative measures and a user study.
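
The idea of a residual bidirectional recurrent network can be illustrated with a small sketch. This is not the authors' architecture, which the abstract does not specify in detail; it is a minimal illustration that assumes a plain tanh recurrence, a hidden size equal to the clip feature size, and a residual connection that simply adds each clip's input feature to its forward and backward hidden states. All function names (`rnn_pass`, `residual_birnn`, `random_matrix`) are hypothetical.

```python
import math
import random


def rnn_pass(inputs, w_in, w_rec):
    """Run a simple tanh RNN over a sequence; return the hidden state at each step."""
    dim = len(w_rec)
    h = [0.0] * dim
    states = []
    for x in inputs:
        h = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][j] * h[j] for j in range(dim))
            )
            for i in range(dim)
        ]
        states.append(h)
    return states


def residual_birnn(inputs, fwd, bwd):
    """Add past context (forward pass) and future context (backward pass)
    to each clip feature through a residual connection.

    Assumption: hidden size equals feature size, so the three vectors can be summed.
    """
    forward = rnn_pass(inputs, *fwd)
    # Process the sequence in reverse, then flip back so states align with clips.
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]
    return [
        [x_i + f_i + b_i for x_i, f_i, b_i in zip(x, f, b)]
        for x, f, b in zip(inputs, forward, backward)
    ]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix, for illustration only."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    clips = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three clips, 2-d features
    fwd = (random_matrix(2, 2, rng), random_matrix(2, 2, rng))
    bwd = (random_matrix(2, 2, rng), random_matrix(2, 2, rng))
    for embedding in residual_birnn(clips, fwd, bwd):
        print(embedding)
```

Because of the residual connection, each output stays close to the original clip feature when the recurrent weights are small, while the backward pass lets later clips influence the embedding of earlier ones.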

Comments:	Published in IEEE Transactions on Multimedia
Subjects:	Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1807.09418 [cs.MM]
	(or arXiv:1807.09418v3 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.1807.09418
Journal reference:	J. Li, Y. Wong, Q. Zhao and M. S. Kankanhalli, "Video Storytelling: Textual Summaries for Events," in IEEE Transactions on Multimedia, 2019
Related DOI:	https://doi.org/10.1109/TMM.2019.2930041

Submission history

From: Junnan Li Dr
[v1] Wed, 25 Jul 2018 02:43:19 UTC (4,288 KB)
[v2] Sun, 18 Aug 2019 10:21:46 UTC (4,288 KB)
[v3] Thu, 14 May 2020 12:39:48 UTC (4,791 KB)