Research Post
Abstract: Data-to-text generation systems aim to produce text descriptions from input data (often represented in tabular form). A typical system requires a large number of training samples to learn the correspondence between tables and text. However, large training sets are expensive to obtain, limiting the applicability of these approaches in real-world scenarios. In this work, we focus on few-shot data-to-text generation. We observe that, while fine-tuned pretrained language models may generate plausible sentences, they suffer from low semantic coverage in the few-shot setting: important input slots tend to be missing from the generated text. To address this, we propose a search-and-learning approach that leverages pretrained language models but inserts the missing slots to improve semantic coverage. We further fine-tune our system on the search results to smooth out search noise, yielding better-quality text and substantially improving inference efficiency. Experiments show that our model achieves strong performance on the E2E and WikiBio datasets. Notably, we cover 98.35% of input slots on E2E, largely alleviating the low-coverage problem.
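The low-coverage problem described above can be made concrete by checking how many input slots actually surface in the generated sentence. The sketch below is a minimal, hypothetical illustration (the function name and the simple substring-matching rule are our own assumptions, not the paper's implementation), using an E2E-style meaning representation:

```python
# Hypothetical sketch: "semantic coverage" as the fraction of input
# slot values that appear in the generated text. The matching rule
# (case-insensitive substring) is a simplifying assumption.

def slot_coverage(slots, text):
    """Return the fraction of slot values mentioned in the text."""
    text_lower = text.lower()
    covered = [v for v in slots.values() if v.lower() in text_lower]
    return len(covered) / len(slots) if slots else 1.0

# Example with an E2E-style table: three slots, one missing from the output.
slots = {"name": "The Eagle", "food": "French", "area": "riverside"}
generated = "The Eagle serves French food."
print(slot_coverage(slots, generated))  # 2 of 3 slots covered
```

A search-and-learning system would detect the missing "riverside" slot and insert it, rather than accepting the fluent but incomplete sentence.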
Aug 8th 2022