A minimally supervised approach for question generation: what can we learn from a single seed?


Sérgio Curto


  • 15:00, Friday, October 07th, 2011
  • Room 336



In this paper, we investigate how many quality natural language questions can be generated from a single question/answer pair (a seed). In our approach, we learn patterns that relate the various levels of linguistic information in the question/answer seed to the same levels of information in text. These patterns contain lexical, syntactic and semantic information and, when matched against a target document, allow new question/answer pairs to be generated. Here, we focus specifically on the task of generating questions. Several works, for instance in Question Answering, explore the rewriting of questions to create (usually lexical) patterns; instead, we use several levels of linguistic information: lexical, syntactic and semantic (through the use of named entities). Also, such patterns are commonly hand-crafted, whereas in our strategy they are learned automatically from a single seed. Preliminary results show that with the single question/answer seed pair “When was Leonardo da Vinci born?”/1452 we manage to generate several questions (from documents related to 25 personalities), of which 80% were evaluated as plausible.
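
To make the idea of learning a pattern from a single seed concrete, here is a minimal sketch in Python. It is not the method presented in the talk: it works at the lexical level only, replaces named-entity recognition with two crude regular expressions, and assumes a hypothetical supporting sentence for the seed. The names PERSON, YEAR, learn_pattern and generate_questions are illustrative assumptions.

```python
import re

# Crude stand-ins for a named-entity recogniser (assumption, not from the paper).
PERSON = r"(?P<person>[A-Z][a-z]+(?: (?:[a-z]{2,3} )?[A-Z][a-z]+)+)"
YEAR = r"(?P<year>\d{4})"


def learn_pattern(seed_question, entity, answer, support_sentence):
    """Generalise one seed into a (regex, question template) pair.

    The text found between the seed entity and the seed answer in the
    supporting sentence is kept as the lexical context of the pattern.
    """
    start = support_sentence.find(entity)
    if start < 0:
        return None
    after_entity = start + len(entity)
    end = support_sentence.find(answer, after_entity)
    if end < 0:
        return None
    context = support_sentence[after_entity:end]
    regex = re.compile(PERSON + re.escape(context) + YEAR)
    # Turn the seed question into a template with a slot for the entity.
    template = seed_question.replace(entity, "{person}")
    return regex, template


def generate_questions(regex, template, text):
    """Match the learned pattern against text and emit question/answer pairs."""
    return [
        (template.format(person=m.group("person")), m.group("year"))
        for m in regex.finditer(text)
    ]


# Seed from the abstract; the supporting sentence is a hypothetical example.
learned = learn_pattern(
    "When was Leonardo da Vinci born?",
    "Leonardo da Vinci",
    "1452",
    "Leonardo da Vinci was born in 1452 in Vinci.",
)
regex, template = learned

target = (
    "Isaac Newton was born in 1642 in Woolsthorpe. "
    "Marie Curie was born in 1867 in Warsaw."
)
pairs = generate_questions(regex, template, target)
for question, answer in pairs:
    print(question, "/", answer)
```

A purely lexical pattern like this one only matches sentences that repeat the seed's wording ("was born in"); the syntactic and semantic levels described in the abstract are what would allow matching differently phrased sentences.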

Note: This seminar will be held in English.