Dialog Corpus and Evaluation

OKBQA-7 Hackathon

2018.08.07 - 2018.08.08

Task 2. Dialog Corpus and Evaluation


Alongside the growing interest in AI research, research and application development for AI agents such as Amazon Echo, Apple Siri, and Google Assistant have also taken off. These conversational agents focus on task-oriented dialog, that is, conversations that respond to a user's requests (e.g., "play music").

However, actual human-to-human conversations do not consist of one-sided requests and responses. Instead, the participants exchange information and ask follow-up questions when something in the other's statement is missing or unclear. For example, when a student takes a class with a teacher, it is natural for the student to ask about what they do not know. In addition, person-to-person conversations stay consistent around a single topic: the participants communicate based on the knowledge they have, and when they lack knowledge, they ask the other party for the information they need.

Task 2. Dialog Corpus and Evaluation aims to develop a conversational agent that can achieve the following goals:

    • Conduct a consistent conversation based on given knowledge

    • Generate conversation that provides knowledge to the other party based on given knowledge

    • Generate conversation that identifies insufficient knowledge and acquires it from the other party


In order to build and evaluate conversational agents that achieve the above goals, good training data, adequate evaluation methods, and an implementation of a dialog model based on them are all necessary.

In the 2018 OKBQA Hackathon, Task 2. Dialog Corpus and Evaluation aims to build and evaluate a conversational agent, and encourages participation by sharing the following accomplishments:

    • Sharing and discussing the training data (dialog corpus)

    • Sharing and discussing how to build training data

    • Sharing and discussing the evaluation metric

    • Dialog model (baseline model)


There are no special restrictions on participating in this task.


Our dataset consists of free conversations between two people. The two workers were each given different basic information about a soccer player and asked to engage in a free conversation. They could draw on their existing knowledge as well as ask and answer questions using the given basic information, producing varied conversations. However, the workers were instructed to prioritize contextual conversation over purely knowledge-based exchanges. The dataset thus aims to model how people talk about a subject (a soccer player): using the knowledge they have and asking, within the context, for the knowledge they need.

The dataset follows the data format of ConvAI2. More details can be found in our GitHub repository.
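As an illustration, a minimal reader for a ConvAI2-style plain-text dialog file might look like the sketch below. It assumes the layout used in the ConvAI2 release (each line starts with a turn number that restarts at 1 for a new episode, persona lines begin with "your persona:", and dialogue lines hold tab-separated utterance and response fields); the exact fields in our corpus may differ, so check the GitHub repository for the authoritative format.

```python
def parse_convai2(lines):
    """Parse ConvAI2-style lines into episodes of (persona, turns).

    Assumed layout (based on the ConvAI2 release, not this corpus):
      "1 your persona: i am a soccer fan."
      "2 <utterance>\t<response>"
    Numbering restarting at 1 marks the start of a new episode.
    """
    episodes, persona, turns = [], [], []
    for raw in lines:
        num, _, rest = raw.strip().partition(" ")
        if num == "1" and (persona or turns):  # new episode begins
            episodes.append({"persona": persona, "turns": turns})
            persona, turns = [], []
        if rest.startswith("your persona:"):
            persona.append(rest[len("your persona:"):].strip())
        else:
            fields = rest.split("\t")
            utterance = fields[0]
            response = fields[1] if len(fields) > 1 else ""
            turns.append((utterance, response))
    if persona or turns:  # flush the last episode
        episodes.append({"persona": persona, "turns": turns})
    return episodes

sample = [
    "1 your persona: i am a soccer fan.",
    "2 which club does he play for?\the plays for FC Barcelona.",
]
episodes = parse_convai2(sample)
```

Here `episodes` holds one episode with one persona line and one (utterance, response) turn.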


We also use the same evaluation method as ConvAI2, which focuses on the standard dialog task of predicting the next utterance given the dialogue history. We evaluate the task using three metrics: 1) perplexity, 2) F1 score, and 3) hits@k.
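The three metrics can be sketched as follows, using the standard definitions from the ConvAI2 evaluation (word-overlap F1 against the gold response, hits@k over a ranked candidate list, and perplexity from per-token log-probabilities). The function names and tokenization by whitespace are illustrative assumptions, not the official scoring script.

```python
import math
from collections import Counter

def f1_score(prediction, gold):
    """Word-overlap F1 between a predicted and a gold utterance
    (whitespace tokenization is an assumption for this sketch)."""
    pred_toks, gold_toks = prediction.split(), gold.split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def hits_at_k(ranked_candidates, gold, k):
    """1.0 if the gold response appears among the top-k candidates
    ranked by the model, else 0.0; averaged over a test set."""
    return 1.0 if gold in ranked_candidates[:k] else 0.0

def perplexity(token_log_probs):
    """Exponential of the average negative log-probability the model
    assigns to each gold token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

f1 = f1_score("he plays for barcelona", "he plays for real madrid")
hit = hits_at_k(["hello there", "he plays for real madrid"],
                "he plays for real madrid", k=2)
ppl = perplexity([math.log(0.5)] * 4)
```

In this toy example, three of the four predicted words overlap with the five gold words, giving F1 = 2/3; the gold response is in the top 2 candidates, so hits@2 = 1.0; and a model assigning probability 0.5 to every token has perplexity 2.0.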