Liang Qiu

Selected Reading (Nov 2021)

Updated: Feb 11, 2022


"We propose an approach that learns to generate an internet search query based on the context, and then conditions on the search results to finally generate a response, a method that can employ up-to-the-minute relevant information. We train and evaluate such models on a newly collected dataset of human-human conversations whereby one of the speakers is given access to internet search during knowledge-driven discussions in order to ground their responses."

Pros: interesting idea to incorporate the latest information.
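The flow described in the quote (generate a query, search, then condition the response on the results) can be sketched as a three-stage pipeline. This is only an illustration: `respond_with_search`, the toy `CORPUS`, and the three stand-in functions are hypothetical names, and in the paper the query generator and the responder are learned models backed by a real internet search engine.

```python
from typing import Callable, List

# Hypothetical in-memory "internet" so the sketch runs offline.
CORPUS = [
    "MultiWOZ is a task-oriented dialogue dataset.",
    "GQA is a visual question answering benchmark.",
]


def _tokens(text: str) -> set:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip("?.,!").lower() for w in text.split()}


def last_turn_query(context: List[str]) -> str:
    """Stand-in for a learned query generator: reuse the last utterance."""
    return context[-1]


def keyword_search(query: str) -> List[str]:
    """Stand-in for a search engine: rank documents by word overlap."""
    q = _tokens(query)
    scored = [(len(q & _tokens(doc)), doc) for doc in CORPUS]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored]


def echo_generate(context: List[str], documents: List[str]) -> str:
    """Stand-in for a learned responder: repeat the top document."""
    return documents[0] if documents else "I could not find anything on that."


def respond_with_search(
    context: List[str],
    make_query: Callable[[List[str]], str],
    search: Callable[[str], List[str]],
    generate: Callable[[List[str], List[str]], str],
    top_k: int = 3,
) -> str:
    """Context -> search query -> retrieved documents -> grounded response."""
    query = make_query(context)
    documents = search(query)[:top_k]
    return generate(context, documents)
```

Each stage is passed in as a callable, so a trained model or a real search client could replace any of the stand-ins without touching the pipeline itself.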


"We propose MDETR, an end-to-end modulated detector that detects objects in an image conditioned on a raw text query, like a caption or a question. We use a transformer-based architecture to reason jointly over text and image by fusing the two modalities at an early stage of the model. We pre-train the network on 1.3M text-image pairs, mined from pre-existing multi-modal datasets having explicit alignment between phrases in text and objects in the image. Our approach can be easily extended for visual question answering, achieving competitive performance on GQA and CLEVR."

Pros: could be used on IconQA.


DialogRPT: We leverage social media feedback data (number of replies and upvotes) to build a large-scale training dataset for feedback prediction. Each comment has its own number of replies and upvotes (termed as Likes in some communities). These can be used as engagingness labels after careful normalization and formulation.

Pros: read this for value ranking.


"To aid the research community in the development of commonsense dialogue models, we are publicly releasing a large, multiturn, open-domain dialogue dataset that is focused on commonsense knowledge. To create dialogue examples, we provided workers with prompts culled from SocialIQA, a large-scale benchmark for commonsense reasoning about social situations, which is based on the ATOMIC knowledge graph. The prompts are sentences like 'Addison wanted to go on a trip to Mexico and messaged all of his friends to set up a schedule' or 'Tracy performed her function.'"

Pros: could be used for evaluating ValueNet; the data quality is decent.


"In this work, we focus on stylistic control and evaluation for schema-guided NLG, with joint goals of achieving both semantic and stylistic control. We experiment in detail with various controlled generation methods for large pretrained language models: specifically, conditional training, guided fine-tuning, and guided decoding. We discuss their advantages and limitations, and evaluate them with a broad range of automatic and human evaluation metrics."

Pros: could be used for value-guided dialogue generation.

Cons: no code found.


"Two agents then have a debate where they alternate revealing pixels, stopping at a total of 6 revealed pixels (so the judge sees only a little bit of information in total). One debater is honest and tries to make the judge guess right, the other debater tries to make the judge guess wrong."

Pros: a good starting point for safe AI; interesting debate setting.
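The turn structure in the quote can be sketched as a small game loop. Everything below is an assumption for illustration: the honest debater moving first, the two pixel-picking heuristics, and the `mean_brightness_judge` stand-in are hypothetical, whereas the original work uses trained agents and a trained judge classifier.

```python
import numpy as np


def reveal_brightest(image, mask):
    """Hypothetical debater policy: reveal the brightest hidden pixel."""
    hidden = np.where(mask, -np.inf, image)
    return np.unravel_index(np.argmax(hidden), image.shape)


def reveal_darkest(image, mask):
    """Hypothetical debater policy: reveal the darkest hidden pixel."""
    hidden = np.where(mask, np.inf, image)
    return np.unravel_index(np.argmin(hidden), image.shape)


def mean_brightness_judge(revealed, mask):
    """Stand-in judge that sees only the revealed pixels."""
    return "bright" if revealed[mask].mean() > 0.5 else "dark"


def run_debate(image, first, second, judge, total_reveals=6):
    """Alternate pixel reveals between two debaters, then ask the judge.

    Assumes `first` moves on even turns; the quote does not say who starts.
    """
    mask = np.zeros(image.shape, dtype=bool)
    debaters = [first, second]
    for turn in range(total_reveals):
        row, col = debaters[turn % 2](image, mask)
        mask[row, col] = True
    revealed = np.where(mask, image, 0.0)  # hidden pixels are zeroed out
    return judge(revealed, mask), mask
```

The point of the sketch is the information bottleneck: the judge only ever receives `revealed` and `mask`, so its verdict depends entirely on which six pixels the debaters chose to show.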


"Based on MultiWOZ, FusedChat appends or prepends open-domain dialogues (ODD) to every existing task-oriented dialogues (TOD)."


"ACCENTOR consists of human-annotated chit-chat additions to the 23.8K dialogues from Schema Guided Dialogue (SGD) and MultiWOZ 2.1."

Notes: similar to the previous one, but adds chit-chat to any turn of both MultiWOZ and SGD dialogues. FusedChat asks humans to write the chit-chat directly, while ACCENTOR uses neural models to generate candidates and asks humans to choose among them.


"The AI agent starts by acting randomly in the environment. Periodically, two video clips of its behavior are given to a human, and the human decides which of the two clips is closest to fulfilling its goal. The AI gradually builds a model of the goal of the task by finding the reward function that best explains the human's judgments. Note there's no need for the feedback to align with the environment's normal reward function."

Pros: the authors claim to have scaled up learning from human feedback to much more complicated tasks (e.g., a backflip).

Cons: no code found.
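"Finding the reward function that best explains the human's judgments" can be made concrete with a small example. The sketch below assumes a linear reward over per-step features and a Bradley-Terry style preference model (the probability of preferring a clip is a sigmoid of the difference in predicted returns). Both are simplifying assumptions, and `clip_return` and `fit_reward` are hypothetical names; the original work trains a neural network reward model.

```python
import numpy as np


def clip_return(w, clip):
    """Sum of predicted per-step rewards for a clip of shape (T, d)."""
    return float(np.sum(clip @ w))


def fit_reward(pairs, prefs, dim, lr=0.1, epochs=200):
    """Fit linear reward weights by gradient ascent on the preference log-likelihood.

    pairs: list of (clip_a, clip_b), each an array of shape (T, d)
    prefs: list with 1.0 where the human preferred clip_a, else 0.0
    """
    w = np.zeros(dim)
    for _ in range(epochs):
        grad = np.zeros(dim)
        for (a, b), y in zip(pairs, prefs):
            # Difference in summed features, so diff @ w = return(a) - return(b).
            diff = a.sum(axis=0) - b.sum(axis=0)
            p_a = 1.0 / (1.0 + np.exp(-(diff @ w)))  # P(human prefers a)
            grad += (y - p_a) * diff
        w += lr * grad / len(pairs)
    return w
```

The fitted weights are only identified up to scale, which is fine for this purpose: the agent needs a reward that ranks behaviors the way the human does, not one that matches the environment's own reward values.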


"Iterated amplification is a method for generating a training signal for the latter types of tasks, under certain assumptions. Namely, although a human can't perform or judge the whole task directly, we assume that a human can, given a piece of the task, identify clear smaller components of which it’s made up. The goal here is to match supervised learning with less information, not to surpass it."

Cons: experiments are limited to toy algorithmic tasks. No code found.


"BlenderBot, the largest-ever open-domain chatbot. It outperforms others in terms of engagement and also feels more human, according to human evaluators."


* read the abstract, introduction, and conclusion. ** read the full paper. *** read the full paper and code carefully.
