What is WikiReading

@ WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia, ACL 2016

A new dataset built from structured knowledge statements in Wikidata


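Each Wikidata statement (item, property, value) becomes an instance (document, question, answer): the document is the item's Wikipedia article, the question is the property, and the answer is the value.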
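The mapping can be sketched in Python. This is a minimal illustration under stated assumptions, not the dataset's actual construction code: `Instance`, `statements_to_instances`, and the toy items are hypothetical, and the sketch makes one instance per triple without merging multi-valued properties.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    document: str  # text of the item's Wikipedia article
    question: str  # the Wikidata property, used as the question
    answer: str    # the property value, used as the answer


def statements_to_instances(statements, articles):
    """Map (item, property, value) triples to (document, question, answer).

    `articles` maps a (hypothetical) item ID to its article text;
    statements whose item has no article are skipped.
    """
    return [
        Instance(articles[item], prop, value)
        for item, prop, value in statements
        if item in articles
    ]


# Made-up toy data, not taken from Wikidata.
articles = {"Q_A": "Toy Town is a small town on the Toy River."}
statements = [
    ("Q_A", "instance of", "town"),
    ("Q_A", "located next to body of water", "Toy River"),
    ("Q_B", "instance of", "river"),  # no article for Q_B, so it is skipped
]
instances = statements_to_instances(statements, articles)
print(len(instances))  # 2
```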

Statistics

  • Total instances: 18.58M
  • Train/validation/test split (85%/10%/5%): 16.03M / 1.89M / 0.95M
  • Documents: 4.7M unique; 5.31 instances per document on average (median 4, max 879); 489.2 words per document on average (median 203)
  • Properties: 867 unique; the 20 most frequent cover 75% of instances and the top 180 cover 99%; properties are either categorical or relational
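To make the coverage figures concrete, here is a hypothetical helper that computes the share of instances whose property is among the k most frequent ones. The toy property list is invented and only illustrates the calculation.

```python
from collections import Counter


def topk_coverage(properties, k):
    """Fraction of instances whose property is among the k most frequent ones.

    `properties` holds one property name per instance.
    """
    counts = Counter(properties)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(properties)


# Toy property column (one entry per instance), not the real data.
props = ["instance of"] * 6 + ["sex or gender"] * 2 + ["spouse", "genre"]
print(topk_coverage(props, 1))  # 0.6
print(topk_coverage(props, 2))  # 0.8
```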

Features

  • long document
  • short question
  • structured knowledge

A Real Case


Baselines


State-of-the-art Methods

Coarse-to-fine Model

@ Coarse-to-Fine Question Answering for Long Documents, ACL 2017
@ Hierarchical Question Answering for Long Documents, arXiv 2016

Intuition

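Reading a long document end to end with an RNN is slow: a fast coarse model first selects the relevant sentences, then a slower fine model reads only those to produce the answer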


Model

Sentence Selection Methods
  • BoW Model

  • Chunked BoW Model

  • Convolutional Neural Network Model
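
A minimal sketch of BoW sentence selection, assuming the question and each sentence are represented by their averaged word embeddings and scored by a dot product, with a softmax giving the selection distribution p(sentence | question, document). The scoring function, helper names, and toy embeddings are assumptions; the chunked BoW and CNN variants would change only how each sentence vector is computed.

```python
import numpy as np


def bow_vector(tokens, emb):
    """Average the embeddings of known tokens (zero vector if none are known)."""
    dim = next(iter(emb.values())).shape[0]
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def select_sentence_probs(question, sentences, emb):
    """Softmax over dot-product scores between question and sentence BoW vectors."""
    q = bow_vector(question, emb)
    scores = np.array([bow_vector(s, emb) @ q for s in sentences])
    scores -= scores.max()  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()


# Hypothetical 2-d embeddings for illustration only.
emb = {"born": np.array([1.0, 0.0]), "date": np.array([1.0, 0.0]),
       "paris": np.array([0.0, 1.0]), "city": np.array([0.0, 1.0])}
sentences = [["she", "was", "born", "in", "1815"], ["paris", "is", "a", "city"]]
p = select_sentence_probs(["date", "of", "birth"], sentences, emb)
print(p.argmax())  # 0
```
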
Document Representation
  • Hard Attention

  • Soft Attention
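The two document representations can be sketched as follows, assuming sentences are zero-padded to the same length. Hard attention keeps the single most probable sentence (during training a sentence would be sampled instead); soft attention averages the sentences position by position, weighted by their selection probabilities. Function names and array shapes are illustrative assumptions.

```python
import numpy as np


def hard_attention(sentence_embs, probs):
    """Return the word-embedding matrix of the most probable sentence."""
    return sentence_embs[int(np.argmax(probs))]


def soft_attention(sentence_embs, probs):
    """Position-wise weighted average of sentence matrices.

    sentence_embs: shape (num_sentences, max_len, dim), zero-padded.
    probs: shape (num_sentences,), summing to 1.
    Returns shape (max_len, dim).
    """
    return np.tensordot(probs, sentence_embs, axes=1)


sentence_embs = np.array([[[1.0, 0.0], [0.0, 0.0]],
                          [[0.0, 1.0], [1.0, 1.0]]])
probs = np.array([0.75, 0.25])
print(hard_attention(sentence_embs, probs))
print(soft_attention(sentence_embs, probs))
```

Because the soft representation is a differentiable function of the selection probabilities, it can be trained end to end, whereas the hard choice cannot be differentiated directly.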

Learning Methods
  • Distant Supervision

    Gold sentence: the first sentence containing the full answer string
    Fallback: the first sentence of the document if no sentence fully matches


  • Reinforcement Learning

  • Soft Attention Learning
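
A sketch of two of these training signals. `distant_label` applies the distant supervision rule with a plain substring match (an assumption about what "full match" means); `reinforce_grad` is the standard REINFORCE gradient for a softmax sentence selector, with the reward left to the caller (for example, how well the answer model does on the sampled sentence). Both function names are hypothetical.

```python
import numpy as np


def distant_label(sentences, answer):
    """Index of the first sentence containing the full answer string, else 0."""
    for i, sentence in enumerate(sentences):
        if answer in sentence:
            return i
    return 0  # no full match: fall back to the first sentence


def reinforce_grad(logits, sampled, reward, baseline=0.0):
    """Gradient w.r.t. the selection logits of -(reward - baseline) * log p(sampled).

    With p = softmax(logits), d log p[sampled] / d logits = onehot(sampled) - p.
    A gradient-descent step raises p(sampled) when reward > baseline.
    """
    p = np.exp(logits - logits.max())
    p /= p.sum()
    onehot = np.zeros_like(p)
    onehot[sampled] = 1.0
    return -(reward - baseline) * (onehot - p)


sentences = ["The bridge spans a river.", "It opened in 1932.", "Since 1932 it has carried trams."]
print(distant_label(sentences, "1932"))  # 1
print(distant_label(sentences, "1950"))  # 0
print(reinforce_grad(np.zeros(2), sampled=0, reward=1.0))  # [-0.5  0.5]
```
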

Result


Discussion

70%, 10%, 20%

Sliding-Window Encoder Attentive Reader (SWEAR)

@ Accurate Supervised and Semi-Supervised Machine Reading for Long Documents, EMNLP 2017

Intuition

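Long documents: split the document into sliding-window chunks and encode each chunk, instead of running a single RNN over the whole text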
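
A minimal sliding-window chunker for illustration; `sliding_windows`, `size`, and `stride` are hypothetical names, and the values below are not the ones used by SWEAR.

```python
def sliding_windows(tokens, size, stride):
    """Split tokens into windows of `size` tokens, starting every `stride` tokens.

    The last window may be shorter; with stride <= size every token
    appears in at least one window.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    for start in range(0, len(tokens), stride):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return windows


print(sliding_windows(list(range(7)), size=4, stride=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6]]
```

With stride smaller than size the windows overlap, so an answer span cut by one window boundary can still fall entirely inside a neighboring window.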

Model


  • Attention Method
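A sketch of attention over window encodings: each window vector is scored against the question vector, the scores are normalized with a softmax, and the weighted sum would serve as the document representation. The dot-product scoring and the function name `attend` are assumptions; the paper's attention may be parameterized differently.

```python
import numpy as np


def attend(question_vec, window_encodings):
    """Softmax attention of a question vector over window encodings.

    question_vec: shape (dim,); window_encodings: shape (num_windows, dim).
    Returns (weights, document_vector).
    """
    scores = window_encodings @ question_vec     # one score per window
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights, weights @ window_encodings   # weighted sum, shape (dim,)


q = np.array([1.0, 0.0])
windows = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
weights, doc_vec = attend(q, windows)
print(weights.argmax(), doc_vec.shape)  # 0 (2,)
```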

Result


Feedback & Advice