[with apologies for cross-posting]


We are pleased to invite participants to the Shared Task at the 2019 Conference on Computational Natural Language Learning (CoNLL):

  Cross-Framework Meaning Representation Parsing (MRP 2019)

For background on the nature of the task and its schedule, please see:

  http://mrp.nlpl.eu

If you are potentially interested, please sign up for future updates:

  http://lists.nlpl.eu/mailman/listinfo/mrp-users

A sample of sentences annotated with semantic graphs in all frameworks:

  http://svn.nlpl.eu/mrp/2019/public/sample.tgz

June 3, 2019, is the closing date for ‘white-listing’ third-party data:

  http://svn.nlpl.eu/mrp/2019/public/resources.txt


OBJECTIVES

The goal of the task is to advance data-driven parsing into graph-structured representations of sentence meaning.  All things semantic have been receiving heightened attention in recent years.  Despite remarkable advances in vector-based (continuous and distributed) encodings of meaning, ‘classic’ (discrete and hierarchically structured) semantic representations will continue to play an important role in ‘making sense’ of natural language.  While parsing has long been dominated by tree-structured target representations, there is now growing interest in general graphs as more expressive and arguably more adequate target structures.

For the first time, this task combines formally and linguistically different approaches to meaning representation in graph form in a uniform training and evaluation setup.  Participants are invited to develop parsing systems that support five distinct semantic graph frameworks—which all encode core predicate–argument structure, among other things—in the same implementation.  Training and evaluation data will be provided for all five frameworks.  Participants are asked to design and train a system that predicts sentence-level meaning representations in all frameworks in parallel.  Architectures that utilize complementary knowledge sources (e.g. via parameter sharing) are encouraged (though not required).  Learning from multiple flavors of meaning representation in tandem has hardly been explored.

The task seeks to reduce framework-specific ‘balkanization’ in the field of meaning representation parsing.  Expected outcomes include (a) a unifying formal model over different semantic graph banks, (b) uniform representations and scoring, (c) systematic contrastive evaluation across frameworks, and (d) increased cross-fertilization via transfer and multi-task learning.  We hope to engage the combined community of parser developers for graph-structured output representations, including those from six prior framework-specific tasks at the Semantic Evaluation exercises between 2014 and 2019.  Owing to the scarcity of semantic annotations across frameworks, the shared task is regrettably limited to parsing English for the time being.


FRAMEWORKS

The task combines five frameworks for graph-based meaning representation, each with its specific formal and linguistic assumptions.

+ DELPH-IN MRS Bi-Lexical Dependencies (Ivanova et al., 2012)
+ Prague Semantic Dependencies (Hajič et al., 2012)
+ Elementary Dependency Structures (Oepen & Lønning, 2006)
+ Universal Conceptual Cognitive Annotation (Abend & Rappoport, 2013)
+ Abstract Meaning Representation (Banarescu et al., 2013)

For the shared task, we have for the first time repackaged five graph banks into a uniform and normalized abstract representation with a common serialization format (in JSON).  Training data comprising semantic graphs over a total of some 3.5 million tokens in running English text is available to participants.  High-quality tokenization, PoS tagging, lemmatization, and Universal Dependencies parse trees are provided as an optional ‘companion’ resource.  For all frameworks, both in- and out-of-domain evaluation data will be provided in the same unified format.


SCHEDULE

+ March 25, 2019: Availability of Sample Training Graphs
+ April 15, 2019: Initial Release of Training Data
+ May 20, 2019: Data Updates and Syntactic Companions
+ June 3, 2019: Availability of Evaluation Software
+ July 8–22, 2019: Evaluation Period (Held-Out Data)
+ September 2, 2019: Submission of System Descriptions
+ September 30, 2019: Camera-Ready Manuscripts
+ November 3–4, 2019: Presentation of Results at CoNLL


EVALUATION

For each of the individual frameworks, there are common ways of evaluating the quality of parser outputs in terms of graph similarity to gold-standard target representations.  There is broad similarity between the framework-specific evaluation metrics used to date, although there are some subtle differences too.  In a nutshell, meaning representation parsing is commonly evaluated in terms of a graph similarity F1 score at the level of individual node–edge–node triples, i.e. ‘atomic’ dependencies.

For the shared task, we will implement a (straightforward) generalization of existing, framework-specific metrics that (a) is applicable across different flavors of semantic graphs, (b) provides labeled and unlabeled variants, (c) does not require matching node anchoring in the underlying string, but (d) takes advantage of node ordering when available.  Labeled per-dependency scores, macro-averaged across all frameworks, will be the official metric for the task; but we will also provide additional cross-framework evaluation perspectives, as well as scoring in established framework-specific metrics.
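
To make the notion of per-dependency scoring concrete, here is a minimal sketch in Python of an F1 score over node–edge–node triples, with labeled and unlabeled variants and a macro-average across frameworks.  This is not the official evaluation software: the function names (triple_f1, unlabeled, macro_average) and the toy graphs are illustrative assumptions, and the sketch assumes that nodes in the gold and system graphs are already aligned (here simply by name), whereas the actual metric will not presuppose such an alignment.

```python
from collections import Counter


def triple_f1(gold, system):
    """Precision, recall, and F1 over multisets of dependency triples."""
    gold_counts, system_counts = Counter(gold), Counter(system)
    # Multiset intersection: a triple only matches as often as it occurs in both.
    matched = sum((gold_counts & system_counts).values())
    precision = matched / sum(system_counts.values()) if system_counts else 0.0
    recall = matched / sum(gold_counts.values()) if gold_counts else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def unlabeled(triples):
    """Drop the edge label, keeping only (source, target) pairs."""
    return [(source, target) for source, _label, target in triples]


def macro_average(scores_by_framework):
    """Unweighted mean of per-framework scores."""
    return sum(scores_by_framework.values()) / len(scores_by_framework)


# Toy example (assumption: nodes are pre-aligned and identified by name).
gold = [("want", "ARG1", "boy"), ("want", "ARG2", "go"), ("go", "ARG1", "boy")]
system = [("want", "ARG1", "boy"), ("want", "ARG2", "go"), ("go", "ARG2", "boy")]

labeled_p, labeled_r, labeled_f1 = triple_f1(gold, system)
_, _, unlabeled_f1 = triple_f1(unlabeled(gold), unlabeled(system))

# Hypothetical framework names; the two scores are the ones computed above.
overall = macro_average({"framework_a": labeled_f1, "framework_b": unlabeled_f1})
```

In this toy example the system graph mislabels one of three edges, so two of three labeled triples match, while all three unlabeled pairs match.  Macro-averaging gives every framework equal weight regardless of how many triples its graphs contain.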


INVOLVEMENT

We invite all possibly interested parties to self-subscribe to the mailing list for this task; the subscription link and access information for the training data are available from the task web site:

  http://mrp.nlpl.eu

Please do not hesitate to contact the task organizers for questions or clarifications, using the joint email address provided on the task web pages.


Omri Abend, Jan Hajič, Daniel Hershcovich, Marco Kuhlmann,
Stephan Oepen (chair), Tim O'Gorman, and Nianwen Xue