EPE 2018 Trial Data: Extrinsic Parser Evaluation for the UD Parsing Shared Task

Version 1.0; June 17, 2018


Overview
========

This directory contains the trial version of the training and development
texts for the downstream applications in EPE 2018: ‘negation.txt’,
‘events.txt’, and ‘speculation.txt’.  For general background on the task
set-up, please see:

  http://epe.nlpl.eu

All parser inputs for the task are ‘clean’ running text files, encoded in
UTF-8.  There may be minor variation in, for example, newline conventions
and the use of ASCII vs. Unicode punctuation symbols, notably for quote
marks and apostrophes.

For the convenience of task participants, the EPE 2018 organizers have
‘packed’ the original large collections of small files into three large
documents.  The packing scheme inserts ‘delimiter paragraphs’ at document
boundaries, using the following general form:

  Document 0020030 ends.

Each such delimiter is preceded and followed by three consecutive newlines,
so that sentence splitting and tokenization will hopefully treat each
delimiter as a four-token utterance of its own and not let it interfere
with the syntactic analysis of its immediate context.


Event Extraction
================

The training and development texts originate from the 2009 Shared Task on
Event Extraction at the BioNLP workshop (held in association with the
Conference of the North American Chapter of the Association for
Computational Linguistics; NAACL).  Both the ‘raw’ files and the split into
training and development data remain unchanged from BioNLP 2009.  For
additional information, please see the ‘LICENSE’ and ‘README’ files in each
of the sub-directories, together with an archive of the shared task web
site at the following address:

  http://www.nactem.ac.uk/tsujii/GENIA/SharedTask/


Opinion Analysis
================

The training and development data are taken from the MPQA Opinion Corpus
(version 2.0) and have been moderately revised for use with EPE 2017.
In particular, a few files have been omitted (e.g. because they contained
multi-lingual, parallel text), and other files have been edited to replace
mark-up tags with whitespace, to preserve character offsets.  File
preparation and the split into training vs. development (and eventually
evaluation) data were provided by Richard Johansson.  For general
background on the MPQA data, please see the corpus web page at:

  http://mpqa.cs.pitt.edu/corpora/mpqa_corpus/


Negation Analysis
=================

The training and development (and eventually evaluation) data for this
downstream application originate from the Shared Task at the 2012 *SEM
Conference (Morante & Blanco, 2012).  The underlying text, segmentation,
and basic linguistic analysis were originally prepared by Stephan Oepen as
part of the Oslo Conan Doyle Corpus (CDC; http://www.delph-in.net/cdc/).
The split into training and development (and evaluation) sections and the
actual negation annotation were designed and implemented by the 2012 task
organizers: Roser Morante and Eduardo Blanco.  For general background on
the 2012 *SEM task, please see:

  http://www.clips.ua.ac.be/sem2012-st-neg/


Communication
=============

While you are looking at this data, please self-subscribe to the mailing
list for the shared task:

  http://lists.nlpl.eu/mailman/listinfo/epe-users


Known Errors
============

None, for the time being.


Release History
===============

[Version 1.0; June 17, 2018]

+ Initial release of training and development texts for three applications.


Contact
=======

For questions or comments, please do not hesitate to email the task
organizers at: ‘epe-organizers@nlpl.eu’.

Jari Björne
Murhaf Fares
Stephan Oepen