For reasons of comparability and fairness, the MRP 2020 shared task imposes some constraints on which third-party data or pre-trained models can be used in addition to the resources distributed by the task organizers. Following is a ‘white-list’ of legitimate resources, which was constructed from nominations by prospective participants. The deadline for suggesting additional resources is Monday, June 15, 2020. In general, only resources that all participants could in principle obtain are considered for white-listing.

Some of the MRP task data intricately overlaps with common syntactic treebanks. Therefore, a general rule is that resources like the Penn Treebank, its derivatives like PropBank, as well as the Universal Dependencies treebanks need to be used with some care in MRP system development. For example, common English parsers (like CoreNLP, spaCy, or UDPipe) have been trained on some of the same texts that are annotated in the MRP training split, which will most likely lead to unrealistically high syntactic parsing accuracy during development and, correspondingly, a distinct drop in parser performance when moving to held-out evaluation data. To avoid such effects, the companion data for the task provides high-quality morpho-syntactic dependency parses that were produced using jack-knifing; please see:

  http://svn.nlpl.eu/mrp/2020/public/companion/README.txt

The parser and models used to produce the morpho-syntactic companion trees will be released to the public upon completion of the shared task. Within reason, the organizers can parse additional corpora for participants, provided that the text and parses can be shared with all participants. Anyone who would like to use this service should contact ‘mrp-organizers@nlpl.eu’ to discuss specifics of data preparation and turnaround time.
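For readers unfamiliar with the term, the jack-knifing scheme mentioned above can be sketched roughly as follows: the training data is split into k folds, and each fold is parsed by a model trained only on the other k-1 folds, so no sentence is ever parsed by a model that saw it during training. The `train` and `parse` callables in this sketch are hypothetical stand-ins for an arbitrary parser toolkit, not the organizers' actual pipeline:

```python
def jackknife(sentences, train, parse, k=10):
    """Return automatic parses for `sentences`, where each sentence is
    parsed by a model trained on the k-1 folds it does not belong to.
    `train` and `parse` are placeholder callables for some parser toolkit."""
    n = len(sentences)
    fold_of = [i % k for i in range(n)]   # assign sentences round-robin to k folds
    parses = [None] * n
    for fold in range(k):
        # train on everything outside the current fold
        train_data = [sentences[i] for i in range(n) if fold_of[i] != fold]
        model = train(train_data)
        # parse only the held-out fold with that model
        for i in range(n):
            if fold_of[i] == fold:
                parses[i] = parse(model, sentences[i])
    return parses
```

This is why the companion trees avoid the inflated in-domain accuracy described above: the automatic annotations on the training split are produced under the same train/test separation that will hold at evaluation time.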
+ CoNLL 2017 texts and embeddings: http://hdl.handle.net/11234/1-1989
+ NLPL EngC3 Corpus: http://corpora.nlpl.eu/engc3/10/txt/
+ Wikipedia dumps: https://dumps.wikimedia.org/
+ Huggingface Transformers: https://huggingface.co/transformers/
+ BERT: https://github.com/google-research/bert
+ ELMo: https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md
+ ERNIE: https://github.com/thunlp/ERNIE
+ FastText: https://fasttext.cc/docs/en/english-vectors.html
+ GloVe embeddings: https://nlp.stanford.edu/projects/glove/
+ NLPL Vectors Repository: http://vectors.nlpl.eu
+ CoreNLP: https://stanfordnlp.github.io/CoreNLP/
+ spaCy: https://spacy.io/
+ Stanza: https://stanfordnlp.github.io/stanza/
+ UDPipe: http://ufal.mff.cuni.cz/udpipe
+ UDify: https://github.com/Hyperparticle/udify
+ Illinois Named Entity Tagger: https://cogcomp.org/page/software_view/NETagger
+ FrameNet: https://framenet.icsi.berkeley.edu
+ Older (Version 1) UCCA annotations of ‘20K Leagues Under The Sea’:
  https://github.com/UniversalConceptualCognitiveAnnotation/UCCA_English-20K
  https://github.com/UniversalConceptualCognitiveAnnotation/UCCA_German-20K
  https://github.com/UniversalConceptualCognitiveAnnotation/UCCA_French-20K
+ CzEngVallex: http://hdl.handle.net/11234/1-1512
+ ERG Semantic Interface and lexicon:
  http://svn.delph-in.net/erg/tags/1214/etc
  http://svn.delph-in.net/erg/tags/1214/lexicon.tdl
+ VerbNet: http://verbs.colorado.edu/verb-index/vn/verbnet-3.2.tar.gz
+ Princeton WordNet: https://wordnet.princeton.edu/
+ Open Multilingual WordNet: http://compling.hss.ntu.edu.sg/omw/
+ ConceptNet: http://conceptnet.io/
+ Microsoft Concept Graph: https://concept.research.microsoft.com/Home/Download