;;; -*- mode: fundamental; coding: utf-8; indent-tabs-mode: t; -*- ;;; ;;; Copyright (c) 2020 -- 2020 Stephan Oepen (oe@ifi.uio.no); ;;; see 'LICENSE' for conditions. ;;; ;;; ;;; with the 2020 make-over (updating initial tokenization to the more modern ;;; OntoNotes conventions), we are now splitting at (most) dashes and slashes. ;;; these are common elements of various types of named entities (e.g. email ;;; and web addresses) that used to be matched later in token mapping. thus, ;;; we need a new REPP facility to 'mask' sub-strings that can be recognized ;;; (with confidence) as named entities; here, masking means that the strings ;;; matched by the new '=' REPP operator cannot undergo subsequent matching or ;;; rewriting. ;;; ;; ;; email addresses, optionally wrapped in angle brackets (no dashes in TLD) ;; =?