;;; Hey, emacs(1), this is -*- Mode: fundamental; Coding: utf-8; -*- got it? ;;; ;;; first shot at collecting the various parameters needed for the extraction ;;; of bi-lexical (semantic) dependencies from ERG analyses. ;;; ;;; ;;; some relations are made transparent, in the sense of equating themselves ;;; with one of their arguments. for nominalizations, for example, assume the ;;; following: ;;; ;;; x1:nominalization[ARG1 e2] ;;; ;;; flagging `nominalization' as transparent conceptually means that nodes `x1' ;;; and `e2' are unified, i.e. treated as a single node. practically, the two ;;; node labels `x1' and `e2' become interchangeable. ;;; ;;; for coordinate structures, one and the same predicate at times uses L-HNDL ;;; and R-HNDL, or L-INDEX and R-INDEX, or both. the section on [redundant] ;;; arguments below will eliminate the index arguments when handles are present ;;; too, so redundacy elimination must be performed prior to making relations ;;; [transparent]. still, when making transparent conjunction relations, we ;;; need to be able to either equate L-HNDL or L-INDEX with the conjunction, ;;; depending on which of the two is actually present. ;;; [transparent] comp_equal ARG1 nominalization ARG1 ;;; ;;; some grammatical constructions introduce relations that can be projected ;;; into word-to-word dependencies. N-N compounds, for example, introduce a ;;; two-place relation (holding between two entities) quite similar to that of ;;; a preposition. thus, `hospital arrival' vs. `patient arrival' can have a ;;; semantic structure isomorphic to `arrival at the hospital' vs. `arrival ;;; by the patient', only the underspecified two-place `compound' predicate ;;; symbol does not attempt to make explicit the exact nature of the relation. ;;; in the following, each predicate symbol is followed by two role names, of ;;; with the first should be the head, the second the argument in the word-to- ;;; word relation. note that the `special' role ARG0, below, means that the ;;; node itself (e.g. the one labelled with the predicate `part_of') is to be ;;; taken as the head of the dependency, i.e. effectively (for the `part_of' ;;; case) its ARG1 dependency is renamed. finally, note that it can of course ;;; be desirable, in principle, to classify a relation as both transparent and ;;; relational. for now, the dependency label introduced here, will be taken ;;; from the predicate symbol matched as relational, e.g. `and_c'. ;;; [relational] /_c$/ L-HNDL R-HNDL /_c$/ L-INDEX R-INDEX appos ARG2 ARG1 compound ARG2 ARG1 compound_name ARG2 ARG1 implicit_conj L-HNDL R-HNDL implicit_conj L-INDEX R-INDEX loc_nonsp ARG2 ARG1 loc_sp ARG2 ARG1 measure ARG2 ARG1 nonsp ARG2 ARG1 of_p ARG2 ARG1 part_of ARG0 ARG1 poss ARG2 ARG1 subord ARG2 ARG1 ;;; ;;; some lexical entries, notably contracted negations like |couldn't| carry ;;; multiple relations that it is desirable to tease apart, i.e. distribute ;;; over multiple tokens. seeing that the actual analysis only used a single ;;; lexical token (i.e. leaf node in the derivation), we need to break that up ;;; into multiple tokens. for the time being, at least, we assume that these ;;; cases are limited to instances of more fine-grained initial tokenization. ;;; for |can't|, for example, the initial (PTB-compliant) tokenization would ;;; have been |ca| |n't|, and the leaf node in the derivation will actually ;;; carry unique pointers (in its +ID) list to the underlying initial tokens. ;;; thus, it is sufficient to (a) identify contracted leaf nodes by lexical ;;; entry identifier, (b) extract the corresponding initial tokens and their ;;; characterization, (c) align these tokens with an equal number of relations ;;; in the EDS (within the same characterization, and bearing predicate names ;;; as in the list following the lexical entry name); and finally (d) adjust ;;; the characterization of these relations according to the initial tokens. ;;; _fix_me_ ;;; but what about additional initial tokens, say punctuation, that can have ;;; attached to the same lexical token? the only way of discounting these in ;;; the alignment of initial tokens to EDS relations, presumably, would be to ;;; detect orthographemic rules in the derivation tree (and distinguish which ;;; are prefix vs. suffix ones), to discard these from the +ID list. ;;; (27-feb-12; oe) [contracted] can_aux_neg_1 _can_v_modal neg does1_neg_1 _ neg do1_neg_1 _ neg ;;; ;;; in coordinate structures, there will often (but not always) be pairs of ;;; (right and left) index and handle. at the EDS level, however, there is no ;;; remaining distinction between non-scopal and scopal arguments, hence purge ;;; one of the two pointers (typically to the same thing, when both index and ;;; handle are present). to mirror our decision about root nodes in clauses, ;;; prefer handle links over index ones. in the following, we specify triples ;;; of (a) a predicate symbol (which can be a regular expression, as always); ;;; (b) an argument link role to be tested for existence; and (c) and argument ;;; link to be purged, when the argument of (b) is present. ;;; [redundant] /.*/ L-HNDL L-INDEX /.*/ R-HNDL R-INDEX ;;; ;;; by and large, relations corresponding to actual words are characterized by ;;; so-called `lexical' predicate symbols, ones starting with an underscore. ;;; there are, however, a number of predicate symbols that do not obey this ;;; convention and yet can often be associated with a surface token. for the ;;; time being, our approach is to enumerate all such symbols, i.e. only the ;;; relations on this list will be considered for word-to-word dependencies. ;;; [lexical] /^_.*/ abstr_deg card dofm dofw generic_entity mofy much-many_a named named_n neg numbered_hour ord part_of person person_n place_n pron thing time time_n yofc