Arboretum implementation status as of 10-May-04

To invoke the arboretum machinery, do
(pushnew :arboretum *features*) 
at a fresh Lisp prompt, then recompile the LKB and load the ERG.
If using the LexDB, then after loading, click on LexDB--Load TDL Entries and
load the file erg/arboretum/mal-lex.tdl.
Then index for generation, and process sentences with lkb::grammar-check
instead of do-parse-tty, for example:
(lkb::grammar-check "dog barks")
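
To try several inputs in one go, a loop along the following lines may be
convenient. This is only a sketch: it assumes the LKB and ERG are already
loaded as described above, and it makes no assumption about what
lkb::grammar-check prints or returns, since that is not described here.

```lisp
;; Assumes the LKB and ERG are loaded with :arboretum on *features*.
;; The sentences are examples taken from the error list in this file.
(dolist (sentence '("two dog bark" "dogs barks" "dog barks"))
  (format t "~&Input: ~a~%" sentence)
  ;; Output format of grammar-check is not documented in this file.
  (lkb::grammar-check sentence))
```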

Note that error descriptions are now defined in the file "errordesc.lsp" in
this directory, to avoid having to recompile the LKB file "arboretum.lsp"
during mal-rule development.

Error types handled or in development:

1. Determiner-noun number mismatch 
   Example: 'two dog bark' => 'two dogs bark'
   Device:  mal-infl rule that makes plural nouns from the unmarked form.
   Notes:   Multiple outputs possible for "dog bark": "dogs bark" or "a/the dog
            barks". Can sometimes disambiguate for subject NPs, assuming that 
            3sg marking on verb is correct, so "dog barks" can (always?) be
            corrected to "a/the dog barks".  But this ambiguity is always
            present with non-subject NPs: "they chased cat" (either "a/the cat"
            or "cats").  We'll hope a treebank can get the choice to look okay.

2. Subject-verb number mismatch
   Example: 'dogs barks' => 'dogs bark'
   Device:  mal-infl rule that makes non-3sg verbs from the 3sg-marked form.
   Notes:   Since we don't supply a rule for removing the plural marking from
            a noun, we only supply one possible correction for 'dogs barks'.
            But as noted above, we will in principle offer two corrected
            alternatives for 'dog bark', where the 'bark' => 'barks' is what
            we want for 'this dog bark'.  This approach may run into trouble
            for the (arguably rare?) cases where the determiner and the verb 
            are both marked singular, but disagree with the noun, as in 
            'this dogs barks'.  Lacking a mechanism for undoing plural marking
            on a noun keeps ambiguity down at some possible cost in coverage.
            Note that we will do okay for the converse: 'these dog bark'.

3. Missing determiner
   Example: 'dog barks' => 'a/the dog barks'
   Device:  Unary phrasal mal-rule which makes an NP for a singular count noun.
   Notes:   Succeeds for subject NPs where the verb is marked for 3sg. Multiple
            outputs possible for "dog bark"; see discussion of determiner-noun
            number mismatch.
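
The ambiguity described under items 1 and 3 can be summarised with a toy
function. This is an illustration only, not the LKB mechanism: the function
name bare-noun-repairs and its keyword arguments are hypothetical, and the
plural form is passed in by hand rather than produced by a mal-infl rule.

```lisp
;; Toy illustration (hypothetical names): candidate repairs for a bare
;; singular count noun, following the notes under items 1 and 3.
(defun bare-noun-repairs (noun plural &key subject-p verb-3sg-p)
  "Return candidate repairs for NOUN used without a determiner."
  (let ((with-det (format nil "a/the ~a" noun)))
    (if (and subject-p verb-3sg-p)
        ;; "dog barks": trust the 3sg verb, so only add a determiner.
        (list with-det)
        ;; "dog bark" or "they chased cat": both repairs remain possible.
        (list with-det plural))))
```

For example, (bare-noun-repairs "dog" "dogs" :subject-p t :verb-3sg-p t)
yields only the determiner repair, while the non-subject case
(bare-noun-repairs "cat" "cats") yields both.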

4. Extraneous (duplicate) determiner 
   Example: 'the my dog barks' => 'my dog barks'
   Device:  Binary mal-rule which combines two determiners, treating the second
            one as the head, and adding placeholder semantic relations for the
            restrictor and scope of the first determiner.
   Notes:   FIX - Not yet generating. This approach requires munging before 
            generation, since our strong assumption of monotonicity for the 
            RELS list in the grammar means there will be semantics supplied by 
            the first determiner that we don't want in the input to the 
            generator.  The munging may be useful in its own right, since the
            munging rules might be more systematic about choosing which of the
            two determiners is more contentful: under the current approach,
            we'll lose information for 'our the dog barks'.  Maybe we even want
            to provide richer paraphrasing eventually, to correct 'our that dog
            barks' to 'that dog of ours barks', where we preserve both dets.

5. Extraneous determiner for strictly mass nouns
   Example: 'I need an information' => 'I need information'
   Device:  Mal lexical entries for "a/an" which have the properties of "some".
   Notes:   Works only for mass/abstract nouns that do not also have a life as
            a singular count noun.

6. Missing subject for finite VP
   Example: 'looks good' => 'they/that looks good'
   Device:  Unary phrasal mal-rule converting finite VP to sentence, adding
            semantics for a proposition and for the supplied subject pronoun.
   Notes:   It's not obvious how to supply a single overt subject when the
            missing one could be either 1sg ("saw myself") or 3per 
            ("looks good", "aren't here").  For now, we opt for the vague
            demonstrative/pronoun "they/that".
            FIX - we currently restrict the daughter VP to be [ROBUST -] in
            order to prevent mal-inflected verbs from heading the VP, which
            limits interactions with other error types in the same example.

7. Inversion of subject and main verb
   Example: 'hired you Kim?' => 'did you hire Kim?'
   Device:  Lexical mal-rule like the one for inverted auxiliary verbs.
   Notes:   

8. Negation of main verb
   Example: 'we hired not him' => 'we did not hire him'
   Device:  Lexical mal-rule like adverb-addition for auxiliary verbs.
   Notes:   Only works if the "not" follows the main verb, so we don't handle
            'we not hired him'.

9. VP complement mismatch [Not yet implemented]
   Example: 'this allows to stay' => 'this allows one to stay'
   Device:  Proposed - unary phrasal rules converting each of INF, PRP, BSE
            VPs to underspecified VFORM.  Could also do lexical rules, but
            better control and perhaps more efficiency with phrasal rules.

10. Perfective aspect/tense mismatch
   Example: 'last night he has arrived' => 'last night he arrived'
   Device:  Proposed - munging rules to detect presence of closed-class
            semantic predicates for "last", "yesterday", etc. which exclude
            present-perfect, and substitute corrected semantics along with
            error detection flag.  
   Notes:   Will be a bit subtle: contrast "since last week I have improved", 
            where semantics using present-perfect is okay, with 
            "*before last week I have improved".
            This approach will require alterations to the control structure,
            always checking parses (well-formed or not) against these munging
            rules, which will also be cleaning up semantics for generation
            in some cases, as with double-determiner treatment above.
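
The check proposed under item 10 can be pictured with a toy predicate. This
is only a sketch under loud assumptions: real munging rules would operate on
the grammar's semantic representations, whereas here a parse is reduced to a
flat list of keywords, and every keyword (:present-perfect, :last,
:yesterday, :since) is a hypothetical stand-in rather than an ERG predicate
name.

```lisp
;; Hypothetical stand-ins for closed-class predicates that exclude
;; present perfect; these are not actual ERG predicate names.
(defparameter *past-time-preds* '(:last :yesterday))

(defun perfect-mismatch-p (preds)
  "True if the toy predicate list PREDS pairs present perfect with an
excluding time predicate and has no licensing :since."
  (and (member :present-perfect preds)
       (intersection *past-time-preds* preds)
       ;; "since last week I have improved" is fine, so :since licenses it.
       (not (member :since preds))
       t))
```

On this toy encoding, '(:last :night :he :present-perfect :arrive) is
flagged, while '(:since :last :week :present-perfect :improve) is not.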