DELPH-IN members share a commitment to re-usable, multi-purpose resources and active exchange. Based on contributions from several members and joint development over many years, an open-source repository of software and linguistic resources has been created that has wide usage in education, research, and application building.

At the core of the DELPH-IN repository is agreement among partners on a shared set of linguistic assumptions (grounded in HPSG and Minimal Recursion Semantics) and on a common formalism (i.e. logic) for linguistic description in typed feature structures. The formalism is implemented in several development and processing environments (which can serve differing purposes) and enables the exchange of grammars and lexicons across platforms. Formalism continuity, in turn, has allowed DELPH-IN researchers to develop several comprehensive, wide-coverage grammars of diverse languages that can be processed by a variety of software tools.
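As a rough illustration of what description in typed feature structures involves, the sketch below unifies two small feature structures in Python. It is a deliberately simplified toy and not the DELPH-IN formalism itself: the type names, the feature names (HEAD, AGR, NUM, PER), and the helper functions are all hypothetical, the type hierarchy is a single-inheritance tree, and reentrancy (structure sharing) is left out.

```python
# Toy type hierarchy, child -> parent. All type names are hypothetical
# and chosen only for illustration; "top" is the most general type.
PARENT = {
    "pos": "top", "noun": "pos", "verb": "pos",
    "num": "top", "sg": "num", "pl": "num",
}


def ancestors(t):
    """Return the type itself followed by all of its supertypes."""
    chain = [t]
    while t in PARENT:
        t = PARENT[t]
        chain.append(t)
    return chain


def unify_types(a, b):
    """Return the more specific of two compatible types, or None on a clash.

    Assumes a tree-shaped hierarchy, so two types are compatible only
    when one of them is a supertype of the other.
    """
    if b in ancestors(a):
        return a
    if a in ancestors(b):
        return b
    return None


def unify(x, y):
    """Unify two feature structures; return None if they are incompatible.

    Atomic values are type names (str); complex values are dicts that map
    feature names to further feature structures.
    """
    if isinstance(x, str) and isinstance(y, str):
        return unify_types(x, y)
    if isinstance(x, dict) and isinstance(y, dict):
        result = dict(x)
        for feat, val in y.items():
            if feat in result:
                merged = unify(result[feat], val)
                if merged is None:
                    return None  # a clash anywhere makes the whole unification fail
                result[feat] = merged
            else:
                result[feat] = val
        return result
    # One side is atomic and the other complex: only "top" is compatible.
    if x == "top":
        return y
    if y == "top":
        return x
    return None


if __name__ == "__main__":
    underspecified = {"HEAD": "pos", "AGR": {"NUM": "sg"}}
    constraint = {"HEAD": "noun", "AGR": {"NUM": "num", "PER": "top"}}
    print(unify(underspecified, constraint))
    print(unify({"AGR": {"NUM": "sg"}}, {"AGR": {"NUM": "pl"}}))
```

Unification succeeds when the two structures carry compatible information and returns the combined, more specific structure; it fails when any pair of values clashes. The formalism used by the actual DELPH-IN platforms is richer than this sketch, so the code should be read as an aid to intuition only.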

Over time, the following set of core components has emerged as a typical grammar engineering configuration, commonly used both by DELPH-IN members and by other research initiatives.

Linguistic resources that are available as part of the DELPH-IN open-source repository include broad-coverage grammars for English, German, and Japanese, as well as a set of ‘emerging’ grammars for French, Korean, Modern Greek, Norwegian, Portuguese, and Spanish. Additionally, a proprietary grammar for Italian (developed by CELI s.r.l. in Torino) uses the same DELPH-IN formalism (and many of the Matrix assumptions) and is available for licensing. The following provides more background information on select grammars:

As several HPSG implementations evolved within the same common formalism, it became clear that homogeneity among existing grammars could be increased and development cost for new grammars greatly reduced by compiling an inventory of cross-linguistically valid (or at least useful) types and constructions. The LinGO Grammar Matrix provides a starter kit to grammar engineers, facilitating not only efficient bootstrapping but also rapid growth towards the wide coverage necessary for robust natural language processing and the precision parses and semantic representations that the ‘deep’ processing paradigm has to offer. The Matrix (in its current release version 0.4) comprises (a) type definitions for the basic feature geometry and technical devices, (b) the representation and composition machinery for Minimal Recursion Semantics in a typed feature structure grammar, (c) general classes of rules, including derivational and inflectional (lexical) rules, unary and binary phrase structure rules, headed and non-headed rules, and head-initial and head-final rules, and (d) types for basic constructions such as head-complement, head-specifier, head-subject, head-filler, and head-modifier rules, coordination, as well as more specialized classes of constructions.

Finally, as processing efficiency and grammatical coverage have become less pressing aspects for ‘deep’ NLP applications, the research focus of several DELPH-IN members has shifted to combinations of ‘deep’ processing with stochastic approaches to NLP, on the one hand, and to building hybrid NLP systems that integrate ‘deep’ and ‘shallow’ techniques in novel ways, on the other. More specifically, the transfer of DELPH-IN resources into industry has amplified the need for improved parse ranking, disambiguation, and robust recovery techniques, and there is now broad consensus that applications of broad-coverage linguistic grammars for analysis or generation require the use of sophisticated stochastic models. The LinGO Redwoods initiative is providing the methodology and tools for a novel type of treebank, far richer in the granularity of available linguistic information and dynamic in both the access to treebank information and its evolution over time. Redwoods has completed two sets of treebanks, each of around 7,000 sentences, for Verbmobil transcribed dialogues and customer emails from an e-commerce domain. Ongoing research by the Redwoods group at Stanford (and partners in Edinburgh and Saarbrücken) is investigating generative and conditional probabilistic models for parse disambiguation in conjunction with the LinGO ERG (and other DELPH-IN grammars).

The Heart of Gold environment is an XML-based middleware for the integration of deep and shallow natural language processing components, with a focus on robust, multilingual, application-oriented HPSG parsing assisted by, for example, shallow part-of-speech taggers, chunkers, and named entity recognizers. The Heart of Gold provides a uniform infrastructure for building applications that use RMRS-based and/or XML-based natural language processing components. The middleware itself has been developed at DFKI and Saarland University within the DeepThought and Quetal projects, and is published under the LGPL. However, many of the NLP components for which adapters (‘Modules’) are provided come with different licenses, most of them free for research purposes. The deep component that is currently integrated is PET, with all deep HPSG grammars mentioned on the DELPH-IN site. Additional deep and shallow NLP components can be integrated easily by providing a simple Java class or an XML-RPC interface.