Parsing Norwegian Noun Phrases in Prolog

Introduction

This short paper takes an in-depth, though not exhaustive, look at the practical parsing of Norwegian noun phrases beyond the proper noun.

The enclosed Prolog PATR rules and lexicon cover almost all Norwegian noun phrases as they are described in Faarlund, Lie and Vannebo (1997), chapter 3: Substantivfraser ('noun phrases'), hereafter referred to as the NRG.

The chosen parsing engine is pro_patr.pl, designed by Bob Carpenter and taken from Gazdar and Mellish (1989); parts of it originate from Pereira and Shieber (1987).

Points of interest

Agreement

In Norwegian, the elements of the noun phrase agree with the head noun in gender, number and definiteness:

den fine hunden ('the nice dog')
alle de store flaskene ('all the big bottles')
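To make the agreement requirement concrete, here is a minimal sketch of PATR-style feature unification, written in Python rather than the Prolog of the actual grammar. The lexicon, the feature names gen, num and def, and the functions unify and agree are hypothetical simplifications for illustration; the enclosed rules and lexicon are not reproduced here. Ambiguous forms such as the weak adjective fine are given several alternative readings, much as they would receive several lexical entries.

```python
from itertools import product

# Hypothetical, heavily simplified lexicon: each word maps to a list of
# alternative readings (flat feature dicts). A missing key means "unspecified".
LEXICON = {
    "en":       [{"gen": "m", "num": "sg", "def": "indef"}],
    "den":      [{"gen": "m", "num": "sg", "def": "def"},
                 {"gen": "f", "num": "sg", "def": "def"}],
    "det":      [{"gen": "n", "num": "sg", "def": "def"}],
    "de":       [{"num": "pl", "def": "def"}],
    "alle":     [{"num": "pl"}],
    # Weak adjective forms: singular definite, or plural of either definiteness.
    "fine":     [{"num": "sg", "def": "def"}, {"num": "pl"}],
    "store":    [{"num": "sg", "def": "def"}, {"num": "pl"}],
    "hunden":   [{"gen": "m", "num": "sg", "def": "def"}],
    "flaskene": [{"gen": "f", "num": "pl", "def": "def"}],  # gender value illustrative
}


def unify(a, b):
    """Unify two flat feature dicts; return None if any feature clashes."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            return None
        result[key] = value
    return result


def agree(words):
    """Return every feature dict under which all words agree (empty if none)."""
    solutions = []
    # Try every combination of readings, one reading per word.
    for readings in product(*(LEXICON[w] for w in words)):
        fs = {}
        for reading in readings:
            fs = unify(fs, reading)
            if fs is None:
                break
        if fs is not None:
            solutions.append(fs)
    return solutions


if __name__ == "__main__":
    print(agree(["den", "fine", "hunden"]))  # one consistent reading
    print(agree(["det", "fine", "hunden"]))  # no reading: gender clash
```

Here agree(["den", "fine", "hunden"]) yields a single masculine singular definite reading, while agree(["det", "fine", "hunden"]) yields none, because the neuter det clashes with the masculine hunden.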

Several possible degrees of definiteness exist (see NRG 3.4):

hund (indefinite)
en hund (indefinite with determiner/article)
den hund (definite, conservative use)
hunden (definite)
den hunden (the double definite, definiteness marked both on determiner and directly on noun)

Because the restrictions on the occurrence of each type depend not only on syntax and semantics but also on pragmatics, the grammar allows considerable leeway, and all the rules have been adjusted accordingly.
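The five patterns listed above can be viewed as combinations of two properties: the definiteness of the determiner, if there is one, and the definiteness marked on the noun itself. The following Python sketch is a hypothetical illustration of that view, not part of the enclosed grammar; the table and the function definiteness are assumptions. In keeping with the leeway described above, it accepts all five patterns and makes no attempt to model the pragmatic restrictions.

```python
# Hypothetical encoding: (determiner definiteness or None, noun definiteness)
# picks out one of the five patterns of NRG 3.4.
PATTERNS = {
    (None, "indef"):    "indefinite",                  # hund
    ("indef", "indef"): "indefinite with determiner",  # en hund
    ("def", "indef"):   "definite, conservative use",  # den hund
    (None, "def"):      "definite",                    # hunden
    ("def", "def"):     "double definite",             # den hunden
}


def definiteness(det_def, noun_def):
    """Name the definiteness pattern, or return None if no pattern matches."""
    return PATTERNS.get((det_def, noun_def))


if __name__ == "__main__":
    print(definiteness("def", "def"))    # double definite
    print(definiteness("indef", "def"))  # None: *en hunden
```

The only combination left out is an indefinite determiner with a definite noun (*en hunden), so definiteness("indef", "def") returns None.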

Optional adjective phrase before a noun

An adjective phrase or prepositional phrase functioning descriptively can always precede the noun, and nothing may come between them (see NRG 3.3.1 and especially 3.3.2.2). For this reason, AP + N is best handled as a single unit, here called NOM. The NOM is produced by the rules ap_1, ap_2, nom_1 and nom_2.
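The NOM unit can be sketched as the recursive rule NOM -> AP NOM | N, which lets any number of adjective phrases precede the noun while allowing nothing else to intervene. The Python below is a hypothetical recursive-descent illustration of that idea: the category lexicon and the functions parse_nom and is_nom are assumptions, each AP is reduced to a single adjective, and the sketch neither reproduces the actual rules ap_1, ap_2, nom_1 and nom_2 nor checks agreement.

```python
# Hypothetical part-of-speech lexicon for the sketch.
CATEGORY = {
    "fine": "A", "store": "A", "svarte": "A",
    "hunden": "N", "hundene": "N", "flaskene": "N",
    "de": "DET", "den": "DET",
}


def parse_nom(tokens, i=0):
    """NOM -> AP NOM | N, starting at tokens[i].

    Returns (tree, next_index), or None if no NOM starts at position i.
    Because the rule is recursive, any number of adjectives may precede
    the noun, but any other category between them makes the parse fail."""
    if i >= len(tokens):
        return None
    word = tokens[i]
    cat = CATEGORY.get(word)
    if cat == "N":
        return ("NOM", ("N", word)), i + 1
    if cat == "A":
        rest = parse_nom(tokens, i + 1)
        if rest is not None:
            subtree, j = rest
            return ("NOM", ("AP", ("A", word)), subtree), j
    return None


def is_nom(tokens):
    """True if the whole token list is exactly one NOM."""
    result = parse_nom(tokens)
    return result is not None and result[1] == len(tokens)


if __name__ == "__main__":
    print(is_nom(["store", "svarte", "hundene"]))  # True
    print(is_nom(["store", "de", "hundene"]))      # False: DET intervenes
```

Treating AP + N as one recursive unit means the determiner rules only ever need to combine with a single NOM, however many adjectives it contains.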

Determiner plus noun

Determiners are as described in NRG 3.2, and include words like:

min, sin, vår (possessives, NRG 3.2.1)
den, denne (demonstratives, NRG 3.2.2)
alle, noen, to (quantifiers, NRG 3.2.3)

The rules governing when, and in which order, the various types can precede the NOM are complex (see NRG 3.3), and this part of the grammar is therefore the most extensive. Each type of determiner has its own rules: poss_np_1 and poss_np_2 for possessives, det_np_1 for demonstratives, and quant_np_1 for quantifiers.
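One way to picture "one set of rules per determiner type" is a table mapping each type to the noun forms it may combine with, plus a number-agreement check. The sketch below is hypothetical: the lexicon, the rule table and det_np are illustrative assumptions, the NOM is reduced to a bare noun, postposed possessives (as in hundene dine) are left out, and alle is omitted because it can also combine with definite nouns. In line with the leeway in the grammar, the demonstrative rule accepts both the double definite (den hunden) and the conservative form (den hund).

```python
# Hypothetical mini-lexicon: determiner -> (type, number).
DETERMINERS = {
    "min": ("poss", "sg"), "mine": ("poss", "pl"),
    "den": ("demo", "sg"), "denne": ("demo", "sg"), "de": ("demo", "pl"),
    "to": ("quant", "pl"), "mange": ("quant", "pl"),
}

# Hypothetical noun forms: word -> (number, definiteness of the noun itself).
NOUNS = {
    "hund": ("sg", "indef"), "hunden": ("sg", "def"),
    "hunder": ("pl", "indef"), "hundene": ("pl", "def"),
}

# One rule per determiner type: the noun definiteness forms it accepts.
RULES = {
    "poss": {"indef"},          # min hund, mine hunder
    "demo": {"def", "indef"},   # den hunden (double definite), den hund (conservative)
    "quant": {"indef"},         # to hunder, mange hunder
}


def det_np(det, noun):
    """Accept DET + noun if number agrees and the type's rule allows the noun form."""
    if det not in DETERMINERS or noun not in NOUNS:
        return False
    det_type, det_num = DETERMINERS[det]
    noun_num, noun_def = NOUNS[noun]
    return det_num == noun_num and noun_def in RULES[det_type]


if __name__ == "__main__":
    print(det_np("min", "hund"))    # True
    print(det_np("min", "hunden"))  # False: prenominal possessive, definite noun
```

Keeping the constraints in a per-type table mirrors the idea of separate rules for each determiner type, while the number check is shared by all of them.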

Demonstratives and possessives can be preceded by quantifiers like alle and begge. This is encoded in the rules aq_demo_np_1 and aq_poss_1 respectively.
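The ordering constraint can be sketched as a regular expression over word categories: an optional predeterminer (alle, begge) followed either by a demonstrative phrase with a definite noun and an optional postposed possessive, or by a prenominal possessive with an indefinite noun. This Python sketch is a hypothetical illustration only; the category tags, the patterns and predet_np are assumptions, and only the predeterminer plus demonstrative or possessive cases discussed here are modelled.

```python
import re

# Hypothetical category lexicon; N_DEF / N_INDEF mark the noun's own definiteness.
CAT = {
    "alle": "PRE", "begge": "PRE",      # may precede a demonstrative or possessive
    "to": "NUM", "femten": "NUM",
    "de": "DEM", "disse": "DEM",
    "mine": "POSS", "dine": "POSS",
    "store": "A", "svarte": "A",
    "hundene": "N_DEF", "hunder": "N_INDEF",
}

# Demonstrative phrase: DEM NUM? A* N_DEF POSS?
DEM_NP = r"DEM( NUM)?( A)* N_DEF( POSS)?"
# Prenominal possessive phrase: POSS NUM? A* N_INDEF
POSS_NP = r"POSS( NUM)?( A)* N_INDEF"
# Optional predeterminer in front of either phrase type.
PREDET_NP = rf"(PRE )?(?:{DEM_NP}|{POSS_NP})"


def predet_np(words):
    """True if the word sequence matches the predeterminer pattern."""
    tags = " ".join(CAT.get(w, "?") for w in words)
    return re.fullmatch(PREDET_NP, tags) is not None


if __name__ == "__main__":
    print(predet_np("alle de femten store svarte hundene dine".split()))  # True
    print(predet_np("to de hundene".split()))                             # False
```

The phrase alle de femten store svarte hundene dine is accepted, whereas to de hundene is rejected, because the numeral to is not a predeterminer and cannot precede a demonstrative.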

Nouns acting as quantifiers

Nouns like flaske, flokk and mengde can quantify other nouns (see NRG 3.3.1.1):

en flaske vin ('a bottle of wine')
en flokk hunder ('a pack of dogs')
en mengde flasker vin ('a lot of bottles of wine')

Quantifying nouns of the primary type, like mengde and kilo, are closest to the true quantifiers and can precede other quantifying nouns, while quantifying nouns of the secondary type, such as flaske, cannot.

The rules regarding quantifying nouns are nom_nom_1, nom_nom_2 and nom_np_1.
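The restriction on stacking quantifying nouns can be expressed as a simple check over the noun sequence that follows the determiner. The Python sketch below is hypothetical: the classification table covers only the nouns named above (flokk is left out, since its type is not stated), and quant_chain_ok is an illustrative function, not a rendering of nom_nom_1, nom_nom_2 or nom_np_1.

```python
# Hypothetical classification following the text: primary quantifying nouns
# (mengde, kilo) versus secondary ones (flaske). Other nouns are plain heads.
QN_TYPE = {
    "mengde": "primary", "mengder": "primary", "kilo": "primary",
    "flaske": "secondary", "flasker": "secondary",
}


def quant_chain_ok(nouns):
    """Check a noun sequence such as ["mengde", "flasker", "vin"].

    Every noun except the last (the head) must be a quantifying noun, and a
    quantifying noun that precedes another quantifying noun must be primary."""
    if not nouns:
        return False
    for i, noun in enumerate(nouns[:-1]):
        kind = QN_TYPE.get(noun)
        if kind is None:
            return False  # a plain noun cannot quantify another noun
        precedes_quantifier = i < len(nouns) - 2
        if precedes_quantifier and kind != "primary":
            return False  # secondary type may not precede a quantifying noun
    return True


if __name__ == "__main__":
    print(quant_chain_ok(["mengde", "flasker", "vin"]))  # True
    print(quant_chain_ok(["flaske", "mengder", "vin"]))  # False
```

Here en mengde flasker vin passes because the primary mengde precedes the quantifying noun flasker, while the reversed order with the secondary flaske first is rejected.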

Not implemented

Not all possibilities of the Norwegian noun phrase have been covered. Omissions include: the words ingen, egen and selv, and the grammar adjustments they require; gaps of any kind (NRG 3.3.1.2 and 3.3.4 to 3.3.6); nouns marked with the genitive s, and sin functioning as the genitive s (the "garpegenitiv"; see NRG 3.3.2.3 for both); insertion of prepositional and relative phrases; and possessive quantifiers like begges.

Conclusion

The enclosed PATR rules and lexicon show that, with the NRG as a guide, parsing Norwegian noun phrases beyond the stand-in proper nouns is perfectly possible. Testing has shown that the grammar does not over-generate.

To conclude, here is a phrase that is parsed correctly, with aq_poss_1 as the last rule applied, as expected:

alle de femten store svarte hundene dine ('all your fifteen big black dogs')

References

Faarlund, J.T., Lie, S. and Vannebo, K.I. (1997)
Norsk Referansegrammatikk, Universitetsforlaget: Oslo
Gazdar, G. and Mellish, C. (1989)
Natural Language Processing in PROLOG, Addison-Wesley
Pereira, F.C.N. and Shieber, S.M. (1987)
Prolog and Natural-Language Analysis, CSLI Lecture Notes 10, CSLI: Stanford (distributed by University of Chicago Press)


copyleft etc

Contacting taliesin: see CQ.