;;; -*- Coding: utf-8 -*-
;;; HAG (Hausa Grammar)
;;; Author: Berthold Crysmann
;;; 2009

;;; Basic tokenisation
:[ \t]+

; Remove initial marker of ungrammaticality
!^[\*]		

;; pad the full string with trailing and leading whitespace; makes matches for
;; word boundaries a little easier down the road; also, squash multiple spaces
;; and replace tabs with a space.
;;
!^(.+)$		 \1 
!  +		 
!\t		 

; Split hyphenated input tokens
! ([^ ]+[-])([^ ]+) 		 \1 \2 

; Do away with punctuation for now...
!([,;?!.()]|``|'')		

;; >char
;; >diacritics
;; >tone
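
;; Illustration only: a hypothetical input, not taken from the grammar's
;; test data.  Assuming the rewrite rules above apply in file order and the
;; result is then split on the `:' pattern, an input such as
;;   *foo  bar-baz, qux.
;; should pass through roughly these stages (| marks the string edges):
;;   strip initial `*'    |foo  bar-baz, qux.|
;;   pad and squash       | foo bar-baz, qux. |
;;   split at the hyphen  | foo bar- baz, qux. |
;;   drop punctuation     `,' and `.' are removed; the hyphen stays
;; which should leave the tokens: foo, bar-, baz, qux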