A Lisp Based XML Parser
Introduction/Simple Example
LXML parse output format
parse-xml non-validating parser properties
case and international character support issues
parse-xml and packages
parse-xml, the XML Namespace specification, and packages
ACL does not support Unicode 4 byte scalar values
only little-endian Unicode tested in ACL 6.0 beta
debugging aids
XML Conformance test results
Compiling and Loading the parser
parse-xml reference
The parse-xml generic function processes XML
input, returning a list of XML tags,
attributes, and text. Here is a simple example:
(parse-xml "<item1><item2 att1='one'/>this is some text</item1>")
-->
((item1 ((item2 att1 "one")) "this is some text"))
The output format is known as LXML format.
LXML Format
LXML is a list representation of XML tags and content.
Each list member may be:
a. a string containing text content, such as "Here is some text with a "
b. a list representing an XML tag with associated attributes and/or content,
such as (item1 "text") or ((item1 att1 "help.html") "link"). If the XML tag
does not have associated attributes, then the first list member will be a
symbol representing the XML tag, and the other members will
represent the content, each of which can be a string (text content), a symbol (XML
tag with no attributes or content), or a list (nested XML tag with
associated attributes and/or content). If there are associated attributes,
then the first list member will be a list containing a symbol representing the XML tag,
followed by two list members for each associated attribute; the first member is a
symbol representing the attribute, and the next member is a string corresponding
to the attribute value.
c. XML comments and/or processing instructions; see the more detailed example below for
further information.
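The element shapes described above can be told apart mechanically. The following is a minimal sketch in portable Common Lisp; the helper names lxml-tag, lxml-attributes and lxml-children are hypothetical and are not part of the parser. Comment and processing-instruction entries are treated like ordinary lists here, so their leading keyword is reported as the tag.

```lisp
;; Hypothetical accessors for LXML nodes; not part of parse-xml.
(defun lxml-tag (node)
  "Return the tag symbol of NODE, or NIL when NODE is text."
  (cond ((stringp node) nil)                          ; text content
        ((symbolp node) node)                         ; bare tag, no attributes or content
        ((consp (first node)) (first (first node)))   ; ((tag attr "val" ...) content ...)
        (t (first node))))                            ; (tag content ...)

(defun lxml-attributes (node)
  "Return the attributes of NODE as a property list, or NIL."
  (if (and (consp node) (consp (first node)))
      (rest (first node))
      nil))

(defun lxml-children (node)
  "Return the content members of NODE, or NIL for text and bare tags."
  (if (consp node) (rest node) nil))
```

Applied to the nested element ((item2 att1 "one")) from the simple example, lxml-tag returns item2 and (getf (lxml-attributes node) 'att1) returns "one".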
Non Validating Parser Properties
Parse-xml is a non-validating XML parser. It will detect non-well-formed XML input.
When processing valid XML input, parse-xml will optionally produce the same output
as a validating parser would, including the processing of an external DTD subset
and external entity declarations.
By default, parse-xml outputs a DTD parse along with the parsed XML contents. The DTD
parse may be optionally suppressed. The following example shows DTD parsed output
components:
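(defvar *xml-example-external-url*
  "<!ENTITY ext1 'this is some external entity %param1;'>")
(defun example-callback (var-name token &optional public)
  ;; var-name arrives as a URI object. Its path names a variable in
  ;; the user package whose string value stands in for the external
  ;; resource; the path "null" means there is nothing to parse.
  (declare (ignorable token public))
  (setf var-name (uri-path var-name))
  (if* (equal var-name "null")
     then nil
     else (let ((string (symbol-value
                         (intern var-name (find-package :user)))))
            (make-string-input-stream string))))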
(defvar *xml-example-string*
"<?xml version='1.0' encoding='utf-8'?>
<!-- the following XML input is well-formed but its validity has not been checked ...
-->
<?piexample this is an example processing instruction tag ?>
<!DOCTYPE example SYSTEM '*xml-example-external-url*' [
<!ELEMENT item1 (item2* | (item3+ , item4))>
<!ELEMENT item2 ANY>
<!ELEMENT item3 (#PCDATA)>
<!ELEMENT item4 (#PCDATA)>
<!ATTLIST item1
att1 CDATA #FIXED 'att1-default'
att2 ID #REQUIRED
att3 ( one | two | three ) 'one'
att4 NOTATION ( four | five ) 'four' >
<!ENTITY % param1 'text'>
<!ENTITY nentity SYSTEM 'null' NDATA somedata>
<!NOTATION notation SYSTEM 'notation-processor'>
]>
<item1 att2='1'><item3>&ext1;</item3></item1>")
(pprint (parse-xml *xml-example-string* :external-callback 'example-callback))
-->
((:xml :version "1.0" :encoding "utf-8")
 (:comment " the following XML input is well-formed but its validity has not been checked ...
")
 (:pi :piexample "this is an example processing instruction tag ")
 (:DOCTYPE :example
  (:[ (:ELEMENT :item1 (:choice (:* :item2) (:seq (:+ :item3) :item4)))
   (:ELEMENT :item2 :ANY)
   (:ELEMENT :item3 :PCDATA)
   (:ELEMENT :item4 :PCDATA)
   (:ATTLIST item1
    (att1 :CDATA :FIXED "att1-default")
    (att2 :ID :REQUIRED)
    (att3 (:enumeration :one :two :three) "one")
    (att4 (:NOTATION :four :five) "four"))
   (:ENTITY :param1 :param "text")
   (:ENTITY :nentity :SYSTEM "null" :NDATA :somedata)
   (:NOTATION :notation :SYSTEM "notation-processor"))
  (:external (:ENTITY :ext1 "this is some external entity text")))
 ((item1 att1 "att1-default" att2 "1" att3 "one" att4 "four")
  (item3 "this is some external entity text")))
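Because the default output mixes prolog entries (XML declaration, comments, processing instructions, DOCTYPE) with the document content, a caller may want to pick them apart. The sketch below is one possible way to do that in portable Common Lisp. The function names prolog-entry and document-elements are hypothetical, and the sketch assumes, as in the output above, that prolog entries are lists headed by a keyword while element tags are never keywords.

```lisp
;; Hypothetical helpers for a parse-xml result; not part of the parser.
(defun prolog-entry (key parse-result)
  "Return the first top-level entry of PARSE-RESULT headed by KEY, or NIL."
  (find key parse-result
        :key (lambda (entry) (and (consp entry) (first entry)))))

(defun document-elements (parse-result)
  "Return the top-level entries that are neither text nor keyword-headed."
  (remove-if (lambda (entry)
               (or (stringp entry)
                   (and (consp entry) (keywordp (first entry)))))
             parse-result))
```

For the example output, (prolog-entry :DOCTYPE result) would return the (:DOCTYPE :example ...) list, and document-elements would return a one-element list holding the item1 element.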
Usage Notes
(setf *xml-example-string4*
"<bibliography
xmlns:bib='http://www.bibliography.org/XML/bib.ns'
xmlns='urn:com:books-r-us'>
<bib:book owner='Smith'>
<bib:title>A Tale of Two Cities</bib:title>
<bib:bibliography
xmlns:bib='http://www.franz.com/XML/bib.ns'
xmlns='urn:com:books-r-us'>
<bib:library branch='Main'>UK Library</bib:library>
<bib:date calendar='Julian'>1999</bib:date>
</bib:bibliography>
<bib:date calendar='Julian'>1999</bib:date>
</bib:book>
</bibliography>")
(setf *uri-to-package* nil)
(setf *uri-to-package*
      (acons (parse-uri "http://www.bibliography.org/XML/bib.ns")
             (make-package "bib") *uri-to-package*))
(setf *uri-to-package*
      (acons (parse-uri "urn:com:books-r-us")
             (make-package "royal") *uri-to-package*))
(setf *uri-to-package*
      (acons (parse-uri "http://www.franz.com/XML/bib.ns")
             (make-package "franz-ns") *uri-to-package*))
(pprint (multiple-value-list
         (parse-xml *xml-example-string4*
                    :uri-to-package *uri-to-package*)))
-->
((((bibliography |xmlns:bib| "http://www.bibliography.org/XML/bib.ns"
    xmlns "urn:com:books-r-us")
   "
"
   ((bib::book royal::owner "Smith") "
" (bib::title "A Tale of Two Cities") "
"
    ((bib::bibliography royal::|xmlns:bib| "http://www.franz.com/XML/bib.ns"
      royal::xmlns "urn:com:books-r-us")
     "
" ((franz-ns::library royal::branch "Main") "UK Library") "
" ((franz-ns::date royal::calendar "Julian") "1999") "
")
    "
" ((bib::date royal::calendar "Julian") "1999") "
")
   "
"))
 ((#<uri http://www.franz.com/XML/bib.ns> . #<The franz-ns package>)
  (#<uri urn:com:books-r-us> . #<The royal package>)
  (#<uri http://www.bibliography.org/XML/bib.ns> . #<The bib package>)))
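The strings consisting only of line breaks and spaces in the output above are the whitespace that separates tags in the input. When that whitespace carries no meaning for the application, it can be removed after parsing. The following is a minimal sketch in portable Common Lisp; whitespace-string-p and strip-whitespace are hypothetical names, not part of the parser. The first member of each element (the tag, or the tag with its attributes) is left untouched, so attribute values are never dropped. Empty strings are also treated as whitespace.

```lisp
;; Hypothetical post-processing helpers; not part of parse-xml.
(defun whitespace-string-p (x)
  "True when X is a string made up only of whitespace characters."
  (and (stringp x)
       (every (lambda (ch)
                (member ch '(#\Space #\Tab #\Newline #\Return)))
              x)))

(defun strip-whitespace (node)
  "Return a copy of the LXML element NODE without whitespace-only content."
  (if (consp node)
      (cons (first node)                      ; tag or (tag attr "val" ...)
            (loop for child in (rest node)
                  unless (whitespace-string-p child)
                    collect (strip-whitespace child)))
      node))                                  ; text or bare tag symbol
```

A whole parse result, which is a list of elements, could be cleaned with (mapcar #'strip-whitespace result).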
(defun file-callback (uri-object token &optional public)
  ;; The uri-object is an ACL URI object created from
  ;; the XML input. In this example, this function
  ;; assumes that all uri's will be file specifications.
  ;;
  ;; The token argument identifies what token is associated
  ;; with the external parse (for example :DOCTYPE for external
  ;; DTD subset)
  ;;
  ;; The public argument contains the associated PUBLIC string,
  ;; when present
  ;;
  (declare (ignorable token public))
  ;; An open stream is returned on success,
  ;; a nil return value indicates that the external
  ;; parse should not occur.
  ;; Note that parse-xml will close the open stream before exiting.
  (ignore-errors (open (uri-path uri-object))))
The general-entities argument is an association list containing general entity symbol and replacement text pairs. The entity symbols should be in the keyword package. Note that this option may be useful in generating desirable parse results in situations where you do not wish to parse external entities or the external DTD subset.
The parameter-entities argument is an association list containing parameter entity symbol and replacement text pairs. The entity symbols should be in the keyword package. Note that this option may be useful in generating desirable parse results in situations where you do not wish to parse external entities or the external DTD subset.
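Either association list can be built by hand or with a small helper. The sketch below is an illustration only: make-entity-alist is a hypothetical name, and the sketch assumes that each entry is a dotted pair of a keyword symbol and a replacement string, which the description above suggests but does not spell out. A name given as a string is interned with its case preserved exactly; a name given as a symbol follows the case conventions of the running Lisp.

```lisp
;; Hypothetical helper; the dotted-pair entry layout is an assumption.
(defun make-entity-alist (&rest names-and-texts)
  "Build an alist of (keyword . replacement-text) from alternating arguments."
  (loop for (name text) on names-and-texts by #'cddr
        collect (cons (intern (string name) :keyword) text)))
```

The result of a call such as (make-entity-alist :company "Franz Inc.") would then be supplied as the general-entities or parameter-entities argument.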
The uri-to-package argument is an association list containing URI objects and package objects. Typically, the URI objects correspond to XML Namespace attribute values, and the package objects correspond to the desired package for interning symbols associated with the URI namespace. If the parser encounters a URI object not contained in this list, it will generate a new package. The first generated package will be named net.xml.namespace.0, the second will be named net.xml.namespace.1, and so on.
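The lookup-or-generate behavior described above can be pictured with the following sketch. It is not the parser's implementation. The names *namespace-counter* and package-for-uri are hypothetical, plain strings stand in for URI objects so that the code runs without the URI module, and a package that already exists under the generated name is reused instead of being created again.

```lisp
;; Illustration only; strings stand in for URI objects.
(defvar *namespace-counter* 0
  "Index used for the next generated namespace package name.")

(defun package-for-uri (uri alist)
  "Return the package mapped to URI and the (possibly extended) ALIST."
  (let ((entry (assoc uri alist :test #'equal)))
    (if entry
        (values (cdr entry) alist)
        (let* ((name (format nil "net.xml.namespace.~d" *namespace-counter*))
               (pkg (or (find-package name)
                        (make-package name :use nil))))
          (incf *namespace-counter*)
          (values pkg (acons uri pkg alist))))))
```

Returning the extended list as a second value mirrors the way the namespace example above prints the updated mapping alongside the parse.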
(parse-xml (p stream) &key external-callback content-only general-entities parameter-entities uri-to-package)
(parse-xml (str string) &key external-callback content-only general-entities parameter-entities uri-to-package)
An easy way to parse a file containing XML input:
(with-open-file (p "example.xml")
  (parse-xml p :content-only t))
*debug-xml*
When true, parse-xml generates XML lexical state and intermediary
parse result debugging output.
*debug-dtd*
When true, parse-xml generates DTD lexical state and intermediary
parse result debugging output.