[librecat-dev] Catmandu::XML and Catmandu::XSD

Patrick Hochstenbach Patrick.Hochstenbach at UGent.be
Thu Oct 13 09:43:43 CEST 2016


Hi

There is a new Catmandu module available to process XML files: Catmandu::XSD. The existing Catmandu::XML by Jakob can be used for XML data where no schema is needed. For example, given this input:

test.xml:
<foo>
  <bar>test</bar>
</foo>

it will be parsed into YAML like:

$ catmandu convert XML to YAML < test.xml
---
bar: test


Any syntactically correct XML can be processed with Catmandu::XML. But the module itself can't guess the structure of the XML files. When you have another XML file like:

test2.xml:
<foo>
  <bar>test</bar>
  <bar>test</bar>
</foo>

it will be parsed into YAML like:

$ catmandu convert XML to YAML < test2.xml
---
bar: 
  - test
  - test

In the first example bar contains a string; in the second example bar contains an array. This is something you need to remember when creating Fixes for this data. The same is true for XML input which has "mixed" content (text and XML elements mixed).
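
To make the string-versus-array issue concrete, here is a minimal Python sketch. It is not Catmandu code and does not reproduce Catmandu::XML's actual rules; simple_parse and as_list are hypothetical helpers that only mimic the behaviour shown above and show one way to guard against it in your own processing code:

```python
import xml.etree.ElementTree as ET


def simple_parse(xml_text):
    """Schema-less conversion (illustrative only): a child tag seen once
    stays a string, a child tag seen more than once becomes a list."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        if child.tag not in record:
            record[child.tag] = child.text
        elif isinstance(record[child.tag], list):
            record[child.tag].append(child.text)
        else:
            # Second occurrence: promote the existing string to a list.
            record[child.tag] = [record[child.tag], child.text]
    return record


def as_list(value):
    """Wrap a lone value in a list so callers can always iterate."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


one = simple_parse("<foo><bar>test</bar></foo>")
two = simple_parse("<foo><bar>test</bar><bar>test</bar></foo>")
print(one)  # {'bar': 'test'}
print(two)  # {'bar': ['test', 'test']}
print(as_list(one["bar"]), as_list(two["bar"]))
```

Without a schema, any code that reads bar has to handle both shapes, which is what as_list does here.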

With the new Catmandu::XSD module, an XSD schema file must be provided that contains the exact definition of how XML elements should be parsed. When an XSD is available, you'll get arrays when you need arrays, hashes when you need hashes, etc.:

$ catmandu convert XSD --root '{}foo' --schemas foo.xsd to YAML < test.xml
---
bar:
 - test
$ catmandu convert XSD --root '{}foo' --schemas foo.xsd to YAML < test2.xml
---
bar:
 - test
 - test
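
The idea behind schema-driven parsing can be sketched in a few lines of Python. This is only an illustration, not how XML::Compile or Catmandu::XSD work internally: the REPEATABLE set is a hypothetical stand-in for what foo.xsd would declare (for instance that bar may occur more than once), and parse_with_schema is an assumed helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the schema: element names declared as repeatable.
REPEATABLE = {"bar"}


def parse_with_schema(xml_text, repeatable=REPEATABLE):
    """Use the declared structure, not the observed data, to pick the shape:
    repeatable elements always become lists, even with a single occurrence."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        if child.tag in repeatable:
            record.setdefault(child.tag, []).append(child.text)
        else:
            record[child.tag] = child.text
    return record


single = parse_with_schema("<foo><bar>test</bar></foo>")
double = parse_with_schema("<foo><bar>test</bar><bar>test</bar></foo>")
print(single)  # {'bar': ['test']}
print(double)  # {'bar': ['test', 'test']}
```

Because the shape comes from the declaration, bar is an array in both records and downstream code needs only one code path.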

Catmandu::XSD uses XML::Compile internally, which is already used in a Belgian project processing LIDO museum data. Based on the same techniques, EAD, METS, MODS, PNX, etc. can be processed. E.g.

$ cat catmandu.yml
---
importer:
    mets:
        package: XSD
        options:
            root: "{http://www.loc.gov/METS/}mets"
            schemas: t/demo/mets/*.xsd
exporter:
    mets:
        package: XSD
        options:
            root: "{http://www.loc.gov/METS/}mets"
            schemas: t/demo/mets/*.xsd

# process one file...
$ catmandu convert mets < mets.file

# process many files
$ catmandu convert mets --files "dir/*.xml"

For more options, see: https://metacpan.org/pod/Catmandu::XSD

Patrick




