[librecat-dev] Working with MARC bibs and holdings

Hemme, Felix F.Hemme at zbw.eu
Fri Dec 6 11:32:20 CET 2024


Hi, I'm currently processing metadata from the German Union Catalog of Serials (ZDB) that contains MARC bibs and their associated holdings for a given library. It's a classic ETL approach: fetching the data via SRU, running it through a fix file, and converting it to TSV. My setup:

A catmandu.yaml file with the content:
importer:
  marcxml:
    package: MARC
    options:
      type: XML
  zdb:
    package: SRU
    options:
      base: https://services.dnb.de/sru/zdb
      recordSchema: MARC21plus-xml
      parser: marcxml
      limit: 100

A simple fix file rules_marc.fix to test if the conversion is working:
marc_map("001","ppn");
remove_field(record);
remove_field(_id);

A shell script get_marc.sh:
#!/bin/sh
catmandu convert zdb --query "sigel=206 and frm=O and dok=Zeitung" --fix rules_marc.fix to CSV --fields ppn --sep-char '\t' > marc_records_zdb.tsv

Running the script creates an empty TSV file. I assume this is because the XML is nested inside a <collection> element: https://services.dnb.de/sru/zdb?version=1.1&operation=searchRetrieve&query=sigel%3D206+AND+frm%3DO+AND+dok%3DZeitschrift&recordSchema=MARC21plus-xml&maximumRecords=1. How do I access the records inside that element, and how can I filter for <record type="Bibliographic"> or <record type="Holdings">?

When I change the recordSchema to MARC21-xml, which contains only the MARC bibs (no holdings), a list of PPNs (record identifiers) is created as expected: https://services.dnb.de/sru/zdb?version=1.1&operation=searchRetrieve&query=sigel%3D206+AND+frm%3DO+AND+dok%3DZeitschrift&recordSchema=MARC21-xml&maximumRecords=1.
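To see which record types a MARC21plus-xml response actually contains before writing a fix for it, a quick shell check can help. This is only a sketch: count_record_types is a hypothetical helper (not part of Catmandu), and it assumes the response has been saved to a local file, that the MARC <record> elements are unprefixed (not e.g. <marc:record>), and that no <record ...> start tag is split across lines.

```shell
#!/bin/sh
# count_record_types (hypothetical helper): print how many MARC <record>
# elements carry each type="..." value in a saved XML file.
# Assumes unprefixed <record ...> start tags that are not split across lines;
# SRU envelope elements such as <recordData> have no type attribute and are skipped.
count_record_types() {
  # grep -o prints every matching start-tag fragment on its own line
  grep -o '<record[^>]*type="[^"]*"' "$1" \
    | sed 's/.*type="\([^"]*\)"$/\1/' \
    | sort \
    | uniq -c
}
```

Usage: save the MARC21plus-xml response from the first URL to response.xml (for example with curl, quoting the URL because it contains &) and run count_record_types response.xml. The output lists one count per record type, e.g. how many Bibliographic and how many Holdings records were returned.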

Best,
Felix
