[librecat-dev] retrieving MARC subfields with marc_map

Patrick Hochstenbach Patrick.Hochstenbach at UGent.be
Tue Jul 4 16:15:07 CEST 2017


This depends on what you find ‘important’ :).

There are many ways to read a MARC record. One way is to view the record as a card catalog card that is tagged with fields/subfields (just like the tags in an HTML document). Another way is to view it as a database record with specific columns in which to find information. There is no right way in MARC. Alas, most of the time the ‘card catalog’ markup view is the only way to get high-quality metadata out of the record.

What the cataloger sees is prose, annotated with subfields. This view is what you usually transfer to a web page.
The places where you treat the MARC record as a database record require much more data processing and cleaning (because this is not the common way MARC is produced).

In the use case we had when creating marc_map, `300ch` meant: display the content of 300 subfields $c and $h as the cataloger sees them, in the order they appear in the record (not as the programmer would like it: first the ‘c’ and then the ‘h’).
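
To make the two readings concrete, here is a minimal Python sketch of the idea. It is not Catmandu's implementation: the function names, the 300 field, and its values are made up for illustration, and keeping only the first occurrence of a repeated subfield is an arbitrary choice of the sketch. The field layout follows the array form in the quoted message below (tag, two indicators, then code/value pairs).

```python
from typing import List, Optional, Tuple


def subfields(field: List[str]) -> List[Tuple[str, str]]:
    """Return (code, value) pairs in the order they appear in the field."""
    rest = field[3:]  # skip tag, ind1, ind2
    return list(zip(rest[0::2], rest[1::2]))


def in_record_order(field: List[str], codes: str) -> List[str]:
    """'Card catalog' view: keep the wanted subfields, in record order."""
    return [value for code, value in subfields(field) if code in codes]


def plucked(field: List[str], codes: str) -> List[Optional[str]]:
    """'Database' view: one slot per requested code, in requested order.

    A missing subfield yields None. Only the first occurrence of a
    repeated subfield is kept (an assumption of this sketch).
    """
    found = {}
    for code, value in subfields(field):
        found.setdefault(code, value)
    return [found.get(code) for code in codes]


# Hypothetical field where $h was entered before $c.
field_300 = ["300", " ", " ", "h", "H-value", "c", "C-value"]
# Second 246 field from the quoted message.
field_246 = ["246", "3", "0", "a", "Imperatora tabakdoze", "p", "LuLu ppp"]

print(in_record_order(field_300, "ch"))  # record order: h first, then c
print(plucked(field_300, "ch"))          # requested order: c first, then h
print(plucked(field_246, "anp"))         # None marks the missing $n
```

In the first reading the position in the result tells you nothing about the subfield code; in the second the position is the code, which is what you need when you treat the record as a database row.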

But indeed, this is part of the nastiness of MARC and could be better documented.

Pat


> On 4 Jul 2017, at 15:27, Uldis Bojars <captsolo at gmail.com> wrote:
> 
> Hi,
> 
> Reporting back on retrieving "raw" subfield information from MARC records:
> 
> When you need to retrieve specific subfields from a MARC record (e.g. to get an array from field 246 with separate subfields $a, $n and $p) it is important to use the option "pluck:1" with marc_map():
> 
> Input data:
>       [
>          "246",
>          "3",
>          "0",
>          "a",
>          "Villa \" Palsais zirgs \"",
>          "n",
>          "Lala nnn"
>       ],
>       [
>          "246",
>          "3",
>          "0",
>          "a",
>          "Imperatora tabakdoze",
>          "p",
>          "LuLu ppp"
>       ]
> 
> Without "pluck:1" = marc_map('246anp', f246_temp_1, split:1, nested_arrays:1) :
> 
>    "f246_temp_1" : [
>       [
>          "Villa \" Palsais zirgs \"",
>          "Lala nnn"
>       ],
>       [
>          "Imperatora tabakdoze",
>          "LuLu ppp"
>       ]
>    ]
> 
> Notice that you can no longer tell which subfield each value came from (both "Lala nnn" and "LuLu ppp" are the 2nd elements of their respective arrays, though they are values of two different subfields).
> 
> This can be fixed using "pluck:1", but only if there are no repeated subfields:
> 
> marc_map('246anp', f246_temp_2, split:1, nested_arrays:1, pluck:1)
> 
>    "f246_temp_2" : [
>       [
>          [
>             "Villa \" Palsais zirgs \"",
>             "Lala nnn",
>             null
>          ],
>          [
>             "Imperatora tabakdoze",
>             null,
>             "LuLu ppp"
>          ]
>       ]
> 
> Now you can tell (from the array position) what was the value of each subfield.
> 
> This is an important difference but it is not directly mentioned in the documentation. I would have missed it if not for Patrick's answer to trmurakami's question at http://librecat.org/catmandu/2013/06/21/catmandu-cheat-sheet.html
> 
> Would it be possible to add this to the documentation?
> 
> Cheers,
> Uldis
> 
> 
> 
> 
