[librecat-dev] Catmandu::MARC and potentially a UTF-8 bug

Emmanuel Di Pretoro edipretoro at gmail.com
Mon Jan 30 22:38:06 CET 2017


Hi,

I've been working with a bunch of UNIMARC files over the last few days and
I've been learning a lot about Catmandu! But I've come across a UTF-8 problem
and I can't tell whether it is a bug or a mistake on my part.

So, here is a way to reproduce the problem:
1. I've got 2 UTF-8 UNIMARC records from the BNF via Z39.50; you can find
the file on GitHub:
https://gist.github.com/edipretoro/ecdbd91cbd202022a939477f224aa712
2. when I read the file with yaz-marcdump, everything is fine, e.g. the
title: « 200 1  $a Perl moderne $b Texte imprimé $f Sébastien
Aperghis-Tramoni, Damien Krotkine, Jérôme Quelin $g avec la contribution de
Philippe Bruhat »;
3. when I process the file with Catmandu, e.g. with this command:
« catmandu convert MARC --fix 'marc_map("200abfg", title, -join => " ");remove_field(record);' < some_records.mrc »,
here is what I get: «
[{"_id":"FRBNF423141140000009","title":"Perl moderne Texte imprimé
Sébastien Aperghis-Tramoni, Damien Krotkine, Jérôme Quelin avec la
contribution de Philippe Bruhat"},{"title":"De l'art de programmer en Perl
Texte imprimé Damian Conway traduction de Philippe Bruhat, Jérôme Fenal,
Jean Forget","_id":"FRBNF40135550000000X"}] ». As the encoding is set to
UTF-8 by default, I don't think I'm missing anything here.
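
One quick way to tell whether a title in the JSON output has been
double-encoded (UTF-8 bytes decoded as Latin-1 and then encoded again) is to
try reversing that step. The sketch below is in Python rather than Perl, and
it assumes that double encoding is the kind of damage involved, which this
report does not establish. `looks_double_encoded` is a hypothetical helper,
not part of Catmandu.

```python
def looks_double_encoded(text: str) -> bool:
    """Return True if `text` looks like UTF-8 bytes that were read as Latin-1."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either the text has characters outside Latin-1, or its bytes are
        # not valid UTF-8: in both cases it was not mangled this way.
        return False
    # Pure ASCII round-trips unchanged, so it is not reported as damaged.
    return repaired != text


# Sample taken from the title shown by yaz-marcdump.
good = "Texte imprimé"
# Simulate the assumed failure: UTF-8 bytes misread as Latin-1.
mangled = good.encode("utf-8").decode("latin-1")

print(repr(mangled))
print(looks_double_encoded(good), looks_double_encoded(mangled))
```

Running such a check on the "title" values from the ISO2709 run and from the
XML run would show whether only the first one is affected.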

As a workaround so I can keep moving on my project, I converted the
ISO2709 file into an XML file with yaz-marcdump, using the following command:
« yaz-marcdump -o marcxml some_records.mrc > some_records.xml », and reran
the previous Catmandu command adapted for XML:
« catmandu convert MARC --type XML --fix 'marc_map("200abfg", title, -join => " ");remove_field(record);' < some_records.xml ».
And I got a perfect UTF-8
string as a result: « [{"_id":"FRBNF423141140000009","title":"Perl moderne
Texte imprimé Sébastien Aperghis-Tramoni, Damien Krotkine, Jérôme Quelin
avec la contribution de Philippe Bruhat"},{"title":"De l'art de programmer
en Perl Texte imprimé Damian Conway traduction de Philippe Bruhat, Jérôme
Fenal, Jean Forget","_id":"FRBNF40135550000000X"}] ». OK, I did receive a
warning message: « Use of uninitialized value in concatenation (.) or
string at
/Users/manu/.plenv/versions/5.24.1/lib/perl5/site_perl/5.24.1/MARC/File/XML.pm
line 397, <GEN0> chunk 5. » but it doesn't seem to be Catmandu-related.

Can you tell me if I've been missing something?

Thanks in advance and have a nice day!

Emmanuel Di Pretoro