[librecat-dev] Catmandu::MARC and potentially a UTF-8 bug

Emmanuel Di Pretoro edipretoro at gmail.com
Tue Jan 31 15:17:03 CET 2017


Hi,

Thanks for your answer! That did solve the problem.
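
For anyone finding this thread later, here is a rough sketch of my original command adapted to the RAW parser. The function name convert_unimarc_titles is only illustrative (it is not part of Catmandu), and I am assuming the default JSON exporter as in my first message:

```shell
# Sketch: the original marc_map fix, reading the UNIMARC file with the RAW parser.
# convert_unimarc_titles is an illustrative name, not a Catmandu command.
convert_unimarc_titles() {
    # $1 is the path to the ISO 2709 file, e.g. some_records.mrc
    catmandu convert MARC --type RAW \
        --fix 'marc_map("200abfg", title, -join => " ");remove_field(record);' \
        < "$1"
}

# Usage: convert_unimarc_titles some_records.mrc
```

The only change from the failing command is the added --type RAW option; the fix itself is untouched.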

Kind regards,

Emmanuel Di Pretoro

2017-01-31 13:51 GMT+01:00 Patrick Hochstenbach <
Patrick.Hochstenbach at ugent.be>:

> Hi
>
> The standard input for Catmandu::MARC is MARC21. To read UNIMARC input we
> advise using the RAW parser. As an example:
>
> # From the command line
>
> $ catmandu convert MARC --type RAW to MARC --type XML < some_records.mrc.txt
>
> Or from a Perl script:
>
> #!/usr/bin/env perl
>
> use Catmandu;
>
> my $importer = Catmandu->importer('MARC', type => 'RAW', file =>
> 'some_records.mrc.txt');
> my $exporter = Catmandu->exporter('MARC', type => 'XML');
>
> $exporter->add_many( $importer );
>
> $exporter->commit;
>
> Cheers
> Patrick
>
> > On 30 Jan 2017, at 22:38, Emmanuel Di Pretoro <edipretoro at gmail.com>
> wrote:
> >
> > Hi,
> >
> > I've been working with a bunch of UNIMARC files over the last few days and I've been
> learning a lot about Catmandu! But I've come across a UTF-8 problem and I
> couldn't be sure if it was a bug or a personal mistake.
> >
> > So, here is a way to reproduce the problem:
> > 1. I've got 2 UTF-8 UNIMARC records from the BNF via Z39.50; you can
> find the file on GitHub:
> https://gist.github.com/edipretoro/ecdbd91cbd202022a939477f224aa712
> > 2. when I read the file with yaz-marcdump, everything is fine: eg the
> title: « 200 1  $a Perl moderne $b Texte imprimé $f Sébastien
> Aperghis-Tramoni, Damien Krotkine, Jérôme Quelin $g avec la contribution de
> Philippe Bruhat » ;
> > 3. when I process the file with Catmandu, eg with this command: «
> catmandu convert MARC --fix 'marc_map("200abfg", title, -join => "
> ");remove_field(record);' < some_records.mrc », here is what I get: «
> [{"_id":"FRBNF423141140000009","title":"Perl moderne Texte imprimé
> Sébastien Aperghis-Tramoni, Damien Krotkine, Jérôme Quelin avec la
> contribution de Philippe Bruhat"},{"title":"De l'art de programmer en Perl
> Texte imprimé Damian Conway traduction de Philippe Bruhat, Jérôme Fenal,
> Jean Forget","_id":"FRBNF40135550000000X"}] » ; as the value of encoding
> is set by default to UTF-8, I don't think I'm missing anything here.
> >
> > As a workaround to keep my project moving, I converted
> the ISO2709 file into an XML file with yaz-marcdump with the following
> command: « yaz-marcdump -o marcxml some_records.mrc > some_records.xml »
> and retried the previous Catmandu command adapted for XML: « catmandu
> convert MARC --type XML --fix 'marc_map("200abfg", title, -join => "
> ");remove_field(record);' < some_records.xml ». And I got a perfect UTF-8
> string as a result: « [{"_id":"FRBNF423141140000009","title":"Perl
> moderne Texte imprimé Sébastien Aperghis-Tramoni, Damien Krotkine, Jérôme
> Quelin avec la contribution de Philippe Bruhat"},{"title":"De l'art de
> programmer en Perl Texte imprimé Damian Conway traduction de Philippe
> Bruhat, Jérôme Fenal, Jean Forget","_id":"FRBNF40135550000000X"}] ». OK,
> I did receive a warning message: « Use of uninitialized value in
> concatenation (.) or string at
> /Users/manu/.plenv/versions/5.24.1/lib/perl5/site_perl/5.24.1/MARC/File/XML.pm
> line 397, <GEN0> chunk 5. » but it doesn't seem to be Catmandu-related.
> >
> > Can you tell me if I've been missing something?
> >
> > Thanks in advance and have a nice day!
> >
> > Emmanuel Di Pretoro
>
>