<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Dear Martina,</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Hints have been added in the latest version of Catmandu, v1.2021, now available on CPAN.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
BR</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Patrick</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Siebert, Dr. Martina <Martina.Siebert@sbb.spk-berlin.de><br>
<b>Sent:</b> 03 November 2023 17:09<br>
<b>To:</b> Patrick Hochstenbach <Patrick.Hochstenbach@UGent.be>; librecat-dev@lists.uni-bielefeld.de <librecat-dev@lists.uni-bielefeld.de><br>
<b>Subject:</b> RE: [librecat-dev] Catmandu: field not converted to TSV/CSV output if not present in first (how many?) input records</font>
<div> </div>
</div>
<style>
<!--
@font-face
{font-family:Wingdings}
@font-face
{font-family:"Cambria Math"}
@font-face
{font-family:DengXian}
@font-face
{font-family:Calibri}
@font-face
{}
p.x_MsoNormal, li.x_MsoNormal, div.x_MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Calibri",sans-serif}
a:link, span.x_MsoHyperlink
{color:#0563C1;
text-decoration:underline}
a:visited, span.x_MsoHyperlinkFollowed
{color:#954F72;
text-decoration:underline}
p.x_MsoListParagraph, li.x_MsoListParagraph, div.x_MsoListParagraph
{margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Calibri",sans-serif}
p.x_msonormal0, li.x_msonormal0, div.x_msonormal0
{margin-right:0cm;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif}
span.x_E-MailFormatvorlage19
{font-family:"Calibri",sans-serif;
color:windowtext}
span.x_E-MailFormatvorlage20
{font-family:"Calibri",sans-serif;
color:#1F497D}
.x_MsoChpDefault
{font-size:10.0pt}
@page WordSection1
{margin:70.85pt 70.85pt 2.0cm 70.85pt}
div.x_WordSection1
{}
ol
{margin-bottom:0cm}
ul
{margin-bottom:0cm}
-->
</style>
<div lang="DE" link="#0563C1" vlink="#954F72">
<div class="x_WordSection1">
<p class="x_MsoNormal"><span style="font-size:11.0pt; color:#1F497D">Hi Patrick,</span></p>
<p class="x_MsoNormal"><span style="font-size:11.0pt; color:#1F497D"> </span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:11.0pt; color:#1F497D">Makes 100% sense. Thanks for the heads-up and the ways around the “feature” ;-)</span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:11.0pt; color:#1F497D">Maybe a note in the “Description” saying that the first record serves as the model for the exported fields would help the uninitiated like myself:
<a href="https://metacpan.org/pod/Catmandu::Exporter::TSV" originalsrc="https://metacpan.org/pod/Catmandu::Exporter::TSV" shash="SWHO8ivysh4cxh7XcAJ6GDrKfCTw77HHkYMUd6B3WqQ+tX+DshLQ2g+dtnSh3G+xw58zQ4tnMIOgVqDa1O0DMUQ5lOLrKRguMz3CxMXB1s40qSP/XGUhW7sogjsLP/r41ZdkgRgBurHWmm+XSJ5hlWd/u35kSuTQuHJ3i7wmC7U=">
https://metacpan.org/pod/Catmandu::Exporter::TSV</a></span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:11.0pt; color:#1F497D"> </span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:11.0pt; color:#1F497D">Best,</span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:11.0pt; color:#1F497D">Martina</span></p>
<p class="x_MsoNormal"><span lang="EN-US"> </span></p>
<div>
<div style="border:none; border-top:solid #E1E1E1 1.0pt; padding:3.0pt 0cm 0cm 0cm">
<p class="x_MsoNormal"><b><span style="font-size:11.0pt">From:</span></b><span style="font-size:11.0pt"> Patrick Hochstenbach <Patrick.Hochstenbach@UGent.be>
<br>
<b>Sent:</b> Friday, 3 November 2023 16:58<br>
<b>To:</b> Siebert, Dr. Martina <Martina.Siebert@sbb.spk-berlin.de>; librecat-dev@lists.uni-bielefeld.de<br>
<b>Subject:</b> Re: [librecat-dev] Catmandu: field not converted to TSV/CSV output if not present in first (how many?) input records</span></p>
</div>
</div>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal"><span lang="NL" style="font-size:14.0pt">Hello Martina,</span></p>
<p class="x_MsoNormal"><span lang="NL" style="font-size:14.0pt"> </span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt">This is a known feature. Catmandu was created to work with very large files in streaming mode. This means that Catmandu doesn’t read the complete input first to see which fields
are available; it has to work with the data at hand.<br>
<br>
The JSON format allows every record to have different fields and doesn’t care if ‘late-comer’ records have a different field layout. Formats such as TSV and CSV don’t have this flexibility: the software needs to know from the first record onward which fields
to output. Catmandu uses two procedures:</span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt"> </span></p>
<p class="x_MsoNormal" style="margin-left:21.0pt; text-indent:-18.0pt"><span lang="EN-US" style="font-size:14.0pt"><span style="mso-list:Ignore">-<span style="font:7.0pt "Times New Roman"">
</span></span></span><span lang="EN-US" style="font-size:14.0pt">If no information is provided, Catmandu guesses the field layout from the first record it receives.</span></p>
<p class="x_MsoNormal" style="margin-left:21.0pt; text-indent:-18.0pt"><span lang="EN-US" style="font-size:14.0pt"><span style="mso-list:Ignore">-<span style="font:7.0pt "Times New Roman"">
</span></span></span><span lang="EN-US" style="font-size:14.0pt">One can tell Catmandu explicitly which fields should appear in the output.</span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt"> </span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt">For the latter, Catmandu has the `--fields` option:</span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt"> </span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt; font-family:"Courier New"">$ catmandu convert JSON to CSV --fix my.fix --fields 'id,name,title,author' < data.json<br>
<br>
</span><span lang="EN-US" style="font-size:14.0pt">While writing this, I also see a `--collect_fields 1` option that does what you want, but it first loads all the data into memory.</span></p>
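<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt"> </span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt">To illustrate the three behaviours described above, here is a rough Python sketch. It is only a simplified model of the idea, not Catmandu's actual Perl code: the function name `export_csv`, its parameters and the sample records are made up for this example, and the first-seen column order in the collecting case is an assumption.</span></p>
<pre>
```python
import csv
import io


def export_csv(records, fields=None, collect_fields=False):
    """Write dict records as CSV text (simplified model, not Catmandu's code)."""
    if collect_fields:
        # Buffer every record first so that all field names are known.
        # This is what costs memory on large inputs.
        records = list(records)
        fields = []
        for record in records:
            for name in record:
                if name not in fields:
                    fields.append(name)
    out = io.StringIO()
    writer = None
    for record in records:
        if writer is None:
            # Streaming case: the header is fixed when the first record
            # arrives, either from the given fields or from that record.
            header = list(fields) if fields else list(record)
            writer = csv.DictWriter(out, fieldnames=header, restval="",
                                    extrasaction="ignore",
                                    lineterminator="\n")
            writer.writeheader()
        writer.writerow(record)
    return out.getvalue()


# Hypothetical sample data: 'author' first appears in the second record.
records = [
    {"id": 1, "title": "First"},
    {"id": 2, "title": "Second", "author": "Late Comer"},
]

# Layout guessed from the first record: the 'author' column is missing.
print(export_csv(iter(records)))
# Layout given explicitly, comparable to the --fields option.
print(export_csv(iter(records), fields=["id", "title", "author"]))
# All records buffered first, comparable to --collect_fields 1.
print(export_csv(iter(records), collect_fields=True))
```
</pre>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt">In this sketch only the collecting variant has to hold the complete input in memory; the other two write each record as soon as it arrives, which is why an explicit field list is the better fit for very large files.</span></p>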
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt; font-family:"Courier New""><br>
</span><span lang="EN-US" style="font-size:14.0pt">BR</span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt">Patrick</span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:14.0pt; font-family:"Courier New""><br>
</span><span lang="EN-US" style="font-size:14.0pt">PS: Due to Outlook issues, some hyphens and quotes may be lost in the email</span><span lang="EN-US" style="font-size:14.0pt; font-family:"Courier New""> </span></p>
<p class="x_MsoNormal"><span style="font-size:14.0pt"> </span></p>
<div id="x_mail-editor-reference-message-container">
<div>
<div style="border:none; border-top:solid #B5C4DF 1.0pt; padding:3.0pt 0cm 0cm 0cm">
<p class="x_MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt; color:black">From:
</span></b><span style="font-size:12.0pt; color:black"><a href="mailto:librecat-dev-bounces@lists.uni-bielefeld.de">librecat-dev-bounces@lists.uni-bielefeld.de</a> <<a href="mailto:librecat-dev-bounces@lists.uni-bielefeld.de">librecat-dev-bounces@lists.uni-bielefeld.de</a>>
on behalf of Siebert, Dr. Martina <<a href="mailto:Martina.Siebert@sbb.spk-berlin.de">Martina.Siebert@sbb.spk-berlin.de</a>><br>
<b>Date: </b>Friday, 3 November 2023 at 15:29<br>
<b>To: </b><a href="mailto:librecat-dev@lists.uni-bielefeld.de">librecat-dev@lists.uni-bielefeld.de</a> <<a href="mailto:librecat-dev@lists.uni-bielefeld.de">librecat-dev@lists.uni-bielefeld.de</a>><br>
<b>Subject: </b>[librecat-dev] Catmandu: field not converted to TSV/CSV output if not present in first (how many?) input records</span></p>
</div>
<p class="x_MsoNormal"><span style="font-size:11.0pt">Hello,</span></p>
<p class="x_MsoNormal"><span style="font-size:11.0pt"> </span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:11.0pt">It does not seem possible to produce TSV/CSV output that includes fields that first appear much later in an input file. In the JSON export all is fine, but when exporting to
TSV/CSV the “late-comer” fields are missing. When I fake-add the field to the first record, the TSV/CSV export is correct.</span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:11.0pt">Is this a known bug? Can it be fixed?</span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span lang="EN-US" style="font-size:11.0pt"> </span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span style="font-size:11.0pt">Best,</span></p>
<p class="x_MsoNormal"><span style="font-size:11.0pt">Martina</span></p>
<p class="x_MsoNormal"><span style="font-size:8.0pt; font-family:"Arial",sans-serif">_____________________________________________</span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span style="font-size:9.0pt">Dr. Martina Siebert</span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span style="font-size:9.0pt">Ostasienabteilung | CrossAsia</span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span style="font-size:9.0pt">Staatsbibliothek zu Berlin – Preußischer Kulturbesitz</span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span style="font-size:9.0pt"> </span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span style="font-size:9.0pt"><a href="mailto:martina.siebert@sbb.spk-berlin.de">martina.siebert@sbb.spk-berlin.de</a></span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span style="font-size:9.0pt"><a href="http://www.staatsbibliothek-berlin.de/" originalsrc="http://www.staatsbibliothek-berlin.de/" shash="NS0p6I0pmm50FpyuAFgBAPjdapK3MoeO213tOt9rPemHlEyvYRGwrj5FJ3nUUFPLj5KU5F7gqFyoKDHAEnzsOV/372wFgmcWekXOsd1ob6VXUIM4jt6Xs/G59OXZWpxW8uOQqNfgsadL3mMgR5tDhkXxhA9QZLykJJBkzCg6Zic=">www.staatsbibliothek-berlin.de</a></span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span style="font-size:9.0pt"> </span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span style="font-size:9.0pt">Personal data may be processed in the course of email communication.
<br>
Our privacy notice can be found here: <a href="http://sbb.berlin/datenschutz" originalsrc="http://sbb.berlin/datenschutz" shash="aEFzyQET1H74Q6Lxb+n4sfJ/aBRjxhcUV45ukfTl0WFsjY953Zx5hW9uowWqsnsl/xVPZoj+KQiAKhPdtTghl/10c5aJyoymXz5Y8dcKCOirV5BIdaE1a7PTDZZVL2Ov3EpumqJN6+Poq7C0aDVHHl4WOG1AGy9TZbHpdFsP9tE=" target="_short" title="Kurz-URL">
http://sbb.berlin/datenschutz</a></span><span style="font-size:11.0pt"></span></p>
<p class="x_MsoNormal"><span style="font-size:11.0pt"> </span></p>
</div>
</div>
</div>
</div>
</body>
</html>