[librecat-dev] Vacuum fix on keys?

Tobias Bülte tobias.buelte at hbz-nrw.de
Thu Sep 14 10:56:32 CEST 2023


Yes, I see.

An empty top-level element name seems to work.

Not sure why our fix repeats the top-level element name as a sub-level element.
I thought that `remove_field("some_key.")` might do the intended fix.
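
To make the intended result concrete, here is a minimal sketch in plain Python (not Fix) that drops empty keys at every nesting level, using the sample record from the playground link. The helper name `drop_empty_keys` is made up for illustration; this only shows the expected output and is not how Catmandu or Metafacture implement `remove_field` or `vacuum`.

```python
import json


def drop_empty_keys(value):
    """Return a copy of value with every ""-named key removed, at any depth."""
    if isinstance(value, dict):
        # Skip the empty key itself, recurse into the remaining values.
        return {k: drop_empty_keys(v) for k, v in value.items() if k != ""}
    if isinstance(value, list):
        return [drop_empty_keys(v) for v in value]
    return value


# Sample record taken from the playground example in this thread.
record = {
    "Hello": "World",
    "": "This is an empty key",
    "some_key": {"": "bad data here", "ok": "I am ok"},
}

cleaned = drop_empty_keys(record)
print(json.dumps(cleaned))
# {"Hello": "World", "some_key": {"ok": "I am ok"}}
```

The sketch builds a new record instead of deleting in place, so the input stays available for comparison.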

Anyway when using just `nothing()` the

https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+as-records%0A%7C+decode-json%0A%7C+fix%28transformationFile%29%0A%7C+encode-json%28prettyPrinting%3D%22true%22%29%0A%7C+print%0A%3B&transformation=nothing%28%29&data=%7B%0A++++%22Hello%22%3A+%22World%22%2C%0A++++%22%22+%3A+%22This+is+an+empty+key%22%2C%0A++++%22some_key%22%3A+%7B%22%22%3A+%22bad+data+here%22%2C+%22ok%22%3A+%22I+am+ok%22%7D%0A%7D

I will open a bug issue for our MF Fix module: 
https://github.com/metafacture/metafacture-fix

Thanks Vitali



On 14.09.23 at 08:49, Peil, Vitali wrote:
> Thanks for your help!
>
> I definitely need the visitor bind as empty keys are also contained in nested fields.
>
> I have created an example which does not work as expected for me in Metafacture:
> https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+as-records%0A%7C+decode-json%0A%7C+fix%28transformationFile%29%0A%7C+encode-json%28prettyPrinting%3D%22true%22%29%0A%7C+print%0A%3B&transformation=remove_field%28%22%22%29&data=%7B%0A++++%22Hello%22%3A+%22World%22%2C%0A++++%22%22+%3A+%22This+is+an+empty+key%22%2C%0A++++%22some_key%22%3A+%7B%22%22%3A+%22bad+data+here%22%2C+%22ok%22%3A+%22I+am+ok%22%7D%0A%7D
>
> Best,
> Vitali
>
>
> --------------
> Vitali Peil
> LibTec - Bibliothekstechnologie und Wissensmanagement
> Universitätsbibliothek
> Universität Bielefeld
> UHG L3-128 Tel. +49 521 106-4010
> E-Mail: vitali.peil at uni-bielefeld.de
>
> ________________________________________
> From: librecat-dev-bounces at lists.uni-bielefeld.de <librecat-dev-bounces at lists.uni-bielefeld.de> on behalf of Patrick Hochstenbach <Patrick.Hochstenbach at UGent.be>
> Sent: Wednesday, 13 September 2023 20:21:00
> To: Tobias Bülte; Nicolas Franck
> Cc: librecat-dev at lists.uni-bielefeld.de
> Subject: Re: [librecat-dev] Vacuum fix on keys?
>
> OK, then we need to align Catmandu with Metafacture. I have a pull request ready that should add support for empty paths in Catmandu:
>
> https://github.com/LibreCat/Catmandu/pull/397
>
> Adding support for empty paths by phochste · Pull Request #397 · LibreCat/Catmandu
> Adding support for empty paths in fixes. E.g. echo '{"": "Empty", "ok": 1}' | catmandu convert JSON --fix "remove_field('')" should give {"ok": 1 }
>
>
>
> BR
> Patrick
> ________________________________
> From: librecat-dev-bounces at lists.uni-bielefeld.de <librecat-dev-bounces at lists.uni-bielefeld.de> on behalf of Tobias Bülte <tobias.buelte at hbz-nrw.de>
> Sent: 13 September 2023 17:57
> To: Nicolas Franck <Nicolas.Franck at UGent.be>
> Cc: librecat-dev at lists.uni-bielefeld.de <librecat-dev at lists.uni-bielefeld.de>
> Subject: Re: [librecat-dev] Vacuum fix on keys?
>
> I was not sure about the behaviour in Catmandu.
>
> But since we implemented fix as transformation language in metafacture,
> I tested it there and with MF it works:
>
> https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+as-records%0A%7C+decode-json%0A%7C+fix%28transformationFile%29%0A%7C+encode-json%28prettyPrinting%3D%22true%22%29%0A%7C+print%0A%3B&transformation=remove_field%28%22%22%29&data=%7B%0A++++%22Hello%22%3A+%22World%22%2C%0A++++%22%22+%3A+%22This+is+an+empty+key%22%0A%7D
>
> It seems that MF and Catmandu behave differently in this regard.
>
> On 13.09.23 at 15:18, Nicolas Franck wrote:
>> @Tobias: if only that were true. Maybe it has two, or three spaces?
>>
>> Another solution would be to copy the input record to another and
>> specify all of the valid keys by name, and then run vacuum on that copy.
>> This way the weird keys are gone.
>>
>>> On 13 Sep 2023, at 15:02, Tobias Bülte <tobias.buelte at hbz-nrw.de> wrote:
>>>
>>> Wouldn't remove_field("") do the trick?
>>>
>>>
>>> On 13.09.23 at 14:34, Peil, Vitali wrote:
>>>> Hi all,
>>>>
>>>> I came across some bad data ;-). Tried to fix this with the vacuum fix. The data I have includes empty field names which I want to clean.
>>>>
>>>> with the vacuum fix:
>>>> $ echo '[{"ok": 1, "empty": "", "":"some bad data"}]' | catmandu convert to JSON --fix "vacuum()"
>>>> Output: [{"":"some bad data","ok":1}]
>>>>
>>>> but I would expect as output
>>>> [{"ok":1}]
>>>>
>>>> Is this a bug in the vacuum fix? Is there another way doing this?
>>>>
>>>> Best,
>>>> Vitali
>>>>
>>>>
>>>> _______________________________________________
>>>> librecat-dev mailing list
>>>> - send list mails to librecat-dev at lists.uni-bielefeld.de
>>>> - to unsubscribe or change options, visit https://lists.uni-bielefeld.de/mailman2/cgi/unibi/listinfo/librecat-dev
>>>> - project website: http://librecat.org/


