[librecat-dev] identify duplicate records with Catmandu
Nicolas.Franck at UGent.be
Fri Dec 2 12:10:10 CET 2016
Identifying duplicates depends on what you consider a "duplicate".
I would do the following:
1. At the beginning of the fix, create a new field (for example "identifier") by joining the fields you want to compare.
2. Use "lookup_in_store" to check whether that identifier already exists.
3. If it does, use "reject", which stops the fix and rejects the current record.
4. If it does not, store the "identifier" using "add_to_store".
5. Continue with the rest of your fixes.
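To make the logic of these steps concrete, here is a minimal sketch in plain Python rather than in the Catmandu Fix language. It is an illustration only: the in-memory set stands in for the store, the function names and the sample records are hypothetical, and the lowercase/whitespace normalisation is my own assumption about what "the same value" should mean.

```python
def make_identifier(record, fields):
    """Step 1: join the chosen fields into one comparison key.

    Lowercasing and collapsing whitespace is an assumption made for this
    sketch, not something the original steps prescribe.
    """
    parts = [" ".join(str(record.get(f, "")).lower().split()) for f in fields]
    return "|".join(parts)


def dedup(records, fields):
    """Yield only the first record seen for each identifier."""
    seen = set()  # stands in for the lookup store
    for record in records:
        identifier = make_identifier(record, fields)
        if identifier in seen:   # step 2: does it exist already?
            continue             # step 3: reject the current record
        seen.add(identifier)     # step 4: remember the identifier
        yield record             # step 5: carry on with the record


# Hypothetical sample data; "title" and "subject" stand in for 245 and 6**.
records = [
    {"id": 1, "title": "Moby Dick", "subject": "Whales"},
    {"id": 2, "title": "moby  dick", "subject": "whales"},
    {"id": 3, "title": "Moby Dick", "subject": "Sea stories"},
]
kept = list(dedup(records, ["title", "subject"]))
print([r["id"] for r in kept])  # [1, 3]
```

Note that a single joined identifier only matches when all of the joined fields agree. If a match on any one field should count as a duplicate, one identifier per field would have to be looked up and stored separately.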
From: librecat-dev-bounces at lists.uni-bielefeld.de <librecat-dev-bounces at lists.uni-bielefeld.de> on behalf of Sergio Letuche <code4libuserx at gmail.com>
Sent: Friday, December 2, 2016 10:03 AM
To: librecat-dev at lists.uni-bielefeld.de
Subject: [librecat-dev] identify duplicate records with Catmandu
How do you deduplicate records?
For our use case, we consider records to be duplicates when they share the same content
in, for example, the 245 tag and all 6** tags.
That is, a record is identical to another if its 245 tag has the same value as the other record's 245 tag,
or if it has the same metadata in any of the 6** tags.
How would you approach this with a fix?