[librecat-dev] example fixes

Patrick Hochstenbach Patrick.Hochstenbach at UGent.be
Tue Mar 29 15:47:46 CEST 2016


I can give you some general tricks to get you started on this project, but the details are left for you to implement.

All these questions boil down to first creating a database of the identifiers that exist in one or more sources.
Given this database, you can look up your particular records and check whether any of their identifiers exist in this database as record keys.
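
To illustrate the idea independently of Catmandu, here is a minimal plain-Python sketch (not Catmandu code). It assumes the records are already parsed into dicts with an `_id` and a list of 200$x values under a hypothetical key `links`. It collects all identifiers first and then reports the references that do not resolve, which is the "orphan" case from the first question.

```python
def find_orphans(records):
    """Return (source_id, target_id) pairs whose target_id is not a known _id."""
    # Pass 1: build the "database" of identifiers that exist in the source.
    known_ids = {rec["_id"] for rec in records}
    # Pass 2: check every reference (hypothetical key "links") against that set.
    orphans = []
    for rec in records:
        for target in rec.get("links", []):
            if target not in known_ids:
                orphans.append((rec["_id"], target))
    return orphans


records = [
    {"_id": "1001", "links": ["1002"]},
    {"_id": "1002", "links": []},
    {"_id": "1003", "links": ["1894"]},
]
for source, target in find_orphans(records):
    print(f"record {source}: 200$x {target} does not exist")
```

With a single file the set can live in memory; the database approach becomes useful once the identifiers come from another file or no longer fit in memory.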

To create such a database with SQLite, for instance, install: cpanm Catmandu::DBI DBD::SQLite

# Create database
$ catmandu import MARC to DBI --data_source dbi:SQLite:mydb.sqlite < data.mrc
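
The catmandu command takes care of parsing the MARC and laying out the table. To show what such a key/value store boils down to, here is a sketch using Python's built-in sqlite3 module. The table name `records`, the `data` column and the JSON serialisation are assumptions for illustration only, not the schema Catmandu::DBI actually creates.

```python
import json
import sqlite3


def create_store(conn, records):
    """Store each record under its _id, serialised as JSON (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, data TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO records (id, data) VALUES (?, ?)",
        [(rec["_id"], json.dumps(rec)) for rec in records],
    )
    conn.commit()


def lookup(conn, key):
    """Return the stored record for key, or None when no such _id exists."""
    row = conn.execute("SELECT data FROM records WHERE id = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")  # use a file name such as "mydb.sqlite" to persist
create_store(conn, [{"_id": "1002", "title": "Example"}])
print(lookup(conn, "1002"))  # {'_id': '1002', 'title': 'Example'}
print(lookup(conn, "1894"))  # None
```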

Now you can use this database in lookups. It contains your MARC record identifiers as keys (‘_id’) and your MARC records as data.

For instance, to select from a MARC file all records whose 200x does not match an _id in this database, do:

$ catmandu convert MARC --fix lookup.fix to CSV < data.mrc
  
with lookup.fix as:
 
 marc_map(200x,check)
 lookup_in_store(check,DBI,data_source:"dbi:SQLite:mydb.sqlite")

 if exists(check._id)
   reject()
 end

 retain(_id)

The end result is a CSV list of the _id of every record whose 200x field does not match a record _id in your database.
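
The same filter can be sketched in plain Python to make the control flow of lookup.fix explicit. This is an illustration, not what Catmandu runs: a dict `store` stands in for the SQLite database, and each record's 200$x value is held under a hypothetical key `x200`. Records whose 200$x is found are rejected; the _id of every other record (including records without a 200$x) goes into the CSV.

```python
import csv
import io


def unmatched_ids(records, store):
    """Keep the _id of records whose 200$x is not a key in store (mirrors lookup.fix)."""
    kept = []
    for rec in records:
        check = rec.get("x200")
        if check is not None and check in store:
            continue  # like reject(): a matching record exists
        kept.append(rec["_id"])  # like retain(_id)
    return kept


store = {"1002": {"_id": "1002"}}
records = [{"_id": "1001", "x200": "1002"}, {"_id": "1003", "x200": "1894"}]

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["_id"])
for _id in unmatched_ids(records, store):
    writer.writerow([_id])
print(out.getvalue())
```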

You can do this for one file, for many files, or for millions of records. For a very large database you probably want to use something like MongoDB.

Your database probably contains too much data: in your second example you are only interested in some fields. You can handle this by keeping only
the values you are interested in when creating the database, e.g.

# Create database
$ catmandu import MARC to DBI --data_source dbi:SQLite:mydb.sqlite --fix keep_fields.fix < data.mrc

with keep_fields.fix as:

 marc_map(245,title)
 marc_map(008/07-10,date)
 retain(_id,title,date)
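
What keep_fields.fix does to each record can be sketched in Python as follows. The record layout (a dict of tag to value, with 245 reduced to a single string and a shortened 008) is an assumption for illustration. The point is the 008/07-10 slice, which in MARC 21 is Date 1, and that only _id, title and date survive.

```python
def keep_fields(rec):
    """Reduce a record to _id, title and date, like keep_fields.fix (sketch)."""
    fixed = rec["008"]
    return {
        "_id": rec["_id"],
        "title": rec["245"],
        # 008/07-10 are zero-based positions 7 to 10 inclusive, i.e. fixed[7:11]
        "date": fixed[7:11],
    }


rec = {
    "_id": "1002",
    "245": "An example title",
    "008": "160329s2016    be",  # shortened 008 value for the example
    "300": "123 p.",  # dropped, just like retain() drops everything else
}
print(keep_fields(rec))  # {'_id': '1002', 'title': 'An example title', 'date': '2016'}
```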

Now this database only contains _id, title and date. If you later do the lookup:

 marc_map(200x,check)
 lookup_in_store(check,DBI,data_source:"dbi:SQLite:mydb.sqlite")

and it matches a record in your database, then your ‘check’ will be overwritten with the data in the database and you can use it:

 if exists(check._id)
   copy_field(check.title, my.record.title)
   copy_field(check.date, my.record.date)
 end
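
In Python terms the lookup and copy look roughly like this (a sketch, not Catmandu internals): if the 200$x value is a key in the store, `check` takes the stored record and its title and date are copied into a nested `my.record` structure. The dict `store` and the key `x200` are hypothetical stand-ins.

```python
def enrich(rec, store):
    """Copy title/date of the linked record into rec['my']['record'] (sketch)."""
    check = rec.get("x200")
    found = store.get(check) if check is not None else None
    if found is not None:  # like: if exists(check._id)
        target = rec.setdefault("my", {}).setdefault("record", {})
        target["title"] = found["title"]  # like copy_field(check.title, my.record.title)
        target["date"] = found["date"]  # like copy_field(check.date, my.record.date)
    return rec


store = {"1002": {"_id": "1002", "title": "An example title", "date": "2016"}}
print(enrich({"_id": "1001", "x200": "1002"}, store))
print(enrich({"_id": "1003", "x200": "1894"}, store))
```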

Also read https://metacpan.org/pod/Catmandu::Fix::lookup_in_store for more information on how to read in these stored values.

> On 26 Mar 2016, at 11:14, aina at openmailbox.org wrote:
> 
> Dear Catmanduers,
> 
> I would appreciate it if you could give me an example fix that you would use
> for the following use cases:
> 
> 1) If we have a.mrc, with a tag, say 200, that has a subfield $x
> where we store the id of a target record related to the source record:
> 
> We need to print a message if that target record does not exist (for example, a record has tag 200 with subfield $x1894, and there is no record with id 1894 in the same mrc file).
> 
> 
> 2) Between two mrc files,
> for many tags, say for example 4xx:
> $3: in $3 we store ids that lead to a target authority record in an
> authority mrc file, say b.mrc. If it leads to a non-existing record, we
> should print the "orphan" id. If it exists, we need to check that the
> target authority record has the value x in the Record Label, position 6.
> If not, this should also be spotted and reported.
> 
> Best season's wishes to you all

Patrick Hochstenbach - digital architect
University Library Ghent
Rozier 9 - 9000 Ghent - Belgium
patrick.hochstenbach at ugent.be
+32 (0)9 264 7980
