[librecat-dev] Catmandu and Hadoop/Spark?

Günter Hipler guenter.hipler at unibas.ch
Wed Feb 17 12:07:42 CET 2016

Hi Jakob,

it would be nice to integrate Catmandu into such processes. But I think 
integrating a Perl-based framework is less natural than, e.g., a 
Python-based one. All these "Big Data" components are Java/Scala based, 
and Perl is not part of the JVM world (this might change in the future 
with Perl 6). Spark and Flink (https://flink.apache.org/) provide 
specialized Python clients.

I know we already had this discussion more than a year ago ;-) and for 
me this was one important reason to use Metafacture for our project 
(swissbib). But I still hope both frameworks (Catmandu / Metafacture) 
will come closer together in the future.

Very best wishes from Basel!


On 02/17/2016 10:28 AM, Jakob Voß wrote:
> Hi,
> I just got asked whether Catmandu (or Perl in general) can be used 
> with Hadoop or Spark. Have any of you tried this before? This is 
> what I found for Spark:
> https://wiki.ufal.ms.mff.cuni.cz/spark:recipes:using-perl-via-pipes
> Although we successfully process large data sets with Catmandu, I 
> guess it has its limitations with "big data" (whatever that means). 
> Maybe it's worth using Catmandu on top of existing big data 
> frameworks such as Hadoop and Spark instead of extending Catmandu 
> with big data features such as massively parallel processing?
> Just a thought,
> Jakob

Günter Hipler
Projekt swissbib
Schönbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: +41 61 267 31 12
Fax: +41 61 267 31 03
E-Mail guenter.hipler at unibas.ch
URL www.swissbib.org
