[librecat-dev] Catmandu and Hadoop/Spark?
Günter Hipler
guenter.hipler at unibas.ch
Wed Feb 17 12:07:42 CET 2016
Hi Jakob,
it would be nice to integrate Catmandu into such processes. But I think
integrating a Perl-based framework is less natural than, e.g., a
Python-based one. All these "Big Data" components are Java/Scala-based,
and Perl is not part of the JVM world (this might change in the future
with Perl6). Spark and Flink (https://flink.apache.org/) provide
dedicated Python clients.
I know we already had this discussion more than a year ago ;-) and for
me this was one important reason to choose Metafacture for our project
(swissbib). But I still hope the two frameworks (Catmandu /
Metafacture) will move closer together in the future.
Very best wishes from Basel!
Günter
On 02/17/2016 10:28 AM, Jakob Voß wrote:
> Hi,
>
> I just got asked whether Catmandu (or Perl in general) can be used
> with Hadoop or Spark. Has anyone of you tried this before? This is
> what I found for Spark:
>
> https://wiki.ufal.ms.mff.cuni.cz/spark:recipes:using-perl-via-pipes
>
> Although we successfully process large data sets with Catmandu, I
> guess it has its limits with "big data" (whatever that means). Maybe
> it's worth using Catmandu on top of existing big data frameworks such
> as Hadoop and Spark instead of extending Catmandu with big data
> features such as massively parallel processing?
>
> Just a thought,
> Jakob
>
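The recipe Jakob links to relies on Spark's `RDD.pipe()`, which streams
each partition's records through an external process over stdin/stdout.
That is the hook through which a Perl/Catmandu command could plug into a
Spark job. A minimal plain-Python sketch of that pattern (no Spark
required; the `catmandu` invocation in the comment is only an assumed
illustration of what the external command might look like):

```python
import subprocess
import sys

def pipe_partition(records, command):
    """Stream records through an external command line-by-line,
    the same pattern Spark's RDD.pipe() applies per partition."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate("\n".join(records) + "\n")
    return [line for line in out.splitlines() if line]

# In a real job the command could be a Catmandu pipeline, e.g.
#   ["catmandu", "convert", "JSON", "--fix", "myfixes.fix", "to", "JSON"]
# (assumed invocation, not tested here). For this sketch we substitute
# a trivial uppercasing filter so the example is self-contained:
upper = [sys.executable, "-c",
         "import sys\nfor l in sys.stdin: print(l.strip().upper())"]
print(pipe_partition(["marc", "json"], upper))
```

The cost of this approach is one serialization/deserialization round
trip per record boundary, which is also the main overhead of Spark's
pipe-based language bridges.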
--
UNIVERSITÄT BASEL
Universitätsbibliothek
Günter Hipler
Projekt swissbib
Schönbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: +41 61 267 31 12
Fax: +41 61 267 31 03
E-Mail guenter.hipler at unibas.ch
URL www.swissbib.org