Big Data Becomes Faster Now!

Alluxio, formerly known as Tachyon, is a memory-centric virtual distributed file system that gives big data applications fast, unified access to storage.

Alluxio, now at version 1.0, gives frameworks such as Spark, Flink, MapReduce, and Presto access to multiple storage types. Supported back ends include the major cloud storage services Amazon S3, Google Cloud Storage, and OpenStack Swift, alongside products from storage vendors EMC and NetApp.

From the outside, Alluxio might look like an in-memory caching system such as Memcached or Redis. Rather, it’s a layer that sits between distributed computing applications and storage and gives the former access to the latter through a unified API. Applications can use Alluxio’s own API, which offers the highest possible speed. Alternatively, they can use legacy APIs, which are slower but more compatible.

Engineers at Intel have explained how Alluxio works and how it helps address a few common issues with big data frameworks, such as sharing data between applications. Instead of writing data to HDFS and reading it back out again, users can write data to Alluxio’s in-memory store and read it back at much greater speed.
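
To make the layering concrete, here is a minimal Python sketch of the idea. It is a toy model, not Alluxio’s actual API. The names `DictStore`, `MemoryLayer`, `mount`, and the `persist` flag are hypothetical, invented for illustration. It assumes a single process and uses dicts as stand-ins for both the memory tier and the storage back ends.

```python
class DictStore:
    """Toy persistent back end (hypothetical stand-in for S3, HDFS, etc.)."""

    def __init__(self):
        self.files = {}   # relative path -> bytes
        self.reads = 0    # counts how often the "slow" store is hit

    def read(self, path):
        self.reads += 1
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class MemoryLayer:
    """One namespace over several back ends, with an in-memory tier on top."""

    def __init__(self):
        self.mounts = {}  # path prefix -> back end
        self.memory = {}  # full path -> bytes held in memory

    def mount(self, prefix, store):
        self.mounts[prefix] = store

    def _resolve(self, path):
        # Pick the longest mounted prefix that matches the path.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if path.startswith(prefix):
                return self.mounts[prefix], path[len(prefix):]
        raise FileNotFoundError(path)

    def write(self, path, data, persist=False):
        # Data always lands in memory; it reaches the back end only on request.
        self.memory[path] = data
        if persist:
            store, relative = self._resolve(path)
            store.write(relative, data)

    def read(self, path):
        # Serve from memory when possible; otherwise fetch once and keep it.
        if path in self.memory:
            return self.memory[path]
        store, relative = self._resolve(path)
        data = store.read(relative)
        self.memory[path] = data
        return data


if __name__ == "__main__":
    s3 = DictStore()
    s3.files["logs/a.txt"] = b"hello"

    layer = MemoryLayer()
    layer.mount("/s3/", s3)

    layer.read("/s3/logs/a.txt")           # first read goes to the back end
    layer.read("/s3/logs/a.txt")           # second read is served from memory
    layer.write("/s3/tmp/stage1", b"rows") # one job writes intermediate data
    print(layer.read("/s3/tmp/stage1"))    # another job reads it from memory
    print("back-end reads:", s3.reads)
```

In this sketch, a producer job and a consumer job share `/s3/tmp/stage1` without the data ever touching the back end, which mirrors the HDFS round trip the paragraph above describes avoiding. A real deployment would also have to handle eviction, serialization, and multiple machines, all of which are omitted here.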

Similarly, the JVM garbage collection and on-heap cache issues that are worsened by frameworks like Spark can be mitigated by using Alluxio. IBM has claimed that, back in its Tachyon days, Alluxio outperformed in-memory HDFS by 110x for writes and improved the end-to-end latency of a realistic workflow by 4x.

Alluxio also complements other projects. Apache Arrow, for example, speeds up data processing by making data available to an application in a format that suits modern CPUs. The data requested by Arrow would be fetched from storage and served by Alluxio.

Alluxio drew support from several big data projects in its Tachyon incarnation, with Spark chief among them. The company now plans to keep building support among other big data projects and storage system vendors.