I will explain the challenges of integrating current Big Data engines (in particular Apache Spark) with object stores: what the issues are and where they originate. I will discuss what can be done to make this integration more efficient and to remove barriers associated with some algorithms. I will then present Stocator, an open source (Apache License 2.0) object store connector for Hadoop and Apache Spark, specifically designed to optimize their performance with object stores.
Gil Vernik is a researcher at IBM Haifa, where he works with Apache Spark, Hadoop, object stores, and NoSQL databases. Gil has more than 25 years of experience as a developer on both the server side and the client side and is fluent in Java, Python, Scala, C/C++, and Erlang. He holds a PhD in mathematics from the University of Haifa and held a postdoctoral position in Germany.