Data Engineering Everywhere!
Fast & Scalable
Cylon uses OpenMPI underneath. It provides core data processing operators that run many times more efficiently than current systems.
Designed to be Integrated
Cylon is designed to work across different data processing frameworks, deep learning frameworks and data formats.
Powered by Apache Arrow
Cylon uses Apache Arrow underneath to represent data.
BYOL, Bring Your Own Language!
Write in the language you are already familiar with, yet experience the same native performance.
from pycylon import read_csv, DataFrame, CylonEnv
from pycylon.net import MPIConfig

# Set up a distributed Cylon environment over MPI
config: MPIConfig = MPIConfig()
env: CylonEnv = CylonEnv(config=config, distributed=True)

# Load the two input tables
df1: DataFrame = read_csv('/tmp/csv1.csv')
df2: DataFrame = read_csv('/tmp/csv2.csv')

# Distributed hash join; the value of `on` (the join columns) is missing in the source
df3: DataFrame = df1.join(other=df2, on=, algorithm="hash", env=env)
print(df3)
Written with Performance & Scalability in Mind!
Cross Language Performance
Join performance with C++, Java and Python
Distributed Join (Strong Scaling)
Cylon (Hash Join) vs Cylon (Sort Join) vs Spark
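For readers unfamiliar with the two join strategies named in the comparison, here is a minimal single-process sketch in plain Python. It is not Cylon's implementation: the function names `hash_join` and `sort_join`, the tuple-based row format, and the integer `key` column index are all assumptions made for illustration. It only shows the general idea that a hash join builds a lookup table on one side and probes it with the other, while a sort join sorts both sides and merges them.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner join (illustrative): build a hash table on `right`, probe with `left`."""
    buckets = defaultdict(list)
    for row in right:
        buckets[row[key]].append(row)
    # For each left row, emit one output row per matching right row.
    return [l + r for l in left for r in buckets.get(l[key], [])]


def sort_join(left, right, key):
    """Inner join (illustrative): sort both inputs on the key, then merge."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row in the run with every right row in the run.
            while i < len(left) and left[i][key] == lk:
                out.extend(left[i] + r for r in right[j:j_end])
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right = [(2, "x"), (3, "y"), (2, "z"), (1, "w")]
    print(hash_join(left, right, key=0))
    print(sort_join(left, right, key=0))
```

Both functions return the same set of joined rows. Which one is faster in practice depends on data size, key distribution, and whether the inputs are already sorted, which is what a benchmark like the one above measures.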