PyCylon API Docs
Imports#
Context#
Initializes the Cylon context for either distributed or non-distributed execution.
Args:
    config: an object extended from pycylon.net.CommConfig, for example pycylon.net.MPIConfig for the MPI backend
    distributed: bool, True for distributed mode and False otherwise
Returns: None
Sequential Programming#
Distributed Programming#
Rank#
The process ID, unique per process.
Returns: an int as the rank (0 in non-distributed mode)
World Size#
The total number of processes that have joined the distributed task.
Returns: an int as the world size (1 in non-distributed mode)
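To make rank and world size concrete, here is a plain-Python sketch (it does not call PyCylon) of how a process could derive its share of rows from the two values. `rows_for_rank` is a hypothetical helper, and the contiguous-block split is an assumption chosen for illustration; it is not necessarily how PyCylon partitions data.

```python
def rows_for_rank(n_rows, rank, world_size):
    """Return the half-open row range [start, stop) owned by `rank`.

    Rows are split into contiguous blocks; the first `n_rows % world_size`
    ranks receive one extra row.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    base, extra = divmod(n_rows, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Non-distributed mode: rank 0 of world size 1 owns every row.
single = rows_for_rank(10, 0, 1)
# Four processes sharing ten rows.
shared = [rows_for_rank(10, r, 4) for r in range(4)]
print(single)
print(shared)
```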
Finalize#
Gracefully shuts down the context, closing whatever was set up for distributed execution (processes, initialization state, etc.).
Returns: None
Barrier#
Call barrier to synchronize the workers: each process waits until all processes have reached the barrier.
Initialize Table#
Using a List#
Creates a PyCylon table from a list.
Args:
    context: pycylon.CylonContext
    col_names: column names as a List[str]
    data_list: data as a List of Lists (one List per column)
Using a Dictionary#
Creates a PyCylon table from a dictionary.
Args:
    context: pycylon.CylonContext
    dictionary: dict object with column names as keys and a List of values per key
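Both constructors take column-oriented data: a list with one inner list per column, or a dictionary keyed by column name. The plain-Python sketch below (no PyCylon calls) shows how the two layouts relate and how they differ from row tuples. `columns_to_dict` and `columns_to_rows` are hypothetical helpers used only for illustration.

```python
col_names = ["id", "score"]
data_list = [[1, 2, 3], [0.5, 0.7, 0.9]]  # one inner list per column


def columns_to_dict(col_names, data_list):
    """Pair each column name with its list of values."""
    if len(col_names) != len(data_list):
        raise ValueError("need exactly one data list per column name")
    if len({len(col) for col in data_list}) > 1:
        raise ValueError("all columns must have the same length")
    return dict(zip(col_names, data_list))


def columns_to_rows(data_list):
    """Transpose column-oriented data into row tuples."""
    return list(zip(*data_list))


dictionary = columns_to_dict(col_names, data_list)
rows = columns_to_rows(data_list)
print(dictionary)
print(rows)
```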
Using a PyArrow Table#
Creates a PyCylon table from a PyArrow Table.
Args:
    context: pycylon.CylonContext
    pyarrow_table: PyArrow Table
Using Numpy#
Creates a PyCylon table from Numpy arrays.
Args:
    context: pycylon.CylonContext
    col_names: column names as a List
    ar_list: Numpy ndarrays as a list (one 1D array per column)
Using Pandas#
Creates a PyCylon table from a Pandas DataFrame.
Args:
    context: pycylon.CylonContext
    df: pd.DataFrame
    preserve_index: keep the index of the original DataFrame
    nthreads: number of threads for the operation
    columns: column names, if updated
    safe: safe operation
Convert Table#
To a PyArrow Table#
Creates a PyArrow Table from a PyCylon table.
Returns: PyArrow Table
To Pandas#
Creates a Pandas DataFrame from a PyCylon table.
Returns: pd.DataFrame
To Numpy#
Pass order as F or C to get an F_CONTIGUOUS or C_CONTIGUOUS Numpy array.
The default is a zero-copy conversion. For bool values, set zero_copy_only
to False.
To Dictionary#
Creates a dictionary from a PyCylon table.
Returns: dict object
I/O Operations#
Read from CSV#
Write to CSV#
Creates a CSV file from the PyCylon table data.
Args:
    path: path to the file
    csv_write_options: pycylon.io.CSVWriteOptions
Properties#
Column Names#
Column Count#
Shape#
Row Count#
Context#
Relational Algebra Operators#
Join#
Joins two PyCylon tables.
Args:
    table: PyCylon table on which the join is performed (becomes the left table)
    join_type: join type as str, one of ["inner", "left", "right", "outer"]
    algorithm: join algorithm as str, one of ["hash", "sort"]
Keyword Args:
    left_on: join column of the left table as List[int] or List[str]
    right_on: join column of the right table as List[int] or List[str]
    on: join column common to both tables as List[int] or List[str]
Returns: joined PyCylon table
Note: The print methods are a work in progress; the goal is output similar to Pandas.
In a sequential setting use join, and in a distributed setting use distributed_join.
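The four join types can be illustrated with a small hash join in plain Python. This is a sketch of the general build-and-probe technique over lists of row tuples, not PyCylon's implementation; `hash_join` and its positional `left_on` / `right_on` arguments are hypothetical, and padding unmatched sides with `None` is an assumption made for the example.

```python
from collections import defaultdict


def hash_join(left, right, left_on, right_on, join_type="inner"):
    """Join two lists of row tuples on one column position each."""
    if join_type not in ("inner", "left", "right", "outer"):
        raise ValueError(f"unsupported join_type: {join_type}")
    left_width = len(left[0]) if left else 0
    right_width = len(right[0]) if right else 0

    # Build phase: index the right table by its join key.
    index = defaultdict(list)
    for pos, row in enumerate(right):
        index[row[right_on]].append(pos)

    # Probe phase: look up every left row in the index.
    out, matched_right = [], set()
    for row in left:
        hits = index.get(row[left_on], [])
        for pos in hits:
            out.append(row + right[pos])
            matched_right.add(pos)
        # Left and outer joins keep left rows that found no partner.
        if not hits and join_type in ("left", "outer"):
            out.append(row + (None,) * right_width)

    # Right and outer joins keep right rows that never matched.
    if join_type in ("right", "outer"):
        for pos, row in enumerate(right):
            if pos not in matched_right:
                out.append((None,) * left_width + row)
    return out


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (4, "z")]
for join_type in ("inner", "left", "right", "outer"):
    print(join_type, hash_join(left, right, 0, 0, join_type))
```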
Subtract (Difference)#
For distributed operations use distributed_subtract instead of subtract.
Intersect#
For distributed operations use distributed_intersect instead of intersect.
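As a reference for what subtract and intersect compute, the following plain-Python sketch applies both to lists of row tuples. It assumes set semantics (whole rows are compared and duplicates are removed), which is an assumption of this example and not a statement about PyCylon's exact behavior; the function names mirror the operators but are ordinary local helpers.

```python
def subtract(left, right):
    """Rows of `left` that do not appear in `right`, without duplicates."""
    exclude = set(right)
    seen, out = set(), []
    for row in left:
        if row not in exclude and row not in seen:
            seen.add(row)
            out.append(row)
    return out


def intersect(left, right):
    """Rows that appear in both `left` and `right`, without duplicates."""
    keep = set(right)
    seen, out = set(), []
    for row in left:
        if row in keep and row not in seen:
            seen.add(row)
            out.append(row)
    return out


left = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
right = [(2, "b"), (4, "d")]
difference = subtract(left, right)
common = intersect(left, right)
print(difference)
print(common)
```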
Project#
project can be used for both sequential and distributed operations.
Aggregation Operations#
Currently supports Sum, Min, Max and Count.
Sum#
Min#
Max#
Count#
GroupBy#
Group by operations support aggregations.
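The relationship between group by and the four aggregations listed above can be sketched in plain Python. `group_by` below is a hypothetical helper over row tuples, not the PyCylon call; it only shows what each aggregation produces per group.

```python
def group_by(rows, key, value, agg):
    """Group `rows` by column `key` and aggregate column `value`."""
    aggregators = {"sum": sum, "min": min, "max": max, "count": len}
    if agg not in aggregators:
        raise ValueError(f"unsupported aggregation: {agg}")
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {k: aggregators[agg](values) for k, values in groups.items()}


rows = [("a", 1), ("b", 5), ("a", 3), ("b", 2), ("a", 4)]
for agg in ("sum", "min", "max", "count"):
    print(agg, group_by(rows, 0, 1, agg))
```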
Comparison Operators#
Equal#
Equal operator for Table.
Args:
    other: a numeric scalar or a Table
Not Equal#
Not equal operator for Table.
Args:
    other: a numeric scalar or a Table
Less Than#
Less than operator for Table.
Args:
    other: a numeric scalar or a Table
Greater Than#
Greater than operator for Table.
Args:
    other: a numeric scalar or a Table
Less Than or Equal#
Less than or equal operator for Table.
Args:
    other: a numeric scalar or a Table
Greater Than or Equal#
Greater than or equal operator for Table.
Args:
    other: a numeric scalar or a Table
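The comparison operators work element-wise, and the result can be pictured as a table of bools with the same shape as the input. The plain-Python sketch below models a table as a dict of columns; `compare` is a hypothetical helper, and the requirement that a Table operand has matching column names and lengths is an assumption of this example.

```python
import operator


def compare(table, other, op):
    """Apply `op` element-wise; `other` is a scalar or a same-shaped table."""
    if isinstance(other, dict):
        if table.keys() != other.keys():
            raise ValueError("both tables need the same column names")
        for name, col in table.items():
            if len(col) != len(other[name]):
                raise ValueError("both tables need the same row count")
        return {name: [op(a, b) for a, b in zip(col, other[name])]
                for name, col in table.items()}
    return {name: [op(a, other) for a in col] for name, col in table.items()}


table = {"x": [1, 2, 3], "y": [3, 2, 1]}
greater_than_two = compare(table, 2, operator.gt)
equal_to_other = compare(table, {"x": [1, 0, 3], "y": [0, 2, 1]}, operator.eq)
print(greater_than_two)
print(equal_to_other)
```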
Logical Operators#
Or#
Or operator for Table.
Args:
    other: PyCylon Table
And#
And operator for Table.
Args:
    other: PyCylon Table
Invert#
Invert operator for Table. Only bool-valued Tables are supported.
Math Operators#
Currently supports negation, addition, subtraction, multiplication and division with scalar numeric values.
Negation#
Negation operator for Table.
Add#
Add operator for Table.
Args:
    other: scalar numeric
Subtract#
Subtract operator for Table.
Args:
    other: scalar numeric
Multiply#
Multiply operator for Table.
Args:
    other: scalar numeric
Division#
Element-wise division operator for Table.
Args:
    other: scalar numeric
Drop#
Drops a column or a list of columns from a Table.
Args:
    column_names: List[str]
Fillna#
Fills NA (missing) values with a given value.
Args:
    fill_value: scalar
Where#
Experimental version of the where operation. Values are kept where the condition is True and replaced where it is False.
Args:
    condition: bool Table
    other: scalar used as the replacement
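The where semantics described above (replace a value when its condition entry is False) can be sketched in plain Python over dict-of-columns tables. `where` here is a hypothetical local helper, not the PyCylon method, and the condition table is built by hand for the example.

```python
def where(table, condition, other):
    """Keep values whose condition is True, otherwise substitute `other`."""
    return {name: [value if keep else other
                   for value, keep in zip(col, condition[name])]
            for name, col in table.items()}


table = {"x": [1, 5, 9], "y": [4, 2, 8]}
# A bool table of the same shape: True where the value is greater than 3.
condition = {name: [value > 3 for value in col] for name, col in table.items()}
result = where(table, condition, 0)
print(result)
```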
IsNull#
Checks for null elements and returns a bool Table.
Returns: PyCylon Table
IsNA#
Checks for NA (missing) values and returns a bool Table.
Returns: PyCylon Table
Not Null#
Checks for non-null values and returns a bool Table.
Returns: PyCylon Table
Not NA#
Checks for non-NA values and returns a bool Table.
Returns: PyCylon Table
Rename#
Renames one or more columns of a Table.
Args:
    column_names: dictionary or full list of new column names
Add Prefix#
Adds a prefix to the column names.
Args:
    prefix: str
Add Suffix#
Adds a suffix to the column names.
Args:
    suffix: str
Index#
Retrieves the index if one exists, or provides a range index by default.
Returns: Index object
Set Index#
Sets the index.
Args:
    key: pycylon.Index object or an object extended from pycylon.Index
DropNa#
Drops NA (missing) values from a Table.
Args:
    axis: 0 for column and 1 for row; dropping is done only along the specified axis
    how: "any" or "all"; "any" drops if any value is NA, and "all" drops only if all values are NA along the considered axis
    inplace: when True, the operation is applied to the existing Table itself; the default is False, which produces a new Table with the drop applied
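The interaction of axis and how is easiest to see on a small example. The plain-Python sketch below follows the axis convention stated above (0 drops columns, 1 drops rows) and uses `None` to stand in for NA; `dropna` here is a hypothetical local helper that always returns a new dict and does not model inplace.

```python
def dropna(table, axis=0, how="any"):
    """Drop columns (axis=0) or rows (axis=1) containing None values."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    test = any if how == "any" else all
    if axis == 0:
        return {name: col for name, col in table.items()
                if not test(value is None for value in col)}
    if axis == 1:
        names = list(table)
        rows = zip(*table.values())
        kept = [row for row in rows if not test(v is None for v in row)]
        return {name: [row[i] for row in kept] for i, name in enumerate(names)}
    raise ValueError("axis must be 0 or 1")


by_column = {"a": [1, None, 3], "b": [None, None, None], "c": [7, 8, 9]}
by_row = {"a": [1, None, 3], "b": [4, None, None]}
print(dropna(by_column, axis=0, how="any"))
print(dropna(by_column, axis=0, how="all"))
print(dropna(by_row, axis=1, how="any"))
print(dropna(by_row, axis=1, how="all"))
```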
Distributed Sort#
Performs a distributed sort on the table by re-partitioning the data so that the sort order is maintained across all processes.
Args:
    sort_column: str or int
    sort_options: SortOption
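The idea of re-partitioning so that the order holds across processes can be simulated in a single Python process. The sketch below uses range partitioning: pick splitter values, send each value to the worker that owns its range, then sort locally. It is a generic illustration with a hypothetical `simulate_distributed_sort` helper; how PyCylon chooses splitters and moves data is not described here, and gathering all values to pick splitters is a simplification (a real system would typically sample).

```python
import bisect


def simulate_distributed_sort(partitions):
    """Range-partition then locally sort a list of per-worker value lists."""
    world_size = len(partitions)
    everything = sorted(v for part in partitions for v in part)
    if not everything:
        return [[] for _ in range(world_size)]
    # Choose world_size - 1 splitters that cut the data into equal ranges.
    splitters = [everything[len(everything) * i // world_size]
                 for i in range(1, world_size)]
    # Shuffle: each value goes to the worker that owns its range.
    shuffled = [[] for _ in range(world_size)]
    for part in partitions:
        for value in part:
            shuffled[bisect.bisect_right(splitters, value)].append(value)
    # Local sort: worker i now holds only values that precede worker i + 1's.
    return [sorted(part) for part in shuffled]


partitions = [[9, 1, 5], [4, 8, 2], [7, 3, 6]]
result = simulate_distributed_sort(partitions)
print(result)
```

Reading the workers in rank order yields the globally sorted data, which is the property the re-partitioning is meant to guarantee.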
Set Index#
This operation takes place in place.
Args:
    key: pycylon.indexing.index.BaseIndex
Returns: None
Reset Index#
Removes the existing index and, optionally, sets it back on the table as a column. This operation takes place in place.
Args:
    drop_index: bool; if True the index column is dropped, otherwise it is added to the table under the column name "index"
Returns: None
Loc#
This operator finds values by key (the index value).
ILoc#
This operator finds values by position (the integer row index).
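The difference between key-based and position-based lookup can be shown with a plain-Python sketch. The `loc` and `iloc` functions below are hypothetical local helpers over a dict of columns plus a separate index list; they return a single row and do not reproduce the PyCylon operators' signatures or slicing behavior.

```python
index = ["r1", "r2", "r3"]
table = {"x": [10, 20, 30], "y": ["a", "b", "c"]}


def loc(table, index, key):
    """Look up a row by its index key."""
    position = index.index(key)  # raises ValueError for an unknown key
    return {name: col[position] for name, col in table.items()}


def iloc(table, position):
    """Look up a row by its integer position."""
    return {name: col[position] for name, col in table.items()}


by_key = loc(table, index, "r2")
by_position = iloc(table, 0)
print(by_key)
print(by_position)
```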