PyCylon API Docs
Imports
Context
Initializing the Cylon Context based on the distributed or non-distributed context.
Args:
    config: an object extended from pycylon.net.CommConfig, e.g. pycylon.net.MPIConfig for the MPI backend
    distributed: bool; set to True for a distributed context, False otherwise
Returns:
    None
Sequential Programming
Distributed Programming
Rank
The process id (unique per process).
Returns:
    an int as the rank (0 in non-distributed mode)
World Size
The total number of processes participating in the distributed task.
Returns:
    an int as the world size (1 in non-distributed mode)
Finalize
Gracefully shuts down the context, closing any distributed process initialization, etc.
Returns:
    None
Barrier
Calls a barrier to synchronize all workers.
Initialize Table
Using a List
Creating a PyCylon table from a list.
Args:
    context: pycylon.CylonContext
    col_names: column names as a List[str]
    data_list: data as a List of Lists (one list per column)
Using a Dictionary
Creating a PyCylon table from a dictionary.
Args:
    context: pycylon.CylonContext
    dictionary: dict object with column names as keys and a List of values per column
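The list and dictionary inputs carry the same columnar data. The plain-Python model below makes no PyCylon calls. It shows that data_list holds one inner list per column rather than one per row, and that pairing col_names with data_list gives the dictionary form.

```python
# Plain-Python model of the two input layouts; no PyCylon calls are made here.
col_names = ["id", "score"]

# List form: one inner list per COLUMN (column-major), not one per row.
data_list = [[1, 2, 3], [0.5, 0.7, 0.9]]

# Dictionary form: column name -> list of column values.
dictionary = {"id": [1, 2, 3], "score": [0.5, 0.7, 0.9]}

# Pairing each name with its column turns the list form into the dictionary form.
from_list = dict(zip(col_names, data_list))
assert from_list == dictionary

# Transposing the columns recovers the rows.
rows = list(zip(*data_list))
print(rows)  # [(1, 0.5), (2, 0.7), (3, 0.9)]
```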
Using a PyArrow Table
Creating a PyCylon table from a PyArrow Table.
Args:
    context: pycylon.CylonContext
    pyarrow_table: PyArrow Table
Using Numpy
Creating a PyCylon table from NumPy arrays.
Args:
    context: pycylon.CylonContext
    col_names: column names as a List
    ar_list: NumPy ndarrays as a List (one 1D array per column)
Using Pandas
Creating a PyCylon table from a Pandas DataFrame.
Args:
    context: pycylon.CylonContext
    df: pd.DataFrame
    preserve_index: keep the index of the original DataFrame
    nthreads: number of threads for the operation
    columns: column names, if updated
    safe: perform a safe operation
Convert Table
To a PyArrow Table
Creating a PyArrow Table from a PyCylon table.
Returns:
    PyArrow Table
To Pandas
Creating a Pandas DataFrame from a PyCylon table.
Returns:
    pd.DataFrame
To Numpy
Set order to F or C to get an F_CONTIGUOUS or C_CONTIGUOUS NumPy array.
The default does a zero copy. For bool values, make sure to set zero_copy_only to False.
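The order setting controls how the 2-D result is laid out in memory. C order is row-major: each row is contiguous. F order is column-major: each column is contiguous. The sketch below models both layouts in plain Python with flat lists. The helper names flatten and offset are hypothetical and are not part of PyCylon or NumPy.

```python
def flatten(rows, order="C"):
    """Flatten a 2-D table (list of equal-length rows) into one flat buffer.

    order="C": row-major, each row is contiguous (C_CONTIGUOUS).
    order="F": column-major, each column is contiguous (F_CONTIGUOUS).
    """
    n_rows, n_cols = len(rows), len(rows[0])
    if order == "C":
        return [rows[i][j] for i in range(n_rows) for j in range(n_cols)]
    if order == "F":
        return [rows[i][j] for j in range(n_cols) for i in range(n_rows)]
    raise ValueError("order must be 'C' or 'F'")


def offset(i, j, n_rows, n_cols, order="C"):
    """Position of element (i, j) inside the flat buffer for the given order."""
    return i * n_cols + j if order == "C" else j * n_rows + i


table = [[1, 10], [2, 20], [3, 30]]  # 3 rows x 2 columns
c_buf = flatten(table, "C")          # [1, 10, 2, 20, 3, 30]
f_buf = flatten(table, "F")          # [1, 2, 3, 10, 20, 30]
# The same logical element (row 2, column 1) sits at a different offset per layout.
assert c_buf[offset(2, 1, 3, 2, "C")] == 30
assert f_buf[offset(2, 1, 3, 2, "F")] == 30
```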
To Dictionary
Creating a dictionary from a PyCylon table.
Returns:
    dict object
I/O Operations
Read from CSV
Write to CSV
Creating a CSV file with PyCylon table data.
Args:
    path: path to the file
    csv_write_options: pycylon.io.CSVWriteOptions
Properties
Column Names
Column Count
Shape
Row Count
Context
Relational Algebra Operators
Join
Joins two PyCylon tables.
Args:
    table: PyCylon table on which the join is performed (becomes the left table)
    join_type: join type as str ["inner", "left", "right", "outer"]
    algorithm: join algorithm as str ["hash", "sort"]
Kwargs:
    left_on: join column of the left table as List[int] or List[str]
    right_on: join column of the right table as List[int] or List[str]
    on: join column common to both tables as List[int] or List[str]
Returns:
    Joined PyCylon table
Note: The print methods are a work in progress, intended to provide output similar to Pandas.
Use join in a sequential setting and distributed_join in a distributed setting, depending on the use case.
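The four join types can be illustrated with a minimal hash join in plain Python. The right table is indexed by its key (build phase), and each left row is then looked up in that index (probe phase). This is a conceptual model on lists of row tuples, not PyCylon's implementation. The function hash_join is hypothetical. A sort-based join would instead sort both sides on the key and merge them.

```python
from collections import defaultdict


def hash_join(left, right, left_on, right_on, join_type="inner"):
    """Join two tables given as lists of row tuples on one key column each.

    Rows without a partner (left/right/outer joins) are padded with None.
    """
    if join_type not in ("inner", "left", "right", "outer"):
        raise ValueError(f"unknown join type: {join_type}")
    left_width = len(left[0]) if left else 0
    right_width = len(right[0]) if right else 0

    # Build phase: index the right table's row positions by join key.
    index = defaultdict(list)
    for r_pos, r_row in enumerate(right):
        index[r_row[right_on]].append(r_pos)

    result, matched_right = [], set()
    # Probe phase: look up each left row's key in the index.
    for l_row in left:
        partners = index.get(l_row[left_on], [])
        for r_pos in partners:
            result.append(l_row + right[r_pos])
            matched_right.add(r_pos)
        if not partners and join_type in ("left", "outer"):
            result.append(l_row + (None,) * right_width)

    # Right and outer joins also keep right rows that never matched.
    if join_type in ("right", "outer"):
        for r_pos, r_row in enumerate(right):
            if r_pos not in matched_right:
                result.append((None,) * left_width + r_row)
    return result


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (4, "z")]
for how in ("inner", "left", "right", "outer"):
    print(how, hash_join(left, right, left_on=0, right_on=0, join_type=how))
```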
Subtract (Difference)
For distributed operations use distributed_subtract instead of subtract.
Intersect
For distributed operations use distributed_intersect instead of intersect.
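Subtract and intersect compare whole rows between two tables. The plain-Python sketch below assumes set semantics, so duplicate rows are removed. Whether PyCylon keeps duplicates is not stated above. The sketch also hints at why a separate distributed variant is needed. Equal rows must end up on the same process before a local operation can give the correct answer, and hash partitioning is one common way to arrange that. That distributed_subtract works exactly this way is an assumption. The helpers subtract, intersect and partition are hypothetical.

```python
def subtract(a, b):
    """Distinct rows of table a that do not appear in table b (sorted for stable output)."""
    return sorted(set(a) - set(b))


def intersect(a, b):
    """Distinct rows that appear in both table a and table b (sorted for stable output)."""
    return sorted(set(a) & set(b))


def partition(rows, world_size):
    """Hash-partition rows so equal rows always land in the same partition."""
    parts = [[] for _ in range(world_size)]
    for row in rows:
        parts[hash(row) % world_size].append(row)
    return parts


a = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
b = [(2, "b"), (4, "d")]
print(subtract(a, b))   # [(1, 'a'), (3, 'c')]
print(intersect(a, b))  # [(2, 'b')]

# Simulate 2 processes: each applies the local operation to its own partition,
# and combining the per-process results matches the single-process answer.
a_parts, b_parts = partition(a, 2), partition(b, 2)
combined = sorted(row for r in range(2) for row in subtract(a_parts[r], b_parts[r]))
assert combined == subtract(a, b)
```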
Project
project can be used for both distributed and sequential operations.
Aggregation Operations
Currently supports Sum, Min, Max, and Count.
Sum
Min
Max
Count
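The four aggregations reduce a column to a single value. The plain-Python model below skips None values before aggregating. Whether PyCylon handles nulls the same way is an assumption, not something stated above. The function aggregate is hypothetical.

```python
def aggregate(column, op):
    """Apply sum, min, max or count to a column, skipping None values."""
    values = [v for v in column if v is not None]
    if op == "sum":
        return sum(values)
    if op == "min":
        return min(values)
    if op == "max":
        return max(values)
    if op == "count":
        return len(values)
    raise ValueError(f"unsupported aggregation: {op}")


col = [4, None, 1, 7]
results = {op: aggregate(col, op) for op in ("sum", "min", "max", "count")}
print(results)  # {'sum': 12, 'min': 1, 'max': 7, 'count': 3}
```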
GroupBy
Group by operations support aggregations.
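A group-by collects the rows that share a key value and applies an aggregation to each group. The sketch below models this in plain Python on row tuples with the same four aggregations. The function group_by and its parameters are hypothetical and are not PyCylon's signature.

```python
from collections import defaultdict


def group_by(rows, key_col, agg_col, op):
    """Group rows by rows[key_col] and aggregate rows[agg_col] within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row[agg_col])
    ops = {"sum": sum, "min": min, "max": max, "count": len}
    return {key: ops[op](values) for key, values in groups.items()}


rows = [("a", 1), ("b", 5), ("a", 3), ("b", 2), ("c", 4)]
print(group_by(rows, 0, 1, "sum"))    # {'a': 4, 'b': 7, 'c': 4}
print(group_by(rows, 0, 1, "count"))  # {'a': 2, 'b': 2, 'c': 1}
```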
Comparison Operators
Equal
Equal operator for Table.
Args:
    other: a numeric scalar or a Table
Not Equal
Not equal operator for Table.
Args:
    other: a numeric scalar or a Table
Less Than
Less than operator for Table.
Args:
    other: a numeric scalar or a Table
Greater Than
Greater than operator for Table.
Args:
    other: a numeric scalar or a Table
Less Than or Equal
Less than or equal operator for Table.
Args:
    other: a numeric scalar or a Table
Greater Than or Equal
Greater than or equal operator for Table.
Args:
    other: a numeric scalar or a Table
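Each comparison is applied element-wise and yields a bool Table of the same shape. The other operand is broadcast when it is a scalar and matched cell by cell when it is a Table. The plain-Python model below represents a table as a dict of column name to list and passes one of Python's operator functions (eq, ne, lt, gt, le, ge) for the six comparisons. The function compare is hypothetical.

```python
import operator


def compare(table, other, op):
    """Element-wise comparison returning a bool table of the same shape.

    table: dict of column name -> list of values
    other: a scalar, or a table with the same columns and lengths
    """
    if isinstance(other, dict):
        return {name: [op(a, b) for a, b in zip(col, other[name])]
                for name, col in table.items()}
    return {name: [op(a, other) for a in col] for name, col in table.items()}


t = {"x": [1, 5, 3], "y": [4, 2, 6]}
print(compare(t, 3, operator.gt))  # {'x': [False, True, False], 'y': [True, False, True]}
print(compare(t, t, operator.eq))  # every cell is True
```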
Logical Operators
Or
Or operator for Table.
Args:
    other: PyCylon Table
And
And operator for Table.
Args:
    other: PyCylon Table
Invert
Invert operator for Table. Only supports bool-valued Tables.
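The logical operators combine two bool Tables cell by cell, and invert flips every cell of a single bool Table. The plain-Python model below uses the same dict-of-columns representation. It rejects non-bool input for invert, mirroring the restriction stated above. The function names are hypothetical.

```python
def logical_or(a, b):
    """Cell-wise OR of two bool tables with identical columns."""
    return {n: [x or y for x, y in zip(a[n], b[n])] for n in a}


def logical_and(a, b):
    """Cell-wise AND of two bool tables with identical columns."""
    return {n: [x and y for x, y in zip(a[n], b[n])] for n in a}


def invert(a):
    """Cell-wise NOT; only bool-valued tables are accepted."""
    for col in a.values():
        if not all(isinstance(v, bool) for v in col):
            raise TypeError("invert requires a bool-valued table")
    return {n: [not v for v in col] for n, col in a.items()}


p = {"c": [True, True, False, False]}
q = {"c": [True, False, True, False]}
print(logical_or(p, q))   # {'c': [True, True, True, False]}
print(logical_and(p, q))  # {'c': [True, False, False, False]}
print(invert(p))          # {'c': [False, False, True, True]}
```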
Math Operators
Currently supports negation, addition, subtraction, multiplication, and division with scalar numeric values.
Negation
Negation operator for Table.
Add
Add operator for Table.
Args:
    other: scalar numeric
Subtract
Subtract operator for Table.
Args:
    other: scalar numeric
Multiply
Multiply operator for Table.
Args:
    other: scalar numeric
Division
Element-wise division operator for Table.
Args:
    other: scalar numeric
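Because the math operators only accept a scalar numeric operand, every cell is combined with the same value. The plain-Python model below shows this broadcast on the dict-of-columns representation. It rejects non-numeric operands to reflect the scalar-only restriction. The helpers scalar_op and negate are hypothetical.

```python
import operator


def scalar_op(table, scalar, op):
    """Apply op(cell, scalar) to every cell; only scalar numeric operands are allowed."""
    if not isinstance(scalar, (int, float)):
        raise TypeError("only scalar numeric operands are supported")
    return {name: [op(v, scalar) for v in col] for name, col in table.items()}


def negate(table):
    """Unary negation of every cell."""
    return {name: [-v for v in col] for name, col in table.items()}


t = {"x": [1, 2, 3]}
print(scalar_op(t, 2, operator.add))      # {'x': [3, 4, 5]}
print(scalar_op(t, 2, operator.sub))      # {'x': [-1, 0, 1]}
print(scalar_op(t, 2, operator.mul))      # {'x': [2, 4, 6]}
print(scalar_op(t, 2, operator.truediv))  # {'x': [0.5, 1.0, 1.5]}
print(negate(t))                          # {'x': [-1, -2, -3]}
```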
Drop
Drops a column or a list of columns from a Table.
Args:
    column_names: List[str]
Fillna
Fills not applicable (NA) values with a given value.
Args:
    fill_value: scalar
Where
Experimental version of the Where operation. Replaces values where the condition is False.
Args:
    condition: bool Table
    other: scalar
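Where keeps a cell when the matching cell of the condition Table is True and replaces it with other when the condition is False. The plain-Python model below builds the bool condition with an element-wise comparison and then applies it. The function where is hypothetical.

```python
def where(table, condition, other):
    """Keep cells where condition is True; replace them with `other` where it is False."""
    return {name: [v if keep else other for v, keep in zip(col, condition[name])]
            for name, col in table.items()}


t = {"x": [1, 5, 3]}
cond = {"x": [v > 2 for v in t["x"]]}  # {'x': [False, True, True]}
print(where(t, cond, 0))               # {'x': [0, 5, 3]}
```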
IsNull
Checks for null elements and returns a bool Table.
Returns:
    PyCylon Table
IsNA
Checks for not applicable (NA) values and returns a bool Table.
Returns:
    PyCylon Table
Not Null
Checks for non-null values and returns a bool Table.
Returns:
    PyCylon Table
Not NA
Checks for values that are not NA and returns a bool Table.
Returns:
    PyCylon Table
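The null checks produce bool masks, and the not-null checks are their element-wise inverses. Fillna uses the same notion of "missing" to decide which cells to replace. The plain-Python model below treats None as the only missing marker. How PyCylon treats floating-point NaN, and any difference between null and NA, is not modeled here. The function names are hypothetical.

```python
def isnull(table):
    """Bool table marking missing (None) cells."""
    return {name: [v is None for v in col] for name, col in table.items()}


def notnull(table):
    """Element-wise inverse of isnull."""
    return {name: [v is not None for v in col] for name, col in table.items()}


def fillna(table, fill_value):
    """Replace missing (None) cells with fill_value."""
    return {name: [fill_value if v is None else v for v in col]
            for name, col in table.items()}


t = {"x": [1, None, 3], "y": [None, "b", "c"]}
print(isnull(t))     # {'x': [False, True, False], 'y': [True, False, False]}
print(notnull(t))    # {'x': [True, False, True], 'y': [False, True, True]}
print(fillna(t, 0))  # {'x': [1, 0, 3], 'y': [0, 'b', 'c']}
```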
Rename
Renames the columns of a Table using a dictionary or a full list of new column names.
Args:
    column_names: dictionary or full list of new column names
Add Prefix
Adding a prefix to column names.
Args:
    prefix: str
Add Suffix
Adding a suffix to column names.
Args:
    suffix: str
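The two rename inputs behave differently. A dictionary renames only the columns it lists, while a list must supply a new name for every column. Prefix and suffix apply one string to all names. The plain-Python model below works on a list of column names. The function names are hypothetical.

```python
def rename(column_names, new_names):
    """Rename columns from a dict (partial mapping) or a full list of new names."""
    if isinstance(new_names, dict):
        # Dictionary form: columns not listed keep their old names.
        return [new_names.get(name, name) for name in column_names]
    # List form: a new name is required for every column.
    if len(new_names) != len(column_names):
        raise ValueError("a full list of new column names is required")
    return list(new_names)


def add_prefix(column_names, prefix):
    return [prefix + name for name in column_names]


def add_suffix(column_names, suffix):
    return [name + suffix for name in column_names]


cols = ["id", "score"]
print(rename(cols, {"score": "points"}))  # ['id', 'points']
print(rename(cols, ["key", "value"]))     # ['key', 'value']
print(add_prefix(cols, "l_"))             # ['l_id', 'l_score']
print(add_suffix(cols, "_r"))             # ['id_r', 'score_r']
```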
Index
Retrieves the index if it exists, or provides a range index as the default.
Returns:
    Index object
Set Index
Sets the index of the Table.
Args:
    key: pycylon.Index object or an object extended from pycylon.Index
DropNa
Drops not applicable (NA) values from a Table.
Args:
    axis: 0 for column and 1 for row; dropping is only done on the specified axis
    how: "any" or "all"; "any" drops if any value is NA, "all" drops only if all values in the considered axis are NA
    inplace: when True, the operation modifies the existing Table; the default is False, which produces a new Table with the drop applied
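The difference between how="any" and how="all" is easiest to see on rows. The plain-Python model below drops rows that contain None. It does not model the axis numbering or the inplace flag described above. The function dropna_rows is hypothetical.

```python
def dropna_rows(rows, how="any"):
    """Drop rows containing missing (None) values.

    how="any": drop a row if any of its values is missing.
    how="all": drop a row only if all of its values are missing.
    """
    if how == "any":
        return [row for row in rows if not any(v is None for v in row)]
    if how == "all":
        return [row for row in rows if not all(v is None for v in row)]
    raise ValueError("how must be 'any' or 'all'")


rows = [(1, 2), (None, 3), (None, None)]
print(dropna_rows(rows, "any"))  # [(1, 2)]
print(dropna_rows(rows, "all"))  # [(1, 2), (None, 3)]
```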
Distributed Sort
Does a distributed sort on the table by re-partitioning the data to maintain the sort order across all processes.
Args:
    sort_column: str or int
    sort_options: SortOption
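Re-partitioning for a global sort order can be modeled with sample sort, a common technique for distributed sorting. Splitters are chosen from samples, each value is routed to the process that owns its key range, and each process sorts locally. Whether PyCylon picks splitters this way is an assumption. In the sketch below, processes are simulated as lists inside one Python process, and the function name is hypothetical.

```python
import bisect


def simulated_distributed_sort(partitions):
    """Model a distributed sort over `partitions` (one list of values per process).

    After re-partitioning, concatenating the per-process results in rank order
    yields a globally sorted sequence.
    """
    world_size = len(partitions)
    # Step 1: pool samples (every other value) and pick world_size - 1 splitters.
    samples = sorted(v for part in partitions for v in part[::2])
    step = max(1, len(samples) // world_size)
    splitters = samples[step::step][: world_size - 1]

    # Step 2: route each value to the rank whose key range contains it.
    received = [[] for _ in range(world_size)]
    for part in partitions:
        for v in part:
            received[bisect.bisect_right(splitters, v)].append(v)

    # Step 3: each rank sorts only the values it received.
    return [sorted(r) for r in received]


parts = [[9, 1, 7], [4, 8, 2], [6, 3, 5]]  # unsorted data on 3 simulated ranks
result = simulated_distributed_sort(parts)
print(result)  # [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
```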
Set Index
The operation takes place in place.
Args:
    key: pycylon.indexing.index.BaseIndex
Returns:
    None
Reset Index
Removes the existing index and sets it back to the table. This operation takes place in place.
Args:
    drop_index: bool; if True the index column is dropped, otherwise it is added to the table as a column named "index"
Returns:
    None
Loc
This operator finds values by key.
ILoc
This operator finds values by position, as a row index.
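Loc and ILoc differ in what the lookup argument means. Loc treats it as an index key, and ILoc treats it as a row position. The plain-Python model below uses index keys that differ from the positions, so the same argument 0 selects different rows. The helpers loc and iloc are hypothetical and take a single column.

```python
def loc(index, column, key):
    """Find a value by index key (label-based); raises ValueError if the key is absent."""
    return column[index.index(key)]


def iloc(column, position):
    """Find a value by row position (position-based)."""
    return column[position]


index = [2, 0, 1]          # index keys, deliberately not equal to the row positions
values = ["x", "y", "z"]
print(loc(index, values, 0))  # 'y' (the row whose key is 0)
print(iloc(values, 0))        # 'x' (the first row)
```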