public class Table extends DataRepresentation
Table is the basic data manipulation endpoint of the TwisterX API. This class doesn't hold any data; instead, it acts as the mediator between the user's application and the native TwisterX layer. Data transformation, communication and persistence are handled entirely by the native layer.
Tables are immutable: transformations create a new table instance and leave the original table intact.
A unique ID is assigned to each table at the time of creation and can be considered the identifier of a set of data. When the data is transferred to a different machine or persisted to disk, this ID always stays associated with the underlying set of data. Hence different APIs (Java, Python, REST-based RPC) can refer to a set of data irrespective of where it was originally created.
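The immutability and ID contract can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not the TwisterX implementation (which keeps the data in the native layer): SketchTable and its keepRows method are hypothetical names, and a random UUID stands in for however TwisterX actually generates table IDs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;
import java.util.function.Predicate;

// Hypothetical model of the contract described above, NOT the TwisterX
// implementation (which keeps data in the native layer): each table gets a
// unique ID at creation, and transformations return a new table.
public class ImmutableTableSketch {

    static final class SketchTable {
        // Assumption: a random UUID stands in for TwisterX's real ID scheme.
        private final String id = UUID.randomUUID().toString();
        private final List<List<Object>> rows; // read-only after construction

        SketchTable(List<List<Object>> rows) {
            List<List<Object>> copy = new ArrayList<>();
            for (List<Object> row : rows) {
                copy.add(Collections.unmodifiableList(new ArrayList<>(row)));
            }
            this.rows = Collections.unmodifiableList(copy);
        }

        String getId() { return id; }

        int getRowCount() { return rows.size(); }

        // Hypothetical transformation: builds a NEW table from the matching
        // rows and leaves this table untouched.
        SketchTable keepRows(Predicate<List<Object>> keep) {
            List<List<Object>> kept = new ArrayList<>();
            for (List<Object> row : rows) {
                if (keep.test(row)) {
                    kept.add(row);
                }
            }
            return new SketchTable(kept);
        }
    }

    public static void main(String[] args) {
        SketchTable original = new SketchTable(List.of(
                List.of(1, "a"), List.of(2, "b"), List.of(3, "c")));
        SketchTable filtered = original.keepRows(r -> (Integer) r.get(0) >= 2);

        System.out.println(original.getRowCount());                    // 3: original intact
        System.out.println(filtered.getRowCount());                    // 2
        System.out.println(original.getId().equals(filtered.getId())); // false: new ID
    }
}
```

Because the derived table carries its own ID, the original ID keeps referring to the unmodified data set.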
Modifier and Type | Method and Description |
---|---|
void | clear(): Clear the table and free the memory associated with it |
Table | distributedJoin(Table rightTable, JoinConfig joinConfig): Apply the join algorithm across a distributed set of nodes/datasets |
<I> Table | filter(int columnIndex, Filter<I> filterLogic): Filter out rows of a table based on the values of a single column |
static Table | fromColumns(List<Column> columns): Create a Table by combining a list of columns |
static Table | fromCSV(CylonContext ctx, String path): Load a table by reading the data from a CSV file |
static Table | fromCSV(String path, List<org.apache.arrow.vector.types.Types.MinorType> dataTypes): Load a table by reading the data from a CSV file, with a data type specified for each column |
int | getColumnCount(): Get the number of columns of the table |
int | getRowCount(): Get the number of rows of the table |
List<Table> | hashPartition(List<Integer> hashColumns, int noOfPartitions): Partition a table based on the hash value of the specified columns |
Table | join(Table rightTable, JoinConfig joinConfig): Join two tables based on the values of the columns |
<I,O> Column<O> | mapColumn(int colIndex, Mapper<I,O> mapper): Map the values of a column to other values |
static Table | merge(CylonContext ctx, Table... tables): Merge a set of similar tables into a single table |
void | print(): Print the entire table to the console |
void | print(int row1, int row2, int col1, int col2): Print a section of the table to the console |
List<Table> | roundRobinPartition(int noOfPartitions): Partition a table into noOfPartitions partitions of similar size |
Table | select(Selector selector): Filter out rows of a table based on user-defined logic |
Table | sort(int columnIndex): Sort the rows of a table based on the values of a column |
Methods inherited from class DataRepresentation: getId, unSupportedException
public static Table fromCSV(CylonContext ctx, String path)
This method will load a table by reading the data from a CSV file.
Parameters:
path - path to the CSV file
Returns:
Table instance that holds the data from the CSV file

public static Table fromColumns(List<Column> columns)
Create a Table by combining a list of columns.
Parameters:
columns - List of columns

public static Table fromCSV(String path, List<org.apache.arrow.vector.types.Types.MinorType> dataTypes)
This method will load a table by reading the data from a CSV file. The behaviour is similar to fromCSV(CylonContext, String), but additionally a data type can be specified for each column.
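To illustrate what specifying a data type per column means, the following hypothetical sketch parses CSV text into typed columns. The ColumnType enum is a stand-in that mirrors three Arrow MinorType constants (INT, FLOAT8, VARCHAR); the parser is deliberately naive (no quoted fields) and reads from a string rather than a file. It does not show how the native TwisterX CSV reader works.

```java
import java.util.ArrayList;
import java.util.List;

public class TypedCsvSketch {

    // Hypothetical stand-in for org.apache.arrow.vector.types.Types.MinorType.
    enum ColumnType { INT, FLOAT8, VARCHAR }

    // Parses CSV text (one header line, then data lines) into columns,
    // converting every cell with the type declared for its column.
    // Naive: splits on ',' and does not handle quoted fields.
    static List<List<Object>> parse(String csv, List<ColumnType> types) {
        String[] lines = csv.strip().split("\\R");
        List<List<Object>> columns = new ArrayList<>();
        for (int c = 0; c < types.size(); c++) {
            columns.add(new ArrayList<>());
        }
        for (int i = 1; i < lines.length; i++) { // skip the header line
            String[] cells = lines[i].split(",", -1);
            if (cells.length != types.size()) {
                throw new IllegalArgumentException("line " + (i + 1) + ": expected "
                        + types.size() + " cells, found " + cells.length);
            }
            for (int c = 0; c < cells.length; c++) {
                String cell = cells[c].trim();
                switch (types.get(c)) {
                    case INT:    columns.get(c).add(Integer.parseInt(cell)); break;
                    case FLOAT8: columns.get(c).add(Double.parseDouble(cell)); break;
                    default:     columns.get(c).add(cell); break;
                }
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String csv = "id,price,name\n1,9.5,apple\n2,3.25,pear\n";
        List<List<Object>> cols = parse(csv,
                List.of(ColumnType.INT, ColumnType.FLOAT8, ColumnType.VARCHAR));
        System.out.println(cols.size() + " columns, " + cols.get(0).size() + " rows"); // 3 columns, 2 rows
        System.out.println(cols.get(1)); // [9.5, 3.25]
    }
}
```

Declaring the types up front means a malformed cell (for example, text in an INT column) fails at load time instead of during a later transformation.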
public int getColumnCount()
Get the number of columns of the table.
public int getRowCount()
Get the number of rows of the table.
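public Table join(Table rightTable, JoinConfig joinConfig)
Join two tables based on the values of the columns.
Parameters:
rightTable - Table to be joined with this table
joinConfig - Configurations for the join operation

public Table distributedJoin(Table rightTable, JoinConfig joinConfig)
Apply the join algorithm across a distributed set of nodes/datasets.
Parameters:
rightTable - Table to be joined with this table
joinConfig - Configurations for the join operation

public <I,O> Column<O> mapColumn(int colIndex, Mapper<I,O> mapper)
Maps the values of a column to other values.
Type Parameters:
I - Input data type
O - Output data type
Parameters:
colIndex - Index of the column to be transformed (mapped)
mapper - Mapping logic
Returns:
Column which represents the mapped values

public List<Table> hashPartition(List<Integer> hashColumns, int noOfPartitions)
Partition a table based on the hash value of the specified columns.
Parameters:
hashColumns - Indices of the columns to be hashed
noOfPartitions - Number of partitions to generate

public List<Table> roundRobinPartition(int noOfPartitions)
Partition a table into noOfPartitions partitions of similar size.
Parameters:
noOfPartitions - Number of partitions to generate

public static Table merge(CylonContext ctx, Table... tables)
Merge a set of similar tables into a single table.
Parameters:
tables - List of tables to be merged
Returns:
the merged Table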
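join, distributedJoin, hashPartition and roundRobinPartition are related: one common way to realize a distributed equi-join is to hash-partition both tables on the join key, so that matching rows land in partitions with the same index, and then join each pair of partitions locally. This page does not say whether distributedJoin works exactly this way, and whatever options JoinConfig carries are not modeled. The plain-Java sketch below uses hypothetical names and Object[] rows, and only illustrates the technique with an inner hash join on a single column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of partitioning and joining; not the TwisterX API.
public class PartitionJoinSketch {

    // Rows with equal values in hashColumns always land in the same partition.
    static List<List<Object[]>> hashPartition(List<Object[]> rows, List<Integer> hashColumns, int noOfPartitions) {
        List<List<Object[]>> parts = emptyPartitions(noOfPartitions);
        for (Object[] row : rows) {
            Object[] key = new Object[hashColumns.size()];
            for (int k = 0; k < key.length; k++) {
                key[k] = row[hashColumns.get(k)];
            }
            int p = Math.floorMod(Objects.hash(key), noOfPartitions); // always in [0, n)
            parts.get(p).add(row);
        }
        return parts;
    }

    // Row i goes to partition i % n, so partition sizes differ by at most one.
    static List<List<Object[]>> roundRobinPartition(List<Object[]> rows, int noOfPartitions) {
        List<List<Object[]>> parts = emptyPartitions(noOfPartitions);
        for (int i = 0; i < rows.size(); i++) {
            parts.get(i % noOfPartitions).add(rows.get(i));
        }
        return parts;
    }

    // Local inner hash join on left[leftCol] == right[rightCol];
    // each output row is the left cells followed by the right cells.
    static List<Object[]> hashJoin(List<Object[]> left, int leftCol, List<Object[]> right, int rightCol) {
        Map<Object, List<Object[]>> index = new HashMap<>();
        for (Object[] r : right) {
            index.computeIfAbsent(r[rightCol], k -> new ArrayList<>()).add(r);
        }
        List<Object[]> out = new ArrayList<>();
        for (Object[] l : left) {
            for (Object[] r : index.getOrDefault(l[leftCol], List.of())) {
                Object[] joined = new Object[l.length + r.length];
                System.arraycopy(l, 0, joined, 0, l.length);
                System.arraycopy(r, 0, joined, l.length, r.length);
                out.add(joined);
            }
        }
        return out;
    }

    // Co-partition both sides by key, then join matching partition pairs.
    // On a cluster each pair would be joined by a different worker; here they run in sequence.
    static List<Object[]> partitionedJoin(List<Object[]> left, int leftCol, List<Object[]> right, int rightCol, int n) {
        List<List<Object[]>> lp = hashPartition(left, List.of(leftCol), n);
        List<List<Object[]>> rp = hashPartition(right, List.of(rightCol), n);
        List<Object[]> out = new ArrayList<>();
        for (int p = 0; p < n; p++) {
            out.addAll(hashJoin(lp.get(p), leftCol, rp.get(p), rightCol));
        }
        return out;
    }

    private static List<List<Object[]>> emptyPartitions(int n) {
        List<List<Object[]>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Object[]> orders = List.of(
                new Object[]{1, "apple"}, new Object[]{2, "pear"}, new Object[]{1, "plum"});
        List<Object[]> customers = List.of(
                new Object[]{1, "Ann"}, new Object[]{2, "Bob"}, new Object[]{3, "Cy"});
        int local = hashJoin(orders, 0, customers, 0).size();
        int partitioned = partitionedJoin(orders, 0, customers, 0, 4).size();
        System.out.println(local + " " + partitioned);                        // 3 3
        System.out.println(roundRobinPartition(customers, 2).get(0).size()); // 2
    }
}
```

Hash partitioning is what makes the partition-by-partition join correct, because equal keys are guaranteed to meet; round-robin partitioning only balances partition sizes and gives no such guarantee.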
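public Table sort(int columnIndex)
Sort the rows of a table based on the values of a column.
Parameters:
columnIndex - index of the column to be used for sorting
Returns:
a new, sorted Table instance

public <I> Table filter(int columnIndex, Filter<I> filterLogic)
Filter out rows of a table based on the values of a single column.
Type Parameters:
I - data type of the column
Parameters:
columnIndex - column to be used for filtering
filterLogic - filtering logic

public Table select(Selector selector)
This method can be used to filter out rows of a table based on user-defined logic.
Parameters:
selector - logic to select (filter) rows from the table

public void clear()
Clear the table and free the memory associated with it.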
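The difference between filter, whose logic sees a single column, and select, whose logic sees whole rows, together with sort, can be modeled on plain Java lists. These are hypothetical helpers, not the Filter, Selector or Table APIs, whose exact interfaces are not shown on this page; ascending order and stability are assumptions of the sketch (List.sort is a stable sort).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helpers modeling filter/select/sort; not the TwisterX API.
public class FilterSelectSortSketch {

    // Like filter(columnIndex, filterLogic): the predicate sees only one cell per row.
    @SuppressWarnings("unchecked")
    static <I> List<Object[]> filter(List<Object[]> rows, int columnIndex, Predicate<I> keep) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            if (keep.test((I) row[columnIndex])) {
                out.add(row);
            }
        }
        return out;
    }

    // Like select(selector): the predicate sees the whole row.
    static List<Object[]> select(List<Object[]> rows, Predicate<Object[]> keep) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            if (keep.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    // Like sort(columnIndex): returns a NEW list in ascending order of the column;
    // the input list is not modified. Assumes the column holds mutually Comparable values.
    @SuppressWarnings("unchecked")
    static List<Object[]> sort(List<Object[]> rows, int columnIndex) {
        List<Object[]> out = new ArrayList<>(rows);
        out.sort((a, b) -> ((Comparable<Object>) a[columnIndex]).compareTo(b[columnIndex]));
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> rows = List.of(
                new Object[]{3, "c", 30.0},
                new Object[]{1, "a", 10.0},
                new Object[]{2, "b", 20.0});
        List<Object[]> idAtLeast2 = filter(rows, 0, (Integer id) -> id >= 2);
        List<Object[]> chosen = select(rows, r -> (Double) r[2] > 15.0 && !"c".equals(r[1]));
        List<Object[]> sorted = sort(rows, 0);

        System.out.println(idAtLeast2.size()); // 2
        System.out.println(chosen.size());     // 1
        System.out.println(sorted.get(0)[0]);  // 1
        System.out.println(rows.get(0)[0]);    // 3: input order unchanged
    }
}
```

A row-level selector is needed whenever the condition combines several columns, as in the select call above; a single-column filter is enough otherwise.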
public void print()
Prints the entire table to the console.
public void print(int row1, int row2, int col1, int col2)
Prints a section of the table to the console.
Parameters:
row1 - starting row index
row2 - ending row index
col1 - starting column index
col2 - ending column index

Copyright © 2020. All rights reserved.