ETL Overview
Welcome to Flojoy’s ETL Blocks page. Here you can find all the information on how to handle ETL tasks using Flojoy.
EXTRACT
DATAFRAME
EXTRACT_COLUMNS Take an input dataframe/matrix and return a dataframe/matrix with only the specified columns.
FILE
OPEN_IMAGE Load an image file from disk and return a DataContainer of type 'image'.
OPEN_PARQUET Load a local parquet file, then return the data as a dataframe.
READ_CSV Read a .csv file from disk or a URL, and then return it as a dataframe.
READ_S3 Take an S3 key name, S3 bucket name, and file name as input, then extract the file from the specified bucket.
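As a rough illustration of what READ_CSV followed by EXTRACT_COLUMNS amounts to, the sketch below parses CSV text and keeps only the named columns. It uses Python's standard csv module rather than Flojoy itself; the helper names read_csv_rows and extract_columns, the sample data, and the use of plain dictionaries in place of a dataframe are all assumptions made for this example.

```python
import csv
import io


def read_csv_rows(text):
    """Parse CSV text into a list of dicts; the header row supplies the keys."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_columns(rows, columns):
    """Return new rows that keep only the requested columns, in the given order."""
    return [{name: row[name] for name in columns} for row in rows]


# Hypothetical sample data standing in for a .csv file on disk.
sample = "time,voltage,current\n0,1.5,0.2\n1,1.7,0.3\n"

rows = read_csv_rows(sample)
subset = extract_columns(rows, ["time", "voltage"])
print(subset)
```

Note that csv.DictReader returns every value as a string; a dataframe library would normally infer numeric types at this step.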
LOAD
CLOUD_DATABASE
FLOJOY_CLOUD_DOWNLOAD Download a DataContainer from Flojoy Cloud (beta).
FLOJOY_CLOUD_UPLOAD Upload a DataContainer to Flojoy Cloud (beta).
LOCAL_FILE_SYSTEM
BATCH_PROCESSOR Glob match a pattern in the given input directory, iterate (in a LOOP) over all of the files found, then return each file path as a TextBlob.
LOCAL_FILE The LOCAL_FILE node loads a local file, which may be one of several types, and converts it to a DataContainer class.
OPEN_MATLAB The OPEN_MATLAB node loads a local file of the .mat file format.
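The pattern-matching step that BATCH_PROCESSOR describes can be sketched with the standard pathlib module. This is not Flojoy's implementation: the helper name match_files, the temporary directory, and the file names are assumptions for the example, and plain strings stand in for TextBlob outputs.

```python
import tempfile
from pathlib import Path


def match_files(directory, pattern):
    """Return the sorted paths (as strings) of files in `directory` matching `pattern`."""
    return sorted(str(p) for p in Path(directory).glob(pattern) if p.is_file())


# Build a throwaway directory with a few hypothetical files to match against.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv", "notes.txt"):
        (Path(tmp) / name).write_text("x")

    # Iterate over the matches, as a LOOP would, keeping just the file names.
    matched = [Path(path).name for path in match_files(tmp, "*.csv")]

print(matched)
```

Sorting the matches makes the iteration order deterministic, since the order returned by a glob is not guaranteed.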
REMOTE_FILE_SYSTEM
REMOTE_FILE Load a remote file from an HTTP URL endpoint, infer the type, and convert it to a DataContainer class.
TRANSFORM
MATRIX_MANIPULATION
DOT_PRODUCT Take two input matrices, multiply them (by dot product), and return the result.
INVERT Invert a Matrix or OrderedPair.
MATMUL Take two input matrices, multiply them, and return the result.
SHUFFLE_MATRIX Return a matrix that is randomly shuffled along the first axis.
SORT_MATRIX Take an input matrix and sort it along the chosen axis.
TRANSPOSE_MATRIX Take an input 2D matrix and transpose it.
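For readers who want to see the arithmetic behind a few of these blocks, here is a minimal pure-Python sketch of matrix multiplication, transposition, and sorting along an axis. Nested lists stand in for Matrix DataContainers, and the function names matmul, transpose, and sort_matrix are chosen for this example; they are not Flojoy's own code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap the rows and columns of a 2D matrix."""
    return [list(col) for col in zip(*m)]


def sort_matrix(m, axis):
    """Sort each column (axis=0) or each row (axis=1) independently."""
    if axis == 1:
        return [sorted(row) for row in m]
    if axis == 0:
        return transpose([sorted(col) for col in zip(*m)])
    raise ValueError("axis must be 0 or 1")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

print(matmul(a, b))                    # [[19, 22], [43, 50]]
print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
print(sort_matrix([[3, 1], [2, 4]], 0))   # [[2, 1], [3, 4]]
print(sort_matrix([[3, 1], [2, 4]], 1))   # [[1, 3], [2, 4]]
```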
ORDERED_PAIR_MANIPULATION
ORDERED_PAIR_XY_INVERT Return an OrderedPair with the axes inverted.
TEXT_MANIPULATION
TEXT_CONCAT Concatenate 2 strings given by 2 TextBlob DataContainers.
TYPE_CASTING
BOOLEAN_2_SCALAR Take boolean data and convert it to a scalar data type.
DF_2_NP Convert a DataFrame DataContainer to a Matrix DataContainer.
DF_2_ORDERED_TRIPLE Convert a DataFrame DataContainer to an OrderedTriple DataContainer.
MATRIX_2_VECTOR Convert a Matrix DataContainer to a Vector DataContainer.
MAT_2_DF Convert a Matrix DataContainer to a DataFrame DataContainer.
NP_2_DF Infer the type of an array-like DataContainer, then convert it to a DataFrame DataContainer.
ORDERED_PAIR_2_VECTOR Returns the split components (x, y) of an ordered pair as Vectors.
ORDERED_TRIPLE_2_SURFACE Convert an OrderedTriple DataContainer to a Surface DataContainer.
VECTOR_2_MATRIX Convert a Vector DataContainer to a Matrix DataContainer.
VECTOR_2_ORDERED_PAIR Convert a Vector DataContainer to an OrderedPair DataContainer.
VECTOR_2_SCALAR Take a vector and transform it into a scalar data type.
VECTOR_MANIPULATION
DECIMATE_VECTOR The DECIMATE_VECTOR node returns the input vector by reducing the
INTERLEAVE_VECTOR The INTERLEAVE_VECTOR node combines multiple vectors into a single vector by interleaving their elements.
REMOVE_DUPLICATES_VECTOR The REMOVE_DUPLICATES_VECTOR node returns a vector with only unique elements.
REPLACE_SUBSET The REPLACE_SUBSET node returns a new Vector with a subset of elements replaced.
REVERSE_VECTOR The REVERSE_VECTOR node returns a vector equal to the input vector but reversed.
SHIFT_VECTOR The SHIFT_VECTOR node shifts the elements in the vector by the amount specified
SHUFFLE_VECTOR The SHUFFLE_VECTOR node returns a vector that is randomly shuffled.
SORT_VECTOR The SORT_VECTOR node returns the input Vector that is sorted
SPLIT_VECTOR The SPLIT_VECTOR node returns a vector that is split at a given index.
VECTOR_DELETE The VECTOR_DELETE node returns a new Vector with elements deleted from the requested indices.
VECTOR_INDEXING The VECTOR_INDEXING node returns the value of the vector at the requested index.
VECTOR_INSERT The VECTOR_INSERT node inserts a value to the Vector at the
VECTOR_LENGTH The VECTOR_LENGTH node returns the length of the input vector.
VECTOR_MAX The VECTOR_MAX node returns the maximum value from the Vector.
VECTOR_MIN The VECTOR_MIN node returns the minimum value from the Vector.
VECTOR_SUBSET The VECTOR_SUBSET node returns the subset of values at the requested indices.
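Several of the vector blocks above correspond to short list operations. The sketch below shows one plausible reading of reversing, de-duplicating, interleaving, subsetting, and deleting, using plain Python lists in place of Vector DataContainers. The function names are hypothetical, and details such as keeping first-occurrence order when removing duplicates, or requiring equal lengths when interleaving, are assumptions of this example rather than documented Flojoy behavior.

```python
def reverse_vector(v):
    """Return a reversed copy of the vector."""
    return v[::-1]


def remove_duplicates(v):
    """Keep only unique elements, preserving first-occurrence order (an assumption)."""
    return list(dict.fromkeys(v))


def interleave(*vectors):
    """Alternate elements from each vector; equal lengths are assumed."""
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("all vectors must have the same length")
    return [x for group in zip(*vectors) for x in group]


def vector_subset(v, indices):
    """Return the values found at the requested indices."""
    return [v[i] for i in indices]


def vector_delete(v, indices):
    """Return a new vector without the elements at the requested indices."""
    drop = set(indices)
    return [x for i, x in enumerate(v) if i not in drop]


v = [3, 1, 3, 2, 1]

print(reverse_vector(v))                    # [1, 2, 3, 1, 3]
print(remove_duplicates(v))                 # [3, 1, 2]
print(interleave([1, 2, 3], [10, 20, 30]))  # [1, 10, 2, 20, 3, 30]
print(vector_subset(v, [0, 3]))             # [3, 2]
print(vector_delete(v, [0, 3]))             # [1, 3, 1]
```

Each helper returns a new list and leaves its input unchanged, which matches the "returns a new Vector" wording used in several of the descriptions.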