The muHVT package is a collection of R functions for building topology-preserving maps of rich multivariate data, with an emphasis on big data, that is, datasets with a large number of rows. The functions are organized around the following typical workflow:
Data Compression: Vector quantization (VQ) or hierarchical vector quantization (HVQ) using means or medians. This step compresses the rows (long data frame) using a compression objective.
Data Projection: Projection of the compressed cells to 1D, 2D, and 3D with Sammon's nonlinear mapping algorithm. This step creates topology-preserving map coordinates in the desired output dimension.
Tessellation: Create the cells required for object visualization using the Voronoi tessellation method. The package includes heatmap plots for hierarchical Voronoi tessellations (HVT). This step enables data insights, visualization, and interaction with the topology-preserving map, and is useful for semi-supervised tasks.
Prediction: Score new data sets and record their assignment using the map objects from the above steps, in a sequence of maps if required.
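The compression and projection steps above can be illustrated with a minimal sketch. It does not use muHVT's own functions: `stats::kmeans` stands in for vector quantization and `MASS::sammon` for the Sammon projection, and the toy data, `n_cells`, and all variable names are assumptions made for illustration only.

```r
# Stand-in sketch (not the muHVT API): k-means as the vector quantizer,
# MASS::sammon as the projection of cell centroids to 2D.
library(MASS)  # provides sammon()

set.seed(42)
x <- matrix(rnorm(500 * 4), ncol = 4)   # toy "long" data: 500 rows, 4 columns

# Data compression: quantize 500 rows into n_cells centroids
n_cells <- 15                           # hypothetical number of cells
vq <- kmeans(x, centers = n_cells, nstart = 5)
centroids <- vq$centers                 # n_cells x 4 matrix

# Mean distance of each cell's points to its centroid (a quantization error)
point_err <- sqrt(rowSums((x - centroids[vq$cluster, ])^2))
qe <- tapply(point_err, vq$cluster, mean)

# Data projection: map the centroids to 2D while preserving distances
proj <- sammon(dist(centroids), k = 2, trace = FALSE)
coords <- proj$points                   # n_cells x 2 map coordinates
```

Here `coords` holds one 2D point per cell, which is the kind of input a Voronoi tessellation step would then consume.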
06th December, 2022
This package now additionally provides functionality to predict based on a set of maps to monitor entities over time.
The creation of a predictive set of maps involves four steps.
Let us try to understand the steps with the help of the diagram below:
Figure 1: Flow diagram for predicting based on a set of maps using mlayerHVT()
Initially, the raw data is passed to the HVT function, which constructs
a highly compressed map (Map A). The output of this function is
hierarchically arranged vector-quantized data, which is used to identify
the outlier cells in the dataset from the number of data points within
each cell and the z-scores for each cell.
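As a rough illustration of how outlier cells might be flagged from cell sizes and z-scores, consider the sketch below. It is not muHVT's implementation: `stats::kmeans` stands in for the quantizer, the farthest-first initialization is a swapped-in technique chosen only to keep the toy example deterministic, and the thresholds `max_points` and `z_cut` are hypothetical.

```r
# Stand-in sketch (not the muHVT API): flag cells that are both sparsely
# populated and have an extreme centroid z-score.
bulk <- as.matrix(expand.grid(a = seq(-2, 2, by = 0.5),
                              b = seq(-2, 2, by = 0.5)))   # 81 regular points
far  <- rbind(c(10, 10), c(10.5, 10))                      # 2 extreme records
x <- scale(rbind(bulk, far))                               # columns as z-scores

# Distance from every row of x to row i
dist_to <- function(x, i) sqrt(rowSums(sweep(x, 2, x[i, ])^2))

# Farthest-first initialization: repeatedly pick the point that is
# farthest from all centers chosen so far (deterministic, no RNG)
farthest_first <- function(x, k) {
  idx <- 1L
  d <- dist_to(x, 1L)
  while (length(idx) < k) {
    nxt <- which.max(d)
    idx <- c(idx, nxt)
    d <- pmin(d, dist_to(x, nxt))
  }
  idx
}

init <- farthest_first(x, 10)
vq <- kmeans(x, centers = x[init, , drop = FALSE])

cell_size <- vq$size                          # data points per cell
max_abs_z <- apply(abs(vq$centers), 1, max)   # most extreme centroid z-score

max_points <- 3   # hypothetical threshold on cell size
z_cut <- 3        # hypothetical threshold on |z|
outlier_cells <- which(cell_size <= max_points & max_abs_z > z_cut)
```

In this toy data the two extreme records end up alone in one cell, and that cell is the only one flagged.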
The identified outlier cell(s) are then passed to the removeOutliers
function along with Map A. This function removes the identified outlier
cell(s) from the dataset and stores them in Map B, as shown in the
diagram. The final output of this function is a list of two items: a
newly constructed map (Map B) and a subset of the dataset without the
outlier cell(s).
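The split described above can be sketched in a few lines of base R. This is only an illustration of the idea, not the removeOutliers function itself: the data, the `cell_id` assignments, and the element names of the returned list are hypothetical, and the rows of the outlier cells stand in for Map B.

```r
# Stand-in sketch: separate the records that fall in outlier cells
# from the rest of the dataset.
dataset <- data.frame(x = c(0.1, 0.2, 0.15, 5.0, 0.3, 5.2),
                      y = c(0.0, 0.1, 0.20, 4.8, 0.1, 5.1))
cell_id <- c(1, 1, 2, 3, 2, 3)   # hypothetical Map A cell of each row
outlier_cells <- 3               # hypothetical outlier cell number(s)

is_outlier <- cell_id %in% outlier_cells
result <- list(
  outlier_rows     = dataset[is_outlier, , drop = FALSE],   # basis for Map B
  dataset_filtered = dataset[!is_outlier, , drop = FALSE]   # input for Map C
)
```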
The plotCells function plots the Voronoi tessellations for the
compressed map (Map A) and highlights the identified outlier cell(s) in
red. It takes the identified outlier cell number(s) and the compressed
map (Map A) as input.
The dataset without outlier(s), returned by the removeOutliers function,
is then passed as an argument to the HVT function along with other
parameters such as n_cells, quant.error, depth, etc. to construct
another map (Map C).
Finally, all the constructed maps are passed to the mlayerHVT function
along with the test dataset. The function scores the test dataset to
determine which map and which cell each test record is assigned to.
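One way to picture scoring against a set of maps is nearest-centroid assignment with a routing rule. The sketch below assumes, for illustration only, that a record landing in an outlier cell of Map A is reported against Map B and every other record is scored on Map C. The centroid matrices, the `nearest_cell` helper, and the routing rule are hypothetical and are not taken from mlayerHVT's implementation.

```r
# Stand-in sketch: assign each test record to a map and a cell.
nearest_cell <- function(newdata, centroids) {
  # Squared distance from each record (rows) to each centroid (columns)
  d <- sapply(seq_len(nrow(centroids)), function(j)
    rowSums(sweep(newdata, 2, centroids[j, ])^2))
  d <- matrix(d, nrow = nrow(newdata))
  max.col(-d, ties.method = "first")   # index of the closest centroid
}

# Hypothetical centroids for the three maps
map_a <- rbind(c(0, 0), c(1, 1), c(10, 10))   # cell 3 is the outlier cell
outlier_cells_a <- 3
map_b <- rbind(c(10, 10))                     # built from the outlier cell
map_c <- rbind(c(0, 0), c(0.5, 0.5), c(1, 1), c(1, 0))

test_data <- rbind(c(0.1, -0.1), c(9.5, 10.2), c(0.9, 0.2))

# Assumed routing: outlier cell on Map A -> Map B, otherwise Map C
in_outlier <- nearest_cell(test_data, map_a) %in% outlier_cells_a
scored <- data.frame(
  map  = ifelse(in_outlier, "B", "C"),
  cell = ifelse(in_outlier,
                nearest_cell(test_data, map_b),
                nearest_cell(test_data, map_c)),
  stringsAsFactors = FALSE
)
```

For these three records the sketch assigns the second one to Map B and the other two to cells of Map C.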
For detailed information on the above functions, refer to the vignette.
library(devtools)
devtools::install_github(repo = "Mu-Sigma/muHVT", ref = "dev")
Following are the links to the vignettes for the muHVT package:
muHVT Vignette: Contains descriptions of the functions used for vector quantization and construction of hierarchical Voronoi tessellations for data analysis
muHVT Model Diagnostics Vignette: Contains descriptions of the functions used to perform model diagnostics and validation for a muHVT model
muHVT - Using mlayerHVT() for Monitoring Entities over Time: Contains descriptions of the functions used for monitoring entities over time using a predictive set of HVT maps