This function fits a high-dimensional model using hexagonal bins and provides options to customize the modeling process, including the choice of bin centroids or bin means, removal of low-density hexagons, and averaging of high-dimensional data.
Usage
fit_high_d_model(
training_data,
nldr_df_with_id,
x = "UMAP1",
y = "UMAP2",
cell_area = 1,
num_bins_x = NA,
shape_val = NA,
is_bin_centroid = TRUE,
is_rm_lwd_hex = FALSE,
benchmark_to_rm_lwd_hex = NA,
is_avg_high_d = TRUE,
column_start_text = "x"
)
Arguments
- training_data
A data frame containing the training high-dimensional data.
- nldr_df_with_id
A data frame containing 2D embeddings with a unique identifier.
- x
The name of the column that contains the first component of the 2D embedding.
- y
The name of the column that contains the second component of the 2D embedding.
- cell_area
The area of each hexagonal cell.
- num_bins_x
The number of bins along the x-axis for the hexagonal grid.
- shape_val
The shape parameter for the hexagons.
- is_bin_centroid
Logical, indicating whether to use bin centroids rather than bin means (default is TRUE).
- is_rm_lwd_hex
Logical, indicating whether to remove low-density hexagons (default is FALSE).
- benchmark_to_rm_lwd_hex
The benchmark value used to decide which low-density hexagons to remove; relevant when is_rm_lwd_hex is TRUE.
- is_avg_high_d
Logical, indicating whether to average the high-dimensional data within bins (default is TRUE).
- column_start_text
The text prefix for columns in the high-dimensional data.
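The interaction between column_start_text and is_avg_high_d can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data, the hb_id bin assignment, and the use of aggregate() are all assumptions made for illustration. It selects the columns whose names start with the prefix and averages them within each hexagonal bin.

```r
# Toy high-dimensional data (hypothetical values, not from the package)
training_data <- data.frame(
  ID = 1:6,
  x1 = c(0.1, 0.3, 1.0, 1.2, 2.0, 2.2),
  x2 = c(1.0, 1.2, 0.0, 0.2, 3.0, 3.4)
)

# Hypothetical hexagonal bin ID assigned to each observation
hb_id <- c(1L, 1L, 8L, 8L, 9L, 9L)

# Keep only the columns whose names start with the prefix
column_start_text <- "x"
high_d_cols <- names(training_data)[
  startsWith(names(training_data), column_start_text)
]

# Average the high-dimensional coordinates within each bin
df_bin <- aggregate(
  training_data[high_d_cols],
  by = list(hb_id = hb_id),
  FUN = mean
)
print(df_bin)
```

The result has one row per bin and one column per high-dimensional variable, which matches the layout of df_bin shown in the Examples section.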
Value
A list containing the data frame with high-dimensional coordinates for 2D bin centroids (df_bin) and the data frame containing information about hexagonal bin centroids (df_bin_centroids) in 2D.
Examples
fit_high_d_model(training_data = s_curve_noise_training, nldr_df_with_id = s_curve_noise_umap)
#> $df_bin
#> # A tibble: 11 × 8
#> hb_id x1 x2 x3 x4 x5 x6 x7
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 -0.337 0.186 -1.92 0.00180 0.00255 -0.0445 -0.000892
#> 2 8 -0.515 0.989 -1.69 0.00344 0.00290 -0.0167 0.0000764
#> 3 9 -0.302 1.55 -1.90 0.00763 -0.00494 -0.0511 -0.00819
#> 4 16 -0.214 1.45 -1.98 -0.00966 -0.00370 0.0573 0.000307
#> 5 17 0.515 1.21 -1.80 0.0121 -0.00690 0.00531 -0.00141
#> 6 24 0.958 0.729 -1.16 0.00110 -0.00509 0.0420 0.00307
#> 7 32 0.659 0.628 -0.252 0.000870 0.0127 0.0156 -0.00207
#> 8 33 0.772 1.55 -0.411 -0.00364 0.000835 0.0267 0.000950
#> 9 40 -0.123 1.52 1.04 0.00686 0.0000898 -0.0208 0.00369
#> 10 48 -0.130 0.874 1.22 -0.00244 0.00388 0.0171 -0.00266
#> 11 55 0.660 0.512 1.58 -0.00437 0.00270 0.000203 -0.00344
#>
#> $df_bin_centroids
#> # A tibble: 11 × 5
#> x y hexID counts std_counts
#> <dbl> <dbl> <int> <int> <dbl>
#> 1 -3.27 -3.27 1 5 0.278
#> 2 -2.79 -2.44 8 9 0.5
#> 3 -1.84 -2.44 9 2 0.111
#> 4 -2.32 -1.62 16 1 0.0556
#> 5 -1.36 -1.62 17 6 0.333
#> 6 -0.885 -0.791 24 3 0.167
#> 7 -0.407 0.0355 32 3 0.167
#> 8 0.547 0.0355 33 6 0.333
#> 9 1.02 0.862 40 18 1
#> 10 1.50 1.69 48 14 0.778
#> 11 1.98 2.51 55 8 0.444
#>
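In the output above, std_counts is consistent with counts divided by the largest count (for example, 5 / 18 ≈ 0.278 and 9 / 18 = 0.5). The sketch below reproduces that column from the printed counts and then drops low-density hexagons. The benchmark value of 0.1, the strict `>` comparison, and the choice of std_counts as the filtered quantity are assumptions for illustration, not the documented behavior of is_rm_lwd_hex.

```r
# Counts copied from the df_bin_centroids output above
df_bin_centroids <- data.frame(
  hexID  = c(1L, 8L, 9L, 16L, 17L, 24L, 32L, 33L, 40L, 48L, 55L),
  counts = c(5L, 9L, 2L, 1L, 6L, 3L, 3L, 6L, 18L, 14L, 8L)
)

# Standardize counts by the maximum count (matches the printed std_counts)
df_bin_centroids$std_counts <-
  df_bin_centroids$counts / max(df_bin_centroids$counts)

# Hypothetical benchmark: keep hexagons whose standardized count exceeds it
benchmark <- 0.1
kept <- df_bin_centroids[df_bin_centroids$std_counts > benchmark, ]
print(kept)
```

With this hypothetical benchmark, only hexagon 16 (a single point, std_counts ≈ 0.0556) is removed.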