Construct the 2D model and lift it into high-D: fit_high_d_model

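This function bins a 2D embedding of the training data into a hexagonal grid and lifts the resulting 2D model into the high-dimensional space by assigning high-dimensional coordinates to each bin. Options control how the model is built: whether bins are represented by their centroids or by their means, whether low-density hexagons are removed, and whether the high-dimensional data are averaged within each bin.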
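The averaging step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes each observation has already been assigned to a hexagon, stored in a hypothetical hb_id column, and it takes the mean of every high-dimensional column within each bin. The high-dimensional columns are picked out by their shared name prefix, as the column_start_text argument suggests.

```r
# Toy high-D data: three observations with a hypothetical bin assignment (hb_id)
training <- data.frame(
  hb_id = c(1L, 1L, 8L),
  x1 = c(0.0, 1.0, 2.0),
  x2 = c(4.0, 6.0, 5.0)
)

# Select the high-D columns by their shared prefix ("x")
high_d_cols <- names(training)[startsWith(names(training), "x")]

# Average each high-D column within each bin: one row per non-empty bin
df_bin <- aggregate(training[high_d_cols],
                    by = list(hb_id = training$hb_id),
                    FUN = mean)
df_bin
```

Each row of the result gives one bin's position in the high-dimensional space, which is the role df_bin plays in the returned list.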

Usage

fit_high_d_model(
  training_data,
  nldr_df_with_id,
  x = "UMAP1",
  y = "UMAP1",
  cell_area = 1,
  num_bins_x = NA,
  shape_val = NA,
  is_bin_centroid = TRUE,
  is_rm_lwd_hex = FALSE,
  benchmark_to_rm_lwd_hex = NA,
  is_avg_high_d = TRUE,
  column_start_text = "x"
)

Arguments

training_data: A data frame containing the high-dimensional training data.
nldr_df_with_id: A data frame containing the 2D embedding, with a unique identifier for each observation.
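x: The name of the column that contains the first component of the 2D embedding.
y: The name of the column that contains the second component of the 2D embedding. The default, "UMAP1", is the same as the default for x, so y normally needs to be set explicitly.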
cell_area: The area of each hexagonal cell.
num_bins_x: The number of bins along the x-axis for the hexagonal grid.
shape_val: The shape parameter for the hexagons.
is_bin_centroid: Logical; if TRUE (the default), bins are represented by their centroids, otherwise by their bin means.
is_rm_lwd_hex: Logical, indicating whether to remove low-density hexagons (default is FALSE).
benchmark_to_rm_lwd_hex: The benchmark value used to identify low-density hexagons for removal; relevant only when is_rm_lwd_hex is TRUE.
is_avg_high_d: Logical, indicating whether to average the high-dimensional data within bins (default is TRUE).
column_start_text: The prefix shared by the names of the high-dimensional columns in training_data (for example, "x" for x1, x2, ...).
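How the benchmark is applied is not specified here. The sketch below assumes, purely for illustration, that benchmark_to_rm_lwd_hex is compared with the standardized bin counts (std_counts) and that hexagons falling below it are dropped. The four rows are copied from the df_bin_centroids example output; the 0.1 benchmark is an arbitrary illustrative value.

```r
# Standardized counts for four bins, copied from the df_bin_centroids example output
df_bin_centroids <- data.frame(
  hexID      = c(1L, 9L, 16L, 40L),
  std_counts = c(0.278, 0.111, 0.0556, 1)
)

# Hypothetical rule (an assumption, not documented behavior):
# keep only hexagons whose standardized count reaches the benchmark
benchmark_to_rm_lwd_hex <- 0.1
keep <- df_bin_centroids$std_counts >= benchmark_to_rm_lwd_hex
df_dense <- df_bin_centroids[keep, ]
df_dense$hexID
```

Under this rule, hexagon 16 (a single observation) is the only one removed.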

Value

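A list with two data frames:

df_bin: The high-dimensional coordinates assigned to each 2D hexagonal bin, one row per bin, identified by hb_id.
df_bin_centroids: The 2D hexagonal bin centroids (x, y), with the bin identifier (hexID), the number of points in each bin (counts), and the counts standardized by the largest bin count (std_counts).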
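In the example output below, the printed std_counts values of df_bin_centroids are consistent with each bin's count divided by the largest count (18). The arithmetic can be checked directly from the printed counts:

```r
# Bin counts copied from the df_bin_centroids example output
counts <- c(5, 9, 2, 1, 6, 3, 3, 6, 18, 14, 8)

# Standardize by the maximum so the densest bin has std_counts = 1
std_counts <- counts / max(counts)

# Three significant digits, matching the tibble printout
signif(std_counts, 3)
```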

Examples

fit_high_d_model(training_data = s_curve_noise_training, nldr_df_with_id = s_curve_noise_umap)
#> $df_bin
#> # A tibble: 11 × 8
#>    hb_id     x1    x2     x3        x4         x5        x6         x7
#>    <int>  <dbl> <dbl>  <dbl>     <dbl>      <dbl>     <dbl>      <dbl>
#>  1     1 -0.337 0.186 -1.92   0.00180   0.00255   -0.0445   -0.000892 
#>  2     8 -0.515 0.989 -1.69   0.00344   0.00290   -0.0167    0.0000764
#>  3     9 -0.302 1.55  -1.90   0.00763  -0.00494   -0.0511   -0.00819  
#>  4    16 -0.214 1.45  -1.98  -0.00966  -0.00370    0.0573    0.000307 
#>  5    17  0.515 1.21  -1.80   0.0121   -0.00690    0.00531  -0.00141  
#>  6    24  0.958 0.729 -1.16   0.00110  -0.00509    0.0420    0.00307  
#>  7    32  0.659 0.628 -0.252  0.000870  0.0127     0.0156   -0.00207  
#>  8    33  0.772 1.55  -0.411 -0.00364   0.000835   0.0267    0.000950 
#>  9    40 -0.123 1.52   1.04   0.00686   0.0000898 -0.0208    0.00369  
#> 10    48 -0.130 0.874  1.22  -0.00244   0.00388    0.0171   -0.00266  
#> 11    55  0.660 0.512  1.58  -0.00437   0.00270    0.000203 -0.00344  
#> 
#> $df_bin_centroids
#> # A tibble: 11 × 5
#>         x       y hexID counts std_counts
#>     <dbl>   <dbl> <int>  <int>      <dbl>
#>  1 -3.27  -3.27       1      5     0.278 
#>  2 -2.79  -2.44       8      9     0.5   
#>  3 -1.84  -2.44       9      2     0.111 
#>  4 -2.32  -1.62      16      1     0.0556
#>  5 -1.36  -1.62      17      6     0.333 
#>  6 -0.885 -0.791     24      3     0.167 
#>  7 -0.407  0.0355    32      3     0.167 
#>  8  0.547  0.0355    33      6     0.333 
#>  9  1.02   0.862     40     18     1     
#> 10  1.50   1.69      48     14     0.778 
#> 11  1.98   2.51      55      8     0.444 
#>
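The two returned tables share bin identifiers (hb_id in df_bin, hexID in df_bin_centroids), so they can be joined to pair each 2D centroid with its high-dimensional coordinates. A base-R sketch using the first three rows of the printed output (columns x4 to x7 omitted for brevity):

```r
# First three rows of each returned table, copied from the example output
df_bin <- data.frame(
  hb_id = c(1L, 8L, 9L),
  x1 = c(-0.337, -0.515, -0.302),
  x2 = c(0.186, 0.989, 1.55),
  x3 = c(-1.92, -1.69, -1.90)
)
df_bin_centroids <- data.frame(
  x = c(-3.27, -2.79, -1.84),
  y = c(-3.27, -2.44, -2.44),
  hexID = c(1L, 8L, 9L),
  counts = c(5L, 9L, 2L)
)

# Join on the bin identifier: each row now holds a 2D centroid and its high-D position
model <- merge(df_bin_centroids, df_bin, by.x = "hexID", by.y = "hb_id")
model
```

merge() keeps the key under the by.x name (hexID) and sorts rows by it by default.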