ag_dtype: Set the default floating-point precision for ag_* GPU operations
Description
Controls the dtype used when uploading tensors to the ggml backend.
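"bf16" stores 2 bytes per element instead of the 4 used by "f32", halving
tensor memory. It keeps the f32 exponent range but only 8 bits of significand
precision (roughly 2-3 decimal digits), which usually costs little accuracy.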
The backward pass always uses f32 R matrices, regardless of this setting.
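
To make the memory and precision trade-off concrete, here is a minimal sketch in Python. It is illustrative only and does not call this package: the helper names `tensor_bytes`, `to_bf16_bits` and `from_bf16_bits` are hypothetical, and the round-to-nearest-even conversion is an assumption about how bf16 values are typically produced, not a statement about what the ggml backend does internally. It handles finite values only.

```python
import struct

# Bytes per element for the two dtypes discussed above.
BYTES_PER_ELEMENT = {"f32": 4, "bf16": 2}

def tensor_bytes(n_elements: int, dtype: str) -> int:
    """Raw storage needed for a tensor of n_elements in the given dtype."""
    return n_elements * BYTES_PER_ELEMENT[dtype]

def to_bf16_bits(x: float) -> int:
    """Round a finite float to bf16 and return its 16-bit pattern.

    bf16 is the upper 16 bits of an IEEE-754 f32; the low 16 bits are
    rounded away here using round-to-nearest-even (an assumption).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def from_bf16_bits(b: int) -> float:
    """Expand a 16-bit bf16 pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

if __name__ == "__main__":
    n = 1024 * 1024  # a one-million-element tensor
    print(tensor_bytes(n, "f32"), tensor_bytes(n, "bf16"))

    x = 3.14159
    y = from_bf16_bits(to_bf16_bits(x))
    print(x, y, abs(x - y) / x)  # relative error stays below 2**-8
```

The relative rounding error of a single value is bounded by 2^-8 (about 0.4%) under these assumptions, which is why bf16 tends to be acceptable for forward-pass storage while gradients are kept in f32.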