Design¶
Key architectural decisions in numeris.
Column-Major Storage¶
Matrix<T, M, N> stores data as [[T; M]; N] — N columns of M rows. Element (row, col) is at data[col][row].
This is column-major (Fortran / LAPACK order). The reasons:
- LAPACK compatibility: all standard numerical algorithms (LU, QR, Cholesky, etc.) are formulated and optimized for column-major storage.
- SIMD inner loops: Householder reflections, LU elimination, and BLAS-1 AXPY (
y += α·x) operate on a single column — a contiguous block of M floats.col_as_slice()returns this directly, enabling SIMD dispatch with no gathering. - Matmul: the j-k-i loop order accesses A in column-major (contiguous i-access) and B by row (B[k,j] strides), matching the memory layout for the SIMD-vectorized i-loop.
Matrix::new() accepts row-major input (as written on paper) and transposes internally, so the API is natural. DynMatrix::from_rows() does the same.
Stack Allocation and Const Generics¶
Fixed-size matrices are fully stack-allocated. There is no heap usage, no Box, no Vec — just [[T; M]; N] on the stack. This enables:
- No-std / embedded: works on targets with no heap allocator.
- Zero-cost abstractions: the compiler can see all dimensions at monomorphization time. Dead branches are eliminated. SIMD kernel dispatch via
TypeIdis a compile-time decision. - LLVM optimization: the compiler knows the loop bounds and can unroll, vectorize, and schedule optimally.
Avoids [T; M*N] flat storage (which would require unstable generic_const_exprs for expressions like const LEN: usize = M * N).
MatrixRef / MatrixMut Traits¶
All decomposition free functions are written against two simple accessor traits:
pub trait MatrixRef<T> {
fn nrows(&self) -> usize;
fn ncols(&self) -> usize;
fn get(&self, row: usize, col: usize) -> T;
fn col_as_slice(&self, col: usize) -> &[T];
}
pub trait MatrixMut<T>: MatrixRef<T> {
fn get_mut(&mut self, row: usize, col: usize) -> &mut T;
fn col_as_mut_slice(&mut self, col: usize) -> &mut [T];
}
Both Matrix<T, M, N> and DynMatrix<T> implement these traits. The consequence: the same LU, Cholesky, QR, SVD, Eigen, and Schur code handles fixed and dynamic matrices. There are no separate implementations — just one set of free functions.
col_as_slice() / col_as_mut_slice() are the key methods: they return contiguous &[T] slices of individual columns, enabling SIMD dispatch to operate on contiguous data without gathering.
LinalgScalar Hierarchy¶
Three element traits form a hierarchy:
Scalar Copy + PartialEq + Debug + Zero + One + Num
└─ FloatScalar Scalar + Float + LinalgScalar<Real=Self>
└─ LinalgScalar Scalar + modulus + conj + re + lsqrt + lln + from_real
| Trait | Used by | Examples |
|---|---|---|
Scalar |
Matrix ops, arithmetic, iteration | i32, u64, f32, f64, Complex<f64> |
FloatScalar |
Quaternion, ODE, optim, estimate, ordered comparisons | f32, f64 |
LinalgScalar |
Decompositions, norms | f32, f64, Complex<f32>, Complex<f64> |
Integer matrices work: Matrix<i32, 3, 3> supports arithmetic, indexing, iteration, and transpose. It does not support det(), norms, or decompositions (those require LinalgScalar).
Complex is additive: LinalgScalar methods on f64 — conj(), re(), modulus() — are #[inline] identity functions, fully erased by the compiler. Complex support adds zero code to real-valued paths.
In-Place Algorithms¶
Decompositions operate in-place on &mut impl MatrixMut<T>. This avoids:
- Allocator/storage traits (nalgebra's approach)
- Cloning the input (scipy's approach)
- Extra indirection through trait objects
The pattern:
pub fn cholesky_in_place<T: LinalgScalar>(
a: &mut impl MatrixMut<T>,
) -> Result<(), LinalgError> {
// modifies a in-place: a → L (lower triangular factor)
}
Wrapper structs (CholeskyDecomposition, etc.) own the modified matrix and provide solve(), inverse(), det() as convenience methods.
SIMD Dispatch¶
SIMD is selected at monomorphization time via TypeId:
pub fn dot_dispatch<T: Scalar>(a: &[T], b: &[T]) -> T {
use core::any::TypeId;
if TypeId::of::<T>() == TypeId::of::<f64>() {
// SAFETY: we just verified T = f64
unsafe { f64_dot(transmute(a), transmute(b)) as T }
} else if TypeId::of::<T>() == TypeId::of::<f32>() {
unsafe { f32_dot(transmute(a), transmute(b)) as T }
} else {
scalar_dot(a, b) // generic fallback for integers, complex, etc.
}
}
The TypeId check is a compile-time constant — dead branches are eliminated by LLVM at monomorphization. For integer T, the entire SIMD path compiles away; for f64, only the f64 path remains.
The Scalar trait has a 'static bound (required by TypeId). This is backwards-compatible — all scalar types are 'static.
DynMatrix Storage¶
DynMatrix<T> uses Vec<T> in column-major order: element (row, col) is at index col * nrows + row. Implementing MatrixRef/MatrixMut means all decomposition free functions work automatically, with no duplicate code.
DynVector is a newtype wrapper around DynMatrix that enforces the single-column invariant and provides single-index access.
Avoiding Unstable Features¶
numeris uses only stable Rust (MSRV 1.70). Constraints:
generic_const_exprsis unstable → no[T; M*N]flat storagemin_const_generics(stable since 1.51) →[[T; M]; N]two-level storage workscore::archintrinsics for SIMD are stable on aarch64 and x86_64
num-traits Integration¶
numeris uses num-traits with default-features = false for generic numeric bounds. This avoids pulling in std through num-traits in no-std mode. Key traits used:
Zero,One— additive/multiplicative identityNum— basic arithmetic (Scalar bound)Float— transcendentals,sqrt,sin, etc. (FloatScalar bound)
When std is enabled, Float delegates to the system's hardware libm. Without std, it uses libm's software implementations via the libm feature.