F-INR: Functional Tensor Decomposition for Implicit Neural Representations

Computer Vision Group
Friedrich Schiller University Jena, Germany
WACV 2026

Abstract

Implicit Neural Representations (INRs) model signals as continuous, differentiable functions. However, monolithic INRs scale poorly with data dimensionality, leading to excessive training costs. We propose F-INR, a framework that addresses this limitation by factorizing a high-dimensional INR into a set of compact, axis-specific sub-networks based on functional tensor decomposition. These sub-networks learn low-dimensional functional components that are then combined via tensor operations. This factorization reduces computational complexity while additionally improving representational capacity. F-INR is both architecture- and decomposition-agnostic. It integrates with various existing INR backbones (e.g., SIREN, WIRE, FINER, Factor Fields) and tensor formats (e.g., CP, TT, Tucker), offering fine-grained control over the speed-accuracy trade-off via the tensor rank and mode. Our experiments show F-INR accelerates training by up to 20× and improves fidelity by over 6.0 dB PSNR compared to state-of-the-art INRs. We validate these gains on diverse tasks, including image representation, 3D geometry reconstruction, and neural radiance fields. We further show F-INR's applicability to scientific computing by modeling complex physics simulations. Thus, F-INR provides a scalable, flexible, and efficient framework for high-dimensional signal modeling.

Key Contributions

  1. Functional Tensor Decomposition for INRs. We introduce a new paradigm for representation learning that is orthogonal to network design: factorizing a high-dimensional INR into compact, axis-specific sub-networks via established tensor decomposition modes (CP, TT, Tucker).
  2. F-INR: a modular, agnostic framework. F-INR is both architecture-agnostic (works with SIREN, WIRE, FINER, Factor Fields, ReLU/Tanh ± positional/hash encoding) and decomposition-agnostic, giving fine-grained control over the speed–accuracy trade-off through tensor rank and mode.
  3. Significant empirical gains. F-INR accelerates training by up to 20× while improving reconstruction fidelity by over 6.0 dB PSNR compared to state-of-the-art INRs across image representation, 3D geometry (SDF), neural radiance fields, and physics-informed simulations.

Method


Figure 1: Efficient INRs via Functional Tensor Decomposition. INR models use a single, large network to predict one value (or a batch of values) at a time. Our approach decomposes the function into smaller networks, enabling full prediction in a single step with configurable tensor decomposition modes and compression ranks.

A conventional INR learns a monolithic network \(\Phi_\theta : \mathbb{R}^d \mapsto \mathbb{R}^c\) that maps a \(d\)-dimensional coordinate to a \(c\)-variate output signal. The curse of dimensionality means that the number of parameters, and thus training cost, grows exponentially with the input dimensionality.

F-INR addresses this by replacing the monolithic network with a product of \(d\) small, univariate sub-networks:

\[ \Phi(\mathbf{x}) \approx \bigotimes_{i=1}^{d}\, \phi_i(x_i;\,\theta_i), \tag{1} \]

where \(\phi_i(\cdot)\) denotes a univariate neural network for the \(i\)-th dimension with learnable parameters \(\theta_i\). Each network produces rank-\(R\) components, which are recombined into the original signal through classical tensor decomposition modes \((\bigotimes)\). Since the decomposition factors are continuous functions rather than discrete tensors, this yields a functional tensor decomposition; hence the name F-INR.

The outputs are combined via \(\bigotimes\) corresponding to one of three classical modes:

  • CP (Canonical Polyadic) — \(d\) factor networks of rank \(R\), combined by element-wise product and summation.
  • TT (Tensor Train) — a chain of low-rank tensor cores; two factor networks and one decomposition core.
  • Tucker (TU) — factor networks plus a small core tensor \(\mathbf{C}\) capturing inter-mode interactions.
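
As a concrete illustration, the CP combination in Eq. (1) can be sketched in a few lines of NumPy. The factor matrices below stand in for the outputs of the trained sub-networks \(\phi_i\) on their 1-D coordinate grids; all names and sizes are illustrative, not the paper's implementation:

```python
import numpy as np

# Sketch of the CP combination step, assuming each sub-network has already
# been evaluated on its own 1-D coordinate grid. The factor outputs U_i
# have shape (n_i, R); the full d-dimensional tensor is recovered by an
# element-wise product over ranks followed by a sum (here d = 3).
rng = np.random.default_rng(0)
n, R = 8, 4
U1, U2, U3 = (rng.standard_normal((n, R)) for _ in range(3))

# einsum: sum_r U1[x, r] * U2[y, r] * U3[z, r]
full = np.einsum('xr,yr,zr->xyz', U1, U2, U3)

# Reference: explicit sum of rank-1 outer products gives the same tensor.
ref = sum(np.multiply.outer(np.multiply.outer(U1[:, r], U2[:, r]), U3[:, r])
          for r in range(R))
assert np.allclose(full, ref)
print(full.shape)  # (8, 8, 8)
```

The key point is that the expensive network evaluations happen only on \(d\) one-dimensional grids; the dense \(n^d\) tensor is produced by a single cheap contraction.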

This reformulation reduces forward-pass complexity (see Table 1), preserves full differentiability with respect to input coordinates (enabling PDE-based objectives), and is backend-agnostic: any INR architecture can serve as the sub-network backbone.

Table 1: Forward Pass Complexity

Assuming a grid of \(n \times n \times n\), i.e.\ \(n^3\) data points, and a network with \(m\) features and \(l\) layers. Note \(r \ll m^2 l\).

| Mode | Forward Pass | Sub-networks |
|---|---|---|
| CP | \(\mathcal{O}(n \cdot d \cdot r \cdot m \cdot l + r \cdot n^d)\) | \(d\) factor networks |
| TT | \(\mathcal{O}(n \cdot d \cdot r^2 \cdot m \cdot l + r^2 \cdot n^d)\) | 2 factors + 1 core |
| Tucker (TU) | \(\mathcal{O}(n \cdot d \cdot r \cdot m \cdot l + r^d)\) | \(d\) factors + core \(\mathbf{C}\) |
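
The scaling in Table 1 can be checked with back-of-the-envelope arithmetic. The snippet below computes rough operation counts under the table's assumptions (our particular choice of \(n, m, l, r\) is illustrative, not from the paper):

```python
# Back-of-the-envelope cost comparison for Table 1, assuming a d = 3 grid
# of n^3 points, MLPs with l layers of m features (~m^2 ops per layer per
# point), and CP rank r. These are rough operation counts, not timings.
n, d, m, l, r = 128, 3, 256, 4, 64

monolithic = n**d * m**2 * l               # query the big INR at every grid point
factored = n * d * r * m * l + r * n**d    # CP row of Table 1

print(f"monolithic ~{monolithic:.2e} ops, F-INR (CP) ~{factored:.2e} ops")
print(f"speedup factor ~{monolithic / factored:.0f}x")
```

Even with the \(r \cdot n^d\) contraction included, the factored forward pass is orders of magnitude cheaper because the \(m^2 l\) network cost is paid on \(n \cdot d\) points instead of \(n^d\).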

Experiments

We evaluate F-INR on four tasks using a standard four-layer MLP with 256 features, trained for 50k steps with Adam (learning rate \(10^{-4}\)) on an NVIDIA RTX 3090. We test seven INR backends (ReLU, Tanh, SIREN, WIRE, FINER, Factor Fields, and variants with positional/hash encoding) across all three decomposition modes (CP, TT, Tucker) and a range of ranks, resulting in 136 configurations reported in the supplementary material.

Image Representation

Images are second-order tensors, so a rank-\(R\) CP decomposition (two axis-specific networks combined via matrix multiplication) suffices. A conventional INR represents an image as \(\Phi_\theta(x, y) = (r, g, b)\). F-INR instead learns \(\phi_1(\mathbf{x};\theta_1) \otimes \phi_2(\mathbf{y};\theta_2) = (\mathbf{r}, \mathbf{g}, \mathbf{b})\), where \(\otimes\) denotes matrix multiplication. We evaluate on the DIV2K parrot benchmark against LIIF, DeepTensor, NeuRBF, CoordX, and Factor Fields; all models are trained with batch size \(2^{18}\).
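
A minimal NumPy sketch of this two-network factorization, with the axis-network outputs stubbed by random arrays (the actual MLPs and the training loop are omitted; shapes and names are illustrative):

```python
import numpy as np

# Sketch of the 2-D image factorization, assuming the two axis networks
# have already been evaluated on the row and column coordinate grids.
# Each produces R rank components per color channel; a per-channel
# matrix product restores the full H x W RGB image.
rng = np.random.default_rng(1)
H, W, R = 32, 48, 16
phi_x = rng.standard_normal((H, R, 3))  # rows:    phi_1(x; theta_1)
phi_y = rng.standard_normal((W, R, 3))  # columns: phi_2(y; theta_2)

# sum_r phi_x[h, r, c] * phi_y[w, r, c]  -> one matmul per channel
img = np.einsum('hrc,wrc->hwc', phi_x, phi_y)
print(img.shape)  # (32, 48, 3)
```

Training would fit `phi_x` and `phi_y` (as small MLPs over the row/column coordinates) by minimizing the MSE between `img` and the target image.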

  • F-INR (SIREN) achieves +7.29 dB PSNR while cutting training from 3:22 to 1:49 min.
  • F-INR (FINER+PE) gains +8.22 dB PSNR with a \(14\times\) speedup.
  • F-INR (WIRE) offers a \(15\times\) speedup (31:28 → 2:07 min) with +2.33 dB PSNR.
  • F-INR (Factor Fields, rank 128) reaches 46.48 dB PSNR, the top result in the comparison.
  • Speedups over native backends reach \(20\times\); over optimized tiny-cuda-nn kernels \(2.2\times\).

Figure 3: Qualitative Image Results. F-INR improves both fidelity and training speed over the original backbones. Fidelity improvements reach up to \(+7\,\text{dB}\) (SIREN) and training speedups reach \(20\times\) for native backbones (WIRE) and \(2.2\times\) for optimized CUDA implementations (ReLU). Error maps are magnified for visualization.

Table 2: Quantitative Image Results

PSNR \((\uparrow)\) and SSIM \((\uparrow)\) on the DIV2K parrot image (mean \(\pm\) std over 5 runs). ΔPSNR is relative to the corresponding baseline. † denotes tiny-cuda-nn implementation; * denotes convergence at 10k steps.

| Method | Backend | Rank | PSNR (dB) | ΔPSNR | SSIM | Time |
|---|---|---|---|---|---|---|
| **INR Baselines** | | | | | | |
| | Tanh † | — | 25.04 | — | 0.82 | 5:51 |
| | ReLU+PE † | — | 34.35 | — | 0.95 | 5:58 |
| | ReLU+Hash (iNGP) † | — | 36.22 | — | 0.96 | 7:40 |
| | SIREN † | — | 28.89 | — | 0.97 | 3:22 |
| | WIRE | — | 32.06 | — | 0.87 | 31:28 |
| | FINER | — | 22.73 | — | 0.75 | 22:45 |
| | NeuRBF * | — | 41.61 | — | 0.95 | 8:22 |
| | Factor Fields * | — | 41.82 | — | 0.95 | 1:35 |
| **F-INR (ours)** | | | | | | |
| F-INR | WIRE | 316 | 34.39 | +2.33 | 0.89 | 2:07 |
| F-INR | WIRE+PE | 316 | 35.72 | +3.66 | 0.90 | 2:07 |
| F-INR | SIREN | 256 | 36.18 | +7.29 | 0.90 | 1:49 |
| F-INR | SIREN+PE | 256 | 36.28 | +7.39 | 0.90 | 1:52 |
| F-INR | FINER+PE | 128 | 30.95 | +8.22 | 0.83 | 1:33 |
| F-INR | Factor Fields * | 128 | 46.48 | +4.66 | 0.98 | 2:40 |

3D Geometry via Signed Distance Functions

F-INR learns SDFs from dense voxel grids using a physics-informed Eikonal loss evaluated on Stanford 3D Scan geometries (Armadillo and others). The loss combines three terms:

\[ \mathcal{L}_{\text{SDF}} = \underbrace{\int_{\Omega}\!\bigl|\,\|\nabla\Psi\| - 1\,\bigr|}_{\text{Eikonal}} + \underbrace{\int_{\Omega}\!\bigl\|\Psi - \hat{\Psi}\bigr\|}_{\text{data fidelity}} + \underbrace{\int_{\Omega_{<\Omega_0}}\!\bigl\|\Psi - \hat{\Psi}\bigr\|}_{\text{surface term}}, \tag{2} \]

enforcing \(\|\nabla\Psi\|=1\) everywhere, global fidelity to \(\hat{\Psi}\), and extra accuracy near the object surface \(\Omega_{<\Omega_0}\). This objective requires stable high-order gradients w.r.t. input coordinates, making discrete-lookup methods (TensoRF, hash grids, Factor Fields) incompatible.
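
The Eikonal term can be sanity-checked numerically: a true signed distance function satisfies \(\|\nabla\Psi\| = 1\) away from the medial axis. The sketch below verifies this for an analytic sphere SDF using finite differences; F-INR itself obtains exact gradients by autodiff through the sub-networks, so this is purely an illustration:

```python
import numpy as np

# Check the Eikonal property ||grad Psi|| = 1 for an analytic sphere SDF
# on a regular grid, using central finite differences.
n = 64
h = 2.0 / (n - 1)
axis = np.linspace(-1.0, 1.0, n)
X, Y, Z = np.meshgrid(axis, axis, axis, indexing='ij')
rad = np.sqrt(X**2 + Y**2 + Z**2)
psi = rad - 0.5                      # SDF of a sphere with radius 0.5

gx, gy, gz = np.gradient(psi, h)     # finite-difference gradient
grad_norm = np.sqrt(gx**2 + gy**2 + gz**2)

# Away from the center singularity the Eikonal residual is ~0.
interior = rad > 0.2
residual = np.abs(grad_norm - 1.0)[interior].mean()
print(f"mean Eikonal residual: {residual:.4f}")
```

A learned \(\Psi\) that violates this property is penalized by the first term of Eq. (2), which is what drives the network toward a metrically correct distance field.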

  • F-INR (WIRE+PE, TT, rank 128) achieves IoU 0.997 and MSE 0.011 on Armadillo in only 13:27 min.
  • F-INR (ReLU+PE, TU, rank 128) achieves IoU 0.992 in 11:54 min — \(>20\times\) faster than the 250+ min baselines.
  • Results surpass specialized methods (DeepSDF, IGR) on all tested geometries.

Figure 4: Qualitative SDF Results. F-INR models with strong inductive biases (WIRE, SIREN, ReLU+PE) capture fine geometric details with high fidelity. Others (Tanh, FINER) preserve macro-structure but fail to reconstruct high-frequency details, resulting in oversmoothed surfaces and lower IoU scores.

Table 3: Quantitative SDF Results (Armadillo)

IoU \((\uparrow)\) and MSE \((\downarrow)\) over 5 runs. † = no Eikonal regularization. * = RTX 6000 Ada.

| Method | Backend | Mode | Rank | IoU (↑) | MSE (↓) | Time |
|---|---|---|---|---|---|---|
| **INR Baselines** | | | | | | |
| | ReLU+PE | — | — | 0.950 | 0.233 | 257:08 |
| | WIRE | — | — | 0.988 | 0.154 | 252:14 |
| | SIREN | — | — | 0.989 | 0.127 | 252:45 |
| | FINER | — | — | 0.989 | 0.125 | 217:10 |
| | DeepSDF | — | — | 0.989 | 0.126 | 315:32 |
| | IGR | — | — | 0.990 | 0.127 | 292:13 |
| | NeuRBF †* | — | — | 0.997 | 0.153 | 5:22 |
| | Factor Fields † | — | — | 0.972 | \(10^{-6}\) | 5:24 |
| **F-INR (ours)** | | | | | | |
| F-INR | Tanh | TT | 128 | 0.941 | 0.034 | 12:34 |
| F-INR | ReLU+PE | TU | 128 | 0.992 | 0.013 | 11:54 |
| F-INR | WIRE+PE | TT | 128 | 0.997 | 0.011 | 13:27 |
| F-INR | SIREN | TU | 64 | 0.989 | 0.085 | 8:29 |
| F-INR | FINER+PE | TT | 64 | 0.978 | 0.039 | 8:13 |

Neural Radiance Fields

Neural Radiance Fields represent a 3D scene as a continuous function mapping a 5D input \((x,y,z,\theta,\phi)\) to a 4D output of color and volume density. F-INR factorizes this 5D function into five univariate sub-networks, one per coordinate axis:

\[ f(x,y,z,\theta,\phi) \approx \phi_x(x) \otimes \phi_y(y) \otimes \phi_z(z) \otimes \phi_\theta(\theta) \otimes \phi_\phi(\phi), \tag{3} \]

replacing a monolithic 3D hash grid with three independent 1D hash encodings while keeping all other NeRF components unchanged. Evaluated on the synthetic NeRF dataset (Lego and Drums scenes).
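
A generic TT-mode combination over five axes can be sketched as below; the core shapes follow the standard tensor-train layout with boundary ranks of 1 and are illustrative (the paper's exact factor/core placement may differ):

```python
import numpy as np

# Sketch of a TT-mode combination for a 5-D input, assuming each axis
# network emits a TT core evaluated on its own 1-D grid: core i has
# shape (r_prev, n, r_next), with boundary ranks fixed to 1.
rng = np.random.default_rng(2)
n, r = 6, 4                       # grid size per axis, TT rank
ranks = [1, r, r, r, r, 1]
cores = [rng.standard_normal((ranks[i], n, ranks[i + 1])) for i in range(5)]

# Contract the train left to right: each core adds one grid axis.
out = cores[0]                    # shape (1, n, r)
for core in cores[1:]:
    out = np.tensordot(out, core, axes=([-1], [0]))  # chain over shared rank
field = out.squeeze(0).squeeze(-1)                   # drop boundary ranks of 1
print(field.shape)  # (6, 6, 6, 6, 6)
```

As with CP, the sub-networks are only ever queried on 1-D grids; the dense 5-D field emerges from the chain of contractions.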

  • F-INR (ReLU+HE, TT, rank 16) achieves 35.67 dB PSNR on Lego and 26.48 dB on Drums in 22.4 min.
  • This exceeds K-Planes (34.23 dB / 76.8 min) and TensoRF (33.14 dB / 25.8 min) in both fidelity and speed.
  • TT decomposition outperforms CP and Tucker across all tested rank configurations.
  • A CUDA-optimized F-INR implementation is expected to yield further gains beyond the current pure-PyTorch results.

Figure 5: Qualitative NeRF Results. Novel view renderings from the best-performing F-INR model (ReLU+HE, TT rank 16) on the Lego (top) and Drums (bottom) scenes. The results demonstrate the model's ability to reconstruct fine geometric details and complex view-dependent effects with high fidelity.

Table 4: Quantitative NeRF Results

PSNR (dB) on the synthetic NeRF dataset. * = RTX 6000 Ada.

| Method | Mode | Rank | Lego (↑) | Drums (↑) | Time |
|---|---|---|---|---|---|
| **Baselines** | | | | | |
| NeRF | — | — | 30.83 | 22.01 | 135 h |
| DVGO | — | — | 31.25 | 24.90 | 24.4 min |
| Plenoxels | — | — | 31.71 | 24.48 | 22.5 min |
| TensoRF | — | — | 33.14 | 25.26 | 25.8 min |
| ReLU+Hash (iNGP) | — | — | 34.59 | 25.28 | 4.8 min |
| NeuRBF * | — | — | 37.29 | 26.57 | 1.1 h |
| Factor Fields | — | — | 33.14 | 26.57 | 28.4 min |
| K-Planes | — | — | 34.23 | 25.30 | 76.8 min |
| **F-INR (ours, ReLU+HE)** | | | | | |
| F-INR | CP | 4 | 31.87 | 24.55 | 5.8 min |
| F-INR | CP | 8 | 33.88 | 25.12 | 8.1 min |
| F-INR | CP | 16 | 34.81 | 25.30 | 18.3 min |
| F-INR | TT | 8 | 34.24 | 25.07 | 16.2 min |
| F-INR | TT | 16 | 35.67 | 26.48 | 22.4 min |
| F-INR | TU | 8 | 31.22 | 23.02 | 28.6 min |

Physics-Informed Learning

F-INR is applied to learning decaying turbulence governed by the incompressible Navier-Stokes equations in vorticity form:

\[ \partial_t \omega + u \cdot \nabla\omega = \nu\,\Delta\omega, \quad x \in \Omega,\; t \in \Gamma, \tag{4a} \]
\[ \nabla \cdot u = 0, \quad x \in \Omega,\; t \in \Gamma, \tag{4b} \]
\[ \omega(x,0) = \omega_0(x), \quad x \in \Omega, \tag{4c} \]

where \(u\) is the velocity field, \(\omega = \nabla \times u\) the vorticity, \(\omega_0\) the initial vorticity, and \(\nu = 0.01\) the viscosity coefficient. The spatial domain is \(\Omega = [0, 2\pi]^2\) and the temporal domain \(\Gamma = [0,1]\). The model is trained on sparse, coarse data (\(10 \times 64 \times 64\) resolution) and must generalize to the full \(101 \times 128 \times 128\) grid, a \(40\times\) reduction in supervision. The composite loss \(\mathcal{L} = \mathcal{L}_{\text{data}} + \mathcal{L}_{\text{PDE}}\) enforces physical consistency at domain collocation points.
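
The PDE residual entering \(\mathcal{L}_{\text{PDE}}\) can be illustrated on an analytic decaying (Taylor-Green) vortex, which satisfies the vorticity transport equation exactly. Here the derivatives are taken by finite differences on a grid, whereas F-INR differentiates the factored network directly; the field and grid sizes are illustrative:

```python
import numpy as np

# Sketch of the vorticity-transport residual, evaluated on a stubbed
# analytic flow instead of network output. The Taylor-Green vortex is an
# exact solution, so the residual should be near zero up to FD error.
nu = 0.01
nt, nx = 16, 64
t = np.linspace(0.0, 1.0, nt)
x = np.linspace(0.0, 2 * np.pi, nx, endpoint=False)
T, X, Y = np.meshgrid(t, x, x, indexing='ij')

decay = np.exp(-2 * nu * T)
ux = np.cos(X) * np.sin(Y) * decay            # velocity components
uy = -np.sin(X) * np.cos(Y) * decay
omega = -2 * np.cos(X) * np.cos(Y) * decay    # omega = curl(u)

dt, dx = t[1] - t[0], x[1] - x[0]
w_t = np.gradient(omega, dt, axis=0)
w_x = np.gradient(omega, dx, axis=1)
w_y = np.gradient(omega, dx, axis=2)
lap = np.gradient(w_x, dx, axis=1) + np.gradient(w_y, dx, axis=2)

# Vorticity transport residual: d_t(omega) + u . grad(omega) - nu * lap(omega)
residual = w_t + ux * w_x + uy * w_y - nu * lap
print(f"mean |residual|: {np.abs(residual).mean():.4f}")
```

In F-INR, the same residual is assembled from exact coordinate derivatives of \(\Phi\) and penalized at collocation points, alongside the data-fidelity term on the coarse supervision grid.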

  • F-INR (ReLU+PE, TT, rank 128) achieves MSE 0.030, surpassing the previous best PhySR (0.038).
  • Training time drops from 27 h to under 2 h, a \(>20\times\) speedup.
  • F-INR is the first successful application of the WIRE backbone to a complex physics simulation.
  • No time-marching is required; F-INR learns a single continuous function over the full spatio-temporal domain.

Figure 6: Qualitative PINN INR Results. Vorticity field ground truth (left), F-INR prediction (ReLU+PE, TT rank 128, middle), and pointwise absolute error (right) at three time steps. Trained on sparse data, the model captures the complex dynamics across the full temporal domain. Error concentrates on high-gradient regions (vortex edges) and grows as turbulence develops.

Table 5: Quantitative PINN INR Results

MSE \((\downarrow)\) on the Navier-Stokes benchmark (mean \(\pm\) std). Models trained from \(10\times 64\times 64\) coarse data; evaluated at \(101\times 128\times 128\) full resolution.

| Method | Backend | Mode | Rank | MSE (↓) | Time |
|---|---|---|---|---|---|
| **Baselines** | | | | | |
| | ReLU+PE | — | — | 0.097 | 20:30 |
| | WIRE | — | — | 0.073 | 20:25 |
| | SIREN | — | — | 0.184 | 20:24 |
| ModifiedPINN | | — | — | 0.074 | 28:40 |
| CausalPINN | | — | — | 0.070 | 33:12 |
| MeshFreeFlowNet | | — | — | 0.048 | 35:18 |
| PhySR | | — | — | 0.038 | 27:05 |
| **F-INR (ours)** | | | | | |
| F-INR | WIRE | TT | 64 | 0.034 | 0:59 |
| F-INR | WIRE | TT | 256 | 0.033 | 1:40 |
| F-INR | ReLU+PE | TT | 64 | 0.033 | 0:51 |
| F-INR | ReLU+PE | TT | 128 | 0.030 | 1:12 |
| F-INR | ReLU+PE | TT | 256 | 0.030 | 1:59 |
| F-INR | ReLU+PE | TU | 128 | 0.032 | 1:34 |
| F-INR | ReLU+PE | TU | 256 | 0.032 | 2:03 |

Acknowledgements

A great number of people have contributed to this paper, whether consciously or not. We especially thank the members of our group for their contributions throughout the revisions of this manuscript, which ultimately brought it to a publication-worthy state.

BibTeX

If you find our work useful, build on our models, start your own research with our dataset, or use parts of our code, please cite our work:
@InProceedings{Vemuri_2026_WACV,
      author = {Vemuri, Sai Karthikeya and B\"uchner, Tim and Denzler, Joachim},
      title = {F-INR: Functional Tensor Decomposition for Implicit Neural Representations},
      booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
      month = {March},
      year = {2026},
      pages = {6557-6568}
}