Best trained method: PixelNeRF · Research project
2D to 3D Reconstruction of Orthogonal Pollen Images
Bachelor Thesis · FHNW, Switzerland
Sequoia is a unified and reproducible framework for reconstructing 3D pollen geometry from orthogonal photographs and holographic inputs. It compares classical geometry, voxel reconstruction, implicit fields, mesh refinement and generative 3D methods inside one consistent research pipeline.
Chamfer Distance: 0.043 · F-Score: 90.9% · IoU: 82.8%
3D pollen mesh
PixelNeRF explainer
How NeRF works in our pollen reconstruction case
This animation walks through the PixelNeRF intuition we use for pollen. Instead of learning a single mesh directly from just two images, the model learns a continuous 3D radiance field that predicts the density and color at any sampled 3D location.
Input views
The brighter side cameras represent the orthogonal observations used to condition PixelNeRF.
Ray samples
Rays probe many candidate locations so the network can reason about where pollen matter exists.
Conditioned field
The field and final mesh show how image-conditioned 3D reasoning becomes a renderable pollen volume.
Why PixelNeRF
PixelNeRF is not just vanilla NeRF with fewer images
PixelNeRF conditions the radiance field on image features extracted from the observed views. That changes the problem from training one separate NeRF per object into predicting a scene representation directly from sparse inputs, which is exactly why it matters for our orthogonal pollen case.
- Image-aligned features are lifted from the source views and queried together with the 3D sample location.
- Camera pose information keeps those features geometrically anchored instead of treating the views as loose textures.
- The learned field generalizes from sparse evidence, so it can infer unseen parts of the pollen more effectively than a pure silhouette baseline.
- In Sequoia, this makes PixelNeRF a strong comparison point between deterministic geometry and newer generative 3D methods.
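To make the conditioning concrete, here is a minimal PyTorch-style sketch of a pixel-aligned field query. This is an illustration rather than the Sequoia implementation: `query_field`, the `mlp` interface and the `(K, R, t)` camera tuple layout are hypothetical, and the real model also encodes view directions and multi-scale features.

```python
import torch
import torch.nn.functional as F

def query_field(x_world, feat_maps, cams, mlp):
    """Sketch of a PixelNeRF-style query for a batch of 3D points.

    x_world:   (N, 3) sample locations in world space
    feat_maps: list of (C, H, W) encoder feature maps, one per source view
    cams:      list of (K, R, t) intrinsics/extrinsics for those views
    mlp:       network mapping (points, per-view features) -> (sigma, rgb)
    """
    per_view = []
    for feats, (K, R, t) in zip(feat_maps, cams):
        x_cam = x_world @ R.T + t            # world -> camera frame
        uv = x_cam @ K.T                     # camera -> pixel coordinates
        uv = uv[:, :2] / uv[:, 2:3]
        H, W = feats.shape[-2:]
        # Normalize to [-1, 1] and fetch pixel-aligned features bilinearly.
        grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
        sampled = F.grid_sample(
            feats[None], grid[None, :, None, :], align_corners=True
        )[0, :, :, 0].T                      # (N, C)
        per_view.append(sampled)
    # Pool features across views, then decode density and color.
    cond = torch.stack(per_view).mean(dim=0)
    return mlp(x_world, cond)
```

The key point is that the same 3D location fetches different features from each source view, so the decoded density and color stay tied to the observed pollen images instead of being free parameters of one per-object field.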
In our case
- We start from orthogonal pollen observations, so the first two camera views already give strong structural constraints.
- PixelNeRF conditions the scene representation on the observed images and their camera poses instead of training one field per object from scratch.
- Rays are sampled through 3D space, and the network predicts density and appearance for many points along each ray.
- Once the field is consistent, we can render unseen viewpoints and compare the reconstructed pollen morphology against other methods.
1. inputs
Orthogonal pollen images anchor the scene
The two dominant camera frustums represent our orthogonal pollen captures. Their poses tell the model where each image was taken from, which is crucial because NeRF needs both appearance and geometry context.
2. rays
Rays sample many candidate 3D locations
For each pixel, NeRF sends a ray into space and samples many points along it. In the animation, the bright ray bundles show how the model probes where pollen structure could exist between the cameras and the object center.
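A minimal sketch of that probing step for a single ray, assuming hypothetical `near`/`far` depth bounds around the grain; full NeRF variants typically add a second, importance-sampled pass.

```python
import torch

def sample_along_ray(origin, direction, near, far, n_samples):
    """Stratified sampling of candidate 3D locations along one camera ray.

    origin, direction: (3,) ray origin and unit direction
    near, far:         scalar depth bounds enclosing the pollen grain
    Returns (n_samples, 3) sample points and their (n_samples,) depths.
    """
    # Split [near, far] into equal bins and jitter one sample per bin,
    # so training sees slightly different depths on every iteration.
    bins = torch.linspace(near, far, n_samples + 1)
    lower, upper = bins[:-1], bins[1:]
    t = lower + (upper - lower) * torch.rand(n_samples)
    return origin + t[:, None] * direction, t
```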
3. field
A neural field learns density and color
The blue cloud represents the conditioned PixelNeRF field. For every sampled 3D point, the network predicts density and radiance while reusing image-conditioned features from the source views, so the field stays tied to the observed pollen structure.
4. render
PixelNeRF renders novel pollen views from the conditioned field
After optimization, the field can synthesize novel viewpoints of the pollen grain. That is what makes PixelNeRF useful in our framework: it turns sparse orthogonal evidence into a full 3D representation that can be evaluated and compared.
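Rendering reduces to the standard NeRF compositing rule: each sample's density becomes a segment opacity, and the transmittance accumulated down the ray weights its color. A minimal sketch, not the project's actual renderer:

```python
import torch

def composite(sigma, rgb, t):
    """Alpha-composite one ray's samples into a pixel color.

    sigma: (S,) predicted densities at the sampled depths
    rgb:   (S, 3) predicted colors
    t:     (S,) sample depths along the ray
    """
    # Distances between consecutive samples; the last segment is open-ended.
    delta = torch.cat([t[1:] - t[:-1], torch.tensor([1e10])])
    alpha = 1.0 - torch.exp(-sigma * delta)          # per-segment opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0
    )
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)
```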
Thesis results
Results distilled from the thesis tables
These numbers condense the benchmark tables and discussion from the bachelor thesis into a sparse-view comparison you can scan quickly. The focus is the actual two-view pollen setting, because that is the real constraint the project is designed around.
| Method | Notes | Chamfer Distance ↓ | F-Score ↑ | IoU ↑ |
| --- | --- | --- | --- | --- |
| PixelNeRF | Best overall trade-off in the thesis and the strongest trained method under two orthogonal views. | 0.043 | 90.9% | 82.8% |
| Hunyuan3D-2 | Zero-shot generative baseline that nearly matches surface fidelity, but with heavier compute and lower overlap. | 0.0432 | 91.15% | 79.05% |
| Pix2Vox | Recovers coarse pollen volume well, but the voxel bottleneck smooths fine ornamentation. | 0.060 | 84.8% | 61.1% |
| Pixel2Mesh++ | Competitive surface proximity, but many open meshes pull the volumetric score down. | 0.052 | 88.5% | 40.6% |
| Visual Hull | Clean geometric baseline that stays useful for reference, yet under-carves concavities. | 0.091 | 66.4% | 63.7% |
Paper discussion
What the thesis says actually worked
The study compares classical geometry, voxel CNNs, mesh deformation, implicit radiance fields and zero-shot generative 3D under one shared pollen pipeline. The main conclusion is that PixelNeRF fits the sparse orthogonal setup best because it balances surface accuracy, stability and volumetric coherence better than the other trained methods.
Biggest practical win
1 to 2 views matters most
PixelNeRF jumps from CD 0.059 and IoU 75.5% with one image to CD 0.043 and IoU 82.8% with two orthogonal images. After that, the gains are much smaller.
Zero-shot surprise
Hunyuan3D-2 stays close
In the thesis runs, Hunyuan3D-2 reaches CD 0.0432 and F-Score 91.15% with two views, but still trails PixelNeRF on IoU and needs much more compute.
Prior effect
Pixel2Mesh++ IoU jumps hard
Swapping to a mean pollen prior lifts Pixel2Mesh++ IoU from 43.1% to 75.9%, which suggests the prior regularizes volume and watertightness more than fine surface detail.
Holographic transfer
Real data stays difficult
Against the visual-hull proxy on holographic images, Pixel2Mesh++ scores best numerically, PixelNeRF tracks morphology better qualitatively, and Pix2Vox degrades the most.
Objective and scope
- Build a rigorous comparison space for heterogeneous 3D reconstruction methods applied to pollen morphology.
- Keep preprocessing deterministic and modular across synthetic multi-view renders and holographic acquisitions.
- Separate data generation, model definition, training orchestration and evaluation cleanly.
- Make experiments extensible through Hydra configs, reusable scripts and shared metrics.
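As an illustration of that extensibility, a Hydra entry point could look like the sketch below. The config layout (`conf/config.yaml` composing `model`, `data` and `trainer` groups) is hypothetical, not the actual Sequoia structure.

```python
import hydra
from omegaconf import DictConfig, OmegaConf

# Hypothetical layout: conf/config.yaml composes defaults such as
# model=pixelnerf, data=synthetic_orthogonal, trainer=default.
@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Print the fully resolved config so every run is self-documenting.
    print(OmegaConf.to_yaml(cfg))
    # Dispatch on the composed groups, e.g. cfg.model.name or cfg.data.views.
    ...

if __name__ == "__main__":
    main()
```

An experiment could then be switched from the command line, e.g. `python train.py model=pix2vox data.views=2`, without touching the code.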
Reconstruction pipeline
1. Clean and repair raw STL pollen meshes.
2. Normalize scale and orientation for stable downstream comparisons (sketched after this list).
3. Render multi-view RGB images and silhouettes with Blender and VTK tooling.
4. Apply optional structural and morphological augmentations for broader training variation.
5. Train, evaluate and compare models with shared metrics and logging.
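For step 2, the canonicalization could be sketched with `trimesh` as below; the tooling choice is an assumption, and the actual pipeline may also repair holes or re-orient meshes via PCA.

```python
import trimesh

def canonicalize(path):
    """Load an STL pollen mesh and normalize it into a unit bounding box."""
    mesh = trimesh.load(path, force="mesh")
    mesh.remove_unreferenced_vertices()
    # Center at the origin, then scale the longest extent to length 1
    # so every mesh occupies a comparable volume downstream.
    mesh.apply_translation(-mesh.centroid)
    mesh.apply_scale(1.0 / mesh.extents.max())
    return mesh
```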
Model families
- Visual Hull baseline for deterministic silhouette carving and geometric reference (see the carving sketch after this list).
- Pix2Vox with staged encoder, decoder, merger and refiner control.
- PixelNeRF for multi-view conditional radiance-field reconstruction.
- Pixel2Mesh++ for multi-stage mesh refinement with dedicated preprocessing.
- Hunyuan3D-2 for fast generative 3D comparison with varying inference budgets.
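The Visual Hull baseline amounts to classic silhouette carving, which can be pictured with the NumPy sketch below; the boolean masks, 3x4 projection matrices and grid resolution are assumptions, not the project's exact setup.

```python
import numpy as np

def visual_hull(silhouettes, cams, res=64):
    """Carve a voxel grid with silhouettes from calibrated views.

    silhouettes: list of (H, W) boolean foreground masks
    cams:        list of 3x4 projection matrices matching the masks
    Returns a (res, res, res) boolean occupancy grid over [-0.5, 0.5]^3.
    """
    lin = np.linspace(-0.5, 0.5, res)
    X, Y, Z = np.meshgrid(lin, lin, lin, indexing="ij")
    pts = np.stack([X, Y, Z, np.ones_like(X)], axis=-1).reshape(-1, 4)
    occ = np.ones(len(pts), dtype=bool)
    for mask, P in zip(silhouettes, cams):
        uvw = pts @ P.T                             # project voxel centers
        uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
        H, W = mask.shape
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
        keep = np.zeros(len(pts), dtype=bool)
        keep[inside] = mask[uv[inside, 1], uv[inside, 0]]
        occ &= keep          # a voxel survives only if every view sees it
    return occ.reshape(res, res, res)
```

With only two orthogonal views, this carving leaves concavities untouched, which is exactly the "under-carves concavities" weakness noted in the results table.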
Reproducibility layer
- Hydra organizes data, model and experiment configuration with clean overrides.
- PyTorch Lightning unifies training flow, checkpoints and metric reporting (see the skeleton after this list).
- Containerized execution and SLURM scripts reduce environment drift for benchmark runs.
- Shared evaluation notebooks and pipelines make qualitative and quantitative comparison easier.
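For the Lightning part, the shared training flow can be pictured as one task wrapper reused across model families. A hypothetical skeleton, not the actual Sequoia module:

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class ReconstructionTask(pl.LightningModule):
    """Hypothetical wrapper giving every model family the same loop."""

    def __init__(self, model, lr=1e-4):
        super().__init__()
        self.model = model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        pred = self.model(batch["images"], batch["poses"])
        loss = F.mse_loss(pred, batch["target"])
        self.log("train/loss", loss)     # unified metric reporting
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)
```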
Discussion and interpretation
- PixelNeRF wins because the learned implicit prior fills in unseen structure from two orthogonal views without collapsing volumetric overlap.
- Pix2Vox is robust and visually plausible at a coarse level, but its 32³ voxel output limits thin structures and ornamentation.
- Pixel2Mesh++ can sit close to the target surface while still failing to produce watertight pollen meshes, which is why its IoU lags behind.
- Hunyuan3D-2 is impressive as a zero-shot comparator, yet the project framing still favors PixelNeRF when efficiency, repeatability and controlled benchmarking matter.
How the evaluation worked
- All methods were pushed through one comparable pipeline with canonicalized pollen meshes, shared camera setups and aligned evaluation metrics.
- The thesis evaluates reconstructions with Chamfer Distance, F-Score and IoU after common post-processing and alignment, instead of letting each model define its own success criteria (a metric sketch follows this list).
- The table values combine synthetic orthogonal pollen renders, augmentation experiments, view-count ablations and a transfer test on holographic inputs.
- This matters because the final story is not only which model looks pretty, but which one stays credible across sparse data, biological shape variation and reproducible evaluation.
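For reference, the three metrics can be computed roughly as below. This is a sketch under common definitions (Chamfer Distance as the sum of mean nearest-neighbor distances, F-Score at a hypothetical threshold `tau`); the thesis may normalize or threshold differently.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_fscore(pred_pts, gt_pts, tau=0.01):
    """Point-set metrics between sampled prediction and ground truth.

    pred_pts, gt_pts: (N, 3) and (M, 3) points sampled from the meshes
    tau:              distance threshold for the F-Score (assumed value)
    """
    d_pg = cKDTree(gt_pts).query(pred_pts)[0]   # prediction -> ground truth
    d_gp = cKDTree(pred_pts).query(gt_pts)[0]   # ground truth -> prediction
    chamfer = d_pg.mean() + d_gp.mean()
    precision = (d_pg < tau).mean()
    recall = (d_gp < tau).mean()
    fscore = 2 * precision * recall / max(precision + recall, 1e-8)
    return chamfer, fscore

def voxel_iou(occ_a, occ_b):
    """Volumetric IoU between two boolean occupancy grids of equal shape."""
    inter = np.logical_and(occ_a, occ_b).sum()
    union = np.logical_or(occ_a, occ_b).sum()
    return inter / union
```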
