Research project

2D to 3D Reconstruction of Orthogonal Pollen Images

Bachelor Thesis · FHNW, Switzerland

Sequoia is a unified and reproducible framework for reconstructing 3D pollen geometry from orthogonal photographs and holographic inputs. It compares classical geometry, voxel reconstruction, implicit fields, mesh refinement and generative 3D methods inside one consistent research pipeline.

Visual Hull · Pix2Vox · PixelNeRF · Pixel2Mesh++ · Hunyuan3D-2

Best trained method: PixelNeRF

Chamfer Distance: 0.043 · F-Score: 90.9% · IoU: 82.8%

Orthogonal view X + Orthogonal view Y → 3D pollen mesh

PixelNeRF explainer

How NeRF works in our pollen reconstruction case

This animation walks through the PixelNeRF intuition we use for pollen. Instead of learning a single mesh directly from just two images, the model learns a continuous 3D radiance field that can answer what density and color exist at any sampled 3D location.
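To make "any sampled 3D location" concrete, the field can be sketched with a toy analytic stand-in for the trained network; the soft-sphere density and constant color below are purely illustrative:

```python
import numpy as np

def toy_field(points, radius=1.0):
    """Toy stand-in for a radiance field: map 3D points to (density, rgb).

    A trained PixelNeRF replaces this with a neural network; here a soft
    sphere of the given radius supplies density and the color is constant.
    """
    dist = np.linalg.norm(points, axis=-1)           # distance from origin
    density = np.clip(radius - dist, 0.0, None)      # positive inside the sphere
    rgb = np.broadcast_to(np.array([0.8, 0.8, 0.6]), points.shape)
    return density, rgb

pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
sigma, rgb = toy_field(pts)
# the point inside the sphere gets positive density, the outside point zero
```

The point is the interface, not the sphere: the field is a function that can be queried at arbitrary coordinates, which is what makes ray sampling and rendering possible.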

3D source object: Juniperus communis pollen grain STL

Input views

The brighter side cameras represent the orthogonal observations used to condition PixelNeRF.

Ray samples

Rays probe many candidate locations so the network can reason about where pollen matter exists.

Conditioned field

The field and final mesh show how image-conditioned 3D reasoning becomes a renderable pollen volume.

Why PixelNeRF

PixelNeRF is not just vanilla NeRF with fewer images

PixelNeRF conditions the radiance field on image features extracted from the observed views. That changes the problem from training one separate NeRF per object into predicting a scene representation directly from sparse inputs, which is exactly why it matters for our orthogonal pollen case.

  • Image-aligned features are lifted from the source views and queried together with the 3D sample location.
  • Camera pose information keeps those features geometrically anchored instead of treating the views as loose textures.
  • The learned field generalizes from sparse evidence, so it can infer unseen parts of the pollen more effectively than a pure silhouette baseline.
  • In Sequoia, this makes PixelNeRF a strong comparison point between deterministic geometry and newer generative 3D methods.
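The first bullet can be sketched in a few lines: project the query point into a source view and read the encoder feature at that pixel. The toy camera, feature map, and nearest-neighbor sampling below are simplifications (PixelNeRF interpolates bilinearly):

```python
import numpy as np

def lift_feature(point, K, R, t, feat):
    """Project a 3D point into a view and sample its image-aligned feature.

    K: 3x3 intrinsics, (R, t): world-to-camera pose, feat: (H, W, C)
    feature map from the view's CNN encoder. Nearest-neighbor sampling
    keeps the sketch short; PixelNeRF uses bilinear interpolation.
    """
    cam = R @ point + t                  # world -> camera coordinates
    uvw = K @ cam                        # camera -> homogeneous pixel coords
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    iy = int(round(np.clip(v, 0, feat.shape[0] - 1)))
    ix = int(round(np.clip(u, 0, feat.shape[1] - 1)))
    return feat[iy, ix]                  # feature vector conditioning the MLP

# hypothetical setup: identity rotation, camera 4 units back, 4x4 feature map
K = np.array([[2.0, 0, 2.0], [0, 2.0, 2.0], [0, 0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 4.0])
feat = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
f = lift_feature(np.array([0.0, 0.0, 0.0]), K, R, t, feat)
```

The returned feature is concatenated with the 3D sample location before the MLP predicts density and radiance, which is what keeps the field anchored to the observed views.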

In our case

  • We start from orthogonal pollen observations, so the first two camera views already give strong structural constraints.
  • PixelNeRF conditions the scene representation on the observed images and their camera poses instead of training one field per object from scratch.
  • Rays are sampled through 3D space, and the network predicts density and appearance for many points along each ray.
  • Once the field is consistent, we can render unseen viewpoints and compare the reconstructed pollen morphology against other methods.
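The ray-marching and compositing in the bullets above follow the standard NeRF quadrature; a minimal sketch, with an illustrative `sphere_field` standing in for the trained network:

```python
import numpy as np

def render_ray(origin, direction, field, n_samples=64, near=0.0, far=4.0):
    """Sample points along one ray and alpha-composite density into color.

    `field` maps (N, 3) points to (density, rgb); compositing uses the
    standard NeRF per-sample transmittance weights.
    """
    ts = np.linspace(near, far, n_samples)           # sample depths
    pts = origin + ts[:, None] * direction           # 3D sample locations
    sigma, rgb = field(pts)
    delta = np.diff(ts, append=ts[-1] + (far - near) / n_samples)
    alpha = 1.0 - np.exp(-sigma * delta)             # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha                          # contribution per sample
    color = (weights[:, None] * rgb).sum(axis=0)     # composited pixel color
    return color, weights

def sphere_field(pts, radius=1.0):
    """Illustrative stand-in field: opaque inside a unit sphere."""
    d = np.linalg.norm(pts, axis=-1)
    return 10.0 * (d < radius), np.ones((len(pts), 3))

color, w = render_ray(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0]),
                      sphere_field)
# a ray passing through the sphere accumulates nearly full opacity
```

Because the compositing is differentiable, rendering loss on the observed views is what trains the field in the first place.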

1. inputs

Orthogonal pollen images anchor the scene

The two dominant camera frustums represent our orthogonal pollen captures. Their poses tell the model where each image was taken from, which is crucial because NeRF needs both appearance and geometry context.

2. rays

Rays sample many candidate 3D locations

For each pixel, NeRF sends a ray into space and samples many points along it. In the animation, the bright ray bundles show how the model probes where pollen structure could exist between the cameras and the object center.
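The per-pixel ray setup can be sketched directly from the camera intrinsics and pose; the 2x2 image and unit-focal camera below are hypothetical:

```python
import numpy as np

def pixel_rays(K, cam_to_world, width, height):
    """Generate one ray (origin, direction) per pixel from a camera pose.

    K is the 3x3 intrinsics matrix and cam_to_world a 4x4 pose; directions
    are unit vectors in world space, origins are the camera center.
    """
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels
    dirs_cam = pix @ np.linalg.inv(K).T                # back-project to camera space
    dirs_world = dirs_cam @ cam_to_world[:3, :3].T     # rotate into world space
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    origin = cam_to_world[:3, 3]                       # camera center
    return origin, dirs_world

# hypothetical 2x2 image seen by a unit-focal camera at the world origin
K = np.array([[1.0, 0, 1.0], [0, 1.0, 1.0], [0, 0, 1.0]])
origin, dirs = pixel_rays(K, np.eye(4), 2, 2)
```

Each of these rays is then handed to the sampling and compositing step, so the number of queried 3D points is pixels times samples per ray.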

3. field

A neural field learns density and color

The blue cloud represents the conditioned PixelNeRF field. For every sampled 3D point, the network predicts density and radiance while reusing image-conditioned features from the source views, so the field stays tied to the observed pollen structure.

4. render

PixelNeRF renders novel pollen views from the conditioned field

After optimization, the field can synthesize novel viewpoints of the pollen grain. That is what makes PixelNeRF useful in our framework: it turns sparse orthogonal evidence into a full 3D representation that can be evaluated and compared.

Thesis results

Results drawn from the thesis tables

These numbers condense the benchmark tables and discussion from the bachelor thesis into a sparse-view comparison you can scan quickly. The focus is the actual two-view pollen setting, because that is the real constraint the project is designed around.

Read the thesis PDF for tables 5.3 to 5.6 and the full discussion.
Model · Chamfer · F-Score · IoU

01 · PixelNeRF · 0.043 · 90.9% · 82.8%
Best overall trade-off in the thesis and the strongest trained method under two orthogonal views.

02 · Hunyuan3D-2 · 0.0432 · 91.15% · 79.05%
Zero-shot generative baseline that nearly matches surface fidelity, but with heavier compute and lower volumetric overlap.

03 · Pix2Vox · 0.060 · 84.8% · 61.1%
Recovers coarse pollen volume well, but the voxel bottleneck smooths fine ornamentation.

04 · Pixel2Mesh++ · 0.052 · 88.5% · 40.6%
Competitive surface proximity, but many open meshes pull the volumetric score down.

05 · Visual Hull · 0.091 · 66.4% · 63.7%
Clean geometric baseline that stays useful for reference, yet under-carves concavities.

Paper discussion

What the thesis says actually worked

The study compares classical geometry, voxel CNNs, mesh deformation, implicit radiance fields and zero-shot generative 3D under one shared pollen pipeline. The main conclusion is that PixelNeRF fits the sparse orthogonal setup best because it balances surface accuracy, stability and volumetric coherence better than the other trained methods.

Biggest practical win

Going from one view to two matters most

PixelNeRF jumps from CD 0.059 and IoU 75.5% with one image to CD 0.043 and IoU 82.8% with two orthogonal images. After that, the gains are much smaller.

Zero-shot surprise

Hunyuan3D-2 stays close

In the thesis runs, Hunyuan3D-2 reaches CD 0.0432 and F-Score 91.15% with two views, but still trails PixelNeRF on IoU and needs much more compute.

Prior effect

Pixel2Mesh++ IoU jumps sharply

Swapping to a mean pollen prior lifts Pixel2Mesh++ IoU from 43.1% to 75.9%, which suggests the prior regularizes volume and watertightness more than fine surface detail.

Holographic transfer

Real data stays difficult

Against the visual-hull proxy on holographic images, Pixel2Mesh++ scores best numerically, PixelNeRF tracks morphology better qualitatively, and Pix2Vox degrades the most.

Objective and scope

  • Build a rigorous comparison space for heterogeneous 3D reconstruction methods applied to pollen morphology.
  • Keep preprocessing deterministic and modular across synthetic multi-view renders and holographic acquisitions.
  • Separate data generation, model definition, training orchestration and evaluation cleanly.
  • Make experiments extensible through Hydra configs, reusable scripts and shared metrics.
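A Hydra-style layout matching these bullets might look like the fragment below; the group and file names are illustrative, not the project's actual config tree:

```yaml
# conf/config.yaml (illustrative defaults list)
defaults:
  - data: pollen_synthetic
  - model: pixelnerf
  - experiment: two_view_baseline

# conf/model/pixelnerf.yaml (per-model group file)
n_source_views: 2
ray_samples: 64
```

A run such as `python train.py model=pix2vox experiment=two_view_baseline` then swaps components from the command line, which is the override mechanism Hydra provides and the reason experiments stay extensible without editing files.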

Reconstruction pipeline

01

Clean and repair raw STL pollen meshes.

02

Normalize scale and orientation for stable downstream comparisons.

03

Render multi-view RGB images and silhouettes with Blender and VTK tooling.

04

Apply optional structural and morphological augmentations for broader training variation.

05

Train, evaluate and compare models with shared metrics and logging.
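Step 02 can be sketched as centroid-centering, PCA alignment, and unit-sphere scaling; the exact conventions in the thesis pipeline (target scale, axis ordering) may differ:

```python
import numpy as np

def canonicalize(vertices):
    """Center, PCA-align, and unit-scale a vertex array of shape (N, 3).

    A sketch of mesh normalization: translate the centroid to the origin,
    rotate onto the principal axes, and scale into the unit sphere.
    """
    v = vertices - vertices.mean(axis=0)               # center at centroid
    _, _, components = np.linalg.svd(v, full_matrices=False)
    v = v @ components.T                               # rotate onto PCA axes
    v /= np.linalg.norm(v, axis=1).max()               # fit in the unit sphere
    return v

# toy mesh: the eight corners of an axis-aligned cube
cube = np.array([[x, y, z] for x in (0, 2) for y in (0, 2) for z in (0, 2)],
                dtype=float)
norm = canonicalize(cube)
```

Normalizing every mesh the same way is what makes the downstream Chamfer, F-Score, and IoU numbers comparable across methods.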

Model families

  • Visual Hull baseline for deterministic silhouette carving and geometric reference.
  • Pix2Vox with staged encoder, decoder, merger and refiner control.
  • PixelNeRF for multi-view conditional radiance-field reconstruction.
  • Pixel2Mesh++ for multi-stage mesh refinement with dedicated preprocessing.
  • Hunyuan3D-2 for fast generative 3D comparison with varying inference budgets.
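In the two-view orthogonal case, the Visual Hull baseline reduces to intersecting the back-projections of the two silhouettes; a minimal sketch under that assumption:

```python
import numpy as np

def visual_hull(sil_x, sil_y):
    """Carve a voxel grid from two orthogonal silhouettes.

    sil_x is the (Y, Z) silhouette seen along the X axis, sil_y the (X, Z)
    silhouette seen along the Y axis. A voxel survives only if both of its
    projections land inside the silhouettes; broadcasting intersects the
    two back-projected prisms.
    """
    return sil_x[None, :, :] & sil_y[:, None, :]

# example: two circular silhouettes carve the intersection of two cylinders
res = 16
yy, zz = np.meshgrid(np.arange(res), np.arange(res), indexing="ij")
disk = (yy - res / 2) ** 2 + (zz - res / 2) ** 2 < (res / 3) ** 2
hull = visual_hull(disk, disk)
```

Because carving can only remove material outside the silhouettes, any concavity that no view sees survives in the hull, which is exactly the under-carving limitation noted in the results.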

Reproducibility layer

  • Hydra organizes data, model and experiment configuration with clean overrides.
  • PyTorch Lightning unifies training flow, checkpoints and metric reporting.
  • Containerized execution and SLURM scripts reduce environment drift for benchmark runs.
  • Shared evaluation notebooks and pipelines make qualitative and quantitative comparison easier.

Discussion and interpretation

  • PixelNeRF wins because the learned implicit prior fills in unseen structure from two orthogonal views without collapsing volumetric overlap.
  • Pix2Vox is robust and visually plausible at a coarse level, but its 32³ voxel output limits thin structures and ornamentation.
  • Pixel2Mesh++ can sit close to the target surface while still failing to produce watertight pollen meshes, which is why its IoU lags behind.
  • Hunyuan3D-2 is impressive as a zero-shot comparator, yet the project framing still favors PixelNeRF when efficiency, repeatability and controlled benchmarking matter.

How the evaluation worked

  • All methods were pushed through one comparable pipeline with canonicalized pollen meshes, shared camera setups and aligned evaluation metrics.
  • The thesis evaluates reconstructions with Chamfer Distance, F-Score and IoU after common post-processing and alignment, instead of letting each model define its own success criteria.
  • The table values combine synthetic orthogonal pollen renders, augmentation experiments, view-count ablations and a transfer test on holographic inputs.
  • This matters because the final story is not only which model looks pretty, but which one stays credible across sparse data, biological shape variation and reproducible evaluation.
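The three shared metrics can be sketched for point sets and voxel grids as follows; the distance threshold and normalization conventions are assumptions, and the thesis implementation may differ in detail:

```python
import numpy as np

def chamfer_and_fscore(pred, gt, tau=0.05):
    """Symmetric Chamfer Distance and F-Score between two (N, 3) point sets.

    Chamfer averages nearest-neighbor distances in both directions;
    F-Score combines precision and recall at threshold tau.
    """
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    d_pg = d.min(axis=1)                  # pred -> gt nearest distances
    d_gp = d.min(axis=0)                  # gt -> pred nearest distances
    chamfer = d_pg.mean() + d_gp.mean()
    precision = (d_pg < tau).mean()
    recall = (d_gp < tau).mean()
    fscore = 2 * precision * recall / max(precision + recall, 1e-8)
    return chamfer, fscore

def iou(vox_a, vox_b):
    """Volumetric IoU of two boolean voxel grids."""
    inter = np.logical_and(vox_a, vox_b).sum()
    union = np.logical_or(vox_a, vox_b).sum()
    return inter / union

pts = np.random.default_rng(0).normal(size=(100, 3))
cd, f = chamfer_and_fscore(pts, pts)     # identical sets: CD 0, F-Score 1
```

Running every method through the same metric functions, after the same alignment, is what lets a surface-accurate but open mesh score high on F-Score yet low on IoU, as Pixel2Mesh++ does in the tables.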