hop: All-Dielectric 3D FDTD Solver w/GPU Acceleration
hop solves Maxwell's equations for arbitrary dielectric mediums in a Yee-discretized 3D domain.
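For readers new to the Yee scheme, a minimal 1D sketch of the staggered leapfrog update may help. It is not hop's code: the function name `yee_step_1d`, the normalized units (c = 1, vacuum permittivity and permeability both 1), and the end nodes held at zero are all assumptions made for illustration. hop itself works in 3D on the GPU.

```rust
/// One leapfrog step of a 1D Yee scheme (hypothetical helper, normalized units).
///
/// `ez` lives on the `n` integer grid nodes, `hy` on the `n - 1` half-integer
/// positions between them. `courant` is c * dt / dx and should be <= 1 in 1D.
fn yee_step_1d(ez: &mut [f64], hy: &mut [f64], eps_r: &[f64], courant: f64) {
    let n = ez.len();
    assert!(n >= 2, "need at least two E nodes");
    assert_eq!(hy.len(), n - 1);
    assert_eq!(eps_r.len(), n);

    // H update: each H sample sits between two E samples.
    for i in 0..n - 1 {
        hy[i] += courant * (ez[i + 1] - ez[i]);
    }

    // E update on interior nodes only; the two end nodes are never written,
    // so they stay at their initial value (zero here, i.e. PEC-like walls).
    for i in 1..n - 1 {
        ez[i] += courant / eps_r[i] * (hy[i] - hy[i - 1]);
    }
}
```

Calling it repeatedly on a pair of buffers advances the fields in time; a unit pulse in the middle of `ez` splits into two half-amplitude pulses travelling in opposite directions.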
Warning
This project is currently in a prototype state. Proceed at your own risk!
Development
Development Tools
- (optional) prek is suggested to make it easy to conform to all project rules, via pre-commits.
- (optional) cocogitto is suggested to ease creating conformant commit messages, conformant changelog entries, and standardized release processing.
- (optional) licensesnip is suggested to make adding correct license headers easy.
Release Process
Releases shall proceed in the following fashion:
- Execute cog bump --auto (nothing may be staged). This performs a few checks, ex. prek on all files and a passing test-suite.
- Validate that the generated tags, changelog, and bump-commit are appropriate and as expected. If anything is wrong, this is the last chance to revert.
- Do git push origin main and git push origin --tags.
- Publish docs and packages, and create a Forgejo release.

TODO. Perhaps a Justfile that validates the GPG signature, builds and publishes the docs to a locally defined destination, builds and publishes any packages, and creates a Forgejo release using the changelog and a custom string entry - maybe even publishes a blog post somewhere too, with the custom string entry?
TODO
In-Scope
- Bounds
  - Naive
- Sources
  - EPointDipole
- Structures
  - Cuboid
  - Sphere
- Mediums
  - Vacuum
  - SimpleDielec
  - CPML
- Dispersive Mediums: These are "trivial" in the sense that each is "just" a function that takes aux fields / data that we've already moved into place for it. NOTE: We may want to reconsider how aux fields are updated with subpixel averaging. Also, could PMLs be made better with subpixel averaging?
- Inhomogeneous Mediums: Requires some kind of sampling within the single-medium structure.
  - Frankly, this is just another aux-field thing. No solver logic needed.
- Nonlinear Mediums: Weirdly enough, nothing special in the solver. The medium will be doing all kinds of special stuff; namely, solving an auxiliary differential equation with substeps.
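To make the "just a function over aux fields" idea concrete, here is a minimal sketch of an auxiliary differential equation update for a single field component. The Drude model is picked only as a familiar example. The names `drude_update_j` and `e_update`, the normalized units (vacuum permittivity set to 1), and the semi-implicit discretization are assumptions for illustration, not hop's actual medium interface.

```rust
/// Advance the Drude polarization current J (the aux field) by one time step.
///
/// Discretizes dJ/dt + gamma * J = omega_p^2 * E (normalized units, eps0 = 1)
/// with J averaged over the step, which gives a stable semi-implicit update.
/// Hypothetical helper, not part of hop.
fn drude_update_j(j_prev: f64, e: f64, omega_p: f64, gamma: f64, dt: f64) -> f64 {
    let denom = 1.0 + 0.5 * gamma * dt;
    let a = (1.0 - 0.5 * gamma * dt) / denom; // damping of the old current
    let b = omega_p * omega_p * dt / denom; // drive from the electric field
    a * j_prev + b * e
}

/// Advance E by one step given the local curl of H and the freshly updated J.
/// The solver only needs to subtract J; everything else lives in the medium.
fn e_update(e: f64, curl_h: f64, j: f64, eps_inf: f64, dt: f64) -> f64 {
    e + dt / eps_inf * (curl_h - j)
}
```

In this arrangement the solver loop stays unchanged between mediums: it calls the medium's aux update, then the ordinary E update with the resulting current.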
Fanciness
- Bounds
  - PEC
  - PMC
  - Bloch(k=0)
  - Bloch(k!=0)
- Sources
  - GaussBeam
  - PlaneWave
  - TFSF: Not really a source, but likely best expressed as one. Think on this.
- Structures
  - Cylinder
  - TriMesh
  - VoxCloud
  - Really just everything from https://iquilezles.org/articles/distfunctions/ .
  - Also some choice common structures ex. a ring resonator.
- Mediums
  - SimpleChiral
  - Lorentz
  - Drude
  - DrudeLorentz
  - Debye
- Medium Enhancements:
  - General: Medium built from models of P, J_f, M.
  - We then split up "how do I model mediums" into "how do I model P, J_f, M".
  - We keep existing mediums as-is, since they are sensible "special cases".
  - Animated: Requires recomputing kernel layout on every time step with altered parameters.
  - Nonlinear: This is just a P model, for use in the General medium.
  - Chiral: This is just a P model, for use in the General medium. Only kink is that we need an approximation of the local value of the dual field, since chirality is fundamentally a crossover effect.
  - Spatial Variation: This is just a P model, for use in the General medium.
- Bloch periodic bounds: Much like NaivePeriodic, except a phase shift is applied, forcing fields to be complex. This is the only correct method of doing periodic boundary conditions - in fact, periodic sim results require a photonic band gap computation.
  - There is a split-field formulation that allows keeping all the same real sim logic. F_real and F_imag, the real and imaginary parts of a Bloch mode, are both kept and are updated independently (I don't know how sources fit in). Boundaries then enforce crossover, aka. phase shifting.
  - That does require running the sim twice per time step, effectively. Anyway, the problem is well studied.
  - See: https://arxiv.org/pdf/2007.05091
- Subpixel Averaging: Integrate by sampling medium at various points (if it's varying). For two-medium regions, integrate by multiplying sample-point-wise SDF with sample-point-wise medium.
  - Can we use Lines creatively to do this very efficiently? An interesting thought.
- Subgrids: Needed for TFSF, but also generally useful.
- Structure Kernel Layout Optimization: Directly use SDFs to label cells without binning, and use the standard binned algo first. Then, refine the solution by annealing, using a cost function that rewards finding larger z-stripes.
  - Experiment with loading data into Lines more aggressively, esp. in structure vectors. Each kernel thus has more data to load - and much will be strided and inefficient, yes - but can also do its actual calculations very quickly. In particular, using Z lines
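The subpixel averaging item above (sample the cell at several points, and let the SDF decide which medium each sample sees) can be sketched roughly as follows. This is a plain arithmetic mean over a regular grid of sample points, which is only one of several possible averaging rules. The function names, the uniform cubic cell, and the two scalar permittivities are assumptions for illustration and do not reflect hop's API. The two distance functions follow the sphere and box formulas from the SDF article linked above.

```rust
/// Signed distance from `p` to a sphere surface (negative inside).
fn sdf_sphere(p: [f64; 3], center: [f64; 3], radius: f64) -> f64 {
    let d = [p[0] - center[0], p[1] - center[1], p[2] - center[2]];
    (d[0] * d[0] + d[1] * d[1] + d[2] * d[2]).sqrt() - radius
}

/// Signed distance from `p` to an axis-aligned cuboid with half-extents `half`.
fn sdf_cuboid(p: [f64; 3], center: [f64; 3], half: [f64; 3]) -> f64 {
    let mut outside_sq: f64 = 0.0;
    let mut max_q = f64::NEG_INFINITY;
    for a in 0..3 {
        let q = (p[a] - center[a]).abs() - half[a];
        outside_sq += q.max(0.0).powi(2); // distance contribution when outside
        max_q = max_q.max(q); // largest (least negative) axis when inside
    }
    outside_sq.sqrt() + max_q.min(0.0)
}

/// Average permittivity over one cubic cell using samples^3 midpoint samples.
/// A sample counts as `eps_in` where the SDF is negative, else `eps_out`.
fn averaged_eps(
    sdf: impl Fn([f64; 3]) -> f64,
    cell_min: [f64; 3],
    cell_size: f64,
    samples: usize,
    eps_in: f64,
    eps_out: f64,
) -> f64 {
    assert!(samples > 0, "need at least one sample per axis");
    let step = cell_size / samples as f64;
    let mut sum: f64 = 0.0;
    for i in 0..samples {
        for j in 0..samples {
            for k in 0..samples {
                let p = [
                    cell_min[0] + (i as f64 + 0.5) * step,
                    cell_min[1] + (j as f64 + 0.5) * step,
                    cell_min[2] + (k as f64 + 0.5) * step,
                ];
                sum += if sdf(p) < 0.0 { eps_in } else { eps_out };
            }
        }
    }
    sum / samples.pow(3) as f64
}
```

A cell lying entirely inside a structure returns `eps_in`, one entirely outside returns `eps_out`, and a cell cut by the surface returns a value weighted by the sampled fill fraction.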
References
FDTD General
- UTAH Lecture Notes: https://my.ece.utah.edu/~ece6340/LECTURES/lecture%2014/FDTD.pdf
- Schneider FDTD Book
CLI
- The Rust CLI Book: https://rust-cli.github.io/book/resources/index.html
- clap
- clap Config Files: https://github.com/bodo-run/clap-config-file
Boundaries
- Systematic Review of Periodic Structures: https://arxiv.org/pdf/2007.05091
Structures
- SDF Functions: https://iquilezles.org/articles/distfunctions/
- Photonic Crystal Waveguides: https://opg.optica.org/oe/abstract.cfm?uri=OE-12-2-234
PMLs
- The original PML paper
- MEEP Paper on PMLs: https://math.mit.edu/~stevenj/papers/OskooiJo11.pdf
- MEEP Docs on PMLs: https://github.com/NanoComp/meep/blob/master/doc/docs/Perfectly_Matched_Layer.md
- The CPML Paper.
- The lecture notes I was sent.
- IIR Filters: https://en.wikipedia.org/wiki/Infinite_impulse_response#Implementation_and_design
- Notes on PMLs: https://arxiv.org/pdf/2108.05348
Radiation BCs
- Surface Integral Representation: https://ieeexplore.ieee.org/document/237619
Mediums
- Chiral Mediums: https://www.nature.com/articles/s41377-020-00367-8
- Nonlinear Mediums: https://meep.readthedocs.io/en/master/Units_and_Nonlinearity/
zarrs
- Repository: https://github.com/zarrs/zarrs
- Simple Example: https://github.com/zarrs/zarrs/blob/main/zarrs/examples/array_write_read.rs
- zarrs Book: https://book.zarrs.dev
- zarrs Docs: https://docs.rs/zarrs/
Inspiration from Other Software
- fdtdx: https://github.com/ymahlau/fdtdx
  - This one is very interesting. All in Python, fully jax-based, autodiff.
  - Paper: https://arxiv.org/html/2412.12360v2
- tidy3d: https://docs.flexcompute.com/projects/tidy3d/en/latest/index.html
- Comparison between Lumerical and Tidy3D: https://arxiv.org/html/2506.16665v2
Interesting Books
GPU Programming
- CUDA Instruction Latency: https://ar5iv.labs.arxiv.org/html/1905.08778 Parallel
cubecl
- CubeCL Book (incomplete): https://burn.dev/books/cubecl/overview.html
- cubecl Docs: https://docs.rs/cubecl
- cubecl_std Docs: https://docs.rs/cubecl-std/0.8.1/cubecl_std/
- cubecl_convolution Docs: https://docs.rs/cubecl-convolution/0.8.1/cubecl_convolution/
- cubecl_matmul Docs: https://docs.rs/cubecl-matmul/0.8.1/cubecl_matmul/
- Something that would be incredible is a way for the user to construct a function in a higher level language (ex. Python), which then compiles to CubeCL IR, allowing the user to pass an honest-to-god function to a GPU kernel.
Tribal Knowledge
Hard-earned lessons that were not obvious!
CUDA Shuffle Instructions
- Shuffle Instruction: https://people.maths.ox.ac.uk/gilesm/cuda/lecs/lec4.pdf
- On Shuffle Instruction Performance: https://stackoverflow.com/questions/78496636/when-is-shfl-sync-idx-fast
- Concurrent Kernels: https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
- We can run kernels concurrently. This seems nearly good enough to let us bake a computational graph library, if we so wished.
cubecl
- (2025-12) debug_print! does not work with floats.
- (2025-12) The only way to cast from integers to floats in a kernel is F::cast_from().
- (2025-12) The slice referenced by Bytes when reading out an array is not obviously aligned to the original type, but when you check it, it is in fact correctly aligned.
  - The best we can do is ask Bytes for a &[u8], but this may invoke a copy - I'm not certain it's possible to remove that copy, even with CPU memory.
  - To stay sane, we've settled on using bytemuck to cast that byte-slice to the original type, then letting it be copied into an ordinary owned Vec. This is where the alignment being correct matters a great deal.
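As a rough illustration of the readback lesson above, the sketch below checks alignment and copies a byte slice into an owned `Vec<f32>`. To stay self-contained it uses only the standard library rather than bytemuck, and it decodes element by element, so unlike a direct slice cast it does not itself depend on the alignment. The helper names `is_aligned_for` and `bytes_to_f32_vec` are hypothetical, and the `&[u8]` input stands in for whatever CubeCL's Bytes hands back.

```rust
use std::mem::{align_of, size_of};

/// Report whether the start of `bytes` is suitably aligned for `T`.
/// A direct slice cast (as bytemuck performs) requires this to hold.
fn is_aligned_for<T>(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<T>() == 0
}

/// Copy a native-endian byte buffer into an owned Vec<f32>.
/// Returns None if the length is not a whole number of f32 values.
fn bytes_to_f32_vec(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % size_of::<f32>() != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(size_of::<f32>())
            // Decode each 4-byte chunk individually; this is always safe,
            // at the cost of the copy the notes above already accept.
            .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

With bytemuck available, the decode loop would presumably collapse to a cast of the whole slice followed by `to_vec()`, which is the case where the alignment check becomes essential.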