TRIAL
Both direct SVG and TikZ-via-node-tikzjax work for generating HKDSE 3D geometry diagrams from structured diagramSpec JSON. The key insight: because we feed LLMs well-structured specs—not freeform prompts—our accuracy far exceeds the ~73.9% benchmark for general K-12 diagram generation[3]. Recommendation: start with SVG for speed, validate with TikZ for academic polish. Do not use image-generation AI (DALL-E, Midjourney) where geometric accuracy matters—it cannot reliably produce labelled vertices, angle marks, or dashed hidden edges.

| Dimension | Rating |
|---|---|
| Technology type | Pipeline (LLM code generation + rendering) |
| Maturity | Beta — our narrow use case makes this viable |
| Documentation | Adequate (TikZ: excellent[8]; node-tikzjax: basic[7]; SVG: web standard) |
| Community | Established (TikZ) / Growing (LLM diagram generation) |
| Adoption | Early adopter for LLM-to-diagram pipeline |

| Use Case | Fit | Notes |
|---|---|---|
| Talent Coop 3D diagrams | STRONG | Structured specs, finite geometry types |
| General math worksheets | MODERATE | 2D geometry + graphs need separate templates |
| Non-geometry diagrams | WEAK | Flow charts, Venn diagrams—different problem space |
Should Eric learn this now? Yes. Time to basic competence: 2–4 hours. Key risk: edge cases with complex auxiliary constructions (perpendicular feet, parametric constraints).
Eight approaches exist for generating mathematical diagrams programmatically. Only two are practical for our use case: LLM-generated SVG and LLM-generated TikZ rendered via node-tikzjax. The rest fail on either output format, dependency weight, or problem fit.

| Approach | Output | Quality | Deps | Fit |
|---|---|---|---|---|
| LLM → SVG (Direct) | SVG | 4/5 | Zero | STRONG |
| LLM → TikZ → SVG (node-tikzjax) | SVG | 4.5/5 | npm pkg | STRONG |
| DeTikZify[1] | TikZ | 5/5 | GPU | WRONG FIT |
| Three.js + SVGRenderer[10] | SVG | 3.5/5 | Browser | OVERKILL |
| GeoGebra API[9] | Bitmap | 4/5 | JVM | DEAD END |
| Asymptote | SVG/PDF | 4.5/5 | TeX install | HEAVY |
| Manim | MP4/PNG | 5/5 | Python + FFmpeg | OVERKILL |
| DiagramIR[2] | Eval only | N/A | Python | EVAL TOOL |
The LLM generates raw SVG markup from a diagramSpec JSON. Zero dependencies. Full control over styling, label placement, and dashed/solid line rendering. Output is larger (~2–4KB per figure) but embeds directly in HTML. Uses system fonts (no LaTeX typography). For structured specs with explicit vertex coordinates, this produces clean, accurate output on the first attempt.
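To make the division of labor concrete, here is a minimal sketch of the kind of SVG generation the pipeline expects. The function name, the tiny spec shape, and the sample figure are illustrative stand-ins, not the actual diagramSpec schema:

```javascript
// Sketch: turning a simplified, hypothetical diagramSpec into SVG markup.
// Field names here are illustrative; the real diagramSpec schema may differ.
function specToSvg(spec) {
  // One <line> per edge; hidden edges get the dashed style.
  const edges = spec.edges.map(([a, b, hidden]) => {
    const p = spec.vertices[a];
    const q = spec.vertices[b];
    const dash = hidden ? ' stroke-dasharray="4 3"' : '';
    return `<line x1="${p[0]}" y1="${p[1]}" x2="${q[0]}" y2="${q[1]}" stroke="black"${dash}/>`;
  });
  // Vertex labels at spec-supplied positions (system font, not LaTeX).
  const labels = Object.entries(spec.labels).map(
    ([name, [x, y]]) => `<text x="${x}" y="${y}" font-size="12">${name}</text>`
  );
  return `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${spec.width} ${spec.height}">\n` +
    edges.concat(labels).join('\n') + '\n</svg>';
}

// A triangle VAB with one hidden edge, as a stand-in for a real 3D figure.
const svg = specToSvg({
  width: 200, height: 160,
  vertices: { V: [100, 20], A: [30, 140], B: [170, 140] },
  edges: [['V', 'A', false], ['V', 'B', false], ['A', 'B', true]],
  labels: { V: [104, 16], A: [18, 152], B: [176, 152] },
});
```

Because every coordinate is already in the spec, the LLM's only job is emitting well-formed markup like this; nothing in the prompt asks it to compute geometry.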
The LLM generates TikZ code, which is rendered server-side to SVG via node-tikzjax[7]. Inherits LaTeX's Computer Modern fonts, giving diagrams an academic look that teachers recognize. The TikZ 3D library[8] provides native support for oblique projections. Trade-off: node-tikzjax is single-threaded (no concurrent renders) and requires an npm dependency.
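For reference, the TikZ the LLM is asked to emit looks roughly like the following hand-written sketch of a tetrahedron VABC with one dashed hidden edge (illustrative, not verbatim pipeline output):

```latex
\begin{tikzpicture}
  % Projected 2D coordinates come straight from the diagramSpec.
  \coordinate (A) at (0,0);
  \coordinate (B) at (4,0.6);
  \coordinate (C) at (1.4,-1);
  \coordinate (V) at (1.8,3);
  \draw (A) -- (B);                         % visible base edges
  \draw (B) -- (C);
  \draw (V) -- (A) (V) -- (B) (V) -- (C);   % lateral edges
  \draw[dashed] (C) -- (A);                 % hidden edge
  \node[above] at (V) {$V$};
  \node[left]  at (A) {$A$};
  \node[right] at (B) {$B$};
  \node[below] at (C) {$C$};
\end{tikzpicture}
```

The labels are typeset in Computer Modern math italics, which is the typography advantage over the raw-SVG path.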
A specialized model trained on 360K+ TikZ–image pairs[1]. Outperforms GPT-4V and Claude on figure-to-TikZ conversion tasks. However, it solves the inverse problem: raster image → TikZ code. Our pipeline goes spec → code, so DeTikZify is architecturally misaligned. It also requires GPU inference.
Browser-native 3D rendering with SVG output[10]. Supports interactive rotation, which is appealing for web demos. But for static worksheet diagrams it introduces unnecessary complexity: a full 3D scene graph, camera setup, and lighting for what amounts to a wireframe with labels.
GeoGebra's Apps API[9] provides programmatic 3D geometry construction. The fatal flaw: 3D view export is bitmap-only (PNG). No SVG export path exists for 3D scenes. Dead end for print-quality vector output.
Asymptote produces publication-grade vector graphics but requires a full TeX installation. Manim (3Blue1Brown's engine) is animation-focused—designed for explanatory videos, not static exam figures. Both are significantly over-engineered for our needs.
An evaluation framework[2] for scoring LLM-generated diagrams against reference images. Useful for systematic quality assessment but does not generate diagrams. Could be adopted later to score our pipeline output at scale.
We ran a controlled spike with 3 HKDSE past-paper figures, generating each via both SVG and TikZ pipelines from identical diagramSpec inputs. Full results at research.ericsan.io/talentcoop_diagram_spike.html.

| Test Case | Source | SVG | TikZ | Notes |
|---|---|---|---|---|
| Regular Tetrahedron | 2012 Q40 | Pass | Pass | TikZ slightly better typography |
| Rectangular Box | 2016 Q39 | Pass | Pass* | SVG better layout; TikZ missing angle arc |
| Perpendicular-to-plane | 2014 Q40 | Pass | Pass | SVG has better spatial depth cues |
diagramSpec dramatically improves LLM accuracy versus freeform prompts. All 6 renders were usable on first generation. The spec provides explicit vertex coordinates, edge lists, label positions, and style hints (dashed for hidden edges)—leaving the LLM to handle only code syntax, not spatial reasoning.
The rectangular box case (8 vertices + 2 auxiliary points M and N) was the most complex. SVG handled it cleanly because we could control absolute positioning. TikZ produced correct geometry but missed a small angle arc at vertex A—a prompt refinement issue, not a fundamental limitation.
Stanford's Spring 2025 evaluation found GPT-4o and Claude 3.5 Sonnet achieve only 73.9% accuracy on K-12 diagram generation tasks[3]. Separately, geometric element recognition by LLMs scores just 53% accuracy on standard benchmarks[4]. Recent SVG-specific evaluations[5] confirm that while LLMs can produce syntactically valid SVG, spatial accuracy degrades with complexity.

| Constraint | Impact | Mitigation |
|---|---|---|
| node-tikzjax: single-threaded | No concurrent renders; ~1–2s per figure | Queue renders; acceptable at our scale (<50 figures/batch) |
| LLM non-determinism | Same prompt may yield different code | Use temperature=0; seed parameter where available |
| Complex auxiliary constructions | Perpendicular feet, angle bisectors may misplace | Pre-compute coordinates in diagramSpec |
| Cost | Negligible: ~200–400 tokens per figure | No mitigation needed |
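Since node-tikzjax cannot render concurrently, the queue mitigation amounts to serializing the async render calls. A generic promise-chain queue is enough at our scale; `renderTikz` below is a timed placeholder standing in for the real node-tikzjax call:

```javascript
// Serialize async tasks so only one runs at a time (node-tikzjax is
// single-threaded). `renderTikz` is a stand-in for the real render call.
function makeSerialQueue() {
  let tail = Promise.resolve();
  return (task) => {
    const run = tail.then(() => task());
    tail = run.catch(() => {});  // keep the chain alive after a failed render
    return run;
  };
}

async function demo() {
  const enqueue = makeSerialQueue();
  let active = 0;
  let maxActive = 0;
  const renderTikz = async (src) => {      // placeholder "render", ~10ms each
    active += 1;
    maxActive = Math.max(maxActive, active);
    await new Promise((resolve) => setTimeout(resolve, 10));
    active -= 1;
    return `<svg><!-- rendered ${src} --></svg>`;
  };
  // Fire all three at once; the queue still runs them strictly one at a time.
  const results = await Promise.all(
    ['fig1', 'fig2', 'fig3'].map((s) => enqueue(() => renderTikz(s)))
  );
  return { results, maxActive };
}
```

At under 50 figures per batch and ~1–2s per render, a whole batch finishes in under two minutes even fully serialized.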
DiagramIR[2] proposes a multi-dimensional evaluation (spatial accuracy, label placement, style adherence), but no benchmark yet tests the structured-spec paradigm. Our results are promising, but a sample of three figures is far too small to generalize from. The hardest HKDSE cases—tetrahedra with ground-plane elevation angles, truncated pyramids with internal diagonals—remain untested.
The honest risk: we tested 3 simple cases. The hardest HKDSE 3D geometry figures (the 2024 tetrahedron-on-ground with elevation angles, the 2019 frustum with an internal diagonal) have not been tested. The pipeline's ceiling is unknown. Pre-computing coordinates in the diagramSpec mitigates most failures, but at some point the spec itself becomes harder to write than the diagram.
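The pre-computation mitigation is straightforward in practice: auxiliary points are derived numerically before the spec is written, so the LLM never reasons about geometry. A sketch for the most common case, the foot of a perpendicular from a point to a line (the function name is ours, not a pipeline API):

```javascript
// Foot of the perpendicular from point P to the line through A and B, in 3D.
// Computed up front and written into the diagramSpec, so the LLM only ever
// sees finished coordinates.
function perpendicularFoot(p, a, b) {
  const ab = b.map((v, i) => v - a[i]);        // direction vector A -> B
  const ap = p.map((v, i) => v - a[i]);        // vector A -> P
  const dot = (u, w) => u.reduce((s, v, i) => s + v * w[i], 0);
  const t = dot(ap, ab) / dot(ab, ab);         // projection parameter along AB
  return a.map((v, i) => v + t * ab[i]);
}

// Foot from P = (1, 1, 1) onto the x-axis (A = origin, B = (2, 0, 0)).
const foot = perpendicularFoot([1, 1, 1], [0, 0, 0], [2, 0, 0]);
// foot → [1, 0, 0]
```

The same dot-product projection handles angle-of-elevation feet on the ground plane; only the choice of A and B changes.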
The production path uses SVG as the primary output format, with TikZ as an optional upgrade for academic contexts.

| Step | Input | Output | Tool |
|---|---|---|---|
| 1. Extract spec | Competency map JSON | diagramSpec | Manual / LLM-assisted |
| 2. Generate code | diagramSpec + prompt template | SVG markup or TikZ code | Claude API (temp=0) |
| 3a. SVG path | SVG markup | Embedded in HTML | Direct file write |
| 3b. TikZ path | TikZ code | SVG via rendering | node-tikzjax[7] |
| 4. Quality check | Generated SVG | Pass/fail | Manual review vs BrightMind reference |
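Step 2 is the only place where prompt wording matters. A sketch of how the prompt might be assembled from a spec (the template text is illustrative, not our production prompt):

```javascript
// Step 2 sketch: assemble the generation prompt from a diagramSpec.
// The template wording is illustrative, not the production prompt.
function buildPrompt(spec, target /* 'svg' | 'tikz' */) {
  return [
    `Generate ${target === 'svg' ? 'raw SVG markup' : 'TikZ code'} for this figure.`,
    'Use the exact coordinates below; do not recompute any geometry.',
    'Render edges with "hidden": true as dashed lines.',
    'diagramSpec:',
    JSON.stringify(spec, null, 2),
  ].join('\n');
}

const prompt = buildPrompt(
  {
    vertices: { A: [0, 0], B: [4, 0] },
    edges: [{ from: 'A', to: 'B', hidden: true }],
  },
  'svg'
);
```

The resulting string is sent at temperature 0 (step 2 in the table above), and the response body is written straight to disk (3a) or handed to the renderer (3b).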
Node.js 18+, node-tikzjax (npm)[7], and a Claude API key or Cursor agent. No TeX installation required—node-tikzjax bundles its own WASM-compiled TeX engine.
The spec is a JSON object with explicit vertex coordinates (projected to 2D), edge lists with visibility flags, label positions, and optional decorations (angle arcs, right-angle marks, dashed lines). The LLM translates this to SVG or TikZ—it does not compute geometry.
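A minimal illustration of the shape such a spec might take (field names here are illustrative; the actual schema lives in the pipeline):

```json
{
  "figure": "tetrahedron-VABC",
  "canvas": { "width": 200, "height": 160 },
  "vertices": { "V": [100, 20], "A": [30, 140], "B": [170, 140], "C": [120, 110] },
  "edges": [
    { "from": "V", "to": "A", "hidden": false },
    { "from": "V", "to": "B", "hidden": false },
    { "from": "V", "to": "C", "hidden": false },
    { "from": "A", "to": "B", "hidden": false },
    { "from": "B", "to": "C", "hidden": false },
    { "from": "C", "to": "A", "hidden": true }
  ],
  "labels": { "V": { "at": [104, 14], "anchor": "above" } },
  "decorations": [
    { "type": "right-angle-mark", "at": "A", "arms": ["B", "V"] }
  ]
}
```

Everything spatial is resolved before the LLM sees the spec; the coordinates are already the 2D projection, and the `hidden` flags already encode which edges are dashed.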
TRIAL. The structured-diagramSpec pipeline is production-viable for Talent Coop's HKDSE 3D geometry figures. Start with SVG. Add TikZ only if teacher feedback demands academic typography.
Immediate next steps:
1. Generate all 11 DSE[14] 3D geometry figures using the SVG pipeline.
2. Score output against Renee's Scribd reference samples for teacher acceptability.
3. Identify failure cases—which figure types break the pipeline?
4. If ≥9/11 pass: ship SVG pipeline. If <9/11: add TikZ fallback for failed cases.
Do NOT invest in: DeTikZify (wrong direction—image-to-code, not spec-to-code), Asymptote (heavy TeX dependency), Manim (animation, not static figures), GeoGebra 3D (bitmap-only export).
Watch: DiagramIR[2] for automated quality scoring once we have >20 figures, and SVG generation benchmarks[5] for improving prompt templates.