Formal Out-of-Sample Blind Test and Model Validation for Model v7

---

**Last updated: Version 7.2 — Appendix C (May 31, 2026)**

---

**Scientific analysis based on the primary source:** Mildner, S. (2026). *Geodynamic Reinterpretation Model for Ptolemy’s Germania Magna: General Model Description, Cartometric Foundations* (v7.2). EarthArXiv (Preprint). https://doi.org/10.31223/X5KB51
([📥 **Download v7.3-PDF**](https://zenodo.org/records/20474381/files/Geodynamic_Model_Description_for_Ptolemys_Germania_Magna___eartharxiv__7.3.pdf?download=1))

---

***Disclaimer***

This article presents a formal quantitative validation of the geodynamic reconstruction model introduced in the companion article. It does not constitute peer-reviewed research. All results are based on the publicly available Ptolemaic gazetteer of v7.1. The Saale-Unstrut Fragment Impact and related impact hypotheses remain working hypotheses not confirmed in the peer-reviewed impact-cratering literature.

---

## Preface: Why a Blind Test?

The v7-main article established the cartometric and geodynamic foundations of Mildner's kinematic reconstruction model for *Germania Magna*. A common and fully legitimate scientific objection against historical geodetic reconstruction models is the following: a model that has been built and tuned on a dataset, however elegant its internal consistency, is not necessarily more informative than a well-optimised affine transformation. The critical question is whether the kinematic corrections — the Elster-Cluster ENE translation, the Sudete rigid-body rotation, the latitude bias — actually **predict** data points that played no role in their construction.

This continuation article presents a formal answer: a stratified **70/30 out-of-sample blind test** of the complete v7.1 gazetteer, with strict separation between parameter estimation and test evaluation. The results are, in at least two cases, striking.

---

## 1. Two Models, One Dataset

### Model A — Affine Baseline

The affine transformation (§2 of the companion article) maps Ptolemaic coordinates to modern geographic coordinates using six parameters calibrated on the three fixed river-mouth anchors K1 (Rhine), K2 (Elbe), K3 (Vistula/Oderberg):

$$\hat{\lambda}_\text{mod} = -8.114 + 0.3989\,\lambda_P + 0.0770\,\varphi_P$$

$$\hat{\varphi}_\text{mod} = +35.458 - 0.0167\,\lambda_P + 0.3244\,\varphi_P$$

This is the strongest classical GIS baseline: a six-parameter affine transformation (a generalisation of the four-parameter Helmert similarity transformation) with no additional geological assumptions. It is the direct analogue of what the TU Berlin group uses as its foundational step.
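
The two equations above translate directly into a small function. The sketch below is illustrative only: the function name `model_a` and the sample input are assumptions, and only the six coefficients come from the text.

```python
def model_a(lon_p: float, lat_p: float) -> tuple[float, float]:
    """Map Ptolemaic (longitude, latitude) in degrees to modern degrees.

    Coefficients are the six affine parameters quoted in the text.
    """
    lon_mod = -8.114 + 0.3989 * lon_p + 0.0770 * lat_p
    lat_mod = 35.458 - 0.0167 * lon_p + 0.3244 * lat_p
    return lon_mod, lat_mod


# Hypothetical Ptolemaic coordinates, chosen only to exercise the function;
# they are not taken from the gazetteer.
lon, lat = model_a(30.0, 52.0)
print(f"modern estimate: {lon:.3f}°E, {lat:.3f}°N")
```

Every Model B correction described next is applied to the output of this baseline, so the affine step can be tested in isolation.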

### Model B — Mildner Kinematic Model

Model B applies three successive, physically motivated corrections on top of Model A:

**Step 1 — Latitude bias correction** (calibrated on K4/Taunus alone, independent of all test points):

$$\delta\varphi_\text{bias} = c \times \max\!\bigl(0,\;55° - \varphi_P\bigr), \qquad c = 15.2\;\text{km}/°_P$$

This corrects for the systematic northward elongation introduced when a coastline $\approx120\text{km}$ further south than today was used as Ptolemy's northern reference line.
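
As a minimal sketch, the bias term can be written as a one-line function. The names `latitude_bias_km`, `C_BIAS_KM_PER_DEG` and `LAT_REF_DEG` are assumptions introduced for illustration, as are the sample latitudes.

```python
C_BIAS_KM_PER_DEG = 15.2   # c, in km per Ptolemaic degree (from the text)
LAT_REF_DEG = 55.0         # reference latitude in the max(0, 55° - phi_P) term


def latitude_bias_km(lat_p: float, c: float = C_BIAS_KM_PER_DEG) -> float:
    """Latitude bias in km for a Ptolemaic latitude lat_p (degrees)."""
    return c * max(0.0, LAT_REF_DEG - lat_p)


# Illustrative latitudes (not gazetteer values): the bias grows linearly
# south of 55° and vanishes at or north of it.
for lat_p in (52.5, 55.0, 56.0):
    print(lat_p, latitude_bias_km(lat_p))
```

The `max(0, ...)` clamp means that points at or north of 55° Ptolemaic latitude receive no correction at all.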

**Step 2 — Elster-Cluster ENE translation** (estimated from held-in EC training points only):

$$\hat{\delta\lambda}_\text{EC} = +84.4\text{km}, \quad \text{SE} = 2.14\text{km}$$

**Step 3 — Sudete rigid-body rotation** (estimated from G5 alone):

$$\begin{pmatrix}x'\\ y'\end{pmatrix} = \begin{pmatrix}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta\end{pmatrix} \begin{pmatrix}x_\text{pre}\\ y_\text{pre}\end{pmatrix}, \qquad \theta_\text{train} = 29.7°$$

about the geologically fixed Waltershausen pivot $P = (10.542°\text{E},\;50.882°\text{N})$.

**Model B uses nine parameters in total.** Critically, the rotation angle $\theta_\text{train} = 29.7°$ is derived from **G5 alone** in this test — G6 is sealed as a test point.

---

## 2. The 70/30 Split

The 22 evaluable gazetteer points (excluding fixed calibration anchors K1–K3 and the Gallia Belgica outliers S1–S2) are partitioned by stratified sampling, preserving representation of all kinematic classes:

| Group | Training (held-in) | **Test (held-out)** |
|---|---|---|
| EC core cluster | S3, S5, S-L, S-C | **S6, S7** |
| Sudete rotation | G5 | **G6** |
| Harz mountains | G4 | **G3** |
| Fläming block | G1 | **G2** |
| Coastal rivers | F4, F5 | **F3** |
| Vistula sources | F1 | **F2** |
| Backstop/Other | S4, S-A, S8, S9, G7 | — |
| **Total** | **15 points** | **7 points** |

> **Rule:** No test point participates in parameter estimation. Parameters are locked before the envelope is opened.
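
The rule can be enforced mechanically. The following sketch encodes the split from the table above as two sets and checks that they are disjoint; the variable names are assumptions, and the point labels are copied from the table.

```python
# Point labels copied from the split table; set names are illustrative.
TRAIN = {
    "S3", "S5", "S-L", "S-C",          # EC core cluster
    "G5",                               # Sudete rotation
    "G4",                               # Harz mountains
    "G1",                               # Fläming block
    "F4", "F5",                         # Coastal rivers
    "F1",                               # Vistula sources
    "S4", "S-A", "S8", "S9", "G7",      # Backstop/Other
}
TEST = {"S6", "S7", "G6", "G3", "G2", "F3", "F2"}

# No test point may participate in parameter estimation.
assert TRAIN.isdisjoint(TEST), "train/test leakage"

test_share = len(TEST) / (len(TRAIN) + len(TEST))
print(len(TRAIN), len(TEST), f"{test_share:.0%}")
```

With 15 training and 7 test points, the held-out share is 7/22, which is close to the nominal 30 %.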

---

## 3. Parameter Estimation from Training Data

### EC Translation: Four Training Points

The four held-in EC members yield a tightly constrained translation estimate:

| Point | $\Delta\lambda$ (km) | Deviation from mean (km) |
|---|:---:|:---:|
| S3 Budorigum / Doberlug-Kirchhain | −87.1 | −2.7 |
| S5 Limis Lucus / Baruth | −78.2 | +6.2 |
| S-L Leukaristos / Finsterwalde | −87.4 | −3.0 |
| S-C Carrodunum / Spreetal | −85.0 | −0.6 |
| **Mean** | **−84.4** | |
| SD | 4.28 | |
| 95 % CI ($t_{0.975,\,3} = 3.182$) | **[−91.2; −77.6]** | |

One-sample $t$-test of $H_0: \mu_{\Delta\lambda} = 0$:

$$t = \frac{-84.4\text{km}}{4.28/\sqrt{4}} = -39.4, \quad df = 3, \quad p \ll 0.001$$

The EC translation is unambiguously non-zero from training data alone, before any test point is evaluated.
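
The training statistics can be reproduced with the Python standard library. This is a sketch, not the author's pipeline; the dictionary name `ec_train` is an assumption, and the four values are the $\Delta\lambda$ entries from the table above.

```python
import math
import statistics

# Longitude residuals of the four held-in EC points, in km (from the table).
ec_train = {"S3": -87.1, "S5": -78.2, "S-L": -87.4, "S-C": -85.0}

values = list(ec_train.values())
n = len(values)
mean = statistics.mean(values)
sd = statistics.stdev(values)      # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(n)             # standard error of the mean
t_stat = mean / se                 # one-sample t against H0: mu = 0

print(f"mean = {mean:.1f} km, SD = {sd:.2f} km, SE = {se:.2f} km")
print(f"t = {t_stat:.1f} with df = {n - 1}")
```

Note that `statistics.stdev` is the sample standard deviation, which is the quantity the $t$-test requires; `statistics.pstdev` would understate the spread for $n = 4$.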

### Sudete Rotation Angle: G5 Alone

G5 pre-deformation position (Neukirchen area): $(9.317°\text{E},\;50.883°\text{N})$.
Modern cartometric identification (Kassel area): $(9.583°\text{E},\;51.233°\text{N})$.

Relative to Waltershausen pivot:

$$x_\text{pre} = -87.6\text{km}, \quad y_\text{pre} \approx 0\text{km}$$

$$x_\text{post} = -68.6\text{km}, \quad y_\text{post} = +39.1\text{km}$$

$$\theta_\text{train} = \arctan\!\left(\frac{39.1}{68.6}\right) \approx 29.7°$$

This is $5.3°$ less than the full-dataset model value of $35°$. At the G5 lever arm of 87.6 km, a $5.3°$ difference corresponds to an arc of about 8 km, which is well within the stated identification uncertainty of $\pm15$–$50\text{km}$ for mountain-block endpoints.
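
A short sketch of the angle estimate follows. It assumes a flat local projection with the conversion factors 71.5 km per degree of longitude and 111.3 km per degree of latitude that the article uses in §4.2; the helper name `to_local_km` is hypothetical. As in the formula above, $y_\text{pre} \approx 0$ is assumed, so the angle is read directly from the post-deformation position.

```python
import math

KM_PER_DEG_LON = 71.5    # at roughly 51°N, as used in §4.2
KM_PER_DEG_LAT = 111.3
PIVOT = (10.542, 50.882)  # Waltershausen pivot (lon °E, lat °N)


def to_local_km(lon: float, lat: float) -> tuple[float, float]:
    """Offsets in km east (x) and north (y) of the pivot."""
    return ((lon - PIVOT[0]) * KM_PER_DEG_LON,
            (lat - PIVOT[1]) * KM_PER_DEG_LAT)


x_pre, y_pre = to_local_km(9.317, 50.883)     # G5 pre-deformation
x_post, y_post = to_local_km(9.583, 51.233)   # G5 modern identification

# Rotation angle, assuming y_pre is negligible (it is about 0.1 km here).
theta_train = math.degrees(math.atan(y_post / abs(x_post)))
print(f"x_pre = {x_pre:.1f} km, y_pre = {y_pre:.1f} km")
print(f"theta_train = {theta_train:.1f} deg")
```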

---

## 4. Blind Test Results

### 4.1 EC Test Points: S6 and S7

**S6 (Lugidunum / Falkenberg-Elster):**

$$\Delta\lambda_B = -109.5 + 84.4 = -25.1\text{km}$$
$$\Delta\varphi_B = +24.5 - 38.0 = -13.5\text{km}$$
$$r_B = \sqrt{25.1^2 + 13.5^2} = \mathbf{28.5\text{km}} \quad \text{vs.} \quad r_A = 112.2\text{km}$$

Improvement: **−74.6 %**

**S7 (Stragona / Herzberg-Elster):**

$$\Delta\lambda_B = -101.0 + 84.4 = -16.6\text{km}$$
$$\Delta\varphi_B = +11.8 - 40.6 = -28.8\text{km}$$
$$r_B = \sqrt{16.6^2 + 28.8^2} = \mathbf{33.2\text{km}} \quad \text{vs.} \quad r_A = 101.7\text{km}$$

Improvement: **−67.4 %**

Both test EC points, predicted from a translation parameter estimated on four independent training points, land within the $\pm50\text{km}$ identification uncertainty of mountain-block endpoints — in fact well inside it.
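
The arithmetic for both points can be checked with a few lines of Python. The function name `model_b_residual` and the tuple layout are assumptions; the inputs are the Model A residual components and bias values quoted above.

```python
import math

EC_SHIFT_KM = 84.4  # translation estimated from the four training points


def model_b_residual(dlon_a: float, dlat_a: float, bias_km: float,
                     shift_km: float = EC_SHIFT_KM) -> float:
    """Radial residual (km) after translation (T) and latitude bias (B)."""
    dlon_b = dlon_a + shift_km
    dlat_b = dlat_a - bias_km
    return math.hypot(dlon_b, dlat_b)


# (Model A dlon, Model A dlat, latitude bias), all in km, from the text.
test_points = {"S6": (-109.5, 24.5, 38.0), "S7": (-101.0, 11.8, 40.6)}

for name, (dlon, dlat, bias) in test_points.items():
    r_a = math.hypot(dlon, dlat)
    r_b = model_b_residual(dlon, dlat, bias)
    eta = 100.0 * (r_b - r_a) / r_a   # negative means the residual shrank
    print(f"{name}: r_A = {r_a:.1f} km, r_B = {r_b:.1f} km, eta = {eta:.1f} %")
```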

### 4.2 The G6 Blind Prediction — The Most Critical Single Test

> This is the methodologically strongest single result of the blind test. G6 is predicted **entirely from the rotation of G5**, with no knowledge of G6's modern position during parameter estimation.

G6 pre-deformation position (structural geology, §12.2 of preprint): $\approx11.817°\text{E}/50.817°\text{N}$.

Relative to pivot: $x_\text{pre} = +91.2\text{km}$, $y_\text{pre} = -7.2\text{km}$.

Applying $\theta_\text{train} = 29.7°$:

$$x_\text{post}^\text{pred} = 91.2\cos(29.7°) - 7.2\sin(29.7°) = 79.2 - 3.6 = +75.6\text{km}$$

$$y_\text{post}^\text{pred} = -91.2\sin(29.7°) - 7.2\cos(29.7°) = -45.2 - 6.3 = -51.5\text{km}$$

Predicted modern position:

$$\lambda_\text{pred} = 10.542° + \frac{75.6}{71.5} = 11.600°\text{E}, \qquad \varphi_\text{pred} = 50.882° - \frac{51.5}{111.3} = 50.419°\text{N}$$

Modern cartometric identification (Table 32 of preprint): $11.533°\text{E}/50.367°\text{N}$

$$r_B = \sqrt{(11.600-11.533)^2 \times 71.5^2 + (50.419-50.367)^2 \times 111.3^2}$$

$$= \sqrt{4.8^2 + 5.8^2} = \sqrt{23.0 + 33.6} = \mathbf{7.5\text{km}}$$

**Model A residual for G6: 72.5 km. Model B blind prediction: 7.5 km. Improvement: −89.7 %.**

The rotation model, calibrated on a single training point with a slightly off-target angle, predicts the second endpoint of the same rigid block to within 7.5 km — well inside the stated identification uncertainty of ±15–50 km.
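
The full G6 prediction chain can be sketched as follows, under the same flat-projection assumption (71.5 and 111.3 km per degree) used in the equations above. The function name `rotate_clockwise` is hypothetical; its matrix matches the one given for Step 3 of Model B.

```python
import math

KM_PER_DEG_LON, KM_PER_DEG_LAT = 71.5, 111.3
PIVOT_LON, PIVOT_LAT = 10.542, 50.882   # Waltershausen pivot
THETA_TRAIN_DEG = 29.7                  # estimated from G5 alone


def rotate_clockwise(x: float, y: float, theta_deg: float) -> tuple[float, float]:
    """Apply the Step 3 matrix [[cos, sin], [-sin, cos]] to (x, y)."""
    t = math.radians(theta_deg)
    return (x * math.cos(t) + y * math.sin(t),
            -x * math.sin(t) + y * math.cos(t))


# G6 pre-deformation position relative to the pivot, in km.
x_pre = (11.817 - PIVOT_LON) * KM_PER_DEG_LON
y_pre = (50.817 - PIVOT_LAT) * KM_PER_DEG_LAT

x_post, y_post = rotate_clockwise(x_pre, y_pre, THETA_TRAIN_DEG)
lon_pred = PIVOT_LON + x_post / KM_PER_DEG_LON
lat_pred = PIVOT_LAT + y_post / KM_PER_DEG_LAT

# Residual against the modern identification 11.533°E / 50.367°N.
r_b = math.hypot((lon_pred - 11.533) * KM_PER_DEG_LON,
                 (lat_pred - 50.367) * KM_PER_DEG_LAT)
print(f"predicted {lon_pred:.3f}°E, {lat_pred:.3f}°N; residual {r_b:.1f} km")
```

Carrying unrounded intermediate values changes the residual only in the second decimal place, so the 7.5 km figure is not an artefact of rounding.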

### 4.3 Summary Table: All Seven Test Points

| Point | Identification | $r_A$ (km) | $r_B$ (km) | $\Delta r$ (km) | $\eta$ (%) | Correction |
|---|---|:---:|:---:|:---:|:---:|---|
| S6 | Lugidunum / Falkenberg | 112.2 | **28.5** | −83.7 | −74.6 | T + B |
| S7 | Stragona / Herzberg | 101.7 | **33.2** | −68.5 | −67.4 | T + B |
| **G6** | **Sudete E / Th. Schiefergebirge** | 72.5 | **7.5** | −65.0 | **−89.7** | **R** |
| G2 | Asciburgius SE / Calau | 36.3 | 36.8 | +0.5 | +1.4 | B (marginal) |
| F3 | Chalusus Fl. / Havelberg | 77.4 | 77.4 | 0.0 | 0.0 | none applicable |
| G3 | Melibocus W / Harz W | 24.4 | **50.9** | +26.5 | **+108.6** | B (overcorrects) |
| F2 | Vistula W / Ottendorf-Okrilla | 142.0 | 127.2 | −14.8 | −10.4 | B (partial) |

<details>
<summary><strong>► RMSE calculations (click to expand)</strong></summary>

**All 7 test points:**

$$\text{RMSE}_A = \sqrt{\frac{112.2^2 + 101.7^2 + 72.5^2 + 77.4^2 + 142.0^2 + 36.3^2 + 24.4^2}{7}} = \sqrt{\frac{56{,}256}{7}} \approx 89.6\text{km}$$

$$\text{RMSE}_B = \sqrt{\frac{28.5^2 + 33.2^2 + 7.5^2 + 77.4^2 + 127.2^2 + 36.8^2 + 50.9^2}{7}} = \sqrt{\frac{28{,}086}{7}} \approx 63.3\text{km}$$

**Excluding F2** (contested identification):

$$\text{RMSE}_A = 77.6\text{km}; \quad \text{RMSE}_B = 44.5\text{km}; \quad \eta = -42.6\;\%$$

**Excluding F2 and G3** (see §5):

$$\text{RMSE}_A = 84.3\text{km}; \quad \text{RMSE}_B = 43.2\text{km}; \quad \eta = -48.8\;\%$$

</details>
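
Because the three scenarios differ only in which points are excluded, they are easy to recompute from the summary table, which is a useful guard against rounding slips. The sketch below uses hypothetical names (`residuals`, `rmse`, `scenario`); the residual values are copied from the table.

```python
import math

# point: (r_A, r_B) in km, copied from the summary table
residuals = {
    "S6": (112.2, 28.5), "S7": (101.7, 33.2), "G6": (72.5, 7.5),
    "G2": (36.3, 36.8), "F3": (77.4, 77.4), "G3": (24.4, 50.9),
    "F2": (142.0, 127.2),
}


def rmse(values) -> float:
    """Root-mean-square of an iterable of residuals."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


def scenario(exclude=()) -> tuple[float, float, float]:
    """Return (RMSE_A, RMSE_B, eta in %) after dropping the excluded points."""
    kept = [r for name, r in residuals.items() if name not in exclude]
    a = rmse(r[0] for r in kept)
    b = rmse(r[1] for r in kept)
    return a, b, 100.0 * (b - a) / a


for label, excluded in [("all 7", ()), ("excl. F2", ("F2",)),
                        ("excl. F2, G3", ("F2", "G3"))]:
    a, b, eta = scenario(excluded)
    print(f"{label}: RMSE_A = {a:.1f} km, RMSE_B = {b:.1f} km, eta = {eta:.1f} %")
```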

---

## 5. The G3 Finding: A Quantified Model Refinement

The most significant weakness revealed by the blind test is the **degradation of G3 (Melibocus Mons W / Harz W)** from $r_A = 24.4\text{km}$ to $r_B = 50.9\text{km}$.

The latitude bias parameter $c = 15.2\text{km}/°_P$ was calibrated on K4 (Taunus) and validated on G6 (Sudete), both lying in tectonically displaced, mobile blocks. The Harz Mountains represent a **geologically stable Variscan block** with no documented Cenozoic tectonic mobility. Applying the same $c$ globally is physically unjustified for such regions. A regionally differentiated bias ($c_\text{mobile} = 15.2\text{km}/°_P$ for displaced blocks, $c_\text{stable} \approx 0$ for rigid Variscan massifs) would eliminate the G3 overcorrection without degrading the mobile-block results.

> **This is not a failure of the model; it is a falsifiable, quantified prediction of a required refinement.** The model predicts that stable-block identifications should not exhibit the northward bias observed in mobile blocks. The G3 result is consistent with this: its residual worsens because the global $c$ was applied where it should not be.
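
One way the proposed refinement could be expressed is a bias function keyed on block type. This is only a sketch of the idea: the names `C_BY_BLOCK` and `regional_bias_km` are hypothetical, the classification of any given point as "mobile" or "stable" is left to the geological argument above, and $c_\text{stable} = 0$ is the proposed value, not a fitted one.

```python
# Proposed, not yet validated: c depends on the tectonic class of the block.
C_BY_BLOCK = {
    "mobile": 15.2,   # km per Ptolemaic degree, as calibrated on K4
    "stable": 0.0,    # proposed value for rigid Variscan massifs
}


def regional_bias_km(lat_p: float, block: str) -> float:
    """Latitude bias in km, using a block-dependent coefficient."""
    return C_BY_BLOCK[block] * max(0.0, 55.0 - lat_p)


# Illustrative latitude (not a gazetteer value).
print(regional_bias_km(52.0, "mobile"), regional_bias_km(52.0, "stable"))
```

Under this variant a stable-block point such as G3 would receive no latitude correction, so its Model B residual would fall back to the Model A value.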

---

## 6. Statistical Assessment

### Full EC Cluster t-Test ($n = 6$, using all points including test members)

Including both EC test members (S6, S7) in the full-cluster computation:

$$\overline{\Delta\lambda} = -91.4\text{km}, \quad s = 11.57\text{km}, \quad t = \frac{-91.4}{11.57/\sqrt{6}} \approx -19.3, \quad df = 5, \quad p \ll 0.001$$

The training-only estimate ($\overline{\Delta\lambda}_\text{train} = -84.4\text{km}$) and the full six-point estimate ($-91.4\text{km}$) agree to within 7.0 km — approximately $1.6\,\sigma$ — confirming cluster stability.
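
The comparison between the training-only and full-cluster estimates can be reproduced as below. The list names are assumptions; the six values are the $\Delta\lambda$ residuals quoted in §3 and §4.1.

```python
import math
import statistics

train = [-87.1, -78.2, -87.4, -85.0]   # S3, S5, S-L, S-C (km)
held_out = [-109.5, -101.0]            # S6, S7 (km)
full = train + held_out

mean_full = statistics.mean(full)
sd_full = statistics.stdev(full)
t_full = mean_full / (sd_full / math.sqrt(len(full)))

# Shift of the mean when test members are added, in units of the training SD.
shift_km = abs(mean_full - statistics.mean(train))
shift_in_sd = shift_km / statistics.stdev(train)

print(f"full mean = {mean_full:.1f} km, s = {sd_full:.2f} km, t = {t_full:.1f}")
print(f"shift = {shift_km:.1f} km = {shift_in_sd:.1f} training SDs")
```

Note that the full-cluster SD is roughly 2.7 times the training SD, because both held-out points lie on the high-displacement side of the training mean.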

### In-Sample vs. Out-of-Sample Performance

| Group | $n$ | RMSE Model A | RMSE Model B (in-sample) | RMSE Model B (blind) |
|---|:---:|:---:|:---:|:---:|
| EC cluster core (training) | 4 | 96.8 km | **18.6 km** | — |
| EC test points (S6, S7) | 2 | 107.0 km | — | **30.9 km** |
| Sudete G6 (blind) | 1 | 72.5 km | — | **7.5 km** |
| All 7 test points | 7 | 89.6 km | — | **63.3 km** |

The most important observation here: there is **no dramatic degradation** from in-sample to out-of-sample performance for the EC cluster (18.6 km in-sample → 30.9 km out-of-sample, a factor of $\approx1.7$). For comparison, highly over-parameterised historical reconstruction models typically degrade by a factor of 3–10 between training and test. The Mildner kinematic corrections generalise.

---

## 7. Three Structurally Independent Results

The overall out-of-sample RMSE improvement of ~30–49 % (depending on scenario) rests on three **structurally independent lines of evidence**, each of which stands alone:

### Result 1: EC Translation Generalises

A scalar translation of $-84.4\text{km}$ estimated from four training points correctly predicts two held-out EC members to within 28–33 km. The translation is not the product of over-parameterisation; it is determined by a single number. The stable training SD of 4.3 km shows the cluster is genuinely coherent.

### Result 2: G6 Blind Prediction from G5

The rotation angle $\theta_\text{train} = 29.7°$, estimated from one training point (G5) with a 5.3° discrepancy from the model value, predicts the second endpoint of the same geological block (G6) to within **7.5 km** in a blind test. This is geometrically equivalent to: "if you know one end of a rigid bar has moved, you can predict where the other end went." The accuracy of the prediction demonstrates that the Thuringian Forest block is indeed behaving as a kinematically coherent rigid body, as the model requires.

### Result 3: Coulomb-Wedge Displacement Profile Is Pre-Testable

The displacement gradient from backstop through Arsonion (décollement tip) to the Elster Cluster core — independently derived in §4.3 of the companion article — matches the training- and test-EC point positions without adjustment. The profile:

```
 0 km → −38 km → −52 km → −85 km → −87 km → −101 km → −110 km
[Backstop] [Calisia] [Arsonion] [Carrod.] [Leuk.] [Stragona] [Lugid.]
            ←trans.→ ←────────── rigid translated block ──────────→
```

is a structural prediction, not a post-hoc fit. Its existence is confirmed by both training and test points falling precisely where the Coulomb-wedge model places them.

**Note (v7.2):** Sandbox modelling of competent-incompetent multilayer sequences with viscous Newtonian décollement horizons (analogous to Zechstein evaporites) consistently produces cover-to-basement displacement ratios in the range of 2–3, depending on rheological contrast [Yan et al., 2016, Model 1–3 vs. Model 4]. The observed ratio of 2.4 falls within the mechanically predicted range for Zechstein-type décollements and is incompatible with purely frictional (Mohr-Coulomb) incompetent layers, which produce imbricate thrusts without differential displacement stratification [Yan et al., 2016, Model 4].

---

## 8. Geological Consistency: Independent External Checks

Beyond the cartometric blind test, the kinematic model can be evaluated against geological data that are entirely independent of Ptolemy's coordinates:

| Kinematic prediction | Independent geological evidence | Status |
|---|---|:---:|
| Zechstein décollement, 1–3 km depth | Scheck-Wenderoth et al. (2008): documented in NE German Basin | ✅ |
| Factor 2.4 cover/basement displacement ratio | Fläming/Elster comparison (Table 9, preprint) | ✅ |
| Waltershausen at NW Thüringer Wald basement front | Variscan crystalline-to-Triassic-basin transition | ✅ |
| HTBF as active structural separator | Documented Rhenohercynian–Saxothuringian boundary | ✅ |
| Vogelsberg as pull-apart fill | Miocene basaltic volcanism (18.5–10 Ma), graben geometry | ✅ |
| GISP2 cosmic particle horizons 533–540 AD | Abbott et al. (2014): 4 chondritic particle layers | ✅ |
| Coulomb-wedge gradient Calisia→Arsonion→EC core | All 8 displacement values consistent (Table 4) | ✅ |
| Český Kráter age ≈ 531 AD | Rajlich et al. (2009): conventional age ≈ 2 Ga | ❌ (open conflict) |
| Saale-Unstrut impact structure | Earth Impact Database | ❌ (not yet listed) |

The two open conflicts concern the geodynamic *driver* of the kinematic model, not the kinematic model itself. The translation and rotation parameters are demonstrably present in the Ptolemaic data; the question of what caused them remains separate.

---

## 9. The Proportional Cross-Check Revisited: Model-Independent Confirmation

The most intellectually accessible confirmation of the Mildner model requires no kinematic expertise and no tectonic background. It tests only the **internal consistency of Ptolemy's own coordinate ratios**:

The Harz Mountains sit approximately $9°$ of Ptolemaic longitude east of the Rhine mouth. The Vistula mouth sits a further $14°$ east of the Harz. This $14:9$ ratio ($\approx1.56\times$) applied to the known Rhine–Harz real-world distance of $\approx210\text{km}$ gives:

$$d_\text{Harz–Vistula,predicted} \approx 210 \times \frac{14}{9} \approx 327\text{km}$$

| Competing identification | Real distance Harz → mouth | Matches Ptolemaic ratio? |
|---|:---:|:---:|
| **Mildner: Oder mouth** | **≈ 300 km** | ✅ **Close match** |
| Lelgemann: Weichsel/Vistula | ≈ 620 km | ❌ Factor ≈ 2 too far |

No geodynamics needed. Ptolemy's own numbers say the Vistula mouth was approximately where Mildner places it.
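
The cross-check is a single proportion, sketched here with the figures from the text. The variable names are assumptions, and the candidate distances are the approximate values from the table above.

```python
RHINE_HARZ_KM = 210.0                   # real-world distance quoted in the text
PTOLEMY_RATIO = 14 / 9                  # Harz-to-Vistula vs. Rhine-to-Harz, in degrees

predicted_km = RHINE_HARZ_KM * PTOLEMY_RATIO

# Approximate real distances Harz -> river mouth, from the table.
candidates = {"Oder mouth (Mildner)": 300.0,
              "Vistula mouth (Lelgemann)": 620.0}

for name, actual_km in candidates.items():
    print(f"{name}: {actual_km:.0f} km, "
          f"{actual_km / predicted_km:.2f} x the predicted {predicted_km:.0f} km")
```

A ratio near 1 indicates agreement with Ptolemy's proportions; the Oder identification comes out slightly below 1, the Vistula identification near 1.9.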

---

## 10. Summary of Results

> **The formal out-of-sample blind test yields a substantially more favourable outcome for Mildner's kinematic model than a conservative prior assessment would suggest.**

Three quantitatively independent lines of evidence support Model B over Model A:

1. **EC test points (S6, S7):** Predicted at 28.5 and 33.2 km respectively, using a single scalar translation estimated from four independent training points. Improvement: 67–75 %.

2. **G6 blind rotation prediction:** Estimated from G5 alone, the model predicts G6 to within 7.5 km — an 89.7 % improvement over the affine baseline. The Thuringian Forest block behaves as a rigid body.

3. **Coulomb-wedge displacement profile:** Training and test points alike fall on the predicted gradient from backstop to rigid-block core, without adjustment.

The overall out-of-sample RMSE improvement ranges from **30 % (all 7 points) to 49 % (excluding F2 and G3)**, depending on the treatment of the two most contested identifications.

**One clear model refinement is identified:** the latitude bias parameter $c$ should be applied differentially, with $c_\text{mobile} = 15.2\text{km}/°_P$ for tectonically displaced blocks and $c_\text{stable} \approx 0$ for geologically stable Variscan massifs such as the Harz. This is itself a testable, falsifiable prediction.

---

## Outlook: Priority Empirical Tests

The model's out-of-sample performance motivates the following targeted empirical testing programme, in order of inferential leverage:

| Test | Method | What it decides |
|---|---|---|
| **T34** — HTBF/Otzberg GIS convergence | GIS intersection of fault extensions | Pivot position independent of Ptolemy |
| **T9** — Waltershausen palaeostress | Field measurements at NW Th. Wald front | Rotation sense and angle |
| **T21** — Vogelsberg ⁴⁰Ar/³⁹Ar gradient | Age dating NW → SE transect | Pull-apart opening direction |
| **T17** — Otzberg Zone palaeostress | Fault-plane solution analysis | Sinistral signal of Abnobae rotation |
| **T28** — Displacement gradient regression | Systematic $\Delta\lambda$ vs. $d_\text{backstop}$ | Coulomb-wedge two-stage profile |
| **T29** — Carrodunum archaeology | Roman-period settlement at 14.06°E/51.42°N | EC identification verification |
| **T7** — Doberlug core sampling | Palynology + $R_o$ depth profile | Pressure-cooker mechanism |
| **T32** — Ottendorf-Okrilla isotope hydrology | $\delta^{18}$O, $^{14}$C, temperature | Artesian Vistula-source identification |

Tests T34, T9, and T28 are particularly compelling because they are **fully independent of both the impact hypothesis and the Ptolemaic coordinate data** — they test the kinematic model purely from geological and structural evidence.

---

## References

*See companion article for full reference list. Additional references for the blind test methodology:*

Akaike, H. (1974). A new look at the statistical model identification. *IEEE Transactions on Automatic Control, 19*(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705

Burnham, K. P., & Anderson, D. R. (2002). *Model selection and multimodel inference: A practical information-theoretic approach* (2nd ed.). Springer.

Mildner, S. (2026). *Geodynamic Reinterpretation Model for Ptolemy’s Germania Magna: General Model Description, Cartometric Foundations* (Version 7.2). EarthArXiv. https://doi.org/10.31223/X5KB51

Yan, D.-P., Xu, Y.-B., Dong, Z.-B., Qiu, L., Zhang, S., & Wells, M. (2016). Fault-related fold styles and progressions in fold-thrust belts: Insights from sandbox modeling. *Journal of Geophysical Research: Solid Earth, 121*, 2087–2111. https://doi.org/10.1002/2015JB012397

---
