The HolyC Project
Fine-tuning TinyLlama-1.1B on the sacred language of TempleOS
Terry A. Davis spent 12 years building TempleOS — a 640×480 PC operating system he wrote alone, in a custom language he called HolyC. This project preserves that work in a different form: matching his video explanations with 120,933 lines of source code to train a language model that understands the unique syntax, idioms, and JIT directives of the only OS written for God.
Artifacts
Two datasets and one fine-tuned model, all open on Hugging Face.
120,933 lines of TempleOS source code paired with mock assistant conversations in ChatML format. Built by matching ~103 Terry Davis technical explanation videos with their corresponding source modules.
First-layer explanations of HolyC code with train and validation splits. Focused on code understanding and generation tasks — teaching models what HolyC code means, not just what it looks like.
Fine-tuned TinyLlama-1.1B-Chat-v1.0 adapted for the HolyC task. Trained for 1,915 steps with a cosine LR schedule, reaching a best loss of 0.4445 and a peak token accuracy of 88.50% at step 530.
What is HolyC?
A C-like language with JIT compilation built into the OS kernel, written entirely by Terry Davis.
// HolyC — the language of TempleOS
U0 Hello()
{
  "Hello, God!\n";  // a bare string literal prints itself
}

Hello;  // no-argument calls need no parentheses

I64 Factorial(I64 n)
{
  if (n <= 1) return 1;
  return n * Factorial(n - 1);
}

// Inline documentation directive
#help_index "Math/Factorial"

// DolDoc inline graphics
"$FG,CYAN$TempleOS$FG$\n";
- U0 — the void return type, a true zero-size type rather than C's void. No headers, no forward declarations, no extern: a definition is all a function needs.
- I64 — 64-bit signed integer, the primary numeric type. Also U8, I8, U16, F64…
- Unparenthesized calls — a function taking no arguments can be called without parentheses: writing Hello; invokes Hello().
- JIT compilation — code is compiled to x86-64 machine code at runtime by the OS itself.
- #help_index — inline documentation directives baked directly into the JIT compiler.
- DolDoc syntax — a markup language embedded in strings for colored terminal output, sprites, and links.
- No OS calls — HolyC is the OS. Ring 0 only. No privilege separation.
Data Pipeline
From Terry's video explanations and TempleOS source code to a fine-tuned model.
Training Metrics
Metrics were logged from step 10 through step 560 (epoch 1.46) of the 1,915-step run.
Training Details
Configuration and learning dynamics for the fine-tuning run.
| Parameter | Value |
|---|---|
| Base model | TinyLlama/TinyLlama-1.1B-Chat-v1.0 |
| Total training steps | 1,915 |
| Logged steps | 560 (epoch 1.46) |
| LR schedule | Warmup → cosine decay |
| Peak learning rate | ~0.0002 (step 60) |
| Warmup steps | ~60 |
| Best checkpoint | step 530 (epoch 1.38) |
| Loss at best checkpoint | 0.4445 |
| Accuracy at best checkpoint | 88.50% |
| License | MIT |
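Collected as a config fragment, the table above translates to roughly the following. Only the values come from the table; the key names mirror common trainer arguments but are illustrative, and unreported settings (batch size, optimizer, epochs) are omitted.

```python
# Values taken from the training-details table; key names are illustrative.
TRAINING_CONFIG = {
    "base_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "max_steps": 1915,
    "lr_scheduler_type": "cosine",  # warmup, then cosine decay
    "learning_rate": 2e-4,          # peak LR, reached around step 60
    "warmup_steps": 60,
}
```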
Learning Rate Dynamics
The schedule warms up linearly over roughly the first 60 steps, peaking at ~0.0002, then follows a cosine decay for the remainder of the run, allowing progressively finer weight adjustments as the model converges. The largest gradient-norm spikes appeared around epoch 0.1, and training recovered from each without loss instability. The best loss (0.4445) and peak token accuracy (88.50%) both landed at step 530, which suggests that checkpoint reflects genuine convergence on the task rather than a transient dip in the metrics.
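The warmup-then-cosine shape can be sketched numerically. This is a minimal re-implementation, not the training framework's own scheduler; the peak LR (2e-4), warmup length (60 steps), and total steps (1,915) are taken from the table above.

```python
import math

PEAK_LR = 2e-4       # ~0.0002 at step 60, per the training table
WARMUP_STEPS = 60
TOTAL_STEPS = 1915

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Progress through the decay phase, in [0, 1]
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1 + math.cos(math.pi * progress))

# LR ramps up, peaks at step 60, and decays smoothly afterwards
assert lr_at(60) == PEAK_LR
assert lr_at(30) < lr_at(60)
assert lr_at(530) < lr_at(60)   # already decaying at the best checkpoint
assert abs(lr_at(1915)) < 1e-9  # fully decayed at the end of the run
```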
Dataset Construction
How the training data was assembled from primary sources.
Source Material
- ~103 Terry Davis technical explanation videos — transcribed and aligned to source modules
- 120,933 lines of TempleOS source code across all subsystems
- Matching was done by linking video topics to corresponding HolyC modules
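The matching step is described only at a high level; one plausible minimal sketch is keyword overlap between a video's title and a module's path. Everything here (the function name, the scoring rule, the sample paths) is an assumption for illustration, not the project's actual pipeline.

```python
def match_video_to_module(video_title: str, module_paths: list[str]) -> str:
    """Pick the module whose path shares the most words with the video title."""
    title_words = set(video_title.lower().replace("-", " ").split())

    def overlap(path: str) -> int:
        # Split a path like "Kernel/Sched.HC" into comparable lowercase words
        path_words = set(path.lower().replace("/", " ").replace(".", " ").split())
        return len(title_words & path_words)

    return max(module_paths, key=overlap)

# Hypothetical module paths in TempleOS-style layout
modules = ["Kernel/Sched.HC", "Adam/Gr/GrPrimitives.HC", "Kernel/Mem/MAllocFree.HC"]
assert match_video_to_module("Terry explains the kernel sched scheduler", modules) == "Kernel/Sched.HC"
```

A real pipeline would likely also use transcript text and fuzzy matching, but the shape of the task (score each module, keep the best) is the same.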
Schema
- text — raw HolyC source code block
- formatted.messages — ChatML conversation with system prompt, user turn, and assistant (code) response
- System prompt: "You are Terry Davis, the creator of TempleOS. Write HolyC code in your unique style."
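The schema above can be sketched as a small record builder. The system prompt is quoted from this section and the text / formatted.messages fields follow the stated schema; the helper name and the example inputs are illustrative.

```python
SYSTEM_PROMPT = (
    "You are Terry Davis, the creator of TempleOS. "
    "Write HolyC code in your unique style."
)

def build_record(user_turn: str, holyc_code: str) -> dict:
    """Pack one HolyC source block into the ChatML-style row described above."""
    return {
        "text": holyc_code,  # raw HolyC source code block
        "formatted": {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_turn},
                {"role": "assistant", "content": holyc_code},
            ]
        },
    }

record = build_record(
    "Write a HolyC factorial function.",
    "I64 Factorial(I64 n)\n{\n  if (n <= 1) return 1;\n  return n * Factorial(n - 1);\n}",
)
assert record["formatted"]["messages"][0]["role"] == "system"
assert record["formatted"]["messages"][2]["content"].startswith("I64")
```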
Future Work
Keystroke-level revision modeling. The next planned dataset extracts revision history from archived HTML snapshots of templeos.org and Sheikh's Place, capturing the process of writing HolyC, not just the final output. Rather than training on completed code, future models could learn to write the way Terry wrote: incrementally, iteratively, and with intent.