🤗 Hugging Face Release

The HolyC Project

Fine-tuning TinyLlama-1.1B on the sacred language of TempleOS

Terry A. Davis spent 12 years building TempleOS — a 640×480 PC operating system he wrote alone, in a custom language he called HolyC. This project preserves that work in a different form: matching his video explanations with 120,933 lines of source code to train a language model that understands the unique syntax, idioms, and JIT directives of the only OS written for God.

Artifacts

Two datasets and one fine-tuned model, all open on Hugging Face.

What is HolyC?

A C-like language with JIT compilation built into the OS kernel, written entirely by Terry Davis.

```c
// HolyC — the language of TempleOS
U0 Hello()
{
  "Hello, God!\n";
}

I64 Factorial(I64 n)
{
  if (n <= 1) return 1;
  return n * Factorial(n - 1);
}

// JIT compilation directive
#help_index "Math/Factorial"

// DolDoc inline graphics
"$FG,CYAN$TempleOS$FG$\n";
```
  • U0 — the void return type (zero-size). Functions need no prototype, header, or extern declaration — just the definition.
  • I64 — 64-bit signed integer, the primary numeric type. Also U8, I8, U16, F64…
  • Unparenthesized calls — function calls need no parentheses when there are no arguments.
  • JIT compilation — code is compiled to x86-64 machine code at runtime by the OS itself.
  • #help_index — inline documentation directives baked directly into the JIT compiler.
  • DolDoc syntax — a markup language embedded in strings for colored terminal output, sprites, and links.
  • No OS calls — HolyC is the OS. Ring 0 only. No privilege separation.

Data Pipeline

From Terry's video explanations and TempleOS source code to a fine-tuned model.

```mermaid
flowchart LR
    A["📼 Terry Davis\nVideos (~103)"] --> B["Transcription &\nCode Alignment"]
    C["📂 TempleOS Source\n120,933 lines"] --> B
    B --> D["🤗 tos-code-with-\nexplaination"]
    D --> E["🤗 holyc_first_layer\nexplaining_code"]
    E --> F["Fine-tuning\nLoRA / QLoRA"]
    G["🦙 TinyLlama-1.1B\nChat-v1.0"] --> F
    F --> H["✅ TinyLlama-1.1B\nHolyC"]
    style A fill:#1a2433,stroke:#1d8ea7,color:#bec0c1
    style C fill:#1a2433,stroke:#1d8ea7,color:#bec0c1
    style B fill:#182433,stroke:#35738a,color:#bec0c1
    style D fill:#182433,stroke:#35738a,color:#bec0c1
    style E fill:#182433,stroke:#35738a,color:#bec0c1
    style G fill:#1a2433,stroke:#1d8ea7,color:#bec0c1
    style F fill:#1d2b1a,stroke:#22c55e,color:#bec0c1
    style H fill:#1a2e1a,stroke:#4ade80,color:#86efac
```
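The two dataset nodes map to Hugging Face repos. A minimal sketch of pulling them with the 🤗 `datasets` library; the repository paths below are assumptions inferred from the diagram's node labels, so substitute the project's actual Hugging Face IDs:

```python
from datasets import load_dataset

# Hypothetical repo paths inferred from the diagram above; replace
# "your-org" with the project's actual Hugging Face namespace.
raw_pairs = load_dataset("your-org/tos-code-with-explaination", split="train")
chat_data = load_dataset("your-org/holyc_first_layer_explaining_code", split="train")

# Field names per the Schema section below; the actual layout may differ.
print(raw_pairs[0]["text"][:200])
```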

Training Metrics

Metrics were logged from step 10 through step 560 (epoch 1.46); the full run was scheduled for 1,915 steps.

| Metric | Value | Step |
| --- | --- | --- |
| Initial loss | 1.4824 | 10 |
| Final logged loss | 0.6263 | 560 |
| Best loss | 0.4445 | ★ 530 |
| Best accuracy | 88.50% | ★ 530 |

Chart: Training Loss — decreasing toward 0.4445
Chart: Mean Token Accuracy — rising to 88.50%
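For context, "mean token accuracy" in trainer logs is typically the fraction of next-token predictions that match the labels, averaged over non-masked positions. A minimal PyTorch sketch of that metric (an assumption about how this run defined it, not the project's exact code):

```python
import torch

def mean_token_accuracy(logits: torch.Tensor, labels: torch.Tensor,
                        ignore_index: int = -100) -> float:
    """Fraction of positions where argmax(logits) equals the label,
    skipping masked (ignore_index) positions. Assumes logits are
    already shifted to align with labels, as in HF causal-LM training."""
    preds = logits.argmax(dim=-1)
    mask = labels != ignore_index
    return (preds[mask] == labels[mask]).float().mean().item()
```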

Training Details

Configuration and learning dynamics for the fine-tuning run.

| Parameter | Value |
| --- | --- |
| Base model | TinyLlama/TinyLlama-1.1B-Chat-v1.0 |
| Total training steps | 1,915 |
| Logged steps | 560 (epoch 1.46) |
| LR schedule | Warmup → cosine decay |
| Peak learning rate | ~0.0002 (step 60) |
| Warmup steps | ~60 |
| Best checkpoint | step 530 (epoch 1.38) |
| Loss at best checkpoint | 0.4445 |
| Accuracy at best checkpoint | 88.50% |
| License | MIT |
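A hedged sketch of wiring these values into a LoRA fine-tune with `trl` and `peft`. Only the learning rate, warmup, schedule, and base model come from the table above; the LoRA rank/alpha, dataset path, and column handling are illustrative placeholders, not the project's actual settings:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset path; replace with the real Hugging Face ID.
# SFTTrainer expects a "messages" or "text" column, so the nested
# formatted.messages field may need flattening first.
train_ds = load_dataset("your-org/holyc_first_layer_explaining_code", split="train")

# r and lora_alpha are illustrative guesses, not the project's settings.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

args = SFTConfig(
    output_dir="tinyllama-holyc",
    learning_rate=2e-4,          # peak LR from the table
    warmup_steps=60,             # ~60 warmup steps
    lr_scheduler_type="cosine",  # warmup, then cosine decay
    logging_steps=10,            # metrics logged every 10 steps
)

trainer = SFTTrainer(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    args=args,
    train_dataset=train_ds,
    peft_config=peft_config,
)
trainer.train()
```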

Learning Rate Dynamics

The schedule is deliberate. A short linear warm-up (steps 0–60) ramps the learning rate to its peak of ~0.0002, stabilizing the earliest, noisiest updates; the subsequent cosine decay then allows progressively finer weight adjustments as the model converges. The largest gradient-norm spikes appeared around epoch 0.1, and the model recovered from each without instability, a sign of a well-configured run. Best loss and best accuracy land on the same checkpoint (step 530), which is consistent with genuine learning rather than memorization.
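To make the shape concrete, here is a minimal sketch of a linear-warmup-plus-cosine-decay schedule with the run's stated values (peak 2e-4 at step 60, decaying over 1,915 steps); the real trainer's curve may differ in detail:

```python
import math

PEAK_LR, WARMUP_STEPS, TOTAL_STEPS = 2e-4, 60, 1915

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for s in (10, 60, 530, 560, 1915):
    print(f"step {s:>4}: lr = {lr_at(s):.6f}")
```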

Dataset Construction

How the training data was assembled from primary sources.

Source Material

  • ~103 Terry Davis technical explanation videos — transcribed and aligned to source modules
  • 120,933 lines of TempleOS source code across all subsystems
  • Matching was done by linking video topics to corresponding HolyC modules

Schema

  • text — raw HolyC source code block
  • formatted.messages — ChatML conversation with system prompt, user turn, and assistant (code) response
  • System prompt: "You are Terry Davis, the creator of TempleOS. Write HolyC code in your unique style."
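A small sketch of assembling one training record in this schema. The helper below is illustrative: the user-turn text and the function itself are assumptions, not the project's actual preprocessing code.

```python
SYSTEM_PROMPT = ("You are Terry Davis, the creator of TempleOS. "
                 "Write HolyC code in your unique style.")

def to_chatml(user_turn: str, holyc_code: str) -> list[dict]:
    # Mirrors the schema: system prompt, user turn, assistant (code) response.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_turn},
        {"role": "assistant", "content": holyc_code},
    ]

record = {
    "text": ("I64 Factorial(I64 n)\n{\n  if (n <= 1) return 1;\n"
             "  return n * Factorial(n - 1);\n}"),
}
record["formatted"] = {"messages": to_chatml(
    "Write a recursive factorial function in HolyC.", record["text"])}
```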

Future Work

Keystroke-level revision modeling. The next planned dataset extracts revision history from the archived templeos.org and Sheikh's Place HTML archives — capturing the process of writing HolyC, not just the final output. Rather than training on completed code, future models could learn to write the way Terry wrote: incrementally, iteratively, and with intent.