The HolyC Project
Fine-tuning TinyLlama-1.1B on the sacred language of TempleOS
Terry A. Davis spent 12 years building TempleOS — a 640×480 PC operating system he wrote alone, in a custom language he called HolyC. This project preserves that work in a different form: matching his video explanations with 120,933 lines of source code to train a language model that understands the unique syntax, idioms, and JIT directives of the only OS written for God.
Artifacts
Two datasets and one fine-tuned model, all open on Hugging Face.
120,933 lines of TempleOS source code paired with mock assistant conversations in ChatML format. Built by matching ~103 Terry Davis technical explanation videos with their corresponding source modules.
First-layer explanations of HolyC code with train and validation splits. Focused on code understanding and generation tasks — teaching models what HolyC code means, not just what it looks like.
Fine-tuned TinyLlama-1.1B-Chat-v1.0 adapted for the HolyC task. Trained for 1,915 steps with a cosine LR schedule, reaching a best loss of 0.4445 and a peak token accuracy of 88.50% at step 530.
What is HolyC?
A C-like language with JIT compilation built into the OS kernel, written entirely by Terry Davis.
// HolyC — the language of TempleOS
U0 Hello()
{
  "Hello, God!\n";  // a bare string literal prints itself
}

Hello;  // no-argument calls need no parentheses

I64 Factorial(I64 n)
{
  if (n <= 1) return 1;
  return n * Factorial(n - 1);
}

// Inline documentation directive
#help_index "Math/Factorial"

// DolDoc inline graphics
"$FG,CYAN$TempleOS$FG$\n";
- U0 — the void return type, a true zero-size type rather than C's void. No headers, no forward declarations, no extern: a definition is all a function needs.
- I64 — 64-bit signed integer, the primary numeric type. Also U8, I8, U16, F64…
- Unparenthesized calls — a function taking no arguments can be called without parentheses: writing Hello; invokes Hello().
- JIT compilation — code is compiled to x86-64 machine code at runtime by the OS itself.
- #help_index — inline documentation directives baked directly into the JIT compiler.
- DolDoc syntax — a markup language embedded in strings for colored terminal output, sprites, and links.
- No OS calls — HolyC is the OS. Ring 0 only. No privilege separation.
Data Pipeline
From Terry's video explanations and TempleOS source code to a fine-tuned model.
Training Metrics
Metrics were logged from step 10 through step 560 (epoch 1.46) of the 1,915-step run.
Training Details
Configuration and learning dynamics for the fine-tuning run.
| Parameter | Value |
|---|---|
| Base model | TinyLlama/TinyLlama-1.1B-Chat-v1.0 |
| Total training steps | 1,915 |
| Logged steps | 560 (epoch 1.46) |
| LR schedule | Warmup → cosine decay |
| Peak learning rate | ~0.0002 (step 60) |
| Warmup steps | ~60 |
| Best checkpoint | step 530 (epoch 1.38) |
| Loss at best checkpoint | 0.4445 |
| Accuracy at best checkpoint | 88.50% |
| License | MIT |
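Collected as a config fragment, the table above translates to roughly the following. Only the values come from the table; the key names mirror common trainer arguments but are illustrative, and unreported settings (batch size, optimizer, epochs) are omitted.

```python
# Values taken from the training-details table; key names are illustrative.
TRAINING_CONFIG = {
    "base_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "max_steps": 1915,
    "lr_scheduler_type": "cosine",  # warmup, then cosine decay
    "learning_rate": 2e-4,          # peak LR, reached around step 60
    "warmup_steps": 60,
}
```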
Learning Rate Dynamics
The schedule warms up linearly over roughly the first 60 steps, peaking at ~0.0002, then follows a cosine decay for the remainder of the run, allowing progressively finer weight adjustments as the model converges. The largest gradient-norm spikes appeared around epoch 0.1, and training recovered from each without loss instability. The best loss (0.4445) and peak token accuracy (88.50%) both landed at step 530, which suggests that checkpoint reflects genuine convergence on the task rather than a transient dip in the metrics.
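The warmup-then-cosine shape can be sketched numerically. This is a minimal re-implementation, not the training framework's own scheduler; the peak LR (2e-4), warmup length (60 steps), and total steps (1,915) are taken from the table above.

```python
import math

PEAK_LR = 2e-4       # ~0.0002 at step 60, per the training table
WARMUP_STEPS = 60
TOTAL_STEPS = 1915

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Progress through the decay phase, in [0, 1]
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1 + math.cos(math.pi * progress))

# LR ramps up, peaks at step 60, and decays smoothly afterwards
assert lr_at(60) == PEAK_LR
assert lr_at(30) < lr_at(60)
assert lr_at(530) < lr_at(60)   # already decaying at the best checkpoint
assert abs(lr_at(1915)) < 1e-9  # fully decayed at the end of the run
```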
Dataset Construction
How the training data was assembled from primary sources.
Source Material
- ~103 Terry Davis technical explanation videos — transcribed and aligned to source modules
- 120,933 lines of TempleOS source code across all subsystems
- Matching was done by linking video topics to corresponding HolyC modules
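The matching step is described only at a high level; one plausible minimal sketch is keyword overlap between a video's title and a module's path. Everything here (the function name, the scoring rule, the sample paths) is an assumption for illustration, not the project's actual pipeline.

```python
def match_video_to_module(video_title: str, module_paths: list[str]) -> str:
    """Pick the module whose path shares the most words with the video title."""
    title_words = set(video_title.lower().replace("-", " ").split())

    def overlap(path: str) -> int:
        # Split a path like "Kernel/Sched.HC" into comparable lowercase words
        path_words = set(path.lower().replace("/", " ").replace(".", " ").split())
        return len(title_words & path_words)

    return max(module_paths, key=overlap)

# Hypothetical module paths in TempleOS-style layout
modules = ["Kernel/Sched.HC", "Adam/Gr/GrPrimitives.HC", "Kernel/Mem/MAllocFree.HC"]
assert match_video_to_module("Terry explains the kernel sched scheduler", modules) == "Kernel/Sched.HC"
```

A real pipeline would likely also use transcript text and fuzzy matching, but the shape of the task (score each module, keep the best) is the same.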
Schema
- text — raw HolyC source code block
- formatted.messages — ChatML conversation with system prompt, user turn, and assistant (code) response
- System prompt: "You are Terry Davis, the creator of TempleOS. Write HolyC code in your unique style."
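The schema above can be sketched as a small record builder. The system prompt is quoted from this section and the text / formatted.messages fields follow the stated schema; the helper name and the example inputs are illustrative.

```python
SYSTEM_PROMPT = (
    "You are Terry Davis, the creator of TempleOS. "
    "Write HolyC code in your unique style."
)

def build_record(user_turn: str, holyc_code: str) -> dict:
    """Pack one HolyC source block into the ChatML-style row described above."""
    return {
        "text": holyc_code,  # raw HolyC source code block
        "formatted": {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_turn},
                {"role": "assistant", "content": holyc_code},
            ]
        },
    }

record = build_record(
    "Write a HolyC factorial function.",
    "I64 Factorial(I64 n)\n{\n  if (n <= 1) return 1;\n  return n * Factorial(n - 1);\n}",
)
assert record["formatted"]["messages"][0]["role"] == "system"
assert record["formatted"]["messages"][2]["content"].startswith("I64")
```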
Future Work
Keystroke-level revision modeling. The next planned dataset extracts revision history from archived HTML snapshots of templeos.org and Sheikh's Place, capturing the process of writing HolyC, not just the final output. Rather than training on completed code, future models could learn to write the way Terry wrote: incrementally, iteratively, and with intent.