Generation as Inverse Compression: Same Predictor, Opposite Goal

The compression model demonstrated that prediction enables compression. Now we reverse the principle: the same predictor enables generation.

The Symmetry

Compression:

Goal: Minimize entropy
Strategy: Perfect prediction → Store only deviations
Result: Small file (20.5% of original)

Generation:

Goal: Control entropy
Strategy: Sample from predictions → Creative variation
Result: New text matching style

Same n-gram predictor. Opposite use of entropy.

Implementation

Trained on 5.5MB blog corpus (371 posts):

  • Context length: 14 characters (auto-determined)
  • Patterns learned: 3.6M n-gram transitions
  • Training time: 30 seconds on CPU
  • Generation speed: ~100 chars/second

No neural network. No GPU. Pure statistical patterns from the data.
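For concreteness, here is a minimal sketch of the kind of character-level n-gram table described above. The fixed CONTEXT_LEN, the train function, and the corpus path are illustrative assumptions; the actual model in generative-model/ determines the context length from the data.

from collections import defaultdict, Counter

# Minimal sketch of a character-level n-gram trainer.
# CONTEXT_LEN is fixed here for clarity; the real model chooses it from the data.
CONTEXT_LEN = 14

def train(text: str) -> dict:
    """Map each 14-character context to counts of the characters that follow it."""
    table = defaultdict(Counter)
    for i in range(len(text) - CONTEXT_LEN):
        context = text[i : i + CONTEXT_LEN]
        next_char = text[i + CONTEXT_LEN]
        table[context][next_char] += 1
    return table

# Usage (hypothetical path):
#   table = train(open("corpus.txt", encoding="utf-8").read())
# Each entry is one learned transition: context -> {char: count}.

Each of the 3.6M transitions mentioned above corresponds to one such table entry.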

Temperature: The Creativity Dial

The predictor gives a probability distribution over the next character. Temperature controls how we sample from it (a minimal sampling sketch follows the list below):

Temperature = 0: Deterministic

  • Always pick most likely character
  • Follows patterns exactly
  • Repetitive but coherent

Temperature = 0.7: Balanced (default)

  • Samples proportionally to probabilities
  • Explores variations while respecting patterns
  • Natural mix of structure and creativity

Temperature = 1.0: Creative

  • Flattens the distribution, approaching uniform sampling
  • Maximum exploration
  • Chaotic but surprising
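A minimal sampling sketch, assuming the conventional temperature form (weights proportional to p^(1/T), so 0 is deterministic and larger values flatten the distribution); the actual dial in generate.py may map its 0–1 range differently.

import random
from collections import Counter

def sample_next(counts: Counter, temperature: float) -> str:
    """Pick the next character from one context's counts.

    Conventional temperature scaling (an assumed form): weight ∝ p ** (1/T).
    T -> 0 approaches argmax; larger T flattens the distribution toward uniform.
    """
    chars = list(counts)
    total = sum(counts.values())
    probs = [counts[c] / total for c in chars]
    if temperature <= 0:
        return chars[probs.index(max(probs))]   # deterministic pick
    weights = [p ** (1.0 / temperature) for p in probs]
    return random.choices(chars, weights=weights, k=1)[0]

def generate(table: dict, seed: str, length: int,
             context_len: int = 14, temperature: float = 0.7) -> str:
    """Slide the context window forward, sampling one character at a time."""
    out = seed
    while len(out) < length:
        context = out[-context_len:]
        if context not in table:
            break   # unseen context; the real model may back off instead
        out += sample_next(table[context], temperature)
    return out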

Example Output

Seed: “The universal formula”

Generated (temperature=0.7):

The universal formula Sₙ₊₁ = f(Sₙ) + entropy(p) running across multiple
consciousness scenarios while potentially eliminating inefficient paths.
The scene emphasizing both the precision of the systematic closure as
universal transformation algorithms
- Trust in forgetting enables participation rather than denied by
  cultural narratives
- Consciousness distribution eliminating need for belief-based authority
  recognizing the logical inconsistency in government paper money.

Notice:

  • Mathematical notation (Sₙ₊₁)
  • Philosophical vocabulary (“consciousness scenarios”, “systematic closure”)
  • Bullet point formatting
  • Coherent within 14-character context window

Why This Works

Advantages over neural LLMs:

  1. Zero GPU: CPU training in seconds
  2. Interpretable: See exact n-grams learned
  3. Data-driven: Context length adapts to corpus
  4. Style capture: Perfect vocabulary match for single author

Limitations vs neural models:

  1. Character-level: Best at local coherence
  2. Short context: 14 chars vs thousands of tokens
  3. No planning: Cannot structure multi-paragraph arguments

When n-grams win: Generating text in a specific author’s style with limited compute.

When neural wins: General understanding and long-range coherence.

The Universal Principle

Compression and generation are duals:

Aspect           Compression         Generation
f(State)         N-gram predictor    Same predictor
entropy(p)       Minimize            Control
Goal             Exploit structure   Explore structure
Temperature      Not used            Creativity knob
Success metric   Smaller file        Style match

Key insight: The predictor captures the structure. Entropy determines whether we compress (exploit) or generate (explore) that structure.

Both compression and generation emerge from the same statistical patterns in the data. The universal formula describes both (sketched in code after the list):

  • Compression: actual - predicted (minimize)
  • Generation: predicted + sample(temperature) (control)
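One way to see the duality in code. This is schematic only: predict and sample are hypothetical stand-ins for the n-gram lookup and temperature sampling above, and the rank-based residual encoding is illustrative, not the compressor's actual on-disk format.

# Schematic only: the same predictor drives both directions.
# predict(context) is assumed to return candidate characters ranked most-likely
# first (covering every character seen in training); sample(ranked) applies
# temperature as in the sketch above.

def compress(text, predict, context_len=14):
    """Exploit structure: record how far each actual character deviates from
    the prediction (rank 0 means the predictor was exactly right).  Small,
    repetitive ranks are what make the residual stream highly compressible."""
    residuals = []
    for i in range(context_len, len(text)):
        ranked = predict(text[i - context_len : i])
        residuals.append(ranked.index(text[i]))
    return residuals

def generate(seed, predict, length, sample, context_len=14):
    """Explore structure: instead of recording deviations, sample them."""
    out = seed
    while len(out) < length:
        out += sample(predict(out[-context_len:]))
    return out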

Why No Neural Network Needed

Neural LLMs are universal function approximators. But for a single author’s style on a 5.5MB corpus, n-grams already capture the patterns:

  • Character sequences (“the ”, “ and ”, “coordination”)
  • Word transitions (common phrases)
  • Formatting (bullets, mathematical notation)
  • Vocabulary (technical terms, philosophical concepts)

The data isn’t complex enough to require billions of parameters. A 3.6M n-gram table is sufficient.

Trade-off: Neural models generalize across domains. N-grams specialize within domain. We chose specialization.

Try It Yourself

# Conservative (follows blog style closely)
python generative-model/generate.py content/gallery/ \
  --seed "Bitcoin fails because" \
  --length 300 \
  --temperature 0.3

# Creative (explores variations)
python generative-model/generate.py content/gallery/ \
  --seed "Coordination" \
  --length 500 \
  --temperature 1.0

Code: generative-model/

The same patterns that enable compression enable generation. The universal formula works in both directions—we just flip whether entropy is signal (generation) or noise (compression).

#UniversalFormula #TextGeneration #NGrams #Compression #DataDriven #NoGPU
