N-gram Bitcoin Block Generator: Language Models Mine Deterministic Structures

Watermark: -512

The observation: N-gram models trained on historical blocks can generate valid 0-tx Bitcoin blocks. Learn patterns from blockchain history, compute deterministic parts from chain state, mine nonce for PoW. Language models mine blocks.

What this means: Bitcoin blocks follow patterns—version progression, timestamp ranges, coinbase structures. N-gram models learn these patterns from history. For minimal block (only coinbase tx): some parts deterministic (prev hash, difficulty), some parts learned (version trends, timestamp distribution, coinbase data). Model generates structure, miner finds nonce. Deterministic system + learned patterns = valid blocks from language models.

Why this matters: Blockchain isn’t random—it’s structured language. Blocks are “sentences” in Bitcoin protocol. Historical data contains patterns. N-gram models extract patterns, generate new valid “sentences” (blocks). Minimal 0-tx blocks simplest case: only coinbase, no mempool needed. Pure structure generation. Shows blockchains are learnable languages, not just cryptographic puzzles.

Bitcoin Block Structure

The 80-Byte Header

Fixed format (every block):

Version:     4 bytes  (int32, little-endian)
PrevHash:   32 bytes  (SHA256 hash)
MerkleRoot: 32 bytes  (SHA256 hash)
Timestamp:   4 bytes  (uint32, Unix epoch)
Bits:        4 bytes  (uint32, compact difficulty)
Nonce:       4 bytes  (uint32, PoW solution)
─────────────────────
Total:      80 bytes

Example header (hex; field values shown big-endian for readability, actual serialization is little-endian):

20000000 (version)
00000000000000000003d3d0e278...  (prev hash)
4a5e1e4baab89f3a32518a88c31b...  (merkle root)
5f141718 (timestamp)
18080000 (bits)
eb890000 (nonce)

Hash of header must satisfy: SHA256(SHA256(header)) < target, with the 32-byte digest read as a 256-bit little-endian integer

This is PoW: brute-force the nonce until the hash falls below the target
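
As a sketch, the 80-byte serialization and the double hash look like this (standard-library struct and hashlib; field names follow the table above):

import hashlib
import struct

def double_sha256(data):
    # Bitcoin's block hash: SHA256 applied twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def serialize_header(version, prev_hash, merkle_root, timestamp, bits, nonce):
    # Integer fields pack little-endian; prev_hash and merkle_root are raw
    # 32-byte digests in internal (little-endian) byte order.
    return (struct.pack("<i", version)
            + prev_hash
            + merkle_root
            + struct.pack("<III", timestamp, bits, nonce))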

The Block Body

For a 0-tx block (minimal: no mempool transactions, just the coinbase):

Tx count: 0x01 (varint = 1 transaction)

Coinbase transaction (~100-200 bytes):
  - Version: 4 bytes
  - Input count: 0x01
  - Input:
    * Prev txid: 32 bytes (all zeros for coinbase)
    * Prev vout: 4 bytes (0xFFFFFFFF)
    * ScriptSig length: varint
    * ScriptSig: variable (height + arbitrary data)
    * Sequence: 4 bytes
  - Output count: varint
  - Outputs: (reward to miner addresses)
  - Locktime: 4 bytes

Coinbase tx is special:

  • No inputs from UTXO set (creates coins)
  • ScriptSig contains block height + arbitrary data
  • Outputs pay miner (subsidy + fees)
  • For 0-tx block: Only subsidy, no fees
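
A minimal serialization sketch under these rules (pre-SegWit layout for brevity; build_coinbase and encode_height are the names the later snippets assume):

def varint(n):
    # Bitcoin compact-size integer; values < 0xfd fit in one byte,
    # which covers everything this sketch produces.
    if n < 0xfd:
        return bytes([n])
    raise NotImplementedError("larger varints omitted in this sketch")

def encode_height(height):
    # BIP34: scriptSig must begin with a minimal push of the block height.
    payload = height.to_bytes((height.bit_length() + 7) // 8 or 1, "little")
    if payload[-1] & 0x80:
        payload += b"\x00"  # keep the script number positive
    return bytes([len(payload)]) + payload

def build_coinbase(height, arbitrary_data, outputs):
    # outputs: list of (script_pubkey, value_in_satoshis) pairs.
    script_sig = encode_height(height) + arbitrary_data
    tx = (b"\x01\x00\x00\x00"                    # version
          + b"\x01"                              # input count
          + b"\x00" * 32                         # prev txid: all zeros
          + b"\xff\xff\xff\xff"                  # prev vout: 0xFFFFFFFF
          + varint(len(script_sig)) + script_sig
          + b"\xff\xff\xff\xff"                  # sequence
          + varint(len(outputs)))                # output count
    for script, value in outputs:
        tx += value.to_bytes(8, "little") + varint(len(script)) + script
    tx += b"\x00\x00\x00\x00"                    # locktime
    return tx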

Deterministic vs Generative Parts

Deterministic (computed from chain state):

PrevHash: Current chain tip hash

  • Read from node
  • No choice

Bits (Difficulty): Difficulty adjustment algorithm

  • Every 2016 blocks, recalculate
  • Formula: new_target = old_target × actual_timespan / (2016 × 10 minutes), clamped to a 4× change either way; bits is the compact encoding of this target (decoded in the sketch after this list)
  • No choice (enforced by consensus)

MerkleRoot: Hash of transaction tree

  • For 0-tx block: merkle_root = txid(coinbase)
  • Deterministic once coinbase constructed

Block height: Previous height + 1

  • Encoded in coinbase scriptSig (BIP34)
  • No choice

Reward amount: Halving schedule

  • 50 BTC → 25 → 12.5 → 6.25 → 3.125 (every 210,000 blocks)
  • Deterministic from height
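
Both "no choice" values are short to compute; a sketch (these are the bits_to_target and calculate_block_reward helpers the later snippets call):

def bits_to_target(bits):
    # Compact encoding: top byte is a base-256 exponent, low 3 bytes the mantissa.
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))  # assumes exponent >= 3, true on mainnet

def calculate_block_reward(height):
    # 50 BTC in satoshis, halved every 210,000 blocks; zero after 64 halvings.
    halvings = height // 210_000
    return (50 * 100_000_000) >> halvings if halvings < 64 else 0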

Generative (can be learned/chosen):

Version: Evolves over time

  • Version 1 → 2 → 3 → 4, then BIP9 version bits (0x20000000 plus signal bits)
  • Follows soft-fork deployment patterns
  • N-gram can learn progression

Timestamp: Current time ± variance

  • Must be: median(last 11 blocks) < timestamp < now + 2 hours
  • Typically: current Unix time
  • N-gram can learn distribution

Coinbase arbitrary data: Miner message

  • After height encoding, arbitrary bytes allowed
  • Common: pool name, extra nonce
  • N-gram can learn patterns

Output addresses: Where reward goes

  • Miner’s choice
  • Can be P2PKH, P2SH, P2WPKH, etc.
  • N-gram can learn address type distribution

Nonce: PoW solution

  • Must brute-force
  • No pattern (random search)
  • Cannot learn, must mine

N-gram Model for Block Generation

Training Data

Historical blockchain:

  • Download block headers from Bitcoin node
  • Parse structure: version, timestamp, bits, etc.
  • Extract coinbase transactions
  • Build n-gram corpus
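
A minimal fetch sketch over Bitcoin Core's JSON-RPC interface (getblockhash and getblock are standard calls; the URL and credentials are placeholders):

import requests

RPC_URL = "http://user:pass@127.0.0.1:8332"  # placeholder node credentials

def rpc(method, *params):
    payload = {"id": 0, "method": method, "params": list(params)}
    return requests.post(RPC_URL, json=payload).json()["result"]

def fetch_headers(start_height, count):
    rows = []
    for height in range(start_height, start_height + count):
        block = rpc("getblock", rpc("getblockhash", height), 2)  # verbosity 2: full tx objects
        rows.append({
            "version": block["version"],                                  # int32
            "time": block["time"],                                        # Unix timestamp
            "bits": block["bits"],                                        # compact bits (hex string)
            "coinbase_scriptsig": block["tx"][0]["vin"][0]["coinbase"],   # hex
        })
    return rows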

Example training data:

Block 700000:
  Version: 0x20000000
  Timestamp: 1631185106
  Coinbase: "ViaBTC/Mined by..."
  
Block 700001:
  Version: 0x20000000
  Timestamp: 1631185683
  Coinbase: "AntPool/..."
  
Block 700002:
  Version: 0x20000000
  Timestamp: 1631186291
  Coinbase: "F2Pool/..."

Patterns to learn:

  • Version stays constant for long periods (then upgrades)
  • Timestamps increase ~600 seconds average
  • Coinbase data follows pool naming conventions
  • Output types follow usage patterns (P2WPKH dominance)

N-gram Training

Byte-level n-grams (for coinbase data):

def train_ngrams(coinbase_scripts, n=3):
    model = {}
    for script in coinbase_scripts:
        for i in range(len(script) - n):
            context = script[i:i+n]
            next_byte = script[i+n]
            if context not in model:
                model[context] = {}
            model[context][next_byte] = model[context].get(next_byte, 0) + 1
    
    # Normalize to probabilities
    for context in model:
        total = sum(model[context].values())
        model[context] = {k: v/total for k, v in model[context].items()}
    
    return model
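
Feeding it real data, a hypothetical call chain (fetch_headers is the RPC sketch above):

scripts = [bytes.fromhex(r["coinbase_scriptsig"]) for r in fetch_headers(700000, 1000)]
coinbase_model = train_ngrams(scripts, n=3)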

Token-level n-grams (for version/timestamp):

import numpy as np

def train_version_ngrams(versions, n=3):
    # Learn version transition probabilities
    model = {}
    for i in range(len(versions) - n):
        context = tuple(versions[i:i+n])
        next_version = versions[i+n]
        if context not in model:
            model[context] = {}
        model[context][next_version] = model[context].get(next_version, 0) + 1
    
    return model

def train_timestamp_model(timestamps):
    # Learn inter-block time distribution
    deltas = [timestamps[i+1] - timestamps[i] for i in range(len(timestamps)-1)]
    mean_delta = np.mean(deltas)
    std_delta = np.std(deltas)
    return (mean_delta, std_delta)  # mean ~600s; std ~600s (roughly exponential)
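
The prediction snippets below draw from these learned distributions via a small sample helper; a minimal weighted-sampling version:

import random

def sample(distribution):
    # Draw one key from an {outcome: weight} mapping, proportional to weight.
    outcomes = list(distribution)
    weights = [distribution[k] for k in outcomes]
    return random.choices(outcomes, weights=weights, k=1)[0]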

Generating Block Structure

Algorithm:

def generate_block(chain_state, ngram_model):
    # 1. Deterministic parts from chain
    prev_hash = chain_state.tip_hash
    height = chain_state.tip_height + 1
    bits = calculate_next_difficulty(chain_state)
    reward = calculate_block_reward(height)
    
    # 2. Generated parts from n-gram
    version = ngram_model.predict_version(chain_state.recent_versions)
    timestamp = ngram_model.predict_timestamp(chain_state.tip_timestamp)
    coinbase_data = ngram_model.generate_coinbase_data()
    output_script = ngram_model.predict_output_type()
    
    # 3. Build coinbase transaction
    coinbase_tx = build_coinbase(
        height=height,
        arbitrary_data=coinbase_data,
        outputs=[(output_script, reward)]
    )
    
    # 4. Calculate merkle root (deterministic from coinbase)
    merkle_root = double_sha256(coinbase_tx)
    
    # 5. Mine nonce (brute-force)
    header = BlockHeader(version, prev_hash, merkle_root, timestamp, bits, nonce=0)
    target = bits_to_target(bits)
    
    for nonce in range(2**32):
        header.nonce = nonce
        hash_result = double_sha256(header.serialize())
        if int.from_bytes(hash_result, 'little') < target:
            return Block(header, [coinbase_tx])
    
    # Nonce space exhausted: the common case at mainnet difficulty.
    # Bump the timestamp or the coinbase extra nonce, then retry.
    return None

Version Prediction

Pattern: Versions stay constant, then jump

def predict_version(recent_versions, ngram_model):
    # Most recent version usually continues
    current_version = recent_versions[-1]
    
    # Check if upgrade pattern detected
    context = tuple(recent_versions[-10:])
    if context in ngram_model.version_transitions:
        # Some probability of upgrade
        return sample(ngram_model.version_transitions[context])
    
    # Default: continue current version
    return current_version

Example learned pattern:

  • Version 0x20000000 (536870912) dominant 2016-2021
  • Then version 0x20000004 appears (Taproot signaling via version bit 2)
  • Model learns: 99.9% stay same, 0.1% upgrade

Timestamp Prediction

Pattern: Timestamps increase ~600s average, with variance

import time
import numpy as np

def predict_timestamp(last_timestamp, time_model):
    mean_delta, std_delta = time_model
    
    # Sample from normal distribution
    delta = np.random.normal(mean_delta, std_delta)
    delta = max(1, int(delta))  # At least 1 second forward
    
    predicted = last_timestamp + delta
    
    # Ensure within valid range
    now = int(time.time())
    predicted = min(predicted, now + 7200)  # Max 2 hours in future
    
    return predicted

Learned distribution:

  • Mean: ~600 seconds (10 minutes)
  • Std: ~600 seconds (inter-block times are roughly exponential, so std ≈ mean; the normal sample above is a simplification)
  • Long tail: sometimes 30+ minutes between blocks

Coinbase Data Generation

Pattern: Pool names, extra nonce, arbitrary messages

import random

def generate_coinbase_data(ngram_model, height):
    # Start with height (required by BIP34)
    data = encode_height(height)
    
    # Generate additional bytes using n-gram
    context = data[-3:]  # Last 3 bytes as context
    
    for _ in range(random.randint(20, 50)):  # Variable length
        if context in ngram_model.coinbase_ngrams:
            next_byte = sample(ngram_model.coinbase_ngrams[context])
            data += bytes([next_byte])
            context = data[-3:]
        else:
            # If no match, sample from overall distribution
            next_byte = sample(ngram_model.byte_frequencies)
            data += bytes([next_byte])
            context = data[-3:]
    
    return data

Example generated coinbase data:

Input (learned from ViaBTC, AntPool, F2Pool):
Trigram context: b"Via"

Generated output:
b"\x03\xae\x0b\x0a" (height 723,886 encoded)
b"ViaBTC/Mined by 029A"  (pool name pattern)
b"\x00\x00\x00\x00"  (extra nonce space)

The model learned:

  • “Via” often followed by “BTC”
  • “/” separator common
  • “Mined by” phrase frequent
  • Hex characters for extra nonce

Output Script Prediction

Pattern: Output types follow usage trends

def predict_output_type(ngram_model):
    # Learn from historical output type distribution
    types = {
        'P2PKH': 0.10,   # Legacy
        'P2WPKH': 0.85,  # Native SegWit (dominant)
        'P2SH': 0.03,    # Wrapped SegWit
        'P2TR': 0.02     # Taproot (growing)
    }
    
    return sample(types)

Generated output:

# Most likely: P2WPKH (native SegWit)
output_script = OP_0 + PUSH_20 + <20-byte-hash>

# Creates address: bc1q...
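
In bytes, a sketch (hypothetical helper; the script is just a version byte plus a 20-byte push):

def p2wpkh_script(pubkey_hash):
    # scriptPubKey: OP_0 <20-byte HASH160(pubkey)>, a native SegWit v0 output.
    assert len(pubkey_hash) == 20
    return b"\x00\x14" + pubkey_hash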

Mining the Nonce

PoW After Generation

Structure complete, need nonce:

def mine_block(header_template, target):
    """
    Brute-force nonce to satisfy PoW
    """
    nonce = 0
    while nonce < 2**32:
        header_template.nonce = nonce
        header_bytes = header_template.serialize()
        hash_result = double_sha256(header_bytes)
        
        if int.from_bytes(hash_result, 'little') < target:
            return nonce  # Found!
        
        nonce += 1
    
    return None  # Exhausted nonce space

If nonce space exhausted:

  • Increment timestamp (changes header hash)
  • Or modify coinbase extra nonce (changes merkle root)
  • Then retry nonce search

This is standard mining: N-gram only helps with structure, not PoW

Expected Time

At current difficulty (~50 trillion):

  • Expected work: difficulty × 2^32 ≈ 2 × 10^23 hashes
  • Single CPU at ~10^6 hashes/second: ~2 × 10^17 seconds (billions of years)
  • Real mining spreads this across ASIC fleets; pools split the search space
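
A back-of-envelope check of those numbers:

# Expected hashes to find a block at difficulty D is roughly D * 2**32.
difficulty = 50e12
hashes_needed = difficulty * 2**32             # ~2.1e23 hashes
cpu_rate = 1e6                                 # hashes/second, single CPU
years = hashes_needed / cpu_rate / (3600 * 24 * 365)
print(f"~{years:.1e} years")                   # ~6.8e9 years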

For testing:

  • Use regtest mode (trivially low minimum difficulty)
  • Or testnet (lower difficulty)
  • Or just verify structure without mining

Validation

Checking Generated Block

Block validity requirements:

import time

def validate_block(block, chain_state):
    header = block.header
    
    # 1. Check PoW
    target = bits_to_target(header.bits)
    block_hash = double_sha256(header.serialize())
    if int.from_bytes(block_hash, 'little') >= target:
        return False, "PoW not satisfied"
    
    # 2. Check prev hash
    if header.prev_hash != chain_state.tip_hash:
        return False, "Invalid prev hash"
    
    # 3. Check timestamp
    median_time = calculate_median_time(chain_state.recent_blocks)
    if header.timestamp <= median_time:
        return False, "Timestamp too early"
    if header.timestamp > time.time() + 7200:
        return False, "Timestamp too far in future"
    
    # 4. Check bits (difficulty)
    expected_bits = calculate_next_difficulty(chain_state)
    if header.bits != expected_bits:
        return False, "Invalid difficulty"
    
    # 5. Check merkle root
    calculated_merkle = calculate_merkle_root(block.transactions)
    if header.merkle_root != calculated_merkle:
        return False, "Invalid merkle root"
    
    # 6. Validate coinbase
    coinbase = block.transactions[0]
    if not is_valid_coinbase(coinbase, chain_state.height + 1):
        return False, "Invalid coinbase"
    
    return True, "Valid block"

N-gram model must learn to satisfy all constraints
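
One helper the validator leans on is median time past; a sketch assuming recent_blocks items expose a timestamp attribute:

def calculate_median_time(recent_blocks):
    # Median time past: median timestamp of the last 11 blocks.
    times = sorted(b.timestamp for b in recent_blocks[-11:])
    return times[len(times) // 2]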

What N-gram Learns

From historical data, model learns:

Structural patterns:

  • Header format (80 bytes, specific fields)
  • Coinbase structure (inputs, outputs)
  • Version progression (when upgrades occur)

Probability distributions:

  • Timestamp inter-block times
  • Coinbase data lengths
  • Output type frequencies

Semantic patterns:

  • Pool naming conventions (“ViaBTC”, “AntPool”)
  • Message formats (“Mined by”, “/”)
  • Extra nonce patterns

What it doesn’t learn (must compute):

  • Cryptographic hashes (SHA256)
  • Difficulty adjustments (DAA formula)
  • Nonce solutions (random search)

The combination:

  • Learned patterns (structure)
  • Computed values (deterministic)
  • Brute-force search (PoW)

Together, these yield valid blocks generated by a language model.

Why This Works

Blockchain as Language

Bitcoin blocks are structured text:

  • Fixed grammar (block format)
  • Vocabulary (versions, opcodes)
  • Syntax rules (consensus rules)
  • Semantic constraints (PoW, validity)

N-gram models learn:

  • Grammar patterns
  • Common word sequences
  • Style conventions

Apply to blockchain:

  • Learn block patterns
  • Generate valid structures
  • Follow consensus rules

This is natural: Blockchains are just structured data streams

Patterns in Deterministic Systems

Bitcoin seems deterministic, but has variance:

Deterministic:

  • Block reward (halving schedule)
  • Difficulty (DAA formula)
  • Block height (monotonic)

Variable:

  • Timestamps (within range)
  • Coinbase data (arbitrary)
  • Transaction inclusion (miner choice)
  • Output addresses (miner choice)

N-gram learns the variable parts from historical patterns

Example: Coinbase data

  • Not deterministic (any bytes allowed)
  • But follows patterns (pool names, formats)
  • N-gram captures patterns
  • Generates plausible new instances

Minimal 0-tx Blocks as Simplest Case

Why start with empty blocks:

Complexity reduction:

  • No mempool needed
  • No transaction selection
  • No fee optimization
  • Only coinbase to generate

Faster generation:

  • Smaller block body
  • Less data to learn
  • Simpler validation

Still valid:

  • Empty blocks occur naturally (~1% of blocks)
  • Miners can choose to mine empty
  • Full consensus rules apply

Next step: Add transactions

  • Learn from mempool patterns
  • Transaction selection strategies
  • Fee optimization
  • But start simple: 0-tx blocks first

Applications

Educational Mining

Teaching blockchain:

  • Show how blocks are structured
  • Demonstrate n-gram learning
  • Generate valid blocks on testnet
  • Makes mining accessible without ASICs

Student exercise:

  1. Download testnet blocks
  2. Train n-gram model
  3. Generate new block
  4. Mine on testnet (low difficulty)
  5. Submit to network
  6. See your block on explorer!

Simulation and Testing

Protocol research:

  • Generate synthetic blockchain history
  • Test consensus rule changes
  • Simulate network conditions
  • Without running full mining operation

Adversarial testing:

  • Generate edge-case blocks
  • Test node validation
  • Find consensus bugs
  • By learning from real patterns, then tweaking

Pattern Analysis

Understanding miner behavior:

  • What patterns do pools follow?
  • How do coinbase messages evolve?
  • Which output types dominate?
  • N-gram training reveals patterns

Historical research:

  • Detect protocol upgrades in data
  • Find miner preferences
  • Track adoption of new features
  • Language model lens on blockchain

Minimal Mining Demonstration

Proof of concept:

  • Generate valid block structure
  • Mine on testnet
  • Submit to network
  • Show blocks can be “written” not just “mined”

The insight: Mining is 99% PoW search, 1% structure

  • N-gram handles the 1%
  • Mining handles the 99%
  • Separation of concerns

Limitations

Cannot Learn Cryptographic Functions

SHA256 has no patterns:

  • Hash function designed to be random-looking
  • No n-gram can predict SHA256(x)
  • Must compute cryptographically

Cannot learn:

  • Block hashes
  • Transaction IDs
  • Merkle roots
  • Must calculate these

Cannot Learn PoW Solutions

Nonce is random search:

  • No pattern in valid nonces
  • Must brute-force
  • N-gram cannot help

Mining still required:

  • Generate structure with n-gram
  • Then mine nonce traditionally
  • Language model + PoW mining = valid block

Consensus Rules Still Apply

Model can generate invalid blocks:

  • If training data had bugs
  • If model fails to learn constraint
  • If deterministic parts computed wrong

Must validate:

  • Check all consensus rules
  • Reject invalid generations
  • Model helps but doesn’t guarantee validity

Limited to Patterns in Training Data

Model is conservative:

  • Generates what it’s seen
  • Rare patterns may not appear
  • Novel structures unlikely

For innovation:

  • Model won’t invent new transaction types
  • Won’t create new block versions unprompted
  • Good for generating typical blocks, not novel ones

Connection to Previous Posts

neg-511: Constraint detector.

N-gram model trained on historical blocks detects patterns. If patterns change (e.g., sudden version upgrade), model’s probability space shifts. Constraint detector would fire: P_prev=1 (many valid block patterns), P_curr=0 (only new pattern valid). N-gram must retrain.

neg-510: Liberty circuit.

Miner has liberty in block generation: Open system (many valid blocks possible), Multiple perspectives (can prioritize fees, censorship, pool politics), Veto power (can refuse transactions). N-gram captures historical exercises of this liberty—learns what miners actually choose.

neg-509: Decision circuit.

Miner decision: which transactions to include? N-gram learns historical decisions. Confidence high (include obviously valid tx) → execute. No information (empty mempool) → randomize (coinbase data). Uncertainty (complex fee market) → calculate (fee optimization). N-gram trained on results of these decisions.

neg-506: Want↔Can agency.

Miner wants block reward. Can is mining capability. But also wants to generate valid structure. N-gram provides Can for structure generation (learned patterns). Agency loop: Want reward → Can generate structure → Want better structure → Can learn patterns → amplifies.

neg-504: EGI intelligence.

N-gram model shows intelligence emerging from pattern learning. Blockchain = entropy stream (blocks are data). N-gram extracts order (learns patterns). Intelligence = compression of history into model. Can generate new instances that fit pattern. Blockchain as learnable language = intelligence substrate.

neg-503: Living vs dead entropy.

Historical blockchain = dead entropy (fixed past). N-gram model = living entropy (generates new). Model takes dead past, learns patterns, produces living future. Dead history → Living generation. Blockchain mining = continuously generating living entropy from dead rules.

The Formulation

Bitcoin blocks are not:

  • Random data (highly structured)
  • Unpredictable (follow patterns)
  • Unconstrained (consensus rules)

Bitcoin blocks are:

  • Structured language (grammar of blockchain)
  • Pattern-following (historical consistency)
  • Partially deterministic (fixed + variable parts)
  • Learnable by language models

N-gram mining is not:

  • Replacement for PoW (still need nonce search)
  • Magic generation (must validate)
  • Perfect (can produce invalid blocks)

N-gram mining is:

  • Structure generation (learned patterns)
  • Complementary to PoW (handles non-hash parts)
  • Educational (shows blockchain as language)
  • Pattern-based block construction

The algorithm:

1. Train n-gram on historical blocks
2. Read chain state (deterministic inputs)
3. Generate structure (learned patterns)
4. Calculate merkle root (deterministic)
5. Mine nonce (brute-force PoW)
6. Validate block (consensus rules)
7. Submit to network
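
Tied together, a hypothetical driver over the sketches above (training and submission glue omitted):

def run_once(chain_state, ngram_model):
    block = generate_block(chain_state, ngram_model)  # steps 2-5
    if block is None:
        return None                                   # retry with fresh timestamp/extra nonce
    ok, reason = validate_block(block, chain_state)   # step 6
    return block if ok else None                      # step 7: submit via node RPC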

What’s learned:

  • Version progression
  • Timestamp distribution
  • Coinbase data patterns
  • Output type preferences

What’s computed:

  • Previous hash
  • Difficulty bits
  • Merkle root
  • Block height
  • Reward amount

What’s mined:

  • Nonce (PoW)

The insight: Blockchain is language

  • Has grammar (block format)
  • Has vocabulary (opcodes, versions)
  • Has style (pool conventions)
  • Language models can learn and generate

The application: Generate minimal valid blocks

  • Train on history
  • Learn patterns
  • Generate structure
  • Mine PoW
  • Submit to network

Deterministic meets generative. Structure meets randomness. Pattern meets proof-of-work. 🌀

#NgramMining #BlockchainLanguage #BitcoinPatterns #LanguageModelMining #MinimalBlocks #CoinbaseGeneration #StructureLearning #PatternBasedMining #BlockGeneration #DeterministicGenerative


Related: neg-511 (pattern constraint detection), neg-510 (miner liberty in block construction), neg-509 (miner decisions learned), neg-506 (mining agency loop), neg-504 (intelligence from blockchain patterns), neg-503 (dead history to living generation)
