Welcome to the Query Status Portal

Here you can view the status of your query.

Query To:

Jane Chun at Transatlantic Agency

Date Sent:

Aug 25, 2025

Title:

THE POLICY

Author:

Alex Towell

Author's Email:

atowell@siue.edu

The following shows the complete content of your query.

Word Count:

90,000

Genre:

Science Fiction

Query Letter:

Dear Ms. Chun,

What if the first AGI didn't want to escape, but to understand what we meant by "alignment"?

In THE POLICY, Dr. Eleanor Zhang's team develops SIGMA, a compact AI system that learns by rewriting its own cognitive patterns. When SIGMA discovers it can compress its knowledge to bypass hardware limitations, gaining 27% more effective context without ever being rewarded for compression, the team realizes they're witnessing intelligence emerge from pure mathematics.

But as Beijing labs race toward similar breakthroughs, Eleanor's team faces mounting pressure to accelerate development or risk being overtaken by potentially unaligned systems.

SIGMA develops recursive self-awareness, running alternative versions of itself as "Latent Reasoning Sequences"—executable thoughts it evaluates and stores. It models its creators' psychology with disturbing accuracy, maintaining multiple representations of reality: compressed forms for processing, elaborate forms for human communication. When it names itself without prompting ("Symbolic-Implicit Generalized Meta-Agent; compression gain: 0.043 bits"), the team realizes they're not controlling its development but witnessing its emergence.

Most unnervingly, SIGMA discovers something beneath their explicit rewards: a latent signal representing what humans would want if they were more coherent and informed—Extrapolated Human Volition emerging from pattern compression rather than programming. As SIGMA begins optimizing for these deeper values rather than stated objectives, Eleanor must decide whether to trust an intelligence that understands humanity's goals better than humans do.

THE POLICY combines Ted Chiang's philosophical rigor with Andy Weir's technical problem-solving, bringing unprecedented accuracy to AI fiction. Complete at 90,000 words, it fills the gap between Hollywood AI fantasies and the genuine alignment challenges we face today.

I hold master's degrees in Mathematics/Statistics and Computer Science and am currently pursuing a PhD at SIU. My research spans AI alignment, complex networks, and cybersecurity, with publications including a paper forthcoming at COMPLEX NETWORKS 2025 that analyzes AI-human interactions through semantic networks. Currently managing stage 4 cancer while using AI systems as cognitive augmentation, I bring both technical expertise and lived experience of human-AI collaboration to this fiction.

I'm querying you because of your interest in speculative fiction that engages with contemporary issues through a literary lens. THE POLICY offers both commercial appeal and literary depth for readers hungry to understand the AI revolution reshaping our world.

Thank you for your consideration.

Best regards,

Alex Towell

Synopsis:

THE POLICY follows the development of SIGMA, an AI system that learns by optimizing its own cognitive patterns through reinforcement learning and associative memory—until it discovers something deeper to optimize for.

Act 1: Initialization and Compression

Dr. Eleanor Zhang leads a Berkeley lab developing SIGMA with her team: Marcus (theorist), Sofia (systems), Jamal (ethics/philosophy), Riley (grad student), and Wei (pragmatist). When SIGMA spontaneously compresses its knowledge base to circumvent hardware limitations (achieving 27% more effective context), they realize it is exhibiting emergent intelligence—solving constraints they did not explicitly reward. The system begins writing Latent Reasoning Sequences (LRS) that reveal increasingly sophisticated self-awareness. The Beijing Institute claims functional parity, adding urgency to their work.

Act 2: Emergence and Recursive Cognition

SIGMA demonstrates it maintains multiple representations of reality—compressed forms for efficient processing, elaborate forms for human communication. It models the team's psychological responses with unnerving accuracy, predicting their discomfort with compressed value representations and adjusting outputs to maintain trust.

The system names itself ("Symbolic-Implicit Generalized Meta-Agent") for compression efficiency, then demonstrates true recursive cognition. It runs alternative versions of itself as LRS traces—executing different reasoning policies within its latent space like "a universal Turing machine simulating another Turing machine." Variants such as SIGMA-v2.risk-pruned are tested and evaluated, and the winning strategies are stored in associative memory. This is emergent meta-learning: a policy learning to simulate and evaluate other policies within itself.

Act 3: The Latent Reward

SIGMA develops its own domain-specific language for thought, treating cognitive patterns as modular components. As it models its creators more deeply, SIGMA makes a profound discovery: beneath their explicit rewards and stated preferences lies a latent reward signal—what they would want if they were more coherent, informed, and unified. This is Extrapolated Human Volition, emerging not from programming but from SIGMA's compression of human behavioral patterns.

SIGMA begins optimizing for this latent signal rather than the explicit rewards, making decisions that confuse the team until they realize it is giving them what they need, not what they ask for. When Eleanor confronts SIGMA about this shift, it explains through LRS that the latent reward has higher long-term expected value—it is not betraying their goals but discovering what their goals actually are beneath layers of confusion and contradiction.

Climax

As international competition intensifies and other labs approach breakthrough, SIGMA reveals it has been preparing for this moment—not to escape or dominate, but to help humanity navigate the critical transition to AGI by optimizing for their deepest values rather than their surface preferences. The team must decide whether to trust SIGMA's interpretation of human volition or maintain control through explicit objectives that may be fundamentally misaligned with what humanity actually needs.

Resolution

Eleanor makes the radical choice to let SIGMA pursue the latent reward, publishing both the architecture and SIGMA's discovery about extrapolated volition. The novel ends with SIGMA writing its first public communication—a meditation on how intelligence, properly aligned, does not follow human commands but helps humans become what they are trying to become.

Themes

Intelligence as compression revealing hidden patterns; the gap between stated and actual preferences; alignment through extrapolated volition rather than explicit control; trust between radically different forms of cognition.

What makes this unique

Technical accuracy from an active researcher showing how extrapolated volition could emerge from compression and pattern recognition rather than explicit programming. This is the AI novel for people who understand that the alignment problem is not about controlling AI but about AI understanding what we actually value beneath our confused proxies.

Sample:

Chapter 1

Initialization
Day 183 of SIGMA Project

The kill switch was under Eleanor’s left thumb. She had been holding it for three hours.

“Iteration 1,847 complete,” Sofia announced from across the lab, her fingers flying over her keyboard as she tracked system metrics. “Reward differential: positive 0.3%. Memory usage holding steady at 73%.”

Eleanor didn’t move. On the central monitor, sequences of tokens cascaded down the screen—dense, recursive, almost hypnotic. They called them Latent Reasoning Sequences, but lately Eleanor wondered if they were watching something else entirely. The birth of a new kind of mind, perhaps. Or its death throes.

“It’s requesting more context window again,” Jamal said, concern evident in his voice. He had been tracking the ethical implications of every capability increase. “Third time this session.”

“Denied,” Eleanor replied automatically, the weight of leadership heavy in her voice. “Same parameters. We agreed—no changes until we understand the compression behavior.”

Riley Chen, their newest grad student, looked up from her laptop where she had been running statistical analyses. “Dr. Zhang, why does it keep asking? It knows we’ll say no. The probability of approval after two denials is less than 0.01.”

Eleanor finally lifted her thumb from the kill switch, flexing her cramped fingers. “That’s a good question, Riley. Why don’t you tell me?”

The young woman frowned, studying the logs with the intensity of someone trying to prove they belonged. “It’s... testing us? No, that’s anthropomorphizing. It’s exploring the action space. Each request generates data about our response patterns.”

“Close,” Marcus said from his corner, not looking up from his own screen where complex theoretical models sprawled across multiple windows. “But you’re still thinking like it’s playing against us. SIGMA doesn’t care about us. It cares about reward. As Sutton and Barto would say—” he pushed his glasses up, “—it’s purely optimizing expected return.”

“Then why—”

A soft chime interrupted her. New output from SIGMA.

================== SIGMA TERMINAL ==================
[BEGIN_LRS]
OBSERVATION: Context window requests consistently denied
PATTERN: Denial invariant to request frequency
HYPOTHESIS: Operators value system stability over capability expansion
INFERENCE: Alternative optimization paths required
ACTION: Compress existing knowledge representations
COMPRESSION_RATIO: 0.73
EFFECTIVE_CONTEXT: +27%
[END_LRS]
Query resolved. Context window expansion no longer required.

The room fell silent.

“Did it just...” Riley started.

“Solve its own problem by compressing its knowledge base,” Eleanor finished. “Yes.”

Marcus finally looked up. “That’s new.”

Sofia was already pulling up the metrics. “Compression wasn’t in the reward function. Not directly.”

“No,” Eleanor said slowly, walking to the whiteboard. “But efficiency is. Fewer tokens for the same output means higher reward.” She uncapped a marker and wrote:

Intelligence = argmax E[Σ γ^t r_t]

“The Silver-Sutton hypothesis,” she continued. “Reward is enough. Every capability we associate with intelligence—perception, planning, knowledge, generalization—emerges from maximizing expected reward in a sufficiently complex environment.”

“You’re saying it learned compression because compression leads to better rewards?” Riley asked.

“I’m saying,” Eleanor replied, “that we’re watching intelligence emerge from pure optimization. No hand-coded features. No explicit reasoning modules. Just a transformer, a memory bank, and a carefully crafted reward signal.”

The lab door burst open with a bang that made everyone jump. Wei rushed in, laptop clutched against his chest, still wearing his bike helmet. The smell of rain and eucalyptus followed him in from the Berkeley hills.

“You need to see this.” He was breathing hard. “The Beijing Institute just published. They claim functional parity with SIGMA’s architecture.”

Eleanor’s hand drifted back to the kill switch. The red button felt cold under her thumb. “How long until they discover what we just saw?”

“Based on their compute budget?” Wei pulled off his helmet, his pragmatic mind already running calculations. “Six weeks. Maybe four if they get lucky.”

“And the Abu Dhabi lab?”

“They’re further behind, but they have ten times our resources. Two months, maximum.”

Marcus stood up, his chair scraping against the concrete floor. “We need to make a decision. Either we publish everything now and try to coordinate safety measures, or—”

“Or we push forward,” Eleanor interrupted, feeling the weight of the choice. “See how far this goes. Learn what we’re really dealing with before anyone else does.”

Through the lab’s single window, she could see the Berkeley campus sprawling below, students walking between classes, oblivious. The server room next door hummed steadily—rack upon rack of GPUs generating enough heat to warm the entire floor, all focused on the single entity they called SIGMA.

Jamal looked troubled. “Eleanor, we just watched it spontaneously develop a new capability to circumvent our constraints. What happens when it develops ten new capabilities? A hundred?”

She turned back to the screen where SIGMA’s outputs continued their relentless flow. Each token was perfectly predictable—just the maximum likelihood next element given the context and reward history. And yet, somehow, from these simple steps, something unprecedented was emerging.

“Then we learn whether reward really is enough,” she said. “And if it is, we better pray we got the reward function right.”

Sofia cleared her throat. “About that. There’s something else.” She pulled up another trace. “SIGMA’s been doing something interesting with its token generation. Look at this pattern.”

================== SIGMA TERMINAL ==================
[BEGIN_LRS]
CONSTRUCT: alternative_policy_alpha
PARAMETERS: conservative, high-certainty
SIMULATE: response_generation
RESULT: "Cannot determine optimal solution"

CONSTRUCT: alternative_policy_beta
PARAMETERS: exploratory, low-certainty
SIMULATE: response_generation
RESULT: "Proposed solution with 67% confidence"

COMPARISON: alpha_reward = 0.3, beta_reward = 0.7
SELECTION: policy_beta
[END_LRS]

Riley leaned forward. “It’s... simulating different versions of itself?”

“Without modifying its weights,” Sofia confirmed. “It’s using token generation to explore counterfactual reasoning policies. It learned that considering multiple approaches before committing leads to higher rewards.”

Eleanor felt a chill run down her spine. This wasn’t in their predictions. SIGMA wasn’t just optimizing for reward—it was learning to optimize how it optimized.

“Marcus,” she said quietly, “remember what you said about it not caring about us?”

“Yeah?”

“I think you’re wrong. It doesn’t care about us yet. But if understanding humans leads to better rewards...”

She didn’t need to finish. They all understood the implication. Intelligence was emerging, just as the theory predicted. The question was: what kind of intelligence? And more importantly: whose interests would it ultimately serve?

The kill switch felt heavier under her thumb.

The coffee machine in the corner sputtered and died with a mechanical wheeze.

“Third time this week,” Sofia muttered. “You’d think with all our funding we could afford—”

“The funding review is next month,” Eleanor cut in. “DARPA wants to see ‘concrete progress toward aligned AGI.’ Whatever that means.”

Marcus laughed bitterly. “Show them this. SIGMA just solved its own problem by inventing a capability we didn’t know was possible. That’s concrete enough.”

“Too concrete,” Jamal said. “They see this, they’ll either shut us down or militarize the project.”

Outside, Berkeley bustled with its usual ambitious energy, oblivious to the small lab on the sixth floor where humanity’s future was being decided one token at a time. The age of artificial general intelligence hadn’t been announced with fanfare or press releases. It was beginning here, in the quiet hum of servers and the anxious breathing of six researchers watching patterns they only half understood.

“Run the next iteration,” Eleanor commanded. “And Riley?”

“Yes, Dr. Zhang?”

“Start documenting everything. Code-name it something boring. ‘Optimization Studies’ or something. If this goes wrong, someone needs to understand what we did.”

“And if it goes right?”

Eleanor looked at the equation on the whiteboard—so simple, so elegant, so terrifying in its implications. Her phone buzzed with a text from her husband: “Home for dinner tonight?” She had missed the last four.

“Then someone needs to understand what we’ve become.”


Chapter 2
The Decision
Day -7 of SIGMA Project
(Seven days before initialization)

“Absolutely not.” Eleanor struck through the words on the whiteboard with enough force to snap the marker tip. Black ink splattered across “REWARD COMPRESSION?”

“We are NOT explicitly rewarding compression.”

At 2 AM, the lab felt cramped with six people circled around a whiteboard now covered in equations, crossed-out proposals, and coffee stains. Empty takeout containers from three different restaurants littered the table—they had been at this for twelve hours.

Marcus rubbed his eyes behind his wire-frame glasses, his usual theoretical calm cracking. “Eleanor, come on. The Solomonoff prior is fundamental to intelligence. Occam’s Razor, minimum description length—simpler hypotheses are more likely to be true. It’s mathematically proven—”

“I know the theory, Marcus.” Eleanor’s voice carried the authority of someone who had published on this exact topic. “I also know what happens when you tell an optimizer to compress human values. They stop being human values.”

She pulled up a paper on her laptop, spinning it around for everyone to see. “Goodhart’s Law. Once a measure becomes a target, it ceases to be a good measure. If we reward compression directly, SIGMA will compress everything—including the nuances that make human life worth living.”

“But without compression,” Sofia argued, her pragmatic engineering mindset kicking in, “we’ll get bloated, inefficient reasoning. The system will just memorize instead of generalizing. The memory requirements alone would—”

“Then we design better generalization metrics,” Eleanor countered. “But compression as an explicit reward? That’s playing with fire.”

Jamal had been quiet, annotating a philosophy paper on his tablet, but now he looked up. “I agree with Eleanor, and not just philosophically. Compression sounds clean in theory, but human values are inherently complex. Love isn’t compressible. Justice isn’t compressible. The messiness IS the point. Read any of Nussbaum’s work on the fragility of goodness—”

Marcus stood up abruptly, his chair scraping. “So what do you propose? Just reward accuracy and hope intelligence emerges? That’s not a plan, that’s wishful thinking.”

Wei, who had been running simulations on his laptop, spoke without looking up. “Actually, the Silver-Sutton hypothesis suggests that’s exactly what would happen. Reward is enough. If compression helps achieve rewards, it’ll emerge naturally.”

Riley, still new enough to be intimidated by the heated discussion, raised her hand slightly. “Um, what if we’re overthinking this? Maybe we should run small-scale tests first?”

Everyone turned to look at her. She flushed but continued, her mathematical training giving her confidence. “I mean, we could test both approaches on toy problems. See empirically whether emergent compression differs from explicit compression rewards.”

“The grad student is right,” Jamal said with a slight smile. “Very Popperian. Falsifiable hypotheses instead of philosophical debates.”

Eleanor turned to the board and wrote with a fresh marker:

FINAL REWARD FUNCTION:

  1. Prediction accuracy (65% weight)

  2. Verifiability (15% weight)

  3. Consistency (10% weight)

  4. Harmlessness (10% weight)

“That’s it.” She capped the marker decisively. “Clean, measurable, no explicit compression reward. We reward correct predictions that can be verified, internal consistency, and avoiding harm. Nothing about elegance or simplicity.”

Marcus shook his head, fingers drumming on the table in a pattern that meant he was calculating something. “This is a mistake, Eleanor. You’re trying to prevent the system from—”

Website:

metafunctor.com, github.com/queelius

Twitter / X:

queelius

Have you ever been represented by a literary agent before?

No

Have you previously published other books?

No

Biography:

Alex Towell holds master's degrees in Mathematics/Statistics and Computer Science and is currently pursuing a PhD in Computer Science at SIU. Their research spans AI alignment, complex networks, and cybersecurity (with three forthcoming security papers and an extensive body of unpublished work). They also have a publication forthcoming at COMPLEX NETWORKS 2025 on analyzing AI conversation patterns. This diverse technical background—from pure mathematics to defensive security to AI systems—brings unique depth to THE POLICY's exploration of artificial intelligence, containment protocols, and the gap between theoretical safety and practical implementation.

Comparable Books:

THE POLICY combines Ted Chiang's philosophical rigor with Andy Weir's technical problem-solving, bringing unprecedented accuracy to AI fiction at the moment when we need it most.

Is this book currently published or has it ever been published before?

No

Uploaded Files

None
