<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Posts on Digital Dam</title><link>https://digitaldam.org/posts/</link><description>Recent content in Posts on Digital Dam</description><image><title>Digital Dam</title><url>https://digitaldam.org/cover.png</url><link>https://digitaldam.org/cover.png</link></image><generator>Hugo -- 0.155.2</generator><language>en-us</language><lastBuildDate>Sun, 30 Nov 2025 07:07:07 +0100</lastBuildDate><atom:link href="https://digitaldam.org/posts/feed.xml" rel="self" type="application/rss+xml"/><item><title>Why I Built Kite: The Framework That Doesn't Exist</title><link>https://digitaldam.org/posts/why-i-built-kite/</link><pubDate>Sun, 30 Nov 2025 07:07:07 +0100</pubDate><guid>https://digitaldam.org/posts/why-i-built-kite/</guid><description>&lt;p&gt;&lt;em&gt;After auditing several &amp;ldquo;AI agent&amp;rdquo; projects, I noticed a pattern: they all rebuilt the same boring infrastructure, none of them shipped features, and every single one trusted the LLM far more than it deserved.&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="the-pattern-i-keep-seeing"&gt;The Pattern I Keep Seeing&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s how every AI agent project I audit goes:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Month 1:&lt;/strong&gt; Beautiful demo. The agent works. The board is impressed. The founder thinks they&amp;rsquo;ll ship in 6 weeks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Month 2:&lt;/strong&gt; The agent entered an infinite loop and burned $4,000 overnight. OpenAI went down and took the entire product with it. Duplicate requests are processing twice because nobody implemented idempotency.&lt;/p&gt;</description><content:encoded><![CDATA[<p><em>After auditing several &ldquo;AI agent&rdquo; projects, I noticed a pattern: they all rebuilt the same boring infrastructure, none of them shipped features, and every single one trusted the LLM far more than it deserved.</em></p>
<hr>
<h2 id="the-pattern-i-keep-seeing">The Pattern I Keep Seeing</h2>
<p>Here&rsquo;s how every AI agent project I audit goes:</p>
<p><strong>Month 1:</strong> Beautiful demo. The agent works. The board is impressed. The founder thinks they&rsquo;ll ship in 6 weeks.</p>
<p><strong>Month 2:</strong> The agent enters an infinite loop and burns $4,000 overnight. OpenAI goes down and takes the entire product with it. Duplicate requests are processed twice because nobody implemented idempotency.</p>
<p><strong>Month 3:</strong> The sprint board is full of tickets like &ldquo;Implement circuit breaker for LLM calls&rdquo; and &ldquo;Add Redis caching for embeddings.&rdquo; Zero tickets that deliver value to users.</p>
<p><strong>Month 4:</strong> The team realizes they&rsquo;ve accidentally built 80% of LangChain, badly. Or they&rsquo;re so deep in custom infrastructure that the two engineers who understood it have quit. The agent barely works, and nobody knows why.</p>
<p>Sound familiar?</p>
<p>But the infrastructure mess is a symptom. The root cause is a dangerous assumption baked into how most teams think about agents.</p>
<h2 id="assumption-zero-your-agent-is-not-trustworthy">Assumption Zero: Your Agent Is Not Trustworthy</h2>
<p>This is the assumption that 90% of agent frameworks refuse to say out loud, but 100% of production systems are forced to confront: <strong>the LLM is an unreliable brain.</strong></p>
<p>It will hallucinate. It will misinterpret intent. It will confidently propose actions that are catastrophically wrong. Not because it&rsquo;s &ldquo;bad&rdquo;—because that&rsquo;s what probabilistic systems <em>do</em>.</p>
<blockquote>
<p><strong>LLM output is always a proposal, never an instruction.</strong></p>
</blockquote>
<p>Once you accept this, everything changes. Every design decision in Kite—policy enforcement, audit trails, human-in-the-loop checkpoints, explicit execution boundaries—is not a &ldquo;feature.&rdquo; It&rsquo;s a logical consequence of taking this assumption seriously.</p>
<p>If you remove this assumption, Kite loses its reason to exist. And your production system loses its safety net.</p>
<h2 id="the-three-separations-most-frameworks-refuse-to-make">The Three Separations Most Frameworks Refuse to Make</h2>
<p>Most agent frameworks blend three fundamentally different things into one messy layer: cognition (thinking), decision (choosing), and execution (acting). Kite separates them deliberately.</p>
<h3 id="1-the-agent-has-no-authority">1. The Agent Has No Authority</h3>
<p>The LLM lives exclusively at the cognition layer. It can think, reason, suggest. But it cannot <em>decide</em> and it cannot <em>act</em>. All authority lives in code and in humans. The agent proposes; the system disposes.</p>
<p>This is kernel-level thinking, not app-level thinking. The LLM is an unprivileged process. It can make syscalls (tool requests), but the kernel (Kite&rsquo;s enforcement layer) decides whether to grant them.</p>
<h3 id="2-safety-does-not-live-in-the-prompt">2. Safety Does Not Live in the Prompt</h3>
<p>Prompt engineering is not security. Alignment is not safety. A jailbreak isn&rsquo;t a bug—it&rsquo;s a <em>characteristic</em> of the medium. If your safety strategy is &ldquo;we told the LLM to be careful,&rdquo; you don&rsquo;t have a safety strategy.</p>
<p>Kite&rsquo;s safety is enforced in code: circuit breakers, idempotency keys, kill switches, policy validators, boundary checks. These are deterministic walls that the LLM cannot talk its way through.</p>
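<p>As a rough illustration of one such wall, here is a minimal sketch of a spend guard with a kill switch. The <code>SpendGuard</code> class and its method names are hypothetical and are not Kite&rsquo;s actual API. The only point is that the limit is checked in code before a call is made, so no model output can raise it.</p>

```python
import threading


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the hard limit."""


class KillSwitchEngaged(RuntimeError):
    """Raised when an operator has halted all agent actions."""


class SpendGuard:
    """Hypothetical guard for illustration; not part of Kite's API."""

    def __init__(self, limit_cents):
        self.limit_cents = limit_cents
        self.spent_cents = 0
        self.killed = False
        self._lock = threading.Lock()

    def kill(self):
        """Operator-facing switch: every later charge is refused."""
        with self._lock:
            self.killed = True

    def charge(self, cost_cents):
        """Reserve budget before an action runs, or refuse the action."""
        with self._lock:
            if self.killed:
                raise KillSwitchEngaged("agent halted by operator")
            if self.spent_cents + cost_cents > self.limit_cents:
                raise BudgetExceeded(
                    f"{self.spent_cents + cost_cents} cents would exceed "
                    f"the limit of {self.limit_cents} cents"
                )
            self.spent_cents += cost_cents
```

<p>In this sketch the caller would invoke <code>guard.charge(...)</code> before each LLM or tool call. A refused charge leaves the recorded total unchanged, so the guard&rsquo;s state stays consistent after a rejection.</p>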
<h3 id="3-boring-beats-smart">3. Boring Beats Smart</h3>
<p>Production AI is 99% plumbing, 1% magic. Your competitive advantage isn&rsquo;t your agent framework—it&rsquo;s your domain expertise, your data, your business logic. Kite handles the boring 99% so you can focus on the 1% that matters.</p>
<h2 id="is-kite-strong-enough-to-build-real-apps">Is Kite Strong Enough to Build Real Apps?</h2>
<p>Yes—<em>precisely because</em> it doesn&rsquo;t pretend the LLM is reliable.</p>
<p>The question people actually want answered is: &ldquo;Where does the logic live? Who&rsquo;s responsible when things go wrong?&rdquo;</p>
<p><strong>Logic does not live in the prompt.</strong> In Kite, the prompt is untrusted input. The LLM output is a proposal. The reasoning is a suggestion, not a fact. Business logic lives in code—testable, auditable, rollbackable code.</p>
<p>When the LLM proposes &ldquo;delete instance X,&rdquo; Kite asks deterministic questions: Does the policy allow this? Is the instance in the allowlist? Does a human need to approve? Is this within budget? The prompt never makes the call.</p>
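<p>Those questions can be sketched as a plain function. This is an illustrative example only: <code>Proposal</code>, <code>Policy</code>, <code>Verdict</code>, and <code>evaluate</code> are hypothetical names chosen for this sketch, not Kite&rsquo;s actual interfaces. It shows the shape of the idea: the proposal is data, and the verdict comes from ordinary, testable code.</p>

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class Proposal:
    """What the LLM suggested. Treated as untrusted input."""
    action: str
    target: str
    estimated_cost_usd: float


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy object; field names are assumptions."""
    allowed_actions: frozenset
    target_allowlist: frozenset
    approval_required: frozenset
    budget_usd: float


def evaluate(proposal, policy):
    """Return a Verdict using only deterministic checks."""
    if proposal.action not in policy.allowed_actions:
        return Verdict.DENY
    if proposal.target not in policy.target_allowlist:
        return Verdict.DENY
    if proposal.estimated_cost_usd > policy.budget_usd:
        return Verdict.DENY
    if proposal.action in policy.approval_required:
        return Verdict.NEEDS_APPROVAL
    return Verdict.ALLOW
```

<p>Because <code>evaluate</code> has no side effects and no model in the loop, each rule can be covered by a unit test, and a change to the policy shows up in version control like any other code change.</p>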
<p>If you put logic in your prompt, you are building something you cannot test, cannot audit, and cannot roll back.</p>
<p><strong>Kite does not assume the agent is correct.</strong> The opposite. Kite&rsquo;s foundational assumption is that the LLM will be wrong, and it will be wrong in dangerous ways. Therefore: the agent is not trusted, the agent has no authority, and the agent bears no responsibility. It&rsquo;s an advisor, not an actor.</p>
<p><strong>The framework does not &ldquo;cover for you&rdquo; when the LLM hallucinates.</strong> Kite doesn&rsquo;t try to fix hallucination with better prompts. It doesn&rsquo;t pretend agent output is reliable truth. Instead, it limits the blast radius. Every action passes through enforcement boundaries. Every step is traced: what the agent said, where the framework refused, who approved what.</p>
<p>The framework is responsible for architecture. It is not responsible for the LLM&rsquo;s thoughts. That&rsquo;s the correct boundary.</p>
<p><strong>So who&rsquo;s responsible when a bug happens?</strong> The system is. Not the LLM. If a bug occurs in a Kite-based application, the root cause should never be &ldquo;the LLM was wrong&rdquo; or &ldquo;the prompt wasn&rsquo;t good enough.&rdquo; It should be: the policy was incomplete, the validator was weak, the boundary had a gap, or the human approved something they shouldn&rsquo;t have. This is production thinking.</p>
<h2 id="what-kite-actually-gives-you">What Kite Actually Gives You</h2>
<p>Stop rebuilding the same infrastructure. Here&rsquo;s what&rsquo;s in the box:</p>
<p><strong>Safety Layer.</strong> Circuit breakers, idempotency, kill switches, rate limiting. An agent that works 99% of the time and burns $10K the other 1% is a liability, not an asset.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@ai.circuit_breaker.protected</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">process_refund</span><span class="p">(</span><span class="n">order_id</span><span class="p">,</span> <span class="n">amount</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">stripe</span><span class="o">.</span><span class="n">Refund</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">charge</span><span class="o">=</span><span class="n">order_id</span><span class="p">,</span> <span class="n">amount</span><span class="o">=</span><span class="n">amount</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 3 consecutive failures → circuit opens → blocks for 60s</span>
</span></span><span class="line"><span class="cl"><span class="c1"># No infinite loops. No $15K mistakes. Deterministic.</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">result</span> <span class="o">=</span> <span class="n">ai</span><span class="o">.</span><span class="n">idempotency</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">operation_id</span><span class="o">=</span><span class="s2">&#34;user_123_payment&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">func</span><span class="o">=</span><span class="n">process_payment</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">args</span><span class="o">=</span><span class="p">(</span><span class="n">user_id</span><span class="p">,</span> <span class="n">amount</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Run this 10 times. It executes once. The rest return cached results.</span>
</span></span></code></pre></div><p><strong>Memory Systems.</strong> Vector memory (FAISS/ChromaDB), Graph RAG, session memory. Semantic search and multi-hop reasoning out of the box. Lazy-loaded—you only pay for what you use.</p>
<p><strong>Agent Patterns.</strong> ReAct, Plan-Execute, ReWOO, Tree-of-Thoughts. Best practices, not boilerplate.</p>
<p><strong>Pipeline System.</strong> Checkpoints for human approval, intervention points, state persistence. Real human-in-the-loop, not polling loops.</p>
<p><strong>Provider Agnostic.</strong> One API across OpenAI, Anthropic, Groq, Ollama. Your business logic stays the same. The provider is just a config variable.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Switch providers in one line. Business logic unchanged.</span>
</span></span><span class="line"><span class="cl"><span class="n">ai</span> <span class="o">=</span> <span class="n">Kite</span><span class="p">(</span><span class="n">config</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;llm_provider&#34;</span><span class="p">:</span> <span class="s2">&#34;openai&#34;</span><span class="p">})</span>
</span></span><span class="line"><span class="cl"><span class="n">ai</span> <span class="o">=</span> <span class="n">Kite</span><span class="p">(</span><span class="n">config</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;llm_provider&#34;</span><span class="p">:</span> <span class="s2">&#34;anthropic&#34;</span><span class="p">})</span>
</span></span><span class="line"><span class="cl"><span class="n">ai</span> <span class="o">=</span> <span class="n">Kite</span><span class="p">(</span><span class="n">config</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;llm_provider&#34;</span><span class="p">:</span> <span class="s2">&#34;groq&#34;</span><span class="p">})</span>
</span></span><span class="line"><span class="cl"><span class="n">ai</span> <span class="o">=</span> <span class="n">Kite</span><span class="p">(</span><span class="n">config</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;llm_provider&#34;</span><span class="p">:</span> <span class="s2">&#34;ollama&#34;</span><span class="p">})</span>
</span></span></code></pre></div><p><strong>Observability.</strong> Metrics, circuit breaker stats, cost tracking, structured logging. When something goes wrong, you need to know exactly what the agent proposed, where the framework blocked it, and who approved the rest.</p>
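<p>To make the &ldquo;proposed, blocked, approved&rdquo; trail concrete, here is a minimal sketch of one structured audit line per decision. The <code>AuditRecord</code> fields and the <code>audit_line</code> helper are assumptions made for this example and do not describe Kite&rsquo;s real log format.</p>

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    """Hypothetical record shape; field names are illustrative."""
    proposal: str      # what the agent asked for
    verdict: str       # e.g. "allowed", "blocked", "approved"
    decided_by: str    # a policy name or a human identity
    reason: str
    timestamp: str     # ISO 8601, UTC


def audit_line(proposal, verdict, decided_by, reason="", now=None):
    """Serialize one decision as a single JSON line for structured logs."""
    now = now or datetime.now(timezone.utc)
    record = AuditRecord(
        proposal=proposal,
        verdict=verdict,
        decided_by=decided_by,
        reason=reason,
        timestamp=now.isoformat(),
    )
    # sort_keys keeps the output stable, which makes log diffs readable
    return json.dumps(asdict(record), sort_keys=True)
```

<p>One line per decision keeps the log greppable: filtering on <code>verdict</code> answers &ldquo;where did the framework refuse,&rdquo; and filtering on <code>decided_by</code> answers &ldquo;who approved what.&rdquo;</p>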
<hr>
<h2 id="what-kite-isnt">What Kite Isn&rsquo;t</h2>
<p><strong>Not production-ready.</strong> This is v0.1.0 (alpha). The framework works. We use it internally. But it has no comprehensive test suite, limited tooling, and APIs that might change.</p>
<p><strong>Not a LangChain replacement.</strong> LangChain has 1000+ integrations. Kite has 10 core components. If you need every obscure tool connector, use LangChain.</p>
<p><strong>Not an enterprise platform.</strong> If you need 24/7 support and compliance certifications, use AWS Bedrock.</p>
<p>Kite is for small teams (1–20 engineers) who want to move fast without spending months on plumbing—and who understand that &ldquo;the agent is always right&rdquo; is a dangerous fantasy.</p>
<hr>
<h2 id="conclusion-your-users-dont-care-about-your-infrastructure">Conclusion: Your Users Don&rsquo;t Care About Your Infrastructure</h2>
<p>They care if your agent works. And more importantly, they care if your agent <em>doesn&rsquo;t destroy things when it doesn&rsquo;t work</em>.</p>
<p>Kite doesn&rsquo;t make the agent smarter. Kite makes the agent less dangerous when it&rsquo;s wrong. If you&rsquo;re building infra automation, financial workflows, data pipelines, or anything with compliance requirements—that&rsquo;s not a premium feature. That&rsquo;s the minimum.</p>
<hr>
<p><strong>Kite is open-source (MIT):</strong>
→ GitHub: <a href="https://github.com/thienzz/Kite">github.com/thienzz/Kite</a></p>
<p><strong>Want to learn the architecture philosophy behind it?</strong>
→ Read: <a href="https://github.com/thienzz/Kite"><em>Designing Agentic AI Systems</em></a></p>
<p>Questions? Reach me at <a href="mailto:thien@beevr.ai">thien@beevr.ai</a> or open an issue on GitHub.</p>
]]></content:encoded></item><item><title>You Cannot Prompt Your Way Out of a Race Condition</title><link>https://digitaldam.org/posts/you-cannot-prompt-your-way-out-of-a-race-condition/</link><pubDate>Mon, 20 Oct 2025 07:07:07 +0100</pubDate><guid>https://digitaldam.org/posts/you-cannot-prompt-your-way-out-of-a-race-condition/</guid><description>&lt;p&gt;We spent the last two years building chatbots that could read. Now, the business wants chatbots that can &lt;em&gt;touch&lt;/em&gt; things. If that doesn&amp;rsquo;t terrify you, you haven&amp;rsquo;t been paying attention.&lt;/p&gt;
&lt;p&gt;For a long time, we treated AI like a librarian. Its job was to walk into the stacks (your database), read a book, and summarize it. If the librarian hallucinated, the user got bad advice. It was embarrassing, but the failure was contained. The database remained intact. The bank account was untouched.&lt;/p&gt;</description><content:encoded><![CDATA[<p>We spent the last two years building chatbots that could read. Now, the business wants chatbots that can <em>touch</em> things. If that doesn&rsquo;t terrify you, you haven&rsquo;t been paying attention.</p>
<p>For a long time, we treated AI like a librarian. Its job was to walk into the stacks (your database), read a book, and summarize it. If the librarian hallucinated, the user got bad advice. It was embarrassing, but the failure was contained. The database remained intact. The bank account was untouched.</p>
<p>That era is ending. Stakeholders are no longer impressed that an AI can summarize an invoice. They are asking a question that sounds simple but carries terrifying engineering implications: <em>&ldquo;Why can&rsquo;t the AI just pay the invoice?&rdquo;</em></p>
<p>This is the shift from <strong>Read-Only AI</strong> to <strong>Write-Access AI</strong>. And when you hand a stochastic, probabilistic model the keys to your deterministic infrastructure, things can go wrong very quickly.</p>
<h2 id="the-anatomy-of-a-15000-mistake">The Anatomy of a $15,000 Mistake</h2>
<p>To understand the risk, let&rsquo;s look at a real scenario from a fintech startup I audited recently. They wanted to upgrade their support bot from &ldquo;answering questions&rdquo; to &ldquo;handling refunds.&rdquo;</p>
<p>The setup seemed standard. They gave the Agent access to the Stripe API and a system prompt: <em>&ldquo;If a customer has a valid complaint&hellip; you are authorized to issue a full refund.&rdquo;</em></p>
<p>Then came Black Friday.</p>
<ol>
<li>A user messaged: &ldquo;My package is late, I want my money back.&rdquo;</li>
<li>The Agent correctly identified the intent and called the <code>refund_transaction</code> tool.</li>
<li><strong>The Glitch:</strong> The Stripe API was under heavy load. The refund succeeded on the backend, but the HTTP response timed out before reaching the Agent.</li>
</ol>
<p>In a traditional Python script, we would handle this with a specific error code. But the Agent isn&rsquo;t a script; it&rsquo;s a <strong>reasoning engine</strong>.</p>
<p>The Agent received a generic <code>TimeoutError</code>. It &ldquo;thought&rdquo; to itself: <em>&ldquo;Oh, the tool failed. The user is still angry. I must try again to fulfill my goal.&rdquo;</em></p>
<p>It hit retry. Again. And again. It entered a tight loop, hammering the API.</p>
<p>By the time the engineering team noticed the anomaly in the logs—about 45 seconds later—the Agent had successfully issued <strong>50 duplicate refunds</strong> for the same order.</p>
<p>It didn&rsquo;t just refund the purchase price; it drained the merchant account of nearly <strong>$15,000</strong> in duplicate transactions and non-refundable processing fees.</p>
<h2 id="probabilistic-software-vs-deterministic-world">Probabilistic Software vs. Deterministic World</h2>
<p>This story illustrates the fundamental conflict in Agentic Engineering.</p>
<p>For fifty years, software engineering has been built on a bedrock of certainty. If you write <code>if x &gt; 5</code>, it will <em>always</em> be true when x is 6. If it isn&rsquo;t, it&rsquo;s a bug. You find the line of code, you fix it, and it stays fixed.</p>
<p>AI Agents operate on a different physics entirely. They are <strong>Probabilistic Software</strong>.</p>
<ul>
<li><strong>Run 1:</strong> The model sees &ldquo;I am disappointed&rdquo; and offers an apology.</li>
<li><strong>Run 2:</strong> It sees &ldquo;I am disappointed&rdquo; and offers a refund.</li>
<li><strong>Run 3:</strong> It decides &ldquo;disappointed&rdquo; is a threat and flags the user.</li>
</ul>
<p>If you build Agentic systems using the same &ldquo;vibes-based&rdquo; engineering you used for your RAG chatbot—relying on prompt engineering and hope—you will create disasters.</p>
<p><strong>You cannot prompt your way out of a race condition.</strong> You cannot &ldquo;fine-tune&rdquo; away a network timeout loop.</p>
<h2 id="the-solution-the-deterministic-shell">The Solution: The Deterministic Shell</h2>
<p>If we want to build safe agents for the enterprise, we must stop treating them like magic and start treating them like untrusted components.</p>
<p>We need to build a <strong>Deterministic Shell</strong> around the <strong>Probabilistic Core</strong>.</p>
<p><img loading="lazy" src="/posts/you-cannot-prompt-your-way-out-of-a-race-condition/deterministic_shell.png"></p>
<p>The architecture requires us to assume the LLM <em>will</em> make mistakes, and design the system so that those mistakes cannot burn down the building. This means implementing hard engineering guardrails—written in code, not prompts—that the AI cannot override.</p>
<p>Here is what that &ldquo;Shell&rdquo; looks like in Python. It wraps the AI&rsquo;s tool call with <strong>Idempotency</strong> (so duplicate requests are ignored) and <strong>Circuit Breakers</strong> (to stop runaway loops).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">wraps</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">hashlib</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">time</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">stripe</span>  <span class="c1"># assumes the stripe library is installed</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">RefundCircuitBreaker</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">max_failures</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">60</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">failures</span> <span class="o">=</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">max_failures</span> <span class="o">=</span> <span class="n">max_failures</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">timeout</span> <span class="o">=</span> <span class="n">timeout</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">func</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="nd">@wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="n">order_id</span><span class="p">,</span> <span class="n">amount</span><span class="p">,</span> <span class="n">reason</span><span class="o">=</span><span class="s2">&#34;&#34;</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="c1"># 1. GENERATE IDEMPOTENCY KEY (same inputs always give the same key)</span>
</span></span><span class="line"><span class="cl">            <span class="n">key_data</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">order_id</span><span class="si">}</span><span class="s2">:</span><span class="si">{</span><span class="n">amount</span><span class="si">}</span><span class="s2">:</span><span class="si">{</span><span class="n">reason</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">            <span class="n">idempotency_key</span> <span class="o">=</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">sha256</span><span class="p">(</span><span class="n">key_data</span><span class="o">.</span><span class="n">encode</span><span class="p">())</span><span class="o">.</span><span class="n">hexdigest</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="c1"># 2. CHECK CIRCUIT BREAKER</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">idempotency_key</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">failures</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="n">last_failure</span><span class="p">,</span> <span class="n">count</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">failures</span><span class="p">[</span><span class="n">idempotency_key</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">last_failure</span> <span class="o">&lt;</span> <span class="bp">self</span><span class="o">.</span><span class="n">timeout</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                    <span class="k">if</span> <span class="n">count</span> <span class="o">&gt;=</span> <span class="bp">self</span><span class="o">.</span><span class="n">max_failures</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                        <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Circuit breaker open for </span><span class="si">{</span><span class="n">idempotency_key</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="c1"># 3. CALL THE WRAPPED FUNCTION, PASSING THE KEY THROUGH</span>
</span></span><span class="line"><span class="cl">                <span class="n">result</span> <span class="o">=</span> <span class="n">func</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">                    <span class="n">order_id</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="n">amount</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="n">reason</span><span class="o">=</span><span class="n">reason</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                    <span class="n">idempotency_key</span><span class="o">=</span><span class="n">idempotency_key</span><span class="p">,</span>  <span class="c1"># Critical!</span>
</span></span><span class="line"><span class="cl">                    <span class="o">**</span><span class="n">kwargs</span>
</span></span><span class="line"><span class="cl">                <span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="c1"># Clear failures on success</span>
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="n">idempotency_key</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">failures</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                    <span class="k">del</span> <span class="bp">self</span><span class="o">.</span><span class="n">failures</span><span class="p">[</span><span class="n">idempotency_key</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="k">return</span> <span class="n">result</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="k">except</span> <span class="n">stripe</span><span class="o">.</span><span class="n">error</span><span class="o">.</span><span class="n">APIConnectionError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="c1"># 4. HANDLE FAILURE DETERMINISTICALLY</span>
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="n">idempotency_key</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">failures</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                    <span class="n">last_time</span><span class="p">,</span> <span class="n">count</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">failures</span><span class="p">[</span><span class="n">idempotency_key</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">                    <span class="bp">self</span><span class="o">.</span><span class="n">failures</span><span class="p">[</span><span class="n">idempotency_key</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">(),</span> <span class="n">count</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                    <span class="bp">self</span><span class="o">.</span><span class="n">failures</span><span class="p">[</span><span class="n">idempotency_key</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">(),</span> <span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">failures</span><span class="p">[</span><span class="n">idempotency_key</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span> <span class="o">&gt;=</span> <span class="bp">self</span><span class="o">.</span><span class="n">max_failures</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                    <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">&#34;Circuit breaker activated - too many failures&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="k">raise</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">wrapper</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@RefundCircuitBreaker</span><span class="p">(</span><span class="n">max_failures</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">300</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">process_refund</span><span class="p">(</span><span class="n">order_id</span><span class="p">,</span> <span class="n">amount</span><span class="p">,</span> <span class="n">reason</span><span class="o">=</span><span class="s2">&#34;&#34;</span><span class="p">,</span> <span class="n">idempotency_key</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># The decorator supplies idempotency_key; Stripe deduplicates on it</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">stripe</span><span class="o">.</span><span class="n">Refund</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">charge</span><span class="o">=</span><span class="n">order_id</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">amount</span><span class="o">=</span><span class="n">amount</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">idempotency_key</span><span class="o">=</span><span class="n">idempotency_key</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span></code></pre></div><h3 id="why-this-code-matters">Why This Code Matters</h3>
<p>Notice what is happening here:</p>
<ul>
<li><strong>Idempotency:</strong> Even if the LLM enters a panic loop and calls the function 50 times, the <code>idempotency_key</code> ensures the downstream API handles it as <em>one</em> transaction.</li>
<li><strong>Circuit Breaker:</strong> Once the API has failed <code>max_failures</code> times (2 in the decorator above), the Python code raises a <code>ValueError</code> and stops the execution. The LLM is not asked for permission to stop. It is forced to stop.</li>
</ul>
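<p>To make the idempotency point concrete, here is a minimal sketch of how a stable key collapses repeated calls into one transaction. The helper <code>make_idempotency_key</code> and the <code>FakeRefundAPI</code> class are hypothetical stand-ins for illustration, not part of the code above or of any real payment SDK.</p>

```python
import hashlib
import json


def make_idempotency_key(order_id: str, amount: int) -> str:
    """Derive a stable key from the refund's arguments (hypothetical helper)."""
    # sort_keys keeps the serialization identical across calls.
    canonical = json.dumps({"order_id": order_id, "amount": amount}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FakeRefundAPI:
    """Stand-in for a payment provider that honors idempotency keys."""

    def __init__(self):
        self.refunds = {}  # idempotency key -> refund record

    def create_refund(self, order_id: str, amount: int, idempotency_key: str) -> dict:
        # A repeated key returns the original record instead of refunding again.
        if idempotency_key not in self.refunds:
            self.refunds[idempotency_key] = {"order_id": order_id, "amount": amount}
        return self.refunds[idempotency_key]


api = FakeRefundAPI()
for _ in range(50):  # the "panic loop": same call, 50 times
    key = make_idempotency_key("order_42", 1999)
    api.create_refund("order_42", 1999, idempotency_key=key)

print(len(api.refunds))  # number of refunds actually recorded
```

<p>The key is derived from the arguments rather than generated randomly per call, so the fiftieth retry looks identical to the first.</p>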
<p>This is not &ldquo;Prompt Engineering.&rdquo; This is <strong>Software Engineering</strong>.</p>
<h2 id="conclusion-autonomy-is-a-bug-not-a-feature">Conclusion: Autonomy is a Bug, Not a Feature</h2>
<p>When stakeholders ask for an &ldquo;Agent,&rdquo; they usually imagine a digital employee that figures everything out on its own. This vision is what sells venture capital rounds. It is also what kills production systems.</p>
<p>In my view, <strong>autonomy is a bug, not a feature</strong>.</p>
<p>Every degree of freedom you give an Agent increases the surface area for errors. If a task can be done with a linear script, do not build an Agent just to look cool.</p>
<p>We are moving from the library (Read-Only) to the real world (Write-Access). The &ldquo;Intern&rdquo; now has the company credit card. It’s time we started engineering the safeguards to match that reality.</p>
]]></content:encoded></item><item><title>How to Scam Your Client with "Resume-Driven Development" as a Service</title><link>https://digitaldam.org/posts/how-to-scam-your-client-with-resume-driven-development-as-a-service/</link><pubDate>Mon, 08 Sep 2025 07:07:07 +0100</pubDate><guid>https://digitaldam.org/posts/how-to-scam-your-client-with-resume-driven-development-as-a-service/</guid><description>&lt;p&gt;My favorite kind of call is the rescue project.&lt;/p&gt;
&lt;p&gt;A new client comes to us, frustrated. They just paid a &amp;lsquo;modern&amp;rsquo; tech agency for a platform that&amp;rsquo;s completely unmaintainable.&lt;/p&gt;
&lt;p&gt;We pop the hood, and it&amp;rsquo;s &lt;em&gt;magnificent&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a &lt;em&gt;state-of-the-art&lt;/em&gt; &amp;ldquo;Cloud-Native,&amp;rdquo; &amp;ldquo;AI-Powered,&amp;rdquo; &amp;ldquo;Event-Driven,&amp;rdquo; &amp;ldquo;Serverless&amp;rdquo; system. A stunning monument to modern engineering, designed to handle 10 million concurrent users for a B2B app that has 500.&lt;/p&gt;
&lt;p&gt;The previous agency didn&amp;rsquo;t solve the client&amp;rsquo;s problem. They solved their &lt;em&gt;own&lt;/em&gt; problem: &lt;strong&gt;how to get &amp;ldquo;GenAI,&amp;rdquo; &amp;ldquo;Kubernetes,&amp;rdquo; and &amp;ldquo;VectorDB&amp;rdquo; onto their developers&amp;rsquo; resumes.&lt;/strong&gt;&lt;/p&gt;</description><content:encoded><![CDATA[<p>My favorite kind of call is the rescue project.</p>
<p>A new client comes to us, frustrated. They just paid a &lsquo;modern&rsquo; tech agency for a platform that&rsquo;s completely unmaintainable.</p>
<p>We pop the hood, and it&rsquo;s <em>magnificent</em>.</p>
<p>It&rsquo;s a <em>state-of-the-art</em> &ldquo;Cloud-Native,&rdquo; &ldquo;AI-Powered,&rdquo; &ldquo;Event-Driven,&rdquo; &ldquo;Serverless&rdquo; system. A stunning monument to modern engineering, designed to handle 10 million concurrent users for a B2B app that has 500.</p>
<p>The previous agency didn&rsquo;t solve the client&rsquo;s problem. They solved their <em>own</em> problem: <strong>how to get &ldquo;GenAI,&rdquo; &ldquo;Kubernetes,&rdquo; and &ldquo;VectorDB&rdquo; onto their developers&rsquo; resumes.</strong></p>
<p>Welcome to the most toxic trend in our industry: <strong>Resume-Driven Development as a Service (RDDaaS).</strong></p>
<h3 id="the-rddaas-playbook-how-to-scam-your-client">The RDDaaS Playbook (How to Scam Your Client)</h3>
<p>It&rsquo;s a brilliant, cynical business model, and it works like this:</p>
<p><strong>Step 1: The Future-Proof Pitch</strong>
You show the non-technical client a beautiful slide deck. You blind them with charts showing massive <strong>ROI</strong>, impressive <strong>KPIs</strong>, and tech buzzwords like &ldquo;infinitely scalable,&rdquo; &ldquo;AI-powered,&rdquo; and &ldquo;Future GenAI-ready.&rdquo;</p>
<p><strong>Step 2: The Training Ground</strong>
Your mid-level developers, who have only <em>read</em> about &ldquo;RAG Pipelines&rdquo; and &ldquo;Kubernetes,&rdquo; now get to learn them <em>right on the client&rsquo;s dime</em>.</p>
<p><strong>Step 3: The Over-Engineering Phase</strong>
The project, which should have been a 3-month simple CRUD app, now takes 9 months. They&rsquo;re not solving the client&rsquo;s problem. They&rsquo;re solving Google&rsquo;s problem <em>and</em> Meta&rsquo;s problem.</p>
<p><strong>Step 4: The Successful Handoff</strong>
The system is delivered. The chatbot confidently hallucinates the wrong phone number. The vendor&rsquo;s developers proudly update their LinkedIn profiles. The vendor gets paid.</p>
<p><strong>Step 5: The Victim&rsquo;s New Life</strong></p>
<p>The client is now the proud owner of an intelligent <em>thing</em> that requires a team of ex-FAANG SREs just to add a new form field.</p>
<p>Their new life includes:</p>
<ol>
<li>
<p><strong>The Hiring Nightmare:</strong> Their in-house &ldquo;IT guy,&rdquo; John, quits after seeing a <code>terraform</code> script that spawns 15 AWS services (EKS, Lambda, VectorDBs&hellip;). The first <em>real</em> candidate who understands this mess demands $250k.</p>
</li>
<li>
<p><strong>The Wrong Web-Scale Performance:</strong> The app is <em>slow</em>. A simple request now makes 5 <code>HTTP</code> calls through a service mesh <em>and</em> a $0.02 call to <code>gpt-4-turbo</code> just to say &ldquo;Hello.&rdquo;</p>
</li>
<li>
<p><strong>The &ldquo;WTF&rdquo; Cloud Bill:</strong>
The first few bills look okay (thanks, AWS Free Tier). Then, <strong>Month 4 hits.</strong>
(Trust me, you <em>really</em> don&rsquo;t want this.)
The <em>real</em> bill arrives, full of <strong>NAT Gateways</strong>, <strong>EKS Control Planes</strong>, <strong>Managed Vector DBs</strong> (99% empty), and <strong>OpenAI API fees</strong>. Their <em>plumbing</em> costs 20x more than their <em>app</em>.</p>
</li>
<li>
<p><strong>The Simple Change Request:</strong> The client asks: &ldquo;Can we just add a normal search bar? The AI one is too expensive.&rdquo;</p>
<ul>
<li>In the old monolith: 1 hour.</li>
<li>In this modern system: &ldquo;Uh, that&rsquo;s not how the RAG pipeline works. We&rsquo;d have to re-architect the whole data flow. That&rsquo;ll be a new 2 sprints.&rdquo;</li>
</ul>
</li>
</ol>
<hr>
<h3 id="the-root-cause-why-did-this-happen">The Root Cause: Why Did This Happen?</h3>
<p>Why does this scam always work? It takes two:</p>
<p><strong>1. The Vendor&rsquo;s Hidden Agenda:</strong>
The client isn&rsquo;t the <em>customer</em>. They&rsquo;re the <em>training ground</em>. The agency didn&rsquo;t solve the client&rsquo;s problem (&ldquo;I need a reliable app&rdquo;). They solved their <em>own</em> problem (&ldquo;Our developers need &lsquo;GenAI&rsquo; and &lsquo;K8s&rsquo; on their resumes&rdquo;).</p>
<p><strong>2. The Client&rsquo;s Blinders:</strong>
The client <em>lets</em> this happen. They get <strong>blinded by the ROI slides and tech buzzwords</strong>. They&rsquo;re so terrified of being &ldquo;legacy&rdquo; that they <em>actively forget</em> to ask the two boring, critical questions:</p>
<ul>
<li>&ldquo;What is my <em>actual</em> problem?&rdquo;</li>
<li>&ldquo;What&rsquo;s the <strong>Year 2 maintenance and cloud bill</strong> going to look like?&rdquo;</li>
</ul>
<p>The architecture was optimized for <strong>Imaginary Scale</strong> and <strong>Imaginary Intelligence</strong>, not <strong>Current Maintainability</strong>.</p>
<hr>
<h3 id="how-to-not-get-scammed">How to Not Get Scammed</h3>
<p>This isn&rsquo;t just a rant. This is a pattern I have seen many times. It&rsquo;s the entire reason my philosophy is built on <strong>Boring Technology</strong>.</p>
<p>The antidote to RDDaaS is to stop being impressed by buzzwords and start asking the questions that actually matter.</p>
<p>Next time a vendor pitches you a &ldquo;Cloud-Native AI-Powered&rdquo; solution, just ask them these things:</p>
<ul>
<li>&ldquo;Can you justify <em>why</em> we need Kubernetes for this?&rdquo;</li>
<li>&ldquo;Walk me through the full development process—from ticket to deployment—for adding one new database field to the &lsquo;User&rsquo; model.&rdquo;</li>
<li>&ldquo;What is the fallback mechanism when the AI/RAG pipeline fails or hallucinates?&rdquo;</li>
<li>&ldquo;Show me the <strong>Year 2 cloud bill</strong> for this architecture.&rdquo;</li>
<li>&ldquo;What kind of engineer do I need to hire to <strong>maintain</strong> this after you leave?&rdquo;</li>
</ul>
<p>You don&rsquo;t need <strong>FAANG-scale complexity</strong>. You need <strong>Product-Market Fit</strong>.</p>
<p>Start buying <strong>maintainable solutions</strong> that actually let you find it.</p>
]]></content:encoded></item><item><title>Boring Technology | Your AI is the 1% (Don't Forget the 99%)</title><link>https://digitaldam.org/posts/your-ai-is-the-1-percent-dont-forget-the-99-percent/</link><pubDate>Fri, 05 Sep 2025 07:07:07 +0100</pubDate><guid>https://digitaldam.org/posts/your-ai-is-the-1-percent-dont-forget-the-99-percent/</guid><description>&lt;p&gt;I’m seeing a worrying pattern lately. Almost every product discussion now starts with, &amp;ldquo;So, how are we using AI for this?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re all a bit drunk on the hype. We&amp;rsquo;re treating AI like magic dust we can just sprinkle on any problem.&lt;/p&gt;
&lt;p&gt;Clients want a YouTube- or Netflix-level recommendation engine on day one. Devs, quite reasonably, are excited to put the shiniest new Vector DB on their resumes.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re starting backward. We&amp;rsquo;re trying to build the penthouse while the foundation is still a sketch on a napkin.&lt;/p&gt;</description><content:encoded><![CDATA[<p>I’m seeing a worrying pattern lately. Almost every product discussion now starts with, &ldquo;So, how are we using AI for this?&rdquo;</p>
<p>We&rsquo;re all a bit drunk on the hype. We&rsquo;re treating AI like magic dust we can just sprinkle on any problem.</p>
<p>Clients want a YouTube- or Netflix-level recommendation engine on day one. Devs, quite reasonably, are excited to put the shiniest new Vector DB on their resumes.</p>
<p>We&rsquo;re starting backward. We&rsquo;re trying to build the penthouse while the foundation is still a sketch on a napkin.</p>
<p>The reason this &ldquo;penthouse-first&rdquo; approach fails is that we’re ignoring the reality of what an AI actually is. We treat it like a <em>magic box</em> we can just plug in, but it&rsquo;s not.</p>
<p>You can&rsquo;t &ldquo;set-it-and-forget-it.&rdquo; You have to <em>manage</em> it.</p>
<ul>
<li><strong>The world changes and a new trend appears?</strong> Your AI &ldquo;employee&rdquo; gets confused because it has never seen anything like it. It starts to drift from Day 2.</li>
<li><strong>It makes a mistake?</strong> It doesn&rsquo;t self-correct. You have to build a <em>feedback loop</em> to <em>fix</em> it.</li>
<li><strong>You want it to be smarter?</strong> You have to re-train it, and that is a continuous operational cost (OpEx).</li>
</ul>
<p>When you understand AI is something you must <strong>maintain</strong> and not something you <strong>own</strong>, your entire architecture changes.</p>
<hr>
<h2 id="my-take-99-boring-1-ai">My Take: 99% Boring, 1% AI</h2>
<p>So if AI is this high-maintenance &ldquo;penthouse,&rdquo; what&rsquo;s the &ldquo;<strong>foundation</strong>&rdquo;?</p>
<p>My personal philosophy is simple: <strong>Never use expensive AI to do a job a good old SQL query can do.</strong></p>
<p>The value isn&rsquo;t in a single &ldquo;black box&rdquo; AI. The value is in a <strong>hybrid system</strong> where boring rules and code do 99% of the heavy lifting. The AI is just the 1% of <strong>spice</strong> you add at the very end.</p>
<hr>
<h2 id="a-practical-3-phase-boring-roadmap-my-small-advice">A Practical 3-Phase &ldquo;Boring&rdquo; Roadmap (My Small Advice)</h2>
<p>If I&rsquo;m building a &ldquo;smart&rdquo; matching system, I will not start with AI. I build it in phases, layering complexity only when necessary.</p>
<p>Let&rsquo;s walk through this with <strong>one single example: building a &ldquo;Smart Candidate Search&rdquo; for a recruitment platform.</strong></p>
<h3 id="phase-1-the-foundation-of-correctness-sql-rules">Phase 1: The Foundation of Correctness (SQL Rules)</h3>
<p>This is the 99% of the work. The goal here is <strong>100% Correctness</strong>, enforcing the non-negotiable rules of the business. An &ldquo;AI-first&rdquo; system might think a candidate in Ho Chi Minh City is a &ldquo;great match&rdquo; for a job in Hanoi, but your business rules say that&rsquo;s unacceptable. This layer is the &ldquo;bouncer&rdquo; at the door.</p>
<ul>
<li><strong>In Practice:</strong> <code>WHERE location = 'Hanoi' AND salary_request &lt; 5000 AND years_experience &gt;= 3</code>.</li>
</ul>
<p>This filters on <strong>facts</strong>, not suggestions. <code>location</code> and <code>years_experience</code> are binary facts. We <em>must</em> do this first to avoid wasting expensive AI compute on candidates who are an immediate &ldquo;no.&rdquo; This is your blazing fast, dirt cheap foundation.</p>
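<p>A minimal sketch of this &ldquo;bouncer&rdquo; layer, using an in-memory SQLite table so it runs anywhere. The <code>candidates</code> table, its sample rows, and the <code>hard_filter</code> helper are assumptions made up for illustration; only the <code>WHERE</code> clause comes from the example above.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidates ("
    "name TEXT, location TEXT, salary_request INTEGER, years_experience INTEGER)"
)
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?, ?)",
    [
        ("An", "Hanoi", 4000, 5),
        ("Binh", "Ho Chi Minh City", 3500, 7),  # wrong city
        ("Chi", "Hanoi", 6000, 4),              # over budget
        ("Dung", "Hanoi", 3000, 1),             # too junior
    ],
)


def hard_filter(conn, location, max_salary, min_years):
    """Apply the non-negotiable business rules before any AI is involved."""
    rows = conn.execute(
        "SELECT name FROM candidates "
        "WHERE location = ? AND salary_request < ? AND years_experience >= ? "
        "ORDER BY name",
        (location, max_salary, min_years),
    )
    return [name for (name,) in rows]


shortlist = hard_filter(conn, "Hanoi", 5000, 3)
print(shortlist)
```

<p>Every rejected row fails on a checkable fact, so nobody has to guess why a candidate was excluded.</p>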
<h3 id="phase-2-the-layer-of-relevance-full-text-search">Phase 2: The Layer of Relevance (Full-Text Search)</h3>
<p>This is still part of the 99%. Now that we have a list of <em>correct</em> candidates (e.g., in Hanoi, with 3+ years of experience), we solve for <strong>Relevance</strong>. Our SQL filter was correct, but &ldquo;dumb&rdquo; about human language. A recruiter searching for &ldquo;programmer&rdquo; won&rsquo;t find &ldquo;developer.&rdquo;</p>
<ul>
<li><strong>In Practice:</strong> Use <code>Elasticsearch</code> (which ranks with <code>BM25</code>) or Postgres&rsquo;s built-in full-text search with a synonym dictionary on the <em>resume text</em> so &ldquo;java programmer&rdquo; matches &ldquo;java developer.&rdquo;</li>
</ul>
<p>This 20-year-old boring tech is built <em>specifically</em> to solve the synonym problem without the cost or &ldquo;black box&rdquo; nature of AI. Critically, it&rsquo;s <strong>explainable</strong>—you know <em>why</em> that resume showed up.</p>
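<p>To show why this layer stays explainable, here is a toy stand-in for what a search engine&rsquo;s analyzer and synonym dictionary do. It is not how Elasticsearch or Postgres implement it; the <code>SYNONYMS</code> table, the sample resumes, and both helper functions are hypothetical.</p>

```python
import re

# Hypothetical synonym table: each variant maps to one canonical term.
SYNONYMS = {"programmer": "developer", "engineer": "developer", "coder": "developer"}


def normalize(text: str) -> set:
    """Lowercase, split into words, and fold synonyms into a canonical term."""
    words = re.findall(r"[a-z0-9+#]+", text.lower())
    return {SYNONYMS.get(word, word) for word in words}


def keyword_score(query: str, resume: str) -> float:
    """Fraction of query terms found in the resume."""
    query_terms = normalize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & normalize(resume)) / len(query_terms)


resumes = {
    "An": "Senior Java developer, 5 years building payment systems",
    "Chi": "Graphic designer with a passion for typography",
}
scores = {name: keyword_score("java programmer", text) for name, text in resumes.items()}
print(scores)
```

<p>The score is just a count of overlapping terms, so you can always point at the exact words that made a resume match.</p>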
<h3 id="phase-3-the-1-layer-of-nuance-the-ai">Phase 3: The 1% Layer of &ldquo;Nuance&rdquo; (The AI)</h3>
<p><em>Only now</em>, after our 10 million candidates have been filtered down to 1,000 <em>correct</em> (Phase 1) and <em>relevant</em> (Phase 2) candidates, do we add the final 1%: <strong>Nuance</strong>. Full-Text Search is great with <em>words</em>, but not <em>ideas</em>. This is the 1% problem AI is actually good at.</p>
<ul>
<li><strong>In Practice:</strong> The AI&rsquo;s job is to know that a recruiter searching for &ldquo;Senior Java&rdquo; is <em>conceptually similar</em> to candidates strong in &ldquo;Spring Boot&rdquo; or &ldquo;Scala&rdquo; (even if they didn&rsquo;t type those words).</li>
</ul>
<p>This is the key to performance: an AI ranking 10 million items is impossibly slow and expensive. An AI ranking 1,000 correct and relevant items is near-instantaneous and cost-effective.</p>
<p>The AI is not the filter; it&rsquo;s the <strong>re-ranker</strong>. We <em>contain</em> the expensive, unpredictable AI. We let it play <em>only</em> within the 1,000 safe results our boring filters found. This gives you the correctness of SQL plus the nuance of AI, but your costs drop 99% and your speed goes up 1000x.</p>
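<p>The re-ranking step can be sketched as follows. The three-number &ldquo;embeddings&rdquo; are hand-written toy values, not the output of any real model, and the <code>rerank</code> helper is hypothetical; the point is only that the expensive similarity step runs on the short list, never on the full table.</p>

```python
import math

# Toy, hand-written "embeddings"; a real system would get these from a model.
EMBEDDINGS = {
    "senior java": [0.9, 0.8, 0.1],
    "spring boot": [0.8, 0.9, 0.2],
    "scala": [0.7, 0.6, 0.3],
    "photoshop": [0.1, 0.0, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query, shortlist, top_k=2):
    """Order an already-filtered shortlist by similarity to the query."""
    query_vec = EMBEDDINGS[query]
    scored = [(cosine(query_vec, EMBEDDINGS[skill]), name) for name, skill in shortlist]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Output of Phases 1 and 2: (candidate, strongest skill) pairs.
shortlist = [("Chi", "photoshop"), ("An", "spring boot"), ("Binh", "scala")]
print(rerank("senior java", shortlist))
```

<p>Because the function only ever sees the shortlist, a bad ranking can reorder safe results but cannot surface a candidate the earlier phases rejected.</p>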
<hr>
<h2 id="final-thought">Final Thought</h2>
<p>The hype will pass. The core of good engineering isn&rsquo;t using the shiniest tool. It&rsquo;s the wisdom of knowing <em>how</em> and <em>when</em> to solve a problem.</p>
<p>Your &ldquo;AI Strategy&rdquo; shouldn&rsquo;t be about AI. It should be about the <em>system</em>.</p>
<p>Stop building the penthouse first. Build your boring, indestructible foundation.</p>
]]></content:encoded></item><item><title>Your Perfect AI Headshot is Now a Red Flag</title><link>https://digitaldam.org/posts/your-perfect-ai-headshot-is-now-a-red-flag/</link><pubDate>Tue, 02 Sep 2025 07:07:07 +0100</pubDate><guid>https://digitaldam.org/posts/your-perfect-ai-headshot-is-now-a-red-flag/</guid><description>&lt;p&gt;I scroll LinkedIn, what do I see?&lt;/p&gt;
&lt;p&gt;Perfect headshots. Studio lighting, crazy sharp, precise smiles, not a hair out of place.
Perfect posts and comments. Flawless grammar, zero typos.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s all clean, polished and soulless.&lt;/p&gt;
&lt;p&gt;This is the &lt;strong&gt;Great AI Flood.&lt;/strong&gt; The cost of &lt;em&gt;looking&lt;/em&gt; competent, of &lt;em&gt;sounding&lt;/em&gt; smart, has just dropped to zero.&lt;/p&gt;
&lt;p&gt;And this is where the problem begins.&lt;/p&gt;
&lt;h3 id="the-collapse-of-signal"&gt;The Collapse of Signal&lt;/h3&gt;
&lt;p&gt;In economics, when you flood a market, the asset&amp;rsquo;s value collapses. For many years, polished content was a signal of &lt;em&gt;professionalism&lt;/em&gt;. Now that AI can produce it instantly, polish has just become &lt;em&gt;noise&lt;/em&gt;.&lt;/p&gt;</description><content:encoded><![CDATA[<p>I scroll LinkedIn, what do I see?</p>
<p>Perfect headshots. Studio lighting, crazy sharp, precise smiles, not a hair out of place.
Perfect posts and comments. Flawless grammar, zero typos.</p>
<p>It&rsquo;s all clean, polished and soulless.</p>
<p>This is the <strong>Great AI Flood.</strong> The cost of <em>looking</em> competent, of <em>sounding</em> smart, has just dropped to zero.</p>
<p>And this is where the problem begins.</p>
<h3 id="the-collapse-of-signal">The Collapse of Signal</h3>
<p>In economics, when you flood a market, the asset&rsquo;s value collapses. For many years, polished content was a signal of <em>professionalism</em>. Now that AI can produce it instantly, polish has just become <em>noise</em>.</p>
<p>The burden of effort has shifted from the <em>creator</em> to the <em>consumer</em>.</p>
<p>My mental energy as a reader is no longer spent understanding your idea. It&rsquo;s spent on an exhausting calculation: <strong>Is this real?</strong></p>
<ul>
<li>Is this a real photo, or Stable Diffusion, or Gemini?</li>
<li>Is this a real insight, or a ChatGPT remix of the top 10 blog posts?</li>
<li>Is this a real comment, or a bot?</li>
</ul>
<p>This is the collapse of the signal-to-noise ratio. And it’s eroding the one thing that matters: <strong>Trust</strong>.</p>
<h3 id="the-return-of-rough-edges">The Return of &ldquo;Rough Edges&rdquo;</h3>
<p>When a signal becomes cheap, it&rsquo;s no longer a reliable signal.</p>
<p>For years, a professional headshot was a signal: &ldquo;I care enough about my career to spend $200.&rdquo;
Today, a perfect AI headshot is a signal: &ldquo;I care enough to spend 30 seconds on a prompt.&rdquo;</p>
<p>When &ldquo;polish&rdquo; is cheap, <strong>rough edges</strong> become the new status symbol.</p>
<p>A slightly blurry selfie from your office? I think it&rsquo;s <strong>real</strong>. It&rsquo;s <strong>Proof of Effort</strong>.
A post with a small typo or an awkward sentence? It&rsquo;s <strong>Proof of Thought</strong>.</p>
<p>But here&rsquo;s the <em>deeper</em> signal, the one that really proves expertise: <strong>The Bumps and Bruises.</strong></p>
<p>An AI-generated case study is perfect: &ldquo;We increased ROI by 400%.&rdquo; It&rsquo;s clean. It&rsquo;s also unbelievable.</p>
<p>The <em>real</em> signal of human expertise isn&rsquo;t perfection; it&rsquo;s the messy story. It&rsquo;s the <strong>Proof of Experience</strong>.</p>
<blockquote>
<p>&ldquo;This was a tough project. We chose the wrong database and had to migrate for 3 weeks. Here&rsquo;s what we learned&hellip;&rdquo;</p>
</blockquote>
<p>An AI can generate a plausible-sounding failure story. It can say, &lsquo;we chose the wrong database,&rsquo; and even invent a &lsquo;stubborn VP of Engineering&rsquo; or a &lsquo;4 AM call.&rsquo; It&rsquo;s a perfect remix of the thousands of &lsquo;war stories&rsquo; it was trained on.</p>
<p>But that&rsquo;s a <strong>script</strong>, not a <strong>memory</strong>.</p>
<p>The real signal isn&rsquo;t <em>hearing</em> the story anymore; it&rsquo;s <strong>interrogating</strong> it. Ask <em>why</em>. Ask for the specific details. &lsquo;What <em>exact</em> query failed?&rsquo; &lsquo;Why Postgres and not MSSQL?&rsquo; An AI&rsquo;s story is perfect most of the time, but it collapses under deep, specific questioning.</p>
<p>And even then, where is the <strong>proof of history</strong>? An AI can&rsquo;t show you the messy GitHub commit history. It doesn&rsquo;t have three <em>real</em> former colleagues you can call for verification.</p>
<p>An AI can <em>talk</em> about the mess. It can&rsquo;t <em>prove</em> the mess. It has no <strong>verifiable bumps and bruises</strong>.</p>
<h3 id="whats-next">What&rsquo;s Next?</h3>
<p>I keep thinking about what&rsquo;s next. Honestly, I don&rsquo;t think it&rsquo;s going to be a <em>better</em> AI. I bet it&rsquo;s going to be the mess of tools and habits we come up with to <em>deal with</em> this AI flood.</p>
<p>For example, it seems like we&rsquo;ll go back to trusting things that are hard to fake. Why do I feel like a podcast is more real than a blog post lately? <strong>Because it takes effort.</strong> You can&rsquo;t just generate one in 10 seconds. We&rsquo;ll instinctively trust formats that are expensive in time and energy to produce.</p>
<p>This also means those hard-to-join communities suddenly become more valuable. Their annoying rules accidentally make for the best bot filters.</p>
<p>In the end, I guess it all comes down to <strong>origin.</strong> I find myself caring less about <em>how polished</em> something looks and more about <em>where it came from</em>. It wouldn&rsquo;t be surprising if we start seeing tools that can actually certify that a real human wrote a post, or that a photo was taken by a Sony camera.</p>
<h3 id="conclusion">Conclusion</h3>
<p>In the Age of AI, polish is no longer the signal. It&rsquo;s just the baseline.</p>
<p>A polished surface with no proof is the new <em>noise</em>. A messy <em>truth</em> with no polish is just sloppy work.</p>
<p>The real signal of trust isn&rsquo;t about <em>being</em> messy.</p>
<p>It&rsquo;s about proving that your polish is <strong>earned</strong>.</p>
]]></content:encoded></item><item><title>Boring Technology | Postgres is Your new Tech Stack</title><link>https://digitaldam.org/posts/postgres-is-your-new-tech-stack/</link><pubDate>Sat, 30 Aug 2025 07:07:07 +0100</pubDate><guid>https://digitaldam.org/posts/postgres-is-your-new-tech-stack/</guid><description>&lt;p&gt;Imagine we&amp;rsquo;re building a simple e-commerce site, &amp;ldquo;SimpleStore.&amp;rdquo; The initial planning meeting identifies our needs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A database for users, products, and orders. (Easy: &lt;strong&gt;Postgres&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;A way to send confirmation emails when an order is placed. (Okay, add a &lt;strong&gt;RabbitMQ&lt;/strong&gt; job queue).&lt;/li&gt;
&lt;li&gt;A cache for the homepage&amp;rsquo;s &amp;ldquo;Top 10 Products.&amp;rdquo; (Fine, add &lt;strong&gt;Redis&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;A full-text search bar. (Ugh. Add &lt;strong&gt;Elasticsearch&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;A nightly job to aggregate sales reports. (Spin up a &lt;strong&gt;Cron&lt;/strong&gt; server).&lt;/li&gt;
&lt;li&gt;A new AI feature to find &amp;ldquo;similar&amp;rdquo; products. (The VCs will love this! Add &lt;strong&gt;Pinecone&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Before writing a single feature, our architecture diagram is a mess. We have six different systems to provision, monitor, secure, and scale. We have what the team at Supabase calls &lt;strong&gt;&amp;ldquo;dotted line complexity&amp;rdquo;&lt;/strong&gt;—the invisible, brittle connections &lt;em&gt;between&lt;/em&gt; these systems that will inevitably break in production.&lt;/p&gt;</description><content:encoded><![CDATA[<p>Imagine we&rsquo;re building a simple e-commerce site, &ldquo;SimpleStore.&rdquo; The initial planning meeting identifies our needs:</p>
<ul>
<li>A database for users, products, and orders. (Easy: <strong>Postgres</strong>)</li>
<li>A way to send confirmation emails when an order is placed. (Okay, add a <strong>RabbitMQ</strong> job queue).</li>
<li>A cache for the homepage&rsquo;s &ldquo;Top 10 Products.&rdquo; (Fine, add <strong>Redis</strong>).</li>
<li>A full-text search bar. (Ugh. Add <strong>Elasticsearch</strong>).</li>
<li>A nightly job to aggregate sales reports. (Spin up a <strong>Cron</strong> server).</li>
<li>A new AI feature to find &ldquo;similar&rdquo; products. (The VCs will love this! Add <strong>Pinecone</strong>).</li>
</ul>
<p>Before writing a single feature, our architecture diagram is a mess. We have six different systems to provision, monitor, secure, and scale. We have what the team at Supabase calls <strong>&ldquo;dotted line complexity&rdquo;</strong>—the invisible, brittle connections <em>between</em> these systems that will inevitably break in production.</p>
<p>This is the case for collapsing your stack. Not by returning to a monolith, but by realizing that the &ldquo;boring&rdquo; tool you started with, PostgreSQL, can <strong>replace almost all of it.</strong></p>
<p>Let&rsquo;s rebuild the &ldquo;SimpleStore&rdquo; and see how.</p>
<h2 id="a-cohesive-example-building-simplestore-with-just-postgres">A Cohesive Example: Building &ldquo;SimpleStore&rdquo; with <em><strong>Just</strong></em> Postgres</h2>
<p>Instead of a sprawl, we&rsquo;ll build each feature by extending Postgres. The magic isn&rsquo;t just in <em>replacing</em> tools; it&rsquo;s in how the <em>connections</em> between them become simple and atomic.</p>
<h3 id="1-the-core-the-order-and-the-email-the-acid-test">1. The Core: The Order and the Email (The ACID Test)</h3>
<p>This is the most critical link. In a sprawled stack, you&rsquo;d have this dreaded code:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># The &#34;Dotted Line&#34; Failure Point</span>
</span></span><span class="line"><span class="cl"><span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">order</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">create_order</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>  <span class="c1"># Step 1: Postgres COMMIT</span>
</span></span><span class="line"><span class="cl">    <span class="n">queue</span><span class="o">.</span><span class="n">send_email_job</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>     <span class="c1"># Step 2: RabbitMQ PUSH</span>
</span></span><span class="line"><span class="cl"><span class="k">except</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># What if Step 2 fails? The user paid but gets no email.</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># What if Step 1 fails? The code is a mess.</span>
</span></span><span class="line"><span class="cl">    <span class="n">handle_complex_rollback</span><span class="p">()</span>
</span></span></code></pre></div><p>With Postgres, this is one atomic unit. We&rsquo;ll use the <code>FOR UPDATE SKIP LOCKED</code> pattern to create a powerful job queue <em>inside</em> the database.</p>
<p><strong>Step 1: Create the tables</strong></p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-SQL" data-lang="SQL"><span class="line"><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">product_id</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">user_id</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">created_at</span><span class="w"> </span><span class="n">TIMESTAMPTZ</span><span class="w"> </span><span class="k">DEFAULT</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">jobs</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">queue</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">DEFAULT</span><span class="w"> </span><span class="s1">&#39;default&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">payload</span><span class="w"> </span><span class="n">JSONB</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">status</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">DEFAULT</span><span class="w"> </span><span class="s1">&#39;queued&#39;</span><span class="p">,</span><span class="w"> </span><span class="c1">-- queued, running, failed
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">run_at</span><span class="w"> </span><span class="n">TIMESTAMPTZ</span><span class="w"> </span><span class="k">DEFAULT</span><span class="w"> </span><span class="n">NOW</span><span class="p">()</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p>Step 2: The Magic (One Transaction)</p>
<p>Now, our application logic becomes beautifully simple:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-SQL" data-lang="SQL"><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">BEGIN</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">-- Insert the order
</span></span></span><span class="line"><span class="cl"><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="p">(</span><span class="n">product_id</span><span class="p">,</span><span class="w"> </span><span class="n">user_id</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">123</span><span class="p">,</span><span class="w"> </span><span class="mi">456</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">RETURNING</span><span class="w"> </span><span class="n">id</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">-- Use the returned ID (hardcoded as 1 here; the application substitutes the real value) to create a job IN THE SAME TRANSACTION
</span></span></span><span class="line"><span class="cl"><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">jobs</span><span class="w"> </span><span class="p">(</span><span class="n">queue</span><span class="p">,</span><span class="w"> </span><span class="n">payload</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">&#39;emails&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;{&#34;type&#34;: &#34;order_confirmation&#34;, &#34;order_id&#34;: 1}&#39;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">COMMIT</span><span class="p">;</span><span class="w">
</span></span></span></code></pre></div><p>This entire block either succeeds or fails <strong>together</strong>. An order can never be committed without its corresponding email job, and no job can exist for an order that was rolled back. That guarantee is very hard to get when the queue lives in a separate system.</p>
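<p>A Go or Python worker can now pull from this queue with a simple, highly concurrent query. <code>FOR UPDATE SKIP LOCKED</code> makes each worker skip rows that another worker has already locked, and the lock lasts only until the worker&rsquo;s transaction ends, so run this inside the transaction that processes the job:</p>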
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-SQL" data-lang="SQL"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">payload</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">jobs</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;queued&#39;</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">run_at</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">FOR</span><span class="w"> </span><span class="k">UPDATE</span><span class="w"> </span><span class="n">SKIP</span><span class="w"> </span><span class="n">LOCKED</span><span class="p">;</span><span class="w">
</span></span></span></code></pre></div><table>
  <thead>
      <tr>
          <th style="text-align: left"><strong>Feature</strong></th>
          <th style="text-align: left"><strong>PostgreSQL (The &ldquo;Boring&rdquo; Way)</strong></th>
          <th style="text-align: left"><strong>Dedicated Stack (RabbitMQ / Kafka)</strong></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><strong>Data Integrity</strong></td>
          <td style="text-align: left"><strong>Perfect (Atomic).</strong> The job and the order are in one transaction.</td>
          <td style="text-align: left"><strong>Poor (Eventual).</strong> Requires complex two-phase commits or retry logic.</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Complexity</strong></td>
          <td style="text-align: left">Low. It&rsquo;s just another table in your schema.</td>
          <td style="text-align: left">High. A separate, complex system to manage, monitor, and scale.</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Throughput</strong></td>
          <td style="text-align: left">Moderate. Excellent for most apps, but not Kafka-scale.</td>
          <td style="text-align: left">Extremely High. Built for massive event streaming.</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Verdict</strong></td>
          <td style="text-align: left"><strong>Wins for 90% of apps.</strong> The trade-off for lower <em>peak</em> throughput is massive gains in simplicity and reliability.</td>
          <td style="text-align: left"></td>
      </tr>
  </tbody>
</table>
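<p>The worker query above only <em>claims</em> a row; the worker still has to run the job and settle it before the lock is released. A minimal Python sketch of that loop body follows. It assumes a psycopg2-style DB-API connection (so <code>%s</code> placeholders and an implicitly opened transaction), and it assumes finished jobs are simply deleted; <code>process_one_job</code> and <code>handler</code> are hypothetical names, not part of the schema above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">CLAIM_SQL = (
    "SELECT id, payload FROM jobs WHERE status = 'queued' "
    "ORDER BY run_at LIMIT 1 FOR UPDATE SKIP LOCKED"
)
# Assumption: a finished job is deleted rather than marked as done.
DELETE_SQL = "DELETE FROM jobs WHERE id = %s"


def process_one_job(conn, handler):
    """Claim one job, run it, and delete it, all in one transaction.

    `conn` is assumed to be a psycopg2-style DB-API connection.
    Returns False when no unlocked queued job is available.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        if row is None:
            conn.rollback()
            return False
        job_id, payload = row
        handler(payload)  # the row stays locked while the handler runs
        cur.execute(DELETE_SQL, (job_id,))
        conn.commit()
        return True
    except Exception:
        # Rolling back releases the row lock; the job stays queued.
        conn.rollback()
        raise
    finally:
        cur.close()
</code></pre></div><p>Because the handler runs while the row lock is held, a crashed worker rolls back automatically and the job becomes visible to other workers again. The trade-off is that a slow handler keeps a transaction open for as long as it runs, which is one reason to keep jobs short.</p>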
<h3 id="2-the-homepage-caching-top-products-replacing-redis">2. The Homepage: Caching Top Products (Replacing Redis)</h3>
<p>Our homepage needs to show the Top 10 products. This query is slow, so we need a cache. Instead of adding Redis, we&rsquo;ll use an <strong>UNLOGGED TABLE</strong>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-SQL" data-lang="SQL"><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="n">UNLOGGED</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">cache_top_products</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">name</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">sales_count</span><span class="w"> </span><span class="nb">BIGINT</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">cached_at</span><span class="w"> </span><span class="n">TIMESTAMPTZ</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p>An UNLOGGED table does not write its data to the Write-Ahead Log (WAL), which makes writes considerably faster. The catch? After a crash or unclean shutdown, <strong>the table is automatically truncated</strong>, and its contents are not replicated to standby servers.</p>
<p>That is essentially the durability model of Redis with persistence turned off. It&rsquo;s a fast, non-durable store for transient data.</p>
<table>
  <thead>
      <tr>
          <th style="text-align: left"><strong>Feature</strong></th>
          <th style="text-align: left"><strong>PostgreSQL (UNLOGGED TABLE)</strong></th>
          <th style="text-align: left"><strong>Dedicated Stack (Redis)</strong></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td style="text-align: left"><strong>Speed</strong></td>
          <td style="text-align: left">Very Fast. Not <em>as</em> fast as in-memory, but adds no extra service or connection to talk to.</td>
          <td style="text-align: left">Extremely Fast. In-memory, sub-millisecond latency.</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Durability</strong></td>
          <td style="text-align: left"><strong>None (by design).</strong> Wiped on crash.</td>
          <td style="text-align: left"><strong>None (by design).</strong> Wiped on crash (unless persistence is on).</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Simplicity</strong></td>
          <td style="text-align: left"><strong>High.</strong> It&rsquo;s just a SQL table. No new clients, ports, or auth.</td>
          <td style="text-align: left">Low. A separate service to manage, secure, and connect to.</td>
      </tr>
      <tr>
          <td style="text-align: left"><strong>Verdict</strong></td>
          <td style="text-align: left"><strong>Wins for most caching.</strong> You trade a few microseconds of latency for a huge reduction in stack complexity.</td>
          <td style="text-align: left"></td>
      </tr>
  </tbody>
</table>
<h3 id="3-the-search-bar--reports-replacing-elasticsearch--cron">3. The Search Bar &amp; Reports: (Replacing Elasticsearch &amp; Cron)</h3>
<p>We can continue this pattern for our other features:</p>
<ul>
<li><strong>Search Bar:</strong> Instead of Elasticsearch, we use Postgres&rsquo;s built-in <strong>Full-Text Search</strong>.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-SQL" data-lang="SQL"><span class="line"><span class="cl"><span class="c1">-- Add a tsvector column for product descriptions
</span></span></span><span class="line"><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">products</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">COLUMN</span><span class="w"> </span><span class="n">search_vector</span><span class="w"> </span><span class="n">tsvector</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">-- Backfill existing rows (a trigger or generated column keeps it updated afterwards)
</span></span></span><span class="line"><span class="cl"><span class="k">UPDATE</span><span class="w"> </span><span class="n">products</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">search_vector</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">to_tsvector</span><span class="p">(</span><span class="s1">&#39;english&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">description</span><span class="p">)</span><span class="w"> </span><span class="p">...</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="c1">-- Search is now a simple query (a GIN index on search_vector makes it fast)
</span></span></span><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">products</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">search_vector</span><span class="w"> </span><span class="o">@@</span><span class="w"> </span><span class="n">to_tsquery</span><span class="p">(</span><span class="s1">&#39;english&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;shiny &amp; leather&#39;</span><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><ul>
<li><strong>Nightly Reports:</strong> Instead of a cron server (a single point of failure), we use the <strong>pg_cron</strong> extension.</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-SQL" data-lang="SQL"><span class="line"><span class="cl"><span class="c1">-- Run a job every night at 3 AM
</span></span></span><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">cron</span><span class="p">.</span><span class="n">schedule</span><span class="p">(</span><span class="s1">&#39;nightly-sales-report&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;0 3 * * *&#39;</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="err">$$</span><span class="w"> </span><span class="k">CALL</span><span class="w"> </span><span class="n">generate_sales_report</span><span class="p">();</span><span class="w"> </span><span class="err">$$</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">);</span><span class="w">
</span></span></span></code></pre></div><p>The best part? If you have a High-Availability (HA) Postgres setup, your cron job is <em>also</em> HA. It&rsquo;s no longer a fragile script on one server.</p>
<h3 id="the-all-important-caveat-the-good-enough-trap">The All-Important Caveat: The &ldquo;Good Enough&rdquo; Trap</h3>
<p>This approach is pragmatic, not dogmatic. The &ldquo;Postgres for Everything&rdquo; mindset doesn&rsquo;t mean <em>never</em> using another tool. It means <strong>you add a new tool only when Postgres is no longer &ldquo;good enough.&rdquo;</strong></p>
<p>The &ldquo;Boring Technology&rdquo; choice wins when it&rsquo;s 80% as good as the &ldquo;Best&rdquo; tool, but 10x simpler to operate.</p>
<p>A perfect, modern example is <strong>pgvector vs. a Dedicated Vector DB (like Pinecone or Milvus)</strong>.</p>
<ul>
<li><strong>When pgvector is &ldquo;Good Enough&rdquo;:</strong> You have 50,000 product vectors for your &ldquo;similar items&rdquo; feature. pgvector will handle this <em>beautifully</em>. You can JOIN user data with vector data in one query. The simplicity is a massive win.</li>
<li><strong>When pgvector Breaks Down:</strong> You are building the next ChatGPT and need to query 100 <em>million</em> vectors with 99.9% recall and 20 ms latency. pgvector is likely to struggle here. It wasn&rsquo;t built for this: its HNSW index is less tuned than a purpose-built engine, and the Postgres query planner wasn&rsquo;t designed for vector-first workloads. At this scale, the &ldquo;dotted line&rdquo; to a dedicated, purpose-built Vector DB is not a liability; it is a <strong>necessity</strong>.</li>
</ul>
<h3 id="conclusion-your-job-is-to-fight-complexity">Conclusion: Your Job Is to Fight Complexity</h3>
<p>We collapsed our &ldquo;SimpleStore&rdquo; stack from six complex services into one robust database.</p>
<p><strong>The benefits are not just theoretical; they are immediate:</strong></p>
<ol>
<li><strong>Atomic Integrity:</strong> You gain ACID guarantees across your entire workflow (e.g., Orders + Jobs).</li>
<li><strong>Reduced Cognitive Load:</strong> A new developer doesn&rsquo;t need to learn six systems. They just need to know SQL.</li>
<li><strong>Lower Operational Cost:</strong> You monitor, back up, and secure <em>one</em> thing.</li>
<li><strong>Faster Development:</strong> You can stop writing &ldquo;glue code&rdquo; for the dotted lines and start building features.</li>
</ol>
<p>&ldquo;PostgreSQL for Everything&rdquo; isn&rsquo;t a silver bullet. It&rsquo;s a maxim against premature optimization and over-engineering. It&rsquo;s a reminder that your primary job isn&rsquo;t just to <em>add</em> technology, but to <em>cull</em> complexity. And Postgres is the best tool for that job.</p>
]]></content:encoded></item><item><title>Boring Technology | My Trip to "Microservices Hell" (and Why I Often Take the Monolith Instead)</title><link>https://digitaldam.org/posts/my-trip-to-microservces-hell/</link><pubDate>Sat, 23 Aug 2025 07:07:07 +0100</pubDate><guid>https://digitaldam.org/posts/my-trip-to-microservces-hell/</guid><description>Forget the hype. This post is my breakdown of the real pain of microservices (Sagas, network taxes) and why the boring Modular Monolith is often the smarter choice.</description><content:encoded><![CDATA[<p>As an architect, I&rsquo;ve seen teams enthusiastically adopt microservices, sold on the dream of &ldquo;infinite scale&rdquo; and &ldquo;team autonomy.&rdquo; I&rsquo;ve also seen those same teams a year later, drowning in complexity, wondering why it takes six weeks to add a new feature.</p>
<p>&ldquo;Microservices Hell&rdquo; is real, and the rent is high. It’s the state you reach when your <em>plumbing</em> is infinitely more complex than the <em>business logic</em> it&rsquo;s supposed to support.</p>
<p>Based on my experience, here’s what that journey into hell <em>really</em> looks like.</p>
<h2 id="1-the-eventual-consistency-headache-aka-the-death-of-acid">1. The &ldquo;Eventual Consistency&rdquo; Headache (aka The Death of ACID)</h2>
<p>The first thing that hits you is the database.</p>
<p>I remember a project where we had a critical flow: <code>Create Order</code> → <code>Update Inventory</code> → <code>Process Payment</code>. In a monolith, this is a single, beautiful <strong>database transaction</strong>. It&rsquo;s atomic. It&rsquo;s safe. It just <em>works</em>.</p>
<p>In microservices, this is now 3 services, probably with 3 separate databases. You can&rsquo;t have a transaction. You are now forced to write <strong>compensating logic</strong> (also known as a Saga, but it&rsquo;s basically &ldquo;code to undo code&rdquo;).</p>
<p>This &ldquo;compensating logic&rdquo; is a massive source of bugs. You&rsquo;re now living in the land of <strong>&ldquo;eventual consistency,&rdquo;</strong> which is just a polite way of saying, &ldquo;Your data is currently wrong, but we&rsquo;ll probably fix it&hellip; eventually.&rdquo; For any FinTech or HealthTech system, this is a non-starter.</p>
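<p>To make &ldquo;code to undo code&rdquo; concrete, here is a minimal Python sketch of an orchestrated saga. It is an illustration only: <code>run_saga</code> is a hypothetical helper, each step is assumed to be a pair of plain callables, and a real saga would also need durable state so it can resume after a crash.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, the compensations of the steps that already
    succeeded are run in reverse order, then the error is re-raised.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                # If an undo itself fails, the data stays inconsistent.
                # This sketch does not handle that case.
                undo()
            raise
        completed.append(compensate)
</code></pre></div><p>Every step now needs a hand-written inverse, and the comment inside the loop marks the case a database transaction handles for free: what to do when the rollback itself fails.</p>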
<p>Many teams try to &ldquo;hack&rdquo; this by using a <strong>shared database</strong>. Don&rsquo;t do that. You&rsquo;ve just created a monster: hidden coupling between services and a single point of failure.</p>
<h2 id="2-the-network-tax-aka-my-function-call-is-now-a-bug">2. The Network Tax (aka &ldquo;My Function Call is Now a Bug&rdquo;)</h2>
<p>When you trade reliable in-memory function calls for unreliable <code>HTTP</code> calls, you pay a heavy tax. Every developer on your team must now become a distributed systems expert, whether they like it or not.</p>
<p>Every. Single. Call. must handle:</p>
<ul>
<li><strong>Timeouts:</strong> What happens when the <code>UserService</code> just&hellip; doesn&rsquo;t answer?</li>
<li><strong>Retries:</strong> If you retry, was the request idempotent? If not, congrats: you just charged the customer twice.</li>
<li><strong>Circuit Breakers:</strong> You <em>must</em> implement these to stop one dead service (<code>InventoryService</code>) from killing every <em>other</em> service that calls it in a &ldquo;cascading failure&rdquo;.</li>
</ul>
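<p>The last item on that list is small enough to sketch. The Python below is a minimal circuit breaker, not a production implementation: <code>CircuitBreaker</code> and <code>CircuitOpenError</code> are hypothetical names, the thresholds are arbitrary, and a real service would more likely use an existing library or the service mesh.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without trying it."""


class CircuitBreaker:
    """Opens after N consecutive failures, then allows one trial call
    once the cool-down period has elapsed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.reset_after > self.clock() - self.opened_at:
                # Still cooling down: fail fast instead of waiting on a timeout.
                raise CircuitOpenError("circuit open, call skipped")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
</code></pre></div><p>A caller would wrap each outbound request, for example <code>breaker.call(fetch_inventory, sku)</code> with a hypothetical <code>fetch_inventory</code> function, and treat <code>CircuitOpenError</code> as &ldquo;dependency unavailable&rdquo; rather than waiting on another timeout.</p>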
<p>Cognitive load skyrockets. And it gets worse when teams get <strong>&ldquo;Service-Mania&rdquo;</strong>. I’ve seen teams of 10 engineers trying to maintain 40 services. A new feature? &ldquo;Create a new service!&rdquo; And now a simple feature requires deploying 3 services at the same time. You&rsquo;ve just rebuilt your monolith, but over a slow network link.</p>
<h2 id="3-the-observability-tax">3. The Observability Tax</h2>
<p>This is the part that kills velocity. Remember when you had <em>one</em> log file? Good times.</p>
<p>Now, a single user click might touch 10 different services. When it fails, you&rsquo;re not debugging; you&rsquo;re playing a distributed game of <em>Clue</em>.</p>
<p>Before you can even <em>start</em> writing features, you&rsquo;re forced to pay the <strong>&ldquo;Observability Tax&rdquo;</strong>:</p>
<ol>
<li><strong>Distributed Tracing</strong> (e.g., Jaeger) to follow a request.</li>
<li><strong>Centralized Logging</strong> (e.g., ELK Stack) to find the logs.</li>
<li><strong>Metrics Aggregation</strong> (e.g., Prometheus) to see what&rsquo;s on fire.</li>
</ol>
<p>And this isn&rsquo;t just a production problem; it destroys your <strong>development environments</strong>. How can you run 40 services on a developer&rsquo;s laptop? How do you test E2E (End-to-End) when you can&rsquo;t even be sure <em>which version</em> of a service is running?</p>
<h2 id="4-the-people--management-hell">4. The <em>People</em> &amp; <em>Management</em> Hell</h2>
<p>But the <em>worst</em> tax isn&rsquo;t technical. It&rsquo;s the <em>people</em> tax.</p>
<ul>
<li><strong>An Engineer-to-Service Ratio Gone Wild:</strong> I’ve seen teams with 4-5 services per engineer. This isn&rsquo;t &ldquo;autonomy&rdquo;; it&rsquo;s &ldquo;burnout.&rdquo; One person is now the operator, debugger, and on-call for half a dozen systems.</li>
<li><strong>&ldquo;Resume-Driven Development&rdquo;:</strong> When &ldquo;autonomy&rdquo; means &ldquo;anarchy,&rdquo; you get a service in Kotlin, one in Go, and one in Rust that only one person understands. When that person leaves, you&rsquo;ve &ldquo;orphaned&rdquo; a part of your system.</li>
<li><strong>Architecture That Mirrors the Org Chart:</strong> Your architecture will inevitably look like your company&rsquo;s org chart. This is fine&hellip; until the <strong>reorg</strong>. And the company <em>always</em> reorgs. Suddenly, the &ldquo;Payments&rdquo; team is split in two, but all the infrastructure, namespaces, and IAM policies are still tangled together. You&rsquo;ve just signed yourself up for a painful, long-term migration project that delivers zero value to the customer.</li>
</ul>
<hr>
<h2 id="so-when-is-the-boring-monolith-done-right-just-better">So&hellip; When is the &ldquo;Boring&rdquo; Monolith (Done Right) Just Better?</h2>
<p>A well-structured <strong>Modular Monolith</strong> (decoupled modules in a single codebase) isn&rsquo;t &ldquo;legacy.&rdquo; It&rsquo;s a pragmatic, often superior, choice. In my experience, the monolith wins hands-down in these cases:</p>
<ol>
<li>
<p><strong>When Transactional Integrity (ACID) is King:</strong>
If you&rsquo;re building FinTech, HealthTech, or a complex ERP, your business <em>must</em> be 100% consistent. The simplicity and reliability of a real database transaction are non-negotiable. Don&rsquo;t trade this for compensating logic.</p>
</li>
<li>
<p><strong>When You Are an Early-Stage Product (Speed-to-Market is King):</strong>
Your biggest risk isn&rsquo;t <em>scale</em>; it&rsquo;s <em>building the wrong thing</em>. A Modular Monolith lets you move incredibly fast. Refactoring a module <em>inside</em> a monolith is 100x easier than refactoring 10 microservices that you defined incorrectly.</p>
</li>
<li>
<p><strong>When You Are a Small-to-Medium Team (1-20 Engineers):</strong>
Microservices are a tool to solve <em>people</em> scaling. If you&rsquo;re one team, microservices will kill your velocity with meetings about API contracts. A monolith lets your team just&hellip; <em>code</em>.</p>
</li>
<li>
<p><strong>When You Don&rsquo;t Have a Dedicated Platform Team:</strong>
Choosing microservices without a dedicated SRE/Platform team is like buying a Formula 1 car to go grocery shopping. It&rsquo;s expensive, incredibly hard to drive, and you&rsquo;re going to crash.</p>
</li>
</ol>
<h2 id="my-parting-advice">My Parting Advice</h2>
<p><strong>Start with a Modular Monolith.</strong></p>
<p>Design it with clean boundaries, communicate between modules via interfaces, and <em>do not share database tables</em> between modules. This gives you 90% of the benefits of microservices (decoupling) with 10% of the operational cost.</p>
<p>Only extract a module into its own microservice when you have a <em>clear, painful, and obvious</em> reason (like asymmetric scaling needs or a new tech stack). Microservices are a <em>refactoring</em> step, not a starting point.</p>
]]></content:encoded></item><item><title>Compliance is Not Security: The HIPAA Compliant Illusion</title><link>https://digitaldam.org/posts/compliance-is-not-security-the-hipaa-compliant-illusion/</link><pubDate>Wed, 20 Aug 2025 07:07:07 +0100</pubDate><guid>https://digitaldam.org/posts/compliance-is-not-security-the-hipaa-compliant-illusion/</guid><description>Compliance is not security. Learn why your HIPAA compliant EHR is vulnerable to insider threats &amp;amp; how to fix it with a context-aware ABAC architecture</description><content:encoded><![CDATA[<p>I was on a technical due diligence call with a CTO. He&rsquo;d already reviewed our profile.</p>
<p>&ldquo;Look,&rdquo; he said, skipping the pleasantries. &ldquo;Your deck says &lsquo;HIPAA compliant&rsquo; and &lsquo;Security is in Our DNA&rsquo;. Every vendor says that. My real concern isn&rsquo;t a hacker from outside; it&rsquo;s an employee. Someone curious, or someone careless.&rdquo;</p>
<p>He leaned in. &ldquo;How do you <em>actually</em> stop a logged-in, authenticated doctor from getting curious and pulling the record of another doctor&rsquo;s patient?&rdquo;</p>
<p>He was right to ask. That&rsquo;s the <em>real</em> question.</p>
<p>He wasn&rsquo;t asking about a checklist. He was asking about <em>architecture</em>. His concern is the single biggest failure point I see in &ldquo;compliant&rdquo; systems, and it stems from a fundamental, &ldquo;context-blind&rdquo; design.</p>
<p>Here’s the deep dive on the problem and the specific architecture we use to solve it.</p>
<hr>
<h3 id="the-compliant-flaw-context-blind-architecture">The &ldquo;Compliant&rdquo; Flaw: Context-Blind Architecture</h3>
<p>Most systems fail this test because they are built &ldquo;happy-path&rdquo; first. The &ldquo;happy path&rdquo; assumes a <code>Doctor</code> is a trusted entity. The architecture then follows the path of least resistance.</p>
<p>The &ldquo;compliant&rdquo; checklist only requires:</p>
<ol>
<li><strong>Encryption:</strong> AES-256 at rest, TLS in transit. <em>Check.</em></li>
<li><strong>Authentication:</strong> The user is logged in via OAuth/SAML. <em>Check.</em></li>
<li><strong>Role (RBAC):</strong> The user&rsquo;s JWT carries <code>Role: Doctor</code>. <em>Check.</em></li>
<li><strong>Logging:</strong> The access is written to a log file. <em>Check.</em></li>
</ol>
<p>The resulting API is predictable:
<code>GET /api/v1/patients/{patientId}</code></p>
<p>The code&rsquo;s &ldquo;security&rdquo; logic, often in a single middleware, just checks: &ldquo;Does this user&rsquo;s token have the <code>Doctor</code> role?&rdquo; If yes, access is granted.</p>
<p>This is <strong>context-blind</strong>. It confirms the user&rsquo;s <em>role</em>, but not their <em>relationship</em> to the data. It can&rsquo;t tell the difference between &ldquo;their&rdquo; patient and &ldquo;any&rdquo; patient. This isn&rsquo;t just limited to patient files. What about:</p>
<p><code>GET /api/v1/doctors/{doctorId}/schedule</code></p>
<p>What stops <code>dr_smith</code> from querying <code>dr_jones</code>&rsquo;s schedule to see all his patient names for the day? This simple IDOR (Insecure Direct Object Reference) vulnerability is a direct result of a lazy, role-based architecture.</p>
<hr>
<h3 id="the-architectural-fix-context-aware-security">The Architectural Fix: Context-Aware Security</h3>
<p>You cannot solve this by adding &ldquo;more compliance.&rdquo; You must fix the architecture.</p>
<h4 id="1-from-rbac-to-abac-the-who-vs-why">1. From RBAC to ABAC: The &ldquo;Who&rdquo; vs. &ldquo;Why&rdquo;</h4>
<p>RBAC (Role-Based Access Control) fails because it&rsquo;s static. A <em>role</em> doesn&rsquo;t understand <em>relationships</em>.</p>
<p>We enforce <strong>ABAC (Attribute-Based Access Control)</strong>. This model is &ldquo;context-aware&rdquo; and dynamic.</p>
<ul>
<li><strong>RBAC asks:</strong> &ldquo;Is this user a Doctor?&rdquo;</li>
<li><strong>ABAC asks:</strong> &ldquo;Is this Doctor <em>currently treating</em> this Patient <em>for an active case</em>?&rdquo;</li>
</ul>
<p>The security policy isn&rsquo;t just <code>Allow if Role == Doctor</code>. The policy, enforced in code, becomes:</p>
<p><code>Allow 'Read' IF: Subject.Role == 'Doctor' AND Resource.PatientID IN Subject.ActivePatientList AND Action.Purpose == 'ActiveTreatment'</code></p>
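<p>As a rough illustration of how that rule reads once it is code, here is a minimal Python sketch. The <code>Subject</code> and <code>Decision</code> types and the <code>authorize_read</code> function are hypothetical names chosen for the example; a real deployment would more likely express this in the policy language of its gateway or policy engine.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">from dataclasses import dataclass, field


@dataclass(frozen=True)
class Subject:
    user_id: str
    role: str
    # Assumption: the active patient list is already loaded for this user.
    active_patient_ids: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str  # kept so that denials can be logged with their cause


def authorize_read(subject, patient_id, purpose):
    """Evaluate the three conditions of the policy, in order."""
    if subject.role != "Doctor":
        return Decision(False, "Subject.Role is not 'Doctor'")
    if patient_id not in subject.active_patient_ids:
        return Decision(
            False, "Resource.PatientID not found in Subject.ActivePatientList"
        )
    if purpose != "ActiveTreatment":
        return Decision(False, "Action.Purpose is not 'ActiveTreatment'")
    return Decision(True, "All policy conditions satisfied")
</code></pre></div><p>Returning a reason alongside the verdict is a deliberate choice in this sketch: the same string can go straight into the audit trail when a request is denied.</p>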
<p><strong>Implementation Detail:</strong>
This policy isn&rsquo;t just documentation. It&rsquo;s code, ideally enforced at the API Gateway (like Kong or Apigee) or in a service mesh sidecar (like Istio), <em>before</em> the request even hits the application.</p>
<p>And to answer the next logical question—performance—that <code>ActivePatientList</code> isn&rsquo;t a live <code>JOIN</code> query on every API call. That would be a database killer. It&rsquo;s a denormalized, read-optimized cache (e.g., a Redis set) that is populated by the &ldquo;Admissions&rdquo; or &ldquo;Scheduling&rdquo; service. The cache is updated via events (e.g., <code>PatientAdmitted</code>, <code>PatientDischarged</code>) with a clear TTL. Security must be performant, or developers <em>will</em> find a way to bypass it.</p>
<h4 id="2-from-audit-logs-to-forensic-ready-observability">2. From &ldquo;Audit Logs&rdquo; to &ldquo;Forensic-Ready Observability&rdquo;</h4>
<p>Most &ldquo;compliant&rdquo; audit logs are a data lake of noise, built only to satisfy an auditor. They say:
<code>[Timestamp] User 'dr_smith' accessed Patient '12345'.</code></p>
<p>This is useless for security. It&rsquo;s indistinguishable from a million other valid events. You can&rsquo;t build anomaly detection on it.</p>
<p>A secure log must capture <strong>&ldquo;access decisions&rdquo;</strong> for true observability.</p>
<p><strong>The &ldquo;Denial&rdquo; Log:</strong>
When that curious doctor tries to access <code>12346</code> and our ABAC policy blocks it, <em>this</em> is what we log:</p>
<p><code>[Timestamp] Subject 'dr_smith' (IP: x.x.x.x) attempted 'Read' on Resource '12346'.</code>
<code>Decision: DENIED.</code>
<code>Reason: 'Policy Violation: Resource.PatientID not found in Subject.ActivePatientList'.</code></p>
<p>This log doesn&rsquo;t just go to a file. It is a high-priority event piped directly to a SIEM (like Splunk or Sentinel). You can now write a simple rule: <code>ALERT if a single Subject.UserID produces &gt; 10 'Decision: DENIED' events in 1 minute.</code> That rule surfaces a likely insider threat while it is still in progress.</p>
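<p>The counting rule above can be sketched in a few lines of Python. This illustrates the sliding-window logic only, not the query syntax of any particular SIEM; <code>DenialRateAlert</code> is a hypothetical name, and timestamps are assumed to be seconds that arrive in non-decreasing order.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">from collections import defaultdict, deque


class DenialRateAlert:
    """Flags a user once they exceed `threshold` DENIED decisions
    inside a sliding window of `window_seconds`."""

    def __init__(self, threshold=10, window_seconds=60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._denials = defaultdict(deque)  # user_id -> denial timestamps

    def record_denial(self, user_id, timestamp):
        """Record one DENIED event; return True if an alert should fire."""
        events = self._denials[user_id]
        events.append(timestamp)
        # Drop events that have fallen out of the window.
        while events and timestamp - events[0] >= self.window_seconds:
            events.popleft()
        return len(events) > self.threshold
</code></pre></div><p>Counting per user matters here: ten denials spread across ten doctors is normal noise, while eleven from one account inside a minute is a pattern worth paging someone about.</p>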
<p><strong>The &ldquo;Break-the-Glass&rdquo; Log:</strong>
What about emergencies? A doctor <em>needs</em> to access a file outside their normal context. A secure system must allow this via an explicit &ldquo;break-glass&rdquo; function. But logging this is even <em>more</em> critical:</p>
<p><code>[Timestamp] Subject 'dr_smith' accessed Resource '12347'.</code>
<code>Decision: GRANTED (EMERGENCY OVERRIDE).</code>
<code>Purpose: 'User-provided reason: Cardiac arrest in ER'.</code></p>
<p>This <em>also</em> triggers an alert, but a different one: <code>P2 - Post-Incident Review Required</code>. The system is secure, usable, and—most importantly—<em>accountable</em>.</p>
<hr>
<h3 id="key-technical-questions">Key Technical Questions</h3>
<p>Compliance is the floor, not the ceiling.</p>
<p>Don&rsquo;t ask your partner if they can &ldquo;pass a checklist.&rdquo; You&rsquo;re not buying a checklist. You&rsquo;re buying an architecture that protects you when the checklist fails.</p>
<p>Ask them these questions instead:</p>
<ol>
<li>&ldquo;Show me your data model for authorization. Is it role-based or attribute/relationship-based?&rdquo;</li>
<li>&ldquo;How do you log an <em>authorization failure</em> versus an <em>authentication failure</em>?&rdquo;</li>
<li>&ldquo;How do you handle &lsquo;break-the-glass&rsquo; scenarios in your logging and alerting pipeline?&rdquo;</li>
</ol>
]]></content:encoded></item></channel></rss>