Posts on Digital Dam

How to Scam Your Client with "Resume-Driven Development" as a Service

Mon, 08 Sep 2025 07:07:07 +0100

My favorite kind of call is the rescue project.

A new client comes to us, frustrated. They just paid a ‘modern’ tech agency for a platform that’s completely unmaintainable.

We pop the hood, and it’s magnificent.

It’s a state-of-the-art “Cloud-Native,” “AI-Powered,” “Event-Driven,” “Serverless” system. A stunning monument to modern engineering, designed to handle 10 million concurrent users for a B2B app that has 500.

The previous agency didn’t solve the client’s problem. They solved their own problem: how to get “GenAI,” “Kubernetes,” and “VectorDB” onto their developers’ resumes.

Welcome to the most toxic trend in our industry: Resume-Driven Development as a Service (RDDaaS).

The RDDaaS Playbook (How to Scam Your Client)

It’s a brilliant, cynical business model, and it works like this:

Step 1: The Future-Proof Pitch

You show the non-technical client a beautiful slide deck. You blind them with charts showing massive ROI, impressive KPIs, and tech buzzwords like “infinitely scalable,” “AI-powered,” and “Future GenAI-ready.”

Step 2: The Training Ground

Your mid-level developers, who have only read about “RAG Pipelines” and “Kubernetes,” now get to learn them right on the client’s dime.

Step 3: The Over-Engineering Phase

The project, which should have been a 3-month simple CRUD app, now takes 9 months. They’re not solving the client’s problem. They’re solving Google’s problem and Meta’s problem.

Step 4: The Successful Handoff

The system is delivered. The chatbot confidently hallucinates the wrong phone number. The vendor’s developers proudly update their LinkedIn profiles. The vendor gets paid.

Step 5: The Victim’s New Life

The client is now the proud owner of an intelligent thing that requires a team of ex-FAANG SREs just to add a new form field.

Their new life includes:

The Hiring Nightmare: Their in-house “IT guy,” John, quits after seeing a Terraform script that spawns 15 AWS services (EKS, Lambda, VectorDBs…). The first real candidate who understands this mess demands $250k.
The Wrong Web-Scale Performance: The app is slow. A simple request now makes 5 HTTP calls through a service mesh and a $0.02 call to gpt-4-turbo just to say “Hello.”
The “WTF” Cloud Bill: The first few bills look okay (thanks, AWS Free Tier). Then, Month 4 hits. (Trust me, you really don’t want this.) The real bill arrives, full of NAT Gateways, EKS Control Planes, Managed Vector DBs (99% empty), and OpenAI API fees. Their plumbing costs 20x more than their app.
The Simple Change Request: The client asks: “Can we just add a normal search bar? The AI one is too expensive.”
- In the old monolith: 1 hour.
- In this modern system: “Uh, that’s not how the RAG pipeline works. We’d have to re-architect the whole data flow. That’ll be 2 new sprints.”

The Root Cause: Why Did This Happen?

Why does this scam always work? It takes two:

1. The Vendor’s Hidden Agenda: The client isn’t the customer. They’re the training ground. The agency didn’t solve the client’s problem (“I need a reliable app”). They solved their own problem (“Our developers need ‘GenAI’ and ‘K8s’ on their resumes”).

2. The Client’s Blinders: The client lets this happen. They get blinded by the ROI slides and tech buzzwords. They’re so terrified of being “legacy” that they actively forget to ask the two boring, critical questions:

“What is my actual problem?”
“What’s the Year 2 maintenance and cloud bill going to look like?”

The architecture was optimized for Imaginary Scale and Imaginary Intelligence, not Current Maintainability.

How to Not Get Scammed

This isn’t just a rant. This is a pattern I have seen many times. It’s the entire reason my philosophy is built on Boring Technology.

The antidote to RDDaaS is to stop being impressed by buzzwords and start asking the questions that actually matter.

Next time a vendor pitches you a “Cloud-Native AI-Powered” solution, just ask them these things:

“Can you justify why we need Kubernetes for this?”
“Walk me through the full development process—from ticket to deployment—for adding one new database field to the ‘User’ model.”
“What is the fallback mechanism when the AI/RAG pipeline fails or hallucinates?”
“Show me the Year 2 cloud bill for this architecture.”
“What kind of engineer do I need to hire to maintain this after you leave?”

You don’t need FAANG-scale complexity. You need Product-Market Fit.

Start buying maintainable solutions that actually let you find it.

Boring Technology | Your AI is the 1% (Don't Forget the 99%)

Fri, 05 Sep 2025 07:07:07 +0100

I’m seeing a worrying pattern lately. Almost every product discussion now starts with, “So, how are we using AI for this?”

We’re all a bit drunk on the hype. We’re treating AI like magic dust we can just sprinkle on any problem.

Clients want a YouTube- or Netflix-level recommendation engine on day one. Devs, quite reasonably, are excited to put the shiniest new Vector DB on their resumes.

We’re starting backward. We’re trying to build the penthouse while the foundation is still a sketch on a napkin.

The reason this “penthouse-first” approach fails is that we’re ignoring the reality of what an AI actually is. We treat it like a magic box we can just plug in, but it’s not.

You can’t “set-it-and-forget-it.” You have to manage it.

The world changes and a new trend appears? Your AI employee will be confused because it has never seen this. It starts to drift from Day 2.
It makes a mistake? It doesn’t self-correct. You have to build a feedback loop to fix it.
You want it to be smarter? You have to re-train it, and that is a continuous operational cost (OpEx).

When you understand AI is something you must maintain and not something you own, your entire architecture changes.

My Take: 99% Boring, 1% AI

So if AI is this high-maintenance “penthouse,” what’s the “foundation”?

My personal philosophy is simple: Never use expensive AI to do a job a good old SQL query can do.

The value isn’t in a single “black box” AI. The value is in a hybrid system where boring rules and code do 99% of the heavy lifting. The AI is just the 1% of spice you add at the very end.

A Practical 3-Phase “Boring” Roadmap (My Small Advice)

If I’m building a “smart” matching system, I will not start with AI. I build it in phases, layering complexity only when necessary.

Let’s walk through this with one single example: building a “Smart Candidate Search” for a recruitment platform.

Phase 1: The Foundation of Correctness (SQL Rules)

This is the 99% of the work. The goal here is 100% Correctness, enforcing the non-negotiable rules of the business. An “AI-first” system might think a candidate in Ho Chi Minh City is a “great match” for a job in Hanoi, but your business rules say that’s unacceptable. This layer is the “bouncer” at the door.

In Practice: WHERE location = 'Hanoi' AND salary_request < 5000 AND years_experience >= 3.

This filters on facts, not suggestions. location and years_experience are binary facts. We must do this first to avoid wasting expensive AI compute on candidates who are an immediate “no.” This is your blazing fast, dirt cheap foundation.

Phase 2: The Layer of Relevance (Full-Text Search)

This is still the 99% of the work. Now that we have a list of correct candidates (e.g., in Hanoi, 3+ years exp), we solve for Relevance. Our SQL filter was correct, but “dumb” about human language. A recruiter searching “programmer” won’t find “developer.”

In Practice: Use Elasticsearch (which ranks with BM25) or Postgres’s built-in full-text search on the resume text, with a synonym dictionary so “java programmer” matches “java developer.”

This 20-year-old boring tech handles stemming and synonyms without the cost or “black box” nature of AI. Critically, it’s explainable: you know why that resume showed up.

Phase 3: The 1% Layer of “Nuance” (The AI)

Only now, after our 10 million candidates have been filtered down to 1,000 correct (Phase 1) and relevant (Phase 2) candidates, do we add the final 1%: Nuance. Full-Text Search is great with words, but not ideas. This is the 1% problem AI is actually good at.

In Practice: The AI’s job is to know that a recruiter searching for “Senior Java” is conceptually similar to candidates strong in “Spring Boot” or “Scala” (even if they didn’t type those words).

This is the key to performance: an AI ranking 10 million items is impossibly slow and expensive. An AI ranking 1,000 correct and relevant items is near-instantaneous and cost-effective.

The AI is not the filter; it’s the re-ranker. We contain the expensive, unpredictable AI. We let it play only within the 1,000 safe results our boring filters found. This gives you the correctness of SQL plus the nuance of AI, but your costs drop 99% and your speed goes up 1000x.
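To make the three phases concrete, here is a toy sketch of the whole pipeline in plain Python. Everything in it is an assumption for illustration: the Candidate fields mirror the WHERE clause from Phase 1, the SYNONYMS map stands in for a full-text search dictionary, and the RELATED map stands in for whatever embedding model you would actually call in Phase 3. The point is only the shape: cheap filters first, the expensive re-ranker last and only on the survivors.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    location: str
    salary_request: int
    years_experience: int
    resume: str


# Hypothetical synonym map standing in for a full-text search dictionary.
SYNONYMS = {"programmer": "developer"}

# Hypothetical concept map standing in for embedding similarity.
RELATED = {"java": {"spring", "scala"}}


def normalize(text):
    """Lowercase, split on whitespace, and fold synonyms together."""
    return {SYNONYMS.get(word, word) for word in text.lower().split()}


def phase1_rules(candidates):
    """Hard business rules: the same predicate as the WHERE clause."""
    return [
        c for c in candidates
        if c.location == "Hanoi" and c.salary_request < 5000 and c.years_experience >= 3
    ]


def phase2_text_match(candidates, query):
    """Keep candidates whose resume shares a normalized term with the query."""
    terms = normalize(query)
    return [c for c in candidates if terms & normalize(c.resume)]


def phase3_rerank(candidates, query):
    """Stand-in for the AI step: order survivors by count of related concepts."""
    related = set()
    for term in normalize(query):
        related |= RELATED.get(term, set())
    return sorted(
        candidates,
        key=lambda c: len(related & normalize(c.resume)),
        reverse=True,
    )


def smart_search(candidates, query):
    survivors = phase1_rules(candidates)
    survivors = phase2_text_match(survivors, query)
    return phase3_rerank(survivors, query)


pool = [
    Candidate("An", "Hanoi", 3000, 5, "java developer"),
    Candidate("Binh", "Hanoi", 3500, 4, "java developer spring scala"),
    Candidate("Chi", "Ho Chi Minh City", 2000, 8, "java developer spring"),
    Candidate("Dung", "Hanoi", 9000, 10, "java developer spring"),
    Candidate("Em", "Hanoi", 2500, 3, "graphic designer"),
]

results = [c.name for c in smart_search(pool, "java programmer")]
print(results)  # ['Binh', 'An']
```

Chi and Dung never reach the re-ranker because the rules reject them, and Em is dropped by the text match. The “AI” only has to order two people.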

Final Thought

The hype will pass. The core of good engineering isn’t using the shiniest tool. It’s the wisdom of knowing how and when to solve a problem.

Your “AI Strategy” shouldn’t be about AI. It should be about the system.

Stop building the penthouse first. Build your boring, indestructible foundation.

Your Perfect AI Headshot is Now a Red Flag

Tue, 02 Sep 2025 07:07:07 +0100

I scroll LinkedIn, what do I see?

Perfect headshots. Studio lighting, crazy sharp, precise smiles, not a hair out of place. Perfect posts and comments. Flawless grammar, zero typos.

It’s all clean, polished and soulless.

This is the Great AI Flood. The cost of looking competent, of sounding smart, has just dropped to zero.

And this is where the problem begins.

The Collapse of Signal

In economics, when you flood a market, the asset’s value collapses. For many years, polished content was a signal of professionalism. Now that AI can produce it instantly, polish has just become noise.

The burden of effort has shifted from the creator to the consumer.

My mental energy as a reader is no longer spent understanding your idea. It’s spent on an exhausting calculation: Is this real?

Is this a real photo, or Stable Diffusion, or Gemini?
Is this a real insight, or a ChatGPT remix of the top 10 blog posts?
Is this a real comment, or a bot?

This is the collapse of the signal-to-noise ratio. And it’s eroding the one thing that matters: Trust.

The Return of “Rough Edges”

When a signal becomes cheap, it’s no longer a reliable signal.

For years, a professional headshot was a signal: “I care enough about my career to spend $200.” Today, a perfect AI headshot is a signal: “I care enough to spend 30 seconds on a prompt.”

When “polish” is cheap, rough edges become the new status symbol.

A slightly blurry selfie from your office? I think it’s real. It’s Proof of Effort. A post with a small typo or an awkward sentence? It’s Proof of Thought.

But here’s the deeper signal, the one that really proves expertise: The Bumps and Bruises.

An AI-generated case study is perfect: “We increased ROI by 400%.” It’s clean. It’s also unbelievable.

The real signal of human expertise isn’t perfection; it’s the messy story. It’s the Proof of Experience.

“This was a tough project. We chose the wrong database and had to migrate for 3 weeks. Here’s what we learned…”

An AI can generate a plausible-sounding failure story. It can say, ‘we chose the wrong database,’ and even invent a ‘stubborn VP of Engineering’ or a ‘4 AM call.’ It’s a perfect remix of the thousands of ‘war stories’ it was trained on.

But that’s a script, not a memory.

The real signal isn’t hearing the story anymore; it’s interrogating it. Ask why. Ask for the specific details. ‘What exact query failed?’ ‘Why Postgres and not MSSQL?’ An AI’s story is perfect most of the time, but it collapses under deep, specific questioning.

And even then, where is the proof of history? An AI can’t show you the messy GitHub commit history. It doesn’t have three real former colleagues you can call for verification.

An AI can talk about the mess. It can’t prove the mess. It has no verifiable bumps and bruises.

What’s Next?

I keep thinking about what’s next. Honestly, I don’t think it’s going to be a better AI. I bet it’s going to be the mess of tools and habits we come up with to deal with this AI flood.

For example, it seems like we’ll go back to trusting things that are hard to fake. Why do I feel like a podcast is more real than a blog post lately? Because it takes effort. You can’t just generate one in 10 seconds. We’ll instinctively trust formats that are expensive in time and energy to produce.

This also means those hard-to-join communities suddenly become more valuable. Their annoying rules accidentally make for the best bot filters.

In the end, I guess it all comes down to origin. I find myself caring less about how polished something looks and more about where it came from. It wouldn’t be surprising if we start seeing tools that can actually certify that a real human wrote this or this photo was taken by a Sony Camera.

Conclusion

In the Age of AI, polish is no longer the signal. It’s just the baseline.

A polished surface with no proof is the new noise. A messy truth with no polish is just sloppy work.

The real signal of trust isn’t about being messy.

It’s about proving that your polish is earned.

Boring Technology | Postgres is Your new Tech Stack

Sat, 30 Aug 2025 07:07:07 +0100

Imagine we’re building a simple e-commerce site, “SimpleStore.” The initial planning meeting identifies our needs:

A database for users, products, and orders. (Easy: Postgres)
A way to send confirmation emails when an order is placed. (Okay, add a RabbitMQ job queue).
A cache for the homepage’s “Top 10 Products.” (Fine, add Redis).
A full-text search bar. (Ugh. Add Elasticsearch).
A nightly job to aggregate sales reports. (Spin up a Cron server).
A new AI feature to find “similar” products. (The VCs will love this! Add Pinecone).

Before writing a single feature, our architecture diagram is a mess. We have six different systems to provision, monitor, secure, and scale. We have what the team at Supabase calls “dotted line complexity”—the invisible, brittle connections between these systems that will inevitably break in production.

This is the case for collapsing your stack. Not by returning to a monolith, but by realizing that the “boring” tool you started with, PostgreSQL, can replace almost all of it.

Let’s rebuild the “SimpleStore” and see how.

A Cohesive Example: Building “SimpleStore” with Just Postgres

Instead of a sprawl, we’ll build each feature by extending Postgres. The magic isn’t just in replacing tools; it’s in how the connections between them become simple and atomic.

1. The Core: The Order and the Email (The ACID Test)

This is the most critical link. In a sprawled stack, you’d have this dreaded code:

# The "Dotted Line" Failure Point
try:
    order = db.create_order(...)  # Step 1: Postgres COMMIT
    queue.send_email_job(...)     # Step 2: RabbitMQ PUSH
except Exception:
    # What if Step 2 fails? The order is already committed: the user paid but gets no email.
    # What if Step 1 fails? Nothing was written, but this handler cannot tell which step broke.
    handle_complex_rollback()

With Postgres, this is one atomic unit. We’ll use the FOR UPDATE SKIP LOCKED pattern to create a powerful job queue inside the database.

Step 1: Create the tables

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    product_id INT,
    user_id INT,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE jobs (
    id SERIAL PRIMARY KEY,
    queue TEXT DEFAULT 'default',
    payload JSONB,
    status TEXT DEFAULT 'queued', -- queued, running, failed
    run_at TIMESTAMPTZ DEFAULT NOW()
);

Step 2: The Magic (One Transaction)

Now, our application logic becomes beautifully simple:


BEGIN;

-- Insert the order and capture its generated ID
WITH new_order AS (
    INSERT INTO orders (product_id, user_id) VALUES (123, 456)
    RETURNING id
)
-- Use the returned ID to create a job IN THE SAME TRANSACTION
INSERT INTO jobs (queue, payload)
SELECT 'emails', jsonb_build_object('type', 'order_confirmation', 'order_id', id)
FROM new_order;

COMMIT;

This entire block either succeeds or fails together. It is impossible to create an order without its corresponding email job. We have just achieved perfect data integrity, something that is incredibly difficult with separate systems.
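You can watch the all-or-nothing behaviour in a few lines. This is only a sketch: it uses SQLite from Python’s standard library as a stand-in for Postgres so it runs anywhere, and place_order, good_payload and broken_payload are made-up names for illustration. The transaction semantics being shown (BEGIN, then COMMIT or ROLLBACK) are the same idea as the SQL block above.

```python
import json
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN / COMMIT / ROLLBACK statements below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INT, user_id INT)")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, queue TEXT, "
    "payload TEXT NOT NULL, status TEXT DEFAULT 'queued')"
)


def place_order(conn, product_id, user_id, payload_builder):
    """Insert an order and its email job in one transaction."""
    conn.execute("BEGIN")
    try:
        cur = conn.execute(
            "INSERT INTO orders (product_id, user_id) VALUES (?, ?)",
            (product_id, user_id),
        )
        order_id = cur.lastrowid
        conn.execute(
            "INSERT INTO jobs (queue, payload) VALUES ('emails', ?)",
            (payload_builder(order_id),),
        )
        conn.execute("COMMIT")
        return order_id
    except Exception:
        # Undo the order insert too, so no order exists without its job.
        conn.execute("ROLLBACK")
        raise


def good_payload(order_id):
    return json.dumps({"type": "order_confirmation", "order_id": order_id})


def broken_payload(order_id):
    raise RuntimeError("simulated failure while building the job")


first_id = place_order(conn, 123, 456, good_payload)

try:
    place_order(conn, 123, 456, broken_payload)
except RuntimeError:
    pass  # the order insert from this attempt was rolled back

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
job_count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(order_count, job_count)  # 1 1
```

The second attempt inserted an order row and then blew up before the job was written, yet the table still holds exactly one order. There is no cleanup code and no half-finished state to hunt down.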

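A Go or Python worker can now pull from this queue with a simple, highly concurrent query. Run it inside a transaction so the row lock lasts until the job is marked done:

SELECT id, payload FROM jobs
WHERE status = 'queued' AND run_at <= NOW()
ORDER BY run_at LIMIT 1
FOR UPDATE SKIP LOCKED;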

Feature	PostgreSQL (The “Boring” Way)	Dedicated Stack (RabbitMQ / Kafka)
Data Integrity	Perfect (Atomic). The job and the order are in one transaction.	Poor (Eventual). Requires complex two-phase commits or retry logic.
Complexity	Low. It’s just another table in your schema.	High. A separate, complex system to manage, monitor, and scale.
Throughput	Moderate. Excellent for most apps, but not Kafka-scale.	Extremely High. Built for massive event streaming.
Verdict	Wins for 90% of apps. The trade-off for lower peak throughput is massive gains in simplicity and reliability.

2. The Homepage: Caching Top Products (Replacing Redis)

Our homepage needs to show the Top 10 products. This query is slow, so we need a cache. Instead of adding Redis, we’ll use an UNLOGGED TABLE.


CREATE UNLOGGED TABLE cache_top_products (
    id INT PRIMARY KEY,
    name TEXT,
    sales_count BIGINT,
    cached_at TIMESTAMPTZ
);

An UNLOGGED table does not write to the Write-Ahead Log (WAL). This makes writes incredibly fast. The catch? If the server crashes, the table is automatically truncated.

This sounds just like the durability model of Redis! It’s the perfect, high-speed, non-durable store for transient data.

Feature	PostgreSQL (UNLOGGED TABLE)	Dedicated Stack (Redis)
Speed	Very Fast. Not as fast as in-memory, but adds no new service or client to call.	Extremely Fast. In-memory, sub-millisecond latency.
Durability	None (by design). Wiped on crash.	None (by design). Wiped on crash (unless persistence is on).
Simplicity	High. It’s just a SQL table. No new clients, ports, or auth.	Low. A separate service to manage, secure, and connect to.
Verdict	Wins for most caching. You trade a few microseconds of latency for a huge reduction in stack complexity.

3. The Search Bar & Reports: (Replacing Elasticsearch & Cron)

We can continue this pattern for our other features:

Search Bar: Instead of Elasticsearch, we use Postgres’s built-in Full-Text Search.

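-- Add a tsvector column for product descriptions
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Backfill existing rows (a trigger or generated column keeps new rows current)
UPDATE products SET search_vector = to_tsvector('english', description) ...

-- Index the column so searches stay fast (index name is illustrative)
CREATE INDEX products_search_idx ON products USING GIN (search_vector);

-- Search is now a simple, fast, indexed query
SELECT * FROM products WHERE search_vector @@ to_tsquery('english', 'shiny & leather');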

Nightly Reports: Instead of a cron server (a single point of failure), we use the pg_cron extension.

-- Run a job every night at 3 AM
SELECT cron.schedule('nightly-sales-report', '0 3 * * *', 
  $$ CALL generate_sales_report(); $$
);

The best part? If you have a High-Availability (HA) Postgres setup, your cron job is also HA. It’s no longer a fragile script on one server.

The All-Important Caveat: The “Good Enough” Trap

This approach is pragmatic, not dogmatic. The “Postgres for Everything” mindset doesn’t mean never using another tool. It means you add a new tool only when Postgres is no longer “good enough.”

The “Boring Technology” choice wins when it’s 80% as good as the “Best” tool, but 10x simpler to operate.

A perfect, modern example is pgvector vs. a Dedicated Vector DB (like Pinecone or Milvus).

When pgvector is “Good Enough”: You have 50,000 product vectors for your “similar items” feature. pgvector will handle this beautifully. You can JOIN user data with vector data in one query. The simplicity is a massive win.
When pgvector Breaks Down: You are building the next ChatGPT and need to query 100 million vectors with 99.9% recall and 20ms latency. pgvector will likely struggle here. It is an extension to a general-purpose database, not a vector-first engine, and neither its HNSW index nor the Postgres query planner was designed around that workload. At this scale, the “dotted line” to a dedicated, purpose-built Vector DB is not a liability; it is a necessity.

Conclusion: Your Job Is to Fight Complexity

We collapsed our “SimpleStore” stack from six complex services into one robust database.

The benefits are not just theoretical; they are immediate:

Atomic Integrity: You gain ACID guarantees across your entire workflow (e.g., Orders + Jobs).
Reduced Cognitive Load: A new developer doesn’t need to learn six systems. They just need to know SQL.
Lower Operational Cost: You monitor, back up, and secure one thing.
Faster Development: You can stop writing “glue code” for the dotted lines and start building features.

“PostgreSQL for Everything” isn’t a silver bullet. It’s a maxim against premature optimization and over-engineering. It’s a reminder that your primary job isn’t just to add technology, but to cull complexity. And Postgres is the best tool for that job.

Boring Technology | My Trip to "Microservices Hell" (and Why I Often Take the Monolith Instead)

Sat, 23 Aug 2025 07:07:07 +0100

As an architect, I’ve seen teams enthusiastically adopt microservices, sold on the dream of “infinite scale” and “team autonomy.” I’ve also seen those same teams a year later, drowning in complexity, wondering why it takes six weeks to add a new feature.

“Microservices Hell” is real, and the rent is high. It’s the state you reach when your plumbing is infinitely more complex than the business logic it’s supposed to support.

Based on my experience, here’s what that journey into hell really looks like.

1. The “Eventual Consistency” Headache (aka The Death of ACID)

The first thing that hits you is the database.

I remember a project where we had a critical flow: Create Order → Update Inventory → Process Payment. In a monolith, this is a single, beautiful database transaction. It’s atomic. It’s safe. It just works.

In microservices, this is now 3 services, probably with 3 separate databases. You can’t have a transaction. You are now forced to write compensating logic (also known as a Saga, but it’s basically “code to undo code”).

This “compensating logic” is a massive source of bugs. You’re now living in the land of “eventual consistency,” which is just a polite way of saying, “Your data is currently wrong, but we’ll probably fix it… eventually.” For any FinTech or HealthTech system, this is a non-starter.
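If “code to undo code” sounds abstract, here is roughly what it looks like. This is a minimal sketch, not a production Saga: run_saga and every step function are hypothetical names, and a dict stands in for the three separate service databases. Each step is paired with the function that reverses it, and a failure triggers the reversals in the opposite order.

```python
class PaymentDeclined(Exception):
    pass


def run_saga(steps):
    """Run (action, compensation) pairs; on failure, undo completed steps in reverse."""
    completed = []
    try:
        for action, compensation in steps:
            action()
            completed.append(compensation)
    except Exception:
        for compensation in reversed(completed):
            compensation()  # in a real system this call can fail too
        return False
    return True


# In-memory state standing in for three separate service databases.
state = {"orders": [], "stock": 5, "charged": 0}
log = []


def create_order():
    state["orders"].append("order-1")
    log.append("create_order")


def cancel_order():
    state["orders"].remove("order-1")
    log.append("cancel_order")


def reserve_inventory():
    state["stock"] -= 1
    log.append("reserve_inventory")


def release_inventory():
    state["stock"] += 1
    log.append("release_inventory")


def process_payment():
    log.append("process_payment")
    raise PaymentDeclined("card declined")


def refund_payment():
    state["charged"] = 0
    log.append("refund_payment")


ok = run_saga([
    (create_order, cancel_order),
    (reserve_inventory, release_inventory),
    (process_payment, refund_payment),
])
print(ok, state["stock"], log)
```

The payment fails, so the log ends with release_inventory followed by cancel_order. Notice what the sketch skips: between reserve_inventory and release_inventory, other readers saw stock that was never really sold, and nothing here handles a compensation that itself fails. That gap is where the bugs live.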

Many teams try to “hack” this by using a shared database. Don’t do that. You’ve just created a monster: a hidden coupling… a single-point-of-failure.

2. The Network Tax (aka “My Function Call is Now a Bug”)

When you trade reliable in-memory function calls for unreliable HTTP calls, you pay a heavy tax. Every developer on your team must now become a distributed systems expert, whether they like it or not.

Every. Single. Call. must handle:

Timeouts: What happens when the UserService just… doesn’t answer?
Retries: If you retry, was the request idempotent? If not, congrats, you just charged the customer twice.
Circuit Breakers: You must implement these to stop one dead service (InventoryService) from killing every other service that calls it in a “cascading failure”.
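To give a feel for the extra code every call site now needs, here is a bare-bones circuit breaker. Treat it as a sketch under assumptions: the class name, the thresholds, and the injectable clock are my own choices for illustration, and real libraries add half-open trial limits, metrics, and thread safety on top of this.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("failing fast; downstream looks dead")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


# Demo with a fake clock so nothing actually sleeps.
now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])
calls = []


def inventory_service():
    calls.append(now[0])
    raise TimeoutError("InventoryService did not answer")


outcomes = []
for _ in range(4):
    try:
        breaker.call(inventory_service)
    except CircuitOpenError:
        outcomes.append("fast-fail")
    except TimeoutError:
        outcomes.append("timeout")

print(outcomes, len(calls))  # ['timeout', 'timeout', 'fast-fail', 'fast-fail'] 2
```

After two timeouts the breaker stops calling the dead service, so the last two requests fail immediately without waiting on the network. In a monolith, none of this code exists, because a function call doesn’t time out.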

Cognitive load skyrockets. And it gets worse when teams get “Service-Mania”. I’ve seen teams of 10 engineers trying to maintain 40 services. A new feature? “Create a new service!”. And now a simple feature requires deploying 3 services at the same time. You’ve just rebuilt your monolith, but over a slow network link.

3. The Observability Tax

This is the part that kills velocity. Remember when you had one log file? Good times.

Now, a single user click might touch 10 different services. When it fails, you’re not debugging; you’re playing a distributed game of Clue.

Before you can even start writing features, you’re forced to pay the “Observability Tax”:

Distributed Tracing (e.g., Jaeger) to follow a request.
Centralized Logging (e.g., ELK Stack) to find the logs.
Metrics Aggregation (e.g., Prometheus) to see what’s on fire.

And this isn’t just a production problem; it destroys your development environments. How can you run 40 services on a developer’s laptop? How do you test E2E (End-to-End) when you can’t even be sure which version of a service is running?

4. The People & Management Hell

But the worst tax isn’t technical. It’s the people tax.

An Engineer-to-Service Ratio Gone Wild: I’ve seen teams with 4-5 services per engineer. This isn’t “autonomy”; it’s “burnout.” One person is now the operator, debugger, and on-call for half a dozen systems.
“Resume-Driven Development”: When “autonomy” means “anarchy,” you get a service in Kotlin, one in Go, and one in Rust that only one person understands. When that person leaves, you’ve “orphaned” a part of your system.
Architecture That Mirrors the Org Chart: Your architecture will inevitably look like your company’s org chart. This is fine… until the reorg. And the company always reorgs. Suddenly, the “Payments” team is split in two, but all the infrastructure, namespaces, and IAM policies are still tangled together. You’ve just signed yourself up for a painful, long-term migration project that delivers zero value to the customer.

So… When is the “Boring” Monolith (Done Right) Just Better?

A well-structured Modular Monolith (decoupled modules in a single codebase) isn’t “legacy.” It’s a pragmatic, often superior, choice. In my experience, the monolith wins hands-down in these cases:

When Transactional Integrity (ACID) is King: If you’re building FinTech, HealthTech, or a complex ERP, your business must be 100% consistent. The simplicity and reliability of a real database transaction are non-negotiable. Don’t trade this for compensating logic.
When You Are an Early-Stage Product (Speed-to-Market is King): Your biggest risk isn’t scale; it’s building the wrong thing. A Modular Monolith lets you move incredibly fast. Refactoring a module inside a monolith is 100x easier than refactoring 10 microservices that you defined incorrectly.
When You Are a Small-to-Medium Team (1-20 Engineers): Microservices are a tool to solve people scaling. If you’re one team, microservices will kill your velocity with meetings about API contracts. A monolith lets your team just… code.
When You Don’t Have a Dedicated Platform Team: Choosing microservices without a dedicated SRE/Platform team is like buying a Formula 1 car to go grocery shopping. It’s expensive, incredibly hard to drive, and you’re going to crash.

My Parting Advice

Start with a Modular Monolith.

Design it with clean boundaries, communicate between modules via interfaces, and do not share database tables between modules. This gives you 90% of the benefits of microservices (decoupling) with 10% of the operational cost.
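As a rough illustration of “communicate via interfaces,” here is what a module boundary can look like inside one codebase. The names (InventoryAPI, InMemoryInventory, OrderService) are hypothetical, and the in-memory dict is a placeholder for tables that only the inventory module would be allowed to touch.

```python
from typing import Protocol


class InventoryAPI(Protocol):
    """The only surface other modules may use to reach inventory."""

    def reserve(self, product_id: int, quantity: int) -> bool: ...


class InMemoryInventory:
    """Inventory module; it alone owns its storage."""

    def __init__(self, stock):
        self._stock = dict(stock)

    def reserve(self, product_id: int, quantity: int) -> bool:
        if self._stock.get(product_id, 0) < quantity:
            return False
        self._stock[product_id] -= quantity
        return True


class OrderService:
    """Orders module; depends on the interface, not on inventory's tables."""

    def __init__(self, inventory: InventoryAPI):
        self._inventory = inventory
        self.orders = []

    def place(self, product_id: int, quantity: int) -> bool:
        if not self._inventory.reserve(product_id, quantity):
            return False
        self.orders.append((product_id, quantity))
        return True


orders = OrderService(InMemoryInventory({123: 2}))
first = orders.place(123, 2)
second = orders.place(123, 1)
print(first, second)  # True False
```

The call between modules is still a plain in-process function call. If inventory ever does need to become its own service, the interface is already the seam: you swap the implementation behind InventoryAPI and the orders module doesn’t change.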

Only extract a module into its own microservice when you have a clear, painful, and obvious reason (like asymmetric scaling needs or a new tech stack). Microservices are a refactoring step, not a starting point.

Compliance is Not Security: The HIPAA Compliant Illusion

Wed, 20 Aug 2025 07:07:07 +0100

I was on a technical due diligence call with a CTO. He’d already reviewed our profile.

“Look,” he said, skipping the pleasantries. “Your deck says ‘HIPAA compliant’ and ‘Security is in Our DNA’. Every vendor says that. My real concern isn’t a hacker from outside; it’s an employee. Someone curious, or someone careless.”

He leaned in. “How do you actually stop a logged-in, authenticated doctor from getting curious and pulling the record of another doctor’s patient?”

He was right to ask. That’s the real question.

He wasn’t asking about a checklist. He was asking about architecture. His concern is the single biggest failure point I see in “compliant” systems, and it stems from a fundamental, “context-blind” design.

Here’s the deep dive on the problem and the specific architecture we use to solve it.

The “Compliant” Flaw: Context-Blind Architecture

Most systems fail this test because they are built “happy-path” first. The “happy path” assumes a Doctor is a trusted entity. The architecture then follows the path of least resistance.

The “compliant” checklist only requires:

Encryption: AES-256 at-rest, TLS in-transit. Check.
Authentication: The user is logged in via OAuth/SAML. Check.
Role (RBAC): The user’s JWT token has Role: Doctor. Check.
Logging: The access is written to a log file. Check.

The resulting API is predictable: GET /api/v1/patients/{patientId}

The code’s “security” logic, often in a single middleware, just checks: “Does this user’s token have the Doctor role?” If yes, access is granted.

This is context-blind. It confirms the user’s role, but not their relationship to the data. It can’t tell the difference between “their” patient and “any” patient. This isn’t just limited to patient files. What about:

GET /api/v1/doctors/{doctorId}/schedule

What stops dr_smith from querying dr_jones’s schedule to see all his patient names for the day? This simple IDOR (Insecure Direct Object Reference) vulnerability is a direct result of a lazy, role-based architecture.

The Architectural Fix: Context-Aware Security

You cannot solve this by adding “more compliance.” You must fix the architecture.

1. From RBAC to ABAC: The “Who” vs. “Why”

RBAC (Role-Based Access Control) fails because it’s static. A role doesn’t understand relationships.

We enforce ABAC (Attribute-Based Access Control). This model is “context-aware” and dynamic.

RBAC asks: “Is this user a Doctor?”
ABAC asks: “Is this Doctor currently treating this Patient for an active case?”

The security policy isn’t just Allow if Role == Doctor. The policy, enforced in code, becomes:

Allow 'Read' IF: Subject.Role == 'Doctor' AND Resource.PatientID is IN Subject.ActivePatientList AND Action.Purpose == 'ActiveTreatment'

Implementation Detail: This policy isn’t just documentation. It’s code, ideally enforced at the API Gateway (like Kong or Apigee) or in a service mesh sidecar (like Istio), before the request even hits the application.

And to answer the next logical question—performance—that ActivePatientList isn’t a live JOIN query on every API call. That would be a database killer. It’s a denormalized, read-optimized cache (e.g., a Redis set) that is populated by the “Admissions” or “Scheduling” service. The cache is updated via events (e.g., PatientAdmitted, PatientDischarged) with a clear TTL. Security must be performant, or developers will find a way to bypass it.
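Written as code, the policy above is small. The sketch below is an assumption-heavy illustration, not how any particular gateway expresses it: Subject and decide are made-up names, and a plain Python set plays the role of the cached ActivePatientList. It returns a reason alongside the decision so a denial can be logged with context.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    user_id: str
    role: str
    # Stand-in for the denormalized ActivePatientList cache.
    active_patients: set = field(default_factory=set)


def decide(subject, action, patient_id, purpose):
    """Return (allowed, reason) for one access request."""
    if action != "Read":
        return False, "Action not covered by this policy"
    if subject.role != "Doctor":
        return False, "Subject.Role is not 'Doctor'"
    if patient_id not in subject.active_patients:
        return False, "Resource.PatientID not found in Subject.ActivePatientList"
    if purpose != "ActiveTreatment":
        return False, "Action.Purpose is not 'ActiveTreatment'"
    return True, "Policy satisfied"


dr_smith = Subject("dr_smith", "Doctor", {"12345"})
own = decide(dr_smith, "Read", "12345", "ActiveTreatment")
other = decide(dr_smith, "Read", "12346", "ActiveTreatment")
print(own)    # (True, 'Policy satisfied')
print(other)  # (False, 'Resource.PatientID not found in Subject.ActivePatientList')
```

Both requests carry the same valid Doctor role. Only the second check, the relationship to the patient, tells them apart, which is exactly the check a role-only middleware never makes.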

2. From “Audit Logs” to “Forensic-Ready Observability”

Most “compliant” audit logs are a data lake of noise, built only to satisfy an auditor. They say: [Timestamp] User 'dr_smith' accessed Patient '12345'.

This is useless for security. It’s indistinguishable from a million other valid events. You can’t build anomaly detection on it.

A secure log must capture “access decisions” for true observability.

The “Denial” Log: When that curious doctor tries to access 12346 and our ABAC policy blocks it, this is what we log:

[Timestamp] Subject 'dr_smith' (IP: x.x.x.x) attempted 'Read' on Resource '12346'. Decision: DENIED. Reason: 'Policy Violation: Resource.PatientID not found in Subject.ActivePatientList'.

This log doesn’t just go to a file. This is a high-priority event piped directly to a SIEM (like Splunk or Sentinel). You can now write a simple rule: ALERT if any single Subject.UserID produces more than 10 'Decision: DENIED' events in 1 minute. You’ve just caught an active insider threat.
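In a real deployment that rule lives in the SIEM’s own query language, but the logic is just a sliding window per user. Here is a sketch of it in Python, with the class name, threshold and window all assumed for illustration:

```python
from collections import defaultdict, deque


class DenialMonitor:
    def __init__(self, threshold=10, window_seconds=60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user_id -> timestamps of denials

    def record_denial(self, user_id, timestamp):
        """Record one DENIED decision; return True if the user should be flagged."""
        events = self._events[user_id]
        events.append(timestamp)
        # Drop denials that have fallen out of the window.
        while events and timestamp - events[0] >= self.window_seconds:
            events.popleft()
        return len(events) > self.threshold


monitor = DenialMonitor()
# Eleven denials, one every 2 seconds: the 11th crosses the threshold.
flags = [monitor.record_denial("dr_smith", t * 2) for t in range(11)]
# A different user with a single denial stays quiet.
quiet = monitor.record_denial("dr_jones", 5)
print(flags.index(True), quiet)  # 10 False
```

This only works because the denial log carries a subject and a decision. The “compliant” log line from earlier has neither, so there is nothing to count.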

The “Break-the-Glass” Log: What about emergencies? A doctor needs to access a file outside their normal context. A secure system must allow this via an explicit “break-glass” function. But logging this is even more critical:

[Timestamp] Subject 'dr_smith' accessed Resource '12347'. Decision: GRANTED (EMERGENCY OVERRIDE). Purpose: 'User-provided reason: Cardiac arrest in ER'.

This also triggers an alert, but a different one: P2 - Post-Incident Review Required. The system is secure, usable, and—most importantly—accountable.

Key Technical Questions

Compliance is the floor, not the ceiling.

Don’t ask your partner if they can “pass a checklist.” You’re not buying a checklist. You’re buying an architecture that protects you when the checklist fails.

Ask them these questions instead:

“Show me your data model for authorization. Is it role-based or attribute/relationship-based?”
“How do you log an authorization failure versus an authentication failure?”
“How do you handle ‘break-the-glass’ scenarios in your logging and alerting pipeline?”