tfpdev
Insights

Things I learned
building real
products.

24 decisions from a real platform build. Organized into 3 lessons that took 8 years to learn. One is classified.

Numbers are approximate where stated and exact where marked. I'd rather say “I don't have that number” than make one up. If you want specifics on any of these, ask me on a call — I'll tell you what I know and what I don't.

Showing 23 of 24 · ~13 min read

Lesson 01

Build for the business, not for the résumé

Textbooks teach you to build it right. Experience teaches you to build what matters.

10 insights · ~6 min read

ARCHITECTURE · STAKEHOLDERS

Don't replace what works — extend it

Situation

The organization ran their entire product catalog and pricing through an existing POS system. Staff had years of muscle memory built around it. We had a choice: rebuild that functionality in our platform (more control, more work) or integrate with the POS as the source of truth (less control, faster delivery, zero retraining).

Decision

We made the POS the single source of truth and positioned every integration as 'this makes your existing tools more powerful.' The platform reads from the POS at runtime — it never competes with it. This was both a technical architecture decision AND a stakeholder management decision. Staff didn't have to learn a new catalog system. The platform just worked with what they already knew.

Outcome

Zero retraining cost. Staff adopted the platform in days — not weeks, not months. Catalog updates in the POS appeared on the platform in near-real-time. We never had a sync complaint. The lesson has two sides: architecturally, don't rebuild what works. Organizationally, the best integration is the one your users don't notice.

integration · POS · adoption · stakeholders · pragmatism
PRODUCT · ARCHITECTURE

Mandatory onboarding — enforced at the infrastructure level

Situation

Platforms that let members skip profile completion end up with unreliable data — breaking reporting, communications, and access control downstream. But enforcing onboarding page-by-page in application code is fragile: any new page without the check is an instant access gap.

Decision

We made onboarding non-negotiable AND enforced it at the infrastructure layer — edge middleware, not page logic. Cookie state drives routing decisions before any application code runs. There is exactly one place where access rules live. No opt-outs, no 'remind me later.'

Outcome

Member data quality at launch was dramatically better than previous rollouts — complete profiles from day one instead of months of cleanup. Zero instances of members bypassing onboarding. When rules changed, one file update, one deploy. We didn't measure the exact completion rate against historical rollouts, but the operations team stopped complaining about data quality. First time.

onboarding · middleware · data-quality · enforcement
In Plain English

Edge Middleware

A checkpoint that runs before the website even starts loading. Think of it as a bouncer at the door — if you haven't completed onboarding, you don't get in. Doesn't matter which page you try to visit.
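The article doesn't show code, but the pattern is concrete enough to sketch. Below is a minimal, framework-agnostic version of the routing decision in TypeScript; the path names and the cookie-derived flag are illustrative, not taken from the actual platform:

```typescript
// Pure routing decision driven by cookie state -- the kind of logic that
// runs in edge middleware before any page code executes.
// PUBLIC_PATHS and the redirect target are illustrative names.

type Route = { allow: true } | { allow: false; redirectTo: string };

const PUBLIC_PATHS = new Set(["/login", "/register", "/onboarding"]);

export function decideRoute(
  pathname: string,
  onboardingComplete: boolean // read from a signed cookie at the edge
): Route {
  // Pages needed to complete onboarding must stay reachable.
  if (PUBLIC_PATHS.has(pathname)) return { allow: true };
  // Everything else is gated: no per-page checks, no opt-outs.
  if (!onboardingComplete) return { allow: false, redirectTo: "/onboarding" };
  return { allow: true };
}
```

In a Next.js-style setup, a single `middleware.ts` would read the signed cookie, call a function like this, and issue a redirect when `allow` is false. One file, one place where the rule lives.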

PAYMENTS & E-COM

When traditional payment rails aren't available, you build your own

Situation

Traditional payment processors presented regulatory and operational friction for this organization's specific product vertical. Standard providers were evaluated and determined not to be viable — a situation more common in certain industries than vendors admit.

Decision

Built a production-grade integration with an alternative payment provider, covering the full webhook lifecycle: signature verification, payload transformation, retry logic, timeout handling, and forwarding to downstream order systems.

Outcome

Payment acceptance went from impossible to fully operational. The webhook pipeline ran reliably in production — retries caught edge cases that would silently fail in a less disciplined integration. I can't give you a '99.9% uptime' figure because we didn't have formal SLA monitoring on it. What I can tell you: payments worked. Consistently. For a vertical where the traditional processors said no.

payments · alternative-payments · webhooks · integration
In Plain English

Webhook Lifecycle

When a payment happens, the payment provider sends a message to our server saying 'hey, money arrived.' The webhook lifecycle is everything we do with that message: verify it's real, transform it into our format, handle failures, and update the order. If any step fails, the whole thing retries safely.
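A rough sketch of those steps in TypeScript. The payload shape, header format, and retry parameters are hypothetical; the real integration depends entirely on the chosen provider:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Step 1: verify the webhook is real (HMAC signature over the raw body).
export function verifySignature(
  rawBody: string,
  signatureHex: string,
  secret: string
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Step 2: transform the provider's payload into our internal order event.
// This payload shape is made up for illustration.
export function toOrderEvent(payload: { id: string; amount_cents: number; status: string }) {
  return { orderId: payload.id, amountCents: payload.amount_cents, paid: payload.status === "succeeded" };
}

// Step 3: forward downstream with exponential backoff; rethrow only after
// the final attempt so the provider's own retry can kick in.
export async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 200): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
}
```

The ordering matters: verify before parsing, transform before forwarding, and keep each step small enough that a failure points at exactly one stage.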

PAYMENTS & E-COM

Delivery isn't a feature — it's its own domain

Situation

Physical product delivery was central to the organization's revenue model, not a peripheral feature. Treating it as an add-on would have produced an add-on result — functional in demos, brittle in production.

Decision

Designed delivery as its own operational domain from the architecture phase: dedicated driver-facing dashboard, packing workflows, real-time chat, and route calculation — all fully integrated with member accounts and inventory.

Outcome

Delivery went from nonexistent to a competitive differentiator. Coordination that used to require phone calls and WhatsApp messages happened through the driver dashboard instead. I don't have before/after timing data — we didn't benchmark the old chaos. But the organization cited delivery as a top reason members stayed engaged. That wasn't in any spec. It emerged from treating delivery as a real domain.

delivery · domain-design · operations · competitive-advantage
PAYMENTS & E-COM

Referrals and credits tied to onboarding — not bolted on after

Situation

The organization needed organic member growth mechanisms that didn't require marketing spend. Generic discount codes don't create lasting loyalty; credits tied to genuine participation do.

Decision

Built a referral and credit system integrated directly into onboarding. Members earned credits through verified referrals; credits applied at checkout through the same session as every other account action.

Outcome

Referrals produced real member growth with zero marketing spend. I don't have the exact percentage — and I'd rather not guess — but the referral channel was active and producing sign-ups within the first quarter. Total marketing budget to activate it: $0. The system paid for its engineering cost quickly.

referrals · growth · onboarding · zero-cost-acquisition
PRODUCT

Know what NOT to build — and document why

Situation

There was early pressure to replicate product management, pricing, and vendor management inside the web platform. This would have created a second system to maintain and keep in sync.

Decision

Drew a clear boundary: the POS owns commerce configuration; the platform owns member experience and operations. Any request to build functionality that duplicated what the POS did was explicitly declined and documented.

Outcome

The team avoided maintaining a parallel product catalog. Every feature you don't build is a feature you don't have to support. The time saved by not rebuilding the catalog system was measured in weeks of engineering — time that went into features that actually mattered.

scope-discipline · say-no · product-strategy · pragmatism
PRODUCT

Community features: native, not third-party

Situation

Embedding third-party community tools would have required members to manage separate accounts and context-switch constantly. Engagement in a disconnected tool rarely sustains.

Decision

Posts, messaging, and community rooms were built natively — inside the same authenticated session as every other platform feature. Community was treated as a core module, not an integration.

Outcome

Community activity stayed within the platform, measurable and connected to member profiles. Members who used community features stuck around longer — noticeably. I don't have a clean A/B test to give you a multiplier, but the correlation was clear enough that community became a retention priority.

community · retention · native-build · engagement
PRODUCT

Separate mandatory from optional — always

Situation

Stakeholders initially wanted all onboarding fields mandatory. Pilot testing revealed significant drop-off — the questionnaire was too long to feel reasonable at first registration.

Decision

Split onboarding into two explicit tiers: mandatory (minimum for access) and optional (profile enrichment presented post-login). The mandatory tier was kept ruthlessly short. Optional questions surfaced only after members experienced the platform's value.

Outcome

Mandatory onboarding completion was high — the middleware enforcement meant there was no way around it. What surprised us: a meaningful chunk of members voluntarily completed the optional tier too. We expected most people to skip it. They didn't. Turns out not forcing something makes people more willing to do it.

onboarding · UX · conversion · user-psychology
STAKEHOLDERS

One dashboard, one source of truth for leadership

Situation

Leadership was making decisions by piecing together information from multiple disconnected sources — POS reports, spreadsheets, and messages from staff. The cognitive cost was high and the result was often outdated.

Decision

Designed the admin dashboard as a single operational pane: inventory, member data, delivery status, payroll, marketing analytics, and community health in one authenticated interface.

Outcome

Leadership went from piecing together information across multiple sources to having one screen. Decision-making got faster — they told us that directly. The dashboard became the first thing opened in every operational meeting. We didn't run a productivity study. We didn't need to. The behavior change was obvious.

dashboard · leadership · decision-making · unified-view
STAKEHOLDERS

Translate architecture into business risk language

Situation

Technical decisions are invisible to non-technical stakeholders until something goes wrong. Stakeholders who don't understand the rationale can't evaluate trade-offs or provide informed input.

Decision

Every significant technical decision was documented in business terms: what risk it mitigated, what it would cost to change later, and what the organization would need to do differently if it was wrong. Architecture reviews included non-technical stakeholders.

Outcome

Stakeholders made better-informed requests, and technical decisions had explicit organizational buy-in. When edge cases appeared in production, stakeholders understood why the mitigation worked — because they had been part of the decision.

communication · risk · documentation · trust

Lesson 02

Pragmatism ships. Perfection doesn't.

The best technical decision is the one that ships on time and doesn't explode later.

7 insights · ~4 min read

ARCHITECTURE

Why we went serverless microservices from day one

Situation

The platform needed to cover many independent operational domains — member onboarding, retail, physical delivery, staff tooling, community features, and analytics. A monolithic backend would have created tight coupling between unrelated concerns and made independent scaling impossible.

Decision

Architected the backend as hundreds of independent serverless functions — each route handler a discrete, stateless microservice with its own scope and failure boundary. Domains that needed different scaling characteristics were isolated from the start.

Outcome

Each domain scaled and failed independently. During peak delivery hours, checkout traffic spiked hard without adding latency to community page loads. No cross-domain outages in production. That's the whole point of isolation — and it actually worked.

serverless · microservices · scaling · isolation
In Plain English

Serverless & Microservices

Instead of one big application that does everything, we built hundreds of small independent programs — each handling one job. If the payment system gets slammed with traffic, it doesn't slow down the community features. If one part breaks, the rest keep running.
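What one of those small independent programs looks like can be sketched roughly. The event and response types below are generic placeholders, not a real platform's signatures (those depend on the host, e.g. Vercel or AWS Lambda):

```typescript
// One route handler as a stateless microservice: one job, one failure
// boundary. An error here returns a 500 for this route without touching
// any other domain. All names are illustrative.

type HandlerEvent = { path: string; body: string };
type HandlerResponse = { status: number; body: string };

export async function handleGetProfile(event: HandlerEvent): Promise<HandlerResponse> {
  try {
    const { memberId } = JSON.parse(event.body);
    if (typeof memberId !== "string") {
      return { status: 400, body: JSON.stringify({ error: "memberId required" }) };
    }
    // ...a real handler would fetch from the database here; stubbed for the sketch.
    return { status: 200, body: JSON.stringify({ memberId, ok: true }) };
  } catch {
    // The failure boundary: malformed input or an internal error stays local.
    return { status: 500, body: JSON.stringify({ error: "internal" }) };
  }
}
```

Hundreds of functions shaped like this, each holding no state between invocations, is what lets checkout traffic spike without community pages noticing.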

ARCHITECTURE

We built custom auth because off-the-shelf didn't fit

Situation

Off-the-shelf auth providers made assumptions about session models that didn't fit our requirements: a custom role hierarchy, staff-level access tiers, ban enforcement, and the ability to forcibly invalidate sessions across all devices instantly.

Decision

Designed a custom JWT + bcrypt authentication system with a dedicated invalidation table enabling server-side session revocation. Every authenticated API call validates token integrity, account status, and role in a single middleware pass.

Outcome

Zero auth-related incidents in production. Session revocation worked instantly when we needed it — an account was suspended and kicked from all devices in real time. Not theoretical capability. We used it.

auth · security · JWT · custom-build
In Plain English

JWT, Bcrypt & Session Revocation

JWT is a secure digital pass that proves who you are. Bcrypt scrambles passwords so even we can't read them. Session revocation means if someone's account gets banned, they're kicked out instantly — across every device, no waiting.
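The single-pass check can be sketched independently of any JWT library. Everything below is illustrative: the claims shape, the role names, and the in-memory map standing in for the invalidation table:

```typescript
// One middleware pass: token integrity, revocation, and role, in that order.
// JWT signature verification is assumed to happen before this function and
// to yield `claims` (or null if the token is invalid).

interface SessionClaims { userId: string; role: "member" | "staff" | "admin"; issuedAt: number }

type AuthResult = { ok: true; claims: SessionClaims } | { ok: false; reason: string };

export function authorize(
  claims: SessionClaims | null,
  revokedBefore: Map<string, number>, // userId -> cutoff; stands in for the invalidation table
  requiredRole: SessionClaims["role"]
): AuthResult {
  if (!claims) return { ok: false, reason: "invalid token" };
  // Server-side revocation: any token issued before the cutoff is dead,
  // which is what makes the instant cross-device kick possible.
  const cutoff = revokedBefore.get(claims.userId);
  if (cutoff !== undefined && claims.issuedAt < cutoff) {
    return { ok: false, reason: "session revoked" };
  }
  const rank = { member: 0, staff: 1, admin: 2 };
  if (rank[claims.role] < rank[requiredRole]) return { ok: false, reason: "insufficient role" };
  return { ok: true, claims };
}
```

The design choice worth noting: plain JWTs can't be revoked, so the invalidation table is what turns "stateless tokens" into "sessions we can actually kill."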

ARCHITECTURE

Hybrid inventory: granular tracking without breaking what exists

Situation

The existing POS couldn't represent inventory across multiple distinct physical storage locations. Staff needed location-level counts — per safe, per storage area — that the POS data model simply didn't support.

Decision

Stored inventory counts in our own database with a full audit log, then provided a deliberate one-way sync to the POS only when staff confirmed accuracy. The two systems stayed decoupled, with the push as an explicit staff action — not an automatic background sync.

Outcome

Staff got granular location-level tracking across multiple storage locations plus a full audit trail that never existed before. Within the first weeks, the audit log started flagging discrepancies — not hundreds, but enough that the operations team said 'how did we manage without this?' That's the bar.

inventory · audit-trail · hybrid-architecture · data-sync
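The deliberate one-way push can be sketched as a single explicit function. The types, field names, and `pushToPos` callback are hypothetical, not the project's actual code:

```typescript
// Our database holds location-level counts; the POS only understands SKU
// totals. The push is a staff-triggered action, never a background sync,
// and every push lands in the audit log.

interface CountRecord { sku: string; location: string; count: number }
interface AuditEntry { at: number; actor: string; action: string; detail: string }

export function confirmAndPush(
  counts: CountRecord[],
  actor: string,
  pushToPos: (totals: Record<string, number>) => void, // stub for the POS API call
  audit: AuditEntry[]
): Record<string, number> {
  // Roll location-level counts up to the SKU totals the POS data model supports.
  const totals: Record<string, number> = {};
  for (const c of counts) totals[c.sku] = (totals[c.sku] ?? 0) + c.count;
  pushToPos(totals); // explicit, confirmed by staff
  audit.push({ at: Date.now(), actor, action: "pos-sync", detail: JSON.stringify(totals) });
  return totals;
}
```

The audit entries are also where the discrepancy-flagging comes from: compare the confirmed totals over time and differences surface on their own.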
TEAM & PROCESS

Admin permissions that don't need a developer to change

Situation

The admin dashboard covered many operational domains — inventory, payroll, delivery, marketing analytics, community moderation. A simple 'admin' role couldn't express the access control the organization actually needed.

Decision

Built a data-driven permission system with a custom React hook and wrapper component. Each admin module declares its required permission; the system handles rendering and API access enforcement. Adding a new module means one config entry, not a new access control implementation.

Outcome

Permissions stayed current through multiple team restructures. Adding a new admin module with full permission support: one config entry, no custom code, deployed in minutes. The operations team adjusted roles without filing a dev ticket. That was the goal.

permissions · admin · RBAC · developer-experience
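The data-driven core is small enough to sketch. In the real dashboard this sits behind the React hook and wrapper component; the module list and permission strings below are made up for illustration:

```typescript
// Each admin module declares the permission it needs. One function answers
// "which modules can this admin see?" -- the same check gates API access.

interface AdminModule { id: string; requiredPermission: string }

const MODULES: AdminModule[] = [
  { id: "inventory", requiredPermission: "inventory.read" },
  { id: "payroll", requiredPermission: "payroll.read" },
  { id: "delivery", requiredPermission: "delivery.read" },
  // Adding a module is one entry here -- no new access-control code.
];

export function visibleModules(granted: Set<string>): AdminModule[] {
  return MODULES.filter((m) => granted.has(m.requiredPermission));
}
```

Because the module list and the granted permissions are both data, the operations team can change who sees what without a developer touching access-control logic.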
TEAM & PROCESS

AI for back-office — practical, not performative

Situation

The organization had access to large external datasets relevant to planning decisions. Manual analysis was inconsistent and time-consuming, with results varying based on who was doing it.

Decision

Integrated AI into a dedicated internal analysis workflow — combining external data fetching with structured AI-assisted analysis. A back-office productivity tool with clear data boundaries, not a consumer-facing feature bolted on for appearances.

Outcome

Analysis that used to take an afternoon of manual work became a short workflow. I won't give you a fake '87% faster' number — but the person who used to do it manually switched to the AI workflow and never went back. That's the metric that matters.

AI · automation · back-office · practical-AI
HARD LESSONS

No automated tests in phase one — and why that was the right call

Situation

In the first phase of a platform covering multiple operational domains simultaneously, a comprehensive test suite would have consumed a significant portion of engineering capacity — urgently needed to ship working features.

Decision

Made an explicit, documented call to defer automated testing in phase one, relying on code review, staging validation, and production monitoring. The decision included a commitment to add tests in phase two — and we did. Estimated capacity saved: weeks of engineering time redirected to shipping features.

Outcome

Phase one shipped on schedule. Phase two tests were written with actual knowledge of what breaks in production, not guesses. We wrote fewer tests than originally estimated, but each one targeted a real failure mode we'd observed. That's a better test suite.

testing · pragmatism · trade-offs · shipping
HARD LESSONS

Three UI libraries in one project — on purpose

Situation

Different domains of the platform had genuinely different UI requirements — member-facing experience needed polish, staff dashboards needed data density, admin tooling needed rapid prototyping speed.

Decision

Made deliberate library choices per domain: Tailwind for member-facing UI, Ant Design for data-heavy staff interfaces, and Material UI where specific components provided the right interaction model.

Outcome

Each domain got tooling appropriate to its purpose. The constraint we enforced: consistency within a domain, pragmatism across domains. Mixing UI libraries requires clear rules about when each applies — without those rules, the codebase becomes incoherent.

UI · libraries · pragmatism · design-systems
In Plain English

Tailwind vs Ant Design vs Material UI

Three different toolkits for building user interfaces. Tailwind gives you full creative control (used for what members see). Ant Design comes with pre-built data tables and forms (used for staff dashboards). Material UI has specific interactive components (used for admin tools). Different jobs, different tools.

HARD LESSONS · #19

Situation · Decision · Outcome

[REDACTED] · [REDACTED] · [REDACTED]
Note from the author

This one is actually good. Like, genuinely good — the kind of lesson that makes a room go quiet for a second. The client nodded slowly. Someone took notes. Those notes are now in a legal document.

What happened involved [REDACTED], a very honest post-mortem, and apparently one too many people on the CC list. The outcome was useful. The details are 0% shareable.

Sorry. You get 23 insights instead of 24. The 24th one knew too much.

Lesson 03

Infrastructure and process are the product

The unglamorous work is where projects survive or die.

5 insights · ~3 min read

DEVOPS

Self-hosted observability at 10% of the SaaS price

Situation

Commercial observability platforms at scale are expensive, and they give you less control over data retention, alerting logic, and dashboard customization than the project needed.

Decision

Deployed a self-hosted Loki + Grafana stack on a dedicated VPS using Docker Compose, integrated with CI/CD for automated deployments. The application pushes structured logs; Grafana provides real-time operational dashboards.

Outcome

Full observability at a fraction of equivalent SaaS pricing. The exact saving depends on what you'd compare it to, but we're talking about a VPS bill vs enterprise observability subscriptions — it's not close. The entire stack was rebuilt from the repo during an actual VPS migration. It took under an hour. That number is real.

observability · grafana · self-hosted · cost-optimization
In Plain English

Loki + Grafana

Loki collects all the logs (records of what the system is doing). Grafana turns those logs into visual dashboards — graphs, alerts, real-time status. Together they're like a control room for the entire platform. We host them ourselves instead of paying for expensive cloud services.
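For reference, a stack like this typically fits in a short Compose file. The sketch below is generic, not the project's actual config; image tags, ports, and the admin password are placeholders you'd pin and secure for real use:

```yaml
# Minimal self-hosted Loki + Grafana sketch (illustrative, not production-ready:
# add auth, TLS, retention config, and pinned digests before relying on it).
services:
  loki:
    image: grafana/loki:2.9.0
    command: -config.file=/etc/loki/local-config.yaml
    ports:
      - "3100:3100"
    volumes:
      - loki-data:/loki

  grafana:
    image: grafana/grafana:10.4.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=change-me   # use secrets, not env, in practice
    depends_on:
      - loki
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  loki-data:
  grafana-data:
```

Because the whole stack is a file in the repo, "rebuild the VPS" means `docker compose up -d`, which is exactly what made the under-an-hour migration possible.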

DEVOPS

Infrastructure as code from day one — not after the first fire

Situation

Manually configured infrastructure doesn't survive server migrations, team changes, or incident recovery. If the person who set up the VPS isn't available when it needs to be rebuilt, the organization is in trouble.

Decision

Every infrastructure component is version-controlled: Docker Compose services, Nginx configuration, backup schedules, monitoring setup. CI/CD automatically deploys infrastructure changes when the config directory is updated.

Outcome

The entire observability stack was rebuilt during the project when the VPS needed replacing — recovered in under an hour. Infrastructure-as-code is non-negotiable for production systems.

IaC · docker · disaster-recovery · automation
In Plain English

Infrastructure as Code

Instead of manually setting up servers by clicking buttons in a dashboard, every server configuration is written as code files. If the server explodes, we run the code and get an identical server back in minutes. It's a recipe book for your entire infrastructure.

DEVOPS

Three environments or you're gambling with production

Situation

A platform with live member accounts, financial transactions, and delivery operations cannot absorb untested changes into production. The cost of a production incident far exceeds the cost of a staging environment.

Decision

Maintained a full three-environment pipeline from day one — development, staging mirroring production configuration, and production. Every change validated in staging before promotion. Staging used production-equivalent data fixtures, not toy data.

Outcome

Multiple significant issues were caught in staging before reaching production — delivery workflow edge cases, data sync timing bugs, things that only show up with realistic data. I didn't keep a count. Enough that nobody on the team questioned the cost of maintaining staging.

environments · staging · deployment · risk-management
In Plain English

CI/CD Pipeline

An automated assembly line for code. Developer writes code → it gets automatically tested → automatically deployed to a test environment → then to production. No manual uploading, no 'works on my machine' surprises.

DEVOPS

'We have backups' means nothing without verification

Situation

'We have backups' is a meaningless statement unless you know the backups are current, complete, and restorable. Many organizations discover their backup process is broken at exactly the moment they need it most.

Decision

Automated database backups through the infrastructure stack, with scheduled verification that backups are being created and are non-empty. Backup status is visible in the operational dashboard.

Outcome

Backup verification runs on a schedule. Recovery time: under an hour for full database restoration — tested during the project, not theoretical. The distinction between 'we have backups' and 'we've actually restored from backups' is significant. We crossed that line.

backups · verification · disaster-recovery · operations
STAKEHOLDERS

Name scope creep directly — don't absorb it silently

Situation

Mid-project, new feature requests arrived framed as if they had always been part of the plan. Without a clear process, the team was absorbing work without explicit agreement on trade-offs.

Decision

Implemented a lightweight scope classification: every new request was explicitly categorized as in-scope, deferred to a future phase, or new scope requiring timeline discussion. When a request was scope creep, we named it — respectfully and directly.

Outcome

Over the build, a double-digit number of feature requests were explicitly classified as 'phase two' — each with a documented reason. Core features shipped on time. No stakeholder was surprised by what wasn't included, because we'd named it early. That's the point: no surprises is better than no scope creep.

scope-creep · communication · project-management · honesty

If you read nothing else

01

Your users don't care about your architecture. They care about whether the product solves their problem. Build for that.

02

Tech debt you choose deliberately is manageable. Tech debt you accumulate by accident is a liability. Know the difference.

03

If your staging environment is a toy, your production will be a disaster. Infrastructure decisions are product decisions.