• How One Engineer Cut Incident Response from Hours to Seconds with a Runbook
    Jun 7 2026
    Episode 37 of The Software Engineering Podcast with Fexingo dives into a specific operational win: how a senior engineer at a mid-size fintech company automated incident response runbooks, slashing mean time to resolution from over two hours to under thirty seconds. Lucas and Luna walk through the before-and-after — the chaotic Slack threads, the manual playbook that lived in a Google Doc, and the gradual shift to code-driven remediation. They discuss why a runbook-as-code approach reduced human error, how the team tested incident flows in staging, and the one misstep that nearly caused a false positive cascade. The episode also touches on the broader movement toward 'incident response as software' and what it means for on-call culture. No hot takes, no buzzwords — just a concrete story of making systems more resilient by writing better automation scripts. #IncidentResponse #RunbookAutomation #SiteReliabilityEngineering #DevOps #OnCall #IncidentManagement #SoftwareEngineering #Automation #RunbookAsCode #Fintech #EngineeringCulture #MTTR #ReliabilityEngineering #Observability #Postmortem #Technology #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
    7 mins
  • How a Single Bit Flip Brought Down an Entire Data Center
    Jun 7 2026
    In episode 36 of The Software Engineering Podcast with Fexingo, Lucas and Luna dive into one of the most infamous hardware-induced software bugs in recent memory: the 2021 Facebook outage caused by a single bit flip. Lucas explains how a routine configuration change triggered a cascading failure that took down Facebook, Instagram, and WhatsApp for over six hours. He walks through the exact sequence — a BGP withdrawal, DNS failures, and a data center network meltdown — and why a single incorrect bit in a router's memory was the root cause. Luna challenges the conventional wisdom about redundancy and asks whether engineers can realistically guard against single-bit errors at scale. They discuss cosmic rays, memory error-correcting codes, and the trade-offs between software abstraction and hardware reality. Along the way, they share practical lessons for engineers designing resilient systems: from careful change management to the dangers of assuming hardware is perfect. #SoftwareEngineering #Technology #FacebookOutage #BitFlip #BGP #DNS #DataCenter #Resilience #Infrastructure #Networking #CosmicRays #ECC #ErrorCorrection #ChangeManagement #CascadingFailure #EngineeringLessons #FexingoBusiness #BusinessPodcast Keep every episode free: buymeacoffee.com/fexingo
    6 mins
  • How One Engineer Slashed Build Times from 40 Minutes to 90 Seconds
    Jun 6 2026
    In this episode of The Software Engineering Podcast, Lucas and Luna dive into the story of a senior engineer at a mid-size SaaS company who cut their CI build pipeline from 40 minutes to 90 seconds. They walk through the specific bottlenecks: a monolithic Gradle build with unnecessary task dependencies, Docker layers being rebuilt on every commit, and a test suite that ran sequentially. The fix was a combination of incremental compilation, parallel test execution, and a custom caching layer using BuildKit. They also discuss the cultural resistance to changing a 'working' build system and how the engineer used data to win buy-in. By the end, listeners get a concrete playbook for auditing their own CI pipelines and a reminder that build speed is often the cheapest performance optimization you can make. #SoftwareEngineering #BuildOptimization #CI #Gradle #Docker #BuildKit #IncrementalCompilation #ParallelTesting #DevEx #DeveloperProductivity #CacheStrategy #BuildPipeline #Tech #EngineeringBestPractices #FexingoBusiness #BusinessPodcast #TheSoftwareEngineeringPodcast #Fexingo Keep every episode free: buymeacoffee.com/fexingo
    8 mins
  • How One Engineer Cut Docker Image Size by 90 Percent
    Jun 6 2026
    Episode 34 of The Software Engineering Podcast with Fexingo dives into a practical optimization story: how one engineer at a mid-sized SaaS company shrank a bloated Docker image from 2.1 GB to just 210 MB. Lucas and Luna walk through the specific techniques used — from switching to Alpine base images to eliminating layer bloat with multi-stage builds and removing unnecessary packages. They discuss why smaller images matter beyond disk space: faster CI pipelines, reduced network transfer, and improved security posture. The episode also touches on the engineer's debugging process, including how they used 'docker history' and 'dive' to find hidden culprits like cached pip packages and leftover build artifacts. Listeners will walk away with a concrete checklist they can apply to their own Dockerfiles today. #Docker #ContainerOptimization #AlpineLinux #MultiStageBuild #DevOps #CI #SaaS #SoftwareEngineering #Technology #FexingoBusiness #BusinessPodcast #EngineeringBestPractices #Dockerfile #LayerCaching #ImageSize #Performance #Security #Tooling Keep every episode free: buymeacoffee.com/fexingo
    8 mins
  • How One Engineer Refactored a 10 Year Old Codebase in Six Weeks
    Jun 5 2026
    Episode 33 of The Software Engineering Podcast with Fexingo. Lucas and Luna dive into the story of a senior engineer who inherited a monolithic ten-year-old codebase with zero tests, where a single deployment could cause multi-hour outages. Over six weeks, the engineer systematically added integration tests, extracted domain modules, and built a CI pipeline that cut deployment failures by 90 percent. The episode focuses on the practical first steps: the decision to start with a single endpoint, the use of characterization tests to capture existing behavior, and the trade-off between perfect abstraction and incremental improvement. No hot takes, just the concrete tactics that turned a legacy nightmare into a deployable system. #LegacyCodebase #Refactoring #IntegrationTesting #CI #CharacterizationTests #IncrementalImprovement #SoftwareEngineering #CodeArchitecture #BestPractices #TechnicalDebt #DeploymentPipeline #TestingStrategy #ModuleExtraction #SeniorEngineer #FexingoBusiness #BusinessPodcast #Technology #Episode33 Keep every episode free: buymeacoffee.com/fexingo
    9 mins
  • How One Engineer Prevented a Deletion Cascade with a Soft Delete
    Jun 5 2026
    In this episode, Lucas and Luna dive into a near-disaster at a mid-sized SaaS company where a single engineer prevented cascading data loss by implementing a soft delete pattern. They walk through the specific scenario: a misconfigured database trigger that would have wiped thousands of customer records during a routine cleanup. The engineer's decision to use a tombstone column and a scheduled purge job saved the company from catastrophic data loss at the cost of only a few engineering hours. Lucas explains the technical details of soft delete versus hard delete, the trade-offs in query complexity and storage overhead, and why this pattern should be part of every production codebase. Luna asks the critical questions about when soft delete is overkill and how to automate the eventual cleanup. Listeners will learn one concrete pattern they can apply to their own systems to prevent irreversible data loss. #SoftDelete #DatabasePatterns #DataLoss #ProductionSafety #SoftwareEngineering #PostgreSQL #SQLErrors #Prevention #CascadeFailure #Tombstone #ScheduledJob #ReverseEngineering #BackupStrategy #AuditLog #Technology #FexingoBusiness #BusinessPodcast #EngineeringBestPractices Keep every episode free: buymeacoffee.com/fexingo
    5 mins
  • How a Single Consistent Hashing Change Prevented a Cascade Failure
    Jun 4 2026
    In this episode of The Software Engineering Podcast, Lucas and Luna dive into a real-world case of consistent hashing preventing a production cascade failure. They break down how one engineer at a major streaming platform rearchitected a cache layer to avoid the thundering herd problem, reducing p99 latency spikes from 12 seconds to under 200 milliseconds during a regional outage. The conversation covers hash ring design, virtual nodes, and why this pattern is critical for distributed systems at scale. No fluff, just the concrete decision that saved a service from collapsing under its own traffic. #ConsistentHashing #CacheLayer #DistributedSystems #ThunderingHerd #HashRing #VirtualNodes #ProductionIncident #LatencyOptimization #Scalability #SoftwareEngineering #SystemDesign #Technology #FexingoBusiness #BusinessPodcast #EngineeringPodcast #CodeArchitecture #BestPractices #PodcastEpisode Keep every episode free: buymeacoffee.com/fexingo
    9 mins
  • How One Team Cut Cloud Costs by 60 Percent with a FinOps Strategy
    Jun 4 2026
    In this episode of The Software Engineering Podcast, Lucas and Luna dive into the practical world of FinOps — cloud financial operations. They explore how a mid-size SaaS company called DataNest slashed its AWS bill by 60 percent in just six months. Lucas breaks down the specific tactics: moving from on-demand to reserved instances, implementing automated right-sizing policies, and building cost-awareness into developer workflows. Luna shares how the team used a simple dashboard to give every engineer visibility into their resource usage. The conversation covers the human side of cost optimization — how to get engineers to care about cloud spend without killing velocity. By the end, you'll walk away with actionable steps to apply FinOps principles to your own infrastructure, whether you're at a startup or a large enterprise. This is a focused, concrete look at how engineering teams can save real money without sacrificing performance. #FinOps #CloudCostOptimization #AWS #DataNest #EngineeringCulture #DevOps #InfrastructureAsCode #CostAwareness #ReservedInstances #RightSizing #CloudFinancialManagement #Technology #SoftwareEngineering #Podcast #FexingoBusiness #BusinessPodcast #TechSavings #DeveloperProductivity Keep every episode free: buymeacoffee.com/fexingo
    13 mins