What's Inside
A diagnosis of Datadog's internal education gap, a concrete proposal for the first 90 days, a content architecture framework, and the evidence that I've already built systems like this.
11 sections · Built for a skim or a live walkthrough
1. The Company
Datadog is the gold-standard observability and security platform for cloud infrastructure. Founded in 2010, the company went public in 2019 (NASDAQ: DDOG) and has grown into one of the most consequential enterprise SaaS companies of the last decade. With 8,100+ employees worldwide, Datadog provides unified monitoring across metrics, traces, logs, user experience, and security — all from a single pane of glass.
Their customers are developers, DevOps engineers, SREs, and IT operations teams at companies of every size, from startups to the Fortune 500. If your infrastructure runs in the cloud, Datadog is probably watching it. The platform integrates with 750+ technologies out of the box, from AWS and Kubernetes to Redis and Postgres.
The company is hiring aggressively across nearly every function. As of early 2026, open headcount breaks down roughly as follows:
- Engineering: 122 open roles — backend, frontend, infrastructure, ML
- Sales: 172 open roles — enterprise AEs, SDRs, solutions engineers
- Technical Solutions: 72 open roles — customer success, TAMs, professional services
- Product & Design: 40+ roles across PM, UX research, and design systems
Datadog's strategic bets for 2026 center on three vectors: AI observability (LLM monitoring, agentic application tracing, model performance tracking), security (Cloud SIEM, application security management, compliance), and developer experience (CI visibility, software delivery analytics, code-level profiling). The company is expanding from "monitoring platform" to "everything platform for cloud operations," and that expansion creates an enormous internal education surface area.
This matters for the role because every new product line means more internal systems for engineers to understand, more deployment patterns to learn, and more onboarding surface to cover. The faster Datadog ships new capabilities, the wider the knowledge gap between experienced engineers and new hires. That gap is the job.
2. The Role
This is an internal education role, not a customer-facing one. You are not writing Datadog's public documentation or training external users. You are building courses, codelabs, facilitator guides, and recorded presentations for Datadog's own people — engineers, product managers, designers, and data analysts — teaching them how Datadog's internal systems actually work.
The scope is broad and deeply technical. The content you create covers:
- Internal service mesh architecture — how Datadog's microservices communicate, how traffic routing works, what happens during failover
- Deployment pipelines — how code gets from PR to production, the CI/CD toolchain, feature flags, canary deployments
- Code review workflows — team conventions, review expectations, approval gates
- Internal tooling — bespoke CLIs, internal dashboards, debugging utilities that don't exist outside Datadog
- Monitoring practices — how to instrument a new service, what SLOs to set, how to use Datadog to monitor Datadog (yes, it's recursive)
The role requires partnering closely with subject matter experts (SMEs) across engineering. You are the translator between the person who built the system and the person who needs to use it. Your deliverables include:
- Self-paced codelabs with hands-on exercises
- Facilitator guides for instructor-led sessions
- Recorded video presentations for async learning
- Assessment rubrics to verify comprehension
- Maintenance schedules to keep content current as systems evolve
They want someone who has been an SRE, developer, or PM in a cloud environment — you need to have lived the world you are teaching. Additionally, the role asks for 5+ years building courses for technical audiences and a solid grasp of instructional design principles (learning objectives, scaffolding, assessment design, cognitive load theory). This is not a role where you can wing the pedagogy or fake the technical depth. You need both.
3. The Problem I'd Name
Datadog's 8,100+ employees include thousands of engineers spread across a global hybrid workforce. The company ships new products and features at an extraordinary pace — multiple product launches per year, each one adding new internal systems, new APIs, new deployment patterns. Every launch widens the gap between what new engineers need to know to ship safely and the time SMEs have to explain it.
This is not a documentation problem. Datadog almost certainly has internal docs. The problem is that most internal enablement programs at this scale fall into one of two failure modes:
Failure mode 1: the onboarding path relies too heavily on the SME's time. New hires shadow senior engineers for weeks. The senior engineer's own velocity drops. When that engineer goes on vacation or leaves, the onboarding path breaks. Knowledge lives in people's heads, not in materials. This does not scale.
Failure mode 2: documentation masquerading as training. Someone writes a codelab or a long Confluence page. It is technically correct. But it was written by an engineer, not an educator. There is no scaffolding. There is no cognitive load management. There are no checkpoints. There is no practice. The content dumps information in the order the author thinks about it, not in the order the learner needs to receive it. Completion rates are dismal. People skim it, get stuck, and Slack the SME anyway.
The specific failure mode I would name in the interview: most internal codelabs are built by engineers, not educators. They are accurate but not learnable. The author assumes the reader's mental model matches their own. There is no pre-assessment to gauge where the learner is starting. There is no chunking to manage cognitive load. There is no failure feedback — when a learner gets stuck, they do not know why or what to revisit. The content is a wall of technically correct prose with screenshots, and the learner's only recourse when confused is to find the SME on Slack.
This is the gap the role exists to close. And closing it requires someone who understands both the technical systems and the science of how adults learn complex material.
4. Real-World Context: The Translation Problem
The Parsons Analogy
When I taught JavaScript to design students at Parsons School of Design, I faced the exact same translation problem this role demands. My students were not unintelligent — they were college juniors in Design and Technology, aspiring artists and designers. They had different mental models. They thought in visual hierarchies, spatial relationships, and interaction flows. They did not think in variables, loops, and call stacks.
The mistake a lesser instructor makes is to start with the syntax. "Here's how you declare a variable in JavaScript." That approach fails because it starts from the expert's mental model, not the learner's. Instead, I started from what they already understood — visual composition, cause and effect on a canvas — and bridged to the programming concepts through those existing models.
The same dynamic applies at Datadog. A mid-level engineer joining the Platform team needs to understand the service mesh in their first two weeks. They are not unintelligent — they may have years of experience at other companies. But they have different mental models. They know how service meshes work in the abstract; they do not know how Datadog's specific implementation works, where the gotchas are, or what the tribal knowledge says about failure modes.
The content needs to meet the learner where they are, not where the SME is. The SME thinks about the service mesh in terms of its implementation details because they built it. The new engineer needs to think about it in terms of what they already know about distributed systems, and then bridge from there to Datadog's specifics. That translation — from expert mental model to learner mental model — is the core skill of instructional design, and it is the exact skill I built at Parsons.
Where This Analogy Breaks Down: At Parsons, the gap was between disciplines (design vs. engineering). At Datadog, the gap is within the same discipline but across institutional knowledge boundaries. The pedagogical principles are identical — scaffolding, chunking, meeting the learner where they are — but the content itself is more technical and the learners are more advanced. That actually makes it easier in some ways: you can assume higher baseline competence and move faster through foundations.
5. The Proposal
A Framework for Engineering Onboarding Content That Engineers Actually Complete
This proposal addresses the core failure mode: internal codelabs that are technically accurate but pedagogically flat. The framework has three pillars.
Pillar 1: Separate Authorship from Expertise
Most internal codelabs fail because the person who knows the system is also the person writing the training. Engineers are domain experts, not curriculum designers. When an engineer writes a codelab, they unconsciously skip steps that feel obvious to them, organize content in implementation order rather than learning order, and provide no mechanism for the learner to verify their own understanding.
Before: Engineer-Authored Codelab
## Service Mesh Configuration
Our service mesh uses Envoy proxies with custom xDS
configuration. Each service registers with the control
plane via gRPC. Configuration is managed through our
internal tool `meshctl`.
To configure a new service:
1. Add your service to the mesh registry
2. Define upstream and downstream dependencies
3. Run `meshctl apply` to propagate changes
4. Verify with `meshctl status`
[45 minutes of dense text continues...]
After: Educationally Scaffolded Codelab
## Module 1: What Problem Does the Service Mesh Solve?
Learning Objective: Explain why Datadog uses a service
mesh instead of direct service-to-service communication.
Before you begin: What do you already know about how
services talk to each other at scale? (2-minute reflection)
Context Bridge: If you have used an API gateway before,
the service mesh extends that concept to ALL internal
traffic, not just external requests.
[Hands-on: Trace a single request through the mesh
using the internal dashboard. Draw the path it takes.]
Checkpoint: Can you name three benefits of routing
traffic through Envoy proxies instead of direct calls?
[8 minutes per module, 5 modules total]
Pillar 2: Scaffold for Cognitive Load, Not Completeness
The instinct when building training is to be comprehensive — cover everything, leave nothing out. This is backwards. Comprehensive content creates cognitive overload. The learner cannot hold the entire service mesh architecture in working memory on day one. Instead, scaffold the learning: start with the smallest useful mental model, let them practice with it, then expand.
Consider one of Datadog's publicly visible training surfaces — the Datadog Learning Center or the public documentation — and sketch what a properly scaffolded version would look like. The public docs are written for reference (look up what you need). Internal training should be written for acquisition (build understanding progressively). These are different genres with different structures.
A properly scaffolded module includes:
- Chunking: No module longer than 8-10 minutes. If it takes longer, split it.
- Checkpoints: Every module ends with a verification question or exercise. The learner must demonstrate understanding before moving on.
- Spaced repetition hooks: Day 3 content references Day 1 concepts. Week 2 includes a brief recall exercise on Week 1 material. This is how long-term retention works.
- Facilitator guide: For instructor-led versions, the facilitator knows which questions to ask, which misconceptions are common, and when to pause for hands-on work.
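These rules are mechanically checkable. Below is a minimal sketch of a lint-style pass over a course outline; the module fields and the 10-minute threshold are illustrative assumptions, not Datadog tooling.

```python
# A minimal sketch, not Datadog tooling: the fields and the 10-minute
# threshold are illustrative assumptions.
from dataclasses import dataclass, field

MAX_MODULE_MINUTES = 10  # chunking rule: split anything longer

@dataclass
class Module:
    title: str
    minutes: int
    has_checkpoint: bool
    reviews: list[str] = field(default_factory=list)  # spaced-repetition hooks

def lint_course(modules: list[Module]) -> list[str]:
    """Flag violations of the scaffolding rules."""
    problems = []
    for i, m in enumerate(modules, start=1):
        if m.minutes > MAX_MODULE_MINUTES:
            problems.append(f"Module {i} ({m.title}): {m.minutes} min, split it")
        if not m.has_checkpoint:
            problems.append(f"Module {i} ({m.title}): no checkpoint before moving on")
        if i >= 3 and not m.reviews:  # later modules must call back to earlier ones
            problems.append(f"Module {i} ({m.title}): no spaced-review callback")
    return problems

# Example: module 3 is too long and never revisits earlier concepts.
course = [
    Module("Why a Service Mesh?", 8, True),
    Module("The Control Plane", 9, True, reviews=["Module 1"]),
    Module("Traffic Routing", 14, True),
]
for problem in lint_course(course):
    print(problem)
```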
Pillar 3: Content Architecture as a System
Each piece of training content should follow a repeatable architecture. This is not a template — it is a system that ensures every module covers the right bases regardless of topic.
CONTENT ARCHITECTURE FLOW
========================
Learning Objective
|
v
Pre-Assessment (Where is the learner starting?)
|
v
Context Bridge (Connect new concept to existing knowledge)
|
v
Chunked Content (8-10 min modules, one concept each)
|
v
Checkpoint (Verify understanding before proceeding)
|
v
Practice Exercise (Hands-on, in a real or sandbox environment)
|
v
"What Can Go Wrong" Sidebar (Common failure modes)
|
v
Spaced Review (Callback to this content in later modules)
This architecture is not theoretical. It maps directly to established instructional design principles — and I have already built it into production software. Teacher's Pet is a course diagnostics tool I built that analyzes courses against my own deterministic reasoning engine, identifies structural failures, scores cognitive load balance, and recommends specific fixes. The framework above is the same framework the tool validates automatically. Every module I would build at Datadog would follow this pattern — and I can point to running software that proves the system works.
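To make the flow concrete, here is an illustrative encoding of the architecture as an ordered schema that a diagnostics pass can validate. This is in the spirit of what Teacher's Pet checks, but the stage names are hypothetical, not its actual implementation.

```python
# Illustrative only: stage names are hypothetical, not Teacher's Pet's
# actual schema. Validates presence and ordering of the stages above.
REQUIRED_STAGES = [
    "learning_objective",
    "pre_assessment",
    "context_bridge",
    "chunked_content",
    "checkpoint",
    "practice_exercise",
    "spaced_review",
]
OPTIONAL_STAGES = {"what_can_go_wrong"}  # the sidebar may be omitted

def validate_module(stages: list[str]) -> list[str]:
    """Return the structural failures found in one module's section list."""
    errors = []
    last_seen = -1
    for stage in REQUIRED_STAGES:
        if stage not in stages:
            errors.append(f"missing stage: {stage}")
            continue
        idx = stages.index(stage)
        if idx < last_seen:
            errors.append(f"out of order: {stage}")
        last_seen = max(last_seen, idx)
    unknown = set(stages) - set(REQUIRED_STAGES) - OPTIONAL_STAGES
    errors.extend(f"unknown stage: {s}" for s in sorted(unknown))
    return errors

# Example: a module that jumps straight from content to practice
# with no checkpoint gets flagged.
print(validate_module(["learning_objective", "pre_assessment",
                       "context_bridge", "chunked_content",
                       "practice_exercise", "spaced_review"]))
```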
6. First 90 Days
Here is exactly what I would do, starting in my first month. The goal is to deliver one tangible, useful artifact by day 90 — not a strategy deck, but an actual piece of content that engineers can use.
Map the existing onboarding path for mid-level engineers joining the Platform team. What materials exist today? What format are they in? Who owns them? When were they last updated? I would catalog every piece of onboarding content — Confluence pages, Slack bookmarks, recorded Zooms, GitHub repos with READMEs — and note the gaps between what exists and what a new hire actually needs in their first two weeks.
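A minimal sketch of what that audit pass could look like; the CSV schema (title, owner, last_updated) and the 180-day staleness threshold are assumptions for illustration, not an existing internal format.

```python
# The CSV schema (title, owner, last_updated) and the 180-day threshold
# are assumptions for illustration, not an existing internal format.
import csv
from datetime import date, datetime

STALE_AFTER_DAYS = 180

def audit(catalog_path: str) -> list[dict]:
    """Flag onboarding materials that are stale or have no owner."""
    flagged = []
    with open(catalog_path, newline="") as f:
        for row in csv.DictReader(f):
            updated = datetime.strptime(row["last_updated"], "%Y-%m-%d").date()
            flags = []
            if (date.today() - updated).days > STALE_AFTER_DAYS:
                flags.append("stale")
            if not row["owner"].strip():
                flags.append("no-owner")
            if flags:
                row["flags"] = ", ".join(flags)
                flagged.append(row)
    return flagged
```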
Deliverable: Content audit spreadsheet with coverage gaps identified.

Talk to 3-5 engineers who joined the Platform team in the last 6 months. Ask them: Where did you get stuck? What took longer than it should have? What did you wish someone had explained earlier? What did you have to learn from a person that should have been in a document? These interviews surface the actual friction points, not the ones leadership assumes exist. The gap between what leaders think is hard and what new hires actually struggle with is almost always surprising.

Deliverable: Friction map with top 5 onboarding pain points ranked by frequency and severity.

Take the #1 friction point from the interviews and propose a single, bounded codelab module to address it. This is not a full course — it is one module, 30-40 minutes total, broken into 4-5 chunks. I would write a one-page proposal that includes: the learning objective, the target audience, the prerequisite knowledge, the module structure, and the assessment criteria. This proposal goes to the relevant SME and the hiring manager for feedback before I start writing.

Deliverable: One-page module proposal with learning objectives and structure outline.

Write the first draft of the module, including the facilitator guide. Then test it with one cohort — ideally 3-5 engineers who are currently onboarding or who recently onboarded and can give informed feedback. Observe where they get stuck. Note which checkpoint questions they get wrong. Measure time-on-task versus estimated time. Revise based on what you see, not what you assume.

Deliverable: Complete codelab module with facilitator guide, tested with one cohort, revision notes documented.

The point of this plan is to demonstrate the working method, not just talk about it. By day 90, there is a real artifact that engineers have used. That artifact becomes the proof-of-concept for the broader curriculum framework. It is much easier to get buy-in for a 12-month content roadmap when you can point to something that already worked in week 4.
7. Content Architecture Example
Here is a concrete example of how I would restructure a hypothetical "Datadog Service Mesh 101" codelab, transforming it from a typical engineer-authored document into a properly scaffolded learning experience.
Current State: The Typical Internal Codelab
Format: Single Confluence page, 45 minutes of continuous text with screenshots. No exercises. No checkpoints. Written by the engineer who built the service mesh. Assumes reader understands Envoy, gRPC, and Datadog's internal control plane. Completion rate: unknown (no tracking). Anecdotally, most people read the first third and then Slack the author with questions.
Proposed State: Scaffolded Module Structure
Five modules, each approximately 8 minutes, following the content architecture framework:
| Module | Title | Learning Objective | Exercise | Checkpoint |
|---|---|---|---|---|
| 1 | Why a Service Mesh? | Explain the problem the mesh solves at Datadog's scale | Diagram request flow with and without mesh | Name 3 failure modes the mesh prevents |
| 2 | The Control Plane | Describe how services register and discover each other | Use meshctl to query a service's upstream dependencies | Predict what happens if the control plane goes down |
| 3 | Traffic Routing | Trace a request through the Envoy sidecar proxies | Use the internal dashboard to follow a live request | Identify where latency is added by the mesh |
| 4 | Adding a New Service | Register a service with the mesh and verify connectivity | Deploy a test service to the staging mesh | Verify your service appears in mesh topology view |
| 5 | When Things Break | Diagnose and resolve common mesh configuration errors | Deliberately misconfigure and fix a routing rule | Explain the debugging steps for a 503 from the mesh |
What Each Module Contains
MODULE TEMPLATE
==============
1. Learning Objective (1 sentence, action verb)
"After this module, you can [verb] [concept]."
2. Context Bridge (2-3 sentences)
Connect this module to what the learner already knows.
"If you have used [X] before, this is similar but [Y]."
3. Core Content (5-6 minutes of reading/watching)
Focused on ONE concept. No tangents.
Includes diagrams or terminal output, not just prose.
4. Hands-On Exercise (2-3 minutes)
The learner DOES something, not just reads.
Uses a real or sandbox environment.
5. Checkpoint Question (1 question)
Tests comprehension, not recall.
"What would happen if..." not "What is the command for..."
6. "What Can Go Wrong" Sidebar (optional)
Common mistakes or misconceptions for this topic.
Sourced from SME interviews and support tickets.
7. Bridge to Next Module (1-2 sentences)
"Now that you understand [X], the next module covers
how [Y] builds on it."
This structure is modular by design. If Datadog's service mesh changes, you update Module 4 without rewriting the entire codelab. If a new failure mode emerges, you add it to Module 5's sidebar. The architecture supports maintenance, not just initial creation.
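One way to make that maintenance story operational, sketched with hypothetical module and system names: keep a manifest mapping each module to the internal systems it documents, so a system change can be traced to exactly the modules that need revision.

```python
# Hypothetical manifest: which internal systems each module documents.
# Module and system names are illustrative, not Datadog's actual ones.
MODULE_SYSTEMS = {
    "module_1_why_mesh": ["service-mesh"],
    "module_2_control_plane": ["service-mesh", "meshctl"],
    "module_3_traffic_routing": ["service-mesh", "envoy"],
    "module_4_adding_service": ["meshctl", "staging-mesh"],
    "module_5_when_things_break": ["service-mesh", "envoy", "meshctl"],
}

def modules_to_update(changed_system: str) -> list[str]:
    """List only the modules that document the system that changed."""
    return [m for m, systems in MODULE_SYSTEMS.items() if changed_system in systems]

# Example: a meshctl release touches modules 2, 4, and 5; the rest stay as-is.
print(modules_to_update("meshctl"))
```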
8. Why Me
This role requires a rare combination: someone who can go deep on cloud infrastructure systems and design learning experiences that actually produce comprehension. Here is how my background maps directly to what Datadog needs:
| Role Requirement | My Evidence |
|---|---|
| 5+ years building courses for technical audiences | Taught JavaScript and creative coding at Parsons School of Design for non-engineer undergraduate students. Built the full curriculum from scratch: syllabi, exercises, assessments, rubrics. The core challenge — translating complex technical concepts for people with different mental models — is the exact same translation problem this role demands. |
| Cloud environment experience (SRE/developer/PM) | Product Engineer at HashiCorp, working with Nomad (container orchestration) and extending the product line with 'lite' offerings of Consul (service mesh) and Vault (secrets management). I was not adjacent to the cloud infrastructure world — I was in it, building and deploying the tools that SREs and platform engineers use daily. |
| Instructional design expertise | Built Teacher's Pet — a production tool that diagnoses course quality against my own learning science model, measures cognitive load balance, and scores assessment alignment. Published 26+ books through a custom LMS with documented architecture. Designed 7 personalized learning pathways with condition-specific scaffolding for health protocols. I run my own LMS with tiered offerings ($75–$1,000) and documented student outcomes from QEDC, Parsons, and Equinox. |
| Working with SMEs to extract teachable content | At HashiCorp, I routinely extracted teachable patterns from senior engineers who could build the systems but could not explain them to newcomers. The skill is knowing which questions to ask the SME, how to identify the implicit knowledge they skip over, and how to restructure their expert narrative into a learner-friendly sequence. |
| Content that people actually complete | 30 algorithm reference guides using progressive disclosure, analogy-first explanations, and hands-on walkthroughs. 26+ published books on a live LMS with Cal.com scheduling, LangSmith observability, and a public "How We Built This" architecture page. Content is designed for completion, not decoration — every guide follows the same scaffolded structure because that structure is what drives people to finish. |
The intersection of "has built cloud infrastructure tooling" and "has designed and delivered technical curriculum from scratch" is a very small Venn diagram. Most curriculum developers come from an education background and learn the tech. Most engineers who write training do it as a side task without pedagogical training. I have done both, professionally, at recognized organizations.
9. Common Pitfalls I'd Avoid
These are the failure modes I have seen (and sometimes made) in technical curriculum development. Naming them upfront signals to the interviewer that I have been around the block.
Pitfall 1: Building Content in Isolation from SMEs
THE FAILURE:
Curriculum developer goes off for 3 weeks, writes an
entire codelab based on documentation and their own
understanding, then shows it to the SME for "review."
SME says: "This is wrong in 6 places, and the deployment
workflow changed last sprint."
Result: Rework. Wasted time. Eroded trust.
THE FIX:
Co-create with the SME from day one. Start with a
30-minute interview to build the outline TOGETHER.
Share drafts every 2-3 days, not at the end.
The SME's time investment: 2-3 hours total.
The accuracy gain: no rework, no drift.
SMEs are busy. Respect their time by making the
feedback loop tight, not by avoiding them.
Why this matters: Accuracy drift is the fastest way to lose engineering trust in internal training. If one codelab has outdated information, engineers stop trusting all of them.
Pitfall 2: Front-Loading Theory Before Hands-On
THE FAILURE:
Module starts with 20 minutes of conceptual explanation.
Architecture diagrams, design decisions, historical
context. THEN gives the learner something to do.
By minute 12, the learner has checked Slack twice
and retained nothing.
THE FIX:
Hands-on within the first 3 minutes. Give the learner
something to DO immediately, even if they do not fully
understand it yet. Then explain what they just did.
"Run this command. See that output? Here's what it
means and why it matters."
Experience first, theory second. This is how adults
actually learn technical systems.
Why this matters: Completion rates for internal training are the leading indicator of whether the content is working. Front-loaded theory kills completion. Every engineer has a Confluence graveyard of half-read onboarding docs.
Pitfall 3: Measuring Completion Without Comprehension
THE FAILURE:
"95% of new hires completed the onboarding codelab!"
But: Can they actually deploy a service to the mesh
without help? Can they debug a routing error? Can they
set up monitoring for a new service?
Completion is a vanity metric. It tells you someone
scrolled to the bottom, not that they learned anything.
THE FIX:
Measure comprehension through checkpoint questions
and practical assessments. Track:
- Checkpoint pass rates per module (identifies weak spots)
- Time-to-first-deploy for new engineers (the real KPI)
- SME interruption frequency (are people still Slacking
the expert, or can they self-serve?)
- 30-day confidence survey (do new hires FEEL ready?)
These metrics tell you whether learning actually happened.
Why this matters: If you cannot prove the training works, it will eventually lose budget and headcount. Comprehension metrics protect the program's existence.
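For illustration, here is a minimal sketch of how two of these metrics could be computed. The record shapes are assumptions; real inputs would come from LMS checkpoint events and a tagged Slack channel export.

```python
# Record shapes are assumptions for illustration; real inputs would come
# from LMS checkpoint events and a tagged Slack channel export.
from collections import defaultdict

def checkpoint_pass_rates(attempts: list[dict]) -> dict[str, float]:
    """attempts: [{'module': str, 'passed': bool}, ...] -> pass rate per module."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for a in attempts:
        total[a["module"]] += 1
        passed[a["module"]] += int(a["passed"])
    # A low rate flags a weak module, not a weak learner cohort.
    return {m: passed[m] / total[m] for m in total}

def sme_interruptions_per_hire(question_count: int, cohort_size: int) -> float:
    """Questions routed to SMEs during the onboarding window, per new hire."""
    return question_count / cohort_size  # trending down means self-serve is working
```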
10. The Walk-In Script
The One-Liner
"I built curriculum for non-engineers learning to code at Parsons, and before that I was shipping infrastructure tooling at HashiCorp. I want to show you what I'd do in the first 30 days in this role."
This sentence does three things: it establishes the education credential, it establishes the technical credential, and it pivots immediately to the deliverable. Do not linger on your background. Get to the proposal.
The Approach
Answer the first couple of interview questions normally. Build rapport. Demonstrate that you can hold a technical conversation about observability, service meshes, and cloud infrastructure. Show that you are not a pure education person who will need to be hand-held through the technical content.
Then, at the right moment — usually when they ask "What would your approach be?" or "How would you tackle this?" — say:
"Actually, before we get into the abstract — I put something together. It's a proposal for how I'd approach the first 30 days, including a content architecture framework and a concrete example of how I'd restructure an onboarding codelab. Can I walk you through it?"
Pull out the paper (or share the screen). Walk them through the proposal section by section. Do not read it to them — narrate it. Point to the before/after example. Explain the content architecture diagram. Show the module table for Service Mesh 101.
Then — and this is the key move — invite them to edit it. Say: "This is obviously based on what I could learn from the outside. Where am I wrong? What would you change?" This does two things: it shows intellectual humility, and it turns the interview into a collaborative working session. You are no longer being evaluated. You are co-creating. That is a much stronger position.
If They Push Back
They might say "We already have codelabs" or "Our engineers write their own training." Good. That means you know where the content is. Ask: "What are the completion rates? When someone finishes, can they do the thing without asking the SME?" If they do not know the answer, you have just demonstrated the problem you are there to solve.
11. Key Takeaways
- Datadog's internal education gap is pedagogical, not technical. The knowledge exists inside the company. The problem is that it lives in engineers' heads and in documents that are accurate but not learnable. The fix is instructional design, not more documentation.
- SMEs are bottlenecked — curriculum needs to scale without them. Every hour an SME spends explaining the service mesh to a new hire is an hour they are not shipping. The curriculum developer's job is to extract that knowledge once and package it so it can be delivered repeatedly without the SME in the room.
- The first project should be small, bounded, and immediately useful. Do not start with a 6-month curriculum roadmap. Start with one module for one team's top friction point. Ship it in 30 days. Measure whether it works. Then use that proof-of-concept to build the case for the broader program.
- Measure comprehension, not completion. Completion rates are vanity metrics. Track checkpoint pass rates, time-to-first-deploy, and SME interruption frequency. These tell you whether learning actually happened.
- Engineer + educator is the exact combination this role needs. Most candidates will be strong on one side or the other. The ability to understand a service mesh architecture at the implementation level AND design a scaffolded learning experience that produces comprehension is the differentiator. That is not a common profile, and it is exactly what I bring.