dennisthiessen e286c6a9bb docs(applications): Google DE interview prep brief + STAR stories

Prep materials for the Google Senior Data Engineer (Merchant Data
Science) process after passing the Hiring Assessment 2026-06-20:
interview prep brief + STAR story bank alongside the session files.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

2026-07-03 13:58:43 +02:00


Google — Sr Data Engineer — Behavioral STAR Stories

For the "Googleyness & Leadership" round (and to seed answers in technical rounds). Google scores: General Cognitive Ability, Role-Related Knowledge, Leadership, Googleyness. Behavioral round wants: ownership, autonomy, ambiguity, impact, collaboration, "raise the bar."

How to use: Each story is in STAR form (Situation / Task / Action / Result) plus a "maps to" tag and follow-up-probe prep. Tell them out loud, ~2–3 min each. Lead with YOUR action ("I…"), be specific about decisions and trade-offs, end with quantified or concrete impact.

Accuracy guardrails (do not drift in the room):

Own your domains/components/products. NEVER claim you solo-built Swisscom's company-wide Data Mesh — say "governed data products within Swisscom's company-wide Data Mesh."

No fabricated metrics. Where you don't have a hard number, describe the concrete before/after.

Bosch ML: you designed AND executed the inference-integration strategy (Zeugnis "Erarbeitung und Durchführung") — own it fully.

Story 1 — Ownership / Autonomy (flagship)

"Owning business-critical pipelines end-to-end" — SW-2

Maps to: "Operate with high autonomy, own from conception to impact" (JD responsibility #4); Leadership/ownership.

S — At Swisscom I'm Component Owner for the Fulfillment domain's data pipelines — business-critical flows from Oracle source systems through Kafka into our Teradata data warehouse, feeding downstream B2B analytics. If these break, stakeholders lose visibility into core fulfillment operations.

T — As Component Owner I'm the single accountable engineer: data availability, SLA, data quality, governance compliance, and on-call. Not just building — keeping it correct and available in production.

A — I own the full lifecycle: I built and maintain the Python ingestion, enforce Data Governance and privacy standards on the data, and carry 2nd/3rd-level support and on-call duty. When incidents hit, I run root-cause analysis and drive the fix rather than escalating it away. I designed the pipelines to be idempotent and re-runnable so backfills don't corrupt downstream tables.

R — The Fulfillment pipelines run under a maintained SLA with full governance compliance, and downstream B2B analytics teams get reliable, on-time data. The ownership signal was part of what got me promoted from Senior to Staff (Engineer IV) in April 2025.

Probes to be ready for:

"Tell me about a time it broke." → walk a concrete incident: detection (monitoring/alert), triage, RCA, the fix, and the guardrail you added so it didn't recur.
"How do you decide what to automate?" → recurring manual toil + risk of human error → automate; one-offs → don't.
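
If an interviewer digs into "idempotent and re-runnable", a small whiteboard sketch can help. The following is a minimal illustration, not the actual Swisscom code: it uses an in-memory SQLite table as a stand-in for the warehouse, and the table name `fulfillment_orders`, its columns, and the helper `load_partition` are all hypothetical. The idea shown is delete-then-insert of one load-date partition inside a single transaction, so a backfill re-run replaces rows instead of duplicating them.

```python
import sqlite3


def load_partition(conn, load_date, rows):
    """Replace one load-date partition atomically so re-runs are safe.

    `rows` is an iterable of (order_id, status) tuples (hypothetical schema).
    """
    # The connection context manager commits on success and rolls back
    # on an exception, so a failed load never leaves a half-written partition.
    with conn:
        conn.execute(
            "DELETE FROM fulfillment_orders WHERE load_date = ?", (load_date,)
        )
        conn.executemany(
            "INSERT INTO fulfillment_orders (load_date, order_id, status) "
            "VALUES (?, ?, ?)",
            [(load_date, order_id, status) for order_id, status in rows],
        )


# Stand-in for the warehouse table (assumption: real target would be Teradata).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fulfillment_orders (load_date TEXT, order_id TEXT, status TEXT)"
)

batch = [("A-1", "shipped"), ("A-2", "open")]
load_partition(conn, "2025-01-01", batch)
load_partition(conn, "2025-01-01", batch)  # simulated backfill re-run

count = conn.execute("SELECT COUNT(*) FROM fulfillment_orders").fetchone()[0]
print(count)  # row count stays at the batch size after the re-run
```

The talking point: the partition key, not the run, defines what gets written, so running the same load twice leaves the table in the same state.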

Story 2 — Driving a migration / technical judgment

"Legacy Teradata/Oracle → cloud-native AWS" — SW-1

Maps to: "Apply advanced data engineering, modeling and architectural frameworks" (resp. #1); Role-Related Knowledge + Leadership.

S — Swisscom's legacy ETL ran on Teradata and Oracle — heavy operational overhead, hard to scale, not cloud-native.

T — Lead the migration of my domains' ETL stack to a serverless AWS architecture without disrupting the business-critical data the team depends on.

A — I designed and built the target stack: S3 as the lake, Glue for transforms, Athena over Apache Iceberg as an open table format (so we get schema evolution and time-travel without lock-in), Redshift for serving, and Airflow plus Step Functions/Lambda for orchestration — all provisioned as code with CloudFormation. I sequenced it to migrate incrementally and run old/new in parallel to de-risk cutover, validating outputs matched before switching consumers over. I deliberately chose Iceberg over a closed format for evolvability.

R — Reduced manual operational overhead, improved pipeline observability, and gave the team a scalable serverless foundation — data availability for downstream analytics got faster and the stack is positioned for modern lakehouse workflows.

Probes:

"Why Iceberg?" → open table format, schema evolution, time-travel, engine-agnostic, avoids warehouse lock-in.
"What would you do differently?" → have an honest answer (e.g., invest earlier in automated output-diffing to speed validation).
Google bridge: the closest equivalent on their side is BigQuery (built on Dremel); say you'd map these patterns onto their stack quickly.
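
To make the "automated output-diffing" answer concrete, here is a minimal sketch of what such a check could look like during a parallel run. It is an illustration under assumptions, not the validation that was actually used: the function names `row_fingerprint` and `diff_outputs`, the `order_id` key, and the sample rows are all hypothetical, and real outputs would be read from the legacy and new tables rather than from in-memory lists.

```python
import hashlib


def row_fingerprint(row):
    """Hash a row's columns in sorted order so column ordering does not matter."""
    canonical = "|".join(f"{column}={row[column]}" for column in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_outputs(legacy_rows, new_rows, key):
    """Compare two pipeline outputs by business key and row fingerprint."""
    legacy = {row[key]: row_fingerprint(row) for row in legacy_rows}
    new = {row[key]: row_fingerprint(row) for row in new_rows}
    shared_keys = legacy.keys() & new.keys()
    return {
        "missing_in_new": sorted(legacy.keys() - new.keys()),
        "unexpected_in_new": sorted(new.keys() - legacy.keys()),
        "changed": sorted(k for k in shared_keys if legacy[k] != new[k]),
    }


# Hypothetical sample outputs from the old and new stacks for the same day.
legacy_rows = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "open"},
    {"order_id": 3, "status": "open"},
]
new_rows = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "closed"},
    {"order_id": 4, "status": "open"},
]

report = diff_outputs(legacy_rows, new_rows, key="order_id")
print(report)
```

A cutover gate could then require all three lists to be empty for several consecutive runs before consumers are switched over.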

Story 3 — Ambiguity / non-routine problem

"Building governed data products in a Data Mesh" — SW-7

Maps to: "Identify the underlying need… solve non-routine problems… build reliable data products used across the org" (resp. #1, #2). This is the JD's literal mission. L5-grade story — bring it at that level.

S — Swisscom is moving to a decentralized Data Mesh — domains own and publish their own data as products rather than everything funneling through one central team. The hard part isn't the tooling; it's that "what makes a good, reusable data product" is genuinely ambiguous up front.

T — Within that company-wide initiative, build governed, reusable data products with proper metadata so other teams can actually discover and trust them — turning raw domain data into self-serve assets.

A — I worked on the AWS side (Glue, Athena, CloudFormation, automated CI/CD) to model and build reusable data products and the active metadata management around them — clear schemas, ownership, descriptions, and discoverability. Because requirements weren't handed to me, I started from the consumer's need: what questions do downstream AI/analytics workflows actually need to ask of this data? Then modeled backward from that. I treated metadata and governance as first-class, not an afterthought, so the products are discoverable and trustworthy.

Phrasing guardrail: "governed data products within Swisscom's company-wide Data Mesh." I contributed to the migration and own the modelling/build/onboarding of products in my scope; I did not single-handedly build the Mesh.

R — The result is a discoverable, well-described data foundation that downstream analytics and agentic-AI workflows query directly for grounded retrieval — exactly the "self-serve data products used across the org" pattern. This is my current Staff-level focus.

Probes:

"How do you define a data product?" → owned, discoverable, documented, quality-SLA'd, addressable, interoperable — the data-as-a-product principles.
"How do you get adoption?" → solve a real consumer need first, make it self-serve, document it, reduce their friction vs building their own.
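
The data-as-a-product principles above can be made tangible with a small descriptor plus a publishing check. This is a sketch under assumptions, not Swisscom's metadata model: the `DataProduct` fields, the `publishing_gaps` helper, the 20-character description threshold, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor covering ownership, docs, schema, and SLA."""
    name: str
    owner: str
    description: str
    schema: dict  # column name -> type name
    freshness_sla_hours: int
    tags: list = field(default_factory=list)


def publishing_gaps(product):
    """Return the reasons a product is not yet ready to be published."""
    gaps = []
    if not product.owner.strip():
        gaps.append("owner missing")
    if len(product.description.strip()) < 20:  # arbitrary illustrative threshold
        gaps.append("description too short to be useful")
    if not product.schema:
        gaps.append("schema missing")
    if product.freshness_sla_hours <= 0:
        gaps.append("freshness SLA missing")
    return gaps


draft = DataProduct(
    name="fulfillment_orders_daily",
    owner="",
    description="Orders",
    schema={"order_id": "string", "status": "string"},
    freshness_sla_hours=24,
)
print(publishing_gaps(draft))
```

Running a check like this in CI/CD is one way to treat metadata as first-class: a product with no owner or description simply does not get published.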

Story 4 — Production reliability under hard constraints

"Containerizing ML inference into a 24/7 fab" — BS-1

Maps to: "Experience with ML for production workflows" (preferred qual); reliability monitoring (resp. #3); Leadership.

S — At Bosch's 300mm semiconductor fab, wafer defect classification in the Defect Management domain was manual — line engineers eyeballing images. It was a bottleneck, and the fab runs 24/7, so there's no maintenance window and no tolerance for breaking the line.

T — I was given the goal of automating it. I designed and executed the strategy to embed ML inference into the live production pipeline (the Zeugnis credits me with both the design and the execution).

A — I containerized the defect-detection models with Docker and orchestrated them with Kubernetes and Ansible so inference ran as a managed, repeatable service inside the production environment. Because it was 24/7, I engineered for unattended operation — no manual intervention in the classification path — and designed deployment so it could go in without stopping active lines.

R — Manual wafer inspection bottleneck eliminated; defect classification became continuous and automated across active 300mm lines, freeing line engineers from inspection toil. Bosch rated my performance "sehr gut" (top tier).

Probes:

"How did you validate the model was right in production?" → talk through output monitoring, and note that this is where data-quality/reliability monitoring matters (bridge to BS-4).
"Biggest risk?" → breaking a 24/7 line; mitigated via containerization + careful rollout.

Story 5 — Raising the bar / proactive initiative

"Standing up observability from scratch (ELK + Grafana/Prometheus/Loki)" — BS-4

Maps to: "Advance product quality through automated validation, data quality, and reliability monitoring" (resp. #3); Googleyness (raising the bar without being told to).

S — At Bosch, the manufacturing systems didn't have centralized monitoring or anomaly detection — issues were caught reactively.

T — No one assigned this; I saw the gap and built a proof of concept to prove out centralized monitoring and anomaly detection for the 24/7 production systems.

A — I built an anomaly-detection PoC on the ELK stack (Elasticsearch, Logstash, Kibana) with Kafka for log ingestion, containerized in Docker, and added a full observability layer — Grafana dashboards, Prometheus metrics, Loki log aggregation — to validate centralized monitoring and alerting for high-volume production data.

R — Demonstrated that centralized observability and anomaly alerting were viable for the fab's systems, giving the team a concrete path from reactive to proactive monitoring.

Probes:

"It was a PoC — did it ship?" → be honest it was a PoC; the value was de-risking and proving the pattern. Don't overclaim production rollout.
Bridge: this is exactly the "reliability monitoring / data quality" the Google JD calls for — I've done it from zero.
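
If asked what "anomaly detection" means in practice, a simple baseline is easy to explain on a whiteboard. The notes above do not say which detection method the PoC used, so the rolling z-score below is purely an illustrative assumption, as are the function name `detect_anomalies`, the window size, the threshold, and the sample error counts.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Flag indices whose value deviates strongly from the preceding window.

    A point is flagged when its z-score against the previous `window`
    values exceeds `threshold`. Nothing is flagged until the window is full.
    """
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip flat windows to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Hypothetical per-minute error counts parsed from logs, with one spike.
error_counts = [10, 12, 11, 9, 10, 11, 95, 10, 12]
print(detect_anomalies(error_counts))
```

One trade-off worth naming if probed: the spike itself enters the window and inflates the standard deviation for the next few points, so a production version would likely use a more robust statistic or exclude flagged points from the baseline.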

Story 6 — Cross-functional collaboration / stakeholder management

"B2B data products with PMs and stakeholders" — SW-4 (+ BS-3 Application Owner)

Maps to: "Collaborate with a multidisciplinary team of data scientists, engineers, and PMs… sharp communication" (about-the-job); Googleyness.

S — At Swisscom I deliver data products, dashboards and analyses for B2B stakeholders, working with a Product Owner on a shared backlog — engineering depth meeting business delivery cadence.

T — Translate fuzzy stakeholder asks into prioritized, deliverable data products without over-building, and keep delivery moving at an agile cadence.

A — I partnered with the Product Owner to refine and prioritize the backlog, pushed back when requests weren't well-formed by digging for the underlying need, and delivered data products and dashboards iteratively. I also drove automation of recurring technical processes so the team spent less time on toil. (At Bosch I did the analogous role formally as Application Owner — SLOs, user training, documentation, vendor management for the analytics suite.)

R — Stakeholders got data products that fit their real needs, delivered at agile cadence, with recurring manual work automated away — and at Bosch the Application Owner ownership kept a 24/7 analytics suite reliable and adopted across analysis teams.

Probes:

"Tell me about a disagreement with a stakeholder." → have a real one ready: they wanted X, the underlying need was Y, you proposed Y, outcome.
"How do you say no?" → reframe around the underlying need and priority/impact, not a flat refusal.

Quick-reference: which story for which prompt

If they ask about…	Lead with
Ownership / "most impactful project"	Story 1 (SW-2) or Story 4 (BS-1)
A hard technical decision / trade-off	Story 2 (SW-1, Iceberg)
Ambiguity / no clear requirements	Story 3 (SW-7) — the JD's mission
Production reliability / pressure	Story 4 (BS-1)
Going beyond your remit / raising the bar	Story 5 (BS-4)
Conflict / collaboration / stakeholders	Story 6 (SW-4)
Failure / "what would you do differently"	Story 2 probe or Story 5 (PoC honesty)
Leadership without authority	Story 5 (BS-4) or Story 3 (SW-7)

Delivery reminders:

Lead with "I," not "we." Name the decision and why.
2–3 min per story; pause for follow-ups rather than monologuing.
Always close on impact (before→after), even when you lack a hard metric.
Be honest about scope and PoC-vs-production — Google interviewers probe, and honesty reads as senior.

Generated 2026-06-20. Source: experience_swisscom.md (SW-1/2/4/7), experience_bosch.md (BS-1/3/4), live JD. Pairs with interview_prep_brief.md.
