Prep materials for the Google Senior Data Engineer (Merchant Data Science) process after passing the Hiring Assessment 2026-06-20: interview prep brief + STAR story bank alongside the session files. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Google — Sr Data Engineer — Behavioral STAR Stories
For the "Googleyness & Leadership" round (and to seed answers in technical rounds). Google scores: General Cognitive Ability, Role-Related Knowledge, Leadership, Googleyness. Behavioral round wants: ownership, autonomy, ambiguity, impact, collaboration, "raise the bar."
How to use: Each story is in STAR form (Situation / Task / Action / Result) plus a "maps to" tag and follow-up-probe prep. Tell them out loud, ~2–3 min each. Lead with YOUR action ("I…"), be specific about decisions and trade-offs, end with quantified or concrete impact.
Accuracy guardrails (do not drift in the room):
- Own your domains/components/products. NEVER claim you solo-built Swisscom's company-wide Data Mesh — say "governed data products within Swisscom's company-wide Data Mesh."
- No fabricated metrics. Where you don't have a hard number, describe the concrete before/after.
- Bosch ML: you designed AND executed the inference-integration strategy (the Zeugnis, your German employment reference, credits you with "Erarbeitung und Durchführung", i.e. "development and execution"), so own it fully.
Story 1 — Ownership / Autonomy (flagship)
"Owning business-critical pipelines end-to-end" — SW-2
Maps to: "Operate with high autonomy, own from conception to impact" (JD responsibility #4); Leadership/ownership.
S — At Swisscom I'm Component Owner for the Fulfillment domain's data pipelines — business-critical flows from Oracle source systems through Kafka into our Teradata data warehouse, feeding downstream B2B analytics. If these break, stakeholders lose visibility into core fulfillment operations.
T — As Component Owner I'm the single accountable engineer: data availability, SLA, data quality, governance compliance, and on-call. Not just building — keeping it correct and available in production.
A — I own the full lifecycle: I built and maintain the Python ingestion, enforce Data Governance and privacy standards on the data, and carry 2nd/3rd-level support and on-call duty. When incidents hit, I run root-cause analysis and drive the fix rather than escalating it away. I designed the pipelines to be idempotent and re-runnable so backfills don't corrupt downstream tables.
R — The Fulfillment pipelines run under a maintained SLA with full governance compliance, and downstream B2B analytics teams get reliable, on-time data. The ownership signal was part of what got me promoted from Senior to Staff (Engineer IV) in April 2025.
Probes to be ready for:
- "Tell me about a time it broke." → walk a concrete incident: detection (monitoring/alert), triage, RCA, the fix, and the guardrail you added so it didn't recur.
- "How do you decide what to automate?" → recurring manual toil + risk of human error → automate; one-offs → don't.
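If an interviewer asks what "idempotent and re-runnable" means in practice, one common pattern is to replace a whole partition inside a single transaction, so a backfill re-run converges to the same state instead of appending duplicates. The sketch below is illustrative only: it uses SQLite from the standard library as a stand-in for the warehouse, and the table name `fulfillment_orders`, its columns, and the function `load_partition` are hypothetical, not the actual Swisscom pipeline.

```python
import sqlite3


def load_partition(conn, load_date, rows):
    """Replace one date partition atomically so re-runs converge to the same state."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM fulfillment_orders WHERE load_date = ?", (load_date,)
        )
        conn.executemany(
            "INSERT INTO fulfillment_orders (load_date, order_id, status) "
            "VALUES (?, ?, ?)",
            [(load_date, order_id, status) for order_id, status in rows],
        )


# Hypothetical table standing in for a downstream warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fulfillment_orders (load_date TEXT, order_id TEXT, status TEXT)"
)

batch = [("A-1", "shipped"), ("A-2", "open")]
load_partition(conn, "2025-01-01", batch)
load_partition(conn, "2025-01-01", batch)  # simulated backfill re-run

row_count = conn.execute("SELECT COUNT(*) FROM fulfillment_orders").fetchone()[0]
```

Running the load twice leaves two rows, not four, which is the property worth naming out loud: the unit of work is the partition, and the delete and insert succeed or fail together.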
Story 2 — Driving a migration / technical judgment
"Legacy Teradata/Oracle → cloud-native AWS" — SW-1
Maps to: "Apply advanced data engineering, modeling and architectural frameworks" (resp. #1); Role-Related Knowledge + Leadership.
S — Swisscom's legacy ETL ran on Teradata and Oracle — heavy operational overhead, hard to scale, not cloud-native.
T — Lead the migration of my domains' ETL stack to a serverless AWS architecture without disrupting the business-critical data the team depends on.
A — I designed and built the target stack: S3 as the lake, Glue for transforms, Athena over Apache Iceberg as an open table format (so we get schema evolution and time-travel without lock-in), Redshift for serving, and Airflow plus Step Functions/Lambda for orchestration — all provisioned as code with CloudFormation. I sequenced it to migrate incrementally and run old/new in parallel to de-risk cutover, validating outputs matched before switching consumers over. I deliberately chose Iceberg over a closed format for evolvability.
R — Reduced manual operational overhead, improved pipeline observability, and gave the team a scalable serverless foundation. Downstream analytics received data sooner, and the stack is positioned for modern lakehouse workflows.
Probes:
- "Why Iceberg?" → open table format, schema evolution, time-travel, engine-agnostic, avoids warehouse lock-in.
- "What would you do differently?" → have an honest answer (e.g., invest earlier in automated output-diffing to speed validation).
- Google bridge: the closest Google analogue to Athena/Redshift is BigQuery (built on Dremel); say you'd map these patterns onto their stack quickly.
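For the "run old and new in parallel, validate outputs match" step and the output-diffing answer, it helps to have a concrete shape in mind. The following is a minimal sketch under stated assumptions: rows are plain dictionaries, both outputs fit in memory, and the function name `diff_outputs` and the sample data are hypothetical. A real validation at warehouse scale would more likely compare counts and checksums in SQL.

```python
def diff_outputs(legacy_rows, new_rows, key):
    """Compare two pipeline outputs row by row, matched on a business key."""
    legacy = {row[key]: row for row in legacy_rows}
    new = {row[key]: row for row in new_rows}
    return {
        # Keys the legacy pipeline produced but the new one did not.
        "missing_in_new": sorted(legacy.keys() - new.keys()),
        # Keys that only the new pipeline produced.
        "unexpected_in_new": sorted(new.keys() - legacy.keys()),
        # Keys present in both outputs whose row contents differ.
        "mismatched": sorted(
            k for k in legacy.keys() & new.keys() if legacy[k] != new[k]
        ),
    }


# Hypothetical sample outputs from the legacy and the migrated pipeline.
legacy_rows = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "open"},
    {"order_id": 3, "status": "cancelled"},
]
new_rows = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "closed"},
    {"order_id": 4, "status": "open"},
]

report = diff_outputs(legacy_rows, new_rows, key="order_id")
```

Here `report` flags order 3 as missing, order 4 as unexpected, and order 2 as mismatched. An empty report for every table is the cutover gate: consumers switch only once the diff is clean.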
Story 3 — Ambiguity / non-routine problem
"Building governed data products in a Data Mesh" — SW-7
Maps to: "Identify the underlying need… solve non-routine problems… build reliable data products used across the org" (resp. #1, #2). This is the JD's literal mission. L5-grade story — bring it at that level.
S — Swisscom is moving to a decentralized Data Mesh — domains own and publish their own data as products rather than everything funneling through one central team. The hard part isn't the tooling; it's that "what makes a good, reusable data product" is genuinely ambiguous up front.
T — Within that company-wide initiative, build governed, reusable data products with proper metadata so other teams can actually discover and trust them — turning raw domain data into self-serve assets.
A — I worked on the AWS side (Glue, Athena, CloudFormation, automated CI/CD) to model and build reusable data products and the active metadata management around them — clear schemas, ownership, descriptions, and discoverability. Because requirements weren't handed to me, I started from the consumer's need: what questions do downstream AI/analytics workflows actually need to ask of this data? Then modeled backward from that. I treated metadata and governance as first-class, not an afterthought, so the products are discoverable and trustworthy.
Phrasing guardrail: say "governed data products within Swisscom's company-wide Data Mesh." I contributed to the migration and own the modeling, build, and onboarding of products in my scope; I did not single-handedly build the Mesh.
R — The result is a discoverable, well-described data foundation that downstream analytics and agentic-AI workflows query directly for grounded retrieval — exactly the "self-serve data products used across the org" pattern. This is my current Staff-level focus.
Probes:
- "How do you define a data product?" → owned, discoverable, documented, quality-SLA'd, addressable, interoperable — the data-as-a-product principles.
- "How do you get adoption?" → solve a real consumer need first, make it self-serve, document it, reduce their friction vs building their own.
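If a follow-up pushes on how those data-product principles get enforced rather than just listed, one way to make the answer concrete is a publishing gate that refuses a product whose metadata is incomplete. This is a sketch only: the `DataProduct` class, its field names, and `publishing_gaps` are hypothetical illustrations, not Swisscom's metadata model or any catalog's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor covering owner, docs, schema, and a quality SLA."""
    name: str
    owner: str = ""
    description: str = ""
    schema: dict = field(default_factory=dict)  # column name -> type
    freshness_sla_hours: Optional[int] = None


def publishing_gaps(product: DataProduct) -> list:
    """Return the metadata fields still missing before the product is publishable."""
    gaps = []
    if not product.owner:
        gaps.append("owner")
    if not product.description:
        gaps.append("description")
    if not product.schema:
        gaps.append("schema")
    if product.freshness_sla_hours is None:
        gaps.append("freshness_sla_hours")
    return gaps


draft = DataProduct(name="fulfillment_orders", description="One row per order.")
ready = DataProduct(
    name="fulfillment_orders",
    owner="fulfillment-domain",
    description="One row per order.",
    schema={"order_id": "string", "status": "string"},
    freshness_sla_hours=24,
)

draft_gaps = publishing_gaps(draft)
ready_gaps = publishing_gaps(ready)
```

The draft is rejected for lacking an owner, a schema, and a freshness SLA, while the complete descriptor passes with no gaps. A check like this could run in CI/CD, which ties the answer back to treating metadata as first-class.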
Story 4 — Production reliability under hard constraints
"Containerizing ML inference into a 24/7 fab" — BS-1
Maps to: "Experience with ML for production workflows" (preferred qual); reliability monitoring (resp. #3); Leadership.
S — At Bosch's 300mm semiconductor fab, wafer defect classification in the Defect Management domain was manual — line engineers eyeballing images. It was a bottleneck, and the fab runs 24/7, so there's no maintenance window and no tolerance for breaking the line.
T — I was given the goal of automating it. I designed and executed the strategy to embed ML inference into the live production pipeline (the Zeugnis credits me with both the design and the execution).
A — I containerized the defect-detection models with Docker and orchestrated them with Kubernetes and Ansible so inference ran as a managed, repeatable service inside the production environment. Because it was 24/7, I engineered for unattended operation — no manual intervention in the classification path — and designed deployment so it could go in without stopping active lines.
R — The manual wafer-inspection bottleneck was eliminated: defect classification became continuous and automated across active 300mm lines, freeing line engineers from inspection toil. Bosch rated my performance "sehr gut" ("very good", the top tier).
Probes:
- "How did you validate the model was right in production?" → talk about monitoring the model's outputs, and note that this is where data-quality/reliability monitoring matters (bridge to BS-4).
- "Biggest risk?" → breaking a 24/7 line; mitigated via containerization + careful rollout.
Story 5 — Raising the bar / proactive initiative
"Standing up observability from scratch (ELK + Grafana/Prometheus/Loki)" — BS-4
Maps to: "Advance product quality through automated validation, data quality, and reliability monitoring" (resp. #3); Googleyness (raising the bar without being told to).
S — At Bosch, the manufacturing systems didn't have centralized monitoring or anomaly detection — issues were caught reactively.
T — No one assigned this; I saw the gap and built a proof of concept to show that centralized monitoring and anomaly detection could work for the 24/7 production systems.
A — I built an anomaly-detection PoC on the ELK stack (Elasticsearch, Logstash, Kibana) with Kafka for log ingestion, containerized in Docker, and added a full observability layer — Grafana dashboards, Prometheus metrics, Loki log aggregation — to validate centralized monitoring and alerting for high-volume production data.
R — Demonstrated that centralized observability and anomaly alerting were viable for the fab's systems, giving the team a concrete path from reactive to proactive monitoring.
Probes:
- "It was a PoC — did it ship?" → be honest it was a PoC; the value was de-risking and proving the pattern. Don't overclaim production rollout.
- Bridge: this is exactly the "reliability monitoring / data quality" the Google JD calls for — I've done it from zero.
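If asked what "anomaly detection" meant concretely, be ready to describe at least one simple baseline. The sketch below shows a rolling z-score over per-interval error counts. It is an illustrative stand-in, not the method used in the Bosch PoC (the notes do not record which technique was used); the function name `flag_anomalies`, the window, the threshold, and the sample counts are all assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=5, threshold=3.0):
    """Flag indices whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip flat baselines (sigma == 0) to avoid dividing by zero.
        if sigma > 0 and abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical error counts per interval, with one obvious spike at index 6.
error_counts = [10, 12, 11, 9, 10, 11, 95, 10, 12]
flagged = flag_anomalies(error_counts)
```

With this data only index 6 is flagged. Note the trade-off worth mentioning in the room: once the spike enters the baseline window it inflates the standard deviation, so the intervals right after an incident are harder to flag.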
Story 6 — Cross-functional collaboration / stakeholder management
"B2B data products with PMs and stakeholders" — SW-4 (+ BS-3 Application Owner)
Maps to: "Collaborate with a multidisciplinary team of data scientists, engineers, and PMs… sharp communication" (about-the-job); Googleyness.
S — At Swisscom I deliver data products, dashboards and analyses for B2B stakeholders, working with a Product Owner on a shared backlog — engineering depth meeting business delivery cadence.
T — Translate fuzzy stakeholder asks into prioritized, deliverable data products without over-building, and keep delivery moving at an agile cadence.
A — I partnered with the Product Owner to refine and prioritize the backlog, pushed back on poorly formed requests by digging for the underlying need, and delivered data products and dashboards iteratively. I also drove automation of recurring technical processes so the team spent less time on toil. (At Bosch I held the analogous role formally as Application Owner: SLOs, user training, documentation, and vendor management for the analytics suite.)
R — Stakeholders got data products that fit their real needs, delivered at agile cadence, with recurring manual work automated away. At Bosch, my work as Application Owner kept a 24/7 analytics suite reliable and adopted across analysis teams.
Probes:
- "Tell me about a disagreement with a stakeholder." → have a real one ready: they wanted X, the underlying need was Y, you proposed Y, outcome.
- "How do you say no?" → reframe around the underlying need and priority/impact, not a flat refusal.
Quick-reference: which story for which prompt
| If they ask about… | Lead with |
|---|---|
| Ownership / "most impactful project" | Story 1 (SW-2) or Story 4 (BS-1) |
| A hard technical decision / trade-off | Story 2 (SW-1, Iceberg) |
| Ambiguity / no clear requirements | Story 3 (SW-7) — the JD's mission |
| Production reliability / pressure | Story 4 (BS-1) |
| Going beyond your remit / raising the bar | Story 5 (BS-4) |
| Conflict / collaboration / stakeholders | Story 6 (SW-4) |
| Failure / "what would you do differently" | Story 2 probe or Story 5 (PoC honesty) |
| Leadership without authority | Story 5 (BS-4) or Story 3 (SW-7) |
Delivery reminders:
- Lead with "I," not "we." Name the decision and why.
- 2–3 min per story; pause for follow-ups rather than monologuing.
- Always close on impact (before→after), even when you lack a hard metric.
- Be honest about scope and PoC-vs-production — Google interviewers probe, and honesty reads as senior.
Generated 2026-06-20. Source: experience_swisscom.md (SW-1/2/4/7), experience_bosch.md (BS-1/3/4), live JD. Pairs with interview_prep_brief.md.