# Google — Sr Data Engineer — Behavioral STAR Stories

> For the "Googleyness & Leadership" round (and to seed answers in technical rounds).
> Google scores: **General Cognitive Ability, Role-Related Knowledge, Leadership, Googleyness.**
> Behavioral round wants: ownership, autonomy, ambiguity, impact, collaboration, "raise the bar."
>
> **How to use:** Each story is in STAR form (Situation / Task / Action / Result) plus a "maps to" tag
> and follow-up-probe prep. Tell them out loud, ~2–3 min each. Lead with YOUR action ("I…"), be
> specific about decisions and trade-offs, end with quantified or concrete impact.
>
> **Accuracy guardrails (do not drift in the room):**
> - Own your domains/components/products. NEVER claim you solo-built Swisscom's company-wide Data Mesh —
>   say "governed data products within Swisscom's company-wide Data Mesh."
> - No fabricated metrics. Where you don't have a hard number, describe the concrete before/after.
> - Bosch ML: you designed AND executed the inference-integration strategy (Zeugnis "Erarbeitung und Durchführung") — own it fully.

---

## Story 1 — Ownership / Autonomy (flagship)

### "Owning business-critical pipelines end-to-end" — SW-2

**Maps to:** "Operate with high autonomy, own from conception to impact" (JD responsibility #4); Leadership/ownership.

**S** — At Swisscom I'm Component Owner for the Fulfillment domain's data pipelines — business-critical flows from Oracle source systems through Kafka into our Teradata data warehouse, feeding downstream B2B analytics. If these break, stakeholders lose visibility into core fulfillment operations.

**T** — As Component Owner I'm the single accountable engineer: data availability, SLA, data quality, governance compliance, and on-call. Not just building — keeping it correct and available in production.


**A** — I own the full lifecycle: I built and maintain the Python ingestion, enforce Data Governance and privacy standards on the data, and carry 2nd/3rd-level support and on-call duty. When incidents hit, I run root-cause analysis and drive the fix rather than escalating it away. I designed the pipelines to be idempotent and re-runnable so backfills don't corrupt downstream tables.

**R** — The Fulfillment pipelines run under a maintained SLA with full governance compliance, and downstream B2B analytics teams get reliable, on-time data. The ownership signal was part of what got me promoted from Senior to Staff (Engineer IV) in April 2025.

**Probes to be ready for:**
- *"Tell me about a time it broke."* → walk a concrete incident: detection (monitoring/alert), triage, RCA, the fix, and the guardrail you added so it didn't recur.
- *"How do you decide what to automate?"* → recurring manual toil + risk of human error → automate; one-offs → don't.

---

## Story 2 — Driving a migration / technical judgment

### "Legacy Teradata/Oracle → cloud-native AWS" — SW-1

**Maps to:** "Apply advanced data engineering, modeling and architectural frameworks" (resp. #1); Role-Related Knowledge + Leadership.

**S** — Swisscom's legacy ETL ran on Teradata and Oracle — heavy operational overhead, hard to scale, not cloud-native.

**T** — Lead the migration of my domains' ETL stack to a serverless AWS architecture without disrupting the business-critical data the team depends on.

**A** — I designed and built the target stack: S3 as the lake, Glue for transforms, Athena over **Apache Iceberg** as an open table format (so we get schema evolution and time-travel without lock-in), Redshift for serving, and Airflow plus Step Functions/Lambda for orchestration — all provisioned as code with CloudFormation. I sequenced it to migrate incrementally and run old/new in parallel to de-risk cutover, validating outputs matched before switching consumers over.
I deliberately chose Iceberg over a closed format for evolvability.

**R** — Reduced manual operational overhead, improved pipeline observability, and gave the team a scalable serverless foundation — data availability for downstream analytics got faster and the stack is positioned for modern lakehouse workflows.

**Probes:**
- *"Why Iceberg?"* → open table format, schema evolution, time-travel, engine-agnostic, avoids warehouse lock-in.
- *"What would you do differently?"* → have an honest answer (e.g., invest earlier in automated output-diffing to speed validation).
- *Google bridge:* Google's closest counterpart to the Athena/Redshift query layer is BigQuery (built on Dremel); say you'd map these patterns onto their stack quickly.

---

## Story 3 — Ambiguity / non-routine problem

### "Building governed data products in a Data Mesh" — SW-7

**Maps to:** "Identify the underlying need… solve non-routine problems… build reliable data products used across the org" (resp. #1, #2). This is the JD's literal mission. **L5-grade story — bring it at that level.**

**S** — Swisscom is moving to a decentralized Data Mesh — domains own and publish their own data as products rather than everything funneling through one central team. The hard part isn't the tooling; it's that "what makes a good, reusable data product" is genuinely ambiguous up front.

**T** — Within that company-wide initiative, build governed, reusable data products with proper metadata so other teams can actually discover and trust them — turning raw domain data into self-serve assets.

**A** — I worked on the AWS side (Glue, Athena, CloudFormation, automated CI/CD) to model and build reusable data products and the active metadata management around them — clear schemas, ownership, descriptions, and discoverability. Because requirements weren't handed to me, I started from the consumer's need: what questions do downstream AI/analytics workflows actually need to ask of this data? Then modeled backward from that.
I treated metadata and governance as first-class, not an afterthought, so the products are discoverable and trustworthy.

> Phrasing guardrail: "governed data products **within** Swisscom's company-wide Data Mesh" — I contributed to the migration and own the modeling/build/onboarding of products in my scope; I did **not** single-handedly build the Mesh.

**R** — The result is a discoverable, well-described data foundation that downstream analytics and agentic-AI workflows query directly for grounded retrieval — exactly the "self-serve data products used across the org" pattern. This is my current Staff-level focus.

**Probes:**
- *"How do you define a data product?"* → owned, discoverable, documented, quality-SLA'd, addressable, interoperable — the data-as-a-product principles.
- *"How do you get adoption?"* → solve a real consumer need first, make it self-serve, document it, reduce their friction vs building their own.

---

## Story 4 — Production reliability under hard constraints

### "Containerizing ML inference into a 24/7 fab" — BS-1

**Maps to:** "Experience with ML for production workflows" (preferred qual); reliability monitoring (resp. #3); Leadership.

**S** — At Bosch's 300mm semiconductor fab, wafer defect classification in the Defect Management domain was manual — line engineers eyeballing images. It was a bottleneck, and the fab runs 24/7, so there's no maintenance window and no tolerance for breaking the line.

**T** — I was given the goal of automating it. I designed *and* executed the strategy to embed ML inference into the live production pipeline (the Zeugnis credits me with both the design and the execution).

**A** — I containerized the defect-detection models with Docker and orchestrated them with Kubernetes and Ansible so inference ran as a managed, repeatable service inside the production environment.
Because it was 24/7, I engineered for unattended operation — no manual intervention in the classification path — and designed deployment so it could go in without stopping active lines.

**R** — Manual wafer inspection bottleneck eliminated; defect classification became continuous and automated across active 300mm lines, freeing line engineers from inspection toil. Bosch rated my performance "sehr gut" (top tier).

**Probes:**
- *"How did you validate the model was right in production?"* → talk through monitoring of outputs, and note that this is where data-quality/reliability monitoring matters (bridge to BS-4).
- *"Biggest risk?"* → breaking a 24/7 line; mitigated via containerization + careful rollout.

---

## Story 5 — Raising the bar / proactive initiative

### "Standing up observability from scratch (ELK + Grafana/Prometheus/Loki)" — BS-4

**Maps to:** "Advance product quality through automated validation, data quality, and reliability monitoring" (resp. #3); Googleyness (raising the bar without being told to).

**S** — At Bosch, the manufacturing systems didn't have centralized monitoring or anomaly detection — issues were caught reactively.

**T** — No one assigned this; I saw the gap and built a proof of concept to prove out centralized monitoring and anomaly detection for the 24/7 production systems.

**A** — I built an anomaly-detection PoC on the ELK stack (Elasticsearch, Logstash, Kibana) with Kafka for log ingestion, containerized in Docker, and added a full observability layer — Grafana dashboards, Prometheus metrics, Loki log aggregation — to validate centralized monitoring and alerting for high-volume production data.

**R** — Demonstrated that centralized observability and anomaly alerting were viable for the fab's systems, giving the team a concrete path from reactive to proactive monitoring.

**Probes:**
- *"It was a PoC — did it ship?"* → be honest it was a PoC; the value was de-risking and proving the pattern. Don't overclaim production rollout.
- *Bridge:* this is exactly the "reliability monitoring / data quality" the Google JD calls for — I've done it from zero.

---

## Story 6 — Cross-functional collaboration / stakeholder management

### "B2B data products with PMs and stakeholders" — SW-4 (+ BS-3 Application Owner)

**Maps to:** "Collaborate with a multidisciplinary team of data scientists, engineers, and PMs… sharp communication" (about-the-job); Googleyness.

**S** — At Swisscom I deliver data products, dashboards and analyses for B2B stakeholders, working with a Product Owner on a shared backlog — engineering depth meeting business delivery cadence.

**T** — Translate fuzzy stakeholder asks into prioritized, deliverable data products without over-building, and keep delivery moving at an agile cadence.

**A** — I partnered with the Product Owner to refine and prioritize the backlog, pushed back when requests weren't well-formed by digging for the underlying need, and delivered data products and dashboards iteratively. I also drove automation of recurring technical processes so the team spent less time on toil. (At Bosch I held the analogous role formally as Application Owner — SLOs, user training, documentation, vendor management for the analytics suite.)

**R** — Stakeholders got data products that fit their real needs, delivered at agile cadence, with recurring manual work automated away — and at Bosch my Application Owner role kept a 24/7 analytics suite reliable and adopted across analysis teams.

**Probes:**
- *"Tell me about a disagreement with a stakeholder."* → have a real one ready: they wanted X, the underlying need was Y, you proposed Y, outcome.
- *"How do you say no?"* → reframe around the underlying need and priority/impact, not a flat refusal.
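
**Technical-round seed (Story 2 probe):** if the "automated output-diffing" answer draws a follow-up, it helps to have a concrete shape in mind. The following is a minimal sketch of a parallel-run check, not the validation actually used at Swisscom. The names `diff_outputs` and `DiffReport`, the `order_id` key, and the in-memory rows are all hypothetical; a real check would compare warehouse tables and would likely hash or aggregate rather than load full rows.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Mapping


@dataclass
class DiffReport:
    """Outcome of comparing legacy pipeline output with new pipeline output."""
    missing_in_new: list = field(default_factory=list)  # keys only in legacy output
    extra_in_new: list = field(default_factory=list)    # keys only in new output
    mismatched: dict = field(default_factory=dict)      # key -> columns that differ

    @property
    def matches(self) -> bool:
        # True only when both outputs hold the same keys and the same values.
        return not (self.missing_in_new or self.extra_in_new or self.mismatched)


def diff_outputs(
    legacy_rows: Iterable[Mapping[str, Any]],
    new_rows: Iterable[Mapping[str, Any]],
    key: str = "order_id",  # hypothetical primary key, assumed unique per row
) -> DiffReport:
    """Compare two row sets by primary key and report every discrepancy."""
    legacy = {row[key]: row for row in legacy_rows}
    new = {row[key]: row for row in new_rows}

    report = DiffReport(
        missing_in_new=sorted(legacy.keys() - new.keys()),
        extra_in_new=sorted(new.keys() - legacy.keys()),
    )

    # For keys present on both sides, list the columns whose values differ.
    for k in sorted(legacy.keys() & new.keys()):
        columns = legacy[k].keys() | new[k].keys()
        changed = sorted(c for c in columns if legacy[k].get(c) != new[k].get(c))
        if changed:
            report.mismatched[k] = changed
    return report
```

In a parallel run, the idea would be to call `diff_outputs(legacy_rows, new_rows)` per table and gate consumer cutover on `report.matches`, using the per-key column list to point root-cause analysis at the transform that diverged.
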

---

## Quick-reference: which story for which prompt

| If they ask about… | Lead with |
|---|---|
| Ownership / "most impactful project" | Story 1 (SW-2) or Story 4 (BS-1) |
| A hard technical decision / trade-off | Story 2 (SW-1, Iceberg) |
| Ambiguity / no clear requirements | Story 3 (SW-7) — *the JD's mission* |
| Production reliability / pressure | Story 4 (BS-1) |
| Going beyond your remit / raising the bar | Story 5 (BS-4) |
| Conflict / collaboration / stakeholders | Story 6 (SW-4) |
| Failure / "what would you do differently" | Story 2 probe or Story 5 (PoC honesty) |
| Leadership without authority | Story 5 (BS-4) or Story 3 (SW-7) |

**Delivery reminders:**
- Lead with "I," not "we." Name the decision and *why*.
- 2–3 min per story; pause for follow-ups rather than monologuing.
- Always close on impact (before→after), even when you lack a hard metric.
- Be honest about scope and PoC-vs-production — Google interviewers probe, and honesty reads as senior.

---

_Generated 2026-06-20. Source: experience_swisscom.md (SW-1/2/4/7), experience_bosch.md (BS-1/3/4), live JD. Pairs with interview_prep_brief.md._