Making User-Sequence Data More Cost-Efficient, Faster, and Easier to Use

12 min read3 days ago--Authors (listed alphabetically)Ads Feature Engineering Infra team: Ajay Venkatakrishnan, Le ZhangCore ML Infra team: Eric Shang, Pihui WeiML Data team: Connor Votroubek, Yi HeUser Understanding team: Camilo Munoz, Simin LiIf you work on ranking, retrieval, or recommendation systems, you’ve probably asked for some version of the same thing: “Give me the last N meaningful actions this user took, with the right enrichments, in a format that’s easy to train and serve ML models.”On paper, that sounds simple. In practice, “user sequences” often become one of the most expensive and fragile parts of the ML data stack.They end up powering everything from training datasets to offline analysis and online inference, so they need to be fresh and complete at the same time.They must remain consistent as you add new events and enrichments.And they have to do all of this while serving latency‑sensitive production workloads.This article walks through how we redesigned our user‑sequence platform to make these sequences cheaper to run, faster to extend, and easier to debug, while still supporting demanding production use cases.What We Mean by “User Sequence”Press enter or click to view image in full sizeIn this context, a user sequence is an ordered list of recent, relevant events for a user, along with the enrichments (signals) attached to each event. Here, enrichments mean all the extra signals we attach to raw events, so they’re useful for models: embeddings (for example, Pin or query representations), contextual features (such as surface, device, or country), and derived attributes or counters that describe how the user interacted with a piece of content over time.A concrete example helps. Imagine a sequence made up of the last 500 engagements a user had with Pinterest Pins. Each event in that sequence might carry a timestamp, an action type, the surface where the action occurred, and a handful of embedding features or categorical attributes.As a data primitive, user sequences are powerful. They capture temporal behavior instead of just aggregates like “how many clicks” over a period. They enable sequence‑aware models such as Transformers, sequence encoders, or attention‑over‑history architectures. And because they preserve fairly raw behavior, they can be reused across ranking, retrieval, exploration, anomaly detection, and other workloads.The catch is that a high‑quality sequence is not just “the N latest events from a log table.” It is the result of a multi‑step process:Ingest events from diverse sources,Filter down to the subset of events that matter,Enrich each event with additional signals (embeddings, metadata, and so on), andFinally assemble those enriched events into a stable, well‑defined sequence representation.Doing this once is easy. Doing it in a way that supports many teams, many event types, and many models over multiple years is where things get interesting.Context: Where Sequences Show Up and Why Quality Is HardUser sequences sit underneath almost every user-facing surface: Home feed(HF), Related Pins (RP), Search Results (SR), and many others. They power both organic products and ads across these surfaces in Pinterest, so any regression in sequence quality shows up quickly in user experience and revenue.From an infrastructure point of view, they show up in three main places.In training datasets, offline pipelines pull long history windows of enriched events per user in order to build sequence features.In offline analysis, data scientists dissect user behavior across sessions, surfaces, or campaigns using sequence‑level queries.And in online inference, real‑time services fetch up‑to‑date user sequences at request time to feed ranking and retrieval models.Across these use cases, sequence quality turns out to be multi‑dimensional. Freshness measures how quickly new events and enrichments show up in the sequence. Completeness asks whether late‑arriving events, corrections, or backfills are eventually reflected. Consistent enrichment is about ensuring that the same enrichments are available across streaming and batch, and that training and serving see aligned data. Stable schemas matter as well: downstream consumers need schemas to be versioned and predictable, not silently changed.One more constraint is that this is a multi‑tenant platform. It has to support many teams and models, each with different needs and lifecycles. That makes correctness, observability, and operability just as important as raw throughput or latency.Goals (and Non‑Goals)When we stepped back to redesign the platform, we framed the work with a small set of explicit goals and non‑goals.GoalsProvide a consistent “events → enriched signals → sequences” contract.Downstream consumers such as ML engineers and data scientists should see a stable, well‑defined interface that explains how events are filtered, enriched, and assembled into sequences, independent of the underlying runtime.Improve cost‑efficiency at scale.The platform should reduce storage and network usage for sequence data while keeping latency and reliability appropriate for online use.Make onboarding new event types and enrichments faster and safer.Adding a new signal or event type should mostly look like changing configuration and a small piece of well‑scoped code, instead of standing up a new bespoke pipeline.Support both real‑time and batch production paths.We want low‑latency updates for serving alongside batch backfills for historical coverage and corrections, with a clear policy for how the two paths merge.Non‑GoalsTo keep the scope tractable, we did not redesign downstream models or ranking architectures; the focus is on the platform that feeds them.We also did not change the product definition of events (what counts as a click, a save, or a conversion). Those semantics remain owned by product and logging teams.The Core Idea: One Definition, Many RuntimesThe key organizing principle for the redesign was simple:Define a signal or event type once, then instantiate it consistently across multiple runtimes.A signal definition captures which raw events to use, which enrichments to apply, and how to assemble enriched events into a sequence. That same definition is then consumed by three different kinds of workloads:Real‑time indexing for low‑latency updates.Batch indexing and backfill for historical data and corrections.Online serving for fetching sequences at inference time.This “one definition, many runtimes” approach avoids the classic split‑brain failure mode where training pipelines build sequences one way from batch tables while serving systems assemble sequences a different way from online stores. Over time, those two views naturally drift apart in subtle ways.Instead, we rely on a single configuration surface plus a shared execution engine to keep indexing, training and serving aligned.Architecture OverviewSystem Architecture DiagramPress enter or click to view image in full sizeAt a high level, the platform is composed of six major pieces that work together.Ingestion (stream and batch).Streaming ingestion handles real‑time events, while batch ingestion reads from data‑warehouse tables, log archives, or snapshots.Enrichment and execution layer.A shared execution engine turns raw events into enriched records based on configuration: filters, joins, and transforms. The same engine powers both streaming and batch pipelines.Real‑time indexer.A streaming job filters incoming events, converts them into a normalized representation, applies enrichments, and writes incremental updates to a time‑versioned store suitable for low‑latency reads.Batch indexer and backfill pipeline.Scheduled batch jobs read historical raw events, apply the same filter and enrichment definitions, and produce longer sequences along with reusable intermediate datasets for backfills and offline consumption.Columnar, time‑partitioned storage.Sequence data is stored in a columnar layout so models can read exactly the fields they need. Time partitioning keeps writes and scans focused on relevant windows, and the dataset layout supports both long‑sequence use cases and efficient truncation for shorter windows.Online serving API.Finally, a serving layer exposes a clean API for requesting user sequences by signal or feature name. It fetches the right columns from storage, performs request-time enrichments, and applies any final selection or trimming logic, such as “last N events within this time window.”From the perspective of a model or client team, this all collapses into a simple contract:Request sequence X for user U, and you’ll get a well‑defined schema of enriched events, with a documented freshness and completeness profile.Design Decision 1: Configuration‑as‑Code for Sequences and EnrichmentsWhat We DidWe moved sequence and enrichment definitions into configuration‑as‑code, expressed in a regular programming language (Python) with a well‑defined schema.Our configurations describe which sequence features exist, how they’re named, and basic metadata such as owners, retention, and lifecycle stage. Event‑type configuration describes, for each event type, which enrichments apply, what filtering logic to use, and what data sources to read from. Enrichment configuration explains how to fetch or derive additional signals (for example, embeddings) and how to map them into the event schema.Press enter or click to view image in full sizeThese configurations are validated, compiled into a portable JSON format, stored in managed internal object storage, and then consumed by the shared execution engine across streaming, batch and serving jobs.Why It MatteredThis approach made onboarding dramatically faster. New event types or enrichments can now be added primarily through configuration, plus small, isolated pieces of code where absolutely necessary, instead of via entirely new pipelines. That significantly reduces the concept‑to‑production time for new signals.Treating configuration as code also improved reviewability and safety. Diffs are human‑readable, code owners can review changes, rollbacks are straightforward, and version history lives in standard version control systems.A clearer separation of concerns followed naturally. ML and product teams focus on what they want (events, features, and filters) while platform teams focus on how to execute that configuration reliably and efficiently.Design Decision 2 — Shared Execution Engine with Pluggable ExecutorsThe ConceptWe introduced a shared execution engine responsible for reading configuration, connecting to data sources (kafka, logs, tables, feature stores), running filtering and featurization, calling enrichment services or joining against offline tables, and finally writing enriched results to storage.Within this engine, an executor is a plugin that converts a raw event into one or more enriched records. In plain terms, the executor is the “business logic module” for a particular event type or grouping, while the execution engine handles everything around it.Why It MatteredThe shared engine allowed us to reuse the same core enrichment logic in both streaming jobs that handle near‑real‑time events and batch jobs that process historical data. That minimized code duplication and reduced drift between batch and real‑time behavior.Practical BoundariesTo keep the system maintainable, we drew a clear line between framework and plugin code.Framework responsibilities include wiring data sources and sinks, handling concurrency, retries, and backpressure, and parsing and validating configuration. Executors own the business‑specific filtering and featurization logic and the mapping from raw events to normalized user‑event representations.Design Decision 3: Lambda Architecture for Fresh and Complete SequencesThe ChallengeSequence consumers want two things that naturally pull in opposite directions. On one hand, they need freshness: “I want this morning’s actions reflected in ranking now.” On the other hand, they care about completeness and correctness: “If late events show up tomorrow, I still want my sequences and training data to be right.”Real‑world data is messy. Events arrive late. Enrichment sources are recomputed or corrected. Backfills introduce new historical coverage months after the fact.The ApproachTo balance these requirements, we adopted a lambda‑style architecture for user sequences.A streaming path processes events as they arrive and maintains a near‑real‑time view of user sequences for online inference. A batch path periodically recomputes enriched events and sequences from raw historical data, producing long sequences and reusable datasets for backfills and offline analysis.The two paths cooperate instead of competing. The streaming path maintains the “now” view of the world, while the batch path focuses on “fixing history” and ensuring that training and long‑term analytics see consistent, corrected data.Design Decision 4: Columnar, Time‑Partitioned Storage with Table SemanticsPress enter or click to view image in full sizeWhat We ChoseBefore this redesign, we stored sequences as large, consolidated “enriched event” blobs. Every online call or offline scan had to pull the whole payload — even if a model only needed a small subset of features — so request fan‑out turned directly into heavy payload size and I/O on our storage systems.We moved sequence storage to a columnar, time‑partitioned layout that behaves like a set of tables. Each enrichment or feature lives in its own column, and reads can select only the columns they need for a given model or analysis. Data is partitioned by time bucket so that writes and scans stay constrained to relevant partitions as history grows. Engineers can query these datasets with familiar table abstractions, which makes it easy to compare runs, versions, or backfill strategies by inspecting partitions.Why It MatteredThis design improved both efficiency and operability. Columnar storage improves compression and reduces network bandwidth by avoiding wide “enriched event” blobs when only a few features are needed. Time partitioning keeps I/O bounded even as the system accumulates long histories.Operationally, having clear table semantics makes it much easier to inspect anomalous days or event types, validate new enrichments, and compare old and new pipelines side by side.Migration, Rollout, and MeasurementRedesigning a platform is one thing; migrating existing production workloads is another. We treated migration as a first‑class project.Migration StrategyWe followed an event type by event type approach.For a given event type, we first ran the new pipeline in parallel with the existing one and generated “shadow” sequences. We then compared those shadow outputs to the legacy sequences over a defined period.Since we are regenerating the data using completely new jobs, we had to accept that the data won’t have a 100% match due to the nature of our online systems. As a result, we had to have thorough validations to prove that our new system was producing approximately the same sequences when compared to the legacy system.We decided on a strategy of using two tiers of comparisons, an event-level comparison, which compared field-by-field of events we matched between our old and new indexing jobs, as well as a sequence-level comparison, comparing the shadow sequence output with the legacy sequence output. Alongside performing A/B experiments using our new data, these validations gave us the confidence that we could safely swap our pipelines with no impact.Once we were confident in the behavior, we performed a controlled cutover by shifting consumers to read from the new architecture. We then iterated the same process across additional event types, steadily deprecating the legacy path.What We Measured and AchievedTo stay within company policies, we only describe qualitative outcomes here.On cost, we saw significant infrastructure cost reductions once large event types were fully migrated, primarily because of more efficient storage formats, fewer replicas where appropriate, and lower network transfer per request.On productivity, the time to onboard new enrichments and event types dropped substantially. Most changes moved from bespoke pipeline work to configuration updates and small, composable executors.On quality, our major recommendation surfaces saw improved engagement metrics after switching to sequences produced by the new platform, while still staying within quality and safety expectations.Operational ReadinessThroughout migration and into steady state, we invested heavily in observability and operational hygiene.We set up dashboards tracking sequence freshness and lag, event and enrichment coverage, schema drift and configuration rollout status, and serving latency and error rates..These foundations turned out to be crucial. A platform that many teams rely on will eventually have bad days; the difference between a minor blip and a major incident often comes down to whether you can quickly see what went wrong and where.Future WorkThere is still plenty to improve, and many of the directions generalize beyond any single company.We want richer self‑serve tooling so that adding new signals feels more like filling out a template than editing infrastructure code. That includes wizards for new signals, static analysis for configurations, and automated backfill orchestration for common patterns.We are also interested in stronger correctness guarantees. Anomaly detection over both indexing and serving paths would further harden the system.Finally, we plan to broaden coverage and add richer signals. That includes extending sequence coverage to more event types and surfaces and adding higher‑level behavioral abstractions on top of raw event sequences, such as session‑level or object‑level views. The challenge is to do that while preserving the core “events → enriched signals → sequences” contract that keeps the platform coherent.AcknowledgementsA big thank you to everyone who contributed through discussions, design reviews, and recurring syncs that helped shape and unblock this work. In no particular order: Alekhya Pyla, Chuxi Wang, Han Wang, Jia Zhan, Kangnan Li, Kyle Soares, Laksh Bhasin (He Him), Nilesh Gohel, Se Won Jang, Xue Xia, Yang Tang, Yi He, Anton Arboleda, Yi PanAnd thank you to Archer Liu, Haoyang Li, Hongbo Deng, Qingxian Lai, Shun-ping Chiu, and Yingjian Ding for their great management support.

Making User-Sequence Data More Cost-Efficient, Faster, and Easier to Use

Making User-Sequence Data More Cost-Efficient, Faster, and Easier to Use

Other newsrooms on this story

Related reading

Enhancing Ad Relevance: Integrating Real-Time Context into Sequential…

Optimizing ML Workload Network Efficiency (Part I): Feature Trimmer

Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve…

KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure

The Sequence AI of the Week #891: Prompting a Spreadsheet : Inside Google’s…

Time Series Made So Easy My Aunt Got It on the Second Read | Towards AI

Other newsrooms on this story

Related reading

Enhancing Ad Relevance: Integrating Real-Time Context into Sequential…

Optimizing ML Workload Network Efficiency (Part I): Feature Trimmer

Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve…

KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure

The Sequence AI of the Week #891: Prompting a Spreadsheet : Inside Google’s…

Time Series Made So Easy My Aunt Got It on the Second Read | Towards AI